embeddings usage / test-case

#1
by drzraf - opened

First I want to share with your team how grateful and happy I am to see this sort of model being built and provided.

It can't be overstated how useful this could become in many countries for lost/abandoned cats and dogs.

I gave it a quick shot on a small sample (234 pictures of 15 distinct, randomly chosen dogs).

There is no precise indication of how the embeddings are actually meant to be used, but as far as I understand it, the model output isn't pixel-related in any way but rather acts as a face / morphological similarity ID, is that right?

So I went with computing torch.cosine_similarity() on the output tensors of every image pair and, for a given reference image, selected the 5 best matches scoring over 0.93 (OK indicates the match is the same individual as the reference in the first column, ERROR otherwise).
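
Roughly, this is what I did (a simplified sketch; `embeddings` stands in for a dict I built beforehand, mapping each image path to the model's output tensor for that image):

import itertools
import torch

# embeddings: {image_path: 1-D output tensor from the model}, built beforehand
pair_scores = {
    (a, b): torch.cosine_similarity(embeddings[a], embeddings[b], dim=0).item()
    for a, b in itertools.combinations(embeddings, 2)
}

def best_matches(ref, k=5, threshold=0.93):
    """Top-k images whose similarity with `ref` is above the threshold."""
    candidates = [(b if a == ref else a, score)
                  for (a, b), score in pair_scores.items()
                  if ref in (a, b) and score >= threshold]
    return sorted(candidates, key=lambda t: t[1], reverse=True)[:k]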

I'm not even sure whether I:

  • should be amazed that it even works out of the box, with nothing more involved than this and a model as small as 85 MB,
  • should be disappointed by some of the blatant false positives observed (where very distinct individuals share a high torch.cosine_similarity()) and by the many unmatched pairs for some pictures, or
  • should be hopeful that there are easy ways to get far superior results.

But I'm mostly wondering whether I'm doing it right, and whether there is any guidance you could kindly provide to make the best of this model.

Thank you!

ref match_1 match_2 match_3 match_4 match_5
OK 0.977 OK 0.945
OK 0.954 ERROR 0.939
OK 0.953
ERROR 0.980 ERROR 0.976 ERROR 0.971 ERROR 0.956 ERROR 0.942
OK 0.979 OK 0.974 OK 0.949
ERROR 0.939
OK 0.970
ERROR 0.969 ERROR 0.968 OK 0.954
OK 0.956
OK 0.987 ERROR 0.961 OK 0.959 ERROR 0.940 ERROR 0.937
ERROR 0.956
ERROR 0.931
ERROR 0.968 ERROR 0.947 ERROR 0.946 ERROR 0.940
OK 0.953 ERROR 0.949
ERROR 0.968 ERROR 0.935 ERROR 0.933
ERROR 0.949
ERROR 0.932
ERROR 0.941
OK 1.000 ERROR 0.959 ERROR 0.947 ERROR 0.947 ERROR 0.940
OK 0.946 OK 0.945 OK 0.937
ERROR 0.964
ERROR 0.931
ERROR 0.981 ERROR 0.980 ERROR 0.966 ERROR 0.941 ERROR 0.938
ERROR 0.964 OK 0.954 OK 0.945
OK 1.000 ERROR 0.959 ERROR 0.947 ERROR 0.947 ERROR 0.940
ERROR 0.961 ERROR 0.958 ERROR 0.930
OK 0.984 ERROR 0.956 ERROR 0.956 ERROR 0.955 OK 0.954
ERROR 0.976 OK 0.959 ERROR 0.957 ERROR 0.956 ERROR 0.955
OK 0.995 OK 0.972 ERROR 0.968 OK 0.946
ERROR 0.956
OK 0.939
OK 0.956
ERROR 0.930
OK 0.987 ERROR 0.958 OK 0.950 ERROR 0.947
OK 0.974 OK 0.974 OK 0.971
OK 0.979 OK 0.971 OK 0.967
OK 0.970
ERROR 0.941
OK 0.995 OK 0.975 ERROR 0.969 OK 0.945
ERROR 0.932
ERROR 0.949
OK 0.939
OK 0.959
ERROR 0.948 ERROR 0.946 ERROR 0.935
ERROR 0.967 ERROR 0.956 ERROR 0.956
OK 0.980 OK 0.977
ERROR 0.948
ERROR 0.981 ERROR 0.971 ERROR 0.950 ERROR 0.941 ERROR 0.940
OK 0.959 ERROR 0.930
OK 0.975 OK 0.972 OK 0.937
OK 0.974 OK 0.967 OK 0.949
OK 0.984 ERROR 0.967 ERROR 0.967 ERROR 0.964 ERROR 0.957
ERROR 0.949 ERROR 0.940 ERROR 0.940 ERROR 0.930
ERROR 0.959 ERROR 0.959 ERROR 0.930
ERROR 0.967 ERROR 0.956 ERROR 0.950 ERROR 0.950 ERROR 0.938
ERROR 0.933
AvitoTech org

@drzraf
Thank you for your comment. We noticed the issue and fixed it. 🤝🔧

We've updated the description of how the model should be initialized in the README.
It should work better now, so you may want to check this out – we expect better results!

Wow, it's way, way better. I could reduce the similarity threshold to 0.85 and got, if not perfect (a couple of misses), at least very decent results!

Would you mind providing some more information about the model outputs? Their structure, how best to use them, how to visualize them(?), and how to think about them from a high-level perspective?

Another question: the model takes a list of PIL Images, but passing multiple images only returns one tensor. Am I doing something wrong? (I haven't tried a Dataset yet, but I believe it ought to work with just a plain list to begin with.)

Big thanks to your team!
Keep up the good work!

ref match_1 match_2 match_3 match_4 match_5
OK 0.982 OK 0.977 OK 0.963 OK 0.937 OK 0.911
OK 0.951 OK 0.912 OK 0.908 OK 0.893
OK 0.907
OK 0.928 OK 0.905 OK 0.903 OK 0.874
OK 0.975 OK 0.950 OK 0.899
OK 0.889
OK 0.930 OK 0.928 OK 0.909 OK 0.904
OK 0.930 OK 0.925 OK 0.903 OK 0.885
OK 0.936 OK 0.932 OK 0.906 OK 0.888
OK 0.971 OK 0.966 OK 0.951 OK 0.883
OK 0.948 OK 0.906 OK 0.893 OK 0.873
OK 0.951 OK 0.947 OK 0.941 OK 0.921
OK 0.932 OK 0.924 OK 0.915 OK 0.911 OK 0.908
OK 0.934 OK 0.876 OK 0.868
OK 0.921 OK 0.911 OK 0.909 OK 0.883
OK 0.924 OK 0.912 OK 0.906 OK 0.894
OK 0.881
OK 0.931 OK 0.894
OK 0.907
OK 0.937 OK 0.932 OK 0.932 OK 0.930 OK 0.917
OK 0.931 OK 0.916
OK 0.932 OK 0.925 OK 0.864 OK 0.862
OK 0.909 OK 0.902 OK 0.885 OK 0.874
OK 1.000 OK 0.909
OK 0.883 OK 0.871
OK 0.951 OK 0.924 OK 0.908 OK 0.873
OK 0.932 OK 0.912 OK 0.900
OK 0.904
OK 1.000 OK 0.909
OK 0.877 OK 0.853
OK 0.903 OK 0.853
OK 0.934 OK 0.916 OK 0.873
OK 0.928 OK 0.877
OK 0.967 OK 0.965 OK 0.906 OK 0.870
OK 0.948 OK 0.908 OK 0.908 OK 0.894
OK 0.977 OK 0.969 OK 0.957 OK 0.930 OK 0.924
OK 0.916 OK 0.894
OK 0.934 OK 0.916 OK 0.868
OK 0.952 OK 0.950
OK 0.899 OK 0.891
OK 0.992 OK 0.966 OK 0.941 OK 0.909
OK 0.967 OK 0.933 OK 0.888
OK 0.904
OK 0.928 OK 0.889 OK 0.881 OK 0.853
OK 0.863
OK 0.932 OK 0.900 OK 0.883 OK 0.870
OK 0.937 OK 0.853
OK 0.969 OK 0.963 OK 0.935 OK 0.917 OK 0.908
OK 0.992 OK 0.971 OK 0.947 OK 0.911
OK 0.912 OK 0.864 OK 0.862
OK 0.909 OK 0.909
OK 0.965 OK 0.936 OK 0.933 OK 0.900 OK 0.871
OK 0.975 OK 0.952 OK 0.891 OK 0.870
OK 0.937 OK 0.903
OK 0.870
OK 0.925 OK 0.905 OK 0.904 OK 0.902
OK 0.936 OK 0.864
OK 0.934 OK 0.876 OK 0.873
OK 0.936 OK 0.925 OK 0.900 OK 0.864
OK 0.863
OK 0.982 OK 0.957 OK 0.935 OK 0.932 OK 0.915

@drzraf

Thanks a lot for the detailed feedback; it is great to hear that the updated version gives you very reasonable results.

Regarding the model outputs: we discuss their structure, interpretation, and recommended ways to use them (including some visualization ideas) in a paper that is currently under review. As soon as the paper is published, we will add a link to it in the repository so that all the details are documented in one place.

About your second question: the behavior you see is expected. The model takes a list of PIL.Image objects, internally stacks them into a batch, and returns a single tensor of shape [batch_size, embedding_dim], where batch_size is the number of images in your list; this is the usual convention for PyTorch vision models. You don't need a Dataset for simple experiments; passing a plain list is perfectly fine. Here is a minimal example you can use to process a set of images:

import glob
from PIL import Image
import torch
import torch.nn.functional as F

paths = glob.glob("*.jpeg")
images = [Image.open(p).convert("RGB") for p in paths]

# "model" is assumed to be initialized as described in the README
with torch.no_grad():
    embeddings = model(images)          # shape: [len(images), embedding_dim]
    embeddings = F.normalize(embeddings, dim=1)

print(embeddings.shape)
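
And, in case it helps with the matching step you described, here is one possible way to turn those normalized embeddings into top-k matches (just a sketch, not an official recommendation; the 0.85 threshold is simply the value from your experiment):

# Since the embeddings are L2-normalized, cosine similarity is a plain dot product
similarity = embeddings @ embeddings.T            # shape: [N, N]
similarity.fill_diagonal_(-1.0)                   # ignore self-matches

threshold = 0.85                                  # your threshold; tune as needed
k = min(5, similarity.size(0) - 1)
top_scores, top_idx = similarity.topk(k, dim=1)   # k best matches per reference image

for i, ref_path in enumerate(paths):
    matches = [(paths[j], round(score.item(), 3))
               for j, score in zip(top_idx[i].tolist(), top_scores[i])
               if score.item() >= threshold]
    print(ref_path, matches)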

Thanks again for the kind words and for taking the time to test the model so thoroughly. This kind of feedback is very helpful for us.
