Reformat code snippets
Hello!
Congratulations on your model release!
Pull Request overview
- Reformat the code snippets
- MLR -> MRL dimensions
- Add default query prompt for encode_query usage
Details
The code snippets should be a little bit easier to read now.
Also, Sentence Transformers v5.0+ adds encode_query and encode_document, which automatically use the "query" and "document" prompts from here: https://huggingface.co/KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5/blob/main/config_sentence_transformers.json#L8-L9
I set the default query to "Instruct: Given a query, retrieve documents that answer the query \n Query: ", so that users can do this in case they don't want to come up with a prompt themselves:
```python
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5",
    trust_remote_code=True,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "attn_implementation": "flash_attention_2",  # Optional
    },
)
model.max_seq_length = 512

queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
'''
tensor([[0.9034, 0.2563],
        [0.3153, 0.7396]])
'''
```
The query prompt is only used with model.encode_query or model.encode(..., prompt_name="query"). The former is roughly a shorthand for the latter.
- Tom Aarsen
Thanks a lot, Tom! Really appreciate your contribution