Reformat code snippets

#1
by tomaarsen HF Staff - opened

Hello!

Congratulations on your model release!

Pull Request overview

  • Reformat the code snippets
  • MLR -> MRL dimensions
  • Add default query prompt for encode_query usage

Details

The code snippets should be a little bit easier to read now.
Also, Sentence Transformers v5.0+ adds encode_query and encode_document, which automatically use the "query" and "document" prompts from here: https://huggingface.co/KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5/blob/main/config_sentence_transformers.json#L8-L9

I set the default query prompt to "Instruct: Given a query, retrieve documents that answer the query \n Query: ", so that users who don't want to come up with a prompt themselves can do this:

from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5",
    trust_remote_code=True,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "attn_implementation": "flash_attention_2",  # Optional
    },
)
model.max_seq_length = 512

queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
'''
tensor([[0.9034, 0.2563],
        [0.3153, 0.7396]])
'''

The query prompt is only used with model.encode_query or model.encode(..., prompt_name="query"). The former is roughly a shorthand for the latter.
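
For illustration, here is a minimal sketch of what the prompt lookup amounts to. It does not load the model or call Sentence Transformers: `PROMPTS`, `apply_prompt` and `encode_query_inputs` are hypothetical names made up for this example, and the empty "document" prompt is an assumption, not something read from the config. It only shows the idea that the named prompt is prepended to each text before encoding.

```python
# Hypothetical stand-in for the "prompts" mapping in config_sentence_transformers.json.
PROMPTS = {
    # Default query prompt set in this pull request.
    "query": "Instruct: Given a query, retrieve documents that answer the query \n Query: ",
    # Assumption for this sketch: documents are encoded without a prompt.
    "document": "",
}


def apply_prompt(texts, prompt_name=None):
    """Prepend the named prompt to each text; leave texts unchanged if no name is given."""
    prompt = PROMPTS[prompt_name] if prompt_name is not None else ""
    return [prompt + text for text in texts]


def encode_query_inputs(texts):
    """Hypothetical shorthand: the same as apply_prompt(texts, prompt_name="query")."""
    return apply_prompt(texts, prompt_name="query")


queries = ["What is the capital of China?", "Explain gravity"]
for text in encode_query_inputs(queries):
    print(repr(text))
```

With the real model, the two equivalent calls from the sentence above would be model.encode_query(queries) and model.encode(queries, prompt_name="query"); a plain model.encode(queries) would not apply the query prompt.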

  • Tom Aarsen
tomaarsen changed pull request status to open
KaLM-Embedding org

Thanks a lot, Tom! Really appreciate your contribution 🙌

Yuki131 changed pull request status to merged
