Reformat code snippets
Hello!
Congratulations on your model release!
Pull Request overview
- Reformat the code snippets
- MLR -> MRL dimensions
- Add default query prompt for encode_query usage
Details
The code snippets should be a little bit easier to read now.
Also, Sentence Transformers v5.0+ adds encode_query and encode_document, which automatically use the "query" and "document" prompts from here: https://huggingface.co/KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5/blob/main/config_sentence_transformers.json#L8-L9
I set the default query to "Instruct: Given a query, retrieve documents that answer the query \n Query: ", so that users can do this in case they don't want to come up with a prompt themselves:
```python
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "KaLM-Embedding/KaLM-embedding-multilingual-mini-instruct-v2.5",
    trust_remote_code=True,
    model_kwargs={
        "torch_dtype": torch.bfloat16,
        "attn_implementation": "flash_attention_2",  # Optional
    },
)
model.max_seq_length = 512

queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
'''
tensor([[0.9034, 0.2563],
        [0.3153, 0.7396]])
'''
```
The query prompt is only used with model.encode_query or model.encode(..., prompt_name="query"). The former is roughly a shorthand for the latter.
- Tom Aarsen
Thanks a lot, Tom! Really appreciate your contribution