Example script for fine-tuning on a new destination language
Hi! Can you provide an example script for fine-tuning the model on a new language?
Thanks,
Hi @gmallen ,
To get started, please take a look at the outline below. It is my own implementation, not an official one, and it contains only the important configuration and formatting I used.
- Set MODEL_ID and NEW_LANG_CODE
- Model loading: this is optimised via Unsloth, just so that it runs in Colab
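from unsloth import FastLanguageModel

# Load the base model in 4-bit so that it fits on a Colab GPU
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL_ID,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
- Apply LoRA adapters (standard procedure).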
- Data formatting: the raw text must be wrapped in the exact JSON schema the model expects.
import json

def format_to_google_schema(examples):
    texts = []
    for source, target in zip(examples['en'], examples['target']):
        # One JSON payload per example, carrying the language codes and the source text
        json_payload = json.dumps([
            {
                "type": "text",
                "source_lang_code": "en",
                "target_lang_code": NEW_LANG_CODE,
                "text": source
            }
        ], ensure_ascii=False)
        full_prompt = f"user\n{json_payload}\nmodel\n{target}"
        texts.append(full_prompt)
    return {"text": texts}
- Dataset, use your JSONL loading logic
- Training loop: use SFTTrainer, or any other trainer you are comfortable with.
trainer.train()
- Inference check: FastLanguageModel.for_inference(model)
- That's all: create the inputs, JSON payloads, and prompt, then save the model.
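The data steps above can be sketched end to end with only the standard library. This is a minimal sketch, not the author's script. It assumes a JSONL file whose records have `en` and `target` fields (matching the column names used in `format_to_google_schema`), and it copies the turn layout exactly as it appears in the `full_prompt` string above. The names `load_jsonl`, `build_prompt`, and `records_to_texts`, and the `"cy"` language code, are illustrative and not part of the original post.

```python
import json
from pathlib import Path

NEW_LANG_CODE = "cy"  # assumption: placeholder code for the new target language


def load_jsonl(path):
    """Read one JSON object per non-empty line (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def build_prompt(source, target=None, target_lang_code=NEW_LANG_CODE):
    """Wrap source text in the JSON payload and turn layout shown in the post.

    With `target`, the result is a training example. Without it, the string
    stops right after the model turn marker, ready for generation.
    """
    payload = json.dumps(
        [{
            "type": "text",
            "source_lang_code": "en",
            "target_lang_code": target_lang_code,
            "text": source,
        }],
        ensure_ascii=False,
    )
    prompt = f"user\n{payload}\nmodel\n"
    return prompt + target if target is not None else prompt


def records_to_texts(records):
    """Turn {'en': ..., 'target': ...} records into training strings."""
    return [build_prompt(record["en"], record["target"]) for record in records]
```

Calling `build_prompt(text)` without a target gives the same layout at inference time as the training examples, which keeps the two code paths from drifting apart.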
The training objective is standard causal language modeling, but strict adherence to the JSON format is non-negotiable.
Please refer to the Gemma cookbook for more details on the required structure.
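Since the format matters this much, it can help to check every payload before training. The following is a rough sketch under stated assumptions: the required keys are taken only from the payload shown in this thread, not from an official schema, and `validate_payload` and `REQUIRED_KEYS` are hypothetical names.

```python
import json

# Assumption: these are the keys used in the payload earlier in this thread.
REQUIRED_KEYS = {"type", "source_lang_code", "target_lang_code", "text"}


def validate_payload(payload_json):
    """Return a list of problems found in a JSON payload string (empty if none)."""
    try:
        payload = json.loads(payload_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(payload, list) or not payload:
        return ["payload must be a non-empty JSON list"]

    problems = []
    for index, item in enumerate(payload):
        if not isinstance(item, dict):
            problems.append(f"item {index} is not an object")
            continue
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            problems.append(f"item {index} is missing {sorted(missing)}")
        # Every required field that is present should hold a string.
        for key in sorted(REQUIRED_KEYS & item.keys()):
            if not isinstance(item[key], str):
                problems.append(f"item {index} field {key} is not a string")
    return problems
```

Running this over the formatted dataset and refusing to train while any list is non-empty is one cheap way to catch schema drift early.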
Please reach out if you need further help.
I tried this model and got these results…
Hi @srikanta-221 , The chat template translates the language code into the plain language name in the Jinja file.
Can you explain to me the rationale behind passing the language code instead of the language name directly, please?
Thanks
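For readers following along, the code-to-name step described here can be pictured as a plain lookup. This is only an illustration in Python, not the actual Jinja template: the table contents, the `language_name` helper, and the error behaviour for unknown codes are all assumptions.

```python
# Hypothetical lookup table; the real template's table and wording may differ.
LANGUAGE_NAMES = {"en": "English", "de": "German", "cy": "Welsh"}


def language_name(code):
    """Resolve a language code to a plain-language name, as a template might."""
    try:
        return LANGUAGE_NAMES[code]
    except KeyError:
        # Assumption: an unknown code is rejected rather than passed through.
        raise ValueError(f"unsupported language code: {code!r}") from None
```

Under this picture, a newly added language code would also need an entry in whatever table the template consults.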
Check out this GitHub repo for how to fine-tune this model: https://github.com/grctest/finetuned-gemmatranslate-cy