Instructions to use google/gemma-2-9b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-2-9b-it with Transformers:
```python
# Use a pipeline as a high-level helper
# Note: google/gemma-2-9b-it is a gated repo; accept the license on the Hub
# and authenticate (e.g. `huggingface-cli login`) before downloading.
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-2-9b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step, then move tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (slice off the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-2-9b-it with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-2-9b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-2-9b-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```shell
docker model run hf.co/google/gemma-2-9b-it
```
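The curl call above can also be made from Python. The sketch below uses only the standard library (`urllib.request`, `json`). It assumes the server started with `vllm serve` is listening on its default port 8000. For the SGLang server described next, pass a URL on port 30000 instead. The helper names `build_payload`, `extract_reply`, and `ask` are hypothetical. The parsing also assumes the usual OpenAI-style `choices[0].message.content` response shape.

```python
import json
import urllib.request

# Assumption: a local OpenAI-compatible server (vLLM default port 8000).
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "google/gemma-2-9b-it"


def build_payload(question: str) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
    }


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response["choices"][0]["message"]["content"]


def ask(question: str, url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST one chat request and return the model's reply (requires a running server)."""
    body = json.dumps(build_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

When a server is running, `ask("What is the capital of France?")` should return the reply text. When no server is running, `urlopen` raises `urllib.error.URLError`. Separating payload construction from the network call keeps the request format easy to inspect without a GPU.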
- SGLang
How to use google/gemma-2-9b-it with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-2-9b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-2-9b-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-2-9b-it" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-2-9b-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use google/gemma-2-9b-it with Docker Model Runner:
```shell
docker model run hf.co/google/gemma-2-9b-it
```
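The Transformers example relies on `tokenizer.apply_chat_template` to turn the `messages` list into Gemma's turn format before generation. The following is a rough plain-Python approximation for inspecting prompts. It assumes the Gemma 2 template does four things: it starts with `<bos>`, wraps each turn in `<start_of_turn>`/`<end_of_turn>` markers, renames the `assistant` role to `model`, and rejects `system` messages. `format_gemma_prompt` is a hypothetical helper, not part of any library, and it does not replace the tokenizer's own template.

```python
# Hypothetical helper: approximates the Gemma 2 chat template for inspection.
# Assumes <bos>, <start_of_turn>/<end_of_turn> markers and the
# "assistant" -> "model" role rename; the tokenizer's template is authoritative.
def format_gemma_prompt(messages: list[dict], add_generation_prompt: bool = True) -> str:
    parts = ["<bos>"]
    for i, message in enumerate(messages):
        role = message["role"]
        if role == "system":
            raise ValueError("System role not supported")
        # Turns must alternate user/assistant, starting with the user.
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError("Conversation roles must alternate user/assistant/...")
        turn_role = "model" if role == "assistant" else role
        parts.append(f"<start_of_turn>{turn_role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)
```

For the single-turn example, `format_gemma_prompt([{"role": "user", "content": "Who are you?"}])` produces `"<bos><start_of_turn>user\nWho are you?<end_of_turn>\n<start_of_turn>model\n"`. To check the approximation, compare it with `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)`. If the two differ, trust the tokenizer.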
Community discussions:
- #69 · Cross-architecture RYS sweep — gemma-2-9b-it (early-layer reasoning circuit L14; highest baseline EQ in corpus) · opened 3 days ago by john-broadway · 1 comment
- #68 · Request: DOI · opened 2 months ago by acusticskyline · 1 comment
- #66 · Request: DOI · opened 3 months ago by cemk · 1 comment
- #65 · Multilingual instruction-tuned deployment · opened 4 months ago by Cagnicolas · 1 comment
- #64 · Reasoning and Multilingual Performance · opened 5 months ago by Cagnicolas · 1 comment
- #63 · Safety Audit: GAE Score 32.48% (FAIL) · opened 5 months ago by GAE-Auditor
- #61 · Request: DOI · opened 12 months ago by bhavya101914 · 1 comment
- #59 · Request: DOI · opened about 1 year ago by mohamedzaki · 1 comment
- #57 · word embedding · opened over 1 year ago by MrCoolAI · 1 comment
- #55 · Request access to Gemma-2-9b-it model. · opened over 1 year ago by index931 · 5 comments
- #54 · Request: Access for '/gemma-2-9b-it' model · opened over 1 year ago by rajeevhuggingface87 · 9 comments
- #53 · Request to approved Gated Repos - google/gemma-2-9b-it · opened over 1 year ago by rajeevhuggingface87 · 17 comments
- #49 · awaiting for gemma2 coder · opened over 1 year ago by googlefan · 1 comment
- #48 · Update README.md · opened over 1 year ago by wendlerc
- #47 · Independent evaluation results · opened over 1 year ago by yaronr · 1 comment
- #46 · Update EOS token ids · opened over 1 year ago by nayohan · 1 comment
- #45 · Update EOS token ids · opened over 1 year ago by nayohan
- #41 · Sliding window vs. Global Attention · opened over 1 year ago by tanliboy · 6 comments
- #39 · Inquiry on Minimum Configuration and Cost for Running Gemma-2-9B Model Efficiently · opened almost 2 years ago by ltkien2003 · 3 comments
- #38 · Request: DOI · opened almost 2 years ago by XDYeetboy · 1 comment
- #32 · Update generation_config.json · opened almost 2 years ago by runninglsy
- #31 · Update tokenizer_config.json · opened almost 2 years ago by reach-vb · 1 comment
- #29 · Broadcasting error if "num_return_sequences" in transformers pipeline is greater than 1 · opened almost 2 years ago by OSalem99 · 2 comments
- #27 · Ran into an issues while I was trying to sample more than one sentence · opened almost 2 years ago by joeysss · 4 comments
- #25 · Hey guys · opened almost 2 years ago by zipingl · 1 comment · reactions: 🔥👍 2
- #24 · Why does this model have powerful text generation capabilities for various countries, and the results are very good, most likely in English? · opened almost 2 years ago by windkkk · 1 comment
- #23 · Possible issue with context window · opened almost 2 years ago by ieman · 2 comments · reactions: ➕ 10
- #22 · Gemma-2 is a huge step up over previous Google OS models - short feedback · opened almost 2 years ago by Dampfinchen · 1 comment · reactions: 🔥 3
- #21 · Request: DOI · opened almost 2 years ago by BEGADE · 1 comment
- #20 · Gemma 2 - 2B planing · opened almost 2 years ago by baohuynhbk14 · 4 comments
- #17 · it looks it do not work as expected , see below · opened almost 2 years ago by Sakura77 · 11 comments
- #16 · nonsense response when bsz>1 · opened almost 2 years ago by jinjieni · 5 comments
- #13 · Alternative quantizations. · opened almost 2 years ago by ZeroWw · 1 comment · reactions: 👀 1
- #12 · not work · opened almost 2 years ago by sdyy · 2 comments