Access the model for educational purposes
Hi, I am a Master's Degree student and I would like to request access to this model.
Thanks in advance.
Hi @cristofermestrado,
When you open any Gemma model card on Hugging Face, you'll be prompted to acknowledge the license. By clicking on it, you consent to the Gemma terms and conditions.
To create a new access token with read permissions:
- Go to your profile and select Settings.
- Navigate to Access Tokens in the settings menu.
- Click Create New Token.
- Set Repositories permissions to Read access.
- Click Generate Token and securely save it, as it will only be shown once.
After that, run the following lines in your notebook:
from huggingface_hub import login
login(token=access_token)  # access_token is the read token you generated in the steps above
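If you would rather not paste the token into the notebook itself, one option is to read it from an environment variable. The sketch below assumes the token is stored in `HF_TOKEN`; the helper names `read_token` and `authenticate` are hypothetical and only for illustration.

```python
import os


def read_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in env_var, or raise a clear error if it is missing."""
    # Assumption: the token was exported under this variable name beforehand.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to your Hugging Face read token first.")
    return token


def authenticate(env_var: str = "HF_TOKEN") -> None:
    """Log in to the Hugging Face Hub with the token read from the environment."""
    # Imported lazily so read_token() works even before huggingface_hub is installed.
    from huggingface_hub import login

    login(token=read_token(env_var))
```

Calling `authenticate()` once at the top of the notebook should then behave like the two-line login, without the token appearing in a cell.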
Could you please re-check the access_token you have assigned and ensure that it grants read access to the google/gemma-2-27b repository?
If you prefer to use the model locally, you can download the weights and integrate them into your project. Please refer to this link
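For the local route, a minimal sketch using `snapshot_download` from `huggingface_hub` could look like the following. It assumes you have already accepted the license and logged in; the target folder name and the helpers `download_weights` and `looks_downloaded` are hypothetical, and checking for `config.json` is only a rough sanity check, not a full integrity test.

```python
from pathlib import Path


def download_weights(repo_id: str = "google/gemma-2-27b",
                     target_dir: str = "gemma-2-27b") -> Path:
    """Download every file of the repository into target_dir and return its path."""
    # Imported lazily so the check below can run without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Uses the token saved by login(); the account must have accepted the license.
    local_path = snapshot_download(repo_id=repo_id, local_dir=target_dir)
    return Path(local_path)


def looks_downloaded(model_dir: Path) -> bool:
    """Rough check: a downloaded model folder should at least contain config.json."""
    return (Path(model_dir) / "config.json").is_file()
```

After `download_weights()` finishes, the returned folder can be passed to `AutoModelForCausalLM.from_pretrained(...)` in place of the repository name. Note that the 27B weights are large, so make sure enough disk space is available first.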
Please let us know if the issue still persists.
Thank you.
