Qwen 2.5-1.5B Fine-tuned for Grounded Text Generation with Citations
This model is Qwen/Qwen2.5-1.5B fine-tuned with LoRA adapters. It is trained to answer questions from a set of provided documents and to cite the supporting documents inline.
Model Details
Model Description
This model generates answers to questions by:
- Reading provided source documents
- Generating accurate, concise answers
- Citing sources using [1], [2], [3] format
- Only using information from the provided documents
- Developed by: sungmineom
- Model type: Causal Language Model (Fine-tuned with LoRA)
- Language(s): English (primary), Korean
- License: Same as base model (Qwen2.5-1.5B)
- Finetuned from model: Qwen/Qwen2.5-1.5B
Training Details
- Training data: combined_train.json (10,000 samples)
- Validation data: combined_test.json (1,000 samples)
- LoRA rank: 16
- LoRA alpha: 32
- Batch size: 2, with 8 gradient accumulation steps
- Learning rate: 2e-4
- Epochs: 3
- Max sequence length: 2048 tokens
- Quantization: 4-bit (nf4) for training efficiency
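The hyperparameters above imply a few derived quantities that are easy to check by hand. The following is a minimal sketch, assuming training ran on a single device and that no samples were dropped or packed together; neither assumption is stated in the card, and `num_devices` is a hypothetical variable introduced only for illustration. The LoRA scale uses the standard alpha / rank convention.

```python
import math

# Values taken from the training details listed above.
train_samples = 10_000
per_device_batch_size = 2
grad_accum_steps = 8
epochs = 3
lora_rank = 16
lora_alpha = 32

# Assumption (not stated in the card): a single device, so no data-parallel multiplier.
num_devices = 1

# Number of samples that contribute to each optimizer update.
effective_batch_size = per_device_batch_size * grad_accum_steps * num_devices

# Optimizer updates per pass over the training set, and over the whole run.
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

# Standard LoRA multiplies the low-rank update by alpha / rank.
lora_scale = lora_alpha / lora_rank

print(effective_batch_size, steps_per_epoch, total_steps, lora_scale)
```

Under these assumptions the script prints `16 625 1875 2.0`: an effective batch size of 16, 625 optimizer steps per epoch, 1,875 steps over three epochs, and a LoRA scaling factor of 2.0.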
Uses
Direct Use
This model is designed for question-answering tasks that require:
- Accurate answers based on specific documents
- Proper source attribution with citations
- Grounded generation (answers drawn from the provided documents rather than from outside knowledge)
Usage Example
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model
base_model_name = "Qwen/Qwen2.5-1.5B"
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load LoRA adapter and tokenizer
adapter_name = "nora-team/qwen-1.5b-grounded-lora"
model = PeftModel.from_pretrained(base_model, adapter_name)
tokenizer = AutoTokenizer.from_pretrained(adapter_name)

# Prepare input
question = "What are the benefits of exercise?"
docs = [
    {"title": "Health Benefits", "text": "Exercise improves cardiovascular health..."},
    {"title": "Mental Health", "text": "Exercise reduces anxiety and depression..."},
]

# Number the documents from 1 so they match the [1], [2], [3] citation format
doc_text = ""
for i, doc in enumerate(docs, 1):
    doc_text += f"Document [{i}](Title: {doc['title']}): {doc['text']}\n"

prompt = f"""Instruction: Write an accurate, engaging, and concise answer for the given question using only the provided search results (some of which might be irrelevant) and cite them properly. Use an unbiased and journalistic tone. Always cite for any factual claim. When citing several search results, use [1][2][3]. Cite at least one document and at most three documents in each sentence. If multiple documents support the sentence, only cite a minimum sufficient subset of the documents.
Question: {question}
{doc_text}
Answer:"""

# Generate (do_sample=True is required for temperature and top_p to take effect)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, skipping the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
print(answer)
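Because the prompt asks for at least one and at most three citations per sentence, it can be useful to check a generated answer against that rule before trusting it. The following is a minimal sketch of such a check, not part of the model or its training code: `check_citations` is a hypothetical helper, the sentence splitting is deliberately naive (it splits on `.`, `!`, or `?` followed by whitespace), and it only verifies citation format and range, not whether the cited document actually supports the claim.

```python
import re

# Matches citation markers such as [1] or [12].
CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer, num_docs, max_per_sentence=3):
    """Return (sentence, problem) pairs; an empty list means no issues were found."""
    problems = []
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        # Distinct document numbers cited in this sentence.
        cited = sorted({int(n) for n in CITATION.findall(sentence)})
        if not cited:
            problems.append((sentence, "no citation"))
        elif len(cited) > max_per_sentence:
            problems.append((sentence, "too many citations"))
        elif any(n < 1 or n > num_docs for n in cited):
            problems.append((sentence, "citation out of range"))
    return problems


sample = (
    "Exercise improves cardiovascular health [1]. "
    "It also reduces anxiety [2][5]. "
    "Everyone should exercise."
)
for sentence, problem in check_citations(sample, num_docs=2):
    print(f"{problem}: {sentence}")
```

With two source documents, the sample answer yields two findings: the second sentence cites a document number that does not exist, and the third sentence carries no citation at all.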
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
Framework versions
- PEFT 0.17.1
Model tree for nora-team/qwen-1.5b-grounded-lora
Base model
Qwen/Qwen2.5-1.5B