Llama 3 8B Fine-Tuned by Arham
This model is a fine-tuned version of Llama 3 8B, optimized for open-ended text generation. It was trained on the QASports dataset to generate more relevant, coherent, and informative responses to football-related queries.
Model Details
Base Model: Llama 3 8B (meta-llama/Meta-Llama-3-8B)
Fine-Tuned On: QASports dataset (https://huggingface.co/datasets/PedroCJardim/QASports)
Use Case: Open-ended questions about football.
About the Base Model: Llama 3
Llama 3 8B is a transformer-based LLM with 8 billion parameters, pretrained on over 15 trillion tokens. It supports an 8,192-token context window, twice the 4,096 tokens of Llama 2, so it can handle longer text sequences. Its tokenizer uses a 128K-token vocabulary that encodes text more efficiently than Llama 2's. Meta also reports more accurate results than Llama 2 on standard benchmarks. The model balances performance and resource efficiency, which makes it deployable on commodity hardware. It suits general NLP tasks such as resume screening, offering strong text analysis without heavy computational demands.
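The "resource efficiency" and context-window points above reduce to simple arithmetic. The sketch below is a back-of-the-envelope estimate under stated assumptions and does not measure this model. It assumes a nominal 8e9 parameters, counts weight memory only (no KV cache, activations, or framework overhead), and uses Llama 3 8B's published 8,192-token context length. The helper names `weight_memory_gb` and `fits_in_context` are hypothetical.

```python
# Back-of-the-envelope sizing for an ~8B-parameter model such as Llama 3 8B.
# Assumptions (hypothetical, for illustration): 8e9 parameters (nominal),
# weights only (no KV cache or activations), 8,192-token context window.

PARAMS = 8_000_000_000
CONTEXT_WINDOW = 8_192

# Bytes needed per parameter at common storage precisions.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: int, dtype: str) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested generation fits the window."""
    return prompt_tokens + max_new_tokens <= context_window


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gb(PARAMS, dtype):5.1f} GB")
    # False: 8,000 + 512 = 8,512 tokens exceeds the 8,192-token window.
    print(fits_in_context(8_000, 512))
```

In bfloat16 the weights alone need about 16 GB, which is why a single modern GPU can typically host the model. Quantizing to 8-bit or 4-bit lowers that further.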
Performance
Compared to the base Llama 3 model, this fine-tuned version shows improvements in:
- Accuracy and contextual relevance of responses.
- Coherence and consistency of generated text.
- Understanding and interpretation of prompts.
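No benchmark scores accompany these comparisons. One hedged way to quantify answer accuracy on held-out question/answer pairs is SQuAD-style exact match and token-level F1, sketched below. This is not the author's evaluation procedure. The function names are hypothetical, and the normalization (lowercasing, stripping punctuation and the articles a/an/the) follows the common SQuAD convention.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and the
    articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    if not pred or not ref:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

To compare the base and fine-tuned models, average these scores over the same set of held-out questions for each model's answers. For open-ended generations, exact match is strict, so token F1 is usually the more informative of the two.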
Usage
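You can load and run this model with the Hugging Face `transformers` library (PyTorch is also required):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "arham-15/llama3_8B_finetuned_by_arham"
device = "cuda" if torch.cuda.is_available() else "cpu"
# bfloat16 halves GPU memory versus float32; fall back to float32 on CPU
dtype = torch.bfloat16 if device == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype).to(device)

input_text = "The greatest footballer of all time is"
inputs = tokenizer(input_text, return_tensors="pt").to(device)
# max_new_tokens counts only generated tokens (max_length would include the prompt)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```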
Limitations & Future Improvements
- The model may still generate repetitive responses in some cases.
- Further fine-tuning could improve its domain-specific knowledge.
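Until then, repetition can often be reduced at decoding time. `transformers`' `generate()` accepts `no_repeat_ngram_size` and `repetition_penalty`, for example `model.generate(**inputs, max_new_tokens=50, no_repeat_ngram_size=3, repetition_penalty=1.2)`. These values are illustrative and have not been tuned for this model. The pure-Python sketch below shows what each setting does to a single decoding step. It mirrors the library's logic but is not the library code, and the helper names `banned_next_tokens` and `apply_repetition_penalty` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def banned_next_tokens(tokens: List[int], n: int) -> Set[int]:
    """No-repeat-n-gram blocking: return token ids that would complete an
    n-gram already present in `tokens` (generating one would repeat it)."""
    if n <= 0 or len(tokens) + 1 < n:
        return set()
    # Map every (n-1)-token prefix seen so far to the tokens that followed it.
    seen: Dict[Tuple[int, ...], Set[int]] = {}
    for i in range(len(tokens) - n + 1):
        prefix = tuple(tokens[i : i + n - 1])
        seen.setdefault(prefix, set()).add(tokens[i + n - 1])
    # The next token would extend the last n-1 tokens (empty prefix when n == 1).
    current = tuple(tokens[len(tokens) - n + 1 :])
    return seen.get(current, set())


def apply_repetition_penalty(logits: List[float], seen: Set[int],
                             penalty: float) -> List[float]:
    """CTRL-style repetition penalty: make already-seen tokens less likely
    by dividing positive logits and multiplying negative ones by `penalty`."""
    out = list(logits)
    for tok in seen:
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


if __name__ == "__main__":
    # "5 6 7" already occurred, so after "... 5 6" the token 7 is blocked.
    print(banned_next_tokens([5, 6, 7, 5, 6], 3))
    print(apply_repetition_penalty([2.0, -1.0, 0.5], {0, 1}, 2.0))
```

N-gram blocking is a hard constraint and can forbid legitimate repeats such as a player's full name. The penalty is a soft one, so moderate values (slightly above 1.0) are a common starting point.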
Contribute & Feedback
If you find this model useful, feel free to share feedback or contribute improvements!
Model Tree
Base model: meta-llama/Meta-Llama-3-8B