Instructions for using arham-15/llama3_8B_finetuned_by_arham with libraries, inference providers, notebooks, and local apps.

Libraries

How to use arham-15/llama3_8B_finetuned_by_arham with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="arham-15/llama3_8B_finetuned_by_arham")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("arham-15/llama3_8B_finetuned_by_arham")
model = AutoModelForCausalLM.from_pretrained("arham-15/llama3_8B_finetuned_by_arham")

Local Apps

vLLM

How to use arham-15/llama3_8B_finetuned_by_arham with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "arham-15/llama3_8B_finetuned_by_arham"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arham-15/llama3_8B_finetuned_by_arham",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
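
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on localhost:8000 and that the response follows the OpenAI completions schema (`choices[0].text`). The helper names `build_completion_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "arham-15/llama3_8B_finetuned_by_arham"


def build_completion_request(prompt, model=MODEL_ID,
                             base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed response shape: {"choices": [{"text": "..."}], ...}
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the generated continuation. Passing a different `base_url` (for example one ending in port 30000) points the same helper at another OpenAI-compatible server.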

Use Docker

docker model run hf.co/arham-15/llama3_8B_finetuned_by_arham

SGLang

How to use arham-15/llama3_8B_finetuned_by_arham with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "arham-15/llama3_8B_finetuned_by_arham" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arham-15/llama3_8B_finetuned_by_arham",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "arham-15/llama3_8B_finetuned_by_arham" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "arham-15/llama3_8B_finetuned_by_arham",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use arham-15/llama3_8B_finetuned_by_arham with Docker Model Runner:
```
docker model run hf.co/arham-15/llama3_8B_finetuned_by_arham
```

Llama 3 8B Fine-Tuned by Arham

This model is a fine-tuned version of Llama 3 8B for open-ended text generation. It was fine-tuned on the QASports dataset to produce more relevant, coherent, and informative responses to football-related queries.

Model Details

Base Model: Llama 3 8B (meta-llama/Meta-Llama-3-8B)

Fine-Tuned On: QASports dataset (https://huggingface.co/datasets/PedroCJardim/QASports)

Use Case: Open-ended questions about football.

About Base Model Llama 3

Llama 3 8B is a transformer-based LLM with 8 billion parameters, pretrained on more than 15 trillion tokens. It supports an 8,192-token context window, and its tokenizer uses a vocabulary of roughly 128,000 tokens, which encodes text more efficiently than the Llama 2 tokenizer. Compared to Llama 2, it also generates more accurate responses. The model balances performance and resource efficiency, making it deployable on commercially available hardware. It suits NLP tasks such as resume screening, offering strong text analysis without heavy computational demands.
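
The context window is shared between the prompt and the generated tokens, so a long prompt may need trimming before generation. The following is a minimal sketch that assumes the prompt is already tokenized into a Python list of token IDs; `fit_prompt_to_context` and its parameters are hypothetical names, not part of any library.

```python
def fit_prompt_to_context(token_ids, context_window, max_new_tokens):
    """Trim a prompt so that prompt length + max_new_tokens fits the window."""
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than context_window")
    budget = context_window - max_new_tokens
    if len(token_ids) <= budget:
        return list(token_ids)
    # Keep the tail of the prompt: in question answering the question
    # usually comes last, so the oldest tokens are dropped first.
    # Note that this can also drop a leading beginning-of-sequence token.
    return list(token_ids[-budget:])
```

Keeping the tail is a design choice for question-style prompts; a prompt whose key instructions sit at the start would call for trimming the middle instead.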

Performance

Compared to the base Llama 3 model, this fine-tuned version shows improvements in:

Accuracy and contextual relevance of responses.
Coherence and consistency of generated text.
Understanding and interpretation of prompts.
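
One way to put numbers on improvements like these is to score both the base and the fine-tuned model against reference answers from a question-answering dataset. The sketch below implements exact match and token-level F1 with SQuAD-style answer normalization. The card does not state which metric was used, so this is an assumed evaluation approach, and the function names are illustrative.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Averaging `token_f1` over the same set of questions for each model gives a direct comparison; exact match is stricter and tends to penalize the longer answers that open-ended generation produces.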

Usage

You can load and use this model with transformers:

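import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "arham-15/llama3_8B_finetuned_by_arham"

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the weights and tokenizer, then move the model to the chosen device
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_name)

input_text = "The greatest footballer of all time is"
# The inputs must live on the same device as the model
inputs = tokenizer(input_text, return_tensors="pt").to(device)

# max_new_tokens caps the generated tokens, not counting the prompt
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))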

Limitations & Future Improvements

The model may still generate repetitive responses in some cases.
Further fine-tuning can improve domain-specific knowledge.
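
Repetition can often be reduced at decoding time without retraining. The sketch below measures repetition with a distinct n-gram ratio and builds keyword arguments for `model.generate`. `repetition_penalty` and `no_repeat_ngram_size` are existing Transformers generation parameters, but the values shown are untuned assumptions, and both helper names are hypothetical.

```python
def distinct_ngram_ratio(text, n=2):
    """Fraction of n-grams that are unique; lower values mean more repetition."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Text shorter than n tokens cannot repeat an n-gram.
        return 1.0
    return len(set(ngrams)) / len(ngrams)


def anti_repetition_kwargs(max_new_tokens=50):
    """Generation settings that discourage repeated text (assumed values)."""
    return {
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": 1.2,   # values above 1.0 penalize tokens already seen
        "no_repeat_ngram_size": 3,   # never repeat the same 3-gram verbatim
    }
```

The settings can be passed as `model.generate(**inputs, **anti_repetition_kwargs())`, and comparing `distinct_ngram_ratio` on outputs before and after gives a rough check on whether they help.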

Contribute & Feedback

If you find this model useful, feel free to share feedback or contribute improvements!


Model tree for arham-15/llama3_8B_finetuned_by_arham

Base model: meta-llama/Meta-Llama-3-8B
Quantized (276): this model
