Instructions to use solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ")
model = AutoModelForCausalLM.from_pretrained("solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
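To make the `apply_chat_template` step less opaque, here is a minimal sketch of the kind of prompt string it builds. It assumes the ChatML layout that the Nous Hermes 2 models are documented to use; the helper name `to_chatml` is hypothetical, and the tokenizer's own bundled template remains the authoritative source.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render chat messages as a ChatML-style prompt (illustrative helper only)."""
    # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

In practice the tokenizer call above does this for you and also tokenizes the result, so a helper like this is only useful for inspecting or debugging prompts.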

Local Apps

vLLM

How to use solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
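The same OpenAI-compatible request can be sent from Python. The sketch below uses only the standard library and assumes the server is listening on `localhost:8000` as in the commands above; the helper names `build_chat_request` and `ask` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` should return the reply text. Passing `base_url="http://localhost:30000"` would target a server on port 30000 instead.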

Use Docker

```bash
docker model run hf.co/solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ
```

SGLang

How to use solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ with Docker Model Runner:
```
docker model run hf.co/solidrust/Nous-Hermes-2-Mistral-7B-DPO-AWQ
```

Add missing quant_config.json for compatibility with vLLM backends out of the box.

by vaclavkosar - opened Feb 24, 2024

base: refs/heads/main ← from: refs/pr/1


Add missing quant_config.json for compatibility with vLLM backends out of the box. (aa2a3bfa)

vaclavkosar

Feb 24, 2024

No description provided.

Suparious

SolidRusT Networks org Feb 27, 2024

Thank you.

Suparious changed pull request status to merged Feb 27, 2024
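For readers unfamiliar with the file this pull request adds: the thread does not show its contents, so the following is only a sketch of what a `quant_config.json` for a 4-bit AWQ checkpoint might look like. The values mirror settings commonly used with AutoAWQ and are assumptions, not the merged file; `write_quant_config` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

# Assumed values: typical 4-bit AWQ settings, NOT taken from the merged PR.
ASSUMED_QUANT_CONFIG = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}


def write_quant_config(model_dir, config=ASSUMED_QUANT_CONFIG):
    """Write quant_config.json into a local model directory and return its path."""
    path = Path(model_dir) / "quant_config.json"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


# Demonstrate on a throwaway directory rather than a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    written_path = write_quant_config(tmp)
    written = json.loads(written_path.read_text(encoding="utf-8"))
    print(written)
```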

vaclavkosar

Mar 24, 2024

Would you know how to AWQ Starling-LM-7B-beta? It seems that it could be a better model still.

Suparious

SolidRusT Networks org Mar 24, 2024 (edited Mar 24, 2024)

> Would you know how to AWQ Starling-LM-7B-beta? It seems that it could be a better model still.

I just tested it at full bfloat16 and it doesn't seem to respond well. It also has a tiny context window (8192) compared to other Mistral fine-tunes.

Today I compared Nous Hermes 2 Pro 7B with Gorilla LLM 7B, Raven v2 13B and Starling 7B.

Did you try the alpha version, TheBloke/Starling-LM-7B-alpha-AWQ?

I can make a quant of the beta now if you like.

It is simple, as I just use the example script from the CasperHansen AutoAWQ repo.

https://github.com/SolidRusT/srt-model-quantizing.git
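For reference, a quantization run along the lines of AutoAWQ's published example looks roughly like the sketch below. It is not necessarily the exact script in the linked repo; the function name `quantize_to_awq`, the output directory, and the config values are assumptions, and the heavy imports are deferred so the module can be loaded without a GPU.

```python
# Assumed 4-bit settings, as commonly used in AutoAWQ examples.
QUANT_CONFIG = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}


def quantize_to_awq(model_id, output_dir, quant_config=QUANT_CONFIG):
    """Quantize a Hugging Face causal LM to AWQ and save it locally."""
    # Imported lazily: both packages are large and AutoAWQ expects a CUDA GPU.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

    # Runs calibration and replaces linear layers with quantized ones.
    model.quantize(tokenizer, quant_config=quant_config)

    # Save the quantized weights together with the tokenizer files.
    model.save_quantized(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

A call such as `quantize_to_awq("Nexusflow/Starling-LM-7B-beta", "Starling-LM-7B-beta-AWQ")` would need `autoawq` and `transformers` installed, plus enough GPU memory to hold the full-precision model.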

Suparious

SolidRusT Networks org Mar 24, 2024 (edited Mar 24, 2024)

OK, the 'Nexusflow/Starling-LM-7B-beta' model is in the AWQ quant queue now.

vaclavkosar

Mar 24, 2024

> Would you know how to AWQ Starling-LM-7B-beta? It seems that it could be a better model still.

> I just tested it at full bfloat16 and it doesn't seem to respond well. It also has a tiny context window (8192) compared to other Mistral fine-tunes.

"Nous Hermes 2 - Mistral 7B - DPO" is a fine-tune of Mistral-7B-v0.1, which has an 8k-token context. Only the newer Mistral-7B-v0.2 has a 32k context.

vaclavkosar

Mar 24, 2024

I tried EagleX on CPU today. It was incredibly slow.

Suparious

SolidRusT Networks org Mar 24, 2024

The original Mistral model being limited to a 16k context with a 4k sliding window does not mean that fine-tuned variants share the same limitation. This Nous Hermes 2 Pro handles up to 32k context.

I have only been able to use it with 16k context, due to a VRAM limitation. Maybe check some examples of Llama with 128k context to learn more about how these authors are widening the default context window.

The Starling quant is on its way. Uploading the AWQ now: https://huggingface.co/solidrust/Starling-LM-7B-beta-AWQ
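One way to ground the context-window discussion is to look at the fields a model's `config.json` declares. The sketch below summarizes two such fields; the helper names are hypothetical and the sample dictionaries are illustrative values, not read from any of the repos mentioned here. Declared values are also only an upper bound on paper: usable context still depends on training and available VRAM.

```python
def describe_context(config):
    """Summarize the context-related fields of a model config dictionary."""
    max_pos = config.get("max_position_embeddings")
    window = config.get("sliding_window")
    window_text = "none" if window is None else str(window)
    return f"positions={max_pos}, sliding_window={window_text}"


def load_config_dict(model_id):
    """Fetch a model's config from the Hugging Face Hub (needs network access)."""
    from transformers import AutoConfig  # imported lazily

    return AutoConfig.from_pretrained(model_id).to_dict()


# Illustrative inputs only; real values come from load_config_dict(model_id).
with_window = {"max_position_embeddings": 32768, "sliding_window": 4096}
without_window = {"max_position_embeddings": 32768, "sliding_window": None}

print(describe_context(with_window))
print(describe_context(without_window))
```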

vaclavkosar

Mar 24, 2024

Hermes-2-Pro-Mistral-7B is interesting, but I suspect that for chat without functions, the DPO version will be better.

vaclavkosar

Mar 25, 2024

You were right: Starling-LM-7B-beta-AWQ is not that good. It sounds very ChatGPT-like and does not follow instructions. I am testing Hermes-2-Pro-Mistral-7B.
