Instructions for using skt/A.X-4.0 with libraries and local apps.

Libraries

How to use skt/A.X-4.0 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="skt/A.X-4.0")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("skt/A.X-4.0")
model = AutoModelForCausalLM.from_pretrained("skt/A.X-4.0")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use skt/A.X-4.0 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "skt/A.X-4.0"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "skt/A.X-4.0",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
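
For scripted use, the same request can be built in Python. This is a minimal sketch using only the standard library; the helper names build_chat_request and send are hypothetical, and it assumes the server started above is listening on localhost:8000.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON body; needs a running server."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_chat_request(
    "http://localhost:8000", "skt/A.X-4.0", "What is the capital of France?"
)
print(request.full_url)
```

Calling send(request) performs the actual HTTP call, so it only works while the server is up.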

Use Docker

docker model run hf.co/skt/A.X-4.0

SGLang

How to use skt/A.X-4.0 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "skt/A.X-4.0" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "skt/A.X-4.0",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "skt/A.X-4.0" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "skt/A.X-4.0",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use skt/A.X-4.0 with Docker Model Runner:
```
docker model run hf.co/skt/A.X-4.0
```

fix chat template

by minpeter - opened Oct 2, 2025

base: refs/heads/main ← from: refs/pr/3

Files changed: +63 -1

minpeter

Oct 2, 2025 (edited Oct 2, 2025)

Combined conditional checks for add_generation_prompt, message.reasoning_content, and </think> to reduce nesting.
Ensured content is explicitly set when the condition is not met.
Updated tool output block to avoid string concatenation, outputting message.content as a separate template expression for better readability.
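
The template source is not quoted in this thread, so the following is only a hypothetical Python analogue of the refactoring pattern described above, not the actual Jinja logic. The function names, the argument names, and the exact condition are assumptions. It shows three nested checks collapsed into one combined condition, with the fallback value set explicitly in the else branch.

```python
THINK_END = "</think>"


def visible_content_nested(content, reasoning_content, add_generation_prompt):
    # Nested form: three levels deep, and the fallback is only implicit.
    if add_generation_prompt:
        if reasoning_content is None:
            if THINK_END in content:
                return content.split(THINK_END, 1)[1].lstrip("\n")
    return content


def visible_content_flat(content, reasoning_content, add_generation_prompt):
    # Flat form: one combined condition, fallback assigned explicitly.
    if add_generation_prompt and reasoning_content is None and THINK_END in content:
        visible = content.split(THINK_END, 1)[1].lstrip("\n")
    else:
        visible = content
    return visible


sample = "<think>plan</think>\nhello"
print(visible_content_flat(sample, None, True))
```

Both functions return the same value for every combination of inputs; the flat version just makes the "condition not met" case visible.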

Update tokenizer_config.json (27b69002)

Create chat_template.jinja (bf8fe505)

Update chat_template.jinja (0966401c)

Update chat_template.jinja (09db929b)

Update chat_template.jinja (5709760b)

minpeter changed pull request status to open Oct 2, 2025

minpeter

Oct 2, 2025

In the current template, when both a tool call and a system prompt are used, "<|im_start|><|system|>" seems to be rendered twice. Is this the intended behavior?

minpeter

Oct 2, 2025

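For example, this is the rendered output (the Korean text is the template's tool-calling instruction block, reproduced verbatim):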

<|im_start|><|system|>당신은 도구 호출 기능을 갖춘 유용한 도우미입니다. 사용자의 요청을 처리하기 위해서 필요한 도구가 주어진 목록에 있는 경우 도구 호출로 응답하세요.
필요한 도구가 목록에 없는 경우에는 도구 호출 없이 사용자가 요구한 정보를 제공하세요.
필요한 도구가 목록에 있지만 해당 도구를 호출하는데 필요한 argument 정보가 부족한 경우 해당 정보를 사용자에게 요청하세요.
사용자의 요청을 처리하기 위해 여러번 도구를 호출할 수 있어야 합니다.
도구 호출 이후 도구 실행 결과를 입력으로 받으면 해당 결과를 활용하여 답변을 생성하세요.

다음은 접근할 수 있는 도구들의 목록 입니다:
<tools>
{"name": "get_weather", "description": "Get current weather information for a location", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to use"}}, "required": ["location"]}}
</tools>

도구를 호출하려면 아래의 JSON으로 응답하세요.
도구 호출 형식: <tool_call>{"name": 도구 이름, "arguments": dictionary 형태의 도구 인자값}</tool_call><|im_end|><|im_start|><|system|>SYSTEM PROMPT - SYSTEM PROMPT<|im_end|>
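
A quick way to check for this is to count how many times the system header appears in the rendered prompt. This is a minimal sketch; count_system_blocks is a hypothetical helper, and the rendered string below is a shortened stand-in for the output above, not the full text.

```python
SYSTEM_OPEN = "<|im_start|><|system|>"


def count_system_blocks(rendered: str) -> int:
    """Count system headers in a rendered prompt (hypothetical helper)."""
    return rendered.count(SYSTEM_OPEN)


# Shortened stand-in for the rendered output shown above.
rendered = (
    "<|im_start|><|system|>TOOL INSTRUCTIONS<|im_end|>"
    "<|im_start|><|system|>SYSTEM PROMPT - SYSTEM PROMPT<|im_end|>"
)
print(count_system_blocks(rendered))  # prints 2
```

In practice the rendered string would come from tokenizer.apply_chat_template(messages, tools=tools, tokenize=False); a count greater than 1 means the tool instructions and the user's system prompt ended up in separate system blocks.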


Ready to merge

This branch is ready to get merged automatically.
