	---
	title: Open Finance LLM 8B
	emoji: 🐉
	colorFrom: red
	colorTo: red
	sdk: docker
	pinned: false
	app_port: 7860
	suggested_hardware: l4x1
	---

	# Open Finance LLM 8B

	An OpenAI-compatible API server for the `DragonLLM/Qwen-Open-Finance-R-8B` model.

	## Deployment

	| Platform | Backend | Dockerfile | GPU | Use case |
	|----------|---------|------------|-----|----------|
	| Hugging Face Spaces | Transformers | `Dockerfile` | L4 | Development |
	| Koyeb | vLLM | `Dockerfile.koyeb` | L40S | Production |

	## Features

	- OpenAI-compatible API
	- Tool/function calling support
	- Streaming responses
	- Rate limiting (30 requests per minute, 500 requests per hour)
	- Usage statistics via `/v1/stats`
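
The rate limits are enforced by the server; when scripting against the API, a client-side throttle helps keep requests under them. A minimal sketch, assuming both limits are rolling windows (30 requests per 60 s, 500 per 3600 s); whether the server counts per client, per IP, or globally is not documented here, and `SlidingWindowLimiter` is a hypothetical helper, not part of this project:

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


class SlidingWindowLimiter:
    """Hypothetical client-side throttle for several (max_requests, window_seconds) limits."""

    def __init__(self, limits: List[Tuple[int, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits
        self.clock = clock
        self.max_window = max(window for _, window in limits)
        # Send times (oldest first) that still fall inside the longest window.
        self.history: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until another request fits under every limit (0.0 if it fits now)."""
        now = self.clock()
        # Forget requests older than the longest window; no limit counts them.
        while self.history and now - self.history[0] >= self.max_window:
            self.history.popleft()
        wait = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.history if now - t < window]
            if len(recent) >= max_requests:
                # Enough of the oldest requests must age out to drop below the cap.
                blocking = recent[len(recent) - max_requests]
                wait = max(wait, window - (now - blocking))
        return wait

    def record(self) -> None:
        """Note that a request is being sent now."""
        self.history.append(self.clock())

    def acquire(self) -> None:
        """Sleep until a request is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            time.sleep(delay)
            delay = self.wait_time()
        self.record()
```

For example, create `limiter = SlidingWindowLimiter([(30, 60.0), (500, 3600.0)])` once and call `limiter.acquire()` before each `client.chat.completions.create(...)` call. Keeping one timestamp history and checking every window against it means the minute and hour limits never need separate bookkeeping.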

	## Quick Start

	```bash
	curl -X POST "https://your-endpoint/v1/chat/completions" \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "DragonLLM/Qwen-Open-Finance-R-8B",
	    "messages": [{"role": "user", "content": "What is compound interest?"}],
	    "max_tokens": 500
	  }'
	```

	```python
	from openai import OpenAI

	# The client requires some api_key value; "not-needed" is a placeholder.
	client = OpenAI(base_url="https://your-endpoint/v1", api_key="not-needed")
	response = client.chat.completions.create(
	    model="DragonLLM/Qwen-Open-Finance-R-8B",
	    messages=[{"role": "user", "content": "What is compound interest?"}],
	    max_tokens=500,
	)
	print(response.choices[0].message.content)
	```
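
Streaming responses use the same endpoint with `stream=True`; the OpenAI Python client then yields chunks whose `choices[0].delta.content` carries the next piece of text. A minimal sketch, assuming the server emits chunks in the standard OpenAI streaming format; `collect_stream` and `stream_answer` are hypothetical helper names:

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Concatenate the text deltas of an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            # Some servers send a final usage-only chunk with no choices.
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and final chunks may carry None
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    if echo:
        print()
    return "".join(parts)


def stream_answer(client: Any, question: str, max_tokens: int = 500) -> str:
    """Send one chat request with stream=True, print tokens as they arrive, return the text."""
    stream = client.chat.completions.create(
        model="DragonLLM/Qwen-Open-Finance-R-8B",
        messages=[{"role": "user", "content": question}],
        max_tokens=max_tokens,
        stream=True,
    )
    return collect_stream(stream)
```

With the `client` from the Quick Start, `stream_answer(client, "What is compound interest?")` prints the answer incrementally and returns the full text.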

	## Configuration

	| Variable | Required | Default | Description |
	|----------|----------|---------|-------------|
	| `HF_TOKEN_LC2` | Yes | - | Hugging Face token |
	| `MODEL` | No | `DragonLLM/Qwen-Open-Finance-R-8B` | Model name |
	| `PORT` | No | `8000` (vLLM) / `7860` (Transformers) | Server port |

	vLLM-specific variables (Koyeb):
	- `ENABLE_AUTO_TOOL_CHOICE=true`: enable tool calling
	- `TOOL_CALL_PARSER=hermes`: tool-call parser for Qwen models
	- `MAX_MODEL_LEN=8192`: maximum context length, in tokens
	- `GPU_MEMORY_UTILIZATION=0.90`: fraction of GPU memory vLLM may use
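
With tool calling enabled (on the vLLM backend, via `ENABLE_AUTO_TOOL_CHOICE` and `TOOL_CALL_PARSER` above), a request can carry an OpenAI-style `tools` list and the model may reply with `tool_calls` instead of text. A minimal sketch of one round trip, assuming the server returns tool calls in the standard OpenAI format; the `compound_interest` tool, its schema, and the helper names are hypothetical examples, not part of this project:

```python
import json
from typing import Any, Callable, Dict, List

MODEL = "DragonLLM/Qwen-Open-Finance-R-8B"


def compound_interest(principal: float, rate: float, years: int) -> float:
    """Hypothetical tool: future value with annual compounding, P * (1 + r) ** n."""
    return round(principal * (1 + rate) ** years, 2)


# OpenAI-style tool schema describing the function to the model.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "compound_interest",
        "description": "Future value of an investment with annual compounding.",
        "parameters": {
            "type": "object",
            "properties": {
                "principal": {"type": "number"},
                "rate": {"type": "number", "description": "Annual rate, e.g. 0.05"},
                "years": {"type": "integer"},
            },
            "required": ["principal", "rate", "years"],
        },
    },
}]

REGISTRY: Dict[str, Callable[..., Any]] = {"compound_interest": compound_interest}


def run_tool_calls(message: Any) -> List[Dict[str, Any]]:
    """Run each tool call in an assistant message and build `tool` role replies."""
    results = []
    for call in message.tool_calls or []:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        output = REGISTRY[call.function.name](**args)
        results.append({"role": "tool", "tool_call_id": call.id,
                        "content": json.dumps(output)})
    return results


def ask_with_tools(client: Any, question: str) -> str:
    """One tool-calling round trip: ask, execute requested tools, ask again."""
    messages: List[Dict[str, Any]] = [{"role": "user", "content": question}]
    first = client.chat.completions.create(model=MODEL, messages=messages, tools=TOOLS)
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content  # the model answered directly
    # Echo the assistant turn (with its tool calls) before the tool results.
    messages.append({
        "role": "assistant",
        "content": reply.content,
        "tool_calls": [
            {"id": c.id, "type": "function",
             "function": {"name": c.function.name, "arguments": c.function.arguments}}
            for c in reply.tool_calls
        ],
    })
    messages.extend(run_tool_calls(reply))
    second = client.chat.completions.create(model=MODEL, messages=messages, tools=TOOLS)
    return second.choices[0].message.content
```

With the Quick Start `client`, `ask_with_tools(client, "What does $1,000 grow to at 5% over 2 years?")` lets the model request the tool; whether it actually does depends on the model and the prompt.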

	## API Endpoints

	| Endpoint | Method | Description |
	|----------|--------|-------------|
	| `/v1/models` | GET | List available models |
	| `/v1/chat/completions` | POST | Chat completions |
	| `/v1/stats` | GET | Usage statistics |
	| `/health` | GET | Health check |
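
Loading an 8B model after a (re)start can take a while, so scripts may want to wait for `/health` before sending requests. A minimal standard-library sketch, assuming `/health` returns HTTP 200 once the server is ready and `/v1/models` returns the OpenAI list format (`{"data": [{"id": ...}]}`); the exact response bodies are not documented here, and `wait_for_health`, `model_ids`, and `list_models` are hypothetical helpers:

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, List


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code
    except (urllib.error.URLError, OSError):  # DNS failure, refused connection, timeout
        return 0


def wait_for_health(base_url: str, timeout: float = 600.0, interval: float = 10.0,
                    get_status: Callable[[str], int] = http_status,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll {base_url}/health until it returns 200 (True) or the timeout elapses (False)."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status(f"{base_url.rstrip('/')}/health") == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def model_ids(payload: Dict[str, Any]) -> List[str]:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]


def list_models(base_url: str) -> List[str]:
    """Fetch /v1/models and return the advertised model IDs."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/v1/models", timeout=10) as resp:
        return model_ids(json.load(resp))
```

Passing `get_status` and `sleep` as parameters keeps the polling loop testable without a live server.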

	## Technical Specs

	- Model: `DragonLLM/Qwen-Open-Finance-R-8B` (8B parameters)
	- vLLM backend: `vllm-openai:latest` image with the `hermes` tool parser
	- Transformers backend: `transformers` 4.45.0+ with PyTorch 2.5.0+ (CUDA 12.4)
	- VRAM: 20 GB minimum (fits on an L4, 24 GB); 48 GB recommended (L40S)
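
The VRAM figures can be sanity-checked with a back-of-the-envelope estimate: weights take roughly parameters × bytes per parameter, and the KV cache adds 2 (K and V) × layers × KV heads × head dimension × bytes per value for every cached token. A sketch under stated assumptions: 16-bit weights (bf16/fp16) and a Qwen3-8B-style layout of 36 layers, 8 KV heads, and head dimension 128. These architecture numbers are assumptions, so check the model's `config.json` before relying on them, and note that runtime overhead (CUDA context, activations, vLLM's preallocated cache pool) comes on top.

```python
GIB = 2 ** 30


def weights_gib(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory for the model weights alone (2 bytes/param for bf16 or fp16)."""
    return n_params * bytes_per_param / GIB


def kv_cache_gib(tokens: int, n_layers: int = 36, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_value: float = 2.0) -> float:
    """KV cache for `tokens` cached tokens; defaults are assumed, not read from the model."""
    # One K and one V vector per layer and per KV head, for each token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return tokens * per_token / GIB
```

Under these assumptions, `weights_gib(8e9)` is about 14.9 GiB and `kv_cache_gib(8192)` (one full `MAX_MODEL_LEN` sequence) is 1.125 GiB, about 16 GiB before overhead.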

	## Development

	```bash
	pip install -r requirements.txt             # install dependencies
	uvicorn app.main:app --reload --port 8080   # dev server with auto-reload
	pytest tests/ -v                            # run the test suite
	```

	## License

	MIT License