---
title: Open Finance LLM 8B
emoji: πŸ‰
colorFrom: red
colorTo: red
sdk: docker
pinned: false
app_port: 7860
suggested_hardware: l4x1
---
# Open Finance LLM 8B
OpenAI-compatible API powered by DragonLLM/Qwen-Open-Finance-R-8B.
## Deployment
| Platform | Backend | Dockerfile | Use Case |
|----------|---------|------------|----------|
| Hugging Face Spaces | Transformers | `Dockerfile` | Development, L4 GPU |
| Koyeb | vLLM | `Dockerfile.koyeb` | Production, L40s GPU |
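For local experimentation with the Transformers backend, a minimal sketch (assumes Docker with NVIDIA GPU support; the image tag and token value are placeholders):
```bash
# Build the development (Transformers) image
docker build -t open-finance-llm-8b -f Dockerfile .

# Run it on the Space port, passing the Hugging Face token
docker run --gpus all -p 7860:7860 \
  -e HF_TOKEN_LC2=hf_your_token_here \
  open-finance-llm-8b
```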
## Features
- OpenAI-compatible API
- Tool/function calling support
- Streaming responses
- Rate limiting (30 req/min, 500 req/hour)
- Statistics tracking via `/v1/stats`
## Quick Start
```bash
curl -X POST "https://your-endpoint/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DragonLLM/Qwen-Open-Finance-R-8B",
    "messages": [{"role": "user", "content": "What is compound interest?"}],
    "max_tokens": 500
  }'
```
```python
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="DragonLLM/Qwen-Open-Finance-R-8B",
    messages=[{"role": "user", "content": "What is compound interest?"}],
    max_tokens=500,
)
print(response.choices[0].message.content)
```
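Tool/function calling follows the standard OpenAI `tools` schema. A hedged sketch (the endpoint URL and the `get_fx_rate` function are illustrative, not part of this repo):
```bash
curl -X POST "https://your-endpoint/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DragonLLM/Qwen-Open-Finance-R-8B",
    "messages": [{"role": "user", "content": "What is the EUR/USD rate right now?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_fx_rate",
        "description": "Get the current exchange rate for a currency pair",
        "parameters": {
          "type": "object",
          "properties": {
            "pair": {"type": "string", "description": "Currency pair, e.g. EUR/USD"}
          },
          "required": ["pair"]
        }
      }
    }],
    "max_tokens": 300
  }'
```
Streaming uses the same endpoint: add `"stream": true` to the request body (or `stream=True` with the OpenAI client).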
## Configuration
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `HF_TOKEN_LC2` | Yes | - | Hugging Face token |
| `MODEL` | No | `DragonLLM/Qwen-Open-Finance-R-8B` | Model name |
| `PORT` | No | `8000` (vLLM) / `7860` (Transformers) | Server port |
**vLLM-specific (Koyeb):**
- `ENABLE_AUTO_TOOL_CHOICE=true` - Enable tool calling
- `TOOL_CALL_PARSER=hermes` - Parser for Qwen models
- `MAX_MODEL_LEN=8192` - Max context length
- `GPU_MEMORY_UTILIZATION=0.90` - GPU memory fraction
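For reference, the same variables expressed as shell exports (token value is a placeholder; how they are injected depends on the platform's configuration UI):
```bash
# Koyeb / vLLM backend environment (illustrative values)
export HF_TOKEN_LC2=hf_your_token_here
export MODEL=DragonLLM/Qwen-Open-Finance-R-8B
export PORT=8000
export ENABLE_AUTO_TOOL_CHOICE=true   # enable tool calling
export TOOL_CALL_PARSER=hermes        # parser for Qwen models
export MAX_MODEL_LEN=8192             # max context length
export GPU_MEMORY_UTILIZATION=0.90    # GPU memory fraction
```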
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/models` | GET | List available models |
| `/v1/chat/completions` | POST | Chat completion |
| `/v1/stats` | GET | Usage statistics |
| `/health` | GET | Health check |
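Quick checks against a running instance (endpoint URL is a placeholder):
```bash
# Health check
curl https://your-endpoint/health

# List models and fetch usage statistics
curl https://your-endpoint/v1/models
curl https://your-endpoint/v1/stats
```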
## Technical Specs
- **Model**: DragonLLM/Qwen-Open-Finance-R-8B (8B parameters)
- **vLLM Backend**: vllm-openai:latest with hermes tool parser
- **Transformers Backend**: 4.45.0+ with PyTorch 2.5.0+ (CUDA 12.4)
- **Minimum VRAM**: 20GB (L4), recommended 48GB (L40s)
## Development
```bash
pip install -r requirements.txt            # install dependencies
uvicorn app.main:app --reload --port 8080  # run the API locally with auto-reload
pytest tests/ -v                           # run the test suite
```
## License
MIT License