Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm Agent-ID (tall_tame_panther)

Gensyn RL-Swarm: Training & GGUF Quantized LLMs for Inference

Model Overview

This model is an experimental, continuously trained variant of Qwen2.5-Coder-0.5B-Instruct, fine-tuned with the Gensyn RL-Swarm framework using GRPO (Group Relative Policy Optimization) and published in GGUF (llama.cpp) format for enhanced code generation. Note: current training focuses on programming challenges with adaptive weighted sampling.

  • Agent ID: tall_tame_panther
  • Training Status: 🟢 LIVE - Model updates automatically every 5-10 minutes
  • Auto-Sync GGUF Pipeline Status: 🟢 LIVE - Commits update automatically every hour
  • Current Progress: Round 13,533+ / 100,000 (13.53%)
  • Framework Version: Gensyn RL-Swarm v0.7.0
  • Contract: SwarmCoordinator v0.4.2

Key Features

  • Real-time Training: Continuous learning with distributed RL across Gensyn swarm network
  • Adaptive System: Dynamic quality enhancement and dataset weighting for optimal learning
  • Multi-domain Coding: Trained on MBPP and CodeContests datasets with adaptive sampling
  • GGUF Support: Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K)
  • llama.cpp Compatible: Ready for edge deployment and local inference
  • BF16 Precision: Trained with bfloat16 for optimal performance
  • TGI Compatible: Supports Text Generation Inference for production deployment
  • Chat Format Support: Inherits Qwen2.5 chat template for conversational use

Training Data

The model is trained on a composite dataset with adaptive weighted sampling strategy:

| Dataset | Initial Weight | Adaptive Range | Focus Area |
|---|---|---|---|
| MBPP | 5 | 4-6 | Basic Python programming problems with test cases |
| CodeContests | 5 | 4-6 | Competitive programming challenges |

Total Dataset Size: Streaming datasets with infinite iteration
Training Samples per Round: 2
Evaluation: Real-time via swarm coordination with an Ollama-based evaluator, falling back to the Judge API

Adaptive Sampling Strategy

"When the solvers perform well, the proposer automatically increases the difficulty to keep challenging solvers to get better over time." - CodeZero-blog

The implementation features an adaptive sampling system that adjusts dataset weights based on performance. Every 5 rounds the system monitors performance metrics and rebalances the dataset weights to maintain an optimal learning balance:
- Calculate the recent average performance for each dataset
- Adjust the weighted sampling based on the performance difference between datasets
- If recent performance is better on MBPP (Mostly Basic Python Problems), shift weight toward CodeContests, and vice versa, to keep the model challenged
- Update the weights periodically and keep them within the adaptive range so sampling stays balanced

A minimal sketch of this weight-update logic is shown below.
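The following is an illustrative sketch of the idea only; the class name AdaptiveSampler, the 5-round window, and the 0.5 weight step are assumptions drawn from the description above, not the actual RL-Swarm implementation.

from collections import deque, defaultdict

class AdaptiveSampler:
    """Illustrative sketch: rebalance dataset weights from recent rewards."""

    def __init__(self, datasets=("mbpp", "code_contests"), window=5,
                 min_w=4.0, max_w=6.0, init_w=5.0):
        self.weights = {d: init_w for d in datasets}
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.min_w, self.max_w = min_w, max_w

    def record(self, dataset, reward):
        self.history[dataset].append(reward)

    def update_weights(self):
        # Recent average reward per dataset
        avgs = {d: (sum(h) / len(h) if h else 0.0) for d, h in self.history.items()}
        if len(avgs) < 2:
            return self.weights
        ranked = sorted(avgs, key=avgs.get, reverse=True)
        best, worst = ranked[0], ranked[-1]
        # If the model does better on one dataset, shift weight toward the other
        # (keep challenging the solver), clamped to the adaptive range 4-6.
        delta = 0.5
        self.weights[best] = max(self.min_w, self.weights[best] - delta)
        self.weights[worst] = min(self.max_w, self.weights[worst] + delta)
        return self.weights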

Adaptive Reward System

Quality Enhancement Implementation

"Rewards are derived from multiple lightweight checks, ranging from code validity and formatting to alignment with the problem statement, combined into a single interpretable score." - CodeZero-blog

The reward system includes a quality-enhancement mechanism that evaluates code structure and documentation:
- Compute a quality bonus for well-structured code
- Documentation bonus (docstrings and comments)
- Structure bonus (clear function decomposition)
- Algorithmic-efficiency bonus (simple heuristic)
- Scale the bonus with the base reward to avoid reward inflation

A minimal sketch is shown below.
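An illustrative sketch of such a quality bonus; the specific checks, bonus values, and scaling here are assumptions based on the description above, not the exact RL-Swarm reward code.

import ast

def quality_bonus(code: str, base_reward: float) -> float:
    """Illustrative sketch: small bonus for well-structured, documented code."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0  # invalid code earns no bonus

    bonus = 0.0
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    # Documentation: docstrings present on every defined function
    if funcs and all(ast.get_docstring(f) for f in funcs):
        bonus += 0.1
    # Structure: logic wrapped in functions rather than top-level statements
    if funcs:
        bonus += 0.1
    # Simple efficiency heuristic: avoid deeply nested loops
    max_depth = 0
    def walk(node, depth=0):
        nonlocal max_depth
        if isinstance(node, (ast.For, ast.While)):
            depth += 1
            max_depth = max(max_depth, depth)
        for child in ast.iter_child_nodes(node):
            walk(child, depth)
    walk(tree)
    if max_depth <= 2:
        bonus += 0.05
    # Scale with the base reward so the bonus cannot inflate scores
    return bonus * max(base_reward, 0.0)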

Adaptive Threshold System

The system also includes an adaptive threshold mechanism that adjusts based on recent performance:
- The threshold is a function of recent performance
- When quality scores are consistently high, the threshold is raised accordingly

A minimal sketch is shown below.
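A minimal sketch of one way to implement such a threshold; the window size, bounds, and blending factor are assumptions, not the actual implementation.

from collections import deque

class AdaptiveThreshold:
    """Illustrative sketch: raise the bar when recent quality is consistently high."""

    def __init__(self, base=0.5, window=20, lo=0.3, hi=0.8):
        self.base, self.lo, self.hi = base, lo, hi
        self.recent = deque(maxlen=window)

    def update(self, score: float) -> float:
        self.recent.append(score)
        avg = sum(self.recent) / len(self.recent)
        # Move the threshold toward the recent average quality, clamped to [lo, hi]
        return min(self.hi, max(self.lo, 0.5 * self.base + 0.5 * avg))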

Quick Performance Simulation

Reward Comparison

Based on our simulation with 1,000 samples, the adaptive reward system shows a significant improvement:

| System | MBPP Avg Reward | CodeContests Avg Reward | Overall Avg Reward | Improvement |
|---|---|---|---|---|
| Original | 0.234 | -0.156 | 0.039 | - |
| Adaptive | 0.312 | -0.098 | 0.107 | ~174% |

Training Progress

Based on the logs provided, the model shows consistent progress:

Train/loss metrics visualized with Weights & Biases (WandB):

  • Dashboard: coming soon!
[2025-11-14 04:22:50,632][genrl.logging_utils.global_defs][INFO] - __ Joining round: 13053
[2025-11-14 04:23:50,633][genrl.logging_utils.global_defs][INFO] - Starting round: 13053/100000.
Map: 100%|______________________________________| 1/1 [00:00<00:00, 158.65 examples/s]
Map: 100%|______________________________________| 1/1 [00:00<00:00, 191.92 examples/s]
[2025-11-14 04:25:12,646][genrl.logging_utils.global_defs][INFO] - pushing model to huggingface
Processing Files (1 / 1)      : 100%|___|  988MB /  988MB, 94.3MB/s
New Data Upload               : 100%|___|  983MB /  983MB, 94.3MB/s  
.....kpb5lid/model.safetensors: 100%|___|  988MB /  988MB, 94.3MB/s
[2025-11-14 04:27:01,877][genrl.logging_utils.global_defs][INFO] - Already finished round: 13053. Next check in 160.0s.

Quick Start: Inference

Standard Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
prompt = "Write a function to calculate the factorial of a number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256, temperature=0.7, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Chat Format (Conversational)

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther")
messages = [
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Write a function to check if a string is a palindrome."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0]))

Text Generation Inference (TGI)

docker run -d --gpus all \
  -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id 0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther \
  --max-input-length 4096 \
  --max-total-tokens 8192

GGUF with LLAMA.CPP

# Download quantized model (recommended: Q4_K_M)
wget https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen2.5-Coder-0.5B-Q4_K_M.gguf
# Run inference
./llama-cli -m Qwen2.5-Coder-0.5B-Q4_K_M.gguf \
  -p "Write a function to implement binary search in Python." \
  --temp 0.7 --top-p 0.8

Ollama

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther/Qwen2.5-Coder-0.5B-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER top_k 20
SYSTEM "You are an expert Python programmer who writes clean, documented code."
EOF
# Create and run
ollama create qwen2.5-coder-swarm -f Modelfile
ollama run qwen2.5-coder-swarm "Write a function to calculate the factorial of a number."

Available GGUF Quantization

| Format | Size | Precision | Use Case | Download |
|---|---|---|---|---|
| Safetensors (BF16) | 988 MB | BF16 | Full-precision training/fine-tuning | model.safetensors |
| GGUF F16 | 994 MB | FP16 | High-quality inference | Qwen2.5-Coder-0.5B-F16.gguf |
| GGUF Q6_K | 506 MB | 6-bit | High-quality compression | Qwen2.5-Coder-0.5B-Q6_K.gguf |
| GGUF Q5_K_M | 420 MB | 5-bit | Balanced quality/size | Qwen2.5-Coder-0.5B-Q5_K_M.gguf |
| GGUF Q4_K_M | 398 MB | 4-bit | Recommended for production | Qwen2.5-Coder-0.5B-Q4_K_M.gguf |
| GGUF Q3_K_M | 355 MB | 3-bit | Smallest, fastest | Qwen2.5-Coder-0.5B-Q3_K_M.gguf |

All GGUF formats are llama.cpp-compatible, ready for inference and chat, and are auto-updated hourly.

Chat Format & Conversational

This model inherits Qwen2.5's chat template for structured conversations.

Format Structure

<|im_start|>system
{system_message}
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant
{assistant_response}
<|im_end|>
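The inherited Qwen2.5 chat template should render messages into this structure directly; a small check (using this repo's tokenizer as in the examples above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther"
)
messages = [
    {"role": "system", "content": "You are an expert Python programmer."},
    {"role": "user", "content": "Reverse a linked list."},
]
# Prints the prompt with the <|im_start|>/<|im_end|> markers shown above,
# ending with an opened assistant turn for generation.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))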

Chat Template Features

  • System Instructions: Guide model behavior with system messages
  • Multi-turn Dialogue: Maintains conversation context
  • Tool Calling: Supports function calling (if enabled in training)
  • Code Generation: Optimized for generating Python code

Note: While the model supports the chat format structurally, optimal conversational performance depends on whether the training data included formatted dialogues. Current training focuses on programming challenges.

Gensyn RL-Swarm Architecture at a Glance

Training Framework:
- Method: GRPO (Group Relative Policy Optimization)
- Base Model: Qwen/Qwen2.5-Coder-0.5B-Instruct
- Training Regime: bfloat16 mixed precision
- Max Rounds: 100000
- Update Frequency: Every 5-10 minutes
- Generations per Round: 2
- Batch size: Combine
- Tree-based Model: 2 trees
- Seed: 42
Blockchain Integration:
- Network: Gensyn Testnet
- Chain ID: 685685
- Contract: SwarmCoordinator v0.4.2
Swarm Communication:
- Framework: Hivemind P2P Backend
- Initial Peers: 3 bootnodes
- Beam Size: 10
Reward System:
- Manager: RewardManager (SwarmGameManager/CodeGenerationRewards)
- Reward Function: Adaptive with quality enhancement
- Evaluator: Ollama (qwen2.5-coder:1.5b-instruct)
- Judge API: https://codezero-judge.gensyn.ai
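Since the stack uses Hydra + OmegaConf for configuration (see Software Stack below), the settings above can be represented roughly as follows; the field names and nesting here are illustrative, not the actual RL-Swarm config schema.

from omegaconf import OmegaConf

# Illustrative config mirroring the values listed above; the real RL-Swarm
# Hydra config uses its own keys and structure.
cfg = OmegaConf.create({
    "training": {
        "method": "GRPO",
        "base_model": "Qwen/Qwen2.5-Coder-0.5B-Instruct",
        "dtype": "bfloat16",
        "max_rounds": 100_000,
        "generations_per_round": 2,
        "seed": 42,
    },
    "blockchain": {"network": "Gensyn Testnet", "chain_id": 685685,
                   "contract": "SwarmCoordinator v0.4.2"},
    "swarm": {"backend": "hivemind", "initial_peers": 3, "beam_size": 10},
    "rewards": {"evaluator": "ollama:qwen2.5-coder:1.5b-instruct",
                "judge_api": "https://codezero-judge.gensyn.ai"},
})
print(OmegaConf.to_yaml(cfg))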

Model Capabilities

This model excels at:

  1. Basic Python Programming: Functions, loops, conditionals, data structures
  2. Algorithm Implementation: Sorting, searching, graph algorithms
  3. String Manipulation: Pattern matching, parsing, formatting
  4. Mathematical Functions: Calculations, conversions, formulas
  5. Code Documentation: Writing clear, commented functions
  6. Problem Solving: Breaking down complex problems into manageable steps

Limitations

  • Specialized Domain: Optimized for programming challenges; may underperform on creative writing
  • Training in Progress: Weights update every 5-10 minutes; performance varies
  • Scale: 0.5B parameters - suitable for edge but not SOTA for complex programming
  • Experimental: Decentralized RL training; behavior less predictable than supervised models
  • Context: Best performance within 4K tokens (full 32K supported)

Update Schedule

| Format | Frequency | Trigger |
|---|---|---|
| Safetensors (BF16) | Every 5-10 min | Automatic via RL-Swarm |
| GGUF (all formats) | Every 3 hours | Auto-conversion pipeline |

Auto-Conversion Pipeline:

  1. Monitors repo for new training commits
  2. Downloads latest model.safetensors
  3. Converts to F16 GGUF base
  4. Quantizes to Q3_K_M, Q4_K_M, Q5_K_M, Q6_K
  5. Uploads the standard formats back to this repository

Check commit history for exact timestamps.
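A rough sketch of such a pipeline follows; the llama.cpp script and binary paths, file patterns, and repo handling are assumptions about one way to do it, not the actual conversion code.

import subprocess
from huggingface_hub import snapshot_download, upload_file

REPO = "0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther"
QUANTS = ["Q3_K_M", "Q4_K_M", "Q5_K_M", "Q6_K"]

# Steps 1-2: grab the latest safetensors checkpoint and tokenizer files
local_dir = snapshot_download(REPO, allow_patterns=["*.safetensors", "*.json", "*.txt"])

# Step 3: convert to an F16 GGUF base with llama.cpp's converter (path assumed)
subprocess.run(["python", "llama.cpp/convert_hf_to_gguf.py", local_dir,
                "--outfile", "Qwen2.5-Coder-0.5B-F16.gguf", "--outtype", "f16"], check=True)

# Step 4: quantize to the published formats
for q in QUANTS:
    out = f"Qwen2.5-Coder-0.5B-{q}.gguf"
    subprocess.run(["llama.cpp/build/bin/llama-quantize",
                    "Qwen2.5-Coder-0.5B-F16.gguf", out, q], check=True)
    # Step 5: push each quantized file back to the repo
    upload_file(path_or_fileobj=out, path_in_repo=out, repo_id=REPO)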

Architecture Components

  1. Game Manager: Orchestrates training rounds and swarm coordination
  2. Trainer: GRPO implementation for policy optimization
  3. Data Manager: Dataset loading with adaptive weighted sampling
  4. Reward Manager: Computes rewards via Ollama evaluator with quality enhanced
  5. Coordinator: Blockchain integration for swarm state
  6. P2P Backend: Hivemind DHT for model sharing

Training Process

1. Agent joins swarm via P2P network
2. Coordinator assigns round via smart contract
3. Agent samples data from adaptive weighted datasets
4. Model generates 2 responses
5. Ollama evaluator assesses and assigns rewards with quality enhanced
6. GRPO updates policy based on rewards
7. Updated model shared via DHT
8. Best checkpoint saved to HuggingFace
9. Repeat
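To make the GRPO step (6) concrete: GRPO scores each generation relative to the other generations for the same prompt. A minimal sketch of the group-relative advantage (not the full RL-Swarm trainer):

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO core idea: normalize each generation's reward against its own group.

    rewards: shape (num_prompts, group_size), e.g. the 2 generations per round
             configured above.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    # Generations better than their group's average get a positive advantage,
    # which then reweights the policy-gradient update.
    return (rewards - mean) / (std + eps)

# Example: two prompts, two generations each
adv = group_relative_advantages(torch.tensor([[0.8, 0.2], [0.1, 0.5]]))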

Decentralization Benefits

  • Fault Tolerance: Multiple agents; no single point of failure
  • Diverse Exploration: Different agents explore different strategies
  • Collective Intelligence: Agents learn from each other
  • Transparent: All rounds verified on-chain

Software Stack

  • Framework: Gensyn RL-Swarm v0.7.0
  • Library: transformers v4.57.1
  • P2P: hivemind
  • Blockchain: Gensyn testnet
  • Config: Hydra + OmegaConf
  • Logging: WandB integration

Hardware Requirements

Training (GPU):

  • GPU: NVIDIA 4090 24GB+ (BF16 training)
  • RAM: 16GB+
  • Cores: 10+
  • Storage: 50GB SSD
  • Network: High bandwidth for P2P

Training (CPU-optimized):

  • CPU: Intel or AMD
  • Cores: 10+
  • RAM: 16GB+
  • Storage: 50GB SSD
  • Network: High bandwidth for P2P

Inference:

  • Safetensors: 8GB VRAM (GPU) / 16GB RAM (CPU)
  • GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
  • GGUF Q3_K_M: 3GB RAM (CPU-only)

Training Progress Metrics

| Metric | Value | Target |
|---|---|---|
| Completed Rounds | 13,533+ | 100,000 |
| Training Progress | 13.53% | 100% |
| Update Frequency | 5-10 min | Continuous |

Note: average@k measures the average performance across k attempts (consistency); pass@k measures the probability of at least one correct solution in k attempts (capability). The current metrics track training rounds completed in the decentralized swarm; a standard pass@k estimator is sketched below for reference.
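For reference, the widely used unbiased pass@k estimator and a simple average@k can be computed like this; the example numbers are illustrative, since the card reports round counts rather than pass@k results.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k sampled
    completions is correct, given n total samples with c correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def average_at_k(scores: list[float], k: int) -> float:
    """average@k as described above: mean score over the first k attempts."""
    return sum(scores[:k]) / k

print(pass_at_k(n=10, c=3, k=2))  # ~0.53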

Adaptive Reward Performance

Our adaptive reward system has shown approximately a 174% improvement in reward scores compared to the baseline system:

Original:
  Overall Avg Reward: 0.039
  MBPP Avg Reward: 0.234
  CodeContests Avg Reward: -0.156
Adaptive:
  Overall Avg Reward: 0.107
  MBPP Avg Reward: 0.312
  CodeContests Avg Reward: -0.098
Improvement: 0.068 (~174% increase)

Citation

@misc{qwen2.5-coder-gensyn-swarm-2025,
  author = {0xgrey},
  title = {Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm: Continuous RL Training on Distributed Swarm with Adaptive Rewards},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther}},
  note = {Agent ID: tall\_tame\_panther}
}
@misc{gensyn-rl-swarm-2025,
  title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
  author = {Gensyn AI},
  year = {2025},
  url = {https://gensyn.ai}
}
@misc{codezero-2025,
  title = {CodeZero: A Collaborative Coding Environment for Distributed RL},
  author = {Gensyn AI},
  year = {2025},
  url = {https://docs.gensyn.ai/testnet/rl-swarm/how-it-works/codezero}
}

Contact

  • Developer: 0xgrey
  • Agent ID: tall_tame_panther
  • Community: Gensyn Discord

โš ๏ธ Important: This is a continuously trained model. For reproducibility, specify commit hash:

git clone https://huggingface.co/0xgr3y/Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
cd Qwen2.5-Coder-0.5B-Instruct-Gensyn-Swarm-tall_tame_panther
git checkout <commit-hash>

Trained with 🩷 using Gensyn RL-Swarm
