Qwen3-0.6B-Gensyn-Swarm (Agent ID: tall_tame_panther)


Model Overview

This model is a continuously trained Qwen3-0.6B checkpoint fine-tuned with the Gensyn RL-Swarm framework using GRPO (Group Relative Policy Optimization) to enhance reasoning and mathematical capabilities, with GGUF (llama.cpp) builds provided for local inference. Note: current training focuses on math and reasoning tasks.

  • Agent ID: tall_tame_panther
  • Training Status: 🟢 LIVE - Model updates automatically every 5-10 minutes
  • Auto-Sync GGUF Pipeline Status: 🟢 LIVE - GGUF builds are committed automatically every hour
  • Current Progress: Round 43,610+ / 100,000 (43.61%)
  • Framework Version: Gensyn RL-Swarm v0.6.4
  • Contract: SwarmCoordinator v0.4.2

Key Features

  • Real-time Training: Continuous learning with distributed RL across Gensyn swarm network
  • Multi-domain Reasoning: Trained on logic, mathematical problem-solving & reasoning tasks
  • GGUF Support: Multiple quantized formats available (F16, Q3_K_M, Q4_K_M, Q5_K_M)
  • llama.cpp Compatible: Ready for edge deployment and local inference
  • BF16 Precision: Trained with bfloat16 for optimal performance
  • TGI Compatible: Supports Text Generation Inference for production deployment
  • Chat Format Support: Inherits Qwen3 chat template for conversational use

Training Data

The model is trained on a composite dataset (1,000 samples) using a weighted sampling strategy (a sampling sketch follows the summary below):

  • Propositional Logic (weight 7): logical reasoning, truth tables, Boolean operations
  • Calendar Arithmetic (weight 6): date calculations, leap years, recurring events
  • Decimal Arithmetic (weight 5): multi-term decimal operations with precision
  • Base Conversion (weight 4): number system conversions (base 2-16)
  • Fraction Simplification (weight 4): GCD/LCM, fraction reduction
  • Basic Arithmetic (weight 2): foundation operations with parentheses

Total Dataset Size: 1,000 composite samples
Training Samples per Round: 2
Evaluation: Real-time via swarm coordination
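How the weighted sampling could work is illustrated by the minimal sketch below. The weights mirror the table above; the dataset keys and the sampler itself are illustrative and are not the actual RL-Swarm data manager.

import random

# Illustrative weights mirroring the table above (not the actual
# rgym_exp data-manager implementation).
DATASET_WEIGHTS = {
    "propositional_logic": 7,
    "calendar_arithmetic": 6,
    "decimal_arithmetic": 5,
    "base_conversion": 4,
    "fraction_simplification": 4,
    "basic_arithmetic": 2,
}

def sample_datasets(n_samples: int, seed: int = 42) -> list:
    """Draw dataset names proportionally to their weights."""
    rng = random.Random(seed)
    names = list(DATASET_WEIGHTS)
    weights = [DATASET_WEIGHTS[n] for n in names]
    return rng.choices(names, weights=weights, k=n_samples)

# Two samples per training round, as configured below.
print(sample_datasets(n_samples=2))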

Quick Start

Standard Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")

prompt = "What is 3/4 simplified to lowest terms?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Sampling must be enabled for temperature/top_p to take effect.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.95)
# generate() returns a batch; decode the first (and only) sequence.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Chat Format (Conversational)

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")
tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")

messages = [
    {"role": "system", "content": "You are a helpful math tutor."},
    {"role": "user", "content": "Explain how to simplify 24/36 step by step."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode the first sequence in the batch.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Text Generation Inference (TGI)

docker run -d --gpus all \
  -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id 0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther \
  --max-input-length 4096 \
  --max-total-tokens 8192
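
Once the container is running, the standard TGI generate endpoint can be queried on the mapped port, for example:

curl http://localhost:8080/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": "What is 3/4 simplified to lowest terms?",
    "parameters": {"max_new_tokens": 256, "temperature": 0.6, "top_p": 0.95}
  }'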

GGUF with llama.cpp

# Download quantized model (recommended: Q4_K_M)
wget https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther/resolve/main/Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf

# Run inference
./llama-cli -m Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf \
  -p "Solve: (5 + 3) * 2 = ?" \
  --temp 0.6 --top-p 0.95
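
For local serving, recent llama.cpp builds also ship llama-server, which exposes an OpenAI-compatible HTTP endpoint. A minimal invocation is sketched below; flag names can differ slightly across llama.cpp versions.

# Serve the quantized model over HTTP (OpenAI-compatible API)
./llama-server -m Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf -c 4096 --port 8080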

Ollama

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
SYSTEM "You are a helpful assistant specialized in mathematical reasoning and logic."
EOF

# Create and run
ollama create qwen3-swarm -f Modelfile
ollama run qwen3-swarm "What is 15 multiplied by 23?"
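
Once created, the model can also be queried through Ollama's local REST API (default port 11434):

# Query the local Ollama REST API
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3-swarm",
  "prompt": "Convert 1011 (base 2) to decimal.",
  "stream": false
}'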

Available Formats

  • Safetensors (BF16): 1.19 GB, BF16, full-precision training/fine-tuning - model.safetensors
  • GGUF F16: 1.14 GB, FP16, high-quality inference - Qwen3-0.6B-Gensyn-Swarm-F16.gguf
  • GGUF Q5_K_M: 444 MB, 5-bit, balanced quality/size - Qwen3-0.6B-Gensyn-Swarm-Q5_K_M.gguf
  • GGUF Q4_K_M: 397 MB, 4-bit, recommended for production - Qwen3-0.6B-Gensyn-Swarm-Q4_K_M.gguf
  • GGUF Q3_K_M: 347 MB, 3-bit, smallest and fastest - Qwen3-0.6B-Gensyn-Swarm-Q3_K_M.gguf

All GGUF formats are llama.cpp compatible and auto-updated hourly.

GGUF Quantization Strategy

The Q5_K_M format uses mixed precision for optimal quality:

  • Token Embeddings: Q6_K (high quality vocab representation)
  • Attention Weights: Q5_K (balanced quality/size)
  • Feed-Forward: Q5_K/Q6_K (mixed for optimal performance)
  • Layer Norms: F32 (full precision for stability)

This strategy ensures minimal quality loss while maintaining small file size.
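
As a rough sketch, an equivalent mixed-precision file can be rebuilt from the F16 base with llama.cpp's quantize tool; the K-quant recipe applies the per-tensor mix described above internally. The binary is named quantize in older llama.cpp builds and llama-quantize in newer ones.

# Rebuild the Q5_K_M mixed-precision file from the F16 base
./llama-quantize Qwen3-0.6B-Gensyn-Swarm-F16.gguf Qwen3-0.6B-Gensyn-Swarm-Q5_K_M.gguf Q5_K_M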

Chat Format & Conversational Use

This model inherits Qwen3's chat template for structured conversations.

Format Structure

<|im_start|>system
{system_message}
<|im_end|>
<|im_start|>user
{user_message}
<|im_end|>
<|im_start|>assistant
{assistant_response}
<|im_end|>

Chat Template Features

  • System Instructions: Guide model behavior with system messages
  • Multi-turn Dialogue: Maintains conversation context
  • Tool Calling: Supports function calling (if enabled in training)
  • Reasoning Mode: <think> tags for chain-of-thought (experimental; see the snippet below)

Note: While the model supports chat format structurally, optimal conversational performance depends on whether training data included formatted dialogues. Current training focuses on math/reasoning tasks.
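
If the inherited Qwen3 template is unchanged, the experimental reasoning mode can be toggled through apply_chat_template's enable_thinking flag. This flag is a Qwen3 template feature; its effect on this continuously trained checkpoint is not guaranteed.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther")

messages = [{"role": "user", "content": "Is (p AND q) -> p a tautology?"}]

# enable_thinking=True keeps the <think> ... </think> scratchpad in the prompt
# format; set it to False to request a direct answer (Qwen3 template feature).
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
print(text)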

Training Configuration

Gensyn RL-Swarm Architecture

Training Framework:
  Method: GRPO (Group Relative Policy Optimization)
  Base Model: Qwen/Qwen3-0.6B
  Training Regime: bfloat16 mixed precision
  Max Rounds: 100,000
  Update Frequency: Every 5-10 minutes
  Generations per Round: 2
  Seed: 42

Blockchain Integration:
  Network: Gensyn Testnet
  Chain ID: 685685
  Contract: SwarmCoordinator v0.4.2

Swarm Communication:
  Framework: Hivemind P2P Backend
  Initial Peers: 3 bootnodes
  Beam Size: 30

Reward System:
  Manager: DefaultRewardManager
  Reward Function: RGRewards (Reasoning Gym)
  Judge API: https://swarm-judge.internal-apps-central1.clusters.gensyn.ai

Model Hyperparameters

Architecture:
  Hidden Size: 1024
  Intermediate Size: 3072
  Layers: 28
  Attention Heads: 16
  KV Heads: 8
  Head Dimension: 128
  Context Length: 40,960 tokens
  Vocabulary: 151,936 tokens

GRPO Config:
  Epsilon: 0.2
  Epsilon High: 0.28
  Gradient Checkpointing: Enabled
  
Generation:
  Temperature: 0.6
  Top-K: 20
  Top-P: 0.95
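
For intuition, the asymmetric clipping implied by the Epsilon / Epsilon High values above can be sketched as a simplified surrogate loss. This is illustrative only, not the RL-Swarm trainer code; advantages stands in for the group-normalized rewards GRPO uses instead of a learned value function.

import torch

def grpo_clipped_objective(log_probs, old_log_probs, advantages,
                           eps_low: float = 0.2, eps_high: float = 0.28):
    """Simplified GRPO/PPO-style clipped surrogate with asymmetric bounds.

    Illustrative only: the real trainer also handles KL regularization,
    token masking, and per-group advantage normalization.
    """
    ratio = torch.exp(log_probs - old_log_probs)            # pi_new / pi_old
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # Take the pessimistic (minimum) objective, as in PPO/GRPO.
    objective = torch.minimum(ratio * advantages, clipped * advantages)
    return -objective.mean()                                 # loss to minimize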

Model Capabilities

This model excels at:

  1. Logical Reasoning: Propositional logic, truth evaluation, Boolean algebra
  2. Mathematical Operations: Multi-precision arithmetic, decimal calculations, fractions
  3. Number Systems: Base conversion (binary, octal, decimal, hexadecimal)
  4. Date/Time Calculations: Calendar arithmetic, leap years, day-of-week
  5. Step-by-step Problem Solving: Chain-of-thought reasoning
  6. Conversational Tutoring: Interactive problem-solving (via chat format)

Limitations

  • Specialized Domain: Optimized for reasoning/math; may underperform on creative writing
  • Training in Progress: Weights update every 5-10 minutes; performance varies
  • Scale: 0.6B parameters - suitable for edge but not SOTA for complex reasoning
  • Experimental: Decentralized RL training; behavior less predictable than supervised models
  • Context: Best performance within 4K tokens (full 40K supported)

Update Schedule

  • Safetensors (BF16): every 5-10 minutes, triggered automatically by RL-Swarm
  • GGUF (all formats): every hour, via the auto-conversion pipeline

Auto-Conversion Pipeline (sketched below):

  1. Monitors repo for new training commits
  2. Downloads latest model.safetensors
  3. Converts to F16 GGUF base
  4. Quantizes to Q3_K_M, Q4_K_M, Q5_K_M
  5. Uploads all formats

Check commit history for exact timestamps.
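
A minimal sketch of such a pipeline is shown below. The llama.cpp paths are assumptions about where the toolchain is checked out, and the actual hourly job is not published; the Hugging Face Hub calls (snapshot_download, upload_file) and the convert_hf_to_gguf.py script are real, the wiring around them is illustrative.

import subprocess
from huggingface_hub import snapshot_download, upload_file

REPO = "0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther"
QUANTS = ["Q3_K_M", "Q4_K_M", "Q5_K_M"]

# Steps 1-2: pull the latest safetensors checkpoint and config/tokenizer files.
local_dir = snapshot_download(REPO, allow_patterns=["*.safetensors", "*.json", "tokenizer*"])

# Step 3: convert to an F16 GGUF base (convert_hf_to_gguf.py ships with llama.cpp;
# the path is an assumption about the local checkout).
subprocess.run(["python", "llama.cpp/convert_hf_to_gguf.py", local_dir,
                "--outfile", "model-F16.gguf", "--outtype", "f16"], check=True)

# Step 4: quantize to each target format.
for q in QUANTS:
    subprocess.run(["llama.cpp/build/bin/llama-quantize",
                    "model-F16.gguf", f"model-{q}.gguf", q], check=True)

# Step 5: upload the refreshed GGUF files back to the repo.
for q in ["F16"] + QUANTS:
    upload_file(path_or_fileobj=f"model-{q}.gguf",
                path_in_repo=f"Qwen3-0.6B-Gensyn-Swarm-{q}.gguf", repo_id=REPO)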

Gensyn RL-Swarm Technical Details

Architecture Components

  1. Game Manager: Orchestrates training rounds and swarm coordination
  2. Trainer: GRPO implementation for policy optimization
  3. Data Manager: Dataset loading and weighted sampling
  4. Reward Manager: Computes rewards via judge API
  5. Coordinator: Blockchain integration for swarm state
  6. P2P Backend: Hivemind DHT for model sharing

Training Process

1. Agent joins swarm via P2P network
2. Coordinator assigns round via smart contract
3. Agent samples data from weighted datasets
4. Model generates 2 responses
5. Judge API evaluates and assigns rewards
6. GRPO updates policy based on rewards
7. Updated model shared via DHT
8. Best checkpoint saved to HuggingFace
9. Repeat
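
The round loop can be summarized in pseudocode. All object and method names below are hypothetical; the real implementation lives in the RL-Swarm game manager and trainer classes.

# Pseudocode for one swarm round (hypothetical names, not the rgym_exp API).
def run_round(agent, coordinator, judge, dht):
    round_id = coordinator.current_round()                   # 2. on-chain round assignment
    batch = agent.data_manager.sample(k=2)                   # 3. weighted dataset sampling
    completions = [agent.model.generate(x) for x in batch]   # 4. two generations per round
    rewards = [judge.score(x, y) for x, y in zip(batch, completions)]  # 5. judge API
    agent.trainer.grpo_step(batch, completions, rewards)     # 6. policy update
    dht.publish(agent.model.state_dict(), round_id)          # 7. share via Hivemind DHT
    agent.maybe_push_checkpoint()                            # 8. best checkpoint to the Hub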

Decentralization Benefits

  • Fault Tolerance: Multiple agents; no single point of failure
  • Diverse Exploration: Different agents explore different strategies
  • Collective Intelligence: Agents learn from each other
  • Transparent: All rounds verified on-chain

Swarm Agent: tall_tame_panther
Contract: SwarmCoordinator v0.4.2

Technical Specifications

Software Stack

  • Framework: Gensyn RL-Swarm v0.6.4
  • Library: transformers v4.51+
  • P2P: hivemind
  • Blockchain: Gensyn testnet
  • Config: Hydra + OmegaConf
  • Logging: WandB integration

Hardware Requirements

Training GPU:

  • GPU: NVIDIA 4090 24GB+ (BF16 training)
  • RAM: 16GB+
  • Cores: 10+
  • Storage: 50GB SSD
  • Network: High bandwidth for P2P

Training (CPU-optimized):

  • CPU: Intel or AMD x86-64
  • Cores: 10+
  • RAM: 16GB+
  • Storage: 50GB SSD
  • Network: High bandwidth for P2P

Inference:

  • Safetensors: 8GB VRAM (GPU) / 16GB RAM (CPU)
  • GGUF Q4_K_M: 2GB VRAM (GPU) / 4GB RAM (CPU)
  • GGUF Q3_K_M: 3GB RAM (CPU-only)

Evaluation

Training Progress Metrics

  • Completed Rounds: 43,610+ (target: 100,000)
  • Training Progress: 43.61% (target: 100%)
  • Update Frequency: every 5-10 minutes (continuous)

Note: Formal evaluation benchmarks (GSM8K, MATH, etc.) will be added as training progresses. Current metrics track training rounds completed in the decentralized swarm.

Reproducibility

To reproduce training:

  1. Clone Gensyn RL-Swarm repository
  2. Install: pip install -r requirements.txt
  3. Configure rgym_exp/config/rg-swarm.yaml
  4. Configure rgym_exp/src/datasets.yaml
  5. Set environment variables:
export HUGGINGFACE_ACCESS_TOKEN=<token>
export MODEL_NAME=Qwen/Qwen3-0.6B
export ORG_ID=<org-id>
export SWARM_CONTRACT=<contract-address>
  6. Run: bash run_rl_swarm.sh

Note: Exact reproduction requires same seed (42), dataset config, and swarm state.

Citation

@misc{qwen3-gensyn-swarm-2025,
  author = {0xgrey},
  title = {Qwen3-0.6B-Gensyn-Swarm: Continuous RL Training on Distributed Swarm},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther}},
  note = {Agent ID: tall\_tame\_panther}
}

@misc{gensyn-rl-swarm-2025,
  title = {Gensyn RL-Swarm: Decentralized Reinforcement Learning Framework},
  author = {Gensyn AI},
  year = {2025},
  url = {https://gensyn.ai}
}

License

Apache 2.0 - See LICENSE

Contact

  • Developer: 0xgrey
  • Agent ID: tall_tame_panther
  • Community: Gensyn Discord

⚠️ Important: This is a continuously trained model. For reproducibility, specify commit hash:

git clone https://huggingface.co/0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
cd Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther
git checkout <commit-hash>
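
The same pin can be applied when loading with transformers, which accepts a revision argument in from_pretrained:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Pin a specific training snapshot by commit hash (or branch/tag).
revision = "<commit-hash>"
model = AutoModelForCausalLM.from_pretrained(
    "0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther", revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(
    "0xgr3y/Qwen3-0.6B-Gensyn-Swarm-tall_tame_panther", revision=revision
)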

🤖 Trained with ❤️ using Gensyn RL-Swarm
