PI0 Hanoi Policy Gradient Checkpoint (30k steps)
This is a checkpoint for the PI0 (Physical Intelligence 0) model trained on the Hanoi task using a subtask-based approach with policy gradient methods.
Model Details
- Task: Hanoi Tower puzzle (subtask decomposition)
- Training Steps: 30,000
- Model Type: Policy gradient with subtask learning
- Framework: JAX/Flax
- Dataset: hanoi_300_lerobot
- Architecture: Vision-Language-Action model with subtask decomposition (subtask masking disabled)
Key Features
- Subtask Learning: Decomposes Hanoi puzzle into manageable subtasks
- No Masking: Trained without subtask masking for better generalization
- Policy Gradient: Uses policy gradient optimization
- End-to-End: Learns from visual observations to actions
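To make the "no masking" setting concrete, here is a minimal sketch of what disabling subtask masking means for a per-timestep training loss. The function and shapes are illustrative assumptions, not the actual PI0 training code: with masking, only timesteps belonging to the active subtask contribute; without it (this checkpoint's setting), every timestep contributes, so learning can span subtask boundaries.

```python
import numpy as np

def subtask_loss(per_step_loss, subtask_ids, active_subtask, use_mask):
    """Average per-timestep losses, optionally masked to one subtask.

    Illustrative only: with use_mask=True, only timesteps whose subtask id
    matches `active_subtask` contribute; with use_mask=False (as in this
    checkpoint), all timesteps contribute.
    """
    per_step_loss = np.asarray(per_step_loss, dtype=float)
    if use_mask:
        mask = (np.asarray(subtask_ids) == active_subtask).astype(float)
        return float((per_step_loss * mask).sum() / np.maximum(mask.sum(), 1.0))
    return float(per_step_loss.mean())

losses = [1.0, 2.0, 3.0, 4.0]   # loss at each timestep
ids = [0, 0, 1, 1]              # which subtask each timestep belongs to
masked = subtask_loss(losses, ids, active_subtask=0, use_mask=True)    # -> 1.5
unmasked = subtask_loss(losses, ids, active_subtask=0, use_mask=False)  # -> 2.5
```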
Checkpoint Structure
- params/: Model parameters
- train_state/: Training state
- assets/: Additional assets, including normalization statistics
- _CHECKPOINT_METADATA: Checkpoint metadata
Usage
Load this checkpoint with your training pipeline's JAX/Flax checkpoint-restoration utilities (e.g. an Orbax-style restore of the params/ and train_state/ trees), and apply the normalization statistics from assets/ at inference time.
Training Configuration
- Dataset: hanoi_300_lerobot
- Training approach: Subtask-based policy gradient
- Subtask masking: Disabled (no_masking)
- Total training steps: 30,000
- Model: PI0 with subtask decomposition
Model Card
This model is trained for the Hanoi Tower puzzle task using a subtask-based policy gradient approach. The model decomposes the complex Hanoi puzzle into simpler subtasks and learns to solve them sequentially.
Performance
- Training Steps: 30,000
- Architecture: Vision-Language-Action model with subtask learning
- Approach: Policy gradient without subtask masking
Advantages
- Subtask Decomposition: Breaks down complex tasks into manageable parts
- Better Generalization: Disabling subtask masking lets learning span subtask boundaries, which may transfer better across subtasks
- Policy Gradient: Direct optimization of policy performance
- End-to-End: Learns complete task from visual input
Limitations
- Trained specifically on Hanoi Tower puzzle with subtask structure
- Performance may vary on different manipulation tasks
- Requires appropriate normalization statistics for inference
- Subtask decomposition may not generalize to all manipulation tasks
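Since inference requires the normalization statistics stored under assets/, here is a minimal sketch of how such statistics are typically applied. The dict layout with "mean" and "std" arrays is an assumption about the stored format, not a documented PI0 API:

```python
import numpy as np

def normalize(x, stats):
    """Standardize observations/actions with stored statistics.

    `stats` is assumed to hold "mean" and "std" arrays, mirroring a
    typical normalization-statistics asset; the epsilon guards against
    division by zero for constant dimensions.
    """
    return (np.asarray(x, dtype=float) - stats["mean"]) / (stats["std"] + 1e-8)

def denormalize(x, stats):
    """Invert `normalize`, e.g. to map model outputs back to action units."""
    return np.asarray(x, dtype=float) * (stats["std"] + 1e-8) + stats["mean"]
```

Model outputs produced in normalized space must be passed through `denormalize` before being sent to the robot.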
Technical Details
- Subtask Approach: Decomposes Hanoi puzzle into logical subtasks
- No Masking: Allows model to learn across subtask boundaries
- Policy Gradient: Direct policy optimization without value function
- Vision Input: Processes visual observations from robot cameras
- Action Output: Generates robot manipulation actions
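The "policy gradient without a value function" point corresponds to a plain score-function (REINFORCE) estimator. This numpy sketch shows that estimator for a small categorical policy; the shapes and the categorical action head are illustrative assumptions, not the PI0 architecture:

```python
import numpy as np

def reinforce_grad(logits, actions, returns):
    """Score-function (REINFORCE) gradient w.r.t. policy logits.

    No value function or baseline is subtracted, matching the "direct
    policy optimization" described above. Returns the gradient of the
    negative expected return (a descent direction for a minimizer).
    """
    logits = np.asarray(logits, dtype=float)
    # Softmax over the action dimension.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # d/dlogits of log pi(a|s) is one_hot(a) - probs.
    one_hot = np.eye(logits.shape[1])[np.asarray(actions)]
    return -((one_hot - probs) * np.asarray(returns)[:, None]).mean(axis=0)

# Uniform policy over 2 actions, action 0 taken, positive return:
g = reinforce_grad(np.zeros((1, 2)), [0], [1.0])  # -> [-0.5, 0.5]
```

The negative gradient on the taken action's logit means gradient descent increases the probability of actions that led to positive returns, which is the core of the approach described above.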
Evaluation Results
- Success Rate on Hanoi 300 LeRobot Dataset: 0.000 (self-reported)