PI0 Hanoi Policy Gradient Checkpoint (30k steps)

This is a checkpoint of the PI0 (Physical Intelligence π0) model trained on the Hanoi task with a subtask-based policy gradient approach.

Model Details

  • Task: Hanoi Tower puzzle (subtask decomposition)
  • Training Steps: 30,000
  • Model Type: Policy gradient with subtask learning
  • Framework: JAX/Flax
  • Dataset: hanoi_300_lerobot
  • Architecture: Vision-Language-Action model with subtask decomposition (trained without subtask masking)

Key Features

  • Subtask Learning: Decomposes Hanoi puzzle into manageable subtasks
  • No Masking: Trained without subtask masking for better generalization
  • Policy Gradient: Uses policy gradient optimization
  • End-to-End: Learns from visual observations to actions

Checkpoint Structure

  • params/: Model parameters
  • train_state/: Training state
  • assets/: Additional assets including normalization statistics
  • _CHECKPOINT_METADATA: Checkpoint metadata
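
A quick way to sanity-check a downloaded copy against this layout (the local directory name is a placeholder):

```python
from pathlib import Path

CKPT_DIR = Path("pi0_hanoi_pg_30k")  # placeholder path to this checkpoint

# Entries described in this model card; adjust if your copy differs.
for name in ["params", "train_state", "assets", "_CHECKPOINT_METADATA"]:
    entry = CKPT_DIR / name
    print(f"{name:25s} {'found' if entry.exists() else 'missing'}")
```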

Usage

This checkpoint can be loaded with standard JAX/Flax checkpoint utilities in your training or inference pipeline; the normalization statistics in assets/ are required at inference time (see Limitations).
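
A minimal loading sketch, assuming params/ is an Orbax pytree checkpoint (common in JAX/Flax pipelines); the exact restore call and resulting key structure depend on how the training code saved it, so inspect the result before use:

```python
import orbax.checkpoint as ocp

CKPT_DIR = "pi0_hanoi_pg_30k"  # placeholder path to this checkpoint

# Restore the raw parameter pytree from the params/ subdirectory.
checkpointer = ocp.PyTreeCheckpointer()
params = checkpointer.restore(f"{CKPT_DIR}/params")

# Inspect the top-level parameter groups before wiring them into your pipeline.
print(list(params.keys()) if isinstance(params, dict) else type(params))
```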

Training Configuration

  • Dataset: hanoi_300_lerobot
  • Training approach: Subtask-based policy gradient
  • Subtask masking: Disabled (no_masking)
  • Total training steps: 30,000
  • Model: PI0 with subtask decomposition
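
For reference, the settings above expressed as a plain config dict; the field names are illustrative and do not necessarily match the actual training code:

```python
# Hypothetical reconstruction of the settings listed above.
train_config = {
    "model": "pi0_subtask",              # PI0 with subtask decomposition
    "dataset": "hanoi_300_lerobot",
    "approach": "subtask_policy_gradient",
    "subtask_masking": False,            # the "no_masking" variant
    "num_train_steps": 30_000,
}
```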

Model Card

This model is trained for the Hanoi Tower puzzle task using a subtask-based policy gradient approach. The model decomposes the complex Hanoi puzzle into simpler subtasks and learns to solve them sequentially.
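
To make the subtask decomposition concrete, here is an illustrative (not the training pipeline's own) recursive decomposition of an n-disk Tower of Hanoi into per-move subtasks, each of which could serve as a language instruction for the policy:

```python
def hanoi_subtasks(n, src="A", aux="B", dst="C"):
    """Yield the per-move subtasks for an n-disk Tower of Hanoi puzzle."""
    if n == 0:
        return
    yield from hanoi_subtasks(n - 1, src, dst, aux)   # clear the n-1 smaller disks
    yield f"move disk {n} from peg {src} to peg {dst}"
    yield from hanoi_subtasks(n - 1, aux, src, dst)   # restack them on the target

# Example: the 7 subtasks for 3 disks, in execution order.
for i, subtask in enumerate(hanoi_subtasks(3), start=1):
    print(f"{i}. {subtask}")
```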

Performance

  • Training Steps: 30,000
  • Architecture: Vision-Language-Action model with subtask learning
  • Approach: Policy gradient without subtask masking

Advantages

  • Subtask Decomposition: Breaks down complex tasks into manageable parts
  • Better Generalization: No masking allows for more flexible learning
  • Policy Gradient: Direct optimization of policy performance
  • End-to-End: Learns complete task from visual input

Limitations

  • Trained specifically on Hanoi Tower puzzle with subtask structure
  • Performance may vary on different manipulation tasks
  • Requires appropriate normalization statistics for inference
  • Subtask decomposition may not generalize to all manipulation tasks

Technical Details

  • Subtask Approach: Decomposes Hanoi puzzle into logical subtasks
  • No Masking: Allows model to learn across subtask boundaries
  • Policy Gradient: Direct policy optimization without value function
  • Vision Input: Processes visual observations from robot cameras
  • Action Output: Generates robot manipulation actions
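
A sketch of an inference step consistent with the points above: normalize the observation with the statistics shipped in assets/, run the policy, and de-normalize the predicted action. The stat layout and the names policy_apply_fn, norm_stats, etc. are assumptions for illustration, not this repository's actual API:

```python
import numpy as np

def normalize(x, stats):
    # stats is assumed to hold per-dimension "mean" and "std" arrays.
    return (x - stats["mean"]) / (stats["std"] + 1e-8)

def unnormalize(x, stats):
    return x * (stats["std"] + 1e-8) + stats["mean"]

def inference_step(policy_apply_fn, params, image, proprio, subtask_text, norm_stats):
    """One observation-to-action step; all names here are illustrative."""
    obs = {
        "image": image,                                    # camera frame(s)
        "state": normalize(proprio, norm_stats["state"]),  # robot proprioception
        "prompt": subtask_text,                            # current subtask instruction
    }
    norm_action = policy_apply_fn(params, obs)             # model forward pass
    return unnormalize(np.asarray(norm_action), norm_stats["action"])
```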