---
license: mit
tags:
- robotics
- pi-zero
- diffusion
- vision-language-action
- aloha
- manipulation
- bolt-nut-sorting
base_model: google/paligemma-3b-pt-224
library_name: openpi
pipeline_tag: robotics
---

# Pi-0 Bolt Nut Sort Model

This is a Pi-0 (Pi-Zero) model trained for bolt and nut sorting tasks using the OpenPI framework.

## Model Description

- **Architecture**: Pi-0 (diffusion-based vision-language-action model)
- **Base Model**: PaliGemma 3B with SigLIP vision encoder
- **Task**: Sorting bolts and nuts into separate baskets
- **Robot**: Dual-arm ALOHA setup
- **Action Space**: 14-DoF (7 per arm: 6 joints + 1 gripper)
- **Training Steps**: 29,999
- **Action Horizon**: 50 steps
- **Image Resolution**: 224x224

## Dataset

Trained on the `naungth/pi0_bolt_nut_sort` dataset with the task instruction: "sort the bolts and the nuts into separate baskets".

## Usage

### With OpenPI

```python
from openpi.policies import policy_config
from openpi.training import config

# Load the model configuration
config_name = "pi0_bns"
train_config = config.get_config(config_name)

# Create policy from your local checkpoint
policy = policy_config.create_trained_policy(
    train_config,
    "path/to/checkpoint",
    default_prompt="sort the bolts and the nuts into separate baskets",
)

# Use for inference
observation = {
    "images": {
        "cam_high": image_array,              # [H, W, 3] uint8
        "cam_left_wrist": left_wrist_image,   # [H, W, 3] uint8
        "cam_right_wrist": right_wrist_image, # [H, W, 3] uint8
    },
    "state": joint_positions,  # [14] float32
    "prompt": "sort the bolts and the nuts into separate baskets",
}

actions = policy.infer(observation)["actions"]  # [50, 14]
```

### With Policy Server

```bash
# Start the policy server
uv run scripts/serve_policy.py policy:checkpoint \
    --policy.config=pi0_bns \
    --policy.dir=path/to/checkpoint
```

```python
# Use with client
from openpi_client import websocket_client_policy

client = websocket_client_policy.WebsocketClientPolicy("localhost", 8000)
actions = client.infer(observation)
```
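The observation format above can be exercised without robot hardware. Below is a minimal sketch, assuming only NumPy, with dummy arrays standing in for real camera frames and joint states; it only validates the shapes and dtypes the policy expects, and does not call OpenPI itself.

```python
import numpy as np

# Dummy stand-in for one ALOHA camera stream at the model's
# expected 224x224 resolution (uint8 RGB).
def make_dummy_frame(h=224, w=224):
    return np.zeros((h, w, 3), dtype=np.uint8)

# 14-DoF proprioceptive state: 7 per arm (6 joints + 1 gripper).
joint_positions = np.zeros(14, dtype=np.float32)

observation = {
    "images": {
        "cam_high": make_dummy_frame(),
        "cam_left_wrist": make_dummy_frame(),
        "cam_right_wrist": make_dummy_frame(),
    },
    "state": joint_positions,
    "prompt": "sort the bolts and the nuts into separate baskets",
}

# Sanity-check the shapes and dtypes documented above.
assert observation["images"]["cam_high"].shape == (224, 224, 3)
assert observation["images"]["cam_high"].dtype == np.uint8
assert observation["state"].shape == (14,)
```

Replacing the dummy arrays with real camera frames (resized to 224x224) and joint readings yields an observation ready for `policy.infer`.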
## Model Architecture

- **Vision Encoder**: SigLIP-So400m/14
- **Language Model**: Gemma 2B + Gemma 300M (action expert)
- **Training**: Diffusion-based action prediction
- **Input**: Multi-camera RGB + proprioception + language instruction
- **Output**: Future action sequence (50 timesteps)

## Training Details

- **Framework**: JAX/Flax with OpenPI
- **Optimizer**: AdamW
- **Base Checkpoint**: Pi-0 base model from Physical Intelligence
- **Fine-tuning**: Task-specific fine-tuning on bolt nut sort data
- **Normalization**: Dataset-specific state/action normalization

## License

MIT License

## Citation

If you use this model, please cite:

```bibtex
@article{pi0,
  title={{\pi}_0: A Vision-Language-Action Flow Model for General Robot Control},
  author={TODO: Add authors},
  year={2024}
}
```

## Acknowledgments

- Built using the [OpenPI](https://github.com/Physical-Intelligence/openpi) framework
- Based on the Pi-0 architecture
- Training data from bolt nut sorting demonstrations
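As a footnote to the normalization detail in the training section: dataset-specific state/action normalization typically means z-scoring with per-dimension statistics computed over the training set, and mapping model outputs back to joint space at inference time. The sketch below is illustrative only; the helper name and the zero-mean/unit-std statistics are assumptions, not OpenPI's actual API.

```python
import numpy as np

# Illustrative per-dimension statistics, as would be computed over the
# training dataset's 14-DoF action trajectories (assumed values here).
action_mean = np.zeros(14, dtype=np.float32)
action_std = np.ones(14, dtype=np.float32)

def unnormalize_actions(normalized_chunk, mean, std):
    """Map a [horizon, 14] chunk of z-scored actions back to joint space."""
    return normalized_chunk * std + mean

# A dummy [50, 14] prediction, matching the model's 50-step action horizon.
pred = np.zeros((50, 14), dtype=np.float32)
actions = unnormalize_actions(pred, action_mean, action_std)
assert actions.shape == (50, 14)
```

In practice these statistics are stored alongside the checkpoint and applied automatically by the policy, so users of `policy.infer` do not need to un-normalize by hand.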