Action Chunking Transformer

This is a basic model for solving simple imitation learning tasks. The original implementations can be found here.

The model takes images from one or more cameras together with the robot state and produces a chunk of actions, which the robot can execute as a sequence of movements in the real world.
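
To make the idea of an action chunk concrete, the sketch below treats a chunk as an array with one row per control step and replays it in order. The chunk shape (10 steps, 7-dimensional actions, batch dimension already removed) and the `send_to_robot` callback are illustrative assumptions, not part of this model's documented interface.

```python
import numpy as np


def execute_chunk(chunk, send_to_robot):
    """Send each action of a chunk to the robot in order; return the step count."""
    steps = 0
    for action in chunk:  # one row per control step
        send_to_robot(action)
        steps += 1
    return steps


# Hypothetical chunk: 10 steps of a 7-dimensional action (assumed sizes).
dummy_chunk = np.zeros((10, 7), dtype=np.float32)

executed = []  # stand-in for a real robot interface
n_steps = execute_chunk(dummy_chunk, executed.append)
print(n_steps)
```

In a real control loop, the callback would forward each action to the robot driver, and a new chunk would be requested once the current one has been consumed.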

The model weights are random and provided only for testing purposes.

How to Use

Installation

uv pip install physicalai numpy

Running Inference

The following example demonstrates the inference API for this model:

import numpy as np
from physicalai.inference import InferenceModel

model = InferenceModel("act-fp16-ov", device="CPU")

# Build a dummy LIBERO-style observation.
# LIBERO provides two cameras (agentview + wrist) and an 8-dim robot state.
# Images use the LeRobot convention: float32 in [0, 1], shape (C, H, W).
observation = {
    "images.image": np.random.rand(1, 3, 256, 256).astype(np.float32),
    "images.image2": np.random.rand(1, 3, 256, 256).astype(np.float32),
    "state": np.zeros((1, 8), dtype=np.float32),
}

chunk = model.predict_action_chunk(observation)

Note that the model must be downloaded and saved to the act-fp16-ov folder before running this script.
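
Real cameras usually deliver uint8 frames in (H, W, C) layout, while the example above expects float32 images in [0, 1] with shape (1, C, H, W). The following sketch shows one way to do that conversion with NumPy. The helper name `to_model_image` and the 256x256 frame size are assumptions made for illustration; resizing and any normalization beyond scaling to [0, 1] are not covered.

```python
import numpy as np


def to_model_image(frame_hwc_uint8):
    """Convert an (H, W, 3) uint8 frame to a (1, 3, H, W) float32 array in [0, 1]."""
    if frame_hwc_uint8.ndim != 3 or frame_hwc_uint8.shape[-1] != 3:
        raise ValueError("expected an (H, W, 3) image")
    scaled = frame_hwc_uint8.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
    chw = np.transpose(scaled, (2, 0, 1))  # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis, ...])  # add batch dimension


# Stand-in for a real camera frame (assumed 256x256 RGB, uint8).
frame = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
image = to_model_image(frame)
print(image.shape, image.dtype)
```

The resulting array can be placed under an image key of the observation dictionary in the same way as the random arrays in the example above.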

Legal information

The original model is distributed under the Apache 2.0 license.

Disclaimer

Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See Intel’s Global Human Rights Principles. Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.

