NVIDIA Nova Carter navigation in a warehouse
Divide objects based on specific properties
Multiple NVIDIA Nova Carter navigation in office
Stacking
Limo navigation with fallback in a warehouse
Pick and place
Simulations in NVIDIA Isaac Sim show generated behavior trees (BTs) executing successfully on both navigation and manipulation tasks from our BT Benchmark.
The model takes as input a natural-language description of a robotic task along with the set of available robot action primitives, and generates a ROS 2-compatible BT in XML format. The model is adapted with a QLoRA adapter while the pre-trained parameters stay frozen. At inference time, generated BTs are first validated for syntax and action-space consistency before execution. At runtime, an inline logger tracks stack traces and blackboard states; if an error occurs, the runtime parser triggers subtree regeneration for recovery.
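The inference-time check described above can be illustrated with a minimal sketch. It is not the project's validator: the function name `validate_bt`, the `CONTROL_TAGS` set, and the action names in the usage example are assumptions made for illustration, and a real BT library defines a larger control-node vocabulary.

```python
import xml.etree.ElementTree as ET

# Assumed structural tags; the real set depends on the BT library in use.
CONTROL_TAGS = {"root", "BehaviorTree", "Sequence", "Fallback"}


def validate_bt(xml_text, allowed_actions):
    """Return a list of problems; an empty list means both checks passed."""
    # Syntax check: the generated text must be well-formed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"syntax error: {exc}"]

    # Action-space check: every non-structural tag must be a known primitive.
    problems = []
    for node in root.iter():
        if node.tag not in CONTROL_TAGS and node.tag not in allowed_actions:
            problems.append(f"unknown action: {node.tag}")
    return problems


# Hypothetical tree and action names, used only to exercise the checks.
example = (
    '<root><BehaviorTree ID="Main"><Sequence>'
    '<MoveTo goal="A"/><Pick/>'
    "</Sequence></BehaviorTree></root>"
)
print(validate_bt(example, {"MoveTo", "Pick"}))
print(validate_bt(example, {"MoveTo"}))
```

Returning a list of problems rather than a boolean keeps enough detail to decide which part of a tree would need regenerating.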
Starting with the TSE dataset, a new instruction-following dataset is created through four key steps: (1) cleanse the raw data using a Python XML parser, (2) for each original BT, use gpt-4o-mini to generate three variants, (3) repeat step 2 with the newly obtained dataset, (4) merge all resulting datasets while producing a natural-language description for each BT.
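The four steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: `make_variants` is a hypothetical stand-in for the gpt-4o-mini call, "cleansing" is reduced to a well-formedness filter, and step 3 is read as applying the generator to the variants produced in step 2. Description generation is omitted.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def cleanse(raw_trees):
    """Step 1 (simplified): keep only trees that parse as XML."""
    return [t for t in raw_trees if is_well_formed(t)]


def augment(trees, make_variants, rounds=2):
    """Steps 2-4 (simplified): expand the newest trees each round, then merge."""
    merged = list(trees)
    frontier = list(trees)
    for _ in range(rounds):
        # Each tree in the current frontier yields its own variants.
        frontier = [v for tree in frontier for v in make_variants(tree)]
        merged.extend(frontier)
    return merged


def stub_variants(tree):
    """Hypothetical generator: returns three tagged copies instead of calling a model."""
    return [f"{tree}<!-- variant {i} -->" for i in range(3)]


clean = cleanse(["<root><A/></root>", "<root><A>", "not xml"])
print(len(clean), len(augment(clean, stub_variants)))
```

Under this reading, one original tree yields 1 + 3 + 9 = 13 trees before any filtering of the generated variants.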
A representative example from the generated instruction-following dataset with its three components: the instruction that provides system contextual information, the input comprising a natural-language task description and its corresponding robot actions, and the output showcasing the generated XML-based behavior tree.
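As a rough illustration of that three-part layout, the sketch below assembles one record as a dictionary. The key names follow the component names given above; the way the task description and action list are joined inside `input`, and all example values, are assumptions rather than the dataset's actual formatting.

```python
import json


def build_record(system_context, task_description, actions, bt_xml):
    """Assemble one instruction-following sample with its three components."""
    return {
        "instruction": system_context,
        # Assumed layout: description first, then the available actions.
        "input": f"{task_description}\nAvailable actions: {', '.join(actions)}",
        "output": bt_xml,
    }


# Placeholder values for illustration only.
record = build_record(
    "You generate behavior trees in XML.",
    "Go to the shelf and pick up the box.",
    ["MoveTo", "Pick"],
    "<root><Sequence><MoveTo/><Pick/></Sequence></root>",
)
print(json.dumps(record, indent=2))
```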
Average ROUGE and BLEU scores (mean ± std) with increasing dataset size:

Samples   ROUGE          BLEU
600       46.2 ± 1.94    27.4 ± 1.02
1,413     66.2 ± 1.94    48.6 ± 1.85
3,791     77.6 ± 0.80    68.0 ± 1.10
5,204     82.2 ± 0.75    74.8 ± 0.75

Standard deviation is evaluated across 5 runs with temperature = 0.9.
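For readers who want to reproduce the "mean ± std" formatting, here is a minimal sketch. The five scores are made-up placeholders, not the project's per-run results, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the page does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize(scores):
    """Format per-run scores as 'mean ± std' using the sample standard deviation."""
    return f"{mean(scores):.1f} ± {stdev(scores):.2f}"


# Placeholder scores for five hypothetical runs.
runs = [82.0, 83.0, 81.0, 82.0, 83.0]
print(summarize(runs))
```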
Overall, the results show that BTGenBot-2 delivers the strongest combination of reliability, structural fidelity, and efficiency across tasks and prompting settings. It achieves substantial gains over the previous BTGenBot and consistently outperforms proprietary models, establishing itself as a robust and effective tool for generating complex BTs from natural language instructions.
The figure presents two tasks simulated using NVIDIA Isaac Sim. The top panels illustrate a manipulation task where a robotic arm sorts cubes by color, while the bottom panels show a navigation task in which the iw.hub locates at least one box from specified positions. The left column shows the initial state, the center column displays intermediate states with the corresponding BT execution status, and the right column shows the final state. In the BT visualizations, green nodes indicate success, yellow nodes indicate a running process, and red nodes indicate failure.
SO-ARM 101 performing manipulation tasks using behavior trees generated by BTGenBot-2
@article{izzo2026btgenbot,
title={BTGenBot-2: Efficient Behavior Tree Generation with Small Language Models},
author={Izzo, Riccardo Andrea and Bardaro, Gianluca and Matteucci, Matteo},
journal={arXiv preprint arXiv:2602.01870},
year={2026}
}