Understanding one's physical form: A new vision-based system teaches robots to comprehend their own mechanical structures

Soft robots equipped with Neural Jacobian Fields learn motion control autonomously from a single monocular camera, thanks to a vision-based control system developed at MIT CSAIL. The approach melds 3D scene reconstruction, embodied representation, and closed-loop control.

Researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) have created a vision-based system that lets soft robots learn an internal model of their own bodies for control, using only visual data from a single camera. The system, known as Neural Jacobian Fields (NJF), opens up a new era of robotics, moving away from traditional programming methods and toward a more intuitive teaching approach.

At the heart of NJF is a neural network that captures two intertwined aspects of a robot's embodiment: its three-dimensional geometry and its sensitivity to control inputs. Trained in a self-supervised fashion on video streams, the system infers control-motion relationships purely from visual observations of the robot's own behavior.
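The article does not detail the architecture, but a minimal sketch conveys the idea: a coordinate network that, queried at a 3D point, returns both a geometry estimate and that point's sensitivity to each actuator. Layer sizes and names below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NeuralJacobianField(nn.Module):
    """Illustrative coordinate network: maps a 3D query point to
    (a) an occupancy value describing the robot's geometry and
    (b) a 3 x A Jacobian relating that point's motion to the A
    actuator commands. Architecture details are assumptions."""

    def __init__(self, num_actuators: int, hidden: int = 256):
        super().__init__()
        self.num_actuators = num_actuators
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.occupancy_head = nn.Linear(hidden, 1)                  # geometry
        self.jacobian_head = nn.Linear(hidden, 3 * num_actuators)  # sensitivity

    def forward(self, points: torch.Tensor):
        # points: (N, 3) query locations on or near the robot's body
        h = self.backbone(points)
        occupancy = torch.sigmoid(self.occupancy_head(h))           # (N, 1)
        jacobians = self.jacobian_head(h).view(-1, 3, self.num_actuators)
        return occupancy, jacobians
```

Training such a model self-supervised would amount to tracking points through this field and penalizing disagreement with the observed video, which is the relationship the article describes.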

The soft robotic hand at CSAIL is controlled from a single camera feed, with no embedded sensors and no hand-crafted mechanical model. The system runs at approximately 12 hertz, fast enough for real-time closed-loop control, and has proven robust across a range of robot types, including soft robotic hands, rigid hands, 3D-printed robotic arms, and a rotating platform with no embedded sensors.

From spatiotemporal visual data, the system learns how motor commands translate into body movements. It models a Jacobian field, which describes how the robot's geometry changes locally and differentially with respect to actuator inputs, allowing it to predict small configuration changes for closed-loop control.
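In symbols, a Jacobian field implies a first-order motion model: a body point responds linearly to a small actuation change, with coefficients given by the field at that point. The notation below is a plain restatement of that claim, with A standing for the number of actuators.

```latex
% First-order model implied by a Jacobian field J(x): a point x on
% the robot's body responds to a small actuation change \Delta u as
\[
    \Delta x \;\approx\; J(x)\,\Delta u,
    \qquad J(x) \in \mathbb{R}^{3 \times A},
\]
% where J(x) is predicted by the neural field at query point x.
```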

By modeling how specific points on the body deform or shift in response to actions, NJF builds a dense internal map of the robot's controllability. This internal model lets the robot generalize motion across its body, even when the data are noisy or incomplete.
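One plausible way to turn that dense map into closed-loop commands is damped least squares: track a set of body points, compare them to targets, and solve for the actuation update under the linear model above. This is a sketch under stated assumptions, not the authors' exact controller.

```python
import numpy as np

def control_step(jacobians, current_pts, target_pts, gain=0.5, damping=1e-3):
    """One damped least-squares control step under the linear model
    delta_x ~= J delta_u. Shapes and parameter names are illustrative.

    jacobians:   (N, 3, A) per-point Jacobians predicted by the field
    current_pts: (N, 3) tracked 3D positions of body points
    target_pts:  (N, 3) desired 3D positions of those points
    """
    num_actuators = jacobians.shape[-1]
    J = jacobians.reshape(-1, num_actuators)        # stack to (3N, A)
    error = (target_pts - current_pts).reshape(-1)  # (3N,)
    # Damping keeps the solve well-conditioned when J is near-singular.
    delta_u = np.linalg.solve(
        J.T @ J + damping * np.eye(num_actuators), J.T @ error
    )
    return gain * delta_u
```

Executed repeatedly against fresh camera observations, a step like this forms a closed loop; at roughly 12 hertz it would match the control rate the article reports.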

The current version of NJF requires multiple cameras for training and must be retrained for each robot; once trained, though, control needs only the single camera described above. The researchers envision a more accessible future version in which hobbyists could record a robot's movements with a phone and build a control model from the footage.

The system also lacks force and tactile sensing, which limits its effectiveness on contact-rich tasks. Even so, the hand's capability comes entirely from NJF, which figures out which motors control which parts of the robot without being told; that mapping emerges naturally through learning.

The research was supported by the Solomon Buchsbaum Research Fund, an MIT Presidential Fellowship, the National Science Foundation, and the Gwangju Institute of Science and Technology. The open-access paper about the work was published in Nature on June 25.

With potential applications in agriculture, construction, and other dynamic real-world environments, NJF represents a significant step forward for robotics, particularly for soft and bio-inspired robots. By flipping the paradigm from programming robots to teaching them through vision, NJF gives soft and compliant machines a kind of bodily self-awareness, dramatically expanding their control capabilities in flexible, unpredictable environments without manual modeling or costly instrumentation.

  1. Neural Jacobian Fields (NJF), developed by researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), teaches soft robots to control themselves through vision rather than traditional programming.
  2. A single neural network captures both aspects of a robot's embodiment: its three-dimensional geometry and its sensitivity to control inputs.
  3. NJF infers control-motion relationships purely from visual data of the robot's behavior; at CSAIL, a soft robotic hand is controlled from a single camera.
  4. The approach is robust across robot types, including soft robotic hands, rigid hands, 3D-printed robotic arms, and a rotating platform with no embedded sensors.
  5. The learned Jacobian field describes local differential changes in robot geometry with respect to actuator inputs, enabling closed-loop control from predicted small configuration changes.
  6. The researchers envision a more accessible future version in which hobbyists could record a robot's movements with a phone and build a control model from the footage.
  7. The current version requires multiple cameras for training and lacks force or tactile sensing, which limits contact-rich tasks in target settings such as agriculture and construction.
  8. By shifting the paradigm from programming robots to teaching them through vision, NJF expands their control capabilities in flexible, unpredictable environments without manual modeling or costly instrumentation.
