ManiCast: Collaborative Manipulation with Cost-Aware Human Forecasting

CoRL 2023

Cornell University
Collaborative Manipulation Tasks

Our framework, ManiCast, learns cost-aware human motion forecasts and plans with them for collaborative manipulation tasks.


Seamless human-robot manipulation in close proximity relies on accurate forecasts of human motion. While there has been significant progress in learning forecast models at scale, when applied to manipulation tasks, these models accrue high errors at critical transition points, which degrades downstream planning performance. Our key insight is that instead of predicting the most likely human motion, it is sufficient to produce forecasts that capture how future human motion would affect the cost of a robot's plan. We present ManiCast, a novel framework that learns cost-aware human forecasts and feeds them to a model predictive control planner to execute collaborative manipulation tasks. Our framework enables fluid, real-time interactions between a human and a 7-DoF robot arm across a number of real-world tasks such as reactive stirring, object handovers, and collaborative table setting. We evaluate both the motion forecasts and the end-to-end forecaster-planner system against a range of learned and heuristic baselines, and we contribute new datasets.

Collaborative Manipulation Dataset (CoMaD)

We release a high-quality dataset collected using a motion capture system, consisting of two humans collaborating to perform daily household activities.

Forecasting Objective for Cost-Aware Planning

Optimizing the forecasting objective directly is challenging because the planner's cost function has non-differentiable components. We approximate this objective and introduce cost-awareness with two different strategies.

Strategy #1: Importance Sampling.

Identify "transitions", the moments when the human comes into the robot's workspace, and upsample these regions in our dataset.
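
One way to picture this strategy is as a set of per-frame sampling weights that favour frames near a workspace entry. The sketch below is illustrative only and is not taken from the ManiCast code: it assumes a spherical workspace, tracks a single wrist position, and the names `transition_sampling_weights`, `workspace_center`, `radius`, `window`, and `boost` are all hypothetical.

```python
import numpy as np

def transition_sampling_weights(wrist_xyz, workspace_center, radius,
                                window=2, boost=5.0):
    """Return per-frame sampling probabilities that upsample transitions.

    wrist_xyz:        (T, 3) wrist positions over time.
    workspace_center: (3,) centre of an assumed spherical robot workspace.
    A transition is a frame where the wrist is inside the sphere but was
    outside it on the previous frame. Frames within `window` frames of a
    transition are sampled `boost` times more often than other frames.
    """
    dist = np.linalg.norm(wrist_xyz - workspace_center, axis=1)
    inside = dist < radius
    # Indices where the wrist crosses from outside to inside.
    entries = np.flatnonzero(inside[1:] & ~inside[:-1]) + 1
    near_transition = np.zeros(len(wrist_xyz), dtype=bool)
    for t in entries:
        near_transition[max(0, t - window): t + window + 1] = True
    weights = np.where(near_transition, boost, 1.0)
    return weights / weights.sum()
```

The returned probabilities could then be passed as `p` to `numpy.random.Generator.choice` to draw training windows, so that transition regions appear more often in each batch.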

Strategy #2: Dimension Weighting.

The planning cost is sensitive to forecasting errors along certain dimensions of the human pose, so we upweight them.
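
As a rough illustration of upweighting, the sketch below computes a forecasting loss in which each joint's position error is scaled by a weight before averaging. It is a minimal example under stated assumptions, not the loss used in the paper: the `(T, J, 3)` pose layout, the function name `weighted_forecast_loss`, and the specific weight values are hypothetical.

```python
import numpy as np

def weighted_forecast_loss(pred, target, joint_weights):
    """Weighted mean per-joint position error.

    pred, target:  (T, J, 3) forecast and ground-truth joint positions.
    joint_weights: (J,) non-negative weights; larger values make errors
                   on that joint count more toward the loss.
    """
    per_joint_error = np.linalg.norm(pred - target, axis=-1)  # (T, J)
    weighted_sum = (per_joint_error * joint_weights).sum()
    # Normalise by the number of timesteps and the total weight.
    return float(weighted_sum / (pred.shape[0] * joint_weights.sum()))

# Hypothetical example: 3 joints, with all of the error on joint 0.
pred = np.zeros((2, 3, 3))
target = np.zeros((2, 3, 3))
target[:, 0, 0] = 1.0
uniform = weighted_forecast_loss(pred, target, np.ones(3))
upweighted = weighted_forecast_loss(pred, target, np.array([4.0, 1.0, 1.0]))
print(uniform < upweighted)
```

With the same prediction error, the upweighted loss is larger, so training pushes harder on the joints the planner is assumed to care about.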

ManiCast uses a cost-aware loss to optimize planning performance

Overview of our framework, ManiCast, which learns cost-aware human motion forecasts and plans with such forecasts for collaborative manipulation tasks. At train time, we fine-tune pre-trained human motion forecasting models on task-specific datasets by upsampling transition points and upweighting joint dimensions that dominate the cost of the robot's planned trajectory. At inference time, we feed these forecasts into a model predictive control (MPC) planner to compute robot plans that are reactive and keep a safe distance from the human.
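
To make the inference-time step concrete, here is a toy random-shooting MPC step that consumes a human forecast. It is a sketch under strong assumptions and does not describe the planner used in ManiCast: the robot is reduced to a point end-effector, the cost is a terminal goal distance plus a hinge penalty for coming within `safe_dist` of the forecast wrist, and `plan_step` and all of its parameters are hypothetical names.

```python
import numpy as np

def plan_step(ee_pos, goal, human_forecast, safe_dist=0.3, n_samples=256,
              step=0.05, collision_weight=10.0, seed=0):
    """Return the first displacement of the lowest-cost sampled plan.

    ee_pos:         (3,) current end-effector position.
    goal:           (3,) target end-effector position.
    human_forecast: (H, 3) forecast human wrist positions over the horizon.
    """
    rng = np.random.default_rng(seed)
    horizon = human_forecast.shape[0]
    # Sample bounded per-step displacements and integrate them.
    deltas = rng.uniform(-step, step, size=(n_samples, horizon, 3))
    traj = ee_pos + np.cumsum(deltas, axis=1)                  # (N, H, 3)
    goal_cost = np.linalg.norm(traj[:, -1] - goal, axis=-1)    # (N,)
    dist = np.linalg.norm(traj - human_forecast, axis=-1)      # (N, H)
    # Penalise every timestep that comes closer than safe_dist.
    collision_cost = np.clip(safe_dist - dist, 0.0, None).sum(axis=1)
    cost = goal_cost + collision_weight * collision_cost
    return deltas[np.argmin(cost), 0]
```

In a receding-horizon loop, the robot would execute only this first displacement, obtain a fresh forecast, and call the function again, which is what makes the plan reactive to the human's motion.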



title={ManiCast: Collaborative Manipulation with Cost-Aware Human Forecasting},
author={Kushal Kedia and Prithwish Dan and Atiksh Bhardwaj and Sanjiban Choudhury},
booktitle={7th Annual Conference on Robot Learning},