Research

Foundations

Our research is grounded in the fundamentals of artificial intelligence, focusing on the key question of imitation learning: How can an agent learn new behaviors by observing and interacting with a teacher?

Imitation learning offers a simple yet scalable way to implicitly program agents through demonstrations, interventions, or preferences. It has widespread impact across disciplines, ranging from teaching your home robot to make you a bowl of soup, to aligning large language models with human preferences, to teaching self-driving cars to drive more like humans.

We explore a diverse array of questions in our research:

  • Efficient Inverse Reinforcement Learning: How can we design algorithms that are exponentially more efficient than reinforcement learning?
  • Vision-Language Demonstrations: How can we learn complex, long-horizon tasks from vision and language demonstrations?
  • Suboptimal Experts: How do we learn from noisy, suboptimal experts?
  • Human-Robot Teaming Behaviors: How can we learn effective human-robot collaboration from human-human teams?

… and much more! Check out some of our projects.

Applications

We test our ideas across a broad range of applications:

  1. Everyday Robots: Our primary focus is building home robots that interact with everyday users to learn personalized tasks like collaborative cooking, cleaning and assembly.

  2. Collaborative Games: Games are a fun way to learn how humans collaborate, and there’s lots of data! Through games, we explore new algorithms and architectures for effective human-robot collaboration.

  3. Self-Driving: With our industry partner Aurora, we develop ML models that enable safe, human-like driving.

Projects



Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought
Yuki Wang, Gonzalo Gonzalez-Pumariega, Yash Sharma, Sanjiban Choudhury
Preprint, 2023
website / paper

Demo2Code leverages LLMs to translate demonstrations into robot task code via an extended chain-of-thought that recursively summarizes demonstrations into a specification, then recursively expands that specification into code.



The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
Anirudh Vemula, Yuda Song, Aarti Singh, J. Andrew Bagnell, Sanjiban Choudhury
International Conference on Machine Learning (ICML), 2023
paper

We propose a novel, lazy approach that addresses two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation.



Inverse Reinforcement Learning without Reinforcement Learning
Gokul Swamy, Sanjiban Choudhury, J Andrew Bagnell, and Zhiwei Steven Wu
International Conference on Machine Learning (ICML), 2023
website / paper

We explore inverse reinforcement learning and show that leveraging the state distribution of the expert can significantly reduce the complexities of the RL problem, theoretically providing an exponential speedup and practically enhancing performance in continuous control tasks.



Impossibly Good Experts and How to Follow Them
Aaron Walsman, Muru Zhang, Sanjiban Choudhury, Dieter Fox, Ali Farhadi
International Conference on Learning Representations (ICLR), 2023
paper

We investigate sequential decision making with "Impossibly Good" experts who possess privileged information, propose criteria necessary for recovering an optimal policy under limited information, and introduce ELF Distillation, a novel approach that outperforms baselines in Minigrid and Vizdoom environments.



Sequence Model Imitation Learning with Unobserved Contexts
Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, and J Andrew Bagnell
Advances in Neural Information Processing Systems (NeurIPS), 2022
paper

We study imitation learning when the expert has privileged information and show that on-policy algorithms provably learn to recover from their initially suboptimal actions, while off-policy methods naively repeat the past action.



Minimax optimal online imitation learning via replay estimation
Gokul Swamy, Nived Rajaraman, Matt Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran
Advances in Neural Information Processing Systems (NeurIPS), 2022
paper

Imitation learning from noisy experts leads to biased policies! Replay estimation fixes this by smoothing the expert: repeatedly execute cached expert actions in a stochastic simulator and imitate the result.



Towards Uniformly Superhuman Autonomy via Subdominance Minimization
Brian Ziebart, Sanjiban Choudhury, Xinyan Yan, and Paul Vernaza
International Conference on Machine Learning (ICML), 2022
paper

We look at imitation learning where the demonstrators have varying quality and seek to induce behavior that is unambiguously better (i.e., Pareto dominant or minimally subdominant) than all human demonstrations.



Of Moments and Matching: Trade-offs and Treatments in Imitation Learning
Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, and J Andrew Bagnell
International Conference on Machine Learning (ICML), 2021
project page / paper / video / code

All of imitation learning can be reduced to a game between a learner (generator) and a value function (discriminator) where the payoff is the performance difference between learner and expert.
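The game above admits a compact toy sketch (illustrative only, not the paper's algorithm; the feature arrays, numbers, and `payoff` helper are all hypothetical): the discriminator picks moments to match, and its payoff measures the learner-expert performance gap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical state-action features phi(s, a) from expert and learner rollouts.
expert_phi = rng.normal(0.0, 1.0, size=(500, 4))
learner_phi = rng.normal(0.3, 1.0, size=(500, 4))

def payoff(w, learner_phi, expert_phi):
    """Game payoff: E_learner[f] - E_expert[f] with f(s, a) = <w, phi(s, a)>."""
    return float(learner_phi.mean(0) @ w - expert_phi.mean(0) @ w)

# Best-response discriminator over the unit ball aligns with the moment gap;
# its payoff is exactly the size of the gap the learner has yet to close.
gap = learner_phi.mean(0) - expert_phi.mean(0)
w_star = gap / np.linalg.norm(gap)
```

If the learner matched every moment of the expert, no discriminator could achieve a positive payoff, which is the sense in which the game value certifies imitation.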



Blending MPC & Value Function Approximation for Efficient Reinforcement Learning
Mohak Bhardwaj, Sanjiban Choudhury, and Byron Boots
International Conference on Learning Representations (ICLR), 2021
paper

Blend model predictive control (MPC) with learned value estimates to trade off MPC model errors against learner approximation errors.
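A minimal random-shooting sketch of the blend (a toy 1-D system; every function and constant here is hypothetical, not the paper's implementation): roll the model forward a few steps, then bootstrap with the learned value, so the horizon length trades model error against value-function error.

```python
import numpy as np

def model_step(x, u):      # learned (possibly biased) dynamics model
    return x + u

def model_reward(x, u):    # model-predicted reward
    return -(x ** 2) - 0.1 * (u ** 2)

def learned_value(x):      # imperfect terminal value estimate
    return -2.0 * (x ** 2)

def mpc_score(x0, plan, horizon):
    """Score a candidate action sequence: `horizon` model-based reward steps,
    then bootstrap with the learned value at the end of the lookahead."""
    x, total = x0, 0.0
    for u in plan[:horizon]:
        total += model_reward(x, u)
        x = model_step(x, u)
    return total + learned_value(x)  # horizon=0 trusts the value fn entirely

# Random-shooting MPC: pick the best of a batch of sampled plans.
rng = np.random.default_rng(0)
plans = rng.uniform(-1.0, 1.0, size=(64, 5))
best_plan = max(plans, key=lambda p: mpc_score(2.0, p, horizon=3))
```

With a perfect model you would grow the horizon; with a perfect value function you would shrink it to zero; blending interpolates between the two.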



Feedback in Imitation Learning: The Three Regimes of Covariate Shift
Jonathan Spencer, Sanjiban Choudhury, Arun Venkatraman, Brian Ziebart, and J Andrew Bagnell
arXiv preprint arXiv:2102.02872, 2021
paper / talk

Not all imitation learning problems are alike -- some are easy (do behavior cloning), some are hard (call interactive expert), and some are just right (just need a simulator).



Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts
Gilwoo Lee, Brian Hou, Sanjiban Choudhury and Siddhartha S. Srinivasa
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
paper / talk

In Bayesian RL, solving the full belief MDP is hard, but solving each individual latent MDP is easy. We combine value functions from the latent MDPs with a learned residual belief policy.



Guided Incremental Local Densification for Accelerated Sampling-based Motion Planning
Aditya Mandalika, Rosario Scalise, Brian Hou, Sanjiban Choudhury, Siddhartha S. Srinivasa
IEEE International Conference on Robotics and Automation (ICRA), 2023
paper

Instead of sampling from the informed-set ellipse (high recall, low precision), we sample from sets of sub-ellipses (lower recall, higher precision).



Imitation Learning as f-Divergence Minimization
Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee and Siddhartha Srinivasa
Workshop on the Algorithmic Foundations of Robotics (WAFR), 2020
paper

Many old (and new!) imitation learning algorithms are simply minimizing various f-divergence estimates between the expert and learner trajectory distributions.
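As a toy illustration of one member of the family (not code from the paper; the visitation histograms are made up), the forward KL between discretized visitation distributions is one such divergence, and behavior cloning can be read as minimizing it:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Plug-in estimate of KL(p || q) between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Hypothetical visitation histograms over a discretized state space.
expert_visits = np.array([0.70, 0.20, 0.10])
learner_visits = np.array([0.40, 0.40, 0.20])

# Different algorithms pick different f: forward KL (behavior-cloning-like),
# reverse KL, Jensen-Shannon (GAIL-like), and so on.
bc_style_loss = kl_divergence(expert_visits, learner_visits)
```

Swapping the divergence changes how the learner trades off mode-covering versus mode-seeking behavior, which is exactly the design axis the paper organizes.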



Learning from Interventions: Human-robot interaction as both explicit and implicit feedback
Jonathan Spencer, Sanjiban Choudhury, Matt Barnes and Siddhartha Srinivasa
Robotics: Science and Systems (RSS), 2020
paper / talk

How can we learn from human interventions? Every intervention reveals some information about the expert's implicit value function. We infer this function and optimize it.



Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics
Matthew Schmittle, Sanjiban Choudhury, and Siddhartha Srinivasa
arXiv preprint arXiv:2104.01021, 2020
paper

We can model multi-modal feedback from humans (demonstrations, interventions, verbal corrections) as a stream of losses that can be minimized using any no-regret online learning algorithm.
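A sketch of that reduction (all targets and the quadratic losses are hypothetical): whatever the feedback modality, it is converted into a loss, and a no-regret learner such as online gradient descent takes one step per incoming loss.

```python
import numpy as np

def online_gradient_descent(grad_stream, dim, lr=0.1):
    """No-regret sketch: take one gradient step per incoming loss, regardless
    of which feedback modality (demo, intervention, correction) produced it."""
    w = np.zeros(dim)
    for grad in grad_stream:
        w -= lr * grad(w)
    return w

# Hypothetical feedback stream: each item is the gradient of a quadratic loss
# ||w - target||^2 centered on the action the human indicated.
targets = [np.array([1.0, 0.0]), np.array([0.8, 0.2]), np.array([1.2, -0.1])]
stream = [lambda w, t=t: 2.0 * (w - t) for t in targets]
w = online_gradient_descent(stream, dim=2)
```

The meta-algorithm view means any regret bound for the online learner immediately transfers, without designing a separate algorithm per feedback type.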



Toward fieldable human-scale mobile manipulation using RoMan
C. Kessens, J. Fink, A. Hurwitz, M. Kaplan, P. R. Osteen, T. Rocks, J. Rogers, E. Stump, L. Quang, M. DiBlasi, M. Gonzalez, D. Patel, J. Patel, S. Patel, M. Weiker, J. Bowkett, R. Detry, S. Karumanchi, J. Burdick, L. Matthies, Y. Oza, A. Agarwal, A. Dornbush, M. Likhachev, K. Schmeckpeper, K. Daniilidis, A. Kamat, S. Choudhury, A. Mandalika, S. Srinivasa
Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications II, 2020
paper

A full-stack mobile manipulator that autonomously navigates and manipulates objects in the wild.



Posterior Sampling for Anytime Motion Planning on Graphs with Expensive-to-Evaluate Edges
Brian Hou, Sanjiban Choudhury, Gilwoo Lee, Aditya Mandalika, and Siddhartha Srinivasa
IEEE International Conference on Robotics and Automation (ICRA), 2020
paper / video

Anytime motion planning can be viewed through a Bayesian lens where we are initially uncertain about the shortest path, and must probe the environment to progressively yield shorter and shorter paths.
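The Bayesian view admits a compact Thompson-sampling sketch (a toy with made-up edges, priors, and lengths, not the paper's implementation): sample a world from the posterior, plan the shortest path in the sample, and probe that path's edges in the real environment.

```python
import random

def posterior_sample_plan(edge_prior, paths, truth, rng, max_iters=100):
    """Sample a world from the posterior over edges, take the shortest path
    feasible in the sample, and evaluate its edges in the real world;
    repeat until a fully verified path is found."""
    known = {}
    for _ in range(max_iters):
        world = {e: known.get(e, rng.random() < p) for e, p in edge_prior.items()}
        feasible = [p for p in paths if all(world[e] for e in p)]
        if not feasible:
            continue
        path = min(feasible, key=paths.get)
        known.update({e: truth[e] for e in path})  # probe the environment
        if all(truth[e] for e in path):
            return path
    return None

# Hypothetical problem: two candidate paths, edges free with prior probability.
edge_prior = {"a": 0.9, "b": 0.5, "c": 0.8}
paths = {("a", "b"): 4.0, ("c",): 6.0}          # path -> length
truth = {"a": True, "b": False, "c": True}      # revealed only when probed
result = posterior_sample_plan(edge_prior, paths, truth, random.Random(0))
```

Each probe tightens the posterior, so successive samples concentrate on shorter and shorter verified paths, matching the anytime behavior described above.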



Generalized Lazy Search for Robot Motion Planning: Interleaving Search and Edge Evaluation via Event-based Toggles
Aditya Mandalika, Sanjiban Choudhury, Oren Salzman and Siddhartha Srinivasa
International Conference on Automated Planning and Scheduling (ICAPS), 2019
Best Student Paper Award
paper / long paper

Unified framework for interleaving search and edge evaluation to provably minimize total planning time.



The Blindfolded Robot: A Bayesian Approach to Planning with Contact Feedback
Brad Saund, Sanjiban Choudhury, Siddhartha Srinivasa, and Dmitry Berenson
International Symposium on Robotics Research (ISRR), 2019
paper / video

Casts manipulation under occlusion as a search on a graph where the feasibility of an edge is revealed only when the agent attempts to traverse it. We use a Bayesian prior to trade off exploration and exploitation.



Leveraging Experience in Lazy Search
Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots and Siddhartha Srinivasa
Robotics: Science and Systems (RSS), 2019
paper

The laziest search checks the minimal number of edges needed to eliminate all potential shortest paths. We train truly lazy planners by imitating such oracles.



LEGO: Leveraging Experience in Roadmap Generation for Sampling-Based Planning
Rahul Kumar, Aditya Mandalika, Sanjiban Choudhury and Siddhartha S. Srinivasa
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019
paper

Learn a sampling distribution that creates graphs that are sparse (for speedy search) but with nodes carefully placed at bottleneck regions (to cover optimal paths).



Bayesian Policy Optimization for Model Uncertainty
Gilwoo Lee, Brian Hou, Aditya Mandalika Vamsikrishna, Jeongseok Lee, Sanjiban Choudhury, Siddhartha S. Srinivasa
International Conference on Learning Representations (ICLR), 2019
paper

Learn a policy that directly maps state and belief over MDPs to actions. This leverages the fact that the belief can be compressed significantly.



Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes
Gilwoo Lee, Sanjiban Choudhury, Brian Hou, and Siddhartha Srinivasa
arXiv preprint arXiv:1810.03048, 2018
paper

Computes a near-optimal value function by covering the continuous state-belief-action space with a finite set of representative samples and exploiting the Lipschitz continuity of the value function.



Autonomous Aerial Cinematography In Unstructured Environments With Learned Artistic Decision-Making
Rogerio Bonatti, Wenshan Wang, Cherie Ho, Aayush Ahuja, Mirko Gschwindt, Efe Camci, Erdal Kayacan, Sanjiban Choudhury and Sebastian Scherer
Journal of Field Robotics (JFR), 2019
JFR / IROS'19 / ISER'19 / video 1 / video 2 / video 3

A selfie drone that can film a target moving in a cluttered environment with almost no prior information.



Bayesian Active Edge Evaluation
Sanjiban Choudhury, Siddhartha Srinivasa and Sebastian Scherer
International Joint Conferences on Artificial Intelligence (IJCAI), 2018
paper / blue sky

Given a prior over possible worlds, learn a decision tree to near-optimally collapse uncertainty and compute a feasible path.



Near-Optimal Edge Evaluation in Explicit Generalized Binomial Graphs
Sanjiban Choudhury, Shervin Javdani, Siddhartha Srinivasa and Sebastian Scherer
Advances in Neural Information Processing Systems (NeurIPS), 2017
paper / talk

Given a graph with N edges that are independently 0/1 and a prior belief over them, check an optimal number of edges until you find a feasible path.
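A greedy toy version of the setup (hypothetical numbers; the paper derives near-optimal policies, this sketch only illustrates the problem): repeatedly pick the candidate path most likely to be feasible under the prior and test its unchecked edges.

```python
from math import prod

def check_edges(edge_prior, paths, truth):
    """Greedy sketch: test edges of the path most likely to be feasible,
    given everything checked so far, until some path is fully verified."""
    known = {}

    def p_feasible(path):
        if any(known.get(e) is False for e in path):
            return 0.0
        return prod(edge_prior[e] for e in path if e not in known)

    while True:
        best = max(paths, key=p_feasible)
        if p_feasible(best) == 0.0:
            return None                      # every candidate path is blocked
        for e in best:
            if e not in known:
                known[e] = truth[e]          # pay the edge-evaluation cost
                if not truth[e]:
                    break
        if all(known.get(e, False) for e in best):
            return best

# Hypothetical 3-edge graph with two candidate paths.
edge_prior = {"a": 0.9, "b": 0.5, "c": 0.8}
truth = {"a": True, "b": False, "c": True}
result = check_edges(edge_prior, [("a", "b"), ("c",)], truth)
```

Because edge checks are the expensive operation, the objective is the number of evaluations spent before a feasible path is certified, not path length.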



Data-driven Planning via Imitation Learning
Sanjiban Choudhury, Mohak Bhardwaj, Sankalp Arora, Ashish Kapoor, Gireeja Ranade, Sebastian Scherer, Debadeepta Dey
The International Journal of Robotics Research (IJRR), 2018
Finalist for Best Paper of the Year
paper

Train planners that operate on partial information to imitate clairvoyant planners that have full information, so they choose optimal planning decisions. (Applies to heuristic search, exploration planning, etc.)



Learning Heuristic Search via Imitation
Mohak Bhardwaj, Sanjiban Choudhury, Sebastian Scherer
Conference on Robot Learning (CoRL), 2017, Oral (8%)
paper / video

Search algorithms use heuristics to balance exploration, i.e., discovering promising new states, and exploitation, i.e., expanding the current best state. Learn heuristics by imitating optimal planners.



Adaptive Information Gathering via Imitation Learning
Sanjiban Choudhury, Ashish Kapoor, Gireeja Ranade, Sebastian Scherer, Debadeepta Dey
Robotics: Science and Systems (RSS), 2017
paper

POMDPs are hard, but MDPs are relatively easy. Train POMDP policies by imitating MDP oracles to get good, and sometimes near-optimal, POMDP policies.



Learning to Gather Information via Imitation
Sanjiban Choudhury, Ashish Kapoor, Gireeja Ranade, Debadeepta Dey
IEEE International Conference on Robotics and Automation (ICRA), 2017
paper

How efficiently a robot can map a new area depends on the geometry of the world. We show how learning can be leveraged to design more efficient information gathering policies.



Densification Strategies for Anytime Motion Planning over Large Dense Roadmaps
Shushman Choudhury, Oren Salzman, Sanjiban Choudhury, Siddhartha Srinivasa
IEEE International Conference on Robotics and Automation (ICRA), 2017
paper / long paper

Anytime motion planning by repeatedly selecting a subgraph, searching it, and using the results to select a better subgraph until the shortest path on the original dense graph is found.



Regionally Accelerated Batch Informed Trees (RABIT*): A Framework to Integrate Local Information into Optimal Path Planning
Sanjiban Choudhury, Jonathan D. Gammell, Timothy D. Barfoot, Siddhartha Srinivasa, Sebastian Scherer
IEEE International Conference on Robotics and Automation (ICRA), 2016
paper

Interleave search and optimization by applying CHOMP to only a subset of promising edges in a BIT* search tree.



List Prediction Applied To Motion Planning
Abhijeet Tallavajhula, Sanjiban Choudhury, Sebastian Scherer, Alonzo Kelly
IEEE International Conference on Robotics and Automation (ICRA), 2016
paper

Train a learner to produce a diverse set of planner options (initializations, heuristics, hyperparameters) that can be run in parallel such that at least one has good performance.



Theoretical Limits of Speed and Resolution for Kinodynamic Planning in a Poisson Forest
Sanjiban Choudhury, Sebastian Scherer and J. Andrew Bagnell
Robotics: Science and Systems (RSS), 2015
paper / long paper

How fast can a drone fly in a forest even if it knew the location of every single tree? We answer this question with the help of percolation theory on random graphs.



The Planner Ensemble and Trajectory Executive: A High Performance Motion Planning System with Guaranteed Safety
Sanjiban Choudhury, Sankalp Arora and Sebastian Scherer
American Helicopter Society (AHS) 70th Annual Forum, 2014
Best Paper Award
paper / long paper / Clip 1 / Clip 2 / Clip 3 / Clip 4 / Clip 5

The first approach to planning safe, real-time trajectories for a full-scale, autonomous helicopter from takeoff to landing, with more than 700 flight test hours.



The Planner Ensemble: Motion Planning by Executing Diverse Algorithms
Sanjiban Choudhury, Sankalp Arora and Sebastian Scherer
IEEE International Conference on Robotics and Automation (ICRA), 2015
paper

Can a single planner solve all planning problems? Hedge our bets and predict an ensemble of diverse planners that can be run in parallel such that at least one solves the problem.



The Dynamics Projection Filter (DPF) – Real-Time Nonlinear Trajectory Optimization Using Projection Operators
Sanjiban Choudhury and Sebastian Scherer
IEEE International Conference on Robotics and Automation (ICRA), 2015
paper

A nonlinear projection operator as a control Lyapunov function that takes an optimized workspace trajectory and projects it to a configuration space trajectory with guarantees on sub-optimality.



Autonomous Exploration and Motion Planning for an Unmanned Aerial Vehicle Navigating Rivers
Stephen T. Nuske, Sanjiban Choudhury, Sezal Jain, Andrew D. Chambers, Luke Yoder, Sebastian Scherer, Lyle J. Chamberlain, Hugh Cover and Sanjiv Singh
Journal of Field Robotics (JFR), 2015
paper / video

A fully autonomous UAV that can map riverine environments stretching over several hundred meters of tight, winding rivers.



Sparse Tangential Network (SPARTAN): Motion Planning for Micro Aerial Vehicles
Hugh Cover, Sanjiban Choudhury, Sebastian Scherer and Sanjiv Singh
IEEE International Conference on Robotics and Automation (ICRA), 2013
paper / long paper / video

A fast, 3D sparse visibility graph that is orders of magnitude faster than sampling-based or discrete search. Key idea -- the shortest path is a geodesic that can only deviate around surface normals.



Autonomous Emergency Landing of a Helicopter: Motion Planning with Hard Time-Constraints
Sanjiban Choudhury, Sebastian Scherer and Sanjiv Singh
American Helicopter Society (AHS) Forum 69, 2013
paper

A planning system that lands a helicopter safely when its engines fail -- it chooses a set of safe landing sites, plans a diverse set of routes, and controls the vehicle to touchdown.



RRT*-AR: Sampling-Based Alternate Routes Planning with Applications to Autonomous Emergency Landing of a Helicopter
Sanjiban Choudhury, Sebastian Scherer and Sanjiv Singh
IEEE International Conference on Robotics and Automation (ICRA), 2013
paper / tech report / short video / long video

Find multiple, diverse, near-optimal solutions to a planning problem. Define equivalence classes in path space and do not allow the search tree to keep two paths in the same class.