Research
Foundations
Our research is grounded in the fundamentals of artificial intelligence, focusing on a key question in imitation learning: how can an agent learn new behaviors by observing and interacting with a teacher?
Imitation learning offers a simple yet scalable way to implicitly program agents through demonstrations, interventions, or preferences. It has widespread impact across disciplines, from teaching your home robot to make you a bowl of soup, to aligning large language models with human preferences, to teaching self-driving cars to drive more like humans.
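The core recipe can be made concrete with a minimal behavior-cloning sketch (the toy expert and all names below are hypothetical, not from any of our papers): learning from demonstrations reduces to supervised regression on the teacher's state-action pairs.

```python
import random

# Hypothetical toy setup: the expert's policy is a = 2 * s.
# Behavior cloning reduces imitation to supervised regression on
# (state, action) pairs collected from the teacher.

def expert_action(state):
    return 2.0 * state

def collect_demonstrations(n=100, seed=0):
    rng = random.Random(seed)
    states = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(s, expert_action(s)) for s in states]

def behavior_cloning(demos, lr=0.1, epochs=200):
    w = 0.0  # single-parameter linear policy a = w * s
    for _ in range(epochs):
        for s, a in demos:
            pred = w * s
            w -= lr * (pred - a) * s  # gradient of the squared error
    return w

demos = collect_demonstrations()
w = behavior_cloning(demos)
print(round(w, 2))  # recovers the expert gain, close to 2.0
```

The learner never sees a reward function; the demonstrations alone implicitly program the behavior.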
We explore a diverse array of questions in our research:
- Efficient Inverse Reinforcement Learning: How can we design algorithms that are exponentially more efficient than reinforcement learning?
- Vision-Language Demonstrations: How can we learn complex, long-horizon tasks from vision and language demonstrations?
- Suboptimal Experts: How do we learn from noisy, suboptimal experts?
- Human-Robot Teaming Behaviors: How can we learn effective human-robot collaboration from human-human teams?
… and much more! Check out some of our projects.
Applications
We test our ideas across a broad range of applications:
- Everyday Robots: Our primary focus is building home robots that interact with everyday users to learn personalized tasks like collaborative cooking, cleaning, and assembly.
- Collaborative Games: Games are a fun way to learn how humans collaborate, and there's lots of data! Through games, we explore new algorithms and architectures for effective human-robot collaboration.
- Self-Driving: With our industry partner Aurora, we develop ML models that enable safe, human-like driving.
Projects
Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought
Yuki Wang, Gonzalo Gonzalez-Pumariega, Yash Sharma, Sanjiban Choudhury. Preprint, 2023. website / paper
Demo2Code leverages LLMs to translate demonstrations into robot task code via an extended chain-of-thought that recursively summarizes demonstrations into a specification, then recursively expands that specification into code.
The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
Anirudh Vemula, Yuda Song, Aarti Singh, J. Andrew Bagnell, Sanjiban Choudhury. International Conference on Machine Learning (ICML), 2023. paper
We propose a novel, lazy approach that addresses two fundamental challenges in model-based reinforcement learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation.
Inverse Reinforcement Learning without Reinforcement Learning
Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu. International Conference on Machine Learning (ICML), 2023. website / paper
We explore inverse reinforcement learning and show that leveraging the state distribution of the expert can significantly reduce the complexity of the RL problem, providing an exponential speedup in theory and enhanced performance on continuous control tasks in practice.
Impossibly Good Experts and How to Follow Them
Aaron Walsman, Muru Zhang, Sanjiban Choudhury, Dieter Fox, Ali Farhadi. International Conference on Learning Representations (ICLR), 2023. paper
We investigate sequential decision making with "impossibly good" experts that possess privileged information, propose criteria necessary for recovering an optimal policy under limited information, and introduce a novel approach, ELF Distillation, that outperforms baselines in Minigrid and Vizdoom environments.
Sequence Model Imitation Learning with Unobserved Contexts
Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, J. Andrew Bagnell. Advances in Neural Information Processing Systems (NeurIPS), 2022. paper
We study imitation learning when the expert has privileged information and show that on-policy algorithms provably learn to recover from their initially suboptimal actions, while off-policy methods naively repeat the past action.
Minimax Optimal Online Imitation Learning via Replay Estimation
Gokul Swamy, Nived Rajaraman, Matt Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran. Advances in Neural Information Processing Systems (NeurIPS), 2022. paper
Imitation learning from noisy experts leads to biased policies! Replay estimation fixes this by smoothing the expert: repeatedly execute cached expert actions in a stochastic simulator and imitate the result.
Towards Uniformly Superhuman Autonomy via Subdominance Minimization
Brian Ziebart, Sanjiban Choudhury, Xinyan Yan, Paul Vernaza. International Conference on Machine Learning (ICML), 2022. paper
We study imitation learning where demonstrators vary in quality and seek to induce behavior that is unambiguously better (i.e., Pareto dominant or minimally subdominant) than all human demonstrations.
Of Moments and Matching: Trade-offs and Treatments in Imitation Learning
Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, J. Andrew Bagnell. International Conference on Machine Learning (ICML), 2021. project page / paper / video / code
All of imitation learning can be reduced to a game between a learner (generator) and a value function (discriminator), where the payoff is the performance difference between the learner and the expert.
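The game-theoretic view above can be illustrated with a tiny, hypothetical moment-matching game (the function class, state sets, and candidate names below are made up for illustration): an adversary picks a test function, and the learner prefers whichever candidate policy minimizes the worst-case moment mismatch with the expert.

```python
# Hypothetical finite sketch of the moment-matching game: the learner
# picks a state distribution, the adversary picks a test function
# (discriminator), and the payoff is the moment mismatch with the expert.

test_functions = [lambda s: s, lambda s: s * s]  # adversary's function class

expert_states = [0.0, 1.0, 2.0]

candidates = {
    "good": [0.1, 1.0, 1.9],   # nearly matches expert moments
    "bad": [5.0, 5.0, 5.0],
}

def moment(f, states):
    return sum(f(s) for s in states) / len(states)

def worst_case_gap(states):
    return max(abs(moment(f, states) - moment(f, expert_states))
               for f in test_functions)

best = min(candidates, key=lambda k: worst_case_gap(candidates[k]))
print(best)  # "good"
```

In the real reduction the adversary ranges over a rich value-function class, but the min-max structure is the same.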
Blending MPC & Value Function Approximation for Efficient Reinforcement Learning
Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots. International Conference on Learning Representations (ICLR), 2021. paper
Blends model predictive control (MPC) with learned value estimates to trade off MPC model errors against learner approximation errors.
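A toy illustration of the MPC-plus-value idea (the MDP, rewards, and value numbers below are invented for this sketch): a one-step lookahead alone is myopic, while bootstrapping the leaf with a learned terminal value recovers the far-sighted choice.

```python
# Toy sketch (hypothetical MDP): blending a short-horizon MPC lookahead
# with a learned terminal value estimate avoids myopic choices.

# Deterministic model: (state, action) -> (reward, next_state)
model = {
    (0, "take"): (1.0, "done"),
    (0, "wait"): (0.0, 1),
    (1, "take"): (10.0, "done"),
}

value = {"done": 0.0, 0: 1.0, 1: 10.0}  # learned value estimates

def mpc_action(state, horizon, use_value):
    def rollout(s, h):
        if s == "done" or h == 0:
            return value[s] if use_value else 0.0
        return max(r + rollout(s1, h - 1)
                   for (s0, a), (r, s1) in model.items() if s0 == s)
    scores = {a: r + rollout(s1, horizon - 1)
              for (s0, a), (r, s1) in model.items() if s0 == state}
    return max(scores, key=scores.get)

a_myopic = mpc_action(0, horizon=1, use_value=False)
a_blended = mpc_action(0, horizon=1, use_value=True)
print(a_myopic, a_blended)  # "take" (myopic) vs "wait" (sees V(1) = 10)
```

The trade-off: a longer horizon leans on the (possibly wrong) model, while the terminal value leans on the (possibly wrong) learner.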
Feedback in Imitation Learning: The Three Regimes of Covariate Shift
Jonathan Spencer, Sanjiban Choudhury, Arun Venkatraman, Brian Ziebart, J. Andrew Bagnell. arXiv preprint arXiv:2102.02872, 2021. paper / talk
Not all imitation learning problems are alike: some are easy (do behavior cloning), some are hard (call an interactive expert), and some are just right (you just need a simulator).
Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts
Gilwoo Lee, Brian Hou, Sanjiban Choudhury, Siddhartha S. Srinivasa. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021. paper / talk
In Bayesian RL, solving the belief MDP is hard, but solving each individual latent MDP is easy. We combine value functions from each latent MDP with a learned residual belief policy.
Guided Incremental Local Densification for Accelerated Sampling-based Motion Planning
Aditya Mandalika, Rosario Scalise, Brian Hou, Sanjiban Choudhury, Siddhartha S. Srinivasa. IEEE International Conference on Robotics and Automation (ICRA), 2023. paper
Instead of sampling from the informed-set ellipse (high recall, low precision), sample from sets of sub-ellipses (lower recall, higher precision).
Imitation Learning as f-Divergence Minimization
Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, Siddhartha Srinivasa. Workshop on the Algorithmic Foundations of Robotics (WAFR), 2020. paper
Many old (and new!) imitation learning algorithms simply minimize various f-divergence estimates between the expert and learner trajectory distributions.
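The divergence view can be sketched in a few lines (the trajectory names and probabilities below are hypothetical): here we score two learners against the expert under the KL divergence, one member of the f-divergence family with f(t) = t log t.

```python
import math

# Hypothetical sketch: many imitation losses are f-divergences between
# the expert's and learner's trajectory distributions. Here we compare
# two learners to the expert under the KL divergence.

expert = {"traj_a": 0.5, "traj_b": 0.5}

def kl(p, q):
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)

learner_close = {"traj_a": 0.6, "traj_b": 0.4}
learner_far = {"traj_a": 0.9, "traj_b": 0.1}

kl_close = kl(expert, learner_close)
kl_far = kl(expert, learner_far)
print(kl_close < kl_far)  # True: the closer learner has smaller divergence
```

Swapping in a different f (total variation, Jensen-Shannon, ...) recovers different classic imitation algorithms.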
Learning from Interventions: Human-robot interaction as both explicit and implicit feedback
Jonathan Spencer, Sanjiban Choudhury, Matt Barnes, Siddhartha Srinivasa. Robotics: Science and Systems (RSS), 2020. paper / talk
How can we learn from human interventions? Every intervention reveals information about the expert's implicit value function; we infer this function and optimize it.
Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics
Matthew Schmittle, Sanjiban Choudhury, Siddhartha Srinivasa. arXiv preprint arXiv:2104.01021, 2020. paper
We model multi-modal human feedback (demonstrations, interventions, verbal corrections) as a stream of losses that can be minimized by any no-regret online learning algorithm.
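The stream-of-losses view admits a very small sketch (the two feedback targets and the quadratic losses are invented for illustration): each feedback event contributes a convex loss, and online gradient descent, a standard no-regret learner, tracks their common minimizer.

```python
# Hypothetical sketch: multi-modal human feedback as a stream of convex
# losses, minimized by online gradient descent (a no-regret learner).

def demo_loss_grad(w):          # demonstrations say the target is 2.0
    return 2.0 * (w - 2.0)

def intervention_loss_grad(w):  # interventions nudge toward 2.5
    return 2.0 * (w - 2.5)

stream = [demo_loss_grad, intervention_loss_grad] * 50

w = 0.0
for t, grad in enumerate(stream, start=1):
    w -= (1.0 / t) * grad(w)  # decaying step size

print(round(w, 2))  # settles near 2.25, the average-loss minimizer
```

Because the learner is no-regret, its average loss approaches that of the best fixed parameter in hindsight, regardless of which feedback modality generated each loss.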
Toward fieldable human-scale mobile manipulation using RoMan
C. Kessens, J. Fink, A. Hurwitz, M. Kaplan, P. R. Osteen, T. Rocks, J. Rogers, E. Stump, L. Quang, M. DiBlasi, M. Gonzalez, D. Patel, J. Patel, S. Patel, M. Weiker, J. Bowkett, R. Detry, S. Karumanchi, J. Burdick, L. Matthies, Y. Oza, A. Agarwal, A. Dornbush, M. Likhachev, K. Schmeckpeper, K. Daniilidis, A. Kamat, S. Choudhury, A. Mandalika, S. Srinivasa. Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications II, 2020. paper
A full-stack mobile manipulator that autonomously navigates and manipulates objects in the wild.
Posterior Sampling for Anytime Motion Planning on Graphs with Expensive-to-Evaluate Edges
Brian Hou, Sanjiban Choudhury, Gilwoo Lee, Aditya Mandalika, Siddhartha Srinivasa. IEEE International Conference on Robotics and Automation (ICRA), 2020. paper / video
Anytime motion planning can be viewed through a Bayesian lens: we are initially uncertain about the shortest path and must probe the environment to progressively yield shorter and shorter paths.
Generalized Lazy Search for Robot Motion Planning: Interleaving Search and Edge Evaluation via Event-based Toggles
Aditya Mandalika, Sanjiban Choudhury, Oren Salzman, Siddhartha Srinivasa. International Conference on Automated Planning and Scheduling (ICAPS), 2019. Best Student Paper Award. paper / long paper
A unified framework for interleaving search and edge evaluation that provably minimizes total planning time.
The Blindfolded Robot: A Bayesian Approach to Planning with Contact Feedback
Brad Saund, Sanjiban Choudhury, Siddhartha Srinivasa, Dmitry Berenson. International Symposium on Robotics Research (ISRR), 2019. paper / video
Casts manipulation under occlusion as search on a graph where the feasibility of an edge is revealed only when the agent attempts to traverse it. A Bayesian prior guides the explore-exploit trade-off.
Leveraging Experience in Lazy Search
Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots, Siddhartha Srinivasa. Robotics: Science and Systems (RSS), 2019. paper
The laziest search checks the minimal number of edges needed to eliminate all potential shortest paths. We use imitation learning to imitate such oracles and learn truly lazy planners.
LEGO: Leveraging Experience in Roadmap Generation for Sampling-Based Planning
Rahul Kumar, Aditya Mandalika, Sanjiban Choudhury, Siddhartha S. Srinivasa. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019. paper
Learn a sampling distribution that creates graphs that are sparse (for speedy search) but with nodes carefully placed at bottleneck regions (to cover optimal paths).
Bayesian Policy Optimization for Model Uncertainty
Gilwoo Lee, Brian Hou, Aditya Mandalika Vamsikrishna, Jeongseok Lee, Sanjiban Choudhury, Siddhartha S. Srinivasa. International Conference on Learning Representations (ICLR), 2019. paper
Learn a policy that directly maps state and belief over MDPs to action, leveraging the fact that the belief can be compressed significantly.
Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes
Gilwoo Lee, Sanjiban Choudhury, Brian Hou, Siddhartha Srinivasa. arXiv preprint arXiv:1810.03048, 2018. paper
Computes a near-optimal value function by covering the continuous state-belief-action space with a finite set of representative samples and exploiting the Lipschitz continuity of the value function.
Autonomous Aerial Cinematography In Unstructured Environments With Learned Artistic Decision-Making
Rogerio Bonatti, Wenshan Wang, Cherie Ho, Aayush Ahuja, Mirko Gschwindt, Efe Camci, Erdal Kayacan, Sanjiban Choudhury, Sebastian Scherer. Journal of Field Robotics (JFR), 2019. JFR / IROS'19 / ISER'19 / video 1 / video 2 / video 3
A selfie drone that can film a moving target in a cluttered environment with almost no prior information.
Bayesian Active Edge Evaluation
Sanjiban Choudhury, Siddhartha Srinivasa, Sebastian Scherer. International Joint Conference on Artificial Intelligence (IJCAI), 2018. paper / blue sky
Given a prior over possible worlds, learns a decision tree that near-optimally collapses uncertainty to compute a feasible path.
Near-Optimal Edge Evaluation in Explicit Generalized Binomial Graphs
Sanjiban Choudhury, Shervin Javdani, Siddhartha Srinivasa, Sebastian Scherer. Neural Information Processing Systems (NeurIPS), 2017. paper / talk
Given a graph whose N edges are independently 0/1 and a prior belief, check a near-optimal number of edges until you find a feasible path.
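A drastically simplified, hypothetical version of this setting (two candidate paths, made-up edge priors): each edge is independently feasible with some prior probability, and verifying the most-likely-feasible path first tends to reduce the expected number of edge evaluations.

```python
# Hypothetical two-path sketch: each edge is independently feasible with
# a known prior, and we evaluate edges until some path is verified.
# Checking the most-likely-feasible path first is a natural heuristic
# for reducing expected edge evaluations.

paths = {"short": ["e1", "e2"], "detour": ["e3"]}
prior = {"e1": 0.9, "e2": 0.9, "e3": 0.5}  # P(edge is feasible)

def path_feasible_prob(path):
    prob = 1.0
    for e in paths[path]:
        prob *= prior[e]  # edges are independent Bernoullis
    return prob

order = sorted(paths, key=path_feasible_prob, reverse=True)
print(order[0])  # "short": 0.81 beats "detour": 0.5
```

The paper's contribution is a principled policy with near-optimality guarantees; this sketch only conveys the Bernoulli-edge problem structure.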
Data-driven Planning via Imitation Learning
Sanjiban Choudhury, Mohak Bhardwaj, Sankalp Arora, Ashish Kapoor, Gireeja Ranade, Sebastian Scherer, Debadeepta Dey. The International Journal of Robotics Research (IJRR), 2018. Finalist for Best Paper of the Year. paper
Train planners that operate on partial information to imitate clairvoyant planners that have full information, so they choose optimal planning decisions (applies to heuristic search, exploration planning, and more).
Learning Heuristic Search via Imitation
Mohak Bhardwaj, Sanjiban Choudhury, Sebastian Scherer. Conference on Robot Learning (CoRL), 2017, Oral (8%). paper / video
Search algorithms use heuristics to balance exploration (discovering promising new states) and exploitation (expanding the current best state). We learn heuristics by imitating optimal planners.
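The imitate-the-oracle recipe can be sketched on a one-dimensional toy problem (the line world, feature, and frontier below are hypothetical): a clairvoyant oracle knows the true cost-to-go, we regress a heuristic onto its labels, and the learned heuristic then ranks frontier states for greedy expansion.

```python
# Hypothetical 1-D sketch: a clairvoyant oracle knows the true
# cost-to-go; we fit a linear heuristic h(s) = w * |s - goal| by
# imitating the oracle's labels, then use it to rank frontier states.

goal = 10

def oracle_cost_to_go(s):
    return abs(goal - s)  # true distance on a line

# Least-squares regression on oracle labels (feature is |s - goal|).
data = [(abs(goal - s), oracle_cost_to_go(s)) for s in range(10)]
w = sum(x * y for x, y in data) / sum(x * x for x, _ in data)

def h(s):
    return w * abs(goal - s)

frontier = [3, 7, 9]
best_state = min(frontier, key=h)  # greedy best-first expansion
print(best_state)  # 9: closest to the goal
```

In the paper the oracle labels come from optimal planners on training worlds, and the heuristic generalizes to unseen worlds.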
Adaptive Information Gathering via Imitation Learning
Sanjiban Choudhury, Ashish Kapoor, Gireeja Ranade, Sebastian Scherer, Debadeepta Dey. Robotics: Science and Systems (RSS), 2017. paper
POMDPs are hard, but MDPs are relatively easy. Train POMDP policies by imitating MDP oracles to get good, and sometimes near-optimal, POMDP policies.
Learning to Gather Information via Imitation
Sanjiban Choudhury, Ashish Kapoor, Gireeja Ranade, Debadeepta Dey. IEEE International Conference on Robotics and Automation (ICRA), 2017. paper
How efficiently a robot can map a new area depends on the geometry of the world. We show how learning can be leveraged to design more efficient information-gathering policies.
Densification Strategies for Anytime Motion Planning over Large Dense Roadmaps
Shushman Choudhury, Oren Salzman, Sanjiban Choudhury, Siddhartha Srinivasa. IEEE International Conference on Robotics and Automation (ICRA), 2017. paper / long paper
Anytime motion planning by selecting a subgraph, searching it, and using the results to select a better subgraph, until the shortest path on the original dense graph is found.
Regionally Accelerated Batch Informed Trees (RABIT*): A Framework to Integrate Local Information into Optimal Path Planning
Sanjiban Choudhury, Jonathan D. Gammell, Timothy D. Barfoot, Siddhartha Srinivasa, Sebastian Scherer. IEEE International Conference on Robotics and Automation (ICRA), 2016. paper
Interleaves search and optimization by applying CHOMP to only a subset of promising edges in a BIT* search tree.
List Prediction Applied To Motion Planning
Abhijeet Tallavajhula, Sanjiban Choudhury, Sebastian Scherer, Alonzo Kelly. IEEE International Conference on Robotics and Automation (ICRA), 2016. paper
Trains a learner to produce a diverse set of planner options (initializations, heuristics, hyperparameters) that can be run in parallel such that at least one has good performance.
Theoretical Limits of Speed and Resolution for Kinodynamic Planning in a Poisson Forest
Sanjiban Choudhury, Sebastian Scherer, J. Andrew Bagnell. Robotics: Science and Systems (RSS), 2015. paper / long paper
How fast can a drone fly in a forest, even if it knew the location of every single tree? We answer this question with the help of percolation theory on random graphs.
The Planner Ensemble and Trajectory Executive: A High Performance Motion Planning System with Guaranteed Safety
Sanjiban Choudhury, Sankalp Arora, Sebastian Scherer. American Helicopter Society (AHS) 70th Annual Forum, 2014. Best Paper Award. paper / long paper / Clip 1 / Clip 2 / Clip 3 / Clip 4 / Clip 5
The first approach to planning safe, real-time trajectories for a full-scale autonomous helicopter from takeoff to landing, validated over more than 700 flight-test hours.
The Planner Ensemble: Motion Planning by Executing Diverse Algorithms
Sanjiban Choudhury, Sankalp Arora, Sebastian Scherer. IEEE International Conference on Robotics and Automation (ICRA), 2015. paper
Can a single planner solve all planning problems? We hedge our bets and predict an ensemble of diverse planners that can be run in parallel such that at least one solves the problem.
The Dynamics Projection Filter (DPF) – Real-Time Nonlinear Trajectory Optimization Using Projection Operators
Sanjiban Choudhury, Sebastian Scherer. IEEE International Conference on Robotics and Automation (ICRA), 2015. paper
A nonlinear projection operator, used as a control Lyapunov function, that takes an optimized workspace trajectory and projects it to a configuration-space trajectory with guarantees on sub-optimality.
Autonomous Exploration and Motion Planning for an Unmanned Aerial Vehicle Navigating Rivers
Stephen T. Nuske, Sanjiban Choudhury, Sezal Jain, Andrew D. Chambers, Luke Yoder, Sebastian Scherer, Lyle J. Chamberlain, Hugh Cover, Sanjiv Singh. Journal of Field Robotics (JFR), 2015. paper / video
A fully autonomous UAV that can map riverine environments stretching over several hundred meters of tight, winding rivers.
Sparse Tangential Network (SPARTAN): Motion Planning for Micro Aerial Vehicles
Hugh Cover, Sanjiban Choudhury, Sebastian Scherer, Sanjiv Singh. IEEE International Conference on Robotics and Automation (ICRA), 2013. paper / long paper / video
A fast, 3D sparse visibility graph that is orders of magnitude faster than sampling-based or discrete search. Key idea: the shortest path is a geodesic that can only deviate around surface normals.
Autonomous Emergency Landing of a Helicopter: Motion Planning with Hard Time-Constraints
Sanjiban Choudhury, Sebastian Scherer, Sanjiv Singh. American Helicopter Society (AHS) Forum 69, 2013. paper
A planning system that lands a helicopter safely when its engines fail: it chooses a set of safe landing sites, plans a diverse set of routes, and controls the vehicle to touchdown.
RRT*-AR: Sampling-Based Alternate Routes Planning with Applications to Autonomous Emergency Landing of a Helicopter
Sanjiban Choudhury, Sebastian Scherer, Sanjiv Singh. IEEE International Conference on Robotics and Automation (ICRA), 2013. paper / tech report / short video / long video
Finds multiple, diverse, near-optimal solutions to a planning problem by defining equivalence classes in path space and not allowing the search tree to keep two paths in the same class.
Sanjiban Choudhury, Sebastian Scherer and Sanjiv Singh IEEE International Conference on Robotics and Automation (ICRA) , 2013 paper / tech report / short video / long video Find multiple, diverse, near-optimal solutions to a planning problem. Define equivalence classes in path space and not allow the search tree to have 2 paths in the same class. |