Systems 1 and 2 for robot learning
Time/Location:
3–4 PM on Wednesday April 10th (reception to follow)
BA 5187
Humans can successfully handle both easy (mundane) and hard (new and rare) tasks simply by thinking harder and focusing more. In contrast, today's robots spend a fixed amount of compute on both familiar tasks, which lie inside the training distribution, and rare tasks, which lie far from it. They also have no way to recover once their fixed-compute inferences fail. How can we develop robots that think harder and do better on demand?
In this talk, we will combine today's generative models with traditional evolutionary search and 3D scene representations. This combination improves the generalization of robot policies and lets robots think through difficult scenarios at test time, akin to System 2 reasoning for robots. We will discuss learning behaviours from language instructions and corrections given by both humans and vision-language foundation models. These instructions and corrections shape the robots' reward functions on the fly and help us automate robot training data collection in simulation and in the real world. The models we will present achieve state-of-the-art performance on RLBench, CALVIN, nuPlan, TEACh and ScanNet++, which are established benchmarks for manipulation, driving, embodied dialogue understanding and 3D scene understanding.
Bio:
Katerina Fragkiadaki is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. She received her undergraduate diploma in Electrical and Computer Engineering from the National Technical University of Athens. She received her Ph.D. from the University of Pennsylvania and was subsequently a postdoctoral fellow at UC Berkeley and Google Research. Her work focuses on combining forms of common sense reasoning, such as spatial understanding and 3D scene understanding, with deep visuomotor learning. The goal of her work is to enable few-shot learning and continual learning for perception, action and language grounding. Her group develops methods for computer vision for mobile agents, 2D and 3D visual parsing, 2D-to-3D perception, vision-language grounding, learning of object dynamics, and navigation and manipulation policies. Pioneering innovations from her group include 2D-to-3D geometry-aware neural networks for 3D understanding from 2D video streams, analogy-forming networks for memory-augmented few-shot visual parsing, and language grounding in 2D and 3D scenes with bottom-up and top-down attention. Her work has been recognized with a best Ph.D. thesis award, an NSF CAREER award, an AFOSR Young Investigator award, a DARPA Young Investigator award, and faculty research awards from Google, TRI, Amazon, UPMC and Sony. She is a program chair for ICLR 2024.