I am Yaowen Ye (or Elwin, or 叶耀文), a first-year CS PhD student at UC Berkeley advised by Prof. Jacob Steinhardt and Prof. Stuart Russell. I work on understanding the limitations of human oversight of AI systems and developing scalable approaches to address them. Recently, I've been thinking about:
- What emergent risks can arise when many agents are deployed in shared environments? Can existing oversight methods scale to address them?
- Can we predict which generalizations will emerge during LLM training, especially surprising or problematic ones? How might automated analysis of training data and model internals help?
- LLM chatbots operate in an unusual environment: their human users. How can we design incentives that discourage manipulating this environment for higher reward?
- How can we make RL robust to imperfect reward functions? How can we prevent reward hacking?
Feel free to reach out if you're interested in my research! I also enjoy mentoring, so if you are an undergrad and think my advice might be helpful, I'd be happy to connect.
Before joining Berkeley, I did my undergrad at The University of Hong Kong, during which I also worked on cognitive reasoning, intuitive physics, learning on graphs, and recommender systems. I was fortunate to be advised by Prof. Yixin Zhu at the PKU Cognitive Reasoning Lab and Prof. Chao Huang at the HKU Data Intelligence Lab.
Links: [X] [Scholar] [Email] [Give me feedback!]