Focus areas
Technical AI alignment research
Certain types of technical research may decrease the chances that future AI systems become uncontrollable or otherwise pose catastrophic risks. For instance, mechanistic interpretability research aims to develop techniques for reaching a “gears-level” understanding of otherwise black-box AI systems. These techniques may uncover dangerous capabilities before they can cause harm, or enable us to design future AI systems with greater precision and intentionality, curbing unwanted behavior. Scalable oversight research develops methods that allow humans to continue overseeing AI systems appropriately, even as those systems gain capabilities beyond the human range.
AI policy
As AI advances, its trajectory will be shaped by policy, including both laws passed by governments and voluntary policies adopted by companies. Policy work aims to ensure that these measures appropriately guard against catastrophic risks. For instance, such work may involve developing methods for monitoring critical aspects of cutting-edge AI systems, or crafting rules that discourage negligent or reckless use of cutting-edge AI models.
Building AI safety research capacity
Today, at least tens of thousands of researchers are advancing AI capabilities, but fewer than a thousand are working specifically on AI safety and relevant policy issues. Field-building for AI safety and related issues can help reduce this disparity. Such work includes tactful outreach to researchers who have the skills to work on these problems, as well as grants that allow individual researchers to spend time learning the background needed to contribute productively to these fields.