Last summer, we ran the first iteration of the PIBBSS Summer Research Fellowship. In this post, we share some reflections on how the program went.
Note that this post deals mostly with high-level reflections and isn’t maximally comprehensive. It primarily focuses on information we think might be relevant for other people and initiatives in this space. We also do not discuss specific research outputs produced by fellows within the scope of this post. Further, there are some details we have not covered here for privacy reasons.
How to navigate this post:
PIBBSS (Principles of Intelligent Behavior in Biological and Social Systems) aims to facilitate research studying parallels between intelligent behavior in natural and artificial systems, and to leverage these insights towards the goal of building safe and aligned AI.
To this end, we organized a 3-month Summer Research Fellowship bringing together scholars with graduate-level research experience (or equivalent) from a wide range of relevant disciplines to work on research projects under the mentorship of experienced AI alignment researchers. The disciplines of interest included fields as diverse as the brain sciences; evolutionary biology, systems biology, and ecology; statistical mechanics and complex systems studies; economic, legal, and political theory; philosophy of science; and more.
This broad approach -- the PIBBSS bet -- is something we think is a valuable frontier for expanding the scientific and philosophical inquiry into AI risk and the alignment problem. In particular, it aspires to bring more empirical and conceptual grounding to thinking about advanced AI systems. It can do so by drawing on the understanding that different disciplines already possess about intelligent and complex behavior, while also remaining vigilant about the disanalogies that might exist between natural systems and candidate AI designs.
Furthermore, bringing diverse epistemic competencies to bear upon the problem also puts us in a better position to identify neglected challenges and opportunities in alignment research. While we certainly recognize that familiarity with ML research is an important part of being able to make significant progress in the field, we also think that familiarity with a large variety of intelligent systems and models of intelligent behavior constitutes an underserved epistemic resource. It can provide novel research surface area, help assess current research frontiers, de- (and re-)construct the AI risk problem, help conceive of novel alternatives in the design space, etc.
This makes interdisciplinary and transdisciplinary research endeavors valuable, especially given that they are otherwise likely to be neglected due to inferential and disciplinary distances. That said, we are skeptical of interdisciplinarity “for the sake of it”, but consider it exciting insofar as it explores specific research bets or has specific generative motivations for why a given topic is interesting.
For more information on PIBBSS, see this introduction post, this discussion of the epistemic bet, our research map (currently undergoing a significant update, to be released soon), and these notes on our motivations and scope.
This is how we thought of the fellowship’s value propositions prior to running it:
Overall, we believe that most of the value of the program came from researchers gaining a better understanding of research directions and the AI risk problems. Some of this value manifests as concrete research outputs, some of it as effects on fellows’ future trajectories.
Overall, we believe the PIBBSS summer research fellowship (or a close variation of it) is worth running again. We applied for funding to do so.
The key dimensions of improvement we are envisioning for the 2023 fellowship are:
More tentatively, we might explore ways of running (part of) the program in a (more) mentor-less fashion. While we think this is hard to do well, we also think it is attractive for several reasons, mainly because mentorship is scarce in the field. Some potential avenues of exploration include:
Beyond the format of the summer research fellowship, we tentatively think the following (rough) formats are worth further thought. Note that we are not saying these programs are, all things considered, worthwhile, but that, given our experience, these are three directions that may be worth exploring further.
PIBBSS is interested in exploring these, or other, avenues further. If you have feedback or ideas, or are interested in collaborating, feel encouraged to reach out to us (firstname.lastname@example.org).
For invaluable help in making the program a success, we want to thank our fellow organizing team members Anna Gadjdova and Cara Selvarajah, as well as several other people who contributed to different parts of this endeavor, including Amrit Sidhu-Brar, Gavin Leech, Adam Shimi, Sahil Kulshrestha, Nandi Schoots, Tan Zhi Xuan, Tomáš Gavenčiak, Jan Kulveit, Mihaly Barasz, Max Daniel, Owen Cotton-Barrat, Patrick Butlin, John Wentworth, Andrew Critch, Vojta Kovarik, Lewis Hamilton, Rose Hadshar, Steve Byrnes, Damon Sasi, Raymond Douglas, Radim Lacina, Jan Pieter Snoeji, Cullen O’Keefe, Guillaume Corlouer, Elizabeth Garrett, Kristie Barnett, František Drahota, Antonín Kanát, Karin Neumannova, Jiri Nadvornik, and anyone else we might have forgotten to mention here (our sincere apologies!). Of course, we are also most grateful to all our mentors and fellows.
In this section, we will discuss our reflections on the portfolio of research bets that fellows worked on, which are distributed across a range of “target domains”, in particular:
We will focus the discussion on aspects like the theory of impact for different target domains and the tractability of insight transfer. The discussion will aim to abstract away from fellow- or project-specific factors. Note that we will skip the discussion of specific projects and other details in this public post.
TL;DR: At a high level, projects aimed towards i) Agent Foundations, ii) Alignment of Complex Systems, and iii) Digital Minds and Brain-inspired Alignment most consistently made valuable progress. Projects aimed at iv) Prosaic Alignment faced the largest challenges. Specifically, they seem to require building new vocabulary and frameworks to assist in the epistemic transfer and would have benefited from fellows having more familiarity with concepts in technical ML, which we were insufficiently able to provide through our pedagogical efforts. We believe this constitutes an important dimension of improvement.
1. Agent Foundations (25–30%) [AF]
2. Alignment of Complex Systems (20–25%) [CS]
3. Digital Minds (and Brain-inspired alignment research; 5–10%) [DM]
Discussion of AF, CS and DM:
The above three target domains (i.e. Agent Foundations, Alignment of Complex Systems, and Digital Minds) are similar insofar as all three pursue conceptual foundations of intelligent systems, even if they approach the problem from slightly different starting positions and with different methodologies. Together, these three bundles accounted for about 50-55% of projects, and roughly 50% of them were successful in terms of generating research momentum. This makes it meaningful to pay attention to two other similarities between them: a) the overlapping vocabularies and interests with respective neighboring disciplines, and b) the degree of separation (or indirectness) in their theory of impact.
The object-level interests in AF, CS, or DM mostly have the same type signature as questions that motivate researchers in the respective scientific and philosophical disciplines (such as decision theory, information theory, complex systems, cognitive science, etc.). This also means that interdisciplinary dialogue can be conducted relatively more smoothly, due to shared conceptual vocabulary and ontologies. Consequently, we can interpret the motivational nudges provided by PIBBSS here as being some gentle selection pressure towards alignment-relevance of which specific questions get investigated.
At the same time, the (alignment-relevant) impact from research progress here is mostly indirect, coming from better foresight of AI behavior and as an input to future specification and/or interpretability research (see discussion in Rice and Manheim 2022). This provides an important high-level constraint on value derived here.
4. Prosaic Alignment Foundations (25–35%) [PF]
Discussion of PF:
While some PF projects did succeed in finding promising research momentum, there was higher variance in the tractability of progress. This bundle also elicited meaningfully lower ex-post excitement from mentors (compared to the rest of the portfolio), and led us to update significantly on the epistemic difficulty of transferring insights from other disciplines.
Unlike AF+CS+DM discussed above, the interdisciplinary transfer of insights towards prosaic alignment seems to require building entirely new vocabularies and ontologies to a significantly higher degree. For example, transferring insights from evolutionary theory towards understanding any particular phenomenon of relevance in deep learning seems bottlenecked on a much richer understanding of the isomorphism between the two domains than currently exists. In fact, the projects in this bundle that did succeed in finding good research momentum were those where fellows had strong prior ML familiarity.
However, given the potential for high-value insights from bets like this, we think exploring ways of building relevant ML familiarity for fellows, so that they can contribute more efficiently and constructively, is worth pursuing. At the very least, we intend to add pedagogical elements familiarizing fellows with machine learning algorithms and with ML interpretability, in addition to improving the pedagogical elements on information theory and RL theory from the previous iteration.
5. Socio-technical ML Ethics (10%) [ME]
6. Experimental and Applied Prosaic Alignment (5–10%) [EP]
We organized two retreats during the fellowship program, aimed at familiarizing fellows with AI risk and the alignment problem, facilitating cross-cohort dialogue, and capturing other benefits of in-person research gatherings. Both retreats had a mix of structured and unstructured parts: the structured parts included talks and invited speakers, as well as sessions directed at research planning and orientation, while the unstructured parts included discussions and breakout sessions. Recurring themes in the unstructured parts included deconfusing and conceptualizing consequentialist cognition, mechanizing goal-orientedness, the role of representations in cognition, and distinguishing assistive behavior from manipulative behavior.
The first retreat was organized at a venue outside Oxford, at the beginning of the summer, and included sessions on different topics in:
The second retreat was organized near Prague a few weeks before the formal end of the fellowship, and was scheduled adjacent to the Human-Aligned AI Summer School (HAAISS) 2022. It included fellows presenting research updates and seeking feedback, talks continuing themes from the previous retreat (e.g. why alignment problems contain some hard parts, problematizing consequentialist cognition, and second-person ethics), and practising double crux on scientific disagreements (such as whether there are qualitative differences in the role of representations in human and cellular cognition).
Close to the end of the fellowship program, Peter Eckersley, one of our mentors - as well as a mentor and admired friend of people involved in PIBBSS - passed away. We mourn this loss, and are grateful for his participation in our program.
Here is an explanation of how deep reading groups work. We were very happy with how the format suited our purposes. Kudos to Sahil Kulshrestha for suggesting and facilitating the format!
By “primary” sources of value, we mean those values that ~directly manifest in the world. By “secondary” values we mean things that are valuable in that they aid in generating (more) primary value (in the future). We can also think of secondary values as “commons” produced by the program.
6 out of 20 fellows had negligible prior exposure to AI risk and alignment; 10 out of 20 had prior awareness but lacked exposure to technical AI-risk discussions; 4 out of 20 had prior technical exposure to AI risk.
Some numbers about our application process:
- Stage 1: 121 applied
- Stage 2: ~60 were invited for work tasks
- Stage 3: ~40 were invited for interviews
- Final number of offers accepted: 20
Issa Rice and David Manheim (2022), Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety, https://arxiv.org/abs/2201.02950
Nora Ammann (2022), Epistemic Artifacts of (conceptual) AI alignment research, https://www.alignmentforum.org/s/4WiyAJ2Y7Fuyz8RtM/p/CewHdaAjEvG3bpc6C
The write-up identifies “four categories of epistemic artifacts we may hope to retrieve from conceptual AI alignment research: a) conceptual de-confusion, b) identifying and specifying risk scenarios, c) characterizing target behavior, and d) formalizing alignment strategies/proposals.”