Summary

We are FAR AI: an AI safety research incubator and accelerator. Since our inception in July 2022, FAR has grown to a team of 12 full-time staff, produced 13 academic papers, opened the coworking space FAR Labs with 40 active members, and organized field-building events for more than 160 ML researchers.

Our organization consists of three main pillars:

Research. We rapidly explore a range of potential research directions in AI safety, scaling up those that show the greatest promise. Unlike other AI safety labs that take a bet on a single research direction, FAR pursues a diverse portfolio of projects. Our current focus areas are building a science of robustness (e.g. finding vulnerabilities in superhuman Go AIs), finding more effective approaches to value alignment (e.g. training from language feedback), and model evaluation (e.g. inverse scaling and codebook features). 

Coworking Space. We run FAR Labs, an AI safety coworking space in Berkeley. The space currently hosts FAR, AI Impacts, MATS, and several independent researchers. We are building a collaborative community space that fosters great work through excellent office space, a warm and intellectually generative culture, and tailored programs and training for members. Applications are open to new users of the space (individuals and organizations).

Field Building. We run workshops, primarily targeted at ML researchers, to help build the field of AI safety research and governance. We co-organized the International Dialogue on AI Safety, bringing together prominent scientists from around the globe and culminating in a public statement calling for global action on AI safety research and governance. In December we will host the New Orleans Alignment Workshop, where over 140 researchers will learn about AI safety and find collaborators.

We want to expand, so if you’re excited by the work we do, consider donating or working for us! We’re hiring research engineers, research scientists and communications specialists.

Incubating & Accelerating AI Safety Research

Our main goal is to explore new AI safety research directions, scaling up those that show the greatest promise. We select agendas that are too large to be pursued by individual academic or independent researchers, but that do not fit the commercial interests of for-profit organizations. Our structure allows us to both (1) explore a portfolio of agendas and (2) execute them at scale. Although we conduct the majority of our work in-house, we frequently collaborate with researchers at other organizations who have overlapping research interests.

Our current research falls into three main categories:

Science of Robustness. How does robustness vary with model size? Will superhuman systems be vulnerable to adversarial examples or “jailbreaks” similar to those seen today? And, if so, how can we achieve safety-critical guarantees?

Relevant work:

Value Alignment. How can we learn reliable reward functions from human data? Our research focuses on higher-bandwidth, more sample-efficient methods for users to communicate their preferences to AI systems, and on improved methods for training with human feedback.

Relevant work:

Model Evaluation. How can we evaluate and test the safety-relevant properties of state-of-the-art models? Evaluation can be split into black-box approaches that focus only on externally visible behavior (“model testing”) and white-box approaches that seek to interpret a model’s inner workings (“interpretability”). These approaches are complementary: black-box methods are less powerful but easier to apply than white-box methods, so we pursue research in both areas.

Relevant work:

So far, FAR has produced 13 papers that have been published in top peer-reviewed venues such as ICML and EMNLP, and our work has been featured in major media outlets such as the Financial Times, The Times and Ars Technica. For more information on our research, check out our accompanying post.

We also set up our own HPC cluster, codenamed flamingo, for use by FAR staff and partner organizations. Over the next year, we hope not only to scale our current programs but also to explore novel research directions.

We wish we had the capacity to cultivate even more AI safety research agendas, but our researchers’ and engineers’ time is limited. We have, however, found other ways to support organizations in the AI safety sphere. Most notably:

FAR Labs: An AI Safety Coworking Space in Berkeley

FAR Labs is a coworking hub in downtown Berkeley for organizations and individuals working on AI safety and related issues. Since opening the space in March 2023, we have grown to host approximately 40 members. Our goal is to incubate and accelerate early-stage organizations and research agendas by enabling knowledge sharing and mutual support between members.

Our members are primarily drawn from four anchor organizations, but we also host several independent researchers and research teams. The space is equipped with everything needed for productive and lively coworking: workstations, meeting rooms, call booths, video conferencing facilities, and snacks and meals. We also run lightning talks, lunch & learn sessions, workshops, and happy hours.

FAR Labs also hosts the weekly FAR Seminar series, welcoming speakers from a range of organizations including FAR, AI Impacts, Rethink Priorities and Oxford University.

We welcome applications to work at FAR Labs from both organizations and individuals, as well as from short-term visitors. See here for more information on amenities, culture, and pricing. You can apply here.

Although we are excited to help others progress their research, we are aware that AI safety as a whole is still small compared to the magnitude of the problem. Avoiding risks from advanced AI systems will require not just making existing contributors more productive, but also attracting more contributors. This motivates the third pillar of our efforts: growing the field of AI safety.

Field Building & Outreach

We run workshops to educate ML researchers on the latest AI safety research and are building a community that enables participants to more easily find collaborators and remain engaged in the field. We have organized two workshops in 2023, with a total of around 150 participants. We also develop online educational resources on AI safety, both for the general public (e.g. the AI Digest) and for a technical audience (e.g. an upcoming interview series with AI safety researchers).

Our workshops are typically targeted at ML researchers, leveraging FAR’s knowledge of the ML community and the field of technical AI safety research. We recently hosted the first International Dialogue on AI Safety, bringing together leading AI scientists to build a shared understanding of risks from advanced AI systems. The meeting was convened by Turing Award winners Yoshua Bengio and Andrew Yao, UC Berkeley professor Stuart Russell OBE, and Ya-Qin Zhang, founding Dean of the Tsinghua Institute for AI Industry Research. We ran the event in partnership with CHAI and the Ditchley Foundation, and it culminated in a joint statement with specific technical and policy recommendations.

We will soon welcome over 140 ML researchers to the New Orleans Alignment Workshop. Taking place immediately before NeurIPS, the workshop will inform attendees of the latest developments in AI safety, help them explore new research directions, and connect them with collaborators who share their interests.

We are also building AI safety educational resources. In collaboration with Sage Futures, we built the AI Digest: a website to help non-technical audiences understand the pace of progress in frontier language models. We are also running a series of interviews with AI safety researchers about the theory of change behind their research (if you would like to take part, contact euan@far.ai!).

Who’s working at FAR?

FAR’s team consists of 11.5 full-time equivalents (FTEs). FAR is headed by Dr. Adam Gleave (CEO) and Karl Berzins (COO). Our research team comprises five technical staff members, who bring ML research and engineering experience from graduate school and from roles at places like Jane Street, Cruise, and Microsoft. Our 3-person operations team supports our research efforts, runs FAR Labs, and handles the production of our field-building events. Our 1.5 FTE communications team helps disseminate our research findings clearly and widely. We also benefit from a wide network of collaborators and research advisors.

Tony Wang and Adam Gleave presenting our KataGo attack results at ICML 2023

How can I get involved?

We’re hiring!

We’re currently hiring research scientists, research engineers and communications specialists. We are excited to add as many as five technical staff members over the next 12 months. We are particularly eager to hire senior research engineers, or research scientists with a vision for a novel agenda, although we will also be making several junior hires and encourage a wide range of individuals to apply. See the full list of openings and apply here.

We’re looking for collaborators!

We frequently collaborate with researchers at other academic, non-profit and – on occasion – for-profit research institutes. If you’re excited to work with us on a project, please reach out at hello@far.ai.

Want to donate?

You can help us ensure a positive future by donating here. Additional funds will enable us to grow faster. Based on currently secured funding, we would be comfortable expanding by 1-2 technical staff in the next 12 months, whereas we would like to add up to 5 technical staff. We are very grateful for your help!

Want to learn more about our research?

Have a look at our latest research update, our list of publications, and our blog. You can also reach out to us directly at hello@far.ai.

We look forward to hearing from you!