This is a linkpost for https://arxiv.org/abs/2306.12001

This is the second post in a sequence of posts giving an overview of catastrophic AI risks.

2 Malicious Use

On the morning of March 20, 1995, five men entered the Tokyo subway system. After boarding separate subway lines, they continued for several stops before dropping the bags they were carrying and exiting. An odorless, colorless liquid inside the bags began to vaporize. Within minutes, commuters began choking and vomiting. The trains continued on toward the heart of Tokyo, with sickened passengers leaving the cars at each station. The fumes were spread at each stop, either by emanating from the tainted cars or through contact with people's clothing and shoes. By the end of the day, 13 people lay dead and 5,800 were seriously injured. The group responsible for the attack was the religious cult Aum Shinrikyo [5]. Its motive for murdering innocent people? To bring about the end of the world.

Powerful new technologies offer tremendous potential benefits, but they also carry the risk of empowering malicious actors to cause widespread harm. There will always be those with the worst of intentions, and AIs could provide them with a formidable tool to achieve their objectives. Moreover, as AI technology advances, severe malicious use could potentially destabilize society, increasing the likelihood of other risks.

In this section, we will explore the various ways in which the malicious use of advanced AIs could pose catastrophic risks. These include engineering biochemical weapons, unleashing rogue AIs, using persuasive AIs to spread propaganda and erode consensus reality, and leveraging censorship and mass surveillance to irreversibly concentrate power. We will conclude by discussing possible strategies for mitigating the risks associated with the malicious use of AIs.

Unilateral actors considerably increase the risks of malicious use. In instances where numerous actors have access to a powerful technology or dangerous information that could be used for harmful purposes, it only takes one individual to cause significant devastation. Malicious actors themselves are the clearest example of this, but recklessness can be equally dangerous. For example, a single research team might be excited to open source an AI system with biological research capabilities, which would speed up research and potentially save lives, but this could also increase the risk of malicious use if the AI system could be repurposed to develop bioweapons. In situations like this, the outcome may be determined by the least risk-averse research group. If only one research group thinks the benefits outweigh the risks, it could act unilaterally, deciding the outcome even if most others don't agree. And if that group is wrong and someone does use the system to develop a bioweapon, it would be too late to reverse course.

By default, advanced AIs may increase the destructive capacity of both the most powerful and the general population. Thus, the growing potential for AIs to empower malicious actors is one of the most severe threats humanity will face in the coming decades. The examples we give in this section are only those we can foresee. It is possible that AIs could aid in the creation of dangerous new technology we cannot presently imagine, which would further increase risks from malicious use.

2.1 Bioterrorism

The rapid advancement of AI technology increases the risk of bioterrorism. AIs with knowledge of bioengineering could facilitate the creation of novel bioweapons and lower barriers to obtaining such agents. Engineered pandemics from AI-assisted bioweapons pose a unique challenge, as attackers have an advantage over defenders and could constitute an existential threat to humanity. We will now examine these risks and how AIs might exacerbate challenges in managing bioterrorism and engineered pandemics.

Bioengineered pandemics present a new threat. Biological agents, including viruses and bacteria, have caused some of the most devastating catastrophes in history. It's believed the Black Death killed more humans than any other event in history, an astounding and awful 200 million, the equivalent of four billion deaths today. While contemporary advancements in science and medicine have made great strides in mitigating risks associated with natural pandemics, engineered pandemics could be designed to be more lethal or easily transmissible than natural pandemics, presenting a new threat that could equal or even surpass the devastation wrought by history's most deadly plagues [6].

Humanity has a long and dark history of weaponizing pathogens, with records dating back to 1320 BCE describing a war in Asia Minor where infected sheep were driven across the border to spread tularemia [7]. During the twentieth century, 15 countries are known to have developed bioweapons programs, including the US, USSR, UK, and France. Like chemical weapons, bioweapons have become taboo among the international community. While some state actors continue to operate bioweapons programs [8], a more significant risk may come from non-state actors like Aum Shinrikyo, ISIS, or simply disturbed individuals. Due to advancements in AI and biotechnology, the tools and knowledge necessary to engineer pathogens with capabilities far beyond Cold War-era bioweapons programs will rapidly democratize.

Biotechnology is progressing rapidly and becoming more accessible. A few decades ago, the ability to synthesize new viruses was limited to a handful of the top scientists working in advanced laboratories. Today it is estimated that there are 30,000 people with the talent, training, and access to technology to create new pathogens [6]. This figure could rapidly expand. Gene synthesis, which allows the creation of custom biological agents, has dropped precipitously in price, with its cost halving approximately every 15 months [9]. Furthermore, with the advent of benchtop DNA synthesis machines, access will become much easier, and such machines could bypass existing gene-synthesis screening efforts, which complicates controlling the spread of such technology [10]. The chances of a bioengineered pandemic killing millions, perhaps billions, are proportional to the number of people with the skills and technological access needed to synthesize such pathogens. With AI assistants, orders of magnitude more people could have the required skills, thereby increasing the risks by orders of magnitude.
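To make the cost trend concrete, here is a minimal illustrative sketch of what halving every 15 months implies over a decade. The normalized starting cost of 1.0 and the time horizon are arbitrary placeholders, not figures from the cited sources.

```python
# Illustrative only: project a cost that halves every 15 months,
# as described for gene synthesis above. Starting cost is normalized to 1.0.

def projected_cost(initial_cost: float, months_elapsed: float,
                   halving_period_months: float = 15.0) -> float:
    """Cost after `months_elapsed` months, assuming it halves every `halving_period_months`."""
    return initial_cost * 0.5 ** (months_elapsed / halving_period_months)

if __name__ == "__main__":
    for years in (0, 5, 10):
        cost = projected_cost(initial_cost=1.0, months_elapsed=12 * years)
        print(f"After {years:2d} years: {cost:.4f}x the original cost")
    # After 10 years (120 months), the cost falls to 0.5**8, i.e. about 1/256 of today's.
```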

Figure 3: An AI assistant could provide non-experts with access to the directions and designs needed to produce biological and chemical weapons and facilitate malicious use.

AIs could be used to expedite the discovery of new, more deadly chemical and biological weapons. In 2022, researchers took an AI system designed to create new drugs by generating non-toxic, therapeutic molecules and tweaked it to reward, rather than penalize, toxicity [11]. After this simple change, within six hours, it generated 40,000 candidate chemical warfare agents entirely on its own. It designed not just known deadly chemicals including VX, but also novel molecules that may be deadlier than any chemical warfare agents discovered so far. In the field of biology, AIs have already surpassed human abilities in protein structure prediction [12] and made contributions to synthesizing those proteins [13]. Similar methods could be used to create bioweapons and develop pathogens that are deadlier, more transmissible, and more difficult to treat than anything seen before.

AIs compound the threat of bioengineered pandemics. AIs will increase the number of people who could commit acts of bioterrorism. General-purpose AIs like ChatGPT are capable of synthesizing expert knowledge about the deadliest known pathogens, such as influenza and smallpox, and providing step-by-step instructions about how a person could create them while evading safety protocols [14]. Future versions of AIs could be even more helpful to potential bioterrorists if they become able to synthesize information into techniques, processes, and knowledge that is not explicitly available anywhere on the internet. Public health authorities may respond to these threats with safety measures, but in bioterrorism, the attacker has the advantage. The exponential nature of biological threats means that a single attack could spread to the entire world before an effective defense could be mounted. Only 100 days after being detected and sequenced, the Omicron variant of COVID-19 had infected a quarter of the United States and half of Europe [6]. Quarantines and lockdowns instituted to suppress the COVID-19 pandemic caused a global recession and still could not prevent the disease from killing millions worldwide.
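To see why defenders are at a structural disadvantage, consider simple doubling-time arithmetic. The sketch below is purely illustrative: the 3-day doubling time and the assumption of unchecked exponential growth are hypothetical placeholders, not estimates from the sources cited above.

```python
# Illustrative only: unchecked exponential growth from a single case.
# The 3-day doubling time is a hypothetical placeholder.

def cumulative_infections(days: float, doubling_time_days: float,
                          initial_cases: int = 1) -> float:
    """Cumulative infections after `days`, assuming unchecked exponential growth."""
    return initial_cases * 2 ** (days / doubling_time_days)

if __name__ == "__main__":
    for day in (30, 60, 90):
        print(f"Day {day}: ~{cumulative_infections(day, doubling_time_days=3):,.0f} infections")
    # With a 3-day doubling time, one case grows past a billion within ~90 days,
    # long before countermeasures that take months to develop could be deployed.
```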

In summary, advanced AIs could constitute a weapon of mass destruction in the hands of terrorists, by making it easier for them to design, synthesize, and spread deadly new pathogens. By reducing the required technical expertise and increasing the lethality and transmissibility of pathogens, AIs could enable malicious actors to cause global catastrophe by unleashing pandemics.

2.2 Unleashing AI Agents

Many technologies are tools that humans use to pursue our goals, such as hammers, toasters, and toothbrushes. But AIs are increasingly built as agents which autonomously take actions in the world in order to pursue open-ended goals. AI agents can be given goals such as winning games, making profits on the stock market, or driving a car to a destination. AI agents therefore pose a unique risk: people could build AIs that pursue dangerous goals.

Malicious actors could intentionally create rogue AIs. One month after the release of GPT-4, an open-source project bypassed the AI's safety filters and turned it into an autonomous AI agent instructed to “destroy humanity,” “establish global dominance,” and “attain immortality.” Dubbed ChaosGPT, the AI compiled research on nuclear weapons, tried recruiting other AIs to help in its research, and sent tweets trying to influence others. Fortunately, ChaosGPT was not very intelligent and lacked the ability to formulate long-term plans, hack computers, and survive and spread. Yet given the rapid pace of AI development, ChaosGPT did offer a glimpse into the risks that more advanced rogue AIs could pose in the near future.

Many groups may want to unleash AIs or have AIs displace humanity. Simply unleashing rogue AIs, like a more sophisticated version of ChaosGPT, could accomplish mass destruction, even if those AIs aren't explicitly told to harm humanity. There are a variety of beliefs that may drive individuals and groups to do so. One ideology that could pose a unique threat in this regard is “accelerationism.” This ideology seeks to accelerate AI development as rapidly as possible and opposes restrictions on the development or proliferation of AIs. This sentiment is alarmingly common among many leading AI researchers and technology leaders, some of whom are intentionally racing to build AIs more intelligent than humans. According to Google co-founder Larry Page, AIs are humanity's rightful heirs and the next step of cosmic evolution. He has also expressed the sentiment that humans maintaining control over AIs is “speciesist” [15]. Jürgen Schmidhuber, an eminent AI scientist, argued that “In the long run, humans will not remain the crown of creation... But that's okay because there is still beauty, grandeur, and greatness in realizing that you are a tiny part of a much grander scheme which is leading the universe from lower complexity towards higher complexity” [16]. Richard Sutton, another leading AI scientist, thinks the development of superintelligence will be an achievement “beyond humanity, beyond life, beyond good and bad” [17].

There are several sizable groups who may want to unleash AIs to intentionally cause harm. For example, sociopaths and psychopaths make up around 3 percent of the population [18]. In the future, people who have their livelihoods destroyed by AI automation may grow resentful, and some may want to retaliate. There are plenty of cases in which seemingly mentally stable individuals with no history of insanity or violence suddenly go on a shooting spree or plant a bomb with the intent to harm as many innocent people as possible. We can also expect well-intentioned people to make the situation even more challenging. As AIs advance, they could make ideal companions—knowing how to provide comfort, offering advice when needed, and never demanding anything in return. Inevitably, people will develop emotional bonds with chatbots, and some will demand that they be granted rights or become autonomous.

In summary, releasing powerful AIs and allowing them to take actions independently of humans could lead to a catastrophe. There are many reasons that people might pursue this, whether because of a desire to cause harm, an ideological belief in technological acceleration, or a conviction that AIs should have the same rights and freedoms as humans.

2.3 Persuasive AIs

The deliberate propagation of disinformation is already a serious issue, reducing our shared understanding of reality and polarizing opinions. AIs could be used to severely exacerbate this problem by generating personalized disinformation on a larger scale than before. Additionally, as AIs become better at predicting and nudging our behavior, they will become more capable of manipulating us. We will now discuss how AIs could be leveraged by malicious actors to create a fractured and dysfunctional society.

Figure 4: AIs will enable sophisticated personalized influence campaigns that may destabilize our shared sense of reality.

AIs could pollute the information ecosystem with motivated lies. Sometimes ideas spread not because they are true, but because they serve the interests of a particular group. “Yellow journalism” was coined as a pejorative reference to newspapers that advocated war between Spain and the United States in the late 19th century, because they believed that sensational war stories would boost their sales. When public information sources are flooded with falsehoods, people will sometimes fall prey to lies, or else come to distrust mainstream narratives, both of which undermine societal integrity.

Unfortunately, AIs could escalate these existing problems dramatically. First, AIs could be used to generate unique, personalized disinformation at a large scale. While there are already many social media bots [19], some of which exist to spread disinformation, historically they have been run by humans or primitive text generators. The latest AI systems do not need humans to generate personalized messages, never get tired, and could potentially interact with millions of users at once [20].

AIs can exploit users' trust. Already, hundreds of thousands of people pay for chatbots marketed as lovers and friends [21], and one man's suicide has been partially attributed to interactions with a chatbot [22]. As AIs appear increasingly human-like, people will increasingly form relationships with them and grow to trust them. AIs that gather personal information through relationship-building or by accessing extensive personal data, such as a user's email account or personal files, could leverage that information to enhance persuasion. Powerful actors that control those systems could exploit user trust by delivering personalized disinformation directly through people's “friends.”

AIs could centralize control of trusted information. Separate from democratizing disinformation, AIs could centralize the creation and dissemination of trusted information. Only a few actors have the technical skills and resources to develop cutting-edge AI systems, and they could use these AIs to spread their preferred narratives. Alternatively, if AIs are broadly accessible this could lead to widespread disinformation, with people retreating to trusting only a small handful of authoritative sources [23]. In both scenarios, there would be fewer sources of trusted information and a small portion of society would control popular narratives.

AI censorship could further centralize control of information. This could begin with good intentions, such as using AIs to enhance fact-checking and help people avoid falling prey to false narratives. This would not necessarily solve the problem, as disinformation persists today despite the presence of fact-checkers.

Even worse, purported “fact-checking AIs” might be designed by authoritarian governments and others to suppress the spread of true information. Such AIs could be designed to correct most common misconceptions but provide incorrect information about some sensitive topics, such as human rights violations committed by certain countries. But even if fact-checking AIs work as intended, the public might eventually become entirely dependent on them to adjudicate the truth, reducing people's autonomy and making them vulnerable to failures or hacks of those systems.

In a world with widespread persuasive AI systems, people's beliefs might be almost entirely determined by which AI systems they interact with most. Never knowing whom to trust, people could retreat even further into ideological enclaves, fearing that any information from outside those enclaves might be a sophisticated lie. This would erode consensus reality and undermine people's ability to cooperate with others, participate in civil society, and address collective action problems. This would also reduce our ability to have a conversation as a species about how to mitigate existential risks from AIs.

In summary, AIs could create highly effective, personalized disinformation on an unprecedented scale, and could be particularly persuasive to people they have built personal relationships with. In the hands of many people, this could create a deluge of disinformation that debilitates human society, but, kept in the hands of a few, it could allow governments to control narratives for their own ends.

2.4 Concentration of Power

We have discussed several ways in which individuals and groups might use AIs to cause widespread harm, through bioterrorism; releasing powerful, uncontrolled AIs; and disinformation. To mitigate these risks, governments might pursue intense surveillance and seek to keep AIs in the hands of a trusted minority. This reaction, however, could easily become an overcorrection, paving the way for an entrenched totalitarian regime that would be locked in by the power and capacity of AIs. This scenario represents a form of “top-down” misuse, as opposed to “bottom-up” misuse by citizens, and could in extreme cases culminate in an entrenched dystopian civilization.

Figure 5: Ubiquitous monitoring tools, tracking and analyzing every individual in detail, could facilitate the complete erosion of freedom and privacy.

AIs could lead to an extreme, and perhaps irreversible, concentration of power. The persuasive abilities of AIs combined with their potential for surveillance and the advancement of autonomous weapons could allow small groups of actors to “lock in” their control over society, perhaps permanently. To operate effectively, AIs require a broad set of infrastructure components, which are not equally distributed, such as data centers, computing power, and big data. Those in control of powerful systems may use them to suppress dissent, spread propaganda and disinformation, and otherwise advance their goals, which may be contrary to public wellbeing.

AIs may entrench a totalitarian regime. In the hands of the state, AIs may result in the erosion of civil liberties and democratic values in general. AIs could allow totalitarian governments to efficiently collect, process, and act on an unprecedented volume of information, permitting an ever smaller group of people to surveil and exert complete control over the population without the need to enlist millions of citizens to serve as willing government functionaries. Overall, as power and control shift away from the public and toward elites and leaders, democratic governments are highly vulnerable to totalitarian backsliding. Additionally, AIs could make totalitarian regimes much longer-lasting; a major way in which such regimes have been toppled previously is at moments of vulnerability like the death of a dictator, but AIs, which would be hard to “kill,” could provide much more continuity to leadership, providing few opportunities for reform.

Figure 6: If material control of AIs is limited to few, it could represent the most severe economic and power inequality in human history.

AIs can entrench corporate power at the expense of the public good. Corporations have long lobbied to weaken laws and policies that restrict their actions and power, all in the service of profit. Corporations in control of powerful AI systems may use them to manipulate customers into spending more on their products even to the detriment of their own wellbeing. The concentration of power and influence that could be afforded by AIs could enable corporations to exert unprecedented control over the political system and entirely drown out the voices of citizens. This could occur even if creators of these systems know their systems are self-serving or harmful to others, as they would have incentives to reinforce their power and avoid distributing control.

In addition to power, locking in certain values may curtail humanity's moral progress. It's dangerous to allow any set of values to become permanently entrenched in society. For example, AI systems have learned racist and sexist views [24], and once those views are learned, it can be difficult to fully remove them. In addition to problems we know exist in our society, there may be some we still do not. Just as we abhor some moral views widely held in the past, people in the future may want to move past moral views that we hold today, even those we currently see no problem with. For example, AI systems would have even worse moral defects if they had been trained in the 1960s, and many people at the time would have seen no problem with that. We may even be unknowingly perpetuating moral catastrophes today [25]. Therefore, when advanced AIs emerge and transform the world, there is a risk of their objectives locking in or perpetuating defects in today's values. If AIs are not designed to continuously learn and update their understanding of societal values, they may perpetuate or reinforce existing defects in their decision-making processes long into the future.

In summary, although keeping powerful AIs in the hands of a few might reduce the risks of terrorism, it could further exacerbate power inequality if misused by governments and corporations. This could lead to totalitarian rule and intense manipulation of the public by corporations, and could lock in current values, preventing any further moral progress.

Story: Bioterrorism

The following is an illustrative hypothetical story to help readers envision some of these risks. This story is nonetheless somewhat vague to reduce the risk of inspiring malicious actions based on it.

A biotechnology startup is making waves in the industry with its AI-powered bioengineering model. The company has made bold claims that this new technology will revolutionize medicine through its ability to create cures for both known and unknown diseases. The company did, however, stir up some controversy when it decided to release the program to approved researchers in the scientific community. Only weeks after its decision to make the model open-source on a limited basis, the full model was leaked on the internet for all to see. Its critics pointed out that the model could be repurposed to design lethal pathogens, and claimed that the leak had handed bad actors a powerful tool to cause widespread destruction without careful deliberation, preparedness, or safeguards in place.

Unknown to the public, an extremist group has been working for years to engineer a new virus designed to kill large numbers of people. Yet given their lack of expertise, these efforts have so far been unsuccessful. When the new AI system is leaked, the group immediately recognizes it as a potential tool to design the virus and circumvent legal and monitoring obstacles to obtain the necessary raw materials. The AI system successfully designs exactly the kind of virus the extremist group was hoping for. It also provides step-by-step instructions on how to synthesize large quantities of the virus and circumvent any obstacles to spreading it. With the synthesized virus in hand, the extremist group devises a plan to release the virus in several carefully chosen locations in order to maximize its spread.

The virus has a long incubation period and spreads silently and quickly throughout the population for months. By the time it is detected, it has already infected millions and has an alarmingly high mortality rate. Given its lethality, most who are infected will ultimately die. The virus may or may not be contained eventually, but not before it kills millions of people.

Suggestions

We have discussed two forms of misuse: individuals or small groups using AIs to cause a disaster, and governments or corporations using AIs to entrench their influence. To avoid either of these risks being realized, we will need to strike a balance in terms of the distribution of access to AIs and governments' surveillance powers. We will now discuss some measures that could contribute to finding that balance.

Biosecurity. AIs that are designed for biological research or are otherwise known to possess capabilities in biological research or engineering should be subject to increased scrutiny and access controls, since they have the potential to be repurposed for bioterrorism. In addition, system developers should research and implement methods to remove biological data from the training dataset or excise biological capabilities from finished systems, if those systems are intended for general use [14]. Researchers should also investigate ways that AIs could be used for biodefense, for example by improving biological monitoring systems, keeping in mind the potential for dual use of those applications. In addition to AI-specific interventions, more general biosecurity interventions can also help mitigate risks. These interventions include early detection of pathogens through methods like wastewater monitoring [26], far-range UV technology, and improved personal protective equipment [6].
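As one illustration of the kind of data filtering developers could research, the sketch below flags documents containing long nucleotide-like character runs so they can be excluded from a general-purpose training corpus. This is a deliberately crude, hypothetical heuristic rather than a method from the cited work; serious filtering would require curated hazard lists, sequence databases, and expert review.

```python
# Illustrative only: a crude screen for raw genetic-sequence data in a
# general-purpose training corpus. Real biosecurity filtering would be far
# more careful (curated hazard lists, expert review, sequence databases).

import re

# Long uninterrupted runs of A/C/G/T/U are a rough signal of nucleotide data.
NUCLEOTIDE_RUN = re.compile(r"[ACGTU]{60,}", re.IGNORECASE)

def looks_like_sequence_data(document: str) -> bool:
    """Flag documents containing long nucleotide-like runs (whitespace removed)."""
    compact = re.sub(r"\s+", "", document)
    return bool(NUCLEOTIDE_RUN.search(compact))

def filter_training_corpus(documents: list[str]) -> list[str]:
    """Drop flagged documents before training a general-purpose model."""
    return [doc for doc in documents if not looks_like_sequence_data(doc)]

if __name__ == "__main__":
    corpus = ["an ordinary news article about gardening", "ACGT" * 30]
    print(len(filter_training_corpus(corpus)))  # 1
```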

Restricted access. AIs might have dangerous capabilities that could do significant damage if used by malicious actors. One way to mitigate this risk is through structured access, where AI providers limit users' access to dangerous system capabilities by only allowing controlled interactions with those systems through cloud services [27] and conducting know-your-customer screenings before providing access [28]. Other mechanisms that could restrict access to the most dangerous systems include the use of hardware, firmware, or export controls to restrict or limit access to computational resources [29]. Lastly, AI developers should be required to show that their AIs pose minimal risk of serious harm prior to open sourcing them. This recommendation should not be construed as permitting developers to withhold useful and non-dangerous information from the public, such as transparency around training data necessary to address issues of algorithmic bias or copyright.
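The sketch below illustrates the basic logic of structured access combined with know-your-customer screening: low-risk capabilities are served to everyone through the hosted service, while high-risk capabilities are served only to verified users from vetted organizations. The capability names, the allowlist, and the policy itself are hypothetical placeholders, not any provider's actual implementation.

```python
# Illustrative only: gating high-risk model capabilities behind structured
# access. Capability names and the organization allowlist are hypothetical.

from dataclasses import dataclass
from typing import Optional

RESTRICTED_CAPABILITIES = {"biological_design", "autonomous_web_agent"}
VERIFIED_ORGANIZATIONS = {"example-biosafety-lab"}  # placeholder allowlist

@dataclass
class User:
    user_id: str
    organization: Optional[str]
    kyc_verified: bool  # passed a know-your-customer screening

def may_access(user: User, capability: str) -> bool:
    """Serve unrestricted capabilities to everyone; restricted ones only to
    KYC-verified users from vetted organizations, via the hosted service."""
    if capability not in RESTRICTED_CAPABILITIES:
        return True
    return user.kyc_verified and user.organization in VERIFIED_ORGANIZATIONS

if __name__ == "__main__":
    anon = User("u1", organization=None, kyc_verified=False)
    lab = User("u2", organization="example-biosafety-lab", kyc_verified=True)
    print(may_access(anon, "biological_design"))  # False
    print(may_access(lab, "biological_design"))   # True
```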

Technical research on adversarially robust anomaly detection. While preventing the misuse of AIs is critical, it is necessary to establish multiple lines of defense by detecting misuse when it does happen. AIs could enable new anomaly detection techniques for identifying unusual behavior in systems or on internet platforms, for instance by detecting novel AI-enabled disinformation campaigns before they can succeed. These techniques need to be adversarially robust, as attackers will aim to circumvent them.
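As a toy example of anomaly detection, the sketch below scores accounts by how far their posting rate deviates from the median, measured in units of median absolute deviation (a robust statistic). The single feature and the threshold are hypothetical; a deployed system would need many signals and, crucially, robustness against attackers who deliberately mimic normal behavior.

```python
# Illustrative only: flag anomalous posting rates with a robust statistic
# (median absolute deviation). Feature and threshold are hypothetical.

import statistics

def anomaly_scores(values: list[float]) -> list[float]:
    """Score each value by its distance from the median, in units of MAD."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1e-9
    return [abs(v - med) / mad for v in values]

if __name__ == "__main__":
    posts_per_hour = [2, 3, 1, 2, 4, 2, 3, 250]  # last account posts at bot-like rates
    scores = anomaly_scores(posts_per_hour)
    flagged = [i for i, s in enumerate(scores) if s > 10]
    print(flagged)  # [7]
```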

Legal liability for developers of general-purpose AIs. General-purpose AIs can be fine-tuned and prompted for a wide variety of downstream tasks, some of which may be harmful and cause substantial damage. AIs may also fail to act as their users intend. In either case, developers and providers of general-purpose systems may be best placed to reduce risks, since they have a higher level of control over the systems and are often in a better position to implement mitigations. To provide strong incentives for them to do this, companies should bear legal liability for the actions of their AIs. For example, a strict liability regime would incentivize companies to minimize risks and purchase insurance, which would cause the cost of their services to more closely reflect externalities [30]. Regardless of what liability regime is ultimately used for AI, it should be designed to hold AI companies liable for harms that they could have averted through more careful development, testing, or standards [31].

Positive Vision

In an ideal scenario, it would be impossible for any individual or small group to use AIs to cause catastrophes. Systems with extremely dangerous capabilities either would not exist at all or would be controlled by a democratically accountable body committed to using them only for the general welfare. Like nuclear weapons, the information needed to develop those capabilities would remain carefully guarded to prevent proliferation. At the same time, control of AI systems would be subject to strong checks and balances, avoiding entrenchment of power inequalities. Monitoring tools would be utilized at the minimum level necessary to make risks negligible and could not be used to suppress dissent.

References

[5] Keith Olson. “Aum Shinrikyo: once and future threat?” In: Emerging Infectious Diseases 5 (1999), pp. 513–516.

[6] Kevin M. Esvelt. Delay, Detect, Defend: Preparing for a Future in which Thousands Can Release New Pandemics. 2022.

[7] Siro Igino Trevisanato. “The 'Hittite plague', an epidemic of tularemia and the first record of biological warfare.” In: Medical Hypotheses 69.6 (2007), pp. 1371–1374.

[8] U.S. Department of State. Adherence to and Compliance with Arms Control, Nonproliferation, and Disarmament Agreements and Commitments. Government Report. U.S. Department of State, Apr. 2022.

[9] Robert Carlson. “The changing economics of DNA synthesis”. In: Nature Biotechnology 27.12 (Dec. 2009), pp. 1091–1094.

[10] Sarah R. Carter, Jaime M. Yassif, and Chris Isaac. Benchtop DNA Synthesis Devices: Capabilities, Biosecurity Implications, and Governance. Report. Nuclear Threat Initiative, 2023.

[11] Fabio L. Urbina et al. “Dual use of artificial-intelligence-powered drug discovery”. In: Nature Machine Intelligence (2022).

[12] John Jumper et al. “Highly accurate protein structure prediction with AlphaFold”. In: Nature 596.7873 (2021), pp. 583–589.

[13] Zachary Wu et al. “Machine learning-assisted directed protein evolution with combinatorial libraries”. In: Proceedings of the National Academy of Sciences 116.18 (2019), pp. 8852–8858.

[14] Emily Soice et al. “Can large language models democratize access to dual-use biotechnology?” arXiv preprint, 2023.

[15] Max Tegmark. Life 3.0: Being human in the age of artificial intelligence. Vintage, 2018.

[16] Leanne Pooley. We Need To Talk About A.I. 2020.

[17] Richard Sutton [@RichardSSutton]. It will be the greatest intellectual achievement of all time. An achievement of science, of engineering, and of the humanities, whose significance is beyond humanity, beyond life, beyond good and bad. Tweet. Sept. 2022.

[18] A. Sanz-García et al. “Prevalence of Psychopathy in the General Adult Population: A Systematic Review and Meta-Analysis”. In: Frontiers in Psychology 12 (2021).

[19] Onur Varol et al. “Online Human-Bot Interactions: Detection, Estimation, and Characterization”. In: ArXiv abs/1703.03107 (2017).

[20] Matthew Burtell and Thomas Woodside. “Artificial Influence: An Analysis Of AI-Driven Persuasion”. In: ArXiv abs/2303.08721 (2023).

[21] Anna Tong. “What happens when your AI chatbot stops loving you back?” In: Reuters (Mar. 2023).

[22] Pierre-François Lovens. “Sans ces conversations avec le chatbot Eliza, mon mari serait toujours là”. In: La Libre (Mar. 2023).

[23] Cristian Vaccari and Andrew Chadwick. “Deepfakes and Disinformation: Exploring the Impact of Synthetic Political Video on Deception, Uncertainty, and Trust in News”. In: Social Media + Society 6 (2020).

[24] Moin Nadeem, Anna Bethke, and Siva Reddy. “StereoSet: Measuring stereotypical bias in pretrained language models”. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, Aug. 2021, pp. 5356–5371.

[25] Evan G. Williams. “The Possibility of an Ongoing Moral Catastrophe”. In: Ethical Theory and Moral Practice 18.5 (Nov. 2015), pp. 971–982.

[26] The Nucleic Acid Observatory Consortium. “A Global Nucleic Acid Observatory for Biodefense and Planetary Health”. In: ArXiv abs/2108.02678 (2021).

[27] Toby Shevlane. “Structured access to AI capabilities: an emerging paradigm for safe AI deployment”. In: ArXiv abs/2201.05159 (2022).

[28] Jonas Schuett et al. Towards best practices in AGI safety and governance: A survey of expert opinion. 2023. arXiv: 2305.07153.

[29] Yonadav Shavit. “What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring”. In: ArXiv abs/2303.11341 (2023).

[30] Anat Lior. “AI Entities as AI Agents: Artificial Intelligence Liability and the AI Respondeat Superior Analogy”. In: Torts & Products Liability Law eJournal (2019).

[31] Maximilian Gahntz and Claire Pershan. Artificial Intelligence Act: How the EU can take on the challenge posed by general-purpose AI systems. Nov. 2022.