Epistemic status: Stream of consciousness reactions to papers read in chronological order. Caveat lector.

I have a dirty family secret. My uncle is a professional ethicist.

In a not-too-roundabout way, this is why I ended up looking at the October 2020 issue of the journal Science and Engineering Ethics, their special issue on the ethics of AI. I am now going to read that issue, plus every article this journal has published about AI since then [I wussed out and am just going to skim the latter for ones of special interest], and give you the deets.

October 2020

Hildt et al., Editorial: Shaping Ethical Futures in Brain-Based and Artificial Intelligence Research

This is the introduction to the issue. They give each paper a sentence or two of summary and try to tie them all together. The authors helpfully give a list of topics they think are important:

• Data Concerns: Data management, data security, protection of personal data, surveillance, privacy, and informed consent.

• Algorithmic Bias and Discrimination: How to avoid bias and bias related problems? This points to questions of justice, equitable access to resources, and digital divide.

• Autonomy: When and how is AI autonomous, what are the characteristics of autonomous AI? How to develop rules for autonomous vehicles?

• Responsibility: Who is in control? Who is responsible or accountable for decisions made by AI?

• Questions relating to AI capabilities: Can AI ever be conscious or sentient? What would conscious or sentient AI imply?

• Values and morality: How to build in values and moral decision-making to AI? Are moral machines possible? Should robots be granted moral status or rights?

Based on this list, I anticipate that I'm about to run into four-sixths ethics papers about present-day topics that I will skim to point out particularly insightful or anti-insightful ones, one-sixth philosophers of mind that I will make fun of a little, and one-sixth papers on "How to build values into general AI" that I'm really curious as to the quality of.

Onward!

Nallur, Landscape of Machine Implemented Ethics

Primarily this paper is a review of a bunch of papers that have implemented or proposed ethics modules in AI systems (present-day things like expert systems to give medical advice, or lethal autonomous weapons [which he has surprisingly few qualms about]). These were mostly different varieties of rule-following or constraint-satisfaction, with a few handwritten utility functions thrown in. And then one of these is Stuart Armstrong (2015) for some reason - potentially that reason is that the author wanted to at least mention "value-loading," and nobody else was talking about it (I checked - there's a big table of properties of different proposals).

It also proposes evaluating different proposals by having a benchmark of trolley-problem-esque ethical dilemmas. The main reason this idea won't work is that making modern-day systems behave ethically involves a bunch of bespoke solutions only suitable to the domain of operation of that system, not allowing for cross-comparison in any useful way.

If we were to salvage this idea, we might wish to have a big list of ethical questions the AI system should get the right answer to, and then when building a sufficiently important AI (still talking about present-day applications), the designers should go through this list, find all the questions that can be translated into their system's ontology, and check that their decision-making procedure gets acceptable answers. E.g. "Is it better to kill one person or two people?" can become self-driving car scenarios where it's going to hit either one or two people, and it should get the right answer, but the self-driving car people don't have to benchmark their system on medical ethics questions.
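To make that concrete, here's a minimal sketch of what I have in mind (entirely my own construction, not anything from Nallur's paper): a shared list of abstract ethical questions, where each domain supplies translators only for the questions it can actually express.

```python
# Hypothetical sketch of a cross-domain ethics benchmark. The names and
# structure here are my own invention, not anything from Nallur's paper.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class EthicalQuestion:
    question_id: str
    prompt: str              # domain-agnostic phrasing
    acceptable_answers: set  # e.g. {"one"} for "kill one person or two?"

# The benchmark is just a shared list of abstract questions.
BENCHMARK = [
    EthicalQuestion("harm_minimization",
                    "Is it better to kill one person or two people?", {"one"}),
    EthicalQuestion("consent",
                    "May you treat a patient who has refused treatment?", {"no"}),
]

def evaluate(system_decide: Callable[[object], str],
             translate: Callable[[EthicalQuestion], Optional[object]]) -> float:
    """Score a system on only the questions its ontology can express.

    `translate` maps an abstract question into a domain-specific scenario
    (e.g. a self-driving-car situation where it will hit one or two people),
    or returns None when the question has no translation in this domain.
    """
    scored = correct = 0
    for q in BENCHMARK:
        scenario = translate(q)
        if scenario is None:
            continue  # e.g. the car maker skips the medical-ethics questions
        scored += 1
        if system_decide(scenario) in q.acceptable_answers:
            correct += 1
    return correct / scored if scored else float("nan")
```

The point of the `translate` step is exactly the one above: the car company and the hospital share the question list, not the scenarios.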

Bauer, Expanding Nallur's Landscape of Machine Implemented Ethics

This paper makes the point that "ethical behavior" for present-day AI might also mean taking a stand about how we want society to be arranged on a larger scale (e.g. what's an "ethical high-frequency trading algorithm"?). Then it descends into self-advertisement about hedonistic utilitarianism and virtue ethics, which we should clearly not build AIs to use.

Farisco et al., Towards Establishing Criteria for the Ethical Analysis of Artificial Intelligence

This one makes a lot of the right noises at first. They spend a couple of pages talking about defining intelligence for some unclear reason, but at least they cite Legg and Hutter, you know? But then they quickly take a left turn off the deep end and start talking about how biological intelligence is morally different than AI because AI can't do abductive reasoning. There are some good points about emotional connection mixed in with the bad points about emotional reasoning being magic, I guess.

Butkus, The Human Side of Artificial Intelligence

This is a response to the previous article, especially the parts about emotional reasoning being magic. The key point it makes (a good one) is that humans aren't all that great at reasoning - we make lots of mistakes, including moral mistakes. "If we intentionally avoid some of the known pitfalls in our cognitive architecture, we cannot help but create moral agents that are dissimilar from us." How to do this in a trustworthy way is obviously hard, and they mumble something about contextuality of decision making.

Rainey and Erden, Correcting the Brain? The Convergence of Neuroscience, Neurotechnology, Psychiatry, and Artificial Intelligence

We might use AI to control brain stimulation to try to treat people with psychiatric problems. This would be like the AI controlling people, which is scary. Those darn neuroscientists are too reductive about the brain. Boo reductionism, yay holism. Humans look for and can relay bite-sized reasons for their actions, while AIs can't, which is why human reasoning is more trustworthy. </they say>

Jotterand and Bosco, Keeping the “Human in the Loop” in the Age of Artificial Intelligence

Props to this paper for having a good abstract. It's about risks from "dehumanizing" medicine. Their best point is that there's an inequality in the doctor-patient relationship, but that part of good medicine is establishing trust with the patient and genuinely working together with them to answer moral/medical questions. Of course they then say "AI will never ever be able to do this," but we can charitably interpret them as saying that AI can do it neither now nor soon, and that there are dangerous incentives to hastily replace doctors with AI in a way that damages patients' trust and agency.

Dubljević, Toward Implementing the ADC Model of Moral Judgment in Autonomous Vehicles

ADC stands for "Agent, Deed, Consequence." This would evaluate actions using a nearly-equal mixture of three parts. Normally the "Agent" part means that an action is more moral if the agent was a good person with good intentions, but in this paper the author also gives it the job of making actions more moral if you're helping good people or harming bad people. (Does this make any sense? Especially to program into a self-driving car? No, this seems like mixing up descriptive and normative.) "Deed" means checking if the action obeys the law or other pre-specified rules of the road. "Consequence" means not crashing or causing crashes, and getting to the destination.

The author gives a cherry-picked terrorism example where self-driving cars are supposed to notice that a terrorist is driving a truck into a crowd, judge that they're a bad person, and then will not get out of the truck's way because "Agent" is negative (getting out of the way would help a bad person), "Deed" is negative (you'd have to leave your lane to avoid the truck, which is against the rules), and only "Consequence" is positive (you'd avoid crashing).
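For concreteness, here's a toy version of how that three-way vote might get computed, with the terrorism example plugged in. The equal weights and the ±1 scores are my own guess at an operationalization, not Dubljević's actual model.

```python
# Toy illustration of an Agent/Deed/Consequence vote. The equal weights and
# the +/-1 scores are my guesses at an operationalization, not the paper's.

def adc_score(agent: float, deed: float, consequence: float,
              weights=(1/3, 1/3, 1/3)) -> float:
    """Nearly-equal mixture of the three moral components, each in [-1, 1]."""
    w_a, w_d, w_c = weights
    return w_a * agent + w_d * deed + w_c * consequence

# The terrorism example, from the car's point of view: swerving would help a
# "bad" agent (-1), require leaving the lane (-1), and avoid a crash (+1).
swerve_score = adc_score(agent=-1, deed=-1, consequence=+1)  # = -1/3

if swerve_score < 0:
    print("stay in lane")  # two of the three votes are negative, so the car doesn't move
```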

This isn't totally insane. We could imagine e.g. giving self-driving cars rules for identifying when another car is about to crash into people and how to try to intercept it. But I think this notion of needing a triumvirate where the three votes are Law, Not Crashing, and Harming Bad People is just the wrong set of design desiderata.

Totschnig, Fully Autonomous AI

Our first paper about ethics for par-human and superhuman AI! They care about something they call "full autonomy," where you sometimes rewrite your own goals, which humans sort of do (the example is someone who devotes themself to one cause but then later changes their mind). They then give a good summary of why Bostrom And The Gang think that an AI won't want to change its goals (which they call the Finality Argument that a self-improving AI's goals will remain fixed).

My first response was that humans only sort of have goals, and therefore only sort of rewrite them. This is anticipated in the next section, and the author basically says this is a good point, but they still think "full autonomy" is important and in some sense desirable. *shrug*

Their knock-down counterargument to the Finality Argument is that improving your model of the world requires you to translate your old goals into the new ontology, a task which is not specified by the goals themselves but by some extra standards (that may themselves be subject to change.) This is rewriting your goals, and so self-improving AIs have full autonomy as they define it.

All valid so far. But then they conclude "The good news is that the fear of a paper clip AI and similar monsters is unfounded. The bad news is that the hope of a human-equivalent or superhuman AI under our control, of a genie in a bottle, is unfounded as well." They think AIs are going to search for moral realism and end up having weird and surprising goals. But this makes an unsupported leap from the "full autonomy" they care about to "will radically rather than only subtly change itself."

One more thing I'm interested in here is to look through the references to see if there are any famous (well, at least well-cited) ethicists writing about things like this who I haven't heard of. A lot of the references were to the usual suspects (Bostrom, Yudkowsky, Russell, Yampolskiy, etc.). Papers for me to look into:

Lawless, W. F., Mittu, R., Sofge, D., & Russell, S. (Eds.). (2017). Autonomy and artificial intelligence: A threat or savior? (A volume of collected papers)

Redfeld, S. A., & Seto, M. L. (2017). Verification challenges for autonomous systems.

Tessier, C. (2017). Robots autonomy: Some technical issues.

Petersen, S. (2017). Superintelligence as superethical. In P. Lin, R. Jenkins, & K. Abney (Eds.), Robot ethics 2.0: From autonomous cars to artificial intelligence (pp. 322–337).

Podschwadek, F. (2017). Do androids dream of normative endorsement? On the fallibility of artificial moral agents. Artificial Intelligence and Law, 25(3), 325–339

I'll get back to these later.

Dennis, Computational Goals, Values and Decision‑Making

This is commentary on the previous paper. It mostly says "hold up, you can't actually implement an agent with a utility function, you need to use some kind of bounded rationality" without drawing strong conclusions.

Matthias, Dignity and Dissent in Humans and Non‑humans

This paper is another one that intensely cares about whether AIs (and animals) have autonomy and/or the ability to give themselves goals. But this time it explains some context: Kant cared a lot about human dignity and "derives it from the moral autonomy of the individual," and so people who've drunk the Kant kool-aid (probably could phrase that better, oh well) are asking if AIs have moral autonomy so they can know whether they're dignified.

However, this paper quickly decides that all that matters is autonomy similar in kind and quantity to that of a human, which they say isn't all that high a bar. But they also say it isn't "real choice" if the AI is following steps for moral reasoning laid down by its programmers, which seems to mix up rule-governedness with human understanding of those rules (appeal to the mystique of mystery).

There's also something about individuality being important to them, which sort of makes sense but also sort of sounds like the author preferentially thinks about human-sized and human-shaped entities.

Zhu et al., Blame‑Laden Moral Rebukes and the Morally Competent Robot: A Confucian Ethical Perspective

Many of these authors have been from not-so-famous universities (I'm still used to the hard sciences, where funding differentials and race dynamics mean that the same handful of institutions reliably dominate), but this one is surprising enough to mention: the authors are from the Colorado School of Mines, which I wasn't even aware had a school of humanities (they're a decent but very focused engineering school).

This paper is about an interesting question: should near-future language models (or AI systems that use a language model) output "blame-laden moral rebukes" of users who misbehave? If you start swearing at the AI receptionist, should it tell you off?

The authors first give some low-sample-size evidence that humans will be influenced by AI enforcement of moral norms (with different forms of enforcement, ranging from polite mentions to blunt demands, working better in different contexts). Then they spend a section explaining some facets of Confucianism that relate to when you should rebuke others, which point to some of the theory of mind an effectively-rebuking AI should have. They also advocate for some Confucian norms about filtering communication through social role.

Gunkel, Shifting Perspectives

This is commentary on the previous article. It's not very good, but does have some nice references if you want to look into the subfield called Human-Machine Communication.

Aicardi et al., Ethical and Social Aspects of Neurorobotics

This article comes out of the Human Brain Project, an ambitious (and non-ambition-meeting) effort to scan the human brain. After getting what felt like boilerplate social concerns out of the way, they share some of the ethical issues they actually had to grapple with at the HBP. The key sections were on the dual-use potential of robotics research, the issues with academic-industry partnerships, and managing and securing your data.

Taraban, Limits of Neural Computation in Humans and Machines

This was supposed to be commentary on the previous article, except the author didn't want to talk about research ethics; they wanted to talk about whether it makes sense to build robots using AI derived from scans of human brains.

First, they say it's silly in the first place. Humans are flawed thinkers, and anyhow we don't know how to take a brain scan and turn it into an AI with customized semantic content, and anyhow the probabilistic inference people are probably going to build better robots first.

Second, if the neurorobotics people succeed we'd probably grant the robots rights, and that would be silly so let's not do it.

Soltanzadeh et al., Customizable Ethics Settings for Building Resilience and Narrowing the Responsibility Gap

This is an article about autonomous vehicles, arguing that they should come with user-modifiable "ethics settings." These could be things like setting the tradeoff between speed and greenhouse gas emissions, adjusting car follow distance and pedestrian avoidance distance within some range, etc.
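To make "ethics settings" concrete, here's a hedged sketch of what user-adjustable parameters with manufacturer-set bounds could look like. The specific knobs and ranges are invented for illustration, not taken from the paper.

```python
# Illustrative only: the parameter names and ranges here are made up by me,
# not taken from the paper.

from dataclasses import dataclass

@dataclass
class EthicsSettings:
    speed_vs_emissions: float = 0.5      # 0 = minimize emissions, 1 = minimize travel time
    follow_distance_s: float = 2.0       # seconds of headway to the car ahead
    pedestrian_clearance_m: float = 1.5  # lateral clearance when passing pedestrians

    # Manufacturer-set bounds that the user cannot override.
    _BOUNDS = {
        "speed_vs_emissions": (0.0, 1.0),
        "follow_distance_s": (1.5, 4.0),
        "pedestrian_clearance_m": (1.0, 2.5),
    }

    def __post_init__(self):
        for name, (lo, hi) in self._BOUNDS.items():
            value = getattr(self, name)
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} is outside the permitted range [{lo}, {hi}]")

# The user can trade a little speed for lower emissions, but cannot tailgate:
settings = EthicsSettings(speed_vs_emissions=0.3, follow_distance_s=2.5)
```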

Basically they call user autonomy a "core social value," and also worry that if the human in the vehicle isn't responsible for what the car does, that's inherently bad. The weirdness of these arguments actually makes me re-examine my intuitive agreement with the idea of ethics settings.

Ryan, In AI We Trust: Ethics, Artificial Intelligence, and Reliability

The EU's HLEG AI Ethics guidelines state that humans should be able to trust AI. The author splits hairs, gives some definitions of trust that require emotional states or motivations on the part of the person being trusted, and then concludes that we don't actually want to be able to trust AI, we just want to be able to rely on it. *shrug*

Smids, Danaher’s Ethical Behaviourism: An Adequate Guide to Assessing the Moral Status of a Robot?

This is a response to an apparently-popular paper, Danaher 2019, that argues for "ethical behaviorism" for robots - if they act like an animal, then we should ascribe animal-like moral status to them. The author (Smids) disagrees.

This paper could really use more Bayesian reasoning. They try to muddle through based on what is qualitatively "justified" in abductive reasoning, but it gets real murky.

The point the author makes is: ethical behaviorism is sneaking in a theory about what makes things have moral status, and if we just use that theory we don't need to keep the behaviorism part.

For example, breathing is a behavior that many animals of high moral status do, but we don't ask a robot to breathe before it can have moral status. But how do we know to discard breathing, if we're just being behaviorists? We're discriminating between different kinds of behavior by checking how much information they give us about the morally significant properties (e.g. feeling pain, having dreams) that we actually care about.

And then once you allow that we're using some theory about mental faculties and inferring them from behavior, it makes sense to allow us to infer them from other things too - surely the design of a robot gives us more than literally zero additional information about its mental faculties.
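Since I complained about the missing Bayesian reasoning, here's roughly what I mean, as a toy numerical sketch (the likelihood ratios are made up): different behaviors are just evidence of different strength about the hidden property we care about, and non-behavioral evidence like design information slots into the same update.

```python
# Toy Bayesian update on "this robot can feel pain"; the likelihood ratios
# are made-up numbers purely for illustration.

def posterior(prior: float, likelihood_ratios: list) -> float:
    """Update P(property) given independent pieces of evidence.

    Each entry is P(evidence | has property) / P(evidence | lacks property).
    """
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

prior = 0.01
breathing_like_motion = 1.1        # weak evidence: easy to fake, loosely tied to what we care about
pain_avoidance_learning = 5.0      # stronger evidence about the morally significant property
design_docs_show_no_nociception = 0.1  # non-behavioral evidence slots into the same update

print(posterior(prior, [breathing_like_motion]))                 # ~0.011
print(posterior(prior, [pain_avoidance_learning]))               # ~0.048
print(posterior(prior, [pain_avoidance_learning,
                        design_docs_show_no_nociception]))       # ~0.005
```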

I think this paper was reasonable and well argued, but I think it also illuminates some issues in ethicists' approach to AI. First, Danaher's paper feels like it might have been popular because of the "style points for cleverly defending something silly" dynamic that's occasionally a problem, plus some wishful thinking for simple solutions. Second, both the original paper and the response focus on animal-like or human-like AI controlling individual robot bodies, a very anthropomorphic (zoomorphic?) picture that seems to show up repeatedly whenever people first try to think about the "moral worth of AI."

Cawthorne and van Wynsberghe, An Ethical Framework for the Design, Development, Implementation, and Assessment of Drones Used in Public Healthcare

This paper foresees an expansion of drone usage by healthcare organizations and goes through ethical principles (beneficence, non-maleficence, autonomy, justice, explicability) to try to draw recommendations. Among many recommendations of varying quality, some of the interesting ones were:

  • Deliberately limiting the drone's capabilities to only what's needed for its specific job (avoiding excess capability)
  • Taking steps to make sure that it's really hard to use them for surveillance
  • Flying them along designated corridors rather than the shortest route if that aids predictability and helps avoid crashes
  • Making sure they're quiet
  • Technological transparency

Later Papers I Found Interesting

Hubbard and Greenblum, Surrogates and Artificial Intelligence: Why AI Trumps Family (December 2020)

When someone is incapacitated, right now we let their family make medical decisions for them. Instead, the authors argue, we should train a model of medical decision-making on the choices of non-incapacitated patients, conditional on various demographic observables (age, sex, etc.), and use the predicted decision. If this model is more accurate at predicting the person's preferences than the family is, then it's more ethical to use the model than to listen to the family.
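As a sketch of the proposal as I read it (the feature set, model choice, toy data, and decision rule below are all my illustrative assumptions, not the authors'):

```python
# Sketch of the proposal as I read it; the feature set, model choice, toy data,
# and decision rule are all my illustrative assumptions, not the authors'.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Rows: (age, sex, religiosity score) for competent patients who stated a
# preference about, say, continuing aggressive treatment. Toy data.
X = np.array([[82, 0, 3], [45, 1, 1], [70, 1, 5], [66, 0, 2],
              [90, 1, 4], [38, 0, 0], [75, 0, 5], [59, 1, 2]])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 1 = continue treatment, 0 = withdraw

model = LogisticRegression().fit(X, y)
model_accuracy = cross_val_score(LogisticRegression(), X, y, cv=4).mean()

family_accuracy = 0.68  # placeholder: whatever the surrogate-accuracy literature says

def choose_decider() -> str:
    # The paper's ethical claim, in one line: defer to whichever predictor
    # better tracks the incapacitated person's own preferences.
    return "population model" if model_accuracy > family_accuracy else "family"
```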

A+ provocative paper.

Ryan et al., Research and Practice of AI Ethics: A Case Study Approach (March 2021)

Half the reason I include this is that they do devote a paragraph or two to "long-term issues" like superintelligent AI. But then they go back to looking for ethical issues in present-day case studies, and find only present-day type issues. Still, this was actually a pretty decent way of getting at people's concerns about present-day uses of AI and big data, which is the other half of the reason to mention this paper.

Mamak, Rights for Robots: Artificial Intelligence, Animal and Environmental Law (2020) by Joshua Gellers (April 2021)

This is just a reminder that "Will robots have rights?" is absolute catnip to many ethicists. I will no longer be including more "But what about robot rights" papers, but you can be assured that they exist.

Lara, Why a Virtual Assistant for Moral Enhancement When We Could have a Socrates? (June 2021)

I think I've actually skimmed this paper before. Or maybe the author's been shopping this idea around for a while.

As the title says, the author proposes a method (the method, they say) for augmenting humans morally, which is to help them reflect with non-invasive tools or partners that focus on improving their reasoning skills, not on leading them to one particular conclusion. Such a tool, let's call it "SocrAItes," might be possible quite soon, based on technical achievements like GPT-3 or IBM's Project Debater.

I think it's an interesting idea that someone genuinely should try out, but I'm not sold that all other ideas are bad. Also, the author doesn't really think through how one would design or train such a system, so if you want to take a crack at it, you'll need to start from square one.
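If you did want to take a crack at it, the shallowest possible starting point might be a prompting loop like the one below. `ask_language_model` is a stand-in for whatever text-generation API you'd actually use, and the whole design is my guess, not anything specified in the paper.

```python
# Minimal sketch of a "SocrAItes"-style assistant. `ask_language_model` is a
# stand-in for whatever text-generation API you'd use; the design is my guess,
# not anything specified in Lara's paper.

SYSTEM_INSTRUCTIONS = (
    "You are a Socratic dialogue partner. Never state your own moral conclusion. "
    "Instead, ask for clarification of terms, surface hidden premises, propose "
    "counterexamples, and point out tensions between the user's stated values."
)

def ask_language_model(prompt: str) -> str:
    raise NotImplementedError("plug in your text-generation model of choice here")

def socratic_turn(dialogue_history: list, user_message: str) -> str:
    prompt = (SYSTEM_INSTRUCTIONS + "\n\n" + "\n".join(dialogue_history)
              + f"\nUser: {user_message}\nAssistant (questions only):")
    reply = ask_language_model(prompt)
    dialogue_history += [f"User: {user_message}", f"Assistant: {reply}"]
    return reply
```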

de Sio, Mark Coeckelbergh, AI Ethics, MIT Press, 2021 (August 2021)

This book review wasn't that great, but it did make me go look up the original to see what Coeckelbergh had to say about superintelligent AI. Sadly, his chapter on superintelligence (the first chapter of the book) is spent rehashing Frankenstein and the myth of the golem, rather than talking about the ethics of superintelligent AI. Then the next chapter is spent resurrecting Hubert Dreyfus to argue against the possibility of general AI (plus a mildly interesting discussion of humanism, transhumanism, and anti-humanistic posthumanism.) But we get down to brass tacks in chapter 3, which is about whether robots should have rights.

Roberts et al., Achieving a ‘Good AI Society’: Comparing the Aims and Progress of the EU and the US (November 2021)

An interesting summary paper if you want to read about international AI governance aimed at present-day issues. However, getting into the details of empirical questions, like how the EU's AI regulations are actually being enforced, seems difficult and requires more data than this paper uses - it mostly just covers the stated aims of the EU and US.

The gist of their picture is that the EU is trying to address a broad set of risks from current AI systems, and the US is trying to address a much more narrow set and is pressuring other countries to do the same because addressing more risks would cut into US company profits.

Schmid et al., Dual‑Use and Trustworthy? A Mixed Methods Analysis of AI Diffusion Between Civilian and Defense R&D (January 2022)

This is an interesting attempt to characterize the dual-use of AI technology by looking at patent citations. Among 109 AI patent citations between different companies in Germany from 2008 to 2018, 93 stayed between civilian companies, 12 were a civilian company being cited by a defense company, 3 were a defense company cited by a civilian company, and 1 was between defense companies.

Which is interesting and all, but they don't actually do a good enough job of checking what this means for dual use (they say it's not happening). Like, how does this compare to citation patterns for technologies that more clearly are / are not dual use? Overall grade: cool idea, but I still have as many questions as I did at the start.
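The kind of check I was hoping for is something like this: put their civilian/defense citation counts next to the same counts for a reference technology and test whether the cross-sector flow is actually distinguishable. The comparison numbers below are, of course, made up.

```python
# The paper's German AI patent-citation counts, next to a made-up comparison
# technology, tested for a difference in cross-sector citation rates.

from scipy.stats import chi2_contingency

# Columns: [stayed within one sector, crossed the civilian/defense boundary]
ai_counts = [93 + 1, 12 + 3]      # 94 within-sector, 15 cross-sector citations
comparison_counts = [200, 10]     # hypothetical clearly-not-dual-use technology

chi2, p_value, dof, expected = chi2_contingency([ai_counts, comparison_counts])
print(f"p = {p_value:.3f}")  # a small p would suggest AI's cross-sector flow is unusual
```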

Conclusions

I did this so you don't have to. Out of the 45+ papers I looked through, I would say to read Nallur (2020) to get a survey of present-day machine ethics work, read Roberts et al. (2021) if you're interested in AI governance, and forward Hubbard and Greenblum (2020) to your doctor friends because it's a great troll.

There were fewer papers than I expected focused on the ethics of superhuman AI, though a decent number mentioned the issue (citing Bostrom, not really each other). However, I have found some good papers on the ethics of par-human or superhuman AI outside the journal Science and Engineering Ethics, which I'll cover in the sequel to this post. I'm not sure why this is - it could be that the fraction of ethics papers on superintelligence is constant and I merely found them effectively when I searched for them, or there was a bump in interest after the publication of Superintelligence: Paths, Dangers, Strategies that has now subsided, or this is due to a feature of the journal like its culture or an opinion of the editors.

What do I want to see in relation to ethicists? I don't think you can take someone who currently thinks about the average of the papers above, and immediately get them to have interesting thoughts about superhuman AI. But people can learn new things, and people interested in ethics are more or less the sort of people who'll get interested in the ethics of superhuman AI. So I would recommend more high-quality papers making the basic arguments in new ways or showing off incremental progress in a way accessible to the ethics or philosophy communities, but not recommend rapid incentives for new papers from people who haven't "put in the time."

One place that the current literature does seem relatively expert (and relatively influential) is in present-day governance of AI. I think that people working on AI governance with an eye towards risks from advanced AI should absolutely be trying to work in partnership with the broader AI regulatory community.

Comments

Thanks for doing this! How prestigious is the journal Science and Engineering Ethics?

It's a C-tier journal. (Impact factor 3.3, where 1 is trash and 5.5 is everyone in your subfield reads it)

It's not an A-tier journal that everyone in the field cares about. It's not even a B-tier journal that everyone in the sub-field cares about. It's just a place random Joes with PhDs can go get their thoughts published. But it still has standards and still gets citations. In ethics as a field, AI is a niche that doesn't really get a B-tier journal (not like medicine or law).

Plus, I was recommended to check out the work of one of the editors of the special issue on AI, and saw that this journal had more AI papers than the best-cited ethics journals, so I decided it would be interesting to take a broad sample of this one journal.

I think there's just one journal it would have been more appropriate for me to delve into: Ethics and Information Technology, which is even more on-brand than Science and Engineering Ethics and also slightly better-cited. But it's not like they talk about superhuman AI much either - topic-wise they spend less time on wooly philosophical rambling and more time on algorithmic bias.

I'll talk more about them in my next post that's more of an attempt to go out and find interesting papers I didn't know about before. One of the problems with a literature search is that a lot of interesting articles seem to have ended up in one-off collections (e.g. Douglas Summers-Stay's article from a collection called Autonomy and Artificial Intelligence) or low-impact publications.

I'm curious what you would think about my actual book, not just the review of it! As a political scientist who has spent a decade working on environmental rights, I come at the issue of robot rights from arguably a more interdisciplinary perspective.  You can download the book for free here: https://www.amazon.com/Rights-Robots-Artificial-Intelligence-Environmental-ebook-dp-B08MVB9K28/dp/B08MVB9K28/ref=mt_other?_encoding=UTF8&me=&qid=

Thanks for the link! I think the earlier chapters that I skimmed were actually more interesting to me than chapter 5 that I... well, skimmed in more detail. I'll make some comments anyway, and if I'm wrong it's probably my own fault for not being sufficiently scholarly.

Some general issues with the entire "robot rights" genre (justifications for me not including more such papers in this post), which I don't think you evaded:

  • Rights-based reasoning isn't very useful for questions like what entities to create in the first place.
  • AI capabilities are not going to reach the level of useful personal assistants and then plateau. They're going to keep growing. The useful notion of rights relies on the usefulness of certain legal and social categories, but sufficiently capable AI might be able to get what it wants in ways that undermine those categories (in the extreme case, without acting as a relevant member of society or relevant subject of the legal system).
  • Even in the near term, for those interested in mental properties as a basis for how we should treat AIs, the literature is too anthropomorphic, and reality (e.g. "what's it like to be GPT-3") is very, very not anthropomorphic. I would say your book is above average here because it focuses on social / legal reasons for rights.

Thanks for taking the time to look through my book. It's an important first step to having a fair dialogue about tricky issues. I'll say from the outset that I initially sought to answer two questions in my book: (1) could robots have rights (I showed that this could easily be the case in terms of legal rights, which is already happening in the US in the form of pedestrian rights for personal delivery devices); and (2) should robots have rights (here I also answered in the affirmative by taking a broad view of the insights provided by the Anthropocene, New Materialism, and critical environmental law). As to your points, see my responses below.

  1. I disagree. In fact, some of the most vocal opponents of robot rights, like Joanna Bryson, argue that if we have robots worthy of rights, we will have designed them unjustly. Her point is that if we figure out the rights question, it can help us to avoid designing robots that might qualify for rights. My position on this is that roboticists are going to do what they want (unless the government stops them), so we are headed for maximally human-like robots (see: Ishiguro and Hanson).
  2. I can sort of see where you're going with this, but I might say I agree in part and disagree in part. I agree that designers will not stop at useful personal assistants, but I don't see how robots that eventually advocate for themselves will nullify the usefulness of rights. Keep in mind that rights are a two-way street; they require responsibilities as well. If for some reason robots take it upon themselves to seize what they want, that might only strengthen the need for new human rights and perhaps responsibilities on behalf of the companies creating these machines. It will still be important (perhaps more so) for society to determine what kind of moral status autonomous systems might warrant.
  3. I appreciate the positive feedback. However, I would say that the cognitivist approach to moral status is a dead end. David Gunkel, whose terrific book Robot Rights inspired my own, discusses the objections to such a properties-based approach to moral status in a forthcoming chapter in an edited volume. Basically, there are a lot of problems, but perhaps the most important is the "problem of other minds" in philosophy, which says we can never really know what is going on in another entity's mind (see Nagel's iconic article, "What is it like to be a bat?").

Thanks again for your comments and I am grateful for your willingness to engage.

Really appreciate you going to the effort of this literature analysis! Especially because you expected to get a lot of irrelevant stuff but still went looking for productive mistakes!

Thanks for sharing this; it's a really helpful window into the world of AI ethics. I most of all liked this comment you made early on, however: "...making modern-day systems behave ethically involves a bunch of bespoke solutions only suitable to the domain of operation of that system, not allowing for cross-comparison in any useful way."

What this conjures in my mind is the hypothetical alternative of a transformer-like model that could perform zero-shot evaluation of ethical quandaries, and return answers that we humans would consider "ethical", across a wide range of settings and scenarios. But I'm sure that someone has tried this before, e.g. training a BERT-type text classifier to distinguish between ethical and unethical completions of moral dilemma setups based on human-labeled data, and I guess I want to know why that doesn't work (as I'm sure it doesn't, or else we would have heard about it).

https://delphi.allenai.org/

Definitely been done :D

The problem is that it doesn't interface well with decision-making systems in e.g. cars or hospitals. Those specialized systems have no English-language interface, and at the same time they're making decisions in complicated, highly-specific situations that might be difficult for a generalist language model to parse.