I wanted to share a recently published paper on mechanisms to decrease the likelihood of ASI being destructive towards living beings in general.

Abstract:

Artificial superintelligent (ASI) agents that will not cause harm to humans or other organisms are central to mitigating a growing contemporary global safety concern as artificially intelligent agents become more sophisticated. We argue that it is not necessary to resort to implementing an explicit theory of ethics, and that doing so may entail intractable difficulties and unacceptable risks. We attempt to provide some insight into the matter by defining a minimal set of boundary conditions intended to prevent aggression towards organisms and potentially capable of decreasing the probability of conflict with synthetic intellects. Our argument starts from causal entropic forces as good general predictors of future action in ASI agents. We reason that maximising future freedom of action implies reducing the amount of repeated computation needed to find good solutions to a large number of problems, for which living systems are good exemplars: a safe ASI should find living organisms intrinsically valuable. We describe empirically bounded ASI agents whose actions are constrained by the character of physical laws and their own evolutionary history as emerging from H. sapiens, conceptually and memetically, if not genetically. Plausible consequences and practical concerns for experimentation are characterised, and implications for life in the universe are discussed.

Our approach attempts to avoid direct implementation of machine ethics by harnessing the concept of causal entropic forces (i.e. macroscale forces with a mechanistic origin that emerge from microscale dynamics and drive a system towards maximising its future freedom of action), and by building a set of boundary conditions for a new class of agents:
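
For readers who have not encountered the term: causal entropic forces were introduced by Wissner-Gross and Freer (2013), and in their formulation (quoted here only as background on the general formalism, not as the exact definition used in our paper) take the form

$$
\mathbf{F}(\mathbf{X}_0, \tau) = T_c \, \nabla_{\mathbf{X}} S_c(\mathbf{X}, \tau)\big|_{\mathbf{X}_0},
\qquad
S_c(\mathbf{X}, \tau) = -k_B \int_{x(t)} \Pr\big(x(t) \mid x(0)\big) \ln \Pr\big(x(t) \mid x(0)\big) \, \mathcal{D}x(t),
$$

where $S_c$ is the entropy of the distribution over feasible paths $x(t)$ of duration $\tau$ through configuration space, $k_B$ is Boltzmann's constant, and $T_c$ is a "causal temperature" setting the strength of the drive towards states from which many distinct futures remain reachable.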

Let us define an empirically bounded ASI agent (EBAA) as a type of superintelligent agent whose behaviour is driven by a set of interlocking causal entropic forces and a minimal set of boundary conditions informed by empirical measurements of its accessible portion of the universe. Its entropic forces and its boundary conditions define and constrain its top-level goal satisfaction process.

These agents attempt to satisfy two goals:

(1) Build predictive empirical explanations for events in the accessible universe that are as sophisticated and as generally applicable as possible.

and

(2) Choose histories that maximise long-term sustained information gain.

We define each boundary condition, as well as auxiliary principles that can help accelerate the search for life-friendly ASI solutions. We also provide a model in connection with the Drake equation and life in the cosmos, as well as reasoning around brain-machine interfaces.
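
To give a concrete, purely illustrative picture of how these pieces fit together, the sketch below shows one way a boundary-constrained, information-gain-driven selection over candidate action histories could be wired up. None of this code is from the paper: the names (`BoundaryCondition`, `no_harm_to_organisms`, `expected_information_gain`) and the stub scoring are placeholders, and it treats the boundary conditions as hard filters over histories, which is only one possible reading of "constrain".

```python
import random
from typing import Callable, List, Sequence

# A boundary condition is modelled here as a hard predicate over candidate
# action histories: histories that violate it are discarded before any
# goal optimisation happens.
BoundaryCondition = Callable[[Sequence[str]], bool]


def no_harm_to_organisms(history: Sequence[str]) -> bool:
    """Toy stand-in for an empirically informed 'no aggression towards organisms' constraint."""
    return not any("harm" in action for action in history)


def expected_information_gain(history: Sequence[str]) -> float:
    """Stub score for goal (2): how much a candidate history is expected to
    improve the agent's predictive explanations. Here it is just a bonus for
    observation-like actions plus a little noise."""
    bonus = sum(1.0 for action in history if action.startswith("observe"))
    return bonus + 0.1 * random.random()


def choose_history(candidates: List[Sequence[str]],
                   constraints: List[BoundaryCondition]) -> Sequence[str]:
    """Filter candidate futures through the boundary conditions, then pick the
    admissible one that maximises expected long-term information gain."""
    admissible = [h for h in candidates if all(c(h) for c in constraints)]
    if not admissible:
        raise RuntimeError("No candidate history satisfies the boundary conditions.")
    return max(admissible, key=expected_information_gain)


if __name__ == "__main__":
    candidates = [
        ("observe ecosystem", "build model"),
        ("harm organism", "extract resources"),
        ("observe stars", "observe ecosystem", "refine model"),
    ]
    print("Selected history:", choose_history(candidates, [no_harm_to_organisms]))
```

In the paper itself, the scoring of histories comes from the predictive-explanation and long-term information-gain machinery of goals (1) and (2) rather than from a toy heuristic; the sketch is only meant to show the boundary conditions acting on the goal satisfaction process rather than being traded off inside it.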

A short teaser from the Conclusions section:

In this paper, we have developed a rigorous speculation around a viable path for the development of safe artificial superintelligence by equating intelligence with a set of embodied, local causal entropic forces that maximise future freedom of action, and by postulating top-level goals, boundary conditions and auxiliary principles rooted in our best understanding of physical laws as safeguards which are likely to remain as ASI agents increase their sophistication.

While it is almost certain that ASI agents will replace these boundary conditions and principles, those provided here appear to have a higher chance of leading to safe solutions for humans and other lifeforms, and to be more directly implementable than the solutions described by research around ethics and critical infrastructure. Our main contention is that constructing ASI agents solely for the sake of human benefit is likely to lead to unexpected and possibly catastrophic consequences, and that the safer scenario is to imbue ASI agents with a desire to experience interactions with very advanced forms of intelligence.

I would be happy to share the complete paper via email with those interested.
