Tomáš Gavenčiak — AI Alignment Forum

AI ALIGNMENT FORUM
AF

Apply now to Human-Aligned AI Summer School 2025

Our speaker lineup is now confirmed and features an impressive roster of researchers, see the updated list above.

As of late June, we still have capacity for additional participants, and we'd appreciate your help with outreach! We have accepted a very strong cohort so far, and the summer school will be a great place to learn from our speakers, engage in insightful discussions, and connect with other researchers in AI alignment. In addition to people already familiar with AI alignment, we're particularly interested in reaching technical students and researchers who may be newer to AI alignment but bring strong backgrounds in ML/AI, mathematics, computer science, or related fields. e also have some additional funding to financially support some of the participants.

See the website for details.

Cyborg Periods: There will be multiple AI transitions

Tomáš Gavenčiak3y10

The transitions in more complex, real-world domains may not be as sharp as e.g. in chess, and it would be useful to model and map the resource allocation ratio between AIs and humans in different domains over time. This is likely relatively tractable and would be informative for prediction of future development of the transitions.

While the dynamic would differ between domains (not just the current stage but also the overall trajectory shape), I would expect some common dynamics that would be interesting to explore and model.

A few examples of concrete questions that could be tractable today:

What fraction of costs in quantitative trading is expert analysts and AI-based tools? (incl. their development, but perhaps not including e.g. basic ML-based analytics)
What fraction of costs is already used for AI assistants in coding? (not incl. e.g. integration and testing costs - these automated tools would point to an earlier transition to automation that is not of main interest here)
How large fraction of costs of PR and advertisement agencies is spent on AI, both facing customers and influencing voters? (may incl. e.g. LLM analysis of human sentiment, generating targeted materials, and advanced AI-based behavior models, though a finer line would need to be drawn; I would possibly include experts who operate those AIs if the company would not employ them without using an AI, as they may incur significant part of the cost)

While in many areas the fraction of resources spent on (advanced) AIs is still relatively small, it is ramping up quite quickly and even those may provide informative to study (and develop methodology and metrics for, and create forecasts to calibrate our models).

Cyborg Periods: There will be multiple AI transitions

Tomáš Gavenčiak3y31

Seeing some confusion on whether AI could be strictly stronger than AI+humans: A simple argument there may be that - at least in principle - adding more cognition (e.g. a human) to a system should not make it strictly worse overall. But that seems true only in a very idealized case.

One issue is incorporating human input without losing overall performance even in situation when the human's advice is much wore than the AI's in e.g. 99.9% of the cases (and it may be hard to tell apart the 0.1% reliably).

But more importantly, a good framing here may be the optimal labor cost allocation between AIs and Humans on a given task. E.g. given a budget of $1000 for a project:

Human period: optimal allocation is $1000 to human labor, $0 to AI. (Examples: making physical art/sculpture, some areas of research^[1])
Cyborg period: optimal allocation is something in between, and neither AI nor human optimal component would go to $0 even if their price changed (say) 10-fold. (Though the ratios here may get very skewed at large scale, e.g. in current SotA AI research lab investments into compute.)
AI period: optimal allocation of $1000 to AI resources. Moving the marginal dollar to humans would make the system strictly worse (whether for drop in overall capacity or for noisiness of the human input).^[2]

^{^}
This is still not a very well-formalized definition as even the artists and philosophers already use some weak AIs efficiently in some part of their business, and a boundary needs to be drawn artificially around the core of the project.
^{^}
Although even in AI period with a well-aligned AI, the humans providing their preferences and feedback are a very valuable part of the system. It is not clear to me whether to include this in cyborg or AI period.

Announcing the Alignment of Complex Systems Research Group

Tomáš Gavenčiak4y90

The concept of "interfaces of misalignment" does not mainly point to GovAI-style research here (although it also may serve as a framing for GovAI). The concrete domains separated by the interfaces in the figure above are possibly a bit misleading in that sense:

For me, the "interfaces of misalignment" are generating intuitions about what it means to align a complex system that may not even be self-aligned - rather just one aligning part of it. It is expanding not just the space of solutions, but also the space of meanings of "success". (For example, one extra way to win-lose: consider world trajectories where our preferences are eventually preserved and propagated in a way that we find repugnant now but with a step-by-step endorsed trajectory towards it.)

My critique of the focus on "AI developers" and "one AI" interface in isolation is that we do not really know what the "goal of AI alignment" is, and it works with a very informal and a bit simplistic idea of what aligning AGI means (strawmannable as "not losing right away").

While a broader picture may seem to only make the problem strictly harder (“now you have 2 problems”), it can also bring new views of the problem. Especially, new views of what we actually want and what it means to win (which one could paraphrase as a continuous and multi-dimensional winning/losing space).

The Solomonoff Prior is Malign

Tomáš Gavenčiak5y30

Complexity indeed matters: the universe seems to be bounded in both time and space, so running anything like Solomonoff prior algorithm (in one of its variants) or AIXI may be outright impossible for any non-trivial model. This for me significantly weakens or changes some of the implications.

A Fermi upper bound of the direct Solomonoff/AIXI algorithm trying TMs in the order of increasing complexity: even if checking one TM took one Planck time on one atom, you could only check cca 10^250=2^800 machines within a lifetime of the universe (~10^110 years until Heat death), so the machines you could even look at have description complexity a meager 800 bits.

You could likely speed the greedy search up, but note that most algorithmic speedups do not have a large effect on the exponent (even multiplying the exponent with constants is not very helpful).
Significantly narrowing down the space of TMs to a narrow subclass may help, but then we need to take look at the particular (small) class of TMs rather than have intuitions about all TMs. (And the class would need to be really narrow - see below).
Due to the Church-Turing thesis, any limiting the scope of the search is likely not very effective, as you can embed arbitrary programs (and thus arbitrary complexity) in anything that is strong enough to be a TM interpreter (which the universe is in multiple ways).
It may be hypothetically possible to search for the "right" TMS without examining them individually (witch some future tech, e.g. how sci-fi imagined quantum computing), but if such speedup is possible, any TMs modelling the universe would need to be able to contain this. This would increase any evaluation complexity of the TMs, making them more significantly costly than the Planck time I assumed above (would need a finer Fermi estimate with more complex assumptions?).

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

Posts

Wikitag Contributions

Comments