I've been accepted as a mentor for the next AI Safety Camp. You can apply to work with me on the tiling problem. The goal will be to develop reflectively consistent UDT-inspired decision theories, and try to prove tiling theorems for them.
The deadline for applicants is November 17.
The program will run from January 11 to April 27. It asks for a 10 hour/week commitment.
I am not being funded for this.[1] You can support my work on Patreon.
My project description follows:
The Tiling Agents problem (aka reflective consistency) consists of analysing when one agent (the "predecessor") will choose to deliberately modify another agent (the "successor"). Usually, the predecessor and successor are imagined as the same agent across time, so we are studying self-modification. A set of properties "tiles" if those...
A simple, weak notion of corrigibility is having a "complete" feedback interface. In logical induction terms, I mean the AI trainer can insert any trader into the market. I want to contrast this with "partial" feedback, in which only some propositions get feedback and others ("latent" propositions) form the structured hypotheses which help predict the observable propositions -- for example, RL, where only rewards and sense-data is observed.
(Note: one might think that the ability to inject traders into LI is still "incomplete" because traders can give feedback on the propositions themselves, not on other traders; so the trader weights constitute "latents" being estimated. However, a trader can effectively vote against another trader by computing all that trader's trades and counterbalancing them. Of course, we can also more...
Yeah, I totally agree. This was initially a quick private message to someone, but I thought it was better to post it publicly despite the inadequate explanations. I think the idea deserves a better write-up.
This post is a follow-up to "why assume AGIs will optimize for fixed goals?". I'll assume you've read that one first.
I ended the earlier post by saying:
[A]gents with the "wrapper structure" are inevitably hard to align, in ways that agents without it might not be. An AGI "like me" might be morally uncertain like I am, persuadable through dialogue like I am, etc.
It's very important to know what kind of AIs would or would not have the wrapper structure, because this makes the difference between "inevitable world-ending nightmare" and "we're not the dominant species anymore." The latter would be pretty bad for us too, but there's a difference!
In other words, we should try very hard to avoid creating new superintelligent agents that have the "wrapper structure."
What about...
I continue to think this is a great post. Part of why I think that is that I haven't forgotten it; it keeps circling back into my mind.
Recently this happened and I made a fun connection: What you call wrapper-minds seem similar to what Plato (in The Republic) calls people-with-tyrannical-souls. i.e. people whose minds are organized the way a tyrannical city is organized, with a single desire/individual (or maybe a tiny junta) in total control, and everything else subservient.
I think the concepts aren't exactly the same though -- Plato would have put more e...
Subhash and Josh are co-first authors. Work done as part of the two week research sprint in Neel Nanda’s MATS stream
Fluid interfaces for sensemaking at pace with AI development.
Disclaimer: this is both an invitation and something of a reference doc. I've put the more reference-y material in collapsible sections at the end.
This is an extended rendezvous for creating culture and technology, for those enlivened by reimagining AI-based interfaces.
Specifically, we’re creating for the near future where our infrastructure is embedded with realistic...
We (Connor Leahy, Gabriel Alfour, Chris Scammell, Andrea Miotti, Adam Shimi) have just published The Compendium, which brings together in a single place the most important arguments that drive our models of the AGI race, and what we need to do to avoid catastrophe.
We felt that something like this has been missing from the AI conversation. Most of these points have been shared before, but a “comprehensive worldview” doc has been missing. We’ve tried our best to fill this gap, and welcome feedback and debate about the arguments. The Compendium is a living document, and we’ll keep updating it as we learn more and change our minds.
We would appreciate your feedback, whether or not you agree with us:
From footnote 2 to The state of AI today:
GPT-2 cost an estimated $43,000 to train in 2019; today it is possible to train a 124M parameter GPT-2 for $20 in 90 minutes.
Isn't $43,000 the estimate for the 1.5B replication of GPT-2 rather than for the 124M? If so, this phrasing is somewhat misleading. We only need $250 even for the 1.5B version, but still.
How would you become confident that a UANFSI approach was NFSI?