AI ALIGNMENT FORUM

Chi Nguyen

Working independently on making AI systems reason safely about decision theory and acausal interactions, collaborating with Caspar Oesterheld and Emery Cooper.

I only check LessWrong infrequently. If you wanna get in touch, you can use my admonymous link and leave your email address there so I can reply to you! (If you don't include some contact details in your message, I can't reply.)

You can also just send me thoughts and questions anonymously!

How others can help me

Be interested in working on/implementing ideas from research on acausal cooperation! Or connect me with people who might be.

How I can help others

Ask me about acausal stuff!

Or ask about any of my background: Before doing independent research, I worked for the Center on Long-Term Risk on s-risk reduction projects (hiring, community building, and grantmaking). Previously, I was a guest manager at the EA Infrastructure Fund (2021), did some research for 1 Day Sooner on Human Challenge Trials for Covid vaccines (2020), did the summer research fellowship at FHI writing about IDA (2019), worked a few hours a week for CEA on local group mentoring for a few months (2018), and helped a little bit with organizing EA Oxford (2018/19). I studied PPE at Oxford (2018-2021) and psychology in Freiburg (2015-2018).

I also have things to say about mental health and advice for taking a break from work.

Comments

My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda
Chi Nguyen · 5y · 10

Thanks for the comment and I'm glad you like the post :)

On the other topic: I'm sorry, I'm afraid I can't be very helpful here. I'd be somewhat surprised if I had had a good answer to this a year ago, and I certainly don't have one now.

Some cop-out answers:

  • I often found his comments/remarks about corrigibility (and his discussions with others in the comments) on posts focused on something else more useful for figuring out whether his thinking on this had changed than his blog posts that were obviously concentrating on corrigibility
  • You might have some luck reading through some of his newer blog posts and seeing if you can spot some mentions there
  • In case this was about "his current views" as opposed to "the views I tried to represent here, which are one year old": The comments he left are from this summer, so you can get some idea from there/maybe assume that he endorses the parts I wrote that he didn't comment on (at least in the first third of the doc or so, when he still left comments)

FWIW, I just looked through my docs and found a "resources" doc with the following links under corrigibility:

Clarifying AI alignment

Can corrigibility be learned safely?

Problems with amplification/distillation

The limits of corrigibility

Addressing three problems with counterfactual corrigibility

Corrigibility

Not vouching for any of those being the most up-to-date or most relevant ones. I'm pretty sure I made this list early on in the process, and it probably doesn't represent what I considered the latest Paul-view.

Posts

Chi Nguyen's Shortform (7mo)
A dataset of questions on decision-theoretic reasoning in Newcomb-like problems (9mo)
My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda (5y)