All of plex's Comments + Replies

Solve Corrigibility Week

I've got a slightly terrifying Hail Mary "solve alignment with this one weird trick"-style paradigm I've been mulling over for the past few years, which seems like it has the potential to solve corrigibility and a few other major problems (notably value loading without Goodharting, using an alternative to CEV which seems drastically easier to specify). There are a handful of challenging things needed to make it work, but to me they look maybe more achievable than those of the other proposals I've read which seem like they could scale to superintelligence.

Realistically I...

1 Logan Riggs Smith 6mo
I've updated my meeting times to meet more this week, if you'd like to sign up for a slot (link w/ a pun [https://calendly.com/elriggs/chat]), and from his comment, I'm sure diffractor would also be open to meeting. I will point out that there's a confusion in terms that I noticed in myself, with corrigibility meaning either "always correctable" or "something like CEV", though we can talk that over on a call too :)