jungofthewon

coo @ ought.org. by default please assume i am uncertain about everything i say unless i say otherwise :)


How do we prepare for final crunch time?

Access

Alignment-focused policymakers / policy researchers should also be in positions of influence. 

Knowledge

I'd add a bunch of human / social topics to your list, e.g.:

  • Policy 
  • Every relevant historical precedent
  • Crisis management / global logistical coordination / negotiation
  • Psychology / media / marketing
  • Forecasting 

Research methodology / Scientific “rationality,” Productivity, Tools

I'd be really excited to have people use Elicit with this motivation. (More context here and here.)

Re: competitive games of introducing new tools, we ran an internal speed test, Elicit vs. Google, to see which tool was more efficient for finding answers or mapping out a new domain in 5 minutes. We're broadly excited to structure and support competitive knowledge work and to optimize research this way.

The case for aligning narrowly superhuman models

This is exactly what Ought is doing as we build Elicit into a research assistant using language models / GPT-3. We're studying researchers' workflows and identifying ways to productize or automate parts of them. In that process, we have to figure out how to turn GPT-3, a generalist by default, into a specialist that is a useful thought partner for domains like AI policy. We have to learn how to take feedback from the researcher and convert it into better results within a session, per person, per research task, and across the entire product. Another spin on it: we have to figure out how researchers can use GPT-3 to become expert-like in new domains.

We’re currently using GPT-3 for classification e.g. “take this spreadsheet and determine whether each entity in Column A is a non-profit, government entity, or company.” Some concrete examples of alignment-related work that have come up as we build this: 

  • One idea for making classification work is to have users generate explanations for their classifications. Then have GPT-3 generate explanations for the unlabeled objects. Then classify based on those explanations. This seems like a step towards “have models explain what they are doing.”
  • I don’t think we’ll do this in the near future but we could explore other ways to make GPT-3 internally consistent, for example:
    • Ask GPT-3 why it classified Harvard as a “center for innovation.”
    • Then ask GPT-3 if that reason is true for Microsoft.
      • Or just ask GPT-3 if Harvard is similar to Microsoft.
    • Then ask GPT-3 directly if Microsoft is a “center for innovation.”
    • And fine-tune results until we get to internal consistency.
  • We eventually want to apply classification to the systematic review (SR) process, or some lightweight version of it. In the SR process, there is one step where two human reviewers identify which of 1,000-10,000 publications should be included in the SR by reviewing the title and abstract of each paper. After narrowing the set down to ~50, two human reviewers read each remaining paper in full and decide which should be included. Getting GPT-3 to replace these two human review steps while being as good as two experts reading the whole paper seems like the kind of sandwiching task described in this proposal.
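To make the "classify via explanations" idea above concrete, here is a minimal sketch of the pipeline shape: generate an explanation first, then classify from that explanation, so the model's stated reasoning is inspectable. The `query_model` function is a hypothetical stand-in for a GPT-3 completion call (a real version would hit an LLM API); its canned responses, the entity names, and the prompts are all illustrative assumptions, not Ought's actual implementation.

```python
# Sketch of explanation-first classification. query_model() is a
# hypothetical stub standing in for a GPT-3 completion call; its
# keyword-based responses exist only to make the pipeline runnable.

LABELS = ["non-profit", "government entity", "company"]

def query_model(prompt: str) -> str:
    """Stand-in for a language-model API call (illustrative only)."""
    if "Explain why" in prompt:
        # Step 1: produce an explanation for an unlabeled entity.
        entity = prompt.split("Explain why ")[1].split(" belongs")[0]
        return f"{entity} grants degrees and is funded by tuition and grants."
    if "tuition" in prompt:
        # Step 2: classify from the explanation text.
        return "non-profit"
    return "company"

def classify_via_explanation(entity: str) -> tuple[str, str]:
    """Ask for an explanation, then classify based on that explanation,
    a step toward 'have models explain what they are doing'."""
    explanation = query_model(
        f"Explain why {entity} belongs to one of {LABELS}."
    )
    label = query_model(
        f"Given this explanation, pick one label from {LABELS}:\n{explanation}"
    )
    return label, explanation

label, explanation = classify_via_explanation("Harvard")
```

The same two-call pattern extends naturally to the consistency probes described above: you can re-ask the model whether the explanation it gave for one entity also holds for another, and compare the answers.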

We'd love to talk to people interested in exploring this approach to alignment!

Forecasting Thread: AI Timelines

I generally agree with this, but I think the alternative goal of "make forecasting easier" is just as good: it might actually make aggregate forecasts more accurate in the long run, and it may require things that seemingly undermine the virtue of precision.

More concretely, if an underdefined question makes it easier for people to share whatever beliefs they already have, and then facilitates rich conversation among those people, that's better than if a highly specific question prevents people from making a prediction at all. At least as much, if not more, of the value of making public, visual predictions like this comes from the ensuing conversation and feedback as from the precision of the forecasts themselves.

Additionally, a lot of assumptions get baked in at the moment a question is defined more precisely, which could prematurely limit the space of conversation or ideas. There are good reasons why different people define AGI, or the moment of "AGI arrival," the way they do, and those reasons might never surface if the question askers had committed to a single point of view.