NaiveTortoise


New safety research agenda: scalable agent alignment via reward modeling

Thanks a lot! This definitely clears things up and also highlights the difference between recursive reward modeling and typical amplification/the expert imitation approach you mentioned.

New safety research agenda: scalable agent alignment via reward modeling

Was anyone else unconvinced/confused (I was charitably confused, uncharitably unconvinced) by the analogy between recursive task/agent decomposition and first-order logic in section 3 under the heading "Analogy to Complexity Theory"? I suspect I'm missing something but I don't see how recursive decomposition is analogous to **alternating** quantifiers?

It's obvious that, at the first level, finding an $x_1$ that satisfies $\exists x_1\, \varphi(x_1)$ is similar to finding the right action, but I don't see how finding $x_1$ and $x_2$ that satisfy $\exists x_1\, \forall x_2\, \varphi(x_1, x_2)$ maps onto the scheme — that is, how one agent's solving of one of another agent's decomposed tasks is similar to universal quantification.
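For reference, my reading of the complexity-theory analogy is that it's about deciding truth of quantified Boolean formulas with strictly alternating quantifiers, roughly of the form:

$$\exists x_1\, \forall x_2\, \exists x_3\, \cdots\, Q x_n\; \varphi(x_1, x_2, \ldots, x_n)$$

where each added alternation climbs a level of the polynomial hierarchy. My confusion is that recursive decomposition seems to only ever add nested "find a value" steps, i.e. more existentials, with nothing obviously playing the role of the $\forall$.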

To take a very basic example, if I ask an agent to solve a simple problem like "what is 1+2+3+4?" and the first agent decomposes it into "what is 1+2?", "what is 3+4?", and "what is the result of '1+2' plus the result of '3+4'?" (this assumes we have some mechanism for pointing and specifying dependencies, like the one Ought is working on), what would this look like in the alternating-quantifier formulation?
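To make the example concrete, here's a toy sketch (my own illustration, not Ought's actual system) of the decomposition above, where a parent task splits into two subtasks plus a combining step that depends on their results:

```python
def solve(numbers):
    """Toy recursive decomposition: a task is the list of numbers to sum."""
    if len(numbers) <= 2:
        # base case: a subagent answers the atomic question directly
        return sum(numbers)
    mid = len(numbers) // 2
    left = solve(numbers[:mid])    # subtask: "what is 1+2?"
    right = solve(numbers[mid:])   # subtask: "what is 3+4?"
    # combining step: "what is the result of '1+2' plus the result of '3+4'?"
    return left + right

print(solve([1, 2, 3, 4]))  # → 10
```

Notice that every step here is of the form "find a value satisfying the subtask" — which looks to me like nested existentials, with nothing in the recursion corresponding to a universal quantifier. That's exactly the mapping I'm failing to see.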