AI ALIGNMENT FORUM
Jack Gallagher

Posts

Asymptotic Decision Theory · 9y
A new proposal for logical counterfactuals · 9y
Comments

Evaluating the historical value misspecification argument
gallabytes · 2y

To pick a toy example, you can use text as a bottleneck to force systems to "think out loud" in a way that is very directly interpretable by a human reader. Because language understanding is so rich, this will actually be competitive with other approaches and often superior.

I'm sure you can come up with more ways that the existence of software that understands language and does ~nothing else makes getting computers to do what you mean easier than if software did not understand language. Please think about the problem for 5 minutes. Use a clock.

Evaluating the historical value misspecification argument
gallabytes · 2y

Historically you very clearly thought that a major part of the problem was that AIs would not understand human concepts and preferences until after, or possibly very slightly before, achieving superintelligence. This is not how it seems to have gone.

Everyone agrees that you assumed superintelligence would understand everything humans understand and more. The dispute is entirely about what you encounter before superintelligence. In general, the world turned out much more gradual than you expected, and there is information to be found in which capabilities emerged sooner in the process.

Where does ADT Go Wrong?
gallabytes · 8y

When considering an embedder F, in universe U, in response to which SADT picks policy π, I would be tempted to apply the following coherence condition:

E[F(π)] = E[F(DDT)] = E[U]

(all approximately of course)

I'm not sure if this would work though. This is definitely a necessary condition for reasonable counterfactuals, but not obviously sufficient.
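The coherence condition can be illustrated with a minimal Monte Carlo sketch. Everything below is invented for illustration: a toy one-shot universe with a noisy payoff, an embedder F that runs an agent inside that universe, and a decision procedure (written DDT in the condition) modeled as simply outputting the policy π. Under those assumptions all three expectations should coincide up to sampling error.

```python
import random

# Hypothetical toy setup (not from the original comment): the universe U
# pays out according to the action taken, plus Gaussian noise. The
# embedder F plugs an agent into the universe and returns its utility.
random.seed(0)

def universe(action: str) -> float:
    payoff = {"cooperate": 1.0, "defect": 0.0}[action]
    return payoff + random.gauss(0, 0.1)  # small noise

def pi() -> str:            # the policy the decision theory picks
    return "cooperate"

def ddt() -> str:           # the decision procedure itself; by
    return pi()             # assumption here, it outputs pi

def F(agent) -> float:      # embedder: run the agent in the universe
    return universe(agent())

def expectation(f, n=10_000):  # crude Monte Carlo estimate of E[...]
    return sum(f() for _ in range(n)) / n

e_pi  = expectation(lambda: F(pi))
e_ddt = expectation(lambda: F(ddt))
e_u   = expectation(lambda: universe(ddt()))

# The coherence condition, "all approximately of course":
assert abs(e_pi - e_ddt) < 0.02 and abs(e_ddt - e_u) < 0.02
```

As the comment notes, this equality is only a necessary condition: a toy embedder can satisfy it without the counterfactuals behind it being reasonable.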

A new proposal for logical counterfactuals
gallabytes · 9y

By censoring I mean a specific technique for forcing the consistency of a possibly inconsistent set of axioms.

Suppose you have a set of deduction rules D over a language ℓ. You can construct a function f_D : P(ℓ) → P(ℓ) that takes a set of sentences S and outputs all the sentences that can be proved in one step using D and the sentences in S. You can also construct a censored f'_D by letting f'_D(S) = {ϕ | ϕ ∈ f_D(S) ∧ ¬ϕ ∉ S}.

Wikitag Contributions

Programming in Dependent Type Theory · 9 years ago · (+252/-70)
Programming in Dependent Type Theory · 9 years ago · (+4039)
Type theory · 9 years ago · (+37)