DanielFilan

Comments

ricraz's Shortform

Perhaps the lesson is that terminology that is acceptable in one field (in this case philosophy) might not be suitable in another (in this case machine learning).

ricraz's Shortform

Well now I'm less sure that it's incorrect. I was originally imagining that like in Solomonoff induction, the TMs basically directly controlled AIXI's actions, but that's not right: there's an expectimax. And if the TMs reinforce actions by shaping the rewards, in the AIXI formalism you learn that immediately and throw out those TMs.
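For concreteness, here is a rough sketch of the expectimax I have in mind, roughly following Hutter's AIXI definition (written from memory, so treat the notation as approximate rather than authoritative):

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \left[ r_k + \cdots + r_m \right] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

The programs $q$ only enter through the weight $2^{-\ell(q)}$ in the mixture over programs consistent with the history so far; the action is picked by the outer expectimax, so any program whose predicted rewards disagree with what is actually observed drops out of the mixture immediately.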

ricraz's Shortform

Also, presumably many of the component Turing machines have reusable parameters and reinforce behaviour, altho this is hidden by the formalism.

Actually I think this is total nonsense produced by me forgetting the difference between AIXI and Solomonoff induction.

ricraz's Shortform

I kind of think the lack of episodes makes it more realistic for many problems, but admittedly not for simulated games. Also, presumably many of the component Turing machines have reusable parameters and reinforce behaviour, altho this is hidden by the formalism. [EDIT: I retract the second sentence]

Some AI research areas and their relevance to existential safety

This comment is heavily informed by the perspectives that I understand to be advanced in the books The Myth of the Rational Voter (that democracies often choose poor policies because it isn't worth voters' time and effort to learn relevant facts and debias themselves) and The Problem of Political Authority (that democratic governance is often unjust), altho note that I have read neither book.

I also apologize for the political nature of this and the above comment. However, I don't know how to make it less political while still addressing the relevant parts of the post. I also think that the post is really great and thank Critch for writing it, despite the negative nature of the above comment.

Some AI research areas and their relevance to existential safety

One frustration I have with the piece is that I read it as broadly in favour of the empirical distribution of governance demands. The section in the introduction talks of the benefits of legitimizing and fulfilling governance demands, and merely focussing on those demands that are helpful for existential safety. Similarly, I read the section on accountability in ML as broadly having a rhetorical stance that accountability is by default good, altho the recommendation to "help tech company employees and regulators to reflect on the principle of accountability and whether tech companies themselves should be more subject to it at various scales" would, if implemented literally, only promote the forms of accountability that are in fact good.

I'm frustrated by this stance that I infer the text to be taking, because I think that many existing and likely demands for accountability will be unjust and minimally conducive to existential safety. One example of unjust and ineffective accountability is regulatory capture of industries, where regulations tend to be overly lenient for incumbent players that have 'captured' the regulator and overly strict for players that might enter and compete with incumbents. Another is the regulation of some legitimate activity by people uninformed about the activity and uninterested in allowing legitimate instances of it. My understanding is that most people agree that either the regulation of abortion in many conservative US states or the regulation of gun ownership in many liberal US states falls into this category. Note that my claim is not that there are no legitimate governance demands in these examples, but that actual governance in these cases is unjust and ineffective at promoting legitimate ends, because it is not structured in a way that tends to produce good outcomes.

I am similarly frustrated by this claim:

The European General Data Protection Regulation (GDPR) is a very good step for regulating how tech companies relate with the public. I say this knowing that GDPR is far from perfect. The reason it's still extremely valuable is that it has initialized the variable defining humanity's collective bargaining position (at least within Europe...) for controlling how tech companies use data.

I read this as conflating European humanity with the European Union. I think the correct perspective to take is this: corporate boards keep corporations aligned with some aspects of some segment of humanity, and EU regulation keeps corporations aligned with different aspects of a different segment of humanity. Rather than thinking of this as a qualitative change from 'uncontrolled by humanity' to 'controlled by European humanity', I would rather have it modelled as a change in the controlling structure, and have attention brought to bear on whether the change is in fact good.

Now, for the purpose of enhancing existential safety, I think it likely that any way of growing the set of people who can demand that AI corporations act in a way that serves those people's interests is better than governance purely by a board or employees of the company, because preserving existential safety is a broadly-held value, and outsiders may not be subject to as much bias as insiders about how dangerous the firm's technology is. Indeed, an increase in the size of this set by several orders of magnitude likely causes a qualitative shift. Nevertheless, I don't think there is much reason to think that the details of EU regulation are likely to be closely aligned with the interests of Europeans, and if the GDPR is valuable as a precedent to ensure that the EU can regulate data use, the alignment of the details of that regulation is of great importance. As such, I think the structure of this governance is more important to focus on than the number of people taking part in it.

In summary:

  • I hope that technical AI x-risk/existential safety researchers focus on legitimizing and fulfilling those governance and accountability demands that are in fact legitimate.
  • I hope that discussion of AI governance and accountability does not inhabit a frame in which demands for governance and accountability are reliably legitimate.

Some AI research areas and their relevance to existential safety

Regarding the two ways you enumerate in which AI alignment could serve to further existential safety, I think a third, more viable, way is missing:

AI alignment solutions allow humans to build powerful AI systems that behave as planned without compromising existential safety.

I presume that it is desirable to build powerful AI systems - either to do object-level useful things, or to help humanity regulate other AI systems. There is a family of arguments that I associate with Bostrom and Yudkowsky that it is difficult to align such powerful AI systems with what their creators want them to do, either for 'outer alignment' reasons of difficulty in objective specification, or for 'inner alignment' reasons of inherent difficulties in optimization. This family of arguments also advances the idea that such alignment failures can have consequences that compromise existential safety. If you believe these arguments, then it appears to me that AI alignment solutions are necessary, but not sufficient, for existential safety.

DanielFilan's Shortform Feed

I do not have many ideas here, so it might mostly be me talking about the category-theoretic definition of products and co-products.

DanielFilan's Shortform Feed

'Seminar' announcement: me talking quarter-bakedly about products, co-products, deferring, and transparency. 3 pm PT tomorrow (actually 3:10 because that's how time works at Berkeley).

I was daydreaming during a talk earlier today (my fault, the talk was great), and noticed that one diagram in Dylan Hadfield-Menell's off-switch paper looked like the category-theoretic definition of the product of two objects. Now, in category theory, the 'opposite' of a product is a co-product, which in set theory is the disjoint union. So if the product of two actions is deferring to a human about which action to take, what's the co-product? I had an idea about that which I'll keep secret until the talk, when I'll reveal it (you can also read the title to figure it out). I promise that I won't prepare any slides or think very hard about what I'm going to say. I also won't really know what I'm talking about, so hopefully one of you will. The talk will happen in my personal zoom room. Message me for the passcode.
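For reference, the universal properties I'm alluding to (standard definitions, stated from memory, so check a textbook for the details): a product of $A$ and $B$ is an object $A \times B$ with projections $\pi_A : A \times B \to A$ and $\pi_B : A \times B \to B$, and a co-product is an object $A + B$ with injections $\iota_A : A \to A + B$ and $\iota_B : B \to A + B$, such that

$$\forall f : X \to A,\ g : X \to B\ \ \exists!\ \langle f, g \rangle : X \to A \times B \ \text{ with } \ \pi_A \circ \langle f, g \rangle = f,\ \ \pi_B \circ \langle f, g \rangle = g$$

$$\forall f : A \to X,\ g : B \to X\ \ \exists!\ [f, g] : A + B \to X \ \text{ with } \ [f, g] \circ \iota_A = f,\ \ [f, g] \circ \iota_B = g$$

In the category of sets, the co-product is exactly the disjoint union mentioned above.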

Time in Cartesian Frames

Yosef can observe for the purpose of deciding the first digit, but can't observe for the purpose of deciding the third digit.

Am I missing something, or should this be the other way around? Intuitively, I'd think that it makes sense that Yosef can observe the second digit when choosing the third, but not when choosing the first.
