Donald Hobson

MMath Cambridge. Currently studying postgrad at Edinburgh.

Wiki Contributions


Possible Dangers of the Unrestricted Value Learners

I think that given good value learning, safety isn't that difficult. I think even a fairly halfharted attempt at the sort of Naive safety measures discussed will probably lead to non catastrophic outcomes. 

Tell it about mindcrime from the start. Give it lots of hard disks, and tell it to store anything that might possibly resemble a human mind. It only needs to work well enough with a bunch of Miri people guiding it and answering its questions.  Post singularity, a superintelligence can see if there are any human minds in the simulations it created when young and dumb. If there are, welcome those minds to the utopia.

A positive case for how we might succeed at prosaic AI alignment

I think you might be able to design advanced nanosystems without AI doing long term real world optimization. 

Well a sufficiently large team of smart humans could probably design nanotech. The question is how much an AI could help.

Suppose unlimited compute. You program a simulation of quantum field theory. Add a GUI to see visualizations and move atoms around. Designing nanosystems is already quite a bit easier.

Now suppose you brute force search over all arrangements of 100 atoms within a 1nm box, searching for the configuration that most efficiently transfers torque. 

You do similar searches for the smallest arrangement of atoms needed to make a functioning logic gate.

Then you download an existing microprocessor design, and copy it (but smaller) using your nanologic gates.

I know that if you start brute forcing over a trillion atoms, you might find a mesaoptimizer. (Although even then I would suspect that visualization inspection shouldn't result in anything brain hacky. It would only be actually synthesizing such a thing that was dangerous. (or maybe possibly simulating it, if the mesaoptimizer realizes it's in a simulation and there are general simulation escape strategies ))

So look at the static output of your brute forcing. If you see anything that looks computational, delete it. Don't brute force anything too big. 

(Obviously you need human engineers here, any long term real world planning is coming from them.)

Discussion with Eliezer Yudkowsky on AGI interventions

Under the Eliezerian view, (the pessimistic view that is producing <10% chances of success). These approaches are basically doomed. (See logistic success curve) 

Now I can't give overwhelming evidence for this position. Whisps of evidence maybe, but not an overwheming mountain of it. 

Under these sort of assumptions, building a container for an arbitrary superintelligence such that it has only 80% chance of being immediately lethal, and a 5% chance of being marginally useful is an achievment.

(and all possible steelmannings, that's a huge space)

Discussion with Eliezer Yudkowsky on AGI interventions

Lets say you use all these filtering tricks. I have no strong intuitions about whether these are actually sufficient to stop those kind of human manipulation attacks. (Of course, if your computer security isn't flawless, it can hack whatever computer system its on and bypass all these filters to show the humans arbitrary images and probably access the internet.) 

But maybe you can at quite significant expense make a Faraday cage sandbox, and then use these tricks. This is beyond what most companies will do in the name of safety. But Miri or whoever could do it. Then they ask the superintelligence about nanosystems, and very carefully read the results. Then presumably they go and actually try to build nanosystems. Of course you didn't expect the superintelligences advice to be correct, did you? And not wrong in an easily detectable fail safe way either. You concepts and paradigm are all subtly malicious. Not clear testable and factually wrong statements. But nasty tricks hidden in the invisible background assumptions. 

Intelligence or Evolution?

Firstly this would be AI's looking at their own version of the AI alignment problem. This is not random mutation or anything like it. Secondly I would expect there to only be a few rounds maximum of self modification that runs risk to goals. (Likely 0 rounds) Firstly damaging goals looses a lot of utility. You would only do it if its a small change in goals for a big increase in intelligence. And if you really need to be smarter and you can't make yourself smarter while preserving your goals. 

You don't have millions of AI all with goals different from each other. The self upgrading step happens once before the AI starts to spread across star systems.

Intelligence or Evolution?

Error correction codes exist. They are low cost in terms of memory etc. Having a significant portion of your descendent mutate and do something you don't want is really bad.

If error correcting to the point where there is not a single mutation in the future only costs you 0.001% resources in extra hard drive, then <0.001% resources will be wasted due to mutations.

Evolution is kind of stupid compared to super-intelligences. Mutations are not going to be finding improvements. Because the superintelligence will be designing their own hardware and the hardware will already be extremely optimized. If the superintelligence wants to spend resources developing better tech, It can do that better than evolution.

So squashing evolution is a convergent instrumental goal, and easily achievable for an AI designing its own hardware.

Intelligence or Evolution?

Darwinian evolution as such isn't a thing amongst superintelligences. They can and will preserve terminal goals. This means the number of superintelligences running around is bounded by the number humans produce before the point the first ASI get powerful enough to stop any new rivals being created. Each AI will want to wipe out its rivals if it can. (unless they are managing to cooperate somewhat)  I don't think superintelligences would have humans kind of partial cooperation. Either near perfect cooperation, or near total competition. So this is a scenario where a smallish number of ASI's that have all foomed in parallel expand as a squabbling mess.

How much chess engine progress is about adapting to bigger computers?

I don't think this research, if done, would give you strong information about the field of AI as a whole. 

I think that, of the many topics researched by AI researchers, chess playing is far from the typical case. 

It's [chess] not the most relevant domain to future AI, but it's one with an unusually long history and unusually clear (and consistent) performance metrics.

An unusually long history implies unusually slow progress. There are problems that computers couldn't do at all a few years ago that they can do fairly efficiently now. Are there problems where people basically figured out how to do that decades ago and no significant progress has been made since? 

The consistency of chess performance looks like more selection bias. You aren't choosing a problem domain where there was one huge breakthrough that. You are choosing a problem domain that has had slow consistent progress. 

For most of the development of chess AI (All the way from Alpha Beta pruning to Alpha Zero) Chess AI's improved by an accumulation of narrow, chess specific tricks. (And more compute) How to represent chess states in memory in a fast and efficient manor. Better evaluation functions. Tables of starting and ending games. Progress on chess AI's contained no breakthroughs, no fundamental insights, only a slow accumulation of little tricks. 

There are cases of problems that we basically knew how to solve from the early days of computers, any performance improvements are almost purely hardware improvements.

There are problems where one paper reduces the compute requirements by 20 orders of magnitude. Or gets us from couldn't do X at all, to able to do X easily. 

The pattern of which algorithms are considered AI and which are considered maths and which are considered just programming is somewhat arbitrary. A chess playing algorithm is AI, a prime factoring algorithm is maths, a sorting algorithm is programming or computer science. Why? Well those are the names of the academic departments that work on them. 

You have a spectrum of possible reference classes for transformative AI that range from the almost purely software driven progress, to the almost totally hardware driven progress. 

To gain more info about transformative AI, someone would have to make either a good case for why it should be at a particular position on the scale, or a good case for why its position on the scale should be similar to the position of some previous piece of past research. In the latter case, we can gain from examining the position of that research topic. If hypothetically that topic was chess, then the research you propose would be useful. If the reason you chose chess was purely that you thought it was easier to measure, then the results are likely useless.

Confusions re: Higher-Level Game Theory

In a game with any finite number of players, and any finite number of actions per player.

Let  the set of possible outcomes.

Player   implements policy  . For each outcome in  , each player searches for proofs (in PA) that the outcome is impossible. It then takes the set of outcomes it has proved impossible, and maps that set to an action.

There is always a unique action that is chosen. Whatsmore, given oracles for 

Ie the set of actions you might take if you can prove at least the impossility results in   and possibly some others. 

Given such an oracle  for each agent, there is an algorithm for their behaviour that outputs the fixed point in polynomial (in  ) time.

Vignettes Workshop (AI Impacts)


The next task to fall to narrow AI is adversarial attacks against humans. Virulent memes and convincing ideologies become easy to generate on demand. A small number of people might see what is happening, and try to shield themselves off from dangerous ideas. They might even develop tools that auto-filter web content. Most of society becomes increasingly ideologized, with more decisions being made on political rather than practical grounds. Educational and research institutions become full of ideologues crowding out real research. There are some wars. The lines of division are between people and their neighbours, so the wars are small scale civil wars. 

Researcher have been replaced with people parroting the party line. Society is struggling to produce chips of the same quality as before. Depending on how far along renewables are, there may be an energy crisis. Ideologies targeted at baseline humans are no longer as appealing. The people who first developed the ideology generating AI didn't share it widely. The tech to AI generate new ideologies is lost. 

The clear scientific thinking needed for major breakthroughs has been lost. But people can still follow recipes. And make rare minor technical improvements to some things. Gradually, idealogical immunity develops. The beliefs are still crazy by a truth tracking standard, but they are crazy beliefs that imply relatively non-detrimental actions. Many years of high, stagnant tech pass. Until the culture is ready to reembrace scientific thought.

Load More