(2024-06-14) ZviM The Leopold Model Analysis And Reactions

Zvi Mowshowitz: The Leopold Model: Analysis and Reactions. This is a post in three parts. The first part is my attempt to condense Leopold Aschenbrenner’s paper and model into its load bearing elements and core logic and dependencies. Two versions here: a long version that attempts to compress with minimal loss, and a short version that gives the gist. The second part goes over where I agree and disagree, and briefly explains why. The third part is the summary of other people’s reactions and related discussions, which will also include my own perspectives on related issues.

(2024-06-01) Leopold Aschenbrenner's AGI Situational Awareness Paper

There is a lot I disagree with. For each subquestion, I ask: what would I think here if the rest was accurate, or a lot of it was accurate?

Summary of Biggest Agreements and Disagreements

I had Leopold review a draft of this post. After going back and forth, I got a much better idea of his positions. They turned out to be a lot closer to mine than I realized on many fronts.

I do not believe America should push to AGI and ASI faster. I am still convinced that advancing frontier models at major labs faster is a mistake and one should not do that. To change that, fixing security issues (so we didn’t fully hand everything over) and power issues (so we would be able to take advantage) would be necessary but not sufficient, due to the disagreements that follow.

Everyone loses if we push ahead without properly solving alignment. Leopold and I agree that this is the default outcome, period. Leopold thinks his plan would at least make that default less inevitable and give us a better chance, whereas I think the attitude here is clearly inadequate to the task and we need to aim for better.

We also have to solve for various dynamics after solving alignment. I do not see the solutions offered here as plausible.

I am also not so confident that governments can’t stay clueless, or end up not taking effective interventions, longer than they can stay governments.

I am far less confident in the tech model that scaling is all you need from here. I have less faith that the straight lines on graphs will continue, or that they will mean in practice what Leopold thinks they would.

Decision Theory is Important

I suspect a lot of this is a decision theory and philosophical disagreement.

Eliezer noted that in the past Leopold often seemingly used causal decision theory without strong deontology.

If you read the Situational Awareness paper, it is clear that much of it is written from the primary perspective of Causal Decision Theory (CDT), and in many places it uses utilitarian thinking, although there is some deontology around the need for future liberal institutions.

Thinking in CDT terms restricts the game theoretic options. If you cannot use a form of Functional Decision Theory (FDT), and especially if you use strict CDT, a lot of possible cooperation becomes inevitable (zero-sum) conflict.

If you think that alignment means you need to end up with an FDT agent rather than a CDT agent to avoid catastrophe, then that makes current alignment techniques look that much less hopeful.

If you think that ASIs will inevitably figure out FDT and thus gain things like the ability to all act as if they were one unified agent, or do other things that seem against their local incentives, then a lot of your plans to keep humans in control of the future or get an ASI to do the things you want and expect will not work.
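
As a minimal sketch of why CDT forecloses cooperation that FDT-style reasoning can recover, consider a one-shot prisoner’s dilemma played against an exact copy of yourself. The payoff numbers and the framing below are my own illustrative assumptions, not anything from Leopold’s paper or from Eliezer’s comments:

```python
# Toy illustration (illustrative payoffs, not from the post): how a CDT agent and an
# FDT-style agent diverge in a one-shot prisoner's dilemma against an exact copy.

PAYOFFS = {  # (my_move, their_move) -> my payoff
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def cdt_choice() -> str:
    """CDT treats the copy's move as causally independent of mine, so it picks the
    move that is best against each fixed opponent move. Defection dominates."""
    best_reply = {
        their_move: max(("C", "D"), key=lambda m: PAYOFFS[(m, their_move)])
        for their_move in ("C", "D")
    }
    return "D" if all(move == "D" for move in best_reply.values()) else "C"

def fdt_choice() -> str:
    """FDT-style reasoning against an exact copy: my decision and the copy's decision
    are the same logical output, so only the diagonal outcomes (C,C) and (D,D) are live."""
    return max(("C", "D"), key=lambda m: PAYOFFS[(m, m)])

if __name__ == "__main__":
    c, f = cdt_choice(), fdt_choice()
    print("CDT vs its copy:", c, "-> payoff", PAYOFFS[(c, c)])   # D -> 1
    print("FDT vs its copy:", f, "-> payoff", PAYOFFS[(f, f)])   # C -> 3
```

The point is not the toy numbers but the structure: the CDT agent conditions on a fixed opponent and defects, while the FDT-style agent recognizes that it and its copy output the same decision, which opens up cooperation that CDT cannot reach.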

Part 1: Leopold’s Model and Its Implications

The Long Version

Category 1 (~sec 1-2): Capabilities Will Enable ASI (superintelligence) and Takeoff

The straight lines on graphs of AI capabilities will mostly keep going straight. A lot of the gains come from ‘unhobbling,’ meaning giving the AI new tools that address its limitations, and picking up ‘paradigm expanding’ capabilities.

This level of capabilities is ‘strikingly plausible’ by 2027.
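
For concreteness, here is a toy sketch of what ‘the straight lines keep going straight’ cashes out to: a roughly constant number of orders of magnitude (OOMs) of effective compute added per year, extrapolated forward. The growth rate and the dates below are placeholder assumptions for illustration, not figures taken from the paper:

```python
# Toy log-linear trend extrapolation ("straight lines on a log plot").
# The ~1 OOM/year rate and the dates are placeholder assumptions, not numbers
# from Situational Awareness.

def extrapolate_ooms(start_year: int, end_year: int, ooms_per_year: float) -> float:
    """Total orders of magnitude of effective-compute growth if the trend holds."""
    return ooms_per_year * (end_year - start_year)

if __name__ == "__main__":
    ooms = extrapolate_ooms(2023, 2027, ooms_per_year=1.0)
    print(f"If the trend holds: ~{ooms:.0f} OOMs, i.e. ~{10 ** ooms:,.0f}x effective compute by 2027")
```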

Category 2 (~sec 3c): Alignment is an Engineering Problem

Alignment, even superalignment of ASIs, is an ordinary engineering problem.

Solving the problem only requires avoiding a bounded set of failure modes, or preventing particular catastrophic misbehaviors.

Alignment is still difficult.

If ‘the good guys’ win the race to ASI and have ‘solved alignment,’ then the future will almost certainly turn out well.

Category 3 (~sec 3a): Electrical Power and Physical Infrastructure Are Key

Category 4 (~sec 3b): We Desperately Need Better Cybersecurity

Current cybersecurity at the major AI labs would be trivial for state actors to penetrate if they wanted to do that. And they do indeed want to do that, and do it all the time, and over time will do it more and harder.

Category 5 (~sec 3d and 4): National Security and the Inevitable Conflicts

He who controls the most advanced ASI controls the world.

ASI also enables transformational economic growth.

Thus the conclusion: The Project is inevitable and it must prevail. We need to work now to ensure we get the best possible version of it. It needs to be competent, to be fully alignment-pilled and safety-conscious with strong civilian control and good ultimate aims. It needs to be set up for success on all fronts.

The Short Version

AI will, via scaling, likely reach the ‘drop-in AI researcher’ level by 2027, and then things escalate quickly. We rapidly get ASI.

Which Assumptions Are How Load Bearing in This Model?

The entire picture mostly depends on Category 1.

If Category 2 is wrong, and alignment or other associated problems are much harder or impossible, but the rest is accurate, what happens?

Oh no.

Category 3 matters because it is the relevant resource where China most plausibly has an edge over the United States, and it might bring other players into the game.

We would also not have to worry as much about environmental concerns.

Also, if Leopold is right and things escalate this quickly, then we could safely set climate concerns aside during the transition period, and use ASI to solve the problem afterwards.

Category 4 determines whether we have a local problem in cybersecurity and other security, and how much we need to do to address this. The overall picture does not depend on it.

Category 5 has a lot of load bearing components, where if you change a sufficient combination of them the correct and expected responses shift radically.

It would be easy to misunderstand what Leopold is proposing.

He is saying something more like this:

1. ASI, how it is built and what we do with it, will be all that matters.
2. ASI is inevitable.
3. A close race to ASI between nations or labs almost certainly ends badly.
4. Our rivals getting to ASI first would also be very bad.
5. Along the way we by default face proliferation and WMDs, potential descent into chaos.
6. The only way to avoid a race is (at least soft) nationalization of the ASI effort.
7. With proper USG-level cybersecurity we can then maintain our lead.
8. We can then use that lead to ensure a margin of safety during the super risky and scary transition to superintelligence, and to negotiate from a position of strength.

This brings us to part 2.

Part 2: Where I Agree and Disagree

What about the second version, that (I think) better reflects Leopold’s actual thesis? In short:

1. Yes.
2. Yes on a longer time horizon. I do think it could plausibly be slowed down.
3. Yes.
4. Yes, although to a lesser degree than Leopold if they didn’t get everyone killed.
5. Yes, although I think I worry about this somewhat less than he does.
6. I don’t know. This is the question. Huge if true.
7. Yes, or at least we need to vastly up our game. We do have a lead.
8. I am not convinced by the plan here, but I admit better plans are hard to find.

How likely is it that we actually get there by 2027? My Manifold market on the drop-in worker by end of 2027 is trading at 33%. There are a bunch of things that can go wrong here even if the first two points hold. End of 2027 is a short deadline. But is it ‘strikingly plausible’? I think yes, this is clearly strikingly plausible.

Part 3: Reactions of Others

The Basics

As always, it is great when people say what they believe, predict and will do.

James Payor: Insofar as Leopold is basically naming the OpenAI/Microsoft/Anthropic playbook, I am glad to have that in the public record. I do not trust that Leopold is honest about his intentions and whatnot, and this is visible in the writing imo.

I think parts of this are the lab playbook, especially the tech section, alas also largely the alignment section. Other parts are things those companies would prefer to avoid. Perhaps I am overly naive on this one, but I do not think Leopold is being dishonest.

A Clarification from Eliezer Yudkowsky

It is easy to see, reading Situational Awareness, why Aschenbrenner was not optimistic about MIRI and Yudkowsky’s ideas, or the things they would want funded. These are two diametrically opposed strategies. The two world models have a lot in common, but each side thinks the actions the other considers useful are not so useful, and that the other’s counterproductive actions could be quite bad.

Children of the Matrix

Many questioned Leopold’s metaphor of using childhood development as a stand-in for levels of intelligence. I think Leopold’s predictions on effective capabilities could prove right, but that the metaphor was poor, and intelligence does need to be better defined.

Aligning a Smarter Than Human Intelligence is Difficult

Seriously. Super hard. Way harder than Leopold thinks.

The Sacred Timeline

The Need to Update

Open Models and Insights Can Be Copied

You Might Not Be Paranoid If They’re Really Out to Get You

We Are All There Is

If there is one place I am in violent agreement with Leopold, it is that there are no reasonable authority figures. Someone has to step up, and no one else will.

Patrick McKenzie: I cannot possibly underline this paragraph enough.

Leopold Aschenbrenner: But the scariest realization is that there is no crack team coming to handle this. As a kid you have this glorified view of the world, that when things get real there are the heroic scientists, the uber-competent military men, the calm leaders who are on it, who will save the day. It is not so. The world is incredibly small; when the facade comes off, it's usually just a few folks behind the scenes who are the live players, who are desperately trying to keep things from falling apart.

Patrick McKenzie: We, for all possible values of “we”, are going to need to step up.

The Inevitable Conflict

The biggest risk in Leopold’s approach, including his decision to write the paper, is that it is a CDT (causal decision theory)-infused approach that could wake up all sides and make cooperation harder, and thus risks causing the very crisis it predicts and wants to prevent.

There Are Only Least Bad Options

A Really Big Deal

What Gives You the Right?

What about the fact that people pretty much everywhere would vote no on all this?

If you educated them they would still vote no. Sure, if you knew and could prove a glorious future, maybe then they would vote yes, but you can’t know or prove that.

The standard answer is that new technology does not require a vote. If it did, we would not have our civilization.

Random Other Thoughts

Peter Bowden notices the lack of talk about sentience and requests Leopold’s thoughts on the matter. I think it was right to not throw that in at this time, but yes I am curious.

