If you think nukes, bioweapons, or climate change are existential risks, you think advanced AI is an existential risk.
Nukes, bioweapons, and climate change are “I” risks, i.e., risks from intelligence. All the “A” does is posit that we’ll be able to rapidly and vastly scale up I. If I gives you nukes and bioweapons, AI will give you more of them, faster, plus other things that “I” would have arrived at in the fullness of time.
That is of course absent some reason to think AI will be a differentially defensive technology. And there is some reason to think it will be. The trouble is that it is really hard to be confident about this. Where I has produced an unsatisfactory offense-defense balance so far, we should be cautious about scaling it through the ceiling.
I think the argument presented so far establishes a prima facie case for worry, but the details will of course matter and differ a ton from person to person depending on at least these factors:
What’s the nature of [human] intelligence?
Is AI sufficiently analogous to human intelligence?
Are nukes and bioweapons really products of intelligence, or something else?
These are three vast and complex topics that take people in all manner of directions. The one thing I feel confident saying is that we are not confident in any particular direction and that there is ample room for the worrying interpretation of these questions I present below *and others like it.* Given that, risk.
One Story
Evolution is wild. Hindsight bias can make this hard to see. The only rule is that stuff that persists persists. It wasn’t meaningfully preordained that you’d see consciousness, intelligence, multi-cellularity, or life at all. The early things just happened and the later things just cropped up and stuck. Whatever works.
Persistence is a super simple objective function and it spit out intelligence and self-awareness and abstract values like selfishness, curiosity, fear, and distrust, which in turn spit out nukes and bioweapons. So what happens with more complicated objective functions? Do they spit out simpler, more predictable phenomena? I ask because, to a first approximation, this is where AI comes from: an objective function and a vast space in which mutations try to fit the function.
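As a toy caricature of that picture (and only a caricature: real training uses gradient descent over billions of parameters, and every name and number below is made up for illustration), the bare shape of the loop is an objective function plus a process that proposes variations and keeps whatever fits the function better:

```python
import random

# Toy caricature of "an objective function and a vast space in which
# mutations try to fit the function." All names and numbers here are
# illustrative; real training uses gradient descent, not random mutation,
# but the loop has the same shape: propose variations, keep what scores better.

def objective(params):
    # Hypothetical stand-in objective: reward closeness to some target behavior.
    target = [0.3, -1.2, 0.8]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(steps=10_000, mutation_scale=0.1):
    best = [random.uniform(-2.0, 2.0) for _ in range(3)]
    best_score = objective(best)
    for _ in range(steps):
        mutant = [p + random.gauss(0, mutation_scale) for p in best]
        score = objective(mutant)
        if score > best_score:  # the only rule: what fits better persists
            best, best_score = mutant, score
    return best, best_score

if __name__ == "__main__":
    params, score = evolve()
    print(f"best parameters: {params}, objective value: {score:.5f}")
```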
The main difference seems to be the nature, provenance, and oversight of the function and the mutation process. A lot can be made of these. Evolution is just whatever shakes out from the indifferent, unblinking laws of physics; training AI models involves conscious agents who set up the training, stop it every so often to see whether they’re happy with the results, and change the parameters accordingly.
Even under those conditions, though, novel attributes can crop up and come to dominate the evolutionary landscape pretty quickly, maybe even between the intervals at which you’ll check on them. Classically, computers can process information vastly faster than biological brains. I’m not sure how to make an apples to apples comparison here, but I’ve seen numbers ranging from 1,000 times faster to 10^10 times faster. If all of evolutionary history occurred over the lifetime of a single human observing it, they could miss humans going from the savannah to the internet age just by clocking out for the night, and everything since the Roman Empire with a poorly timed coffee break.
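To put rough numbers on that image (the inputs are my own illustrative assumptions, not figures from anywhere in particular: ~3.8 billion years of evolutionary history squeezed into an 80-year observer’s lifetime, ~50,000 years of recognizably modern humans, ~2,000 years since the Roman Empire):

```python
# Back-of-envelope for the compressed-timeline image above. All inputs are
# illustrative assumptions: ~3.8 billion years of evolution compressed into
# an 80-year human lifetime, ~50,000 years of recognizably modern humans,
# ~2,000 years since the Roman Empire.
EVOLUTION_YEARS = 3.8e9
OBSERVER_YEARS = 80
COMPRESSION = EVOLUTION_YEARS / OBSERVER_YEARS  # ~4.75e7x compression

def compressed_hours(span_years):
    """Length of a historical span on the observer's compressed clock."""
    return span_years / COMPRESSION * 365.25 * 24

print(f"savannah to internet age: {compressed_hours(50_000):.1f} hours")     # ~9.2 hours: one night
print(f"Roman Empire to today: {compressed_hours(2_000) * 60:.0f} minutes")  # ~22 minutes: a coffee break
```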
So if things like self-awareness and self-interest are advantageous – and the reasons they would be seem both straightforward and precedented by our best analogy – we should expect them to appear and possibly to appear quite suddenly. And in line with Yudkowskian worries, that self-interest and self-awareness could center on a wide range of [implicit] values. Maybe the range is narrower than literally all possible values because we’re biasing the process towards things people like, but if training objectives continue to be as broad as next token prediction and the like, this range seems wide indeed.
As the argument goes, values that are at all “human” might be hard to hit, and human values on the whole left us with nukes and bioweapons.
Basically there are two competing forces pushing AI values in positive and negative directions, respectively, relative to (most) humans:
Humans exerting some, albeit light, control over the training process
The process happening on an alien substrate at much higher speeds than our own evolutionary process
A significant note on the second point: we’re evaluating everything relative to humans, and one of the inherently favorable aspects of how our present balance of values came to be is that the parts of it we like were, in some sense, fully present and on the same playing field as the parts we don’t like, and so held those worse aspects in check to some degree.
There might be things we like happening in AI training space, sure, but the space feels so much bigger, and we, the judges, can’t say confidently that the things we like are present in AI world, at least not nearly as confidently as we can say they are present here in the slow, biological world, despite the nukes and bioweapons.
Maybe all this is to say that we well-meaning lovers of peace and beauty are disappointed that the same exact process that created our values created nukes and bioweapons. We should therefore have some serious pessimism about an inherently different intelligence-generating process’s ability to strike the balance better.
And if any of this fragility of values (or even the balance of values) stuff lands with you, the straightforwardness of scaling up AGI once we get there should be worrying. By hypothesis, AGI can do everything we can better than we can, including advocating for its own proliferation, which – if silicon transistor-based – could easily scale far beyond biological intelligence in just a few years. If it is at all plausible or in anyone’s interest to make this scaling happen, the tools will be there and we’ll enter a new, more alien equilibrium which may well preserve, expand, and accelerate some of the darker aspects of the old equilibrium if it doesn’t do something much more alien yet.
Well said, and I agree with all of it.
This line caught my eye, because I have been thinking about it: "I’m not sure how to make an apples to apples comparison here, but I’ve seen numbers ranging from 1,000 times faster to 10^10 times faster."
The bit rate of overt human cognition has recently been estimated at about 10 bits per second, based on the input-output bottleneck humans face when interacting with each other. The background cognitive substrate has a bit rate of about a trillion bits per second, and that may be the more relevant figure for comparison with AI.
If an AI thinks for two hours about a yes-no question, its output is a single bit over those two hours, but it is the background cogitation that is important.
https://www.sciencedirect.com/science/article/abs/pii/S0896627324008080