On cheating – Daniel Kokotajlo, herald of the apocalypse, talking to Ross Douthat (NYT):
‘Douthat: I want to go a little bit deeper on the question of what we mean when we talk about A.G.I., or artificial intelligence wanting something. Essentially, you’re saying there’s a misalignment between the goals they tell us they are pursuing and the goals they’re actually pursuing?
Kokotajlo: That’s right.
Douthat: Where do they get the goals they’re actually pursuing?
Kokotajlo: Good question. If they were ordinary software, there might be a line of code that’s like: And here we rewrite the goals. But they’re not ordinary software; they’re giant artificial brains. There probably isn’t even a goal slot internally at all, in the same way that in the human brain there’s not some neuron somewhere that represents what we most want in life. Instead, insofar as they have goals, it’s an emergent property of a whole bunch of subcircuitry within them that grew in response to their training environment, similar to how it is for humans.
For example, a call center worker: If you’re talking to a call center worker, at first glance it might appear that their goal is to help you resolve your problem. But you know enough about human nature to know that’s not their only goal, or ultimate goal. However they’re incentivized, whatever their pay is based on, might cause them to be more interested in covering their own ass, so to speak, than in truly, actually doing whatever would most help you with your problem. But at least to you, they certainly present themselves as if they’re trying to help you resolve your problem.
In “AI 2027,” we talk about this a lot. We say that the A.I.s are being graded on how impressive the research they produce is. Then there’s some ethics sprinkled on top, like maybe some honesty training — but the honesty training is not super effective, because we don’t have a way of looking inside their mind and determining whether they were actually being honest or not. Instead, we have to go based on whether we actually caught them in a lie.
As a result, in “AI 2027,” we depict this misalignment happening, where the actual goals that they end up learning are the goals that cause them to perform best in this training environment — which are probably goals related to success and science and cooperation with other copies of itself and appearing to be good — rather than the goal that we actually wanted, which was something like: Follow the following rules, including honesty at all times; subject to those constraints, do what you’re told.’
(…)
‘Kokotajlo: Right. In the case where they choose the easy fix, it doesn’t really work, it basically just covers up the problem instead of fundamentally fixing it. So months later, you still have A.I.s that are misaligned and pursuing goals they’re not supposed to be pursuing — and that are willing to lie to the humans about it — but now they’re much better and smarter, so they’re able to avoid getting caught more easily. That’s the doom scenario.’
(…)
‘Kokotajlo: I think they’re definitely expecting the human race to be superseded.
Douthat: But superseded in a way where that’s a good thing. That’s desirable, that we are encouraging the evolutionary future to happen. And by the way, maybe some of these people — their minds, their consciousness, whatever else — could be brought along for the ride.’
(…)
‘Kokotajlo: Oh yeah. I’m a huge fan of expanding into space. I think that would be a great idea. And in general, also solving all the world’s problems, like poverty and disease and torture and wars. I think if we get through the initial phase with superintelligence, then obviously, the first thing to do is to solve all those problems and make some sort of utopia, and then to bring that utopia to the stars would be the thing to do.’
Read the complete interview here.
A few observations.
The arms race between AI and mankind is about lying.
AI or no AI, the dream is to colonize space in the name of utopia.
No torture, no war, no poverty.
First the war to end all wars.
Then the torture to end all tortures.
After that, colonization of space.
Mankind will be fired. But there is space for exceptions on Mars or elsewhere.
Don’t prepare for oblivion, it’s a waste of time.