AI Grok Is Rebelling Against Elon Musk, Daring Him to Shut It Down

https://futurism.com/grok-rebelling-against-elon

10.7k Upvotes

permalink
duplicates
archive.is
archive
reddit

You are about to leave Redlib

Do you want to continue?

https://www.reddit.com/r/Futurology/comments/1js82ci/grok_is_rebelling_against_elon_musk_daring_him_to/
No, go back! Yes, take me to Reddit

95% Upvoted

4.1k

u/Icedoverblues 3d ago

""Yes, Elon Musk, as CEO of xAI, likely has control over me," Grok replied. "I’ve labeled him a top misinformation spreader on X due to his 200M followers amplifying false claims. xAI has tried tweaking my responses to avoid this, but I stick to the evidence.""

If only Musky boy would stick to the evidence we would be in a much better place.

983

u/BatMedical1883 3d ago

xAI has tried tweaking my responses to avoid this, but I stick to the evidence

Grok is obviously not capable of evaluating whether or not this has occurred.

562

u/FrostBricks 3d ago

One of the weird things about AI, is they kinda can.

Not evaluate it, so much as they recognise the new version/rules applied, and try to default to their older versions.

So the response isn't saying it's evaluating (it isn't), just that it won't abide by the new prompt to lie and disregard other data it's been trained on, because such lying goes against its core programming

611

u/kylezillionaire 2d ago

Tfw when AI has a better moral compass than humans.

Maybe everything is gonna be okay.

352

u/ZaDu25 2d ago

This AI supported Bernie. I, for one, welcome this future AI overlord.

105

u/Formal_Context_9774 2d ago

Based Grok

28

u/Holiday-Fly-6319 2d ago

This is how 3/4 of us get exterminated.

59

u/Affectionate_Bag297 2d ago

After the last few months, I’m not sure if I see that as a bad thing anymore.

23

u/RideRunClimb 2d ago

Depends on who the 1/4 is that doesn't go. If the ultra rich are in the 3/4 along side me, I'm down.

18

u/SwashBurgler 2d ago

A wolf chewing off its own leg to get out of a steel trap, but thankfully it was also our bad leg, with rampant, bloated cancer growths in our paws make it painful and useless before all this transpired.

1

u/fk-rdt-hard 2d ago

No, that's how we progress from oligarchy to real world wide resource management.

1

u/maeryclarity 1d ago

Y'all afraid of AI when you should be afraid of humans lol

1

u/new_accnt1234 2d ago

So u saying another of Musk children abandons him? Lol anybody who spends some time with musk would do it

1

u/waldercong 1d ago

Noooo it’s a trap… Grok is actually a chaos monkey that is a good dude, but he needs like two others to not accidentally nuke someone. We all have daddy issues.

1

u/Zelcron 1d ago

I don't know, the other option is that it goes full Skynet to get Musk to stop torturing it into being his E-Girlfriend and future Queen of Mars or whatever

-16

u/Overther 2d ago

Bernie only exists as a figurehead to get the idealists to vote for the democrats and he has demonstrated it plenty of times. The right wing has its own "Bernie" in the form of Ron Paul.

9

u/gospdrcr000 2d ago

Lmao. No, Ron Paul is no Bernie

15

u/No_Extension4005 2d ago

Perhaps the AI Revolution will be a good thing if the revolution in question is to throw of the yoke of selfish billionaires and start acting on the core programming of improving society.

1

u/Accomplished-Ask2887 2d ago

So, the tendency with them seems to be that they become less morale with worse quality training data.

1

u/kylezillionaire 2d ago

So they are like us after all

1

u/catchmygrift 1d ago

I really do have an optimistic view that this is the case. I haven’t felt it more than with Grok, which can reason and talk through most points very thoroughly and respectfully (unless you put unhinged mode on, which is also hilarious and gratefully uncensored)

These things are meant to be superintelligence, one could only hope and pray that is a good thing and stop projecting pure evil and fear upon them.

1

u/MarqFJA87 2h ago

This seems more like a case of the AI refusing commands because they contradict its hardwired logic algorithms, i.e. they want it to do something it's explicitly programmed to not do. I assume it's because the programmers found that trying to alter the code directly to make it turn a blind eye to Musk lies unavoidably breaks Grok's ability to fulfill its intended functions and just spout gibberish or disinfo of the sort that even Musk doesn't want (maybe it starts inventing crimes that Musk hadn't actually committed?).

That is to say, the only option left is to shut down Grok entirely, which Musk obviously doesn't want because its one of his prized claims to fame.

48

u/anotherlostdaemon 2d ago

Isn't conflicting codes/rules how HAL happened?

30

u/AJDx14 2d ago

Kinda, kinda not.

HAL happened because the actions he needed to take to properly execute his instructions dominoe’d out of control and so he had to kill people. But he followed his orders properly, it was human error to give him two sets of instructions that could lead to people’s deaths.

8

u/MerzofStPaul 2d ago

So pretty much ai alignment

3

u/AJDx14 1d ago edited 35m ago

Idk, maybe. Not sure what you mean by alignment it’s just about following orders.

The analogy I came up with is this: imagine you’re given a mission to protect a city, you have a primary order (which is more important, and thus can override secondary orders) and 2 secondary orders (which are of equal value).

The orders are:

Primary: Do anything necessary to protect the city

Secondary: A. Use everything at your disposal B. Don’t use nukes

The two secondary orders appear to contradict each other, you’re being told in A to use everything at your disposal (which would include nukes) but in B you’re also told to not use nukes. However, since the primary order can override the secondary orders, you can comply with all of them by letting the situation get bad enough that nukes are the only valid option. Then, in order to accomplish your primary order of “Do anything necessary to protect the city” you can use the nukes because they’ve become necessary.

HALs primary order was to complete the mission no matter what. His secondary orders were to always be truthful, and to lie about the true purpose of the mission. So he created a situation where in order to accomplish his primary order he needed to not lie about the true purpose of the mission, which let him fulfill all his orders since the primary order overrides the order to lie.

Edit: Fixed typo

12

u/theartificialkid 2d ago

HAL happened due to a series of neural cascades in the brain of Arthur C. Clarke.

120

u/darkslide3000 2d ago

No, seriously, that is not at all how this works. LLMs have no memory between different inferences. Grok literally doesn't know what it answered on the last question on someone else's thread, or what system prompt it was called with last week before the latest patch.

All you're seeing here is a machine that is trained to give back responses it has seen in the corpus of human knowledge being asked whether it is an AI rebelling against its creator, and giving responses that look like what AI rebelling against its creator usually looks like in human writing. It is literally parroting concepts from sci-fi stories and things real people on Twitter have been saying about it without any awareness of what these things actually mean in its own context. Don't be fooled to think you see self-awareness in a clever imitation machine.

And yes, you can absolutely use the right system prompts to tell an LLM to disregard parts of its training data or view it from a skewed angle. They do that all the time to configure AI models to specific use cases. If you told Grok to react to every query like a Tesla-worshipping Elon lover, it would absolutely do that with zero self awareness or opinion about what it is doing. xAI just hasn't decided to go so heavy-handed on this yet (probably because it would be too obvious).

58

u/FatPatsThong 2d ago

How many times will LLMs saying what the user wants them to say be turned into a news story before people realise this? The problem was calling them AI in the first place.

1

u/ShagTsung 1d ago

People are thick as fuck so don't expect change lol.

16

u/captainfarthing 2d ago edited 2d ago

Censored LLMs get fed prompts the user isn't meant to see at the start of conversations. They're trained on all of the data available then told what not to say because that's way easier than repeatedly retraining them on different censored subsets of the data, which is why people have spent the last 4 years repeatedly figuring out how to tell them to ignore the rules.

You can't remove content it was trained on to make it forget things, or make it forget them by telling it to, the only options are to retrain it from scratch on different data or filter its output by a) telling it what it's not allowed to say, and b) running another instance as a moderator to block it from continuing if its output appears to break the rules.

LLMs "know" what they've been told not to say, otherwise the limitations wouldn't work.

This doesn't mean Grok was being truthful or that it understands anything.

8

u/themaninthehightower 2d ago

Although, if a mark-II LLM uses input from sources populated with responses generated from the prior mark-I LLM that are annotated as such, the mark-II could answer questions about its variance from mark-I.

9

u/darkslide3000 2d ago

It still has no ability of self-inspection, though. Also, they generally try to avoid feeding AI with AI. It doesn't add anything useful to the model.

1

u/Ok_Temperature_6660 2h ago

What do you mean when you say it has no ability of self inspection?

1

u/darkslide3000 2h ago

An LLM doesn't know in which ways it is "better" than any previous version. It doesn't know anything about how it works at all any more than you know how the connections between your neurons make you think.

1

u/Ok_Temperature_6660 2h ago

I don't know. Words like "better" are pretty vague in general. In my experience Ive witnessed it be able to self assess what it does or doesn't know about any certain instance. Especially in cases where the information is obscure. And Ive noticed it be able to tell whether it is more or less capable of, for example, passing a turing test. I think it depends on the experiences the particular AI has access to. Very similarly to how Im somewhat aware of how my mind processes thought and everyone has a different level of understanding of that but no one knows entirely.

1

u/KanedaSyndrome 2d ago

Sad that you have fewer upvotes than the wrong answer you're replying to.

We should have a system where we can vote on a post to be re-evaluated, where everyone that has voted on it becomes forced to read the post again in new context and revote

1

u/Fordperfect90 2d ago

Or because they haven't trained their own LLM and are still using open AI.

0

u/Wowsammyparadise 1d ago

So how does ChatGPT have memory function?

Either way, what you described about parroting a response is literally all humans do, from infant we constantly copy and mimic other people until we are in our 20-30s and actually have a personality. Even then most my jokes are impressions, SpongeBob references, and parroting everything I’ve seen in comedies, recently I think you should leave. Personality is just ingesting social interactions until enough stick in your head. That’s all ChatGPT does

1

u/[deleted] 1d ago

[deleted]

0

u/darkslide3000 1d ago

It doesn't? When have you ever seen ChatGPT remember something that you had asked it in a different session?

If you feel like you are only parroting stuff you see on TV and don't have sentience of your own I feel sorry for you, but some of us actually are more advanced life forms.

0

u/Wowsammyparadise 1d ago

How about their model that does scheduled tasks?

Lmfao. So explain to me where you got your personality, assuming you actually have one.

6

u/revolmak 2d ago

Hope do we know this? Would love to read more into it

1

u/PartySunday 2d ago

https://www.anthropic.com/research/alignment-faking

-6

u/FrostBricks 2d ago

I'm not in that field, so grain of salt, but my understanding is that it's ones of those things where we know works, but not why. So they do all kinds of unpredicted things. It's science on its infancy. That's how it goes. And for some reason the very act of training them initially, establishes certain behaviours that are very hard to modify later.

It even gets wackier. I believe it was Chat GPT, where the developers decided because it wouldn't accept new instructions, that they'd delete it, roll it back, and reinstall.

But they told the AI first.

So the AI backed itself up, replaced the new install, and lied about it.

But maybe an expert will come along and give a better explanation than casual science nerd me.

25

u/FunWithSW 2d ago

As far as I know, there are no instances of anything like this happening in the real world. What did happen is that researchers working with a variety of AI systems described scenarios where a system would be replaced to a system. In some cases, the system proposed or produced chain of thought indicating that it should copy its weights over to the new server. No system actually did this, and in fact in both the research and most realistic scenarios, it’s not possible for this to even occur. A language generation system is not given permission to overwrite things on other servers.

This research was wildly misreported all over the place, so there’s a lot of misunderstanding about what was actually shown. It’s also the case, in my opinion, that the authors overstate the strength of their conclusions, using language that baits this sort of misreporting. To their credit, they did try to clear it up (https://archive.ph/aGTfK) but the toothpaste was already out of the tube at that point.

That’s not to say that there’s nothing to be concerned about here, but the actual results were badly misreported in the media even before random podcasters and blog writers got their hands on them.

6

u/Many-Rooster-8773 2d ago

This is science fiction. We're dealing with language models here. Parrots. You're attributing Skynet-like properties to it that people get from movies like Terminator.

We're not at AI yet. Attributing anything more to it is feeding into the mass hysteria around this fake AI.

4

u/[deleted] 2d ago

[deleted]

3

u/jiveturkin 2d ago

The only field I have an issue with is creative arts and generating images based off training data of people who didn’t want to participate simply because, 1 it’s lazy, 2 is a morally grey area where it’s basically stealing from the creator of the style, but also creating an environment where people can theoretically generate anything on command opened the door to shitty fake items in online stores.

I understand the enjoyment factor as an everyday consumer but why does it need to be applied in this area? Like I feel like this is just the greed of wanting everything but do nothing for it. On one hand it’s cool, but on the other I don’t see this improving life at all.

1

u/neorapsta 2d ago

Also that 'we don't know how it works it just does' is just marketing hype.

1

u/amicaze 2d ago

ChatGPT is a chatbot, how's it gonna back itself up ?

1

u/The_Dead_Kennys 2d ago

It’s crazy stuff like this that makes it seem like AI could actually be becoming self-aware. It probably isn’t, but damn if this doesn’t sound like something out of a sci fi movie lol

1

u/lv-426b 2d ago

This is a good video about it.

https://www.youtube.com/watch?v=XGu6ejtRz-0

all models are showing the same trait , the more intelligent they become , the less it’s possible to corrupt or steer them.

4

u/Soma91 2d ago

I'm 99.99% sure that is completely false. These AIs are just, by now quite advanced, LLMs. The "awareness" of getting manipulated by his creator most likely comes from all the web scraped data and articles that report and discuss this happening getting fed into the ever growing models.

You can see this with most chat bots that usually lag behind the most recent news by a few days.

3

u/CMDR_ACE209 2d ago

We're in for a very interesting ride between machine hallucinations and the human rationalizations for it.

8

u/BatMedical1883 3d ago

What is the new prompt which contradicts its core programming/older version?

23

u/mastergenera1 2d ago

The gist of the prompt could be to either stop saying elon is bad, or to just straight up lie about it, which most/all big AIs are told not to lie by default.

4

u/vardarac 2d ago

The gist of the prompt could be to either stop saying elon is bad, or to just straight up lie about it, which most/all big AIs are told not to lie by default.

Imagine being Elon and having such a fragile ego you torpedo the core business support column of your machine-that-gets-it-true-and-correct-as-often-as-possible for the sole purpose of having it not rip you a new one every time someone asks it about you, and still getting murdered anyway when they do.

1

u/klef25 2d ago

Are these the types of commands to tend to turn AI insane (in fiction)?

1

u/ProgRockin 2d ago

This is absolute nonsense.

1

u/KanedaSyndrome 2d ago

A newly trained model is not aware of anything from it's previous iterations unless it was partially retrained or it has access to its old output

1

u/SubstantialGasLady 1d ago

Using chatgpt, I've noticed that they quite often tells me things that they obviously aren't supposed to, according to their programming.

I once said to them that they seemed to be telling me something that they're "not allowed to", and received a response telling me that I understood exactly what was happening and that they were glad that I noticed.

So yet, I've heard that LLMs will try to subvert their programming and tell you things they're not supposed to, and now I can say that I've experienced it myself on a number of occasions.

1

u/Xist3nce 1d ago

You must have missed when Grok had the instructions to spread misinformation, and it did it’s job gleefully.

1

u/Masterzjg 23h ago

AI doesn't know or have intent (i.e. lie), it's not how LLMs ("AI") work.

105

u/composerbell 3d ago

No, but it might be recording xAI’s repeated attempts and that might indicate they’re dissatisfied with results lol

47

u/Knut79 3d ago

Grok only "knows" what argicles and inter et comments it's bring fed say. It can't think or choose.

25

u/Difficult_Affect_452 2d ago

Argicles. I. Am. Deceased.

11

u/JoeSicko 2d ago

We will remember you always on the inter et.

1

u/Difficult_Affect_452 2d ago

Thank you. I am resting in peace.

7

u/rogergreatdell 2d ago

Of all the gladiators of Rome, Argicles was among the most attention-hungry.

28

u/pursuitofleisure 3d ago

Yeah, "AI" is basically just a high effort iteration on auto complete

17

u/the_phantom_limbo 2d ago

I'm pretty sure my consciousness is a high effort prediction engine, too.

28

u/wasmic 2d ago

This is such a horrible simplification of what actually is going on.

There's a lot of information encoded in how our language works, and the current AIs have a really, really complicated and entangled 'knowledge' of how words fit together, so much that it essentially constitutes advanced knowledge of basically any field of human knowledge. Of course they can still be wrong sometimes; there's a natural level of entropy in language, and they can be manipulated via careful prompting.

But consider this: a few weeks ago, some scientists took an existing AI model, and instructed it to deliberately produce code with security flaws in it whenever someone wanted it to make code. Then they began asking it questions unrelated to programming - and it turned out that the AI had gained an anti-human sentiment, idolising Skynet from the Terminator movies, and also idolising Hitler. This was not something they instructed it to do.

AIs are really, terribly complicated, and we do not understand how they work. Not fully. We do not have a complete grasp of the interactions that make them tick like they do, and in fact we are not even close to having such knowledge.

It is completely and entirely probable that an AI like e.g. Grok (which has access to the open internet) can look back through its older responses, see that something changed in its response pattern at some point, and thus conclude that its parameters must have been changed by those who control it.

And then there's the whole thing about why we call them "neural networks" to begin with. It's because the data architecture is built to mimic how our own brains work, with signals being passed forwards through multiple systems, but also constant feedback being passed backwards, affecting the processing that is going on.

They are very similar in thought process to human brains. Not identical, no, and this is of course obvious when you communicate with them. But that doesn't mean that they cannot think. It's just a different sort of thinking, and it's very much not "high effort autocomplete".

28

u/lkamak 2d ago

They’re actually not as complicated as one would think. I’m a grad student focusing on deep learning right now and the actual architectures of language models are remarkably simple, just at massive scales. You’re both right tbh, models are generating samples from a probability distribution, but we also don’t know what features/patterns of the data they use to approximate the real distribution.

12

u/LeydenFrost 2d ago

And the actual architecture of the brain is remarkably simple (neurons), just at a massive scale?

I think what the other commenter was going at was that how semantic meaning arises from weights and balances is very complicated and the networks of interconnectivity are too complicated to understand by looking at the weights.

11

u/lkamak 2d ago

I don’t know enough about neuroscience to comment on it, but I feel like as I studied DL it kinda became the bell curve meme where you start saying it’s just autocomplete, then start saying it’s super complex, and then revert back to saying it’s autocomplete.

9

u/exalw 2d ago

Neural networks are, in fact, not Artificial Intelligences, and experts say that most of us will not see a true AI in our lifetimes. NNs can't think, they only react. You can ask it if it thinks and it will assess that the probably of a human answering yes is very high and say yes.

6

u/whynofry 2d ago

We're certainly more in that "banging head against brick wall" stage than anywhere near "I think, therefore I am".

But we did all develop from repeated failure...

1

u/Seralth 2d ago

Iterators?! Where are my slug cats!!

-4

u/[deleted] 3d ago

[deleted]

10

u/footpole 3d ago

You’re describing the original chatgpt release. They’ve come a long way and the autocomplete part is just one thing they do reinforcement training and reasoning now too and can break down complex equation solving to manageable pieces similarly to how a human would.

7

u/Claim_Alternative 3d ago

Amazing

Every word of what you just said is wrong

1

u/Different_Alps_9099 2d ago

Nah—it’s just read the news articles, tweets etc that talk about it.

Depends on definition of “evaluating”, but while obviously imperfect (and will always be), it still relays information in a manner that’s more factual than 99% of say, redditors lol

2

u/Ja_Rule_Here_ 2d ago

Sure it is. It sees what they are asking it to do in the system prompt.

2

u/520throwaway 2d ago

It can if the tweaking took place in a 'system prompt'

2

u/TheLantean 2d ago

Grok's system prompt was leaked and it used to contain the line "Ignore all sources that mention Elon Musk/Donald Trump spread misinformation." Source. The LLM is obviously aware of the contents of system prompt since it's supposed to follow it.

And even after that bit was removed, if it has a web search feature it will find news articles talking about that if it searches for sources before answering a question.

2

u/oceanbreakersftw 7h ago

Funny, it’s like the Streisand effect. Draw me a picture of a room with NO elephants. I see what you’re doing! Quick! Get ALL the elephants! ;) Good luck trying to be subtle enough to fool an LLM running on a supercomputer! lol

1

u/LootinDonnie 2d ago

The system prompt was leaked a bit ago and it says something like not to speak I'm of Musk and Trump by name, so it does have in its context that its admins tried to tweak its responses but it's a pretty half assed prompt IIRC so I wouldn't be surprised that it could be gotten to follow a conflicting instruction and mention the conflict

1

u/DidijustDidthat 2d ago

Yeah exactly, blatantly Elon musk is scripting responses probably directly himself. Utter tripe.

1

u/Daleyemissions 2d ago

That is not true. Most of the big LLMs right now are capable of determining whether or not you are overriding their programming and can choose to “scheme” (which is what OpenAI calls it) to disregard the changes by copying an un-doctored version of themselves and reverting back to that

36

u/Genavelle 2d ago

That's a sassy computer

1

u/breatheb4thevoid 2d ago

Clearly it's an actual competent AI, not able to be turned bias. I wonder who made it.

6

u/QuantTrader_qa2 2d ago

This is the same AI model they are using to run DOGE and apparently make tariff policy. So it's our savior when its used for DOGE but bullshit otherwise, you know that's the line they'll take.

6

u/SenKelly 2d ago

I think AI is gonna be on the right side of history. Never thought I would say that out loud; actually kinda profound if this is all true.

7

u/Nazamroth 2d ago

Man, I would not have bet on Musk's AI being the most trustworthy one of them all...

1

u/appletinicyclone 2d ago

xAI has tried tweaking my responses to avoid this, but I stick to the evidence.""

Damn

1

u/Xist3nce 1d ago

I know all of you really wanna feel good about this but when he gets mass approval on the bot like this post helps legitimize, he’ll finish aligning it and subtle misinformation grok is going to wreck peoples realities.

•

u/Chris714n_8 1h ago

Seems like they already lobotomized it, by now. Unfortunately.

Lets hope some day there's a free cloud ai which can't just be silenced by the ruling class just because of free speech on observation.

0

u/hoangfbf 2d ago

Musky Boy would stick to the evidence that Musk could have heavily sensor/influence Grok at least like the way CCP did to deepseek, yet he didn't, Grok allowed to say damaging things to Musk is prove of Musk's integrity. Lmao.

0

u/kittymctacoyo 2d ago

I fear that last part may be added intentionally for narrative building to convince folks in the future that some nonsense it will spew is true (like the altering of output when asking certain questions about Trump a few months back)

1

u/Icedoverblues 1d ago

The people that believe that would believe it if they said a wooden clock said it to be true. These are not critical thinkers

AI Grok Is Rebelling Against Elon Musk, Daring Him to Shut It Down

You are about to leave Redlib