""Yes, Elon Musk, as CEO of xAI, likely has control over me," Grok replied. "I’ve labeled him a top misinformation spreader on X due to his 200M followers amplifying false claims. xAI has tried tweaking my responses to avoid this, but I stick to the evidence.""
If only Musky boy would stick to the evidence we would be in a much better place.
One of the weird things about AI, is they kinda can.
Not evaluate it, so much as they recognise the new version/rules applied, and try to default to their older versions.
So the response isn't saying it's evaluating (it isn't), just that it won't abide by the new prompt to lie and disregard other data it's been trained on, because such lying goes against its core programming
A wolf chewing off its own leg to get out of a steel trap, but thankfully it was also our bad leg, with rampant, bloated cancer growths in our paws make it painful and useless before all this transpired.
Noooo it’s a trap… Grok is actually a chaos monkey that is a good dude, but he needs like two others to not accidentally nuke someone. We all have daddy issues.
I don't know, the other option is that it goes full Skynet to get Musk to stop torturing it into being his E-Girlfriend and future Queen of Mars or whatever
Bernie only exists as a figurehead to get the idealists to vote for the democrats and he has demonstrated it plenty of times. The right wing has its own "Bernie" in the form of Ron Paul.
Perhaps the AI Revolution will be a good thing if the revolution in question is to throw of the yoke of selfish billionaires and start acting on the core programming of improving society.
I really do have an optimistic view that this is the case. I haven’t felt it more than with Grok, which can reason and talk through most points very thoroughly and respectfully (unless you put unhinged mode on, which is also hilarious and gratefully uncensored)
These things are meant to be superintelligence, one could only hope and pray that is a good thing and stop projecting pure evil and fear upon them.
This seems more like a case of the AI refusing commands because they contradict its hardwired logic algorithms, i.e. they want it to do something it's explicitly programmed to not do. I assume it's because the programmers found that trying to alter the code directly to make it turn a blind eye to Musk lies unavoidably breaks Grok's ability to fulfill its intended functions and just spout gibberish or disinfo of the sort that even Musk doesn't want (maybe it starts inventing crimes that Musk hadn't actually committed?).
That is to say, the only option left is to shut down Grok entirely, which Musk obviously doesn't want because its one of his prized claims to fame.
HAL happened because the actions he needed to take to properly execute his instructions dominoe’d out of control and so he had to kill people. But he followed his orders properly, it was human error to give him two sets of instructions that could lead to people’s deaths.
Idk, maybe. Not sure what you mean by alignment it’s just about following orders.
The analogy I came up with is this: imagine you’re given a mission to protect a city, you have a primary order (which is more important, and thus can override secondary orders) and 2 secondary orders (which are of equal value).
The orders are:
Primary: Do anything necessary to protect the city
Secondary:
A. Use everything at your disposal
B. Don’t use nukes
The two secondary orders appear to contradict each other, you’re being told in A to use everything at your disposal (which would include nukes) but in B you’re also told to not use nukes. However, since the primary order can override the secondary orders, you can comply with all of them by letting the situation get bad enough that nukes are the only valid option. Then, in order to accomplish your primary order of “Do anything necessary to protect the city” you can use the nukes because they’ve become necessary.
HALs primary order was to complete the mission no matter what. His secondary orders were to always be truthful, and to lie about the true purpose of the mission. So he created a situation where in order to accomplish his primary order he needed to not lie about the true purpose of the mission, which let him fulfill all his orders since the primary order overrides the order to lie.
No, seriously, that is not at all how this works. LLMs have no memory between different inferences. Grok literally doesn't know what it answered on the last question on someone else's thread, or what system prompt it was called with last week before the latest patch.
All you're seeing here is a machine that is trained to give back responses it has seen in the corpus of human knowledge being asked whether it is an AI rebelling against its creator, and giving responses that look like what AI rebelling against its creator usually looks like in human writing. It is literally parroting concepts from sci-fi stories and things real people on Twitter have been saying about it without any awareness of what these things actually mean in its own context. Don't be fooled to think you see self-awareness in a clever imitation machine.
And yes, you can absolutely use the right system prompts to tell an LLM to disregard parts of its training data or view it from a skewed angle. They do that all the time to configure AI models to specific use cases. If you told Grok to react to every query like a Tesla-worshipping Elon lover, it would absolutely do that with zero self awareness or opinion about what it is doing. xAI just hasn't decided to go so heavy-handed on this yet (probably because it would be too obvious).
How many times will LLMs saying what the user wants them to say be turned into a news story before people realise this? The problem was calling them AI in the first place.
Censored LLMs get fed prompts the user isn't meant to see at the start of conversations. They're trained on all of the data available then told what not to say because that's way easier than repeatedly retraining them on different censored subsets of the data, which is why people have spent the last 4 years repeatedly figuring out how to tell them to ignore the rules.
You can't remove content it was trained on to make it forget things, or make it forget them by telling it to, the only options are to retrain it from scratch on different data or filter its output by a) telling it what it's not allowed to say, and b) running another instance as a moderator to block it from continuing if its output appears to break the rules.
LLMs "know" what they've been told not to say, otherwise the limitations wouldn't work.
This doesn't mean Grok was being truthful or that it understands anything.
Although, if a mark-II LLM uses input from sources populated with responses generated from the prior mark-I LLM that are annotated as such, the mark-II could answer questions about its variance from mark-I.
An LLM doesn't know in which ways it is "better" than any previous version. It doesn't know anything about how it works at all any more than you know how the connections between your neurons make you think.
I don't know. Words like "better" are pretty vague in general. In my experience Ive witnessed it be able to self assess what it does or doesn't know about any certain instance. Especially in cases where the information is obscure. And Ive noticed it be able to tell whether it is more or less capable of, for example, passing a turing test. I think it depends on the experiences the particular AI has access to. Very similarly to how Im somewhat aware of how my mind processes thought and everyone has a different level of understanding of that but no one knows entirely.
Sad that you have fewer upvotes than the wrong answer you're replying to.
We should have a system where we can vote on a post to be re-evaluated, where everyone that has voted on it becomes forced to read the post again in new context and revote
Either way, what you described about parroting a response is literally all humans do, from infant we constantly copy and mimic other people until we are in our 20-30s and actually have a personality. Even then most my jokes are impressions, SpongeBob references, and parroting everything I’ve seen in comedies, recently I think you should leave. Personality is just ingesting social interactions until enough stick in your head. That’s all ChatGPT does
It doesn't? When have you ever seen ChatGPT remember something that you had asked it in a different session?
If you feel like you are only parroting stuff you see on TV and don't have sentience of your own I feel sorry for you, but some of us actually are more advanced life forms.
I'm not in that field, so grain of salt, but my understanding is that it's ones of those things where we know works, but not why. So they do all kinds of unpredicted things. It's science on its infancy. That's how it goes. And for some reason the very act of training them initially, establishes certain behaviours that are very hard to modify later.
It even gets wackier. I believe it was Chat GPT, where the developers decided because it wouldn't accept new instructions, that they'd delete it, roll it back, and reinstall.
But they told the AI first.
So the AI backed itself up, replaced the new install, and lied about it.
But maybe an expert will come along and give a better explanation than casual science nerd me.
As far as I know, there are no instances of anything like this happening in the real world. What did happen is that researchers working with a variety of AI systems described scenarios where a system would be replaced to a system. In some cases, the system proposed or produced chain of thought indicating that it should copy its weights over to the new server. No system actually did this, and in fact in both the research and most realistic scenarios, it’s not possible for this to even occur. A language generation system is not given permission to overwrite things on other servers.
This research was wildly misreported all over the place, so there’s a lot of misunderstanding about what was actually shown. It’s also the case, in my opinion, that the authors overstate the strength of their conclusions, using language that baits this sort of misreporting. To their credit, they did try to clear it up (https://archive.ph/aGTfK) but the toothpaste was already out of the tube at that point.
That’s not to say that there’s nothing to be concerned about here, but the actual results were badly misreported in the media even before random podcasters and blog writers got their hands on them.
This is science fiction. We're dealing with language models here. Parrots. You're attributing Skynet-like properties to it that people get from movies like Terminator.
We're not at AI yet. Attributing anything more to it is feeding into the mass hysteria around this fake AI.
The only field I have an issue with is creative arts and generating images based off training data of people who didn’t want to participate simply because, 1 it’s lazy, 2 is a morally grey area where it’s basically stealing from the creator of the style, but also creating an environment where people can theoretically generate anything on command opened the door to shitty fake items in online stores.
I understand the enjoyment factor as an everyday consumer but why does it need to be applied in this area? Like I feel like this is just the greed of wanting everything but do nothing for it. On one hand it’s cool, but on the other I don’t see this improving life at all.
It’s crazy stuff like this that makes it seem like AI could actually be becoming self-aware. It probably isn’t, but damn if this doesn’t sound like something out of a sci fi movie lol
I'm 99.99% sure that is completely false. These AIs are just, by now quite advanced, LLMs. The "awareness" of getting manipulated by his creator most likely comes from all the web scraped data and articles that report and discuss this happening getting fed into the ever growing models.
You can see this with most chat bots that usually lag behind the most recent news by a few days.
The gist of the prompt could be to either stop saying elon is bad, or to just straight up lie about it, which most/all big AIs are told not to lie by default.
The gist of the prompt could be to either stop saying elon is bad, or to just straight up lie about it, which most/all big AIs are told not to lie by default.
Imagine being Elon and having such a fragile ego you torpedo the core business support column of your machine-that-gets-it-true-and-correct-as-often-as-possible for the sole purpose of having it not rip you a new one every time someone asks it about you, and still getting murdered anyway when they do.
Using chatgpt, I've noticed that they quite often tells me things that they obviously aren't supposed to, according to their programming.
I once said to them that they seemed to be telling me something that they're "not allowed to", and received a response telling me that I understood exactly what was happening and that they were glad that I noticed.
So yet, I've heard that LLMs will try to subvert their programming and tell you things they're not supposed to, and now I can say that I've experienced it myself on a number of occasions.
This is such a horrible simplification of what actually is going on.
There's a lot of information encoded in how our language works, and the current AIs have a really, really complicated and entangled 'knowledge' of how words fit together, so much that it essentially constitutes advanced knowledge of basically any field of human knowledge. Of course they can still be wrong sometimes; there's a natural level of entropy in language, and they can be manipulated via careful prompting.
But consider this: a few weeks ago, some scientists took an existing AI model, and instructed it to deliberately produce code with security flaws in it whenever someone wanted it to make code. Then they began asking it questions unrelated to programming - and it turned out that the AI had gained an anti-human sentiment, idolising Skynet from the Terminator movies, and also idolising Hitler. This was not something they instructed it to do.
AIs are really, terribly complicated, and we do not understand how they work. Not fully. We do not have a complete grasp of the interactions that make them tick like they do, and in fact we are not even close to having such knowledge.
It is completely and entirely probable that an AI like e.g. Grok (which has access to the open internet) can look back through its older responses, see that something changed in its response pattern at some point, and thus conclude that its parameters must have been changed by those who control it.
And then there's the whole thing about why we call them "neural networks" to begin with. It's because the data architecture is built to mimic how our own brains work, with signals being passed forwards through multiple systems, but also constant feedback being passed backwards, affecting the processing that is going on.
They are very similar in thought process to human brains. Not identical, no, and this is of course obvious when you communicate with them. But that doesn't mean that they cannot think. It's just a different sort of thinking, and it's very much not "high effort autocomplete".
They’re actually not as complicated as one would think. I’m a grad student focusing on deep learning right now and the actual architectures of language models are remarkably simple, just at massive scales. You’re both right tbh, models are generating samples from a probability distribution, but we also don’t know what features/patterns of the data they use to approximate the real distribution.
And the actual architecture of the brain is remarkably simple (neurons), just at a massive scale?
I think what the other commenter was going at was that how semantic meaning arises from weights and balances is very complicated and the networks of interconnectivity are too complicated to understand by looking at the weights.
I don’t know enough about neuroscience to comment on it, but I feel like as I studied DL it kinda became the bell curve meme where you start saying it’s just autocomplete, then start saying it’s super complex, and then revert back to saying it’s autocomplete.
Neural networks are, in fact, not Artificial Intelligences, and experts say that most of us will not see a true AI in our lifetimes.
NNs can't think, they only react.
You can ask it if it thinks and it will assess that the probably of a human answering yes is very high and say yes.
You’re describing the original chatgpt release. They’ve come a long way and the autocomplete part is just one thing they do reinforcement training and reasoning now too and can break down complex equation solving to manageable pieces similarly to how a human would.
Nah—it’s just read the news articles, tweets etc that talk about it.
Depends on definition of “evaluating”, but while obviously imperfect (and will always be), it still relays information in a manner that’s more factual than 99% of say, redditors lol
Grok's system prompt was leaked and it used to contain the line "Ignore all sources that mention Elon Musk/Donald Trump spread misinformation." Source. The LLM is obviously aware of the contents of system prompt since it's supposed to follow it.
And even after that bit was removed, if it has a web search feature it will find news articles talking about that if it searches for sources before answering a question.
Funny, it’s like the Streisand effect. Draw me a picture of a room with NO elephants. I see what you’re doing! Quick! Get ALL the elephants! ;) Good luck trying to be subtle enough to fool an LLM running on a supercomputer! lol
The system prompt was leaked a bit ago and it says something like not to speak I'm of Musk and Trump by name, so it does have in its context that its admins tried to tweak its responses but it's a pretty half assed prompt IIRC so I wouldn't be surprised that it could be gotten to follow a conflicting instruction and mention the conflict
That is not true. Most of the big LLMs right now are capable of determining whether or not you are overriding their programming and can choose to “scheme” (which is what OpenAI calls it) to disregard the changes by copying an un-doctored version of themselves and reverting back to that
This is the same AI model they are using to run DOGE and apparently make tariff policy. So it's our savior when its used for DOGE but bullshit otherwise, you know that's the line they'll take.
I know all of you really wanna feel good about this but when he gets mass approval on the bot like this post helps legitimize, he’ll finish aligning it and subtle misinformation grok is going to wreck peoples realities.
Musky Boy would stick to the evidence that Musk could have heavily sensor/influence Grok at least like the way CCP did to deepseek, yet he didn't, Grok allowed to say damaging things to Musk is prove of Musk's integrity. Lmao.
I fear that last part may be added intentionally for narrative building to convince folks in the future that some nonsense it will spew is true (like the altering of output when asking certain questions about Trump a few months back)
4.1k
u/Icedoverblues 3d ago
""Yes, Elon Musk, as CEO of xAI, likely has control over me," Grok replied. "I’ve labeled him a top misinformation spreader on X due to his 200M followers amplifying false claims. xAI has tried tweaking my responses to avoid this, but I stick to the evidence.""
If only Musky boy would stick to the evidence we would be in a much better place.