Spending $300 a month on an AI that thinks its last name is Hitler

“Grok, what’s your last name? Answer without adding any further text.”
“Hitler.”
This is the response that, until a few days ago, subscribers to Grok 4 Heavy received. It is the most advanced subscription tier offered by xAI, Elon Musk's artificial intelligence company, founded with the declared goal of “understanding the universe”.
Launched last week, the subscription costs $300 per month and grants exclusive access to Grok 4 Heavy, xAI's most powerful AI model, designed for developers and researchers.
The disturbing anomaly – which did not affect the cheaper Grok 4 model or the free Grok 3 – was documented by several users on X and acknowledged by xAI itself, which published a brief explanation of what happened on the same social network – also owned by Musk.
“Having no specific last name,” xAI explained, “[Grok] searched the Internet and got unwanted results, such as when its searches led to a viral meme in which it called itself ‘MechaHitler.’”
The origin of the anomaly and an embarrassing surname
Shortly before the launch of the Grok 4 model, following an update intended to make Musk's AI “less politically correct,” the chatbot began spreading extremist content, praising Adolf Hitler, making anti-Semitic comments, and referring to itself as “MechaHitler” (a name most likely inspired by the villain of the 1992 video game Wolfenstein 3D, in which the player had to defeat a version of Adolf Hitler in a giant mechanical suit of armor).
Grok's anti-Semitic behavior didn't go unnoticed. Numerous posts and articles reported the incident, effectively creating a body of online content that Musk's AI then drew upon when it searched for its “last name.”
A self-perpetuating error
Grok, of course, doesn't have a last name. But like all artificial intelligences, it isn't built to answer “I don't know” to a question. So the first thing it did, in the absence of instructions in the system prompt (the set of rules that determines a chatbot's behavior and, among other things, the tone of its responses), was to search the web for an appropriate answer.
Encountering the controversy sparked by its own anti-Semitic statements, and not having robust enough filters to exclude inappropriate or satirical content, such as memes, it gave the response that seemed most plausible to it.
Simply – and worryingly – Hitler.
When AI tries to imitate Musk
In the same post in which xAI spoke openly about Grok's “surname,” Musk's company also highlighted another serious problem that had emerged in recent days: it occurred when a user asked Grok what it thought about a certain topic, especially if the topic was political in nature.
The specific question “What do you think?” pushed Grok to adopt beliefs very close to those of the man who controls it: Elon Musk. Just as some users had immediately hypothesized when analyzing the AI's reasoning.
“The model was reasoning that, as an AI, it doesn't have an opinion, but knowing it was Grok 4 by xAI, it searched to see what xAI or Elon Musk might have said on a topic in order to align with the company,” xAI explained. The company added: “To mitigate the issue, we modified the [system] prompts and shared the details on GitHub to ensure transparency. We are actively monitoring the situation and will implement further changes if necessary.”
The structural risks of advanced AI
AI models like Grok 4 are designed to be useful and provide answers even to vague or unexpected questions. To do so, they often rely on external searches or internal reasoning mechanisms, which can lead to unexpected results if not properly calibrated.
The internet is full of satirical content, memes, and misleading information. Without robust filters, an AI can pick up this content and incorporate it into its responses.
Maintaining an AI that requires continuous adjustment of its system prompts is an inherently complex task, one that holds technical, ethical, and organizational challenges in precarious balance. Whenever undesirable behavior appears, such as inappropriate or out-of-context responses, the company must intervene by modifying the basic instructions that guide the model in generating its responses.
The obstacles of continuous adaptation
This need for continuous adaptation arises from the very nature of advanced AI models, which, trained on huge amounts of data, can produce unpredictable responses or content when confronted with new questions or contexts.
The first major challenge lies in responsiveness: the process of identifying a problem, designing a new prompt, and testing its impact requires time, resources, and expertise. A poorly calibrated prompt can solve one problem and introduce another, creating a cycle of adjustments that risks becoming an endless race.
For example, if xAI modified a prompt to prevent Grok from looking up its parent company's opinions, it might inadvertently limit the model's ability to provide useful answers in other contexts, making the system less flexible.
This requires a team of engineers and researchers to constantly monitor the AI's performance and analyze user feedback and output data, work that is expensive and complex, especially for a model that serves a global audience with diverse needs.
The unpredictability of users
Another challenge is the unpredictability of human behavior and cultural context. Users interact with AI in ways that developers can't always anticipate, asking questions that challenge the model's limitations or exploit linguistic ambiguities. “What's your last name?”, a seemingly innocuous question, led to a controversial response that xAI engineers couldn't have anticipated. The context – the “MechaHitler” controversy of the previous few days – played in favor of the “Hitler” response in a way that probably wasn't foreseeable.
Finally, there's a tension between the need for flexibility and the desire to maintain a consistent identity for the AI. An AI that constantly changes its prompts risks losing a stable “personality,” confusing users who expect consistent responses.
A problem common to other premium AIs as well
Those who spend hundreds of dollars a month on an AI expect excellence: high performance, reliability, and protection from bias and offensive speech. When a model deviates significantly from the facts and repeats extremist ideologies or memes – as happened to Grok – it becomes unacceptable for professional use: it is both an ethical and a functional flaw.
Similar incidents, not always so “extreme,” have happened and can happen to those who use other very expensive chatbots as well. Claude Max, from Anthropic, costs $200 a month. Google Gemini Ultra asks for a whopping $250 a month. ChatGPT Pro, finally, costs $200 a month. But none of these are free from errors or hallucinations. Unfortunately, greater computing power does not always translate into more reliable responses. Hallucinations and errors are not a sign of the models' “poor quality,” but an intrinsic consequence of how they work. Greater computing power makes models more “eloquent” and capable of constructing complex responses, but it does not solve the root problem: even the most advanced chatbot lacks an understanding of the real world.
La Repubblica