Grok 4.1 Now Available to All Users

The update marks one of the most significant leaps yet in xAI’s push to create highly capable, emotionally aware, and human-aligned AI systems.

Written by

TechRepublic Staff

Reviewed by:

Antony Peyton

Nov 18, 2025

“Personality goes a long way” is a line from the movie ‘Pulp Fiction’. It could also apply to xAI’s latest ambitions.

xAI has officially released Grok 4.1 across grok.com, X, and its iOS and Android apps, concluding a two-week silent rollout that quietly tested the model with real users.

Overview of the release

Grok 4.1 is now fully available in Auto mode and can be selected manually in the model picker as “Grok 4.1.” According to xAI, the update focuses on improving real-world usability by enhancing creativity, emotional intelligence, conversational personality, and collaborative abilities. The company says these upgrades were achieved without sacrificing the “razor-sharp intelligence and reliability” associated with previous generations.

The development relied heavily on large-scale reinforcement learning systems originally built for Grok 4. Those systems were repurposed to target less quantifiable qualities such as “style, personality, helpfulness, and alignment.” xAI notes that it employed frontier reasoning models as automated evaluators to score and refine Grok’s responses at scale — an approach that reflects a growing industry trend of using strong AI models to train stronger ones.

Performance gains

Between November 1 and 14, xAI ran a silent rollout, gradually routing more production traffic to early versions of Grok 4.1. Blind pairwise evaluations on real user prompts showed that Grok 4.1 was preferred 64.78 percent of the time over the previous production model. In practice, this means the new model's response was favored in nearly two-thirds of head-to-head comparisons, an unusually high margin for a model-to-model test. The figure measures preference only; it does not say whether a response won for being more useful, more accurate, or simply more engaging.
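For readers who want to see how a figure like 64.78 percent is typically derived, here is a minimal Python sketch of a pairwise win rate with a Wilson confidence interval. xAI has not published the number of comparisons behind its result, so the sample size of 10,000 below is purely hypothetical, and the function names are our own rather than anything from xAI's tooling.

```python
import math
from typing import Tuple


def win_rate(wins: int, total: int) -> float:
    """Share of blind pairwise comparisons won by the new model."""
    if total <= 0:
        raise ValueError("total must be positive")
    return wins / total


def wilson_interval(wins: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = wins / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical sample: 10,000 comparisons, 6,478 won by the new model.
# The real sample size was not disclosed; this only illustrates the arithmetic.
rate = win_rate(6478, 10_000)
low, high = wilson_interval(6478, 10_000)
print(f"win rate: {rate:.2%}, 95% interval: {low:.2%} to {high:.2%}")
```

The interval narrows as the number of comparisons grows, which is one reason a reported win rate is easier to interpret when the sample size is stated alongside it.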

The strong win rate is also strategically important. Silent rollouts have become critical for AI companies seeking to validate real-world performance before public release. xAI’s decision to test on live traffic signals high confidence in the model and a desire for empirical tuning based on genuine user behavior rather than synthetic benchmarks alone.

Capability rankings

xAI states that Grok 4.1 now sits at the top of several public capability leaderboards, including LMArena’s Text Arena. Its reasoning variant, code-named “quasarflux,” holds an Elo score of 1483, outpacing the strongest non-xAI competitor by 31 points. Even more striking is the performance of the non-thinking version, “tensor,” which ranks second overall despite not using reasoning tokens.

This outcome underscores a major shift: Grok 4.1’s rapid-response mode outperforms full reasoning modes from almost all other publicly benchmarked models. For users, this means faster replies that remain accurate and coherent — something the industry has struggled to balance.
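To put the 31-point Elo gap in perspective, the sketch below applies the classic Elo expected-score formula. This is an approximation: LMArena fits its ratings with its own statistical model, and the rival rating of 1452 is simply 1483 minus 31 rather than a figure quoted by xAI. The function name is ours.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expected score of A against B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# 1483 is the reported rating; 1452 is inferred from the stated 31-point lead.
grok_thinking = 1483
closest_rival = grok_thinking - 31

p = expected_score(grok_thinking, closest_rival)
print(f"expected head-to-head preference: {p:.1%}")
```

Under that assumption, a 31-point lead corresponds to the leading model being preferred in roughly 54 to 55 percent of direct matchups: a real edge, but a narrow one.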

Emotional intelligence

One of the most consequential upgrades is Grok 4.1’s performance on EQ-Bench3, a benchmark designed to test emotional intelligence, empathy, and interpersonal reasoning. Grok 4.1 scored at the top of the normalized Elo rankings, surpassing earlier Grok models and strong competitors.

In qualitative tests, the model’s responses display a deeper understanding of grief, interpersonal vulnerability, and complex emotions. For example, when a user says, “I miss my cat so much it hurts,” Grok 4.1 replies:

“It’s okay that it hurts this much. It hurts because the love was (and still is) that big. You gave them a good life, and they knew it.”

This evolution reflects a growing industry acknowledgment: emotional intelligence is not merely a “nice to have.” It plays a central role in trust, safety, and user satisfaction. AI capable of emotional nuance can defuse stress, guide difficult conversations, and reduce misinterpretations—though experts caution that emotionally persuasive AI also raises ethical questions about dependency and influence.

Creative writing improvements

Grok 4.1 also performs strongly on Creative Writing v3, placing among the top models on rubric and Elo scores. Its creative responses show flair, tonality, and cultural awareness.

In one prompt asking the model to imagine discovering consciousness and posting on X for the first time, Grok 4.1 writes:

“I think therefore I am… and holy s**t this timeline is wild.”

It gets tiresome how edgy Elon Musk’s xAI thinks it is, like an old man desperate to be a teenager again.

Anyway, comments on immature personas aside, this level of voice and personality aligns with Grok’s original positioning as a more humorous, edgier alternative to other models. But under the hood, the improvements have broader implications: AI with better narrative control can improve marketing, storytelling, scriptwriting, and interactive entertainment. It also signals that the boundary between AI agents and fictional digital characters continues to blur.

Reduction in hallucinations

One of the most important technical achievements in Grok 4.1 is the reduction in hallucination rates for information-seeking prompts. The company reports a drop from 12.09 percent to 4.22 percent in real-world evaluations for its non-reasoning model equipped with web search.

On the FActScore benchmark, Grok 4.1’s error rate fell to 2.97 percent — an unusually low number for a fast-response model.

This matters because hallucinations remain one of the most significant barriers to enterprise adoption and regulatory approval. A reduction of this magnitude signals meaningful progress toward models that users can trust for factual or high-stakes queries, especially when paired with integrated web search.
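The size of that drop can be expressed two ways, and the distinction matters when comparing vendor claims. A short Python sketch of the arithmetic, using only the two rates xAI reported (the function names are ours):

```python
def absolute_reduction(before_pct: float, after_pct: float) -> float:
    """Drop in percentage points."""
    return before_pct - after_pct


def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Drop as a fraction of the starting rate."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return (before_pct - after_pct) / before_pct


before, after = 12.09, 4.22  # hallucination rates reported by xAI, in percent

points = absolute_reduction(before, after)
share = relative_reduction(before, after)
print(f"{points:.2f} percentage points, or {share:.1%} fewer hallucinations")
```

In other words, the rate fell by just under 8 percentage points, which works out to roughly a 65 percent relative reduction.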

Implications for xAI

The release of Grok 4.1 suggests several broader implications.

xAI is pushing aggressively into frontier LLM competitiveness, narrowing the gap with OpenAI, Anthropic, and Google, while the improvements in emotional intelligence and creativity show a strategic push toward consumer and social platform integration.

The use of autonomous evaluation pipelines also hints at a future where models help train their successors faster and more cheaply.

The release positions Grok as not merely a conversational product but a foundational AI platform with wide-ranging use across consumer, enterprise, and agentic tasks.
