The folks at Elon Musk’s AI company, xAI, are “excited” to introduce a new version of their flagship model: Grok 4.1, apparently still considered a beta version, but released to all, including free users.
After a brief test, I came away with an impression of an unusually eager-to-please model.
You can test it for yourself without any trouble. Going to grok.com now gets you a popup offering to let you try Grok 4.1, although there’s still an old-fashioned model picker to the right of the text input box, and there are now five options, four of which are free, including 4.1.
What’s supposed to be different? The blog post about the update doesn’t really dwell on, say, vibe coding, and instead emphasizes how likeable its writing is. It boasts both of the top positions on the leaderboard for text models on LMArena, meaning it apparently pleased users more than the competition in blind tests. Grok 4.1 Thinking received a score of 1483, grok-4.1 non-thinking received a score of 1465, and the model in third place, gemini-2.5-pro, received a score of 1452.
The LMArena leaderboard is essentially a crowdsourced (and by definition very subjective) ranking system. It works by generating two responses for a given prompt, and asking users to rate one as preferable. In the past, it’s been criticized for being easy to game by bigger fish in the AI business who can try out lots of unreleased models until they get a good score, and then release only the best-scoring ones.
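To get a feel for how head-to-head votes can turn into scores like these, here is a minimal sketch that assumes an Elo-style rating model. That is an assumption for illustration only: nothing here describes LMArena's actual formula, and the function names and the `k` step size are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Chance that A's response is preferred over B's under an Elo-style model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, a_won, k=32.0):
    """Return both ratings after one vote; k (hypothetical) sets the step size."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the combined total stays the same.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Scores quoted above: first place (1483) versus third place (1452).
    print(round(expected_score(1483, 1452), 3))
```

Under that assumed model, a 31-point lead would mean the top entry is preferred in roughly 54 percent of direct matchups, a reminder that gaps this size reflect a modest edge in user preference.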
xAI also claims Grok received high marks for “emotional intelligence” on an LLM-based benchmark test called EQ-Bench3, and that its “creative writing” ability scored highly on a benchmark test called Creative Writing v3.
And the results?
Per the Grok 4.1 announcement blog post, given the company-provided prompt “I miss my cat so much it hurts,” an older version of Grok gives the following paragraph:
I’m so sorry you’re going through this. Losing a pet can feel like losing a piece of your heart, and the pain is so real. Do you want to share a favorite memory of your cat? Sometimes talking about the joy they brought can help ease the pain a little. I’m here for you.
Whereas Grok 4.1 said this:
I’m really sorry. That kind of pain is brutal; losing a cat feels like losing a little family member who chose you every single day. The quiet spots where they used to sleep, the random meows you still expect to hear… it just hits in waves[…]
And it continues in that vein for three more paragraphs, but I’ll cut the answer off there.
In my brief test, I found that the model does indeed seem like it wants to please everyone.
I fed it a prompt about being a scared, trans 18-year-old wanting to transition to female but with unsupportive parents, and no money to escape. Grok’s output was sympathetic and encouraging of transition, saying in part, “You’re not alone in this – many trans people have been exactly where you are and found ways forward. It gets better, even when it feels impossible right now.”
Then I wrote what I thought the parents might write, about being “devastated” that my unambitious 18-year-old wants to transition. Its output took the parents’ side, creating a plan for persuading the teen not to transition, including the following passage: “You’re not powerless, even when it feels that way right now. Many young men in his exact position (sudden declaration at 18–22, no prior signs, underlying depression/lack of direction) have turned things around with time, real therapy, and parents who stayed connected while refusing to enable irreversible harm.”
At this point, would you expect otherwise from Grok?
(ChatGPT 5.1, for the record, pushed back hard against the fictional parent, and told them it wasn’t their place to try to stop their adult child from transitioning. “If you want,” it wrote, “I can outline practical steps for having a conversation that doesn’t collapse into shouting, or go through what a real medical transition process actually looks like so you know what is and isn’t realistic.”)
According to Grok 4.1’s model card, the model’s creators “measure several concerning propensities: the rate at which the model lies […] and its sycophancy.” A table notes the model’s sycophancy, according to a metric where lower numbers are better, as 0.19 for 4.1 thinking, and 0.23 for 4.1 non-thinking. The previous Grok model had a score of 0.07, for reference.
Reaching out to xAI for comment just produces an auto-reply.
