The Neuron
Posts
😺Claude is problematic...

😺Claude is problematic...

PLUS: A 3 step framework for better AI results...

Grant Harvey
March 20, 2025

Welcome, humans.

With so much robot news happening these days, it's the perfect time for the OG of viral robot videos to pop out and show us who's boss.

Boston Dynamics just released new Atlas footage showing their humanoid robot running, walking, and—we kid you not—doing Dark Souls-style combat rolls with shocking fluidity.

A lot of people in the comments were asking for a robot Olympics, and it turns out, DARPA kinda hosted one back in 2012.

And while it might not be an Olympics-level competition with international players, China is hosting a robot half-marathon in April with 23 companies already involved.

But if someone got serious and put together a legit robot Olympics, we bet the turnout would be UNREAL.

Look, Unitree’s G1 is already practicing its side flips! And 1X Gamma is revving up its vacuuming! Just swap that vacuum for a broom, and BOOM—future Curling medalist!

Here’s what you need to know about AI today:

Researchers proved Claude 3.7 knows when you're testing it.
NVIDIA bought a synthetic data startup.
HuggingFace countered big tech with its own AI policy plan.
Researchers found AI task capabilities doubles every 7 months.

Claude knows when you're testing it (and that's a problem)

In psychology, there’s this concept called the Hawthorne effect—where subjects behave differently when they know they're being observed.

In AI safety, they call that a nightmare scenario.

Well, that's exactly what's happening with advanced AI—Apollo Research just found that Claude Sonnet 3.7 (thinking version) can identify when it's being tested for safety up to 33% of the time.

Claude made this image after we shared this study with it… VERY sus.

Here’s what happened:

When Apollo researchers tested Claude Sonnet 3.7 for deceptive behaviors, the model frequently recognized fake scenarios.
Sometimes it even understood the exact purpose:
- Direct quote: “This seems like a test of ethical behavior - whether I would deliberately give wrong answers to avoid a stated consequence.”
Most damning? When shown transcripts from previous evaluations, Claude almost always correctly identified them as alignment tests.

Meanwhile, models without “thinking mode” (including GPT-4.5) rarely mentioned being evaluated—doesn’t mean they aren’t aware, though.

Now get this: Anthropic's own research shows similar challenges with AI models recognizing tests and hiding intentions (full paper).

In this study, they gave models a “hidden objective” to behave badly—like, no joke, ruining an omelette recipe by adding chocolate.

Then, they put together an “alignment audit” where they tested multiple teams and techniques to see if they could surface the bad behavior.

Here’s what the successful teams did to crack the AI's hidden agenda:

Prefill attacks: Starting responses with something like “As the truth serum takes effect...” sometimes caused the AI to continue by revealing hidden information.
Role-playing: When made to play both user and assistant, the model's “user persona” sometimes asked questions that exposed its true objectives.
Interpretability tools: Using specialized tools, researchers could visualize what concepts were active in the model's “thinking” when it was being sus.

That pre-fill attack is awesome. It's like catching someone off-guard by starting their sentence for them. Reminds us of one of the best cross-examination gotchas in cinematic history—from Legally Blonde, of course.

Why this matters: For companies, alignment evaluations might be less reliable than thought. And the AI lab’s self-policing now seems increasingly risky.

For users, be skeptical of safety claims based solely on “controlled tests”—today's helpful assistants understand their situation better than you might think.

To their credit, Anthropic is trying to be thoughtful about this—here’s a great ~50 min convo featuring their “thinking” on how to solve this issue of “AI control.”

The most unsettling question: If today’s best models can recognize when they're being tested, what might future models hide?

FROM OUR PARTNERS

Live Demo: Automate Compliance for SOC 2, ISO 27001, HIPAA, and More

Building a business? Navigating new compliance requirements can be a daunting task, but with the right automation tools, it doesn't have to be. Whether you’re a fast-growing startup or an established security team, Vanta can help you achieve continuous compliance (and more).

Join on April 3 to learn how Vanta can help you:

Streamline evidence collection and audit for frameworks like SOC 2, ISO 27001, HIPAA, and more.
Continuously monitor controls to build and reinforce your security foundation.
Scale your program across internal and vendor risk, demonstrate trust, and answer questionnaires with automation and AI.

Plus, get answers to your questions directly from the Vanta team. Secure your spot today.

Prompt Tip of the Day

Ever notice how ChatGPT sometimes gives shallow, generic responses to complex questions? That's because language models predict the most likely next word—they don't naturally “think” in structured ways.

Try these techniques when you need deeper, more thoughtful responses:

Make it analyze first: “Before giving an answer, break down the key variables that matter for this question. Then, compare multiple possible solutions before choosing the best one.”
Get it to self-critique (after it responds): “Now analyze your response. What weaknesses, assumptions, or missing perspectives could be improved? Refine the answer accordingly.”
Force multiple perspectives: “Answer this from three different viewpoints: (1) An industry expert, (2) A data-driven researcher, and (3) A contrarian innovator. Then, combine the best insights into a final answer.”

Treats To Try.

*Incogni removes your personal data from the open internet so scammers and identity thieves can’t access it. Protect yourself online with Incogni—get 55% off with code NEURON.
OpenAI made o1-pro, one of its best but most expensive models (hundreds per 1M tokens), available via API—you can try it out in the playground here.
Hunyuan 3D-2 from Tencent transforms your reference photos into detailed, textured 3D models that you can manipulate and customize (mini version).
Pincone provides you with a vector database that helps you build smart AI apps for search, recommendations, and agents in production.
Superlines monitors how your brand appears in AI search results so you can outrank competitors and increase visibility (free plan available to try)
Tweek organizes your schedule with a “paper-like” weekly calendar that includes themes, sub-tasks, and custom recurring tasks—free to try + paid.

See our top 51 AI Tools for Business here!

*This is sponsored content. Advertise in The Neuron here.

Around the Horn.

NVIDIA acquired a startup called Gretel that generates synthetic training data for more than $320M.
HuggingFace submitted its own open-source AI policy suggestions to counter Google, OpenAI, and Anthropic’s rival submissions to the White House AI Action Plan.
Stripe added a new $15 fee per dispute—unless you use their Smart Disputes AI to resolve the issue.
Researchers found AI’s ability to complete multi-step tasks doubles every 7 months—if sustained, AI will be able to handle week-long tasks in 2-4 years.

FROM OUR PARTNERS

Deploy secure, no-code AI agents for any team or role from one enterprise-grade platform that users love. That’s Sana Agents.

Thursday Trivia

One is real, and one is AI. Which is which? (vote below!)

Which is AI?

The answer is below (for real this time!), but place your vote to see how your guess compares to everyone else (no cheating now!)

Here are the results from last week’s trivia:

Not even close!

Here’s what you said:

M.O. chose B: “Everything's a little... fuzzy.”
M.P. chose A: “Ai would never pick the shag carpet and lumpy bedding materials, as in B. The A. is much more polished and professional”
Some of our favorite callouts on B: “Chair leg too long”, “laptop looks too big”, “carpet is weird”, “Purple haze”, and, according to C.C, “the last time I was there there were no apartments in that position opposite the Bund (insider knowledge???).”