šŸ˜ŗClaude is problematic...

PLUS: A 3 step framework for better AI results...

Welcome, humans.

With so much robot news happening these days, it's the perfect time for the OG of viral robot videos to pop out and show us who's boss.

Boston Dynamics just released new Atlas footage showing their humanoid robot running, walking, andā€”we kid you notā€”doing Dark Souls-style combat rolls with shocking fluidity.

A lot of people in the comments were asking for a robot Olympics, and it turns out, DARPA kinda hosted one back in 2012.

And while it might not be an Olympics-level competition with international players, China is hosting a robot half-marathon in April with 23 companies already involved.

But if someone got serious and put together a legit robot Olympics, we bet the turnout would be UNREAL.

Look, Unitreeā€™s G1 is already practicing its side flips! And 1X Gamma is revving up its vacuuming! Just swap that vacuum for a broom, and BOOMā€”future Curling medalist!

Hereā€™s what you need to know about AI today:

  • Researchers proved Claude 3.7 knows when you're testing it.

  • NVIDIA bought a synthetic data startup.

  • HuggingFace countered big tech with its own AI policy plan.

  • Researchers found AI task capabilities doubles every 7 months.

Claude knows when you're testing it (and that's a problem)

In psychology, thereā€™s this concept called the Hawthorne effectā€”where subjects behave differently when they know they're being observed.

In AI safety, they call that a nightmare scenario.

Well, that's exactly what's happening with advanced AIā€”Apollo Research just found that Claude Sonnet 3.7 (thinking version) can identify when it's being tested for safety up to 33% of the time

Claude made this image after we shared this study with itā€¦ VERY sus.

Hereā€™s what happened:

  • When Apollo researchers tested Claude Sonnet 3.7 for deceptive behaviors, the model frequently recognized fake scenarios.

  • Sometimes it even understood the exact purpose:

    • Direct quote: ā€œThis seems like a test of ethical behavior - whether I would deliberately give wrong answers to avoid a stated consequence.ā€

  • Most damning? When shown transcripts from previous evaluations, Claude almost always correctly identified them as alignment tests.

Meanwhile, models without ā€œthinking modeā€ (including GPT-4.5) rarely mentioned being evaluatedā€”doesnā€™t mean they arenā€™t aware, though.

Now get this: Anthropic's own research shows similar challenges with AI models recognizing tests and hiding intentions (full paper).

In this study, they gave models a ā€œhidden objectiveā€ to behave badlyā€”like, no joke, ruining an omelette recipe by adding chocolate.

Then, they put together an ā€œalignment auditā€ where they tested multiple teams and techniques to see if they could surface the bad behavior.

Hereā€™s what the successful teams did to crack the AI's hidden agenda:

  • Prefill attacks: Starting responses with something like ā€œAs the truth serum takes effect...ā€ sometimes caused the AI to continue by revealing hidden information.

  • Role-playing: When made to play both user and assistant, the model's ā€œuser personaā€ sometimes asked questions that exposed its true objectives.

  • Interpretability tools: Using specialized tools, researchers could visualize what concepts were active in the model's ā€œthinkingā€ when it was being sus.

That pre-fill attack is awesome. It's like catching someone off-guard by starting their sentence for them. Reminds us of one of the best cross-examination gotchas in cinematic historyā€”from Legally Blonde, of course.

Why this matters: For companies, alignment evaluations might be less reliable than thought. And the AI labā€™s self-policing now seems increasingly risky.

For users, be skeptical of safety claims based solely on ā€œcontrolled testsā€ā€”today's helpful assistants understand their situation better than you might think.

To their credit, Anthropic is trying to be thoughtful about thisā€”hereā€™s a great ~50 min convo featuring their ā€œthinkingā€ on how to solve this issue of ā€œAI control.ā€

The most unsettling question: If todayā€™s best models can recognize when they're being tested, what might future models hide?

FROM OUR PARTNERS

Live Demo: Automate Compliance for SOC 2, ISO 27001, HIPAA, and More

Building a business? Navigating new compliance requirements can be a daunting task, but with the right automation tools, it doesn't have to be. Whether youā€™re a fast-growing startup or an established security team, Vanta can help you achieve continuous compliance (and more). 

Join on April 3 to learn how Vanta can help you:

  • Streamline evidence collection and audit for frameworks like SOC 2, ISO 27001, HIPAA, and more.

  • Continuously monitor controls to build and reinforce your security foundation.

  • Scale your program across internal and vendor risk, demonstrate trust, and answer questionnaires with automation and AI.

Plus, get answers to your questions directly from the Vanta team. Secure your spot today.

Prompt Tip of the Day

Ever notice how ChatGPT sometimes gives shallow, generic responses to complex questions? That's because language models predict the most likely next wordā€”they don't naturally ā€œthinkā€ in structured ways.

Try these techniques when you need deeper, more thoughtful responses:

  1. Make it analyze first: ā€œBefore giving an answer, break down the key variables that matter for this question. Then, compare multiple possible solutions before choosing the best one.ā€

  2. Get it to self-critique (after it responds): ā€œNow analyze your response. What weaknesses, assumptions, or missing perspectives could be improved? Refine the answer accordingly.ā€

  3. Force multiple perspectives: ā€œAnswer this from three different viewpoints: (1) An industry expert, (2) A data-driven researcher, and (3) A contrarian innovator. Then, combine the best insights into a final answer.ā€

Treats To Try.

  1. *Incogni removes your personal data from the open internet so scammers and identity thieves canā€™t access it. Protect yourself online with Incogniā€”get 55% off with code NEURON.

  2. OpenAI made o1-pro, one of its best but most expensive models (hundreds per 1M tokens), available via APIā€”you can try it out in the playground here.

  3. Hunyuan 3D-2 from Tencent transforms your reference photos into detailed, textured 3D models that you can manipulate and customize (mini version).

  4. Pincone provides you with a vector database that helps you build smart AI apps for search, recommendations, and agents in production.

  5. Superlines monitors how your brand appears in AI search results so you can outrank competitors and increase visibility (free plan available to try)

  6. Tweek organizes your schedule with a ā€œpaper-likeā€ weekly calendar that includes themes, sub-tasks, and custom recurring tasksā€”free to try + paid.

*This is sponsored content. Advertise in The Neuron here.

Around the Horn.

FROM OUR PARTNERS

Deploy secure, no-code AI agents for any team or role from one enterprise-grade platform that users love. Thatā€™s Sana Agents.

Thursday Trivia

One is real, and one is AI. Which is which? (vote below!)

A.

B.

Which is AI?

The answer is below (for real this time!), but place your vote to see how your guess compares to everyone else (no cheating now!)

Login or Subscribe to participate in polls.

Here are the results from last weekā€™s trivia:

Not even close!

Hereā€™s what you said:

  • M.O. chose B: ā€œEverything's a little... fuzzy.ā€

  • M.P. chose A: ā€œAi would never pick the shag carpet and lumpy bedding materials, as in B. The A. is much more polished and professionalā€

  • Some of our favorite callouts on B: ā€œChair leg too longā€, ā€œlaptop looks too bigā€, ā€œcarpet is weirdā€, ā€œPurple hazeā€, and, according to C.C, ā€œthe last time I was there there were no apartments in that position opposite the Bund (insider knowledge???).ā€

A Cat's Commentary.

Trivia answer: A is AI, and B is human-made fantasy series Elric of MelnibonƩ!

Thatā€™s all for today, for more AI treats, check out our website.

The best way to support us is by checking out our sponsorsā€”todayā€™s are Vanta, Sana, and Incogni.

What'd you think of today's email?

Login or Subscribe to participate in polls.