😺 Claude 3.7 backlash

PLUS: A robot face that can mimic your emotions?!

Welcome, humans.

We try to catch you up on what's going on with AI video every now and then here at The Neuron, but since Veo 2, there have been very few releases that caught our eye.

Well, allow us to introduce Wan 2.1…

The criticism around AI video in general is that "nothing actually happens." AI "video" can create the illusion of motion, but the motions are slapdash and devoid of realism (the exception is video-to-video, where AI manipulates video footage like a graphical overlay, maintaining consistent motion but lacking object permanence).

Wan 2.1 kinda changes that. It creates realistic, complex motions that, for the most part, look legit and operate under recognizable physics. Here's what we mean (with a cat!).

And according to the VBench benchmark, it outperforms every other video model except Veo 2 (which isn't listed for comparison, but is basically the best).

Here's what you need to know about AI today:

  • There's a debate over how good Claude 3.7 really is.

  • OpenAI released SORA in Europe.

  • Report shows AI's impact on websites (who wins and who loses).

  • A robot face convincingly mimicked human emotions.

Power users are blasting Claude 3.7 for coding (and some are switching back to 3.5)

Is the latest model from Anthropic actually worse at complex coding tasks? The internet can't decide.

A fascinating debate is brewing on Reddit, where power users who live and breathe AI coding assistants report mixed results.

On one side, there's some serious hype going on. Like, too much. It's even gotten to the point of parody.

Not everyone's impressed with Claude 3.7, though, especially the hardcore developer crowd. Some are even ditching 3.7 altogether and crawling back to 3.5 like it's an ex they never should've broken up with.

One developer who spends a mind-boggling "25+ hours a week with Claude" (that's more time than most of us spend with our significant others!) tested 3.7 extensively before making the walk of shame back to 3.5.

Here's what's bugging the power users:

  • 3.7 is an overdesigning machine. One poor soul watched their tidy 400-line script explode into a 1,100-line monster. More isn't always better, Claude!

  • It has selective hearing. Users report it ignores instructions more often than a teenager being asked to clean their room.

  • The personality downgrade is real. 3.7 apparently has the conversational charm of a wooden table.

  • It suffers from complexity amnesia. While it starts strong, it gets lost in the sauce on longer, more complex projects.

Our favorite callout? "3.7 feels like a senior McKinsey consultant-presenting model that's mastered the art of lying to your face about how good they are at solving your problems." Ouch.

This about sums it up.

But not everyone's having a bad time! The 3.7 defenders are speaking up too, and they've found ways to make it work:

  • Ditch the back-and-forth conversation style and front-load everything in detailed initial prompts.

  • Be ridiculously specific about constraints (basically, micromanage the AI).

  • Start fresh projects rather than continuing existing work.

  • Make your expectations crystal clear.
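If you want to see what "front-loading" can look like in practice, here's a rough sketch. It just stitches the task, the constraints, and the expectations into one up-front prompt. To be clear, the function name, the section labels, and the example task are all made up by us for illustration; none of this comes from Anthropic or from the Reddit threads.

```python
def build_front_loaded_prompt(task, constraints, expectations):
    """Assemble one detailed, up-front prompt instead of a long back-and-forth.

    Hypothetical helper: the name and section labels are illustrative only.
    """
    lines = ["Task:", task.strip(), "", "Constraints (do not go beyond these):"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "What a good answer looks like:"]
    lines += [f"- {e}" for e in expectations]
    return "\n".join(lines)


# Invented example task, written as a fresh project with explicit limits.
prompt = build_front_loaded_prompt(
    task="Write a new, standalone script that downloads a file with retry logic.",
    constraints=[
        "Keep the script under 150 lines.",
        "Use only the standard library.",
        "Do not add features I did not ask for.",
    ],
    expectations=[
        "Return the full script in one code block.",
        "List any assumptions after the code.",
    ],
)
print(prompt)
```

Paste the printed text in as your first message, then adjust the constraints if the model still overbuilds.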

One success story reported that 3.7 "reduced duplication of code in a project from 9% to 4% and increased code coverage from 85% to 90%", all for about $8 in API costs. Not too shabby.

What's going on here? While 3.5 thrived as a collaborative coding buddy, 3.7 seems built for autonomous problem-solving with clear boundaries and comprehensive instructions.

This points to a broader trend in AI development: as models get more sophisticated, they may become less suitable as collaborative partners and more valuable as autonomous problem-solvers that need clear boundaries and comprehensive instructions.

Our key takeaway: AI model releases are getting to be like smartphone upgrades: newer doesn't always mean better for your specific needs. And sometimes, that previous model you're comfortable with is still the right tool for the job.

FROM OUR PARTNERS

AI Exposed Part 2: Separating Fact from Fiction

Are you a CxO, GM, or Department Head ready to separate AI fact from fiction? 

This webinar is for you. Join WethosAI CEO Stuart McClure and CTO Alen Capalik on March 13th for "AI Exposed Part 2: The Hype Must Die!"

You'll learn…

  • Why Humans + AI > AI.

  • How to navigate the Generative AI landscape.

  • How AI facilitates crucial conversations.

Plus, you'll discover the essential "long-thinking skills" for your workforce and ensure responsible implementation.

Cut through the AI hype and discover the real power of AI model training, reasoning, and inference. 

Prompt Tip of the Day

Most people won't get to use GPT-4.5 every day because it's too expensive to run (you either need the $200 a month Pro account, or an API account that'll max out a JP Morgan Sapphire card). That said, you can try it out in the playground here, and here's the prompt.

Treats To Try.

  1. Microsoft Copilot, Microsoft's AI chatbot service, now has a Mac app.

  2. Pig automates Windows apps through chat or code, letting you control your computer remotely without programming skills.

  3. Pikr delivers your newsletter highlights to Notion so you spend less time managing emails and more time absorbing knowledge.

  4. Lemni creates AI agents for all your customer interactions (email, phone, outreach).

  5. Mesh automates your bookkeeping and answers your financial questions instantly.

  6. Promptize instantly improves your prompts across all AI tools, getting you better results without the hassle of manual prompt engineering.

  7. Dream Machine launched Video to Audio, which syncs sound to the video clips you create.

  8. Three for the Devs:

    1. ForeverVM executes your Python code in persistent sandboxes that never expire.

    2. Continue lets you build tailored coding assistants that suggest, edit, and explain code right in your IDE (raised $3M).

    3. ExplainGithub turns complex code repos into easy-to-understand explanations, saving you hours of code reading time.

    4. BONUS: Here is open-source code to run your own AI bot that plays Pokemon.

Around the Horn.

  • If you missed last Friday's story on GPT-4.5, this video from AI Explained covers everything you need to know about OpenAI's latest model. Interestingly (ironically?), Claude 3.7 might have better emotional intelligence than GPT-4.5…

  • Some users also report that Claude 3.7 performs worse at creative writing, translation, and summarization compared to 3.5, and that it's less likely to follow instructions despite being smarter.

    • The consensus seems to be:

      • Use Claude 3.7 for: new coding projects, math, logic, technical analysis.

      • Stick with Claude 3.5 for: creative writing, translation, your current coding projects, and personality-rich tasks.

    • This all could be the learning curve of working with new models tho!

  • OpenAI finally made SORA available in the EU and UK.

  • This is a great report on the websites that are winning and losing in the genAI era. TL;DR: Losers = WebMD, Quora, Stack Overflow, Chegg, G2, CNET, while Winners = Reddit, Wikipedia, and Substack, because authentic community content is thriving as AI kills traffic to info aggregators.

  • This is the most uncanny valley thing you'll see today: it's a video of a Chinese humanoid robot face that can masterfully emulate the emotions and expressions of the human it's watching. BIG YIKES!

FROM OUR PARTNERS

🙅 Ditch your old data connectors

Data connectors are super important, and a complete waste of time for your team to build and maintain.

Join Unstructured for a deep dive into the best practices for data connectors for genAI applications, and learn why traditional ETL requires lots of engineering overhead and struggles to maintain context.

Monday Meme

A Cat's Commentary.

That's all for today. For more AI treats, check out our website.

The best way to support us is by checking out our sponsors. Today's are Wethos and Unstructured.

See you cool cats on Twitter: @noahedelman02
