• The Neuron
😺OpenAI releases their best models yet

PLUS: NVIDIA's loss and OpenAI's new buy?!

 

Welcome, humans.

Yesterday was Day 3 of OpenAI’s ship week, and TBH, we’re actually shocked OpenAI released o3 and o4-mini (more on that below) because for a minute there, this meme was starting to define all their recent releases:

IDK if y’all noticed, but this rampant personality injection has even made its way into the feedback. Example: Our editor Corey spotted new questions ranking OpenAI’s “personality” instead of whether or not a result was good or bad…

That’s why it’s nice to see OpenAI pop out and show people they still got it with today’s launch. We were getting worried we’d have to start promoting “Save the AI,” a hilariously dark PETA-but-for-AI organization that calls out our reckless human habits for literally killing AI.

Every glass of water you drink? That's precious coolant stolen from data centers! Those long hot showers? You're basically waterboarding ChatGPT! And don't even think about turning on lights—AI needs that electricity to think deep thoughts about your cat photos.

The solution? Simple! Just sit thirsty in the dark with your dead phone. AI needs those resources more than you do, you selfish human!

Here’s what you need to know about AI today:

  • OpenAI released o3 and o4-mini, their best reasoning models yet.

  • Google used AI to stop ad scammers.

  • OpenAI might buy Windsurf.

  • NVIDIA lost $5.5B to export restrictions on China.

OpenAI releases the full o3 and new o4-mini… their best models yet.

OpenAI's newest models—o3 and o4-mini—just launched yesterday, and they’re quite the pair, combining strategic reasoning with powerful tool integration.

Unlike previous AIs that simply pattern-match, OpenAI says these systems actively strategize to solve complex problems. And best of all, they’re actually more efficient.

Here are the highlights:

  • Available now to ChatGPT Plus, Pro, and Team users, with Enterprise access next week.

  • Both models outperform predecessors at lower costs.

  • Both models are also available through the API.

  • Safety evaluations show improved refusal capabilities while maintaining helpfulness.

The quality improvements extend to multiple areas:

  • Much better vision capabilities than the o1 teaser (e.g., research with vision).

  • Exceptional writing that's casual, but not too casual—so perfect for work.

  • First models with access to ALL OpenAI tools (web search, Python, image analysis, file search), which represents a major step towards model “unification” (a.k.a. GPT-5).

The benchmark results speak volumes:

  • 98.4% accuracy on AIME 2025 math competition (o3 with Python).

  • 99.5% accuracy on the same test (o4-mini with Python).

  • 2700+ Codeforces Elo rating (both models), placing them among the world's top 200 competitive programmers.

Early testers report genuinely impressive applications, and say it's not just smart but practical: fast enough for daily use and impressively accurate with facts. While some testers reported hallucinations, many find it more reliable than GPT-4o and o1 for similar tasks. For example, Dan Shipper used o3 to:

  • Flag conflict-avoidance patterns in meeting transcripts.

  • Create custom AI courses with daily reminders.

  • Analyze org charts to predict team strengths and weaknesses.

…And a ton more. Ethan Mollick also got early access, and used o3 to crack a business case he teaches at Wharton, create SVG images through code alone, and write a hard sci-fi space battle.

But there’s some red flags, too. Transluce found that OpenAI's o3 model frequently fabricates actions it never performed (especially claiming to run code it cannot execute) and elaborately justifies these fabrications when questioned (good X thread on this).

Our take? On net, this entire week is a bullish signal for OpenAI finally releasing their next agent (A-SWE, or the research one, if not both), as these models are likely what’s going to power it under the hood. It’s also clear the crew is setting the stage for GPT-5.

As a reminder, by GPT-5 we mean the model that’ll finally clean up this mess:

As for our full take, we still need a bit more time to test them ourselves to compare, but it looks like Gemini 2.5 finally has a real competitor… read more on the site!

Oh, and btw—we also decided it was worth giving o3 a sit down interview… because why not? It’s about time somebody did. Check it out here.

FROM OUR PARTNERS

Zero to success: What we got wrong before we got it right (live event)

Starting a company is one thing. Keeping it ALIVE is another. Vanta is hosting a webinar on April 24th, where you can join a candid conversation with…

  • Shaan Puri (Co-Host of My First Million). 

  • Chase Lee (Founder and CEO of Trustpage, acquired by Vanta).

  • Travis Good (Co-founder and CEO of Workstreet).

In this session they’ll cover:

  • The mistakes they made and how they’d avoid them today

  • Tactical advice for navigating building, hiring, and investor conversations

  • What technical and non-technical co-founders need to get right in the first few hires

Whether you're in your first year or gearing up for growth, this live webinar will be packed with real talk from founders who’ve been there.

Prompt Tip of the Day

IDK how many people reading this are trying to go to grad school, but this is GREAT prompt advice from an ACTUAL grad admissions person on how to use AI to help you write better and actually get in. Rough draft w/ AI, then personalize it to keep it real.

For everyone else, if you missed OpenAI’s 4.1 prompt cookbook from yesterday, plz check it out!

Treats To Try.

  1. Kling 2.0 lets you control gen video using images and video clips instead of just text, with new editing features for adding, removing, or replacing elements (great demo in this video on using GPT for a base image + Kling to animate).

  2. Mailgo automates finding and emailing prospects with AI, ensuring your messages actually reach inboxes instead of spam folders—free tier, then $15.20/month.

  3. Cohere released Embed 4, which finds information in your business documents, no matter if they contain text, images, or tables, in any of 100+ languages.

  4. Claude Research is Claude’s take on Deep Research: it finds information across your internal documents and the web, automatically conducting follow-up searches to answer your questions completely (available on Max, Team, and Enterprise plans in the US, Japan, and Brazil).

  5. IBM’s new Granite 3.3 model transcribes speech with industry-leading accuracy, solves complex math problems better than competitors, and includes special adapters that detect hallucinations in AI search results—free under Apache 2.0 license (download here, try it here).

  6. OpenAI also released the new open-source Codex CLI, which offers a lightweight terminal agent with robust security controls—watch this for more.

  7. Infinite Reality builds immersive 3D websites and experiences that help your brand tell stories and engage audiences with no code tools.

  8. Simular automates digital tasks on your computer through their benchmark-leading agents—check out S2 in particular (free to try).

Around the Horn.

  • Here’s a helpful chart that compares price versus performance on the LiveBench leaderboard for non-reasoning models. Google is on the “Pareto front,” where “no other model is both cheaper and better at the same time.”

  • OpenAI could be about to buy Windsurf (the popular AI coding tool) for $3B.

  • Microsoft released BitNet b1.58 2B4T, an open-source 1-bit AI model that runs efficiently on CPUs instead of GPUs.

  • NVIDIA says it lost $5.5B from canceled China chip orders and pledged compliance with U.S. export laws amid a congressional investigation.

    • Related: DeepSeek could also be specifically targeted.

  • Google suspended 39M+ advertiser accounts in 2024 (more than 3x the previous year), removed 5B+ bad ads, and used AI-powered systems to detect fraud signals at account setup before ads were served, resulting in a significant 90% decrease in reported deepfake ads.
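
Quick nerd aside on that “Pareto front” idea from the LiveBench chart: here's a minimal Python sketch of how you could find it yourself. Everything in it is an assumption made up for illustration (the `Model` type, the model names, the prices, and the scores are all hypothetical, not real LiveBench data). It just applies the quoted definition: a model stays on the front if no other model is both cheaper and better.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # hypothetical cost per million tokens, in USD
    score: float  # hypothetical benchmark score, higher is better


def pareto_front(models: list[Model]) -> list[Model]:
    """Return the models that no other model beats on both price and score."""
    front = []
    for m in models:
        # m is "dominated" if some other model is cheaper AND scores higher.
        dominated = any(
            other.price < m.price and other.score > m.score
            for other in models
        )
        if not dominated:
            front.append(m)
    # Sort cheapest-first so the front reads left to right like the chart.
    return sorted(front, key=lambda m: m.price)


# Made-up numbers, purely for illustration.
MODELS = [
    Model("model-a", price=0.5, score=60.0),
    Model("model-b", price=2.0, score=70.0),
    Model("model-c", price=3.0, score=65.0),  # pricier and worse than model-b
    Model("model-d", price=10.0, score=80.0),
]

for model in pareto_front(MODELS):
    print(model.name, model.price, model.score)
```

With these made-up numbers, model-c drops off the front because model-b is both cheaper and better, while the other three stay on it.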

FROM OUR PARTNERS

⚔ Get 2x more done with AI-native email

Your inbox isn't just crowded—it's stealing time from future you. While you're drowning in emails, Superhuman users are saving 4+ hours weekly with emails that write themselves and intelligent filtering that actually works. Used by 60% of Forbes AI 50 companies for a reason.

Thursday Trivia

One is real, and one is AI. Which is which? (vote below!)

A.

B.

Which is AI?

The answer is below, but place your vote to see how your guess compares to everyone else (no cheating now!)

Here’s the results from last week’s poll:

The robots took this one y’all.

Here’s what you said:

  • A.P. chose A: “Looks just like the cartoon, but plug is not right at the wall/panel.”

  • S.F. chose B: “That’s not what Tom & Jerry looked like.” (I know, but old cartoons are weird).

  • G.W. chose A: “A has Tom and Jerry looking great, but the background is too modern for the show and the outlet/plug is off. B is an old school version.” Nailed it!

A Cat's Commentary.

Trivia Answer: B is AI, and A is real.

That’s all for today. For more AI treats, check out our website.

The best way to support us is by checking out our sponsors—today’s are Vanta and Superhuman.
