GPT-4: The AI That Passed the Bar Exam and Changed Everything
On March 14, 2023, OpenAI released GPT-4. It could see images, passed professional exams, and set a new standard for AI intelligence.
On March 14, 2023, OpenAI released GPT-4. The announcement included a detail that made everyone stop and read again: GPT-4 had scored in the 90th percentile on the bar exam. Not a practice quiz, but the Uniform Bar Exam that lawyers must pass to practice law.
GPT-3.5, the model behind the original ChatGPT, scored in the 10th percentile. In just a few months, AI had gone from "barely passing" to "top of the class."
This wasn't an incremental improvement. This was a leap.
The Quiet Development
Unlike ChatGPT's surprise launch, GPT-4 had spent months in the hands of select testers under strict NDAs.
Companies like Morgan Stanley, Khan Academy, and Duolingo had been building on GPT-4 in secret. They knew something big was coming.
OpenAI learned from ChatGPT's chaotic viral explosion. This time, they prepared carefully. Red-team testing, safety evaluations, and partnership announcements were all ready before the public reveal.
The Big Reveals
GPT-4 brought multiple breakthroughs that redefined what AI could do.
1. Multimodal Vision
The most dramatic new capability: GPT-4 could see and understand images.
You could show it a photo and ask questions. Upload a sketch of a website layout, and it would generate the code. Take a picture of your refrigerator contents, and it would suggest recipes.
This opened entirely new use cases. AI wasn't just about text anymore.
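To make this concrete, here's a minimal sketch of what an image-plus-text request looks like with OpenAI's Python SDK. The model name, image URL, and prompt are placeholders, and this chat-completions format postdates the original launch, when image input was limited to select partners:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send one image and one question in a single chat turn.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder: any vision-capable GPT-4 variant
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What's in this fridge, and what could I cook with it?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/fridge.jpg"}},  # placeholder URL
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The interesting part is the `content` field: instead of a plain string, it's a list mixing text and image parts, which is how a single question can reference a picture.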
2. Longer Context
In its extended variant, GPT-4 could handle 32,000 tokens, roughly 25,000 words or 50 pages of text (the standard version offered 8,000). You could feed it entire documents, long articles, or codebases and ask questions about them.
The previous 4,000-token limit (about 3,000 words) had been a major constraint. GPT-4 shattered that ceiling.
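As a rough illustration of what those budgets mean in practice, here's a short sketch using the tiktoken tokenizer to check whether a document fits before sending it; the file name is hypothetical:

```python
import tiktoken

# GPT-4 models use the cl100k_base tokenizer.
enc = tiktoken.encoding_for_model("gpt-4")

def fits_in_context(text: str, limit: int = 32_000) -> bool:
    """Return True if the text fits within the given token budget."""
    return len(enc.encode(text)) <= limit

with open("annual_report.txt") as f:  # hypothetical document
    doc = f.read()

print(f"{len(enc.encode(doc)):,} tokens;",
      "fits in the 32k window" if fits_in_context(doc) else "too long")
```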
3. Dramatic Reasoning Improvement
The benchmark results were stunning:
- Bar Exam: 90th percentile (up from 10th)
- SAT Math: 700/800 (89th percentile)
- SAT Reading/Writing: 710/800 (93rd percentile)
- AP Biology: 5/5
- AP Calculus BC: 4/5
Across a broad slate of standardized tests, GPT-4 performed at or above the level of strong human test-takers; the bar exam result was no outlier.
4. Reduced Hallucinations
On OpenAI's internal factuality evaluations, GPT-4 scored roughly 40% higher than GPT-3.5. It was still imperfect, but the improvement was noticeable.
For professional use cases where accuracy matters, this was crucial.
The Real-World Applications
Within hours of GPT-4's release, developers started sharing what they'd built.
Khan Academy demoed Khanmigo, an AI tutor powered by GPT-4 that could explain concepts, answer questions, and adapt to student level.
Be My Eyes showed how GPT-4 could describe images for blind and low-vision users, reading labels, navigating spaces, and identifying objects.
Duolingo introduced conversational practice with AI characters powered by GPT-4, making language learning more interactive.
These weren't concept demos; they were working products already rolling out to real users.
The Competitive Shockwave
GPT-4's release sent competitors scrambling.
Google had unveiled Bard just weeks earlier, and it wouldn't open to the public until a week after GPT-4's release. Suddenly, Bard looked outdated. Google rushed to respond, but it had been caught flat-footed.
Microsoft, which had invested a reported $10 billion in OpenAI, revealed that the new Bing had been running on GPT-4 all along and began rolling the model out across its entire product suite.
Anthropic, OpenAI's main safety-focused competitor, accelerated work on Claude. The pressure to catch up was on.
The Behind-the-Scenes Story
What many people didn't know: GPT-4 had finished training in August 2022, months before its release.
OpenAI spent approximately six months on safety testing, alignment research, and red-teaming. They wanted to understand GPT-4's capabilities and risks before unleashing it publicly.
This delay frustrated some who wanted the technology immediately. But it set a precedent: the most capable AI systems deserved careful evaluation before deployment.
The Access Strategy
GPT-4 launched exclusively for ChatGPT Plus subscribers ($20/month) and API customers, the latter initially through a waitlist.
This was smart for several reasons:
- Server capacity: limiting access prevented the system from being overwhelmed.
- Revenue: subscription fees funded the massive computing costs.
- Positioning: GPT-4 became a premium feature worth paying for.
Free ChatGPT users could see what they were missing but had to pay to access it. Many converted to Plus just for GPT-4.
The Limitations Everyone Discovered
Despite the improvements, GPT-4 wasn't perfect.
It still hallucinated facts occasionally. It still struggled with complex multi-step math. It still had knowledge cutoff issues (training data ended in September 2021).
The vision capabilities, while impressive, were limited, and image input wasn't broadly available at launch. You couldn't upload videos, and real-time image analysis wasn't possible.
And it was slower than GPT-3.5. More capable, but also more expensive to run.
Where Are They Now?
GPT-4 remained OpenAI's flagship model for over a year until GPT-4o ("omni") launched in May 2024. Even today, variants of GPT-4 power much of ChatGPT, Microsoft Copilot, and thousands of AI applications.
The model that passed the bar exam in March 2023 set the standard for AI capabilities. It proved that AI could move beyond party tricks to genuinely useful professional tools.
More importantly, GPT-4 established what "frontier AI" meant. Every model released since then—from Claude 3 to Gemini to Llama 3—has been compared to GPT-4's benchmark performance.
March 14, 2023 was the day AI capabilities took a visible, undeniable leap forward. The bar exam result was symbolic, but the underlying improvement was real: AI had gotten dramatically smarter, and the race to build even better models had officially entered overdrive.