GPT-4: The AI That Passed the Bar Exam and Changed Everything
On March 14, 2023, OpenAI released GPT-4. It could see images, passed professional exams, and set a new standard for AI intelligence.
On March 14, 2023, OpenAI released GPT-4. The announcement included a detail that made everyone stop and read again: GPT-4 had scored in the 90th percentile on the bar exam. Not a practice quiz, but the Uniform Bar Exam that lawyers must pass to practice law.
GPT-3.5, the model behind the original ChatGPT, scored in the 10th percentile. In just a few months, AI had gone from "barely passing" to "top of the class."
This wasn't an incremental improvement. This was a leap.
The Quiet Development
Unlike ChatGPT's surprise launch, GPT-4 had spent months in the hands of select testers under strict NDAs.
Companies like Morgan Stanley, Khan Academy, and Duolingo had been building on GPT-4 in secret. They knew something big was coming.
OpenAI learned from ChatGPT's chaotic viral explosion. This time, they prepared carefully. Red-team testing, safety evaluations, and partnership announcements were all ready before the public reveal.
The Big Reveals
GPT-4 brought multiple breakthroughs that redefined what AI could do.
1. Multimodal Vision
The most dramatic new capability: GPT-4 could see and understand images.
You could show it a photo and ask questions. Upload a sketch of a website layout, and it would generate the code. Take a picture of your refrigerator contents, and it would suggest recipes.
This opened entirely new use cases. AI wasn't just about text anymore.
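To make this concrete, here's a minimal sketch of what an image-plus-text request looks like with OpenAI's Python SDK. The model name, image URL, and prompt are placeholders, and this chat-completions format postdates the original launch, when image input was limited to select partners:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send one image and one question in a single chat turn.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder: any vision-capable GPT-4 variant
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What's in this fridge, and what could I cook with it?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/fridge.jpg"}},  # placeholder URL
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The interesting part is the `content` field: instead of a plain string, it's a list mixing text and image parts, which is how a single question can reference a picture.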
2. Longer Context
In its extended variant, GPT-4 could handle 32,000 tokens, roughly 25,000 words or 50 pages of text (the standard version offered 8,000). You could feed it entire documents, long articles, or codebases and ask questions about them.
The previous 4,000-token limit (about 3,000 words) had been a major constraint. GPT-4 shattered that ceiling.
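As a rough illustration of what those budgets mean in practice, here's a short sketch using the tiktoken tokenizer to check whether a document fits before sending it; the file name is hypothetical:

```python
import tiktoken

# GPT-4 models use the cl100k_base tokenizer.
enc = tiktoken.encoding_for_model("gpt-4")

def fits_in_context(text: str, limit: int = 32_000) -> bool:
    """Return True if the text fits within the given token budget."""
    return len(enc.encode(text)) <= limit

with open("annual_report.txt") as f:  # hypothetical document
    doc = f.read()

print(f"{len(enc.encode(doc)):,} tokens;",
      "fits in the 32k window" if fits_in_context(doc) else "too long")
```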
3. Dramatic Reasoning Improvement
The benchmark results were stunning:
- Bar Exam: 90th percentile (up from 10th)
- SAT Math: 700/800 (89th percentile)
- SAT Reading/Writing: 710/800 (93rd percentile)
- AP Biology: 5/5
- AP Calculus BC: 4/5
Across a broad slate of standardized tests, GPT-4 performed at or above the level of strong human test-takers; the bar exam result was no outlier.
4. Reduced Hallucinations
On OpenAI's internal factuality evaluations, GPT-4 scored roughly 40% higher than GPT-3.5. It was still imperfect, but the improvement was noticeable.
For professional use cases where accuracy matters, this was crucial.
The Real-World Applications
Within hours of GPT-4's release, developers started sharing what they'd built.
Khan Academy demoed Khanmigo, an AI tutor powered by GPT-4 that could explain concepts, answer questions, and adapt to student level.
Be My Eyes showed how GPT-4 could describe images for blind and low-vision users, reading labels, navigating spaces, and identifying objects.
Duolingo introduced conversational practice with AI characters powered by GPT-4, making language learning more interactive.
These weren't concept demos; they were working products already rolling out to real users.
The Competitive Shockwave
GPT-4's release sent competitors scrambling.
Google had unveiled Bard just weeks earlier, and it wouldn't open to the public until a week after GPT-4's release. Suddenly, Bard looked outdated. Google rushed to respond, but it had been caught flat-footed.
Microsoft, which had invested a reported $10 billion in OpenAI, revealed that the new Bing had been running on GPT-4 all along and began rolling the model out across its entire product suite.
Anthropic, OpenAI's main safety-focused competitor, accelerated work on Claude. The pressure to catch up was on.
The Behind-the-Scenes Story
What many people didn't know: GPT-4 had finished training in August 2022, months before its release.
OpenAI spent approximately six months on safety testing, alignment research, and red-teaming. They wanted to understand GPT-4's capabilities and risks before unleashing it publicly.
This delay frustrated some who wanted the technology immediately. But it set a precedent: the most capable AI systems deserved careful evaluation before deployment.
The Access Strategy
GPT-4 launched exclusively for ChatGPT Plus subscribers ($20/month) and API customers, the latter initially through a waitlist.
This was smart for several reasons:
- Server capacity: limiting access prevented the system from being overwhelmed.
- Revenue: subscription fees funded the massive computing costs.
- Positioning: GPT-4 became a premium feature worth paying for.
Free ChatGPT users could see what they were missing but had to pay to access it. Many converted to Plus just for GPT-4.
The Limitations Everyone Discovered
Despite the improvements, GPT-4 wasn't perfect.
It still hallucinated facts occasionally. It still struggled with complex multi-step math. It still had knowledge cutoff issues (training data ended in September 2021).
The vision capabilities, while impressive, were limited, and image input wasn't broadly available at launch. You couldn't upload videos, and real-time image analysis wasn't possible.
And it was slower than GPT-3.5. More capable, but also more expensive to run.
Where Are They Now?
GPT-4 remained OpenAI's flagship model for over a year until GPT-4o ("omni") launched in May 2024. Even today, variants of GPT-4 power much of ChatGPT, Microsoft Copilot, and thousands of AI applications.
The model that passed the bar exam in March 2023 set the standard for AI capabilities. It proved that AI could move beyond party tricks to genuinely useful professional tools.
More importantly, GPT-4 established what "frontier AI" meant. Every model released since then—from Claude 3 to Gemini to Llama 3—has been compared to GPT-4's benchmark performance.
March 14, 2023 was the day AI capabilities took a visible, undeniable leap forward. The bar exam result was symbolic, but the underlying improvement was real: AI had gotten dramatically smarter, and the race to build even better models had officially entered overdrive.