The honest comparison nobody makes
Academic benchmarks comparing Claude, GPT-4 and Gemini are plentiful, but they rarely measure what matters day to day for a project manager or business analyst. This comparison draws on six months of intensive use of all three models in real project management contexts.
Methodology
I tested all three models on the same tasks, with the same prompts, over six months (October 2024 to March 2025). The tasks covered Agile project management, specification writing, document analysis and stakeholder communication.
Results by use case
| Task | Claude 3.5 Sonnet | GPT-4o | Gemini 1.5 Pro |
|---|---|---|---|
| User story writing | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Long document analysis | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Code generation | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Management communication | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Google Workspace integration | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Microsoft 365 integration | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Factual reliability | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
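
The star ratings above can be turned into a rough score for your own mix of tasks. The sketch below is one possible way to do that and was not part of the test protocol: the `weighted_scores` helper and the example weights are assumptions made for illustration, and only the ratings themselves come from the table.

```python
# Star ratings transcribed from the results table (1 to 5 stars).
RATINGS = {
    "User story writing": {"Claude 3.5 Sonnet": 5, "GPT-4o": 4, "Gemini 1.5 Pro": 3},
    "Long document analysis": {"Claude 3.5 Sonnet": 5, "GPT-4o": 4, "Gemini 1.5 Pro": 5},
    "Code generation": {"Claude 3.5 Sonnet": 5, "GPT-4o": 5, "Gemini 1.5 Pro": 4},
    "Management communication": {"Claude 3.5 Sonnet": 5, "GPT-4o": 4, "Gemini 1.5 Pro": 3},
    "Google Workspace integration": {"Claude 3.5 Sonnet": 3, "GPT-4o": 3, "Gemini 1.5 Pro": 5},
    "Microsoft 365 integration": {"Claude 3.5 Sonnet": 3, "GPT-4o": 5, "Gemini 1.5 Pro": 3},
    "Factual reliability": {"Claude 3.5 Sonnet": 5, "GPT-4o": 4, "Gemini 1.5 Pro": 4},
}


def weighted_scores(ratings, weights):
    """Return each model's weighted average rating.

    `weights` maps a task name to its importance. Tasks that are missing
    from `weights`, unknown to `ratings`, or weighted at zero are ignored.
    """
    used = {task: w for task, w in weights.items() if task in ratings and w > 0}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("at least one known task needs a positive weight")

    totals = {}
    for task, weight in used.items():
        for model, stars in ratings[task].items():
            totals[model] = totals.get(model, 0.0) + stars * weight
    return {model: round(total / total_weight, 2) for model, total in totals.items()}


# Hypothetical weights for a team that lives in Microsoft 365.
M365_TEAM_WEIGHTS = {
    "Management communication": 2,
    "Microsoft 365 integration": 3,
    "User story writing": 1,
}

print(weighted_scores(RATINGS, M365_TEAM_WEIGHTS))
# {'Claude 3.5 Sonnet': 4.0, 'GPT-4o': 4.5, 'Gemini 1.5 Pro': 3.0}
```

With these illustrative weights GPT-4o comes out ahead, which matches the ecosystem argument made further down. Change the weights and the ranking can change with them.
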
My qualitative observations
Claude: the most nuanced
Claude excels at tasks requiring judgement and nuance: analysing a complex situation, identifying ambiguities, politely declining a problematic request. Its 200k-token context window copes well with very long documents; in my tests only Gemini 1.5 Pro matched it on long-document analysis.
GPT-4o: the most integrated
If your organisation runs on Microsoft 365, Copilot (GPT-4 family under the hood) is unbeatable thanks to its native integration with Word, Excel, Teams and Outlook. The productivity gained from working in context can outweigh slightly lower quality on certain tasks.
Gemini: best for Google Workspace
The same reasoning applies to Google-first organisations. Gemini's ability to read Google Docs, Sheets and Gmail, and to generate content in context within those tools, is markedly superior to that of its competitors.
My practical recommendation
Choose based on your primary ecosystem. If you have no strong tie to either, start with Claude for its writing and analysis quality. Then add Copilot if you use Microsoft 365 intensively, or Gemini for Workspace if you are on Google.
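For readers who prefer rules to prose, the recommendation can be written as a small decision function. This is only a sketch of one reading of the advice above: the function name, the parameters and the tool labels are hypothetical, and it assumes Claude is always the starting point.

```python
def recommend_tools(heavy_m365: bool = False, google_workspace: bool = False) -> list[str]:
    """Return a suggested tool stack, following the recommendation above."""
    # Start with Claude for writing and analysis quality.
    tools = ["Claude"]
    # Add the assistant that is native to the ecosystem you already use.
    if heavy_m365:
        tools.append("Microsoft 365 Copilot")
    if google_workspace:
        tools.append("Gemini for Workspace")
    return tools


print(recommend_tools())                  # ['Claude']
print(recommend_tools(heavy_m365=True))   # ['Claude', 'Microsoft 365 Copilot']
```
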
