GPT-5.5 Instant is OpenAI's default ChatGPT model, released on May 5, 2026, as the successor to GPT-5.3 Instant. It shares the same underlying architecture as GPT-5.5 Thinking and Pro — a natively omnimodal model processing text, images, audio, and video end-to-end — but is optimized for low latency and everyday conversational use. GPT-5.5 Instant is the first Instant-tier model OpenAI classifies as "High Capability" in both the cybersecurity and biological domains.
- Reduced Hallucinations: 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts (medicine, law, finance), and 37.3% fewer inaccurate claims on user-flagged factual errors.
- Natively Omnimodal: Processes text, images, audio, and video in a single unified architecture — not separate models stitched together. Improvements include better photo/image analysis and STEM question answering.
- Concise Output: Uses roughly 30.2% fewer words and 29.2% fewer lines than GPT-5.3 Instant to convey the same information, with reduced "gratuitous emojis" and tighter formatting.
- Personalized Memory: Can reference past conversations, files, and Gmail for more personalized responses (available to Plus and Pro users).
- Everyday Knowledge Work: Optimized for info-seeking questions, how-tos, technical writing, and translation with a warm conversational tone and low latency.
- Multimodal Analysis: Strong image and document understanding makes it well-suited for analyzing uploads, screenshots, charts, and visual content.
- High-Stakes Factual Q&A: The significant hallucination reduction makes it more reliable for queries in medicine, law, and finance compared to prior Instant models.
| Capability | Description |
|---|---|
| Reasoning | AIME 2025: 81.2% (vs 65.4% for GPT-5.3 Instant). Shares architecture with GPT-5.5 Thinking |
| Coding | Capable for everyday coding tasks; for complex agentic coding, GPT-5.5 Thinking is recommended |
| Multimodal | Text, image, audio, and video input — natively omnimodal architecture |
| Response Speed | Low-latency design; matches GPT-5.4 per-token latency despite higher capability |
| Context Window | 1M tokens (922K input + 128K output). Long-context surcharge above 272K input tokens |
| Max Output | 128K tokens |
| Tool Use | Web search, file analysis; auto-switching across tools |
| Multilingual | Improved translation quality; broad multilingual support |
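As a sketch of how the multimodal input described above could be exercised programmatically, the snippet below builds a Chat Completions request mixing text and an image. The model id `gpt-5.5-instant` and the helper name are assumptions for illustration; verify the actual model identifier against the official model list before use.

```python
def build_request(prompt: str, image_url: str) -> dict:
    """Assemble a chat request combining text and image input,
    in the Chat Completions message format."""
    return {
        "model": "gpt-5.5-instant",  # hypothetical model id, not confirmed
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Sending requires the openai SDK and an API key, e.g.:
#   client.chat.completions.create(**build_request(...))
req = build_request("What does this chart show?", "https://example.com/chart.png")
```

Building the payload separately from the network call keeps the request shape easy to inspect and test without an API key.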
- Instant tier trades reasoning depth for speed — complex multi-step reasoning and agentic workflows are better served by GPT-5.5 Thinking or Pro.
- The MMMU-Pro multimodal score (76) and AIME math score (81.2) are significantly lower than the full GPT-5.5 Thinking model's results; for scale, Thinking reaches 93.6% on GPQA Diamond.
- The rapid quarterly release cadence means prompt libraries and custom GPTs tuned to this model may need rebuilding when the next version ships (~July 2026).
- Knowledge cutoff is December 2025 (web search compensates for more recent information).
| Model | Input (Credits/Token) | Cache Write (Credits/Token) | Cache Read (Credits/Token) | Output (Credits/Token) |
|---|---|---|---|---|
| GPT-5.5 Instant | 5.00 | 5.00 | 0.50 | 30.00 |
- Long-context (>272K input tokens): Input 2x, Output 1.5x
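To make the pricing table and long-context multipliers concrete, here is a minimal cost estimator. The helper name is illustrative, and it assumes the surcharge applies to every token in the request (rather than only tokens beyond 272K) and to cache reads as well — the table does not specify these details.

```python
INPUT_RATE = 5.00        # credits per fresh input token
CACHE_READ_RATE = 0.50   # credits per cached input token
OUTPUT_RATE = 30.00      # credits per output token
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens

def estimate_credits(input_tokens: int, output_tokens: int, cached_input: int = 0) -> float:
    """Estimate total request cost in credits, applying the long-context
    surcharge (input x2, output x1.5) once input exceeds 272K tokens."""
    fresh = input_tokens - cached_input
    input_cost = fresh * INPUT_RATE + cached_input * CACHE_READ_RATE
    output_cost = output_tokens * OUTPUT_RATE
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        input_cost *= 2      # assumption: multiplier covers the whole input
        output_cost *= 1.5
    return input_cost + output_cost

# A typical short request: 1,000 input tokens, 500 output tokens
cost = estimate_credits(1_000, 500)  # 1000*5 + 500*30 = 20,000 credits
```

Note how heavily cache reads reduce input cost (0.50 vs 5.00 credits per token), which rewards reusing long shared prefixes across requests.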