Introduction
Google has announced updated versions of Gemini 2.5 Flash and Gemini 2.5 Flash-Lite, available in Google AI Studio and on Vertex AI. The updates promise operational cost reductions of up to 50% along with measurable performance gains, a notable step in making capable AI more accessible at scale.
Key improvements in updated Gemini 2.5 Flash
The new Gemini 2.5 Flash introduces improvements in two fundamental areas based on developer feedback:
Enhanced agentic tool usage
Google has significantly strengthened the model's ability to use complex tools, yielding better performance in multi-step and agentic applications. The improvement is quantifiable: a gain of roughly five percentage points on SWE-Bench Verified, from 48.9% to 54% relative to the previous version.
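As a rough illustration of what "agentic tool usage" looks like in practice, the sketch below wires a plain Python function into a model call. It assumes the google-genai Python SDK (not named in the announcement), an API key configured in the environment, and a made-up get_order_status helper; the SDK's automatic function calling lets the model decide when to invoke the tool.

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GOOGLE_API_KEY (or GEMINI_API_KEY) is set in the environment


def get_order_status(order_id: str) -> str:
    """Hypothetical tool: look up the shipping status for an order."""
    return f"Order {order_id} shipped on 2025-09-20."


response = client.models.generate_content(
    model="gemini-flash-latest",
    contents="Where is order 1234 right now?",
    # Passing a Python callable enables the SDK's automatic function calling:
    # the model can request the tool, the SDK runs it, and the final answer
    # incorporates the tool's result.
    config=types.GenerateContentConfig(tools=[get_order_status]),
)
print(response.text)
```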
Superior operational efficiency
With "thinking" mode enabled, the model achieves higher quality outputs using fewer tokens, simultaneously reducing latency and operational costs by 24%.
"The new Gemini 2.5 Flash model offers a remarkable blend of speed and intelligence. Our evaluation on internal benchmarks revealed a 15% leap in performance for long-horizon agentic tasks."
Yichao 'Peak' Ji, Co-Founder & Chief Scientist at Manus
Gemini 2.5 Flash-Lite: optimized performance
The Flash-Lite update was built around three key themes that make it particularly effective for high-throughput applications:
- Better instruction following: The model follows complex instructions and system prompts significantly more accurately
- Reduced verbosity: Produces more concise responses, a key factor in the 50% token cost reduction
- Enhanced multimodal capabilities: More accurate audio transcription, better image understanding, and superior translation quality
Simplified access with "-latest" aliases
Google has introduced an alias system to simplify access to the latest models. Developers can now use:
- gemini-flash-latest
- gemini-flash-lite-latest
These aliases always point to the most recent versions, eliminating the need to update code for each release. Google guarantees two weeks' email notice before updating or deprecating the model an alias resolves to.
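For instance, a call that targets the alias rather than a pinned version (a sketch assuming the google-genai Python SDK) keeps working unchanged when Google ships a newer Flash model:

```python
from google import genai

client = genai.Client()  # assumes an API key is configured in the environment

# The "-latest" alias resolves to the newest Flash release, so this code
# does not need to change when a new version ships.
response = client.models.generate_content(
    model="gemini-flash-latest",
    contents="In one sentence, what does the -latest alias do?",
)
print(response.text)
```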
Cost and performance impact
The economic improvements are substantial: a 50% reduction in output tokens for Flash-Lite and 24% for standard Flash. These improvements make AI more accessible for large-scale applications while maintaining or improving output quality.
Conclusion
The Gemini 2.5 Flash update represents an optimal balance between performance and economic sustainability. Significant cost reductions, combined with enhanced capabilities, open new possibilities for AI implementation in complex production scenarios.
FAQ
How much does it cost to use the new Gemini 2.5 Flash?
Output costs drop by roughly 24% for Flash and 50% for Flash-Lite compared with the previous versions, because the models need fewer output tokens to produce their answers.
How can I access the new Gemini 2.5 Flash versions?
You can access the models through Google AI Studio and Vertex AI, either by pinning a specific version string such as gemini-2.5-flash-preview-09-2025 or by using an alias such as gemini-flash-latest.
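As a brief sketch (assuming the google-genai Python SDK; on Vertex AI the client is constructed with vertexai=True plus a project and location), both ways of selecting the model look like this:

```python
from google import genai

# Gemini API key read from the environment; for Vertex AI use
# genai.Client(vertexai=True, project="...", location="...") instead.
client = genai.Client()

# Pin an exact preview snapshot...
pinned = client.models.generate_content(
    model="gemini-2.5-flash-preview-09-2025",
    contents="Hello",
)

# ...or follow the newest release via the alias.
latest = client.models.generate_content(
    model="gemini-flash-latest",
    contents="Hello",
)
```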
What are the main improvements in updated Gemini 2.5 Flash?
Improvements include stronger agentic tool usage, better token efficiency, and a gain of roughly five percentage points on the SWE-Bench Verified benchmark.
Does the new Gemini Flash-Lite support multimodal applications?
Yes, Flash-Lite offers enhanced multimodal capabilities with more accurate audio transcription and better image understanding.
What does the "-latest" alias mean for Gemini models?
The "-latest" alias always points to the most recent model version, simplifying access without code modifications.
When do Gemini 2.5 Flash preview versions become stable?
These preview releases are not intended to become stable versions themselves; rather, they help Google gather developer feedback that will shape future stable releases.