Version 1.5 aims to enhance efficiency of data summary and extraction
At its annual developer conference, Google highlighted its latest advancement in artificial intelligence: Gemini 1.5 Flash, a new addition to its Gemini series.
As reported by CNBC, the company said the new model is designed to be its lightest and most efficient yet, capable of summarizing conversations, captioning images and videos, and extracting data from large documents and tables.
Demis Hassabis, CEO of Google DeepMind, emphasized the new model's enhancements during a press briefing, noting, “We heard from developers that they wanted something faster and even more cost-effective.”
The introduction of Gemini 1.5 Flash coincides with a broader industry shift towards generative AI, which is reshaping how tech companies approach product development and rollout.
This shift is especially significant for Google as it offers consumers more sophisticated and creative tools for accessing online information, moving beyond traditional web searches.
Google has also improved upon its existing AI models. The Gemini 1.5 Pro, for instance, can analyze multiple large documents—up to 1,500 pages in total—or summarize 100 emails.
According to Sissie Hsiao, a vice president at Google and general manager for Gemini experiences, the Pro model will soon handle an hour of video content or manage codebases exceeding 30,000 lines.
Hsiao described the utility of the new model: “You can quickly get answers and insights about dense documents, like figuring out the details of the pet policy in your rental agreement or comparing key arguments of multiple long research papers.”
Moreover, the unveiling of Gemini 1.5 Flash occurred alongside similar advancements from other tech giants. For example, OpenAI launched a new AI model and desktop version of ChatGPT just a day earlier.
The new model, GPT-4o, is noted for being twice as fast as its predecessor and significantly more cost-effective.
OpenAI has also upgraded ChatGPT to support 50 different languages and integrated these improvements into its API, giving developers immediate access to build applications with the new model.
Gemini 1.5 Pro supports 35 languages and has a 2-million-token context window, which is essential for understanding context and processing extensive information at once. Google executives highlighted the model's enhanced logical reasoning, planning, and image comprehension.
Sundar Pichai, CEO of Alphabet, Google's parent company, praised the model’s capabilities during the press briefing, citing its extensive context window as unprecedented among foundational models.
He illustrated its practical applications by mentioning how a parent could use Gemini to summarize recent emails from their child’s school.
Finally, Google announced that Gemini 1.5 Pro would initially be available for testing in Workspace Labs, while Gemini 1.5 Flash would be accessible for testing and use on Vertex AI, Google’s platform for training and deploying AI applications.