The latest addition to the Gemini family of AI models, Gemini 1.5 Flash-8B, is now generally available for production use.
Google has released a smaller, faster version of the Gemini 1.5 Flash AI model that was announced at Google I/O, and it is now generally available for production use. The smaller model, called Gemini 1.5 Flash-8B, is designed for high-speed, low-latency inference and more efficient output generation.
According to Google, the Flash-8B model offers the lowest cost per intelligence of any Gemini model. The company had earlier distilled Gemini 1.5 Flash into the smaller Gemini 1.5 Flash-8B, aiming for faster processing and more efficient output generation; Google now says this smaller variant was developed by Google DeepMind a few months ago.
Despite its smaller size, the new model nearly matches 1.5 Flash on several benchmarks, the tech giant says, ranging from simple chat to transcription and long-context language translation.
The key advantage of the model is its cost efficiency. Google notes that Gemini 1.5 Flash-8B has the lowest token price in the Gemini family: developers will be charged $0.0375 (roughly Rs. 3) per one million input tokens, $0.15 (roughly Rs. 12.5) per one million output tokens, and $0.01 (roughly Rs. 0.8) per one million tokens on cached prompts.
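As a rough illustration of these prices, the cost of a single request can be estimated from its token counts. The helper below is a minimal sketch using the per-million-token rates quoted above; it is not part of any official Google SDK.

```python
# Published Gemini 1.5 Flash-8B rates in USD per one million tokens,
# as quoted in the article above (illustrative; check current pricing).
INPUT_RATE = 0.0375
OUTPUT_RATE = 0.15
CACHED_RATE = 0.01

def estimate_cost_usd(input_tokens, output_tokens, cached_tokens=0):
    """Estimate the US-dollar cost of one request from its token counts."""
    return (
        input_tokens * INPUT_RATE
        + output_tokens * OUTPUT_RATE
        + cached_tokens * CACHED_RATE
    ) / 1_000_000

# Example: a request with 10,000 input tokens and 2,000 output tokens.
print(f"{estimate_cost_usd(10_000, 2_000):.6f}")  # 0.000675
```

At these rates, even a million-token input costs under four cents, which is the point of Google's "high-volume tasks" framing.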
Moreover, Google is doubling the rate limits for the 1.5 Flash-8B model: developers can now send up to 4,000 requests per minute (RPM) when using it. According to the tech giant, the model is ideal for simple, high-volume tasks. Developers can try out the model for free via Google AI Studio and the Gemini API.