Google Introduces Gemini, Its Multimodal Generative AI


Google has introduced Gemini, an artificial intelligence model built to compete with ChatGPT. According to Google CEO Sundar Pichai, Gemini outperforms GPT-4 on most benchmarks and marks a significant step in the development of AI. It is intended to reach Google’s entire product line and will be available for public use starting December 13th.

Sundar Pichai first announced Gemini at the I/O conference in June. Google has now released several AI models under the Gemini name. The “lightweight” Gemini Nano is designed to run on Android devices. The Gemini Pro model will soon underpin many Google services, most notably the Bard chatbot. The most powerful model, Gemini Ultra, which the developers call the largest LLM Google has ever released, is apparently aimed at data centers and enterprise applications.


Gemini is currently available only in English, but support for other languages is expected soon. According to Sundar Pichai, the model will eventually be integrated into Google’s search engine, its advertising products, the Chrome browser, and other applications.

Gemini is a multimodal model: it can perceive text, audio, images, video, and code. The company has already tested Gemini against GPT-4, the model underlying OpenAI’s chatbot ChatGPT. Google said that Ultra surpassed GPT-4 in 30 out of 32 performance tests, including reasoning and image recognition. The Pro model outperformed GPT-3.5 in six of eight tests.

Google also said that Ultra was the first AI model to outperform human experts on MMLU (Massive Multitask Language Understanding), a benchmark covering 57 subjects including mathematics, physics, law, medicine, and ethics. Ultra will power a new code-writing tool called AlphaCode2, which Google claims can outperform 85% of human programmers in tests.

According to the developers, Gemini’s most obvious advantage is its multimodality. Rather than training separate models for image generation and speech recognition, as OpenAI did with DALL-E and Whisper, Google built a single model capable of perceiving different types of information from the outset. And Google promises that this perception will only improve.
