Google Unveils Gemini, Claiming It’s More Powerful Than OpenAI’s GPT-4

On Wednesday, Google introduced its highly anticipated general-purpose, multimodal, generative AI model, Gemini, which the company claims is more powerful than OpenAI’s GPT-4.

“Gemini can understand the world around us in the way that we do,” said Demis Hassabis, co-founder and CEO of Google DeepMind, the elite AI lab that created the model, adding that Gemini is better than any other model out there.

Google claims Gemini was trained with roughly five times the computational power of GPT-4, enabling faster training and potentially larger model sizes. It said Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular benchmarks for testing the knowledge and problem-solving abilities of AI models.

The model will be made available to developers through Google Cloud’s API from December 13, with a more powerful version set to debut in 2024 pending extensive trust and safety checks.

Gemini combines different types of information, such as text, code, audio, image, and video, and comes in three sizes that run efficiently on everything from data centers to mobile devices:

  • Gemini Ultra, the full-powered version for handling highly complex tasks.
  • Gemini Pro, suitable for scaling across a wide range of tasks.
  • Gemini Nano, designed for on-device tasks.

“By making it accessible to developers through Pro and Nano, Google is empowering unprecedented innovation,” said Wyatt Oren, Director of Sales for Telehealth at Agora, the real-time engagement solutions provider. “The API offers incredible benefits for rapid prototyping and app development, especially when it comes to handling multimedia content.”
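For developers, a request to the hosted model might look like the following minimal sketch, which only builds and prints the JSON body for a single-turn text prompt. The endpoint path and payload shape here are assumptions drawn from Google’s public Generative Language API documentation, not from this article, and an API key from Google would be needed to actually send the request.

```python
import json

# Assumed REST endpoint for Gemini Pro text generation (Generative
# Language API). Treat this path as an assumption, not a guarantee.
API_URL = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/gemini-pro:generateContent")

def build_request(prompt: str) -> dict:
    """Build the JSON body for a single-turn generateContent request."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

payload = build_request("Explain multimodal pretraining in two sentences.")
print(json.dumps(payload, indent=2))
```

Sending `payload` as the POST body to `API_URL` (with an `?key=` query parameter) would return the model’s generated candidates; the sketch stops short of the network call so it stays self-contained.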

Google said Gemini Ultra excels at tasks involving deliberate reasoning, surpassing previous state-of-the-art models. It also leads on image benchmarks, demonstrating native multimodality and complex reasoning abilities.

The standard approach to building multimodal models involves training separate components for different modalities and stitching them together. Gemini, by contrast, was designed to be natively multimodal, pre-trained on different modalities from the start. This design allows Gemini to understand and reason about all kinds of inputs far better than existing multimodal models.

Gemini was trained to recognize and understand text, images, audio, and more simultaneously, making it proficient at explaining its reasoning in complex subjects like math and physics.

Gemini’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information. It can extract insights from hundreds of thousands of documents, enabling breakthroughs at digital speed in fields from science to finance.

Gemini can understand, explain, and generate high-quality code in the world’s most popular programming languages. Its ability to reason about complex information places it among the leading foundation models for coding globally.

Google trained Gemini on its AI-optimized infrastructure using its in-house Tensor Processing Units (TPUs), making it less subject to the GPU shortages that GPT-4 and other models face.

Google designed Gemini to be its most reliable and scalable model to train, and its most efficient to serve. The company said it is adding new protections to account for Gemini’s multimodal capabilities, considering potential risks at each stage of development.

Gemini is now rolling out across a range of products and platforms. For instance, Google’s chatbot, Bard, will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding, and more.

Generative AI is rapidly evolving, and the relative strengths of competing models may shift over time. But one thing is certain: Google just upped the ante.
