Google Enters the Lightweight AI Market With Gemma

Developers and researchers can access Gemma on a variety of platforms. Compact AI models can be used to make chatbots and summarization tools.

Written by

Megan Crouse

Feb 21, 2024

Google has released Gemma, a family of AI models based on the same research as Gemini. Developers can’t quite get their hands into the engine of Google Gemini yet, but what the tech giant released on Feb. 21 is a smaller, open source model for researchers and developers to experiment with.

Although generative AI is trendy, organizations may struggle to figure out how to apply it and prove ROI; open source models allow them to experiment with finding practical use cases.

Smaller AI models like this don’t quite have the same performance as larger ones like Gemini or GPT-4, but they are flexible enough to let organizations build custom bots for customers or employees. In particular, the fact that Gemma can run on a workstation shows the continued trend from generative AI makers toward giving organizations options for ChatGPT-like functionality without the heavy workload.

What is Google’s Gemma?

Google Gemma is a family of generative AI models that can be used to build chatbots or content summarization tools. Gemma models can run on a developer laptop, on a workstation or through Google Cloud. Two sizes are available: 2 billion and 7 billion parameters.
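To give a rough sense of why models of this size fit on a laptop or workstation, here is a back-of-the-envelope sketch of the memory needed just to hold the weights. It assumes the nominal parameter counts quoted above (the exact totals may differ) and common numeric precisions; the helper name and the precision table are illustrative and not part of any Google tooling. It ignores activations and other runtime overhead, so real usage will be higher.

```python
# Rough weights-only memory estimate: parameters x bytes per parameter.
# Parameter counts are the nominal "2B" and "7B" figures, not exact totals.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}

NOMINAL_PARAMS = {"Gemma 2B": 2e9, "Gemma 7B": 7e9}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for name, params in NOMINAL_PARAMS.items():
    for dtype in BYTES_PER_PARAM:
        print(f"{name} @ {dtype}: {weight_memory_gib(params, dtype):.1f} GiB")
```

Under these assumptions, the 2B model at 16-bit precision needs a little under 4 GiB for its weights, while the 7B model needs about 13 GiB, which is why the larger model is a better fit for a workstation or a cloud instance than for a typical laptop.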

For developers, Google is providing a variety of tools for Gemma deployment, including toolchains for inference and supervised fine-tuning in JAX, PyTorch and TensorFlow.

For now, Gemma only works in English.

How do I access Google Gemma?

Google Gemma can be accessed through Colab, Hugging Face, Kaggle, Google Kubernetes Engine, Vertex AI and NVIDIA’s NeMo.
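As one illustration of the Hugging Face route, the sketch below loads a Gemma checkpoint with the `transformers` library and generates a short completion. It is a minimal example, not Google's or Hugging Face's official recipe. It assumes that `transformers` (a release with Gemma support) and PyTorch are installed, that the Hub model IDs are `google/gemma-2b` and `google/gemma-7b`, and that you have accepted Gemma's terms on the Hub and are logged in. The helper names `resolve_model_id` and `generate` are hypothetical.

```python
# Assumed Hugging Face Hub IDs for the base (pre-trained) Gemma checkpoints.
GEMMA_MODEL_IDS = {"2b": "google/gemma-2b", "7b": "google/gemma-7b"}


def resolve_model_id(size: str) -> str:
    """Map a size label ("2b" or "7b") to its assumed Hub model ID."""
    try:
        return GEMMA_MODEL_IDS[size.lower()]
    except KeyError:
        raise ValueError(
            f"unknown Gemma size {size!r}; expected one of {sorted(GEMMA_MODEL_IDS)}"
        ) from None


def generate(prompt: str, size: str = "2b", max_new_tokens: int = 64) -> str:
    """Download (on first use) and run a Gemma checkpoint on a text prompt."""
    # Imported here so the module loads even where transformers is not installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = resolve_model_id(size)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize the prompt, generate a continuation, and decode it back to text.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Write a one-sentence summary of what a chatbot does.")` would download the 2B weights on first use, so expect a multi-gigabyte download and a noticeable wait on CPU-only machines.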

Google Gemma can be accessed for free for research and development in Kaggle and through a free tier for Colab notebooks. First-time Google Cloud users can receive $300 in credits toward Gemma. Google Cloud credits of up to $500,000 are available for researchers who apply. Pricing and availability in other cases may depend on your organization’s particular subscriptions and needs.

Since Google Gemma is open source, commercial use is permitted, as long as that use is in accordance with the Terms of Service. Google also released a Responsible Generative AI Toolkit, which developers can use to set guidelines for their AI projects.

“It’s great to see Google reinforcing its commitment to open-source AI, and we’re excited to fully support the launch with comprehensive integration in Hugging Face,” said Hugging Face’s Technical Lead Philipp Schmid, Head of Platform and Community Omar Sanseviero and Machine Learning Engineer Pedro Cuenca in a blog post.

How does Google Gemma work?

Like other generative AI models, Gemma is software that responds to natural language prompts rather than to conventional programming languages or commands. Google Gemma was trained on publicly available information, with personally identifiable information and “sensitive” material filtered out.

Google worked with NVIDIA to optimize Gemma for NVIDIA products, in particular by offering acceleration on NVIDIA’s TensorRT-LLM, a library for large language model inference. Gemma can also be fine-tuned in NVIDIA AI Enterprise.

What are the main competitors to Google Gemma?

Gemma competes with other small generative AI models meant to run on an organization’s own hardware, including Meta’s open source large language models (particularly Llama 2), Mistral AI’s 7B model, Deci’s DeciLM and Microsoft’s Phi-2.

Hugging Face noted that Gemma outperforms many other small AI models on its leaderboard, which evaluates pre-trained models on basic factual questions, commonsense reasoning and trustworthiness. Only Llama 2 70B, the model included as a reference benchmark, earned a higher score than Gemma 7B. Gemma 2B, on the other hand, performed relatively poorly compared to other small, open AI models.

Gemini Nano, the smallest version of Google’s full-scale Gemini model, comes in 1.8B and 3.25B parameter versions and is designed to run on Android phones.

Megan Crouse

Megan Crouse has a decade of experience in business-to-business news and feature writing, including as first a writer and then the editor of Manufacturing.net. Her news and feature stories have appeared in Military & Aerospace Electronics, Fierce Wireless, TechRepublic, and eWeek. She copyedited cybersecurity news and features at Security Intelligence. She holds a degree in English Literature and minored in Creative Writing at Fairleigh Dickinson University.