GroqChat

GroqChat - AI Model Platform: AI Tool Tutorial and Review

Freemium
Groq is an AI inference platform that provides fast, low-cost, and reliable inference for large language models and other AI models through its custom-built LPU (Language Processing Unit) hardware.
Cloud-based · Coding · Text Processing · AI · API · Free
📋 Overview

Groq is positioned as a high-performance AI inference platform built specifically for developers and enterprises. Its core offering, GroqCloud, delivers fast, scalable, and affordable inference for a variety of AI models, including large language models (LLMs), text-to-speech, and automatic speech recognition. The platform's key differentiator is its custom silicon, the LPU, which was purpose-built from the ground up for inference tasks, enabling exceptional speed and cost efficiency at scale.

The main use cases include integrating AI capabilities into applications, processing large-scale workloads, and deploying intelligent systems that require low-latency responses. The target audience spans developers, startups, and large enterprises looking for a predictable, high-performance inference solution that integrates easily with existing workflows, such as through its OpenAI-compatible API.

Core Features

  • Custom LPU (Language Processing Unit) hardware purpose-built for fast and affordable AI inference.
  • GroqCloud platform offering low-latency, scalable inference with models deployed worldwide.
  • OpenAI-compatible API, allowing integration with just a few lines of code.
  • Support for a wide range of models including LLMs, text-to-speech, and automatic speech recognition.
  • Features like prompt caching, batch API processing, and compound AI systems for intelligent tool selection.
🚀 How to Use

  • Visit the Groq console to get started and obtain a free API key.
  • Integrate using the OpenAI-compatible API by setting the base URL to https://api.groq.com/openai/v1 and providing your API key.
  • Start building and testing your application with the available models on the platform.
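The steps above can be sketched with a minimal request against the OpenAI-compatible endpoint. Only the base URL comes from this page; the `/chat/completions` path follows the OpenAI API convention, and the model name `llama-3.1-8b-instant` is an assumption — check the Groq console for the models currently available.

```python
import json
import urllib.request

# Base URL from the integration step above.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    body = json.dumps({
        "model": model,  # assumed model name; see the Groq console
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{GROQ_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_API_KEY", "llama-3.1-8b-instant", "Hello!")
print(req.full_url)
# To actually send it: urllib.request.urlopen(req) with a real API key.
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries should also work by pointing their base URL at `https://api.groq.com/openai/v1`.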

Key Advantages

  • Exceptional inference speed and performance powered by custom LPU silicon, not adapted GPUs.
  • Predictable, linear pricing with no hidden costs or surprise bills.
  • Customer case studies report significant speed increases and cost reductions after migrating.
💰 Pricing

  • Free ($0): Great for getting started; includes build-and-test access with community support.
  • Developer (pay per token): For scaling startups; includes higher limits, chat support, batch processing, and prompt caching. Pricing is based on token usage per model (e.g., $0.075 per million input tokens for GPT OSS 20B).
  • Enterprise (contact sales): For large-scale custom needs; includes custom models, regional endpoints, dedicated support, and scalable capacity.
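Because Developer-tier pricing is linear in token usage, estimating a bill is simple arithmetic. A quick sketch using the example rate above ($0.075 per million input tokens for GPT OSS 20B); output tokens are billed at a separate rate not listed here:

```python
# Estimate input-token cost at the Developer tier's example rate.
RATE_PER_MILLION_INPUT = 0.075  # USD per 1M input tokens, GPT OSS 20B (from the pricing table)

def input_cost(tokens: int, rate_per_million: float = RATE_PER_MILLION_INPUT) -> float:
    """Cost in USD for a given number of input tokens at a linear per-token rate."""
    return tokens / 1_000_000 * rate_per_million

print(f"${input_cost(10_000_000):.2f}")  # 10M input tokens → $0.75
```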
🛟 Get Help

  • Community support is available for Free tier users.
  • Chat support is included in the Developer and Enterprise plans.
  • Dedicated support is offered for Enterprise customers.
  • Additional resources: Groq Community, Docs.
📥 Download Client

This is a web application, accessible directly in the browser. No client download available.