Explore flat-rate, simple plans to get started and dedicated resources to scale.
Start with our Free plan to get started.
Select from a variety of models with different capabilities.
Seamlessly switch to dedicated resources as you scale.
Train your own custom models by fine-tuning on your training data. Export your models to self-host on the Team plan.
Dedicated resources are GPUs to host your models. They offer more reliable latency and increased throughput compared to shared resources (pay-per-token) while being more cost efficient at scale.
Multiple models of the same type and size can be hosted on a single GPU.
Start fine-tuning and deploying language models or explore Forefront Solutions.
Pay per token or per hour with flat-rate hourly GPUs. No hidden fees or confusing math.
pricing details