Explore pay-per-token plans to get started and dedicated resources to scale.
Start experimenting with $20 in free credit to fine-tune or inference any model.
Select from a variety of models with different capabilities and price points.
Switch from pay-per-token to dedicated resources to save on usage costs as you scale.
Choose the right model for your task and only pay for the resources you use.
Prices are per 1,000 tokens. Tokens are ~4 characters, where 1,000 tokens is about 750 words.
Dedicated resources are GPUs to host your models. They offer more reliable latency and increased throughput compared to shared resources (pay-per-token) while being more cost efficient at scale.
Multiple models of the same type and size can be hosted on a single GPU.
Train your own custom models by fine-tuning on your training data.
When using your fine-tuned model, you’ll be billed at the same rate as the base model.
Start fine-tuning and deploying language models or explore Forefront Solutions.