Explore flat-rate, simple plans to get started and dedicated resources to scale.
Get started on our Free plan.
Select from a variety of models with different capabilities.
Seamlessly switch to dedicated resources as you scale.
Train your own custom models by fine-tuning on your training data. Export your models to self-host on the Team plan.
Dedicated resources are GPUs that host your models. They offer more consistent latency and higher throughput than shared resources (pay-per-token), and are more cost-efficient at scale.
Multiple models of the same type and size can be hosted on a single GPU.
Forefront Solutions enable you to build powerful natural language processing solutions in a few lines of code.
Use Summarize to accurately condense information in text or dialogue.
Use Label to quickly solve text labeling use cases.
Use Extract to isolate key information from text.
No prompt engineering or fine-tuning required.
Choose the right model for your task and only pay for the resources you use.
Prices are per 1,000 tokens. A token is roughly 4 characters, so 1,000 tokens is about 750 words.
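As a rough illustration of the per-1,000-token arithmetic, the sketch below estimates a token count from text length using the 4-characters-per-token rule of thumb, then converts it to a cost. The function names and the price value are hypothetical placeholders, not actual Forefront rates or API calls, and real tokenizers will give somewhat different counts.

```python
import math

# Rule of thumb from the pricing note: one token is roughly 4 characters.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(tokens: int, price_per_1k: float) -> float:
    """Cost of a token count when the price is quoted per 1,000 tokens."""
    return tokens / 1000 * price_per_1k


if __name__ == "__main__":
    prompt = "Summarize the following support conversation in two sentences."
    tokens = estimate_tokens(prompt)
    # 0.006 is an assumed example price per 1,000 tokens, not a real rate.
    cost = estimate_cost(tokens, price_per_1k=0.006)
    print(f"~{tokens} tokens, estimated cost ${cost:.6f}")
```

Because the estimate is only character-based, treat the result as a budgeting aid rather than a billing figure.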
Start fine-tuning and deploying language models or explore Forefront Solutions.
Pay per token or per hour with flat-rate hourly GPUs. No hidden fees or confusing math. See the pricing details.
Get up and running with your models in just a few minutes. See the documentation.