Pricing

Start with simple, flat-rate plans, then move to dedicated resources as you scale.

Starter
For individuals getting started.
$29 / month
Features
5M serverless tokens per month ($10 per 1M overage tokens)
5 fine-tuning jobs per month
1 user
Community support

Growth
For applications with low usage.
$99 / month
Features
20M serverless tokens per month ($10 per 1M overage tokens)
10 fine-tuning jobs per month
2 users
Community support

Team
For applications in production.
$299 / month
Features
50M serverless tokens per month ($10 per 1M overage tokens)
Unlimited fine-tuning access
10 users
Standard support
Dedicated GPUs
Export fine-tuned models

Enterprise
For teams needing to scale.
Custom pricing
Features
Unlimited serverless tokens
Unlimited fine-tuned models
Unlimited users
Priority support
Dedicated GPU discounts
Export fine-tuned models
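
To compare the fixed-price plans at a given monthly token volume, a rough estimate can be sketched as below. The base prices, included tokens, and the $10 per 1M overage rate come from the plan list above. The assumption that overage is billed pro rata (rather than in whole 1M blocks) and the function names are illustrative only, and the sketch ignores the other plan differences such as users, fine-tuning jobs, and support.

```python
# Plan figures copied from the plan list above.
PLANS = {
    "Starter": {"base_usd": 29, "included_tokens": 5_000_000},
    "Growth": {"base_usd": 99, "included_tokens": 20_000_000},
    "Team": {"base_usd": 299, "included_tokens": 50_000_000},
}
OVERAGE_USD_PER_MILLION = 10


def monthly_cost(plan: str, tokens_used: int) -> float:
    """Estimate one month's bill. ASSUMPTION: overage is billed pro rata."""
    p = PLANS[plan]
    overage_tokens = max(0, tokens_used - p["included_tokens"])
    return p["base_usd"] + overage_tokens / 1_000_000 * OVERAGE_USD_PER_MILLION


def cheapest_plan(tokens_used: int) -> str:
    """Return the plan with the lowest estimated bill for this token volume."""
    return min(PLANS, key=lambda name: monthly_cost(name, tokens_used))


if __name__ == "__main__":
    for tokens in (8_000_000, 15_000_000, 45_000_000):
        plan = cheapest_plan(tokens)
        print(f"{tokens:>11,} tokens: {plan} (${monthly_cost(plan, tokens):.2f})")
```

Under these assumptions, Starter with overage stays cheaper than Growth until the overage alone exceeds the $70 price difference, which is at 12M tokens per month.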

Start for free

Get started on our Free plan.

Choose the right model

Select from a variety of models with different capabilities.

Built to scale

Seamlessly switch to dedicated resources as you scale.

Fine-tune your models

Train your own custom models by fine-tuning on your training data. Exporting models to self-host is available on the Team and Enterprise plans.

Parameters    Training cost
6 billion     $0.0020 / training example
16 billion    $0.0040 / training example
20 billion    $0.0080 / training example
20 billion    $0.0080 / training example
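
As a quick budgeting aid, the per-example rates above can be turned into an estimate for a whole training set. This is a minimal sketch: the rates are taken from the table, while the function name and the assumption that each example is billed once per fine-tuning job are illustrative.

```python
# USD per training example, keyed by model size in billions of parameters
# (rates copied from the table above).
TRAINING_USD_PER_EXAMPLE = {6: 0.0020, 16: 0.0040, 20: 0.0080}


def training_cost(params_billion: int, num_examples: int) -> float:
    """Estimate a fine-tuning job's cost.

    ASSUMPTION: each training example is billed once per job.
    """
    return num_examples * TRAINING_USD_PER_EXAMPLE[params_billion]


if __name__ == "__main__":
    for size in (6, 16, 20):
        print(f"{size}B, 10,000 examples: ${training_cost(size, 10_000):.2f}")
```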

Scale with dedicated resources

Dedicated resources are GPUs that host your models. Compared to shared, pay-per-token resources, they offer more consistent latency and higher throughput, and they are more cost-efficient at scale.

Multiple models of the same type and size can be hosted on a single GPU.

Parameters    GPU usage
6 billion     $1.39 / GPU hour

Parameters    GPU usage
6 billion     $2.22 / GPU hour
16 billion    $2.78 / GPU hour
20 billion    $3.05 / GPU hour
20 billion    $3.05 / GPU hour

Parameters    GPU usage
6 billion     $2.78 / GPU hour
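
One way to judge when dedicated resources become more cost-efficient is to compute the throughput at which a GPU hour costs the same as the equivalent pay-per-token usage. The sketch below assumes a single model on a single GPU and uses the pay-per-token rate listed later on this page ($0.005 / 1000 tokens for a 6 billion parameter model). The function names are illustrative, and whether a GPU can actually sustain the break-even throughput is not stated here.

```python
def dedicated_cost(gpu_usd_per_hour: float, hours: float) -> float:
    """Cost of keeping one dedicated GPU running for the given number of hours."""
    return gpu_usd_per_hour * hours


def break_even_tokens_per_hour(gpu_usd_per_hour: float,
                               usd_per_1k_tokens: float) -> float:
    """Tokens per hour at which one GPU hour costs the same as pay-per-token.

    ASSUMPTION: one model on one GPU, with no other fees on either side.
    """
    return gpu_usd_per_hour / usd_per_1k_tokens * 1000


if __name__ == "__main__":
    # $1.39 / GPU hour against $0.005 / 1000 tokens, both for 6 billion parameters.
    tokens = break_even_tokens_per_hour(1.39, 0.005)
    print(f"Break-even: about {tokens:,.0f} tokens per hour")
    print(f"One GPU for 24 hours: ${dedicated_cost(1.39, 24):.2f}")
```

Below the break-even throughput, pay-per-token is cheaper under these assumptions; above it, the dedicated GPU is.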

Instantly build powerful NLP solutions

Forefront Solutions enable you to build powerful natural language processing solutions in a few lines of code.

Use Summarize to accurately condense information in text or dialogue.
Use Label to quickly solve text labeling use cases.
Use Extract to isolate key information from text.

No prompt engineering or fine-tuning required.

Tier      Volume                 Price
Tier 1    0-1M characters        $0.02 / 1k characters
Tier 2    1M-10M characters      $0.010 / 1k characters
Tier 3    10M-100M characters    $0.005 / 1k characters
Tier 4    100M-1B characters     $0.0020 / 1k characters
Tier 5    1B+ characters         $0.0015 / 1k characters
Summarize is charged based on the number of characters processed (input + output).
Tier      Volume               Price
Tier 1    0-10k requests       $0.01 / request
Tier 2    10k-100k requests    $0.005 / request
Tier 3    100k-1M requests     $0.002 / request
Tier 4    1M-10M requests      $0.0015 / request
Tier 5    10M+ requests        $0.0010 / request
Label is charged on a per request basis.
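
The page does not say how the tiers combine, so the following is only a sketch under one possible reading: graduated pricing, where each request is billed at the rate of the tier it falls into. If the rate of the highest tier reached applies to the whole volume instead, the totals would differ. The tier bounds and rates are taken from the Label table above; the function name and the treatment of boundary values are assumptions.

```python
# (upper bound in requests, USD per request); None marks the open-ended top tier.
LABEL_TIERS = [
    (10_000, 0.01),
    (100_000, 0.005),
    (1_000_000, 0.002),
    (10_000_000, 0.0015),
    (None, 0.0010),
]


def label_cost(monthly_requests: int) -> float:
    """Estimate Label charges. ASSUMPTION: graduated (marginal) tier pricing."""
    cost = 0.0
    lower = 0
    for upper, rate in LABEL_TIERS:
        if upper is None or monthly_requests < upper:
            # The remaining requests all fall inside this tier.
            cost += (monthly_requests - lower) * rate
            break
        # This tier is filled completely; move on to the next one.
        cost += (upper - lower) * rate
        lower = upper
    return cost


if __name__ == "__main__":
    for n in (5_000, 50_000, 2_000_000):
        print(f"{n:>9,} requests: ${label_cost(n):,.2f}")
```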
Tier      Volume                 Price
Tier 1    0-1M characters        $0.02 / 1k characters / entity
Tier 2    1M-10M characters      $0.010 / 1k characters / entity
Tier 3    10M-100M characters    $0.005 / 1k characters / entity
Tier 4    100M-1B characters     $0.0030 / 1k characters / entity
Tier 5    1B+ characters         $0.0015 / 1k characters / entity
Extract is charged based on the number of characters processed (input + output) multiplied by the number of entity categories extracted. Using Tier 1 pricing, extracting all people, locations, and events from 1,000 characters of text would cost ~$0.06.
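
The worked example above can be reproduced with a few lines of arithmetic. This sketch applies a single tier rate to the whole request; the function and parameter names are illustrative, and the default rate is the Tier 1 price.

```python
def extract_cost(num_characters: int,
                 num_entity_categories: int,
                 usd_per_1k_chars_per_entity: float = 0.02) -> float:
    """Characters processed (in thousands) x entity categories x tier rate."""
    return num_characters / 1000 * num_entity_categories * usd_per_1k_chars_per_entity


if __name__ == "__main__":
    # People, locations, and events: three entity categories over 1,000 characters.
    print(f"${extract_cost(1_000, 3):.2f}")
```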

Start with pay-per-token

Choose the right model for your task and only pay for the resources you use.

Prices are per 1,000 tokens. A token is roughly 4 characters, so 1,000 tokens is about 750 words.

Parameters    Completions usage
6 billion     $0.005 / 1000 tokens
16 billion    $0.012 / 1000 tokens
20 billion    $0.018 / 1000 tokens
20 billion    $0.018 / 1000 tokens
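
To get a feel for pay-per-token costs before sending any requests, the 4-characters-per-token rule of thumb can be combined with the rates above. This is a rough sketch: real token counts depend on the tokenizer, the function names are illustrative, and the page does not say whether prompt and completion tokens are billed alike, so the cost function simply takes a total token count.

```python
import math

# USD per 1,000 tokens, keyed by model size in billions of parameters
# (rates copied from the table above).
USD_PER_1K_TOKENS = {6: 0.005, 16: 0.012, 20: 0.018}
CHARS_PER_TOKEN = 4  # rule of thumb from the text, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def completion_cost(num_tokens: int, params_billion: int) -> float:
    """Cost of processing num_tokens at the listed per-1,000-token rate."""
    return num_tokens / 1000 * USD_PER_1K_TOKENS[params_billion]


if __name__ == "__main__":
    sample = "word " * 750  # 3,750 characters of placeholder text
    tokens = estimate_tokens(sample)
    print(f"~{tokens} tokens, ${completion_cost(tokens, 6):.5f} on the 6B model")
```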

Frequently asked questions

What is a token?
What is a GPU hour?
How will I know how many tokens or GPU hours I use?

Ready to get started?

Start fine-tuning and deploying language models or explore Forefront Solutions.

Transparent, flexible pricing

Pay per token or per hour with flat-rate hourly GPUs. No hidden fees or confusing math.

Start your integration

Get up and running with your models in just a few minutes.
