Pricing

Explore pay-per-token plans to get started and dedicated resources to scale.

Start for free

Start experimenting with $10 in free credit every month to fine-tune or run inference on any model.

Choose the right model

Select from a variety of models with different capabilities and price points.

Built to scale

Switch from pay-per-token to dedicated resources to save on usage costs as you scale.

Instantly build powerful NLP solutions

Forefront Solutions enable you to build powerful natural language processing solutions in a few lines of code.

Use Summarize to accurately condense information in text or dialogue.
Use Label to quickly solve text labeling use cases.
Use Extract to isolate key information from text.

No prompt engineering or fine-tuning required.

Summarize pricing:

Tier 1 (0-1M characters): $0.02 / 1k characters
Tier 2 (1M-10M characters): $0.010 / 1k characters
Tier 3 (10M-100M characters): $0.005 / 1k characters
Tier 4 (100M-1B characters): $0.0020 / 1k characters
Tier 5 (1B+ characters): $0.0015 / 1k characters

Summarize is charged based on the number of characters processed (input + output).
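To see how the character tiers translate into a monthly bill, here is a small cost-estimator sketch. It assumes graduated billing, where each tier's rate applies only to the characters that fall within that tier; the page does not state this explicitly, so treat it as an assumption.

```python
# Sketch of a cost estimator for Summarize's character-based tiers.
# ASSUMPTION: graduated billing (each rate applies only to characters
# within its tier) -- the page does not spell this out.

SUMMARIZE_TIERS = [
    (1_000_000,      0.02),    # Tier 1: 0-1M chars, $0.02 / 1k chars
    (10_000_000,     0.010),   # Tier 2: 1M-10M
    (100_000_000,    0.005),   # Tier 3: 10M-100M
    (1_000_000_000,  0.0020),  # Tier 4: 100M-1B
    (float("inf"),   0.0015),  # Tier 5: 1B+
]

def summarize_cost(characters: int) -> float:
    """Estimate Summarize cost for a month's input + output characters."""
    cost, lower = 0.0, 0
    for upper, rate_per_1k in SUMMARIZE_TIERS:
        in_tier = min(characters, upper) - lower
        if in_tier <= 0:
            break
        cost += in_tier / 1_000 * rate_per_1k
        lower = upper
    return cost

print(f"${summarize_cost(500_000):.2f}")  # 500k chars, all in Tier 1 → $10.00
```

The same shape works for the Label and Extract tiers below; only the boundaries and rates change.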
Label pricing:

Tier 1 (0-10k requests): $0.01 / request
Tier 2 (10k-100k requests): $0.005 / request
Tier 3 (100k-1M requests): $0.002 / request
Tier 4 (1M-10M requests): $0.0015 / request
Tier 5 (10M+ requests): $0.0010 / request

Label is charged on a per-request basis.
Extract pricing:

Tier 1 (0-1M characters): $0.02 / 1k characters / entity
Tier 2 (1M-10M characters): $0.010 / 1k characters / entity
Tier 3 (10M-100M characters): $0.005 / 1k characters / entity
Tier 4 (100M-1B characters): $0.0030 / 1k characters / entity
Tier 5 (1B+ characters): $0.0015 / 1k characters / entity

Extract is charged based on the number of characters processed (input + output) multiplied by the number of entity categories extracted. Using Tier 1 pricing, extracting all people, locations, and events from 1,000 characters of text would cost ~$0.06.
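The worked example above is simple multiplication; this sketch makes the per-entity-category multiplier explicit and reproduces the ~$0.06 figure:

```python
# Extract is priced per 1k characters *per entity category*.
# Worked example from the page: 3 categories (people, locations, events)
# over 1,000 characters at the Tier 1 rate.

TIER_1_RATE = 0.02  # $ per 1k characters per entity category

def extract_cost(characters: int, entity_categories: int,
                 rate_per_1k: float = TIER_1_RATE) -> float:
    return characters / 1_000 * entity_categories * rate_per_1k

print(f"${extract_cost(1_000, 3):.2f}")  # → $0.06
```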

Start with pay-per-token

Choose the right model for your task and only pay for the resources you use.

Prices are per 1,000 tokens. A token is ~4 characters, so 1,000 tokens is about 750 words.
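The 4-characters-per-token rule of thumb is enough for a back-of-the-envelope estimate. This sketch applies it; note that real token counts depend on the model's tokenizer, so this is only an approximation:

```python
# Rough token and cost estimate from the page's rule of thumb:
# 1 token ≈ 4 characters. Actual counts depend on the tokenizer.

def estimated_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

def estimated_cost(text: str, rate_per_1k_tokens: float) -> float:
    return estimated_tokens(text) / 1_000 * rate_per_1k_tokens

print(estimated_tokens("Hello, world!"))  # 13 characters → 3
```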

Model (parameters): completions usage

6 billion: $0.005 / 1,000 tokens
16 billion: $0.012 / 1,000 tokens
20 billion: $0.018 / 1,000 tokens
20 billion: $0.018 / 1,000 tokens
30 billion: $0.020 / 1,000 tokens

Scale with dedicated resources

Dedicated resources are GPUs that host your models. They offer more reliable latency and higher throughput than shared resources (pay-per-token) while being more cost-efficient at scale.

Multiple models of the same type and size can be hosted on a single GPU.

Model (parameters): GPU usage

6 billion: $1.39 / GPU hour

Model (parameters): GPU usage

6 billion: $2.22 / GPU hour
16 billion: $2.78 / GPU hour
20 billion: $3.05 / GPU hour
20 billion: $3.05 / GPU hour
30 billion: $3.33 / GPU hour

Model (parameters): GPU usage

6 billion: $2.78 / GPU hour
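A rough way to decide when dedicated resources become cheaper than pay-per-token is to compute the break-even throughput. This sketch uses the 6-billion-parameter rates from the tables above ($0.005 / 1k tokens vs. $1.39 / GPU hour); actual sustained throughput depends on model, hardware, and batching, so the comparison is indicative only:

```python
# Break-even between pay-per-token and a dedicated GPU, using the
# 6B model's rates from the tables above. Whether a single GPU can
# actually sustain this throughput depends on hardware and batching.

TOKEN_RATE = 0.005  # $ per 1,000 tokens (6B, pay-per-token)
GPU_RATE = 1.39     # $ per GPU hour (6B, dedicated)

def breakeven_tokens_per_hour(token_rate: float, gpu_rate: float) -> float:
    """Tokens/hour above which a dedicated GPU beats pay-per-token."""
    return gpu_rate / (token_rate / 1_000)

print(f"{breakeven_tokens_per_hour(TOKEN_RATE, GPU_RATE):,.0f} tokens/hour")
# → 278,000 tokens/hour
```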

Fine-tune your models

Train your own custom models by fine-tuning on your training data.

When using your fine-tuned model, you’ll be billed at the same rate as the base model.

Model (parameters): training cost

6 billion: $0.0010 / training example
16 billion: $0.0020 / training example
20 billion: $0.0040 / training example
20 billion: $0.0040 / training example
30 billion: $0.0050 / training example
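Since fine-tuning is billed per training example, estimating a training run's cost is a single multiplication. This sketch assumes cost scales linearly with the number of examples; the page does not say whether factors such as epochs affect the bill:

```python
# Estimate fine-tuning cost from the per-training-example rates above.
# ASSUMPTION: cost is linear in the number of examples (epochs and
# other factors are not mentioned on the page).

TRAINING_RATES = {  # $ / training example, keyed by parameter count (billions)
    6: 0.0010,
    16: 0.0020,
    20: 0.0040,
    30: 0.0050,
}

def training_cost(params_billion: int, num_examples: int) -> float:
    return TRAINING_RATES[params_billion] * num_examples

print(f"${training_cost(6, 10_000):.2f}")  # 10k examples on a 6B model → $10.00
```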

Frequently asked questions

What is a token?
What is a GPU hour?
How will I know how many tokens or GPU hours I use?

Ready to get started?

Start fine-tuning and deploying language models or explore Forefront Solutions.

Transparent, flexible pricing

Pay per token, or per hour with flat-rate GPUs. No hidden fees or confusing math.

pricing details
Start your integration

Get up and running with your models in just a few minutes.

documentation