Fine-tune for free.

¹ Fine-tuning is free when training on datasets smaller than 100MB for GPT-J and 10MB for GPT-NeoX. Contact our team for pricing beyond these limits.

Don't pay until you deploy.

Explore our dedicated GPU pricing to host language models. Select your model below.

GPT-J

$1.39

per GPU hour

Standard

New to language models or just experimenting? Start here.

Free fine-tuning

Built-in autoscaling

Throughput optimizations

Response speed
Response speed for a 300 token in, 30 token out request.

1.72 seconds

Requests per minute
Requests per minute for a single Standard GPU. Requests are 300 tokens in, 30 tokens out.

107

get started

$2.78

per GPU hour

Performance

Faster response speeds and higher throughput—everything you need to scale.

Free fine-tuning

Built-in autoscaling

Throughput optimizations

Response speed
Response speed for a 300 token in, 30 token out request.

1.19 seconds

Requests per minute
Requests per minute for a single Performance GPU. Requests are 300 tokens in, 30 tokens out.

230

get started

Or use the base GPT-J model at $0.006 / 1000 tokens

GPT-NeoX

$2.22

per GPU hour

Standard (coming soon)

New to language models or just experimenting? Start here.

Free fine-tuning

Built-in autoscaling

Throughput optimizations

Response speed
Response speed for a 300 token in, 30 token out request.

-

Requests per minute
Requests per minute for a single Standard GPU. Requests are 300 tokens in, 30 tokens out.

-

get started

$3.05

per GPU hour

Performance

Faster response speeds and higher throughput—everything you need to scale.

Free fine-tuning

Built-in autoscaling

Throughput optimizations

Response speed
Response speed for a 300 token in, 30 token out request.

2.07 seconds

Requests per minute
Requests per minute for a single Performance GPU. Requests are 300 tokens in, 30 tokens out.

84

get started

A pay-per-token plan for the base GPT-NeoX model is coming soon.

Model cost comparisons

Compare Forefront’s cost per request to other platforms that host language models. The following comparisons are the number of requests (300 token input, 30 token output) you can optimally achieve with GPT-J on each platform.

Forefront
4980 requests per $1
² Using our GPT-J Performance GPU ($2.78 per hour) at 227 requests per minute (max throughput for requests of 300 in, 30 out).
Goose AI
1962 requests per $1
³ Using Goose's base request rate of $0.00045 for 25 generated tokens plus $0.000012 per additional generated token.
NLPCloud
1638 requests per $1
⁴ Using NLPCloud's flat-rate Fine-tuning GPU replica ($0.55 per hour) at 15 requests per minute (max throughput for requests of 300 in, 30 out).
Grand
1000 requests per $1
⁵ Using Grand's pay-per-request price of $0.001 / request.
Neuro
262 requests per $1
⁶ Using Neuro's compute-based usage pricing: avg. time to output 30 tokens of 2.75s at a cost of $0.00139 per second of prediction ($5/hour of prediction).
The same comparison for GPT-NeoX:

Forefront
1650 requests per $1
² Using our GPT-NeoX Performance GPU ($3.05 per hour) at 80 requests per minute (max throughput for requests of 300 in, 30 out).
Goose AI
337 requests per $1
³ Using Goose's base request rate of $0.002650 for 25 generated tokens plus $0.000063 per additional generated token.
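The per-dollar figures above follow from two simple formulas: flat-rate platforms divide hourly throughput by the hourly price, while per-token platforms invert the cost per request. A minimal sketch (function names are illustrative; small differences from the quoted figures come from rounding in the published inputs):

```python
# Reproduce the "requests per $1" comparison figures.
# Assumptions: flat-rate GPUs run at the stated max throughput for the
# full hour; per-token platforms bill a base rate covering 25 generated
# tokens plus a rate for each additional generated token.

def requests_per_dollar_hourly(price_per_hour, requests_per_minute):
    """Requests per $1 on a flat-rate hourly GPU."""
    return round(requests_per_minute * 60 / price_per_hour)

def requests_per_dollar_tokens(base_rate, extra_token_rate,
                               generated_tokens, base_tokens=25):
    """Requests per $1 under base-rate-plus-per-token pricing."""
    cost = base_rate + extra_token_rate * max(0, generated_tokens - base_tokens)
    return round(1 / cost)

# GPT-J Performance GPU: $2.78/hour at 227 requests per minute
print(requests_per_dollar_hourly(2.78, 227))              # 4899
# Goose AI GPT-J: $0.00045 base + $0.000012 per extra generated token
print(requests_per_dollar_tokens(0.00045, 0.000012, 30))  # 1961
# Grand: flat $0.001 per request
print(requests_per_dollar_tokens(0.001, 0, 30))           # 1000
```

The same two functions cover every platform in the table; only the rate inputs change.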

Frequently asked questions

How does flat-rate hourly pricing work?
Do you have pay-per-token pricing?
What's the pricing for fine-tuning?
How much can I expect to pay for high usage?
leading cost efficiency

The best cost and throughput available

Use standard models with pay-per-token pricing or host fine-tuned models on flat-rate hourly GPUs. We obsessively focus on improving the cost per request you can optimally achieve with the models on our platform.

We've made several performance optimizations and use the most performant hardware so our cost per request is 2x cheaper than the closest competitor, enabling businesses to scale GPT models more cost efficiently than ever before. On-demand, transparent, and built for businesses of all sizes.

Ready to get started?

Start fine-tuning and deploying language models or explore Forefront Solutions.

Transparent, flexible pricing

Pay per token or per hour with flat-rate hourly GPUs. No hidden fees or confusing math.

pricing details
Start your integration

Get up and running with your models in just a few minutes.

documentation