Fine-tune for free.

Fine-tuning is free for training runs under 10 hours. For fine-tuning longer than 10 hours, the cost is $8 per hour.
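The fine-tuning pricing above can be sketched as a small cost estimator. This is a hypothetical helper, and it assumes (the pricing text does not say explicitly) that the $8 hourly rate applies only to hours beyond the free 10-hour allowance.

```python
# Hypothetical fine-tuning cost estimate. Assumption (not confirmed by the
# pricing text): the $8/hour rate is charged only on hours past the free
# 10-hour allowance, not retroactively on the whole run.

FREE_HOURS = 10        # free fine-tuning allowance, in hours
RATE_PER_HOUR = 8      # dollars per billable hour

def fine_tuning_cost(hours: float) -> float:
    """Estimated cost in dollars for a fine-tuning run of the given length."""
    return max(0.0, hours - FREE_HOURS) * RATE_PER_HOUR

print(fine_tuning_cost(6))    # entirely within the free allowance
print(fine_tuning_cost(14))   # 4 billable hours at $8 each
```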

Don't pay until you deploy.

Explore our flat-rate hourly pricing to host language models. Select your model below.

Starter

$600 monthly, billed at $0.83 per GPU hour

New to language models or just experimenting? Start here.

Free fine-tuning
Built-in autoscaling
Throughput optimizations

Response speed: 3.22 seconds for a 300 token in, 30 token out request.
Requests per minute: 66 on a single Starter GPU (requests are 300 tokens in, 30 tokens out).

get started

get started

Standard

$1200 monthly, billed at $1.39 per GPU hour

Increasing usage or going to production? Unlock better cost efficiency and faster responses.

Free fine-tuning
Built-in autoscaling
Throughput optimizations

Response speed: 1.34 seconds for a 300 token in, 30 token out request.
Requests per minute: 84 on a single Standard GPU (requests are 300 tokens in, 30 tokens out).

get started

get started

Performance

$2000 monthly, billed at $2.78 per GPU hour

Even faster response speeds and higher throughput—everything you need to scale.

Free fine-tuning
Built-in autoscaling
Throughput optimizations

Response speed: 1.12 seconds for a 300 token in, 30 token out request.
Requests per minute: 95 on a single Performance GPU (requests are 300 tokens in, 30 tokens out).

get started

How far does your dollar go?

Compare Forefront’s cost per request to other platforms that host language models. The comparisons below show the number of requests (300 tokens in, 30 tokens out) you can optimally achieve per dollar with GPT-J on each platform.

Requests per $1 achieved on each platform:

Forefront: using our flat-rate Starter GPU ($0.83 per hour) at 66 requests per minute (max throughput for requests of 300 tokens in, 30 tokens out).

NLPCloud: using NLPCloud's flat-rate Fine-tuning GPU replica ($0.55 per hour) at 15 requests per minute (max throughput for requests of 300 tokens in, 30 tokens out).

Grand: using Grand's pay-per-token price of $0.0076 per 1,000 tokens. Their quoted price of $0.0017 per 1,000 characters is equivalent to this per-token price.

Neuro: using Neuro's compute-based usage pricing, with an average time to output 30 tokens of 2.75 seconds and a cost per second of prediction of $0.00139 ($5 per hour of prediction).
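The figures above can be turned into rough requests-per-dollar numbers. A minimal sketch, assuming the full 330 tokens (300 in + 30 out) are billed on the pay-per-token plan; the helper names are illustrative, not part of any platform's API.

```python
# Back-of-the-envelope requests-per-dollar math for a 300-token-in,
# 30-token-out request, using only the rates quoted above.

def per_dollar_flat_rate(price_per_hour: float, requests_per_minute: float) -> float:
    """Flat-rate GPU: requests completed in one hour, divided by the hourly price."""
    return requests_per_minute * 60 / price_per_hour

def per_dollar_per_token(price_per_1k_tokens: float, tokens_per_request: int) -> float:
    """Pay-per-token: one dollar buys 1 / (cost per request) requests."""
    return 1 / (price_per_1k_tokens * tokens_per_request / 1000)

def per_dollar_per_second(price_per_second: float, seconds_per_request: float) -> float:
    """Compute-based: one dollar buys 1 / (cost per request) requests."""
    return 1 / (price_per_second * seconds_per_request)

TOKENS = 300 + 30  # tokens per request

print(round(per_dollar_flat_rate(0.83, 66)))        # Forefront Starter
print(round(per_dollar_flat_rate(0.55, 15)))        # NLPCloud
print(round(per_dollar_per_token(0.0076, TOKENS)))  # Grand
print(round(per_dollar_per_second(0.00139, 2.75)))  # Neuro
```

The flat-rate case divides throughput by price; the per-token and per-second cases invert the cost of a single request.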

Frequently asked questions

How does flat-rate hourly pricing work?
Do you have pay-per-token pricing?
What's the pricing for fine-tuning?
How much can I expect to pay for high usage?
leading cost efficiency

The best cost and throughput available

Use standard models with pay-per-token pricing or host fine-tuned models on flat-rate hourly GPUs. We obsessively focus on improving the cost per request you can optimally achieve with the models on our platform.

We've made several performance optimizations and use the most performant hardware, so our cost per request is 4x lower than the closest competitor's, enabling businesses to scale GPT-J and GPT-NeoX more cost efficiently than ever before. On-demand, transparent, and built for businesses of all sizes.

Ready to get started?

Start fine-tuning and deploying language models or explore Forefront Solutions.

Transparent, flexible pricing

Pay per token or per hour with flat-rate hourly GPUs. No hidden fees or confusing math.

pricing details
Start your integration

Get up and running with your models in just a few minutes.

documentation