How to Fine-Tune GPT-J

Forefront Team
September 7, 2021

Recent research in Natural Language Processing (NLP) has led to the release of multiple large transformer-based language models like OpenAI’s GPT-2 and GPT-3, EleutherAI’s GPT-Neo and GPT-J, and Google’s T5. Beyond the leap to billions of tunable parameters, the ease with which these models can perform a never-before-seen task without training a single epoch is something to behold. While it has become evident that a model with more parameters will generally perform better, an exception to this rule appears when one explores fine-tuning. Fine-tuning refers to the practice of further training a transformer-based language model on a dataset for a specific task. This practice has led to the 6-billion-parameter GPT-J outperforming the 175-billion-parameter GPT-3 Davinci on a number of specific tasks. As such, fine-tuning will continue to be the modus operandi when using language models in practice, and, consequently, fine-tuning is the main focus of this post. Specifically: how to fine-tune the open-source GPT-J-6B.


Curate a dataset

The first step in fine-tuning GPT-J is to curate a dataset for your specific task. The specific task for this tutorial will be to imitate Elon Musk. To accomplish this, we compiled podcast transcripts of Elon’s appearances on the Joe Rogan Experience and the Lex Fridman Podcast into a single text file. Here’s the text file for reference. Note that the file is only 150kb. When curating a dataset for fine-tuning, the main focus should be to capture an evenly-distributed sample of the given task rather than to prioritize the raw size of the data. In our case, these podcast appearances worked well because they encompass multiple hours of Elon speaking on a variety of topics.
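If your source transcripts live in separate files, a short script can combine them into a single text file and sanity-check the result. Here’s a minimal Python sketch; the transcript file names are placeholders for whatever you’ve collected:

```python
from pathlib import Path

# Hypothetical transcript files; substitute the ones you collected.
transcripts = ["jre_appearance_1.txt", "jre_appearance_2.txt", "lex_fridman_appearance.txt"]

with open("elon_dataset.txt", "w", encoding="utf-8") as out:
    for name in transcripts:
        text = Path(name).read_text(encoding="utf-8").strip()
        out.write(text + "\n\n")  # blank line between transcripts

size_kb = Path("elon_dataset.txt").stat().st_size / 1024
print(f"Dataset size: {size_kb:.0f}kb")  # ours came to roughly 150kb
```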

If you plan on fine-tuning on a dataset of 100MB or greater, get in touch with our team before beginning.



Fine-tuning GPT-J on Forefront

Believe it or not, once you have your dataset, the hard part is done since Forefront abstracts all of the actual fine-tuning complexity away. Let’s go over the remaining steps to train your fine-tuned model.


Create deployment

Once logged in, click “New deployment”.



Select Fine-tuned GPT-J

From here, we’ll add a name and an optional description for the deployment, then select “Fine-tuned GPT-J”.





Upload dataset

Then, we’ll upload our dataset in the form of a single text file. Again, if the dataset is 100MB or greater, get in touch with our team.



Set training duration

A good rule of thumb for smaller datasets is to train 5-10 minutes for every 100kb. For text files on the order of megabytes, you’ll want to train 45-60 minutes for every 10MB.
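Expressed as a quick calculation, the rule of thumb looks like the sketch below. The 1MB cutoff between the two heuristics is our own assumption; treat the output as a starting point, not a prescription:

```python
def suggested_minutes(size_bytes: int) -> tuple[float, float]:
    """Rule-of-thumb (min, max) training duration for a dataset of the given size."""
    size_kb = size_bytes / 1024
    if size_kb < 1024:  # smaller datasets: 5-10 minutes per 100kb
        return (size_kb / 100 * 5, size_kb / 100 * 10)
    size_10mb = size_bytes / (10 * 1024 * 1024)  # larger files: 45-60 minutes per 10MB
    return (size_10mb * 45, size_10mb * 60)

# Our 150kb Elon dataset suggests roughly 7.5-15 minutes of training.
print(suggested_minutes(150 * 1024))
```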


Set number of checkpoints

A checkpoint is a saved model version that you can deploy. You’ll want to set a number of checkpoints that evenly divides the training duration.
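Since the checkpoint count should evenly divide the training duration, you can enumerate the valid options in a couple of lines of Python. The 15-checkpoint cap follows the tip later in this post about diminishing returns:

```python
def valid_checkpoint_counts(training_minutes: int, max_checkpoints: int = 15) -> list[int]:
    """Checkpoint counts that evenly divide the training duration.
    Capped because saving more than 10-15 checkpoints returns diminishing value."""
    return [n for n in range(1, max_checkpoints + 1) if training_minutes % n == 0]

print(valid_checkpoint_counts(60))  # [1, 2, 3, 4, 5, 6, 10, 12, 15]
```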



Add test prompts

Test prompts are prompts that every checkpoint will automatically generate completions for, so you can compare the performance of your different checkpoints. Test prompts should be pieces of text that do not appear in your training text file. This lets you see how well the model has actually learned your topic, rather than how well it can regurgitate text it has already seen in your training set.
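A quick way to keep yourself honest is to check each candidate prompt against the training file before adding it. A minimal sketch, with hypothetical prompts:

```python
from pathlib import Path

training_text = Path("elon_dataset.txt").read_text(encoding="utf-8").lower()

# Hypothetical candidate test prompts.
candidate_prompts = [
    "What do you think about the future of solar energy?",
    "How hard is it to land a rocket booster?",
]

for prompt in candidate_prompts:
    if prompt.lower() in training_text:
        print(f"Appears in training set, pick another: {prompt!r}")
    else:
        print(f"OK to use as a test prompt: {prompt!r}")
```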

You can also customize model parameters for your specific task.



Once your test prompts are set, you can press 'Fine-tune' and your fine-tuned model will begin training. You may notice the estimated completion time is longer than your specified training time. This is because it takes time to load the base weights prior to training.

View test prompts

As checkpoints begin to appear, you can press 'View test prompts' to start comparing performance between your different checkpoints.




Deploy to Playground and integrate in application

Now for the fun part: deploying your best-performing checkpoint(s) for further testing in the Playground or integration into your app.



To see how simple it is to use the Playground and integrate your GPT-J deployment into your app, check out our tutorial on deploying standard GPT-J.
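For a rough sense of what that integration can look like, here’s a sketch that calls a deployed model over HTTP using Python’s requests library. The endpoint URL, header, and request fields below are illustrative placeholders, not Forefront’s actual API; copy the real values from your deployment page and the tutorial above:

```python
import requests

# Hypothetical endpoint and key; use the real ones from your deployment page.
ENDPOINT = "https://your-deployment.example.com"
API_KEY = "YOUR_API_KEY"

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "text": "What do you think about the future of solar energy?",
        "length": 100,       # max tokens to generate (field name is illustrative)
        "temperature": 0.8,  # typical sampling parameter
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```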


Fine-tuning GPT-J by yourself

Using Forefront isn’t the only way to fine-tune GPT-J. For a tutorial on fine-tuning GPT-J by yourself, check out Eleuther’s guide. However, it’s worth noting that fine-tuning on Forefront not only saves you time but is also completely free, saving you $8 per hour of training. And when you deploy your fine-tuned model on Forefront, you save up to 33% on inference costs with increased throughput.

Helpful Tips

  1. When curating your dataset, prioritize quality, representative samples of the given task over raw size.
  2. Train 5-10 minutes per 100kb or 45-60 minutes per 10MB of your dataset.
  3. Save a number of checkpoints that evenly divides the number of minutes you’re training. Saving more than 10-15 checkpoints returns diminishing value and makes assessing quality difficult.
  4. Set test prompts that are not included in your dataset.
  5. You can deploy multiple checkpoints and conduct further testing in our Playground. Deployed checkpoints are pro-rated according to time deployed.
  6. For more detailed information on fine-tuning and preparing your dataset, refer to our docs.

These tips are meant as loose guidelines, and experimentation is encouraged.


At Forefront, we believe a simple fine-tuning experience enables more experimentation with quicker feedback loops, so companies and individuals can apply language models to a myriad of problems. If you have any ideas on how we can further improve the fine-tuning experience, please get in touch with our team.

Ready to try GPT-J?

Increase throughput, fine-tune for free, and save up to 33% on inference costs. Try GPT-J on Forefront today.
