Recent research in Natural Language Processing (NLP) has led to the release of multiple large transformer-based language models like OpenAI’s GPT-[2,3], EleutherAI’s GPT-[Neo, J], and Google’s T5. For those not impressed by the leap to billions of tunable parameters, the ease with which these models perform a never-before-seen task without training for a single epoch is something to behold. While it has become evident that the more parameters a model has, the better it will generally perform, fine-tuning is an exception to this rule. Fine-tuning refers to the practice of further training transformer-based language models on a dataset for a specific task. This practice has led to the 6 billion parameter GPT-J outperforming the 175 billion parameter GPT-3 Davinci on a number of specific tasks. As such, fine-tuning will continue to be the modus operandi when using language models in practice, and it is the main focus of this post: specifically, how to fine-tune the open-source GPT-J-6B.
The first step in fine-tuning GPT-J is to curate a dataset for your specific task. The specific task for this tutorial will be to imitate Elon Musk. To accomplish this, we compiled podcast transcripts of Elon’s appearances on the Joe Rogan Experience and Lex Fridman Podcast into a single text file. Here’s the text file for reference. Note that the size of the file is only 150 KB. When curating a dataset for fine-tuning, the main focus should be to capture an evenly distributed sample of the given task rather than to maximize the raw size of the data. In our case, these podcast appearances were a great fit, as they cover multiple hours of Elon speaking on a variety of topics.
If you plan on fine-tuning on a dataset of 100MB or greater, get in touch with our team before beginning.
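As a rough illustration of the curation step, here is a minimal Python sketch that joins several transcript files into a single text file and reports its size. The function names, the blank-line separator between transcripts, and the reading of "100MB" as 100 * 1000 * 1000 bytes are our own assumptions for the example, not requirements of Forefront.

```python
from pathlib import Path

# Assumption: "100MB" is read as 100 * 1000 * 1000 bytes; the post does not say
# whether decimal or binary megabytes are meant.
CONTACT_THRESHOLD_BYTES = 100 * 1000 * 1000


def build_dataset(transcript_paths, output_path):
    """Join transcript files into one UTF-8 text file and return its size in bytes."""
    parts = []
    for path in transcript_paths:
        text = Path(path).read_text(encoding="utf-8").strip()
        if text:  # skip empty transcripts
            parts.append(text)
    # Hypothetical choice: separate transcripts with a single blank line.
    combined = "\n\n".join(parts) + "\n"
    data = combined.encode("utf-8")
    Path(output_path).write_bytes(data)
    return len(data)


def needs_team_contact(size_bytes):
    """True when the dataset reaches the size at which the post asks you to get in touch."""
    return size_bytes >= CONTACT_THRESHOLD_BYTES
```

For example, build_dataset(["rogan.txt", "fridman.txt"], "elon.txt") would write the combined file (the file names here are placeholders), and passing the returned size to needs_team_contact tells you whether you are at or above the 100MB mark.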
Believe it or not, once you have your dataset, the hard part is done since Forefront abstracts all of the actual fine-tuning complexity away. Let’s go over the remaining steps to train your fine-tuned model.
Once logged in, click “New deployment”.
Select Fine-tuned GPT-J
From here, we’ll add a name and an optional description for the deployment, then select "Fine-tuned GPT-J".
Then, we’ll upload our dataset in the form of a single text file. Again, if the dataset is 100MB or greater, get in touch with our team.
Set training duration
A good rule of thumb for smaller datasets is to train for 5-10 minutes per 100 KB. For text files on the order of megabytes, you’ll want to train for 45-60 minutes per 10 MB.
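To make the rule of thumb concrete, here is a small Python sketch that turns a file size into a suggested range of training minutes. The post does not say where "smaller" ends and "megabytes" begins, so the 1 MB cutoff below is purely our assumption, as are the function name and the use of decimal units (1 KB = 1000 bytes). Note that the two rules give very different numbers near that cutoff, so treat the output as a starting point rather than a prescription.

```python
def estimate_training_minutes(size_bytes, small_cutoff_bytes=1_000_000):
    """Return a (low, high) range of training minutes for a dataset of the given size.

    Assumption: files below small_cutoff_bytes use the "5-10 minutes per 100 KB"
    rule, and larger files use the "45-60 minutes per 10 MB" rule.
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    if size_bytes < small_cutoff_bytes:
        units = size_bytes / 100_000  # number of 100 KB units
        return (5 * units, 10 * units)
    units = size_bytes / 10_000_000  # number of 10 MB units
    return (45 * units, 60 * units)


# The 150 KB transcript file from this tutorial:
low, high = estimate_training_minutes(150_000)
print(f"Suggested training time: {low:g}-{high:g} minutes")
```

For the 150 KB dataset used in this tutorial, the sketch suggests roughly 7.5 to 15 minutes.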
Set number of checkpoints
A checkpoint is a saved model version that you can deploy. You’ll want to set a number of checkpoints that evenly divides the training duration.
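One way to pick a checkpoint count is to list the counts that split the training duration into equal, whole-minute intervals. The Python sketch below does that; the function name, the whole-minute assumption, and the default cap of 10 checkpoints are ours for illustration and do not reflect any documented Forefront limit.

```python
def checkpoint_options(duration_minutes, max_checkpoints=10):
    """List (checkpoint_count, minutes_between_checkpoints) pairs that evenly
    divide a training duration given in whole minutes."""
    if duration_minutes < 1:
        raise ValueError("duration_minutes must be at least 1")
    upper = min(max_checkpoints, duration_minutes)
    return [
        (count, duration_minutes // count)
        for count in range(1, upper + 1)
        if duration_minutes % count == 0
    ]


# A 15-minute run can be split into 1, 3, or 5 evenly spaced checkpoints.
for count, interval in checkpoint_options(15):
    print(f"{count} checkpoint(s), one every {interval} minutes")
```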
Add test prompts
Test prompts are prompts that every checkpoint will automatically provide completions for so you can compare the performance of the different models. Test prompts should be pieces of text that are not found in your training text file. This allows you to see how good the model is at understanding your topic and prevents the model from regurgitating information it has seen in your training set.
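Before uploading, it can help to check that none of your test prompts appear verbatim in the training file. The Python sketch below is one simple way to do that, assuming a case-insensitive, whitespace-insensitive exact match is a good enough test; it will not catch paraphrases. The helper names are hypothetical.

```python
import re


def _normalize(text):
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_leaked_prompts(training_text, prompts):
    """Return the prompts that occur verbatim (after normalization) in the training text."""
    haystack = _normalize(training_text)
    leaked = []
    for prompt in prompts:
        needle = _normalize(prompt)
        if needle and needle in haystack:
            leaked.append(prompt)
    return leaked
```

Any prompt returned by find_leaked_prompts is already in your training set and is worth replacing with a new one.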
You can also customize model parameters for your specific task.
Once your test prompts are set, you can press 'Fine-tune' and your fine-tuned model will begin training. You may notice the estimated completion time is longer than your specified training time. This is because it takes time to load the base weights prior to training.
View test prompts
As checkpoints begin to appear, you can press 'View test prompts' to start comparing performance between your different checkpoints.
Deploy to Playground and integrate in application
Now for the fun part: deploying your best-performing checkpoint(s) for further testing in the Playground or integration into your app.
To see how simple it is to use the Playground and integrate your GPT-J deployment into your app, check out our tutorial on deploying standard GPT-J.
Using Forefront isn’t the only way to fine-tune GPT-J. For a tutorial on fine-tuning GPT-J by yourself, check out Eleuther’s guide. However, it’s important to note that fine-tuning on Forefront not only saves you time, it is also absolutely free, saving you $8 per hour of training. When you deploy your fine-tuned model on Forefront, you also save up to 33% on inference costs and get increased throughput.
These tips are meant as loose guidelines and experimentation is encouraged.
At Forefront, we believe building a simple experience for fine-tuning can increase experimentation with quicker feedback loops so companies and individuals can apply language models to a myriad of problems. If you have any ideas on how we can further improve the fine-tuning experience, please get in touch with our team.