Fine-tuning is a powerful technique for creating a new GPT-J model that is specific to your use case. When done correctly, fine-tuning GPT-J can achieve performance that exceeds that of significantly larger, general-purpose models like OpenAI's GPT-3 Davinci.
To fine-tune GPT-J on Forefront, all you need is a set of training examples formatted in a single text file, with each example generally consisting of a single input and its associated output. Fine-tuning can solve a variety of problems, and the optimal way to format your dataset will depend on your specific use case. Below, we'll list the most common use cases for fine-tuning GPT-J, corresponding guidelines, and example text files.
Before diving into the most common use cases, there are a few best practices that should be followed regardless of the specific use case:
Classification is the process of categorizing text into one of a set of predefined groups. In classification problems, each input in the prompt should be classified into one of your predefined classes.
Choose classes that map to a single token. At inference time, set the length parameter to 1, since you only need the first token for classification.
Let's say you'd like to organize your customer support messages by topic. You may want to fine-tune GPT-J to filter incoming support messages so they can be routed appropriately.
The dataset might look like the following:
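Here is a minimal illustrative sketch; the instructions, messages, and category names are made up for this example (check that your own class names map to single tokens with a GPT-2/GPT-J tokenizer):

```
Categorize the customer support message into one of the following categories: Billing, Bug, Account

Message: I was charged twice for my subscription this month.
Category: Billing
<|endoftext|>
Categorize the customer support message into one of the following categories: Billing, Bug, Account

Message: The app crashes every time I try to upload a photo.
Category: Bug
<|endoftext|>
```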
In the example above, we provided instructions for the model, followed by the input containing the customer support message and the output classifying the message into the corresponding category. As a separator we used "<|endoftext|>", which clearly separates the different examples. The advantage of using "<|endoftext|>" as the separator is that the model natively uses it to indicate the end of a completion. It also does not need to be set as a stop sequence, because the model automatically stops a completion before outputting "<|endoftext|>".
Now we can query our model by making a Completion request.
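Below is a minimal sketch of such a request in Python. The endpoint URL, API key, and parameter names other than length are placeholders/assumptions; substitute the values shown in your Forefront dashboard and API reference.

```python
import requests

# Placeholders -- use the endpoint URL and API key for your deployed model
# from the Forefront dashboard.
ENDPOINT = "https://<your-model-endpoint>/completions"
API_KEY = "YOUR_API_KEY"

prompt = (
    "Categorize the customer support message into one of the following "
    "categories: Billing, Bug, Account\n\n"
    "Message: Hi, I can't find the invoice for last month. Can you resend it?\n"
    "Category:"
)

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": prompt,    # parameter name assumed; check the API reference
        "length": 1,         # only the first token is needed for classification
        "temperature": 0.0,  # deterministic output for classification
    },
)
print(response.json())
```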
Sentiment analysis is the act of identifying and extracting opinions within a given text across blogs, reviews, social media, forums, news, etc. Let's say you'd like to gauge the degree to which a particular product review is positive or negative.
The dataset might look like the following:
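An illustrative sketch, with made-up reviews scored on a hypothetical 1 (very negative) to 5 (very positive) scale so the output still maps to a single token:

```
Review: This blender is fantastic. It crushes ice in seconds and is easy to clean.
Sentiment: 5
<|endoftext|>
Review: The headphones broke after two days and the sound was muffled.
Sentiment: 1
<|endoftext|>
Review: The jacket is fine, but the sizing runs a little small.
Sentiment: 3
<|endoftext|>
```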
Now we can query our model by making a Completion request.
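A sketch of the corresponding request, with the same caveats as above about the placeholder endpoint and parameter names:

```python
import requests

ENDPOINT = "https://<your-model-endpoint>/completions"  # placeholder
API_KEY = "YOUR_API_KEY"                                # placeholder

prompt = (
    "Review: Setup took five minutes and the picture quality is stunning.\n"
    "Sentiment:"
)

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"prompt": prompt, "length": 1, "temperature": 0.0},
)
print(response.json())
```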
The purpose of a chatbot is to simulate human-like conversations with users via text message or chat. You could fine-tune GPT-J to imitate a specific person, or to respond in certain ways given the context of a conversation for use in a customer support situation. First, let's look at getting GPT-J to imitate Elon Musk.
The dataset might look like the following:
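An illustrative sketch of the long-form conversational format (the exchanges below are invented for illustration, not actual quotes):

```
User: What do you think is the biggest challenge facing humanity?
Elon Musk: Making life multiplanetary. If something happens to Earth, we need a backup plan, and that means becoming a spacefaring civilization.
User: Is that why you started SpaceX?
Elon Musk: Yes. The goal has always been to drive down the cost of getting to orbit so that a city on Mars is actually possible.
User: What keeps you motivated day to day?
Elon Musk: The future is going to be exciting if we make it that way. I want to look forward and think the best days are ahead of us.
User: Do you ever worry about AI?
Elon Musk: All the time. We should be proactive about regulation rather than reactive.
```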
Here we purposely left out separators dividing specific examples. Instead, you can opt for compiling long-form conversations when attempting to imitate a specific person, since the goal is to capture a wide variety of responses in an open-ended format.
You could query the model by making a Completion request.
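A sketch of the request; note the stop sequences and the higher temperature, with the same caveat that the endpoint and parameter names are placeholders:

```python
import requests

ENDPOINT = "https://<your-model-endpoint>/completions"  # placeholder
API_KEY = "YOUR_API_KEY"                                # placeholder

prompt = (
    "User: Do you think we'll land humans on Mars this decade?\n"
    "Elon Musk:"
)

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": prompt,
        "length": 60,                               # allow a multi-sentence reply
        "temperature": 0.8,                         # more open-ended generation
        "stop_sequences": ["User:", "Elon Musk:"],  # parameter name assumed
    },
)
print(response.json())
```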
Notice that we provide "User:" and "Elon Musk:" as stop sequences. It's important to anticipate how the model may continue to provide completions beyond the desired output and use stop sequences to stop the model from continuing. Given the pattern of the dataset, where the User says something followed by Elon Musk, it makes sense to use "User:" and "Elon Musk:" as the stop sequences.
A similar but different chatbot use case would be that of a customer support bot. Here, we'll go back to providing specific examples with separators so the model can identify how to respond in different situations. Depending on your customer support needs, this use case could require a few thousand examples, as it will likely deal with different types of requests and customer issues.
The dataset might look like the following:
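An illustrative sketch; the role labels and messages are made up, and the "#####" separator matches the stop sequence used later:

```
Customer: I can't log in. It keeps saying my password is incorrect.
Agent: Sorry about that! You can reset your password from the "Forgot password" link on the login page. If the reset email doesn't arrive within a few minutes, check your spam folder or let me know and I'll send it manually.
#####
Customer: How do I cancel my subscription?
Agent: You can cancel anytime under Settings > Billing. Your plan stays active until the end of the current billing period, and you won't be charged again after that.
#####
Customer: Do you offer a student discount?
Agent: We do! Send us an email from your school address and we'll apply the discount to your account.
#####
```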
An optional improvement that could be made to the above dataset would be to provide more context and exchanges leading up to the resolution for each example. However, this depends on the role you're hoping to fill with your customer support chatbot.
Now we can query our model by making a Completion request.
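A sketch of the request; the endpoint and parameter names are placeholders as before:

```python
import requests

ENDPOINT = "https://<your-model-endpoint>/completions"  # placeholder
API_KEY = "YOUR_API_KEY"                                # placeholder

prompt = (
    "Customer: I was charged after I cancelled my plan. Can I get a refund?\n"
    "Agent:"
)

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": prompt,
        "length": 80,
        "temperature": 0.4,                        # mostly consistent, slightly varied replies
        "stop_sequences": ["Customer:", "#####"],  # parameter name assumed
    },
)
print(response.json())
```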
As with the previous example, we're using "Customer:" and "#####" as stop sequences so the model stops after providing the relevant completion.
The main purpose of entity extraction is to pull information from given text to understand the subject, theme, or other pieces of information like names, places, etc. Let's say you'd like to extract names from provided text.
The dataset might look like the following:
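An illustrative sketch with made-up sentences and names:

```
Text: Yesterday, Maria Chen met with David Okafor to finalize the partnership agreement.
Names: Maria Chen, David Okafor
<|endoftext|>
Text: The keynote was delivered by Dr. Priya Nair, followed by a panel moderated by James Holt.
Names: Priya Nair, James Holt
<|endoftext|>
Text: According to the report, Luis Romero will take over as CFO next quarter.
Names: Luis Romero
<|endoftext|>
```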
Now we can query our model by making a Completion request.
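A sketch of the request; because the dataset uses "<|endoftext|>" as the separator, no stop sequence is needed, and the usual caveats about placeholder endpoint and parameter names apply:

```python
import requests

ENDPOINT = "https://<your-model-endpoint>/completions"  # placeholder
API_KEY = "YOUR_API_KEY"                                # placeholder

prompt = (
    "Text: After the merger, Ana Sousa and Mark Feld will co-lead the engineering team.\n"
    "Names:"
)

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": prompt,
        "length": 20,        # enough tokens for a short list of names
        "temperature": 0.0,  # extraction should be deterministic
    },
)
print(response.json())
```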
A common use case is to use GPT-J to generate ideas given specific information. Whether it's copy for ads or websites, blog ideas, or product ideas, generating ideas is a useful task for GPT-J. Let's look at the aforementioned use case of generating blog ideas.
The dataset might look like the following:
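An illustrative sketch with made-up topics and ideas:

```
Topic: Remote work
Blog ideas:
1. How to stay productive when your office is your kitchen table
2. The meeting-free workweek: what we learned after three months
3. Building company culture without a company office
<|endoftext|>
Topic: Personal finance
Blog ideas:
1. Five budgeting mistakes to stop making this year
2. How to start investing with less than $100
3. What nobody tells you about emergency funds
<|endoftext|>
```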
Now we can query our model by making a Completion request.
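A sketch of the request, with a longer length and higher temperature since idea generation benefits from variety; the endpoint and parameter names are placeholders as before:

```python
import requests

ENDPOINT = "https://<your-model-endpoint>/completions"  # placeholder
API_KEY = "YOUR_API_KEY"                                # placeholder

prompt = "Topic: Indoor gardening\nBlog ideas:\n"

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": prompt,
        "length": 100,       # room for several ideas
        "temperature": 0.9,  # encourage varied, creative suggestions
    },
)
print(response.json())
```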
Following the above examples for preparing a dataset for each use case should lead to well-performing fine-tuned GPT-J models when you have a sufficient amount of data (a dataset larger than 100MB). For datasets smaller than 100MB, it is recommended to also provide explicit instructions with each example, like the following dataset for blog idea generation.
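For instance, the blog idea dataset above could be rewritten with an explicit instruction prepended to each example (again, the topics and ideas are illustrative):

```
Generate blog ideas for the given topic.

Topic: Remote work
Blog ideas:
1. How to stay productive when your office is your kitchen table
2. Building company culture without a company office
<|endoftext|>
Generate blog ideas for the given topic.

Topic: Personal finance
Blog ideas:
1. Five budgeting mistakes to stop making this year
2. How to start investing with less than $100
<|endoftext|>
```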
Start fine-tuning and deploying language models or explore Forefront Solutions.