Join leading teams building the next wave of world-changing applications.
Fine-tune and use large language models at the best cost and throughput available.
Multiple models, each with different capabilities and price points. GPT-J is the fastest model, while GPT-NeoX is the most powerful—and more are on the way.
Use these models for classification, entity extraction, code generation, chatbots, content generation, summarization, paraphrasing, sentiment analysis, and much more.
These models have been pre-trained on a vast amount of text from the open internet. Fine-tuning improves on this for specific tasks by training on many more examples than can fit in a prompt, letting you achieve better results on a wide range of tasks.
Fine-tuning is as simple as preparing a dataset and uploading it to the platform.
Prepare a dataset with training examples for your task, then upload it to the platform to start fine-tuning.
Define how many epochs your model should train for and how many checkpoints, or model versions, should be saved throughout training.
Set test prompts and parameters to automatically test model performance, and add any checkpoint to get an HTTP endpoint for inference. See the docs for more information.
Instantly use any of your models with an HTTP endpoint for inference. Integration only takes a few minutes with all the parameters you’d expect.
Advanced optimizations
We relentlessly optimize models to achieve best-in-class response speeds.
Inference at scale
Unlock higher concurrency with an intelligent queuing system that increases throughput by 10x.
Large language models can perform various tasks like question answering, content generation, text summarization, and code generation. These models need very few examples to understand a task and, when fine-tuned on as few as 100 training examples, can outperform models 10x their size.
Choose a genre category for each movie: 1. Interstellar, 2. The Departed, 3. Airplane. Make a list of each movie and its genre:
Lex: Let's start with an easy question about consciousness. In your view, is consciousness something that's unique to humans or is it something that permeates all matter? Almost like a fundamental force of physics?
Elon: I don't think consciousness permeates all matter.
Lex: Panpsychics believe that. There's a philosophical
Elon: How would you tell?
Lex: That's true. That's a good point.
Elon: I believe in the scientific method. Don't want to blow your mind or anything, but the scientific method is: if you cannot test the hypothesis, then you cannot reach a meaningful conclusion that it is true.
Lex: Do you think consciousness, understanding consciousness, is within the reach of science, of the scientific method?
Elon: We can dramatically improve our understanding of consciousness. I'd be hard pressed to say that we understand anything with complete accuracy, but can we dramatically improve our understanding of consciousness? I believe the answer is yes.
Company Name: Netflix
Product Description: Netflix is a subscription-based streaming service that allows our members to watch TV shows and movies without commercials on an internet-connected device. You can also download TV shows and movies to your iOS, Android, or Windows 10 device and watch without an internet connection.
Blog Idea:
Extract the mailing address from this email:
Dear Jonathan,
It was great to connect at the seminar. I thought your presentation was amazing, and I really appreciate you offering to send a copy of the book!
My address is 4837 N Henderson Ave, Dallas, TX 75204.
Best,
Sara Davenport
Name and address:
Human: Hello, who are you?
AI: I am an AI created by Forefront. How can I help you today?
Human: I'd like to cancel my subscription.
AI:
Write a SQL query to list the names of employees with the department ID matching "1238123"
Apple announced new iPhones, iPads, and a new Apple Watch during their event on September 14, 2021. The new iPhone 13 ranges from $699 (iPhone 13 Mini 128GB) to $1599 (iPhone 13 Pro Max 1TB). The new iPads range from $329 (iPad 64GB) to $2199 (12.9-inch iPad Pro 2TB). The new Apple Watch ranges from $279 (Apple Watch SE) to $579 (Apple Watch Series 7 45mm GPS + Cellular).
Make a table summarizing the recent products from Apple's announcement:
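An illustrative completion, built only from the figures above:

Product | Price Range
iPhone 13 | $699 (13 Mini 128GB) to $1599 (13 Pro Max 1TB)
iPad | $329 (iPad 64GB) to $2199 (12.9-inch iPad Pro 2TB)
Apple Watch | $279 (SE) to $579 (Series 7 45mm GPS + Cellular)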
Translate the following into German: What time is breakfast?
More cost efficient than any alternative, with better throughput and latency for any model or hardware.
Instantly experiment with any of your models without having to use an API.
Our API processes billions of tokens per day with tasks ranging from extracting data in medical records to searching internal documents with natural language questions.
The NLP problems that the models on our platform can be used to solve are countless, and it’s just the beginning. See below on how some of our customers use them to build powerful products and solutions.
We help you use state-of-the-art language models without:
1. Usage quotas
2. Content filtering
3. Approval processes
We believe in giving you full control over your models to help you build products and solve problems how you see fit.
Start fine-tuning and deploying language models or explore Forefront Solutions.
Pay per token or per hour with flat-rate hourly GPUs. No hidden fees or confusing math.
Less than two weeks ago, EleutherAI announced their latest open source language model, GPT-NeoX-20B. Today, we’re excited to announce that Forefront is the first platform where you can fine-tune GPT-NeoX, enabling our customers to train the largest open source language model on any natural language processing or understanding task. Start fine-tuning GPT-NeoX for free.
The same fine-tuning experience our customers have come to know with GPT-J will be offered for GPT-NeoX including free fine-tuning, JSON Lines and text file support, test prompts, Weights & Biases integration, and control over hyperparameters like epochs and checkpoints. We look forward to seeing all the ways our customers will fine-tune GPT-NeoX models to solve complex NLP problems at scale. Let’s take a closer look at fine-tuning.
What is fine-tuning?
Recent research in Natural Language Processing (NLP) has led to the release of multiple large transformer-based language models (LLMs) like OpenAI’s GPT-[2,3], EleutherAI’s GPT-[Neo, J], and most recently, GPT-NeoX-20B, a 20 billion parameter language model by EleutherAI. One of the most impactful outcomes of this research has been the finding that the performance of LLMs scales predictably as a power-law with the number of parameters; the downside of scaling parameters is the increased cost to fine-tune and run inference. For those not impressed by tunable parameters now in the tens of billions, the performance these models can achieve on a variety of tasks after fine-tuning for just a few epochs on as few as 100 training examples is where you start to see the value.
Fine-tuning refers to the practice of further training language models on a dataset to achieve better performance on a specific task. This practice can enable a model to outperform one 10x its size on virtually any task. As such, fine-tuned models are the majority of models deployed in production on the Forefront platform and where businesses get the most value.
Until now, one had to choose between GPT-J’s 6 billion parameters and GPT-3 Davinci’s 175 billion parameters. The former is small enough to fine-tune and run inference cost efficiently, but not big enough to perform well on complex tasks. The latter is big enough to perform well on complex tasks, but incredibly expensive to fine-tune and run inference with. Enter GPT-NeoX-20B, and solving many more complex NLP tasks at scale starts to look doable. Let’s look at how GPT-NeoX fine-tuned on various tasks compares to vanilla GPT-NeoX and GPT-3 Davinci.
Text summarization
Summarize text into a few sentences.
Emotion classification
Classify text as an emotion.
Question answering
Answer natural language questions about provided text.
Chat summarization
Summarize dialogue and transcripts.
Content generation
Write a paragraph based on a topic and bullet point.
Question answering with context
Answer natural language questions based on the provided information and scenario.
Chatbot with personality
Imitate Elon Musk in a conversation.
Blog idea generation
Generate blog ideas based on a company name and product description.
Blog outline
Provide a blog outline based on a topic.
How to fine-tune GPT-NeoX on Forefront
The first (and most important) step to fine-tuning a model is to prepare a dataset. On Forefront, a fine-tuning dataset can be in one of two formats: JSON Lines or a plain text file (UTF-8 encoded). For the purposes of this example, we’ll format our dataset as JSON Lines, where each example is a prompt-completion pair. Here are some example dataset formats for the emotion classification, text summarization, question answering, and chat summarization use cases above.
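For instance, a JSON Lines dataset for emotion classification might look like the following sketch (the labels and phrasing are illustrative, not a required format):

{"prompt": "I can't believe they canceled my favorite show.", "completion": "sadness"}
{"prompt": "We just closed the biggest deal in company history!", "completion": "joy"}
{"prompt": "The airline lost my luggage again.", "completion": "anger"}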
After uploading your dataset, you can set the number of epochs your model will train for. Epochs refer to the number of complete passes through a training dataset, or put another way, how many times a model will “see” each training example in your dataset. A range of 2-4 epochs is typically recommended depending on the size of your dataset.
Next, you’ll set a number of checkpoints. Checkpoints refer to how many model versions will be saved throughout training. Training a model for the optimal amount of time is incredibly important, and checkpoints let you easily find that optimal point by comparing performance between models at different stages of training. Performance is compared by setting test prompts.
Test prompts are a simple method to validate the performance of your model checkpoints. They work by adding prompts and parameters for each model checkpoint to provide completions. After training, you can review the completions from each checkpoint to find the best performing model.
Alternative ways to fine-tune GPT-NeoX
Alternatively, you could fine-tune GPT-NeoX on your own infrastructure. To do this, you'll need at least 8 NVIDIA A100s, A40s, or A6000s, and you'll use the GPT-NeoX GitHub repo to preprocess your dataset and run the training script with the degrees of parallelism that EleutherAI's repo supports.
Helpful Tips
These tips are meant as loose guidelines and experimentation is encouraged.
At Forefront, we believe building a simple, free experience for fine-tuning will lower the cost of experimenting with large language models, enabling businesses to solve a variety of complex NLP problems. If you have any ideas on how we can further improve the fine-tuning experience, please get in touch with our team. Don't have access to the Forefront platform? Get access.
A few days ago, EleutherAI announced their latest open source language model, GPT-NeoX-20B. Today, we’re excited to announce that GPT-NeoX is live on the Forefront platform, and the model looks to outperform any previous open source language model on virtually any natural language processing or understanding task. Start using GPT-NeoX
We are bringing the same relentless focus to optimizing cost efficiency, throughput, and response speeds as we have with GPT-J. Today, you can host GPT-NeoX on our flat-rate dedicated GPUs at 2x better cost efficiency than any other platform.
The full model weights for GPT-NeoX will be downloadable for free from February 9, under a permissive Apache 2.0 license from The Eye. Until then, you can use the model on the Forefront platform. We look forward to seeing all the ways our customers use GPT-NeoX to build world-changing applications and solve difficult NLP problems.
Let's take a more technical look at the model.
GPT-NeoX-20B is a transformer model trained using EleutherAI’s fork of Microsoft’s DeepSpeed, which they have coined “DeeperSpeed”. "GPT" is short for generative pre-trained transformer, "NeoX" distinguishes this model from its predecessors, GPT-Neo and GPT-J, and "20B" represents the 20 billion trainable parameters. The approach used to train the 20B parameter model combines data, pipeline, and model parallelism ("3D parallelism") to maximize performance and training speed from a fixed amount of hardware. Transformers have increasingly become the model of choice for NLP problems, replacing recurrent neural network (RNN) models such as long short-term memory (LSTM) networks, and GPT-NeoX is the newest and largest open source version of such language models.
The model consists of 44 layers with a model dimension of 6144 and a feedforward dimension of 24576. The model dimension is split into 64 heads, each with a size of 96. Rotary Position Embedding (RoPE) is applied to 24 dimensions of each head. The model is trained with a tokenizer vocabulary of roughly 50,000. Unlike previous models, GPT-NeoX uses a tokenizer trained on the Pile, with added special tokens, like tokens for runs of whitespace, that make encoding code more efficient.
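To put those numbers in one place, here is the architecture summarized as a plain Python dictionary (the field names are ours for readability, not EleutherAI's actual config keys):

# GPT-NeoX-20B architecture as described above; key names are illustrative.
gpt_neox_20b = {
    "num_layers": 44,      # transformer layers
    "d_model": 6144,       # model (hidden) dimension
    "d_ff": 24576,         # feedforward dimension
    "num_heads": 64,       # attention heads per layer
    "head_dim": 96,        # 6144 / 64
    "rotary_dims": 24,     # RoPE applied to 24 of the 96 head dimensions
    "vocab_size": 50000,   # approximate tokenizer vocabulary size
}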
GPT-NeoX was trained on the Pile, a large-scale curated dataset created by EleutherAI.
GPT-NeoX was trained as a causal, autoregressive language model for 3 months on 96 NVIDIA A100s interconnected by NVSwitch, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
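Concretely, for a training sequence of tokens x_1, ..., x_T, this objective minimizes the summed negative log-likelihood of each token given everything before it:

loss = -sum from t=1 to T of log p(x_t | x_1, ..., x_(t-1))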
GPT-NeoX learns an inner representation of the English language that can be used to extract features useful for downstream tasks. The model is best at generating text from a prompt, since the core functionality of GPT-NeoX is to take a string of text and predict the next token. When prompting GPT-NeoX, it is important to remember that the statistically most likely next token is often the one the model will provide.
See how GPT-NeoX compares on task accuracy, factual accuracy, and real-world use cases.
While Davinci still outperforms due to its 10x larger parameter size, GPT-NeoX holds up well in performance and outpaces other models on most standard NLP benchmarks.
The model excels at knowledge-based, factual tasks, given that the Pile contains a large amount of code, scientific papers, and medical papers.
The following comparisons between GPT-J and GPT-NeoX use the same prompts and parameters. Completions are provided by the general model weights for each model. Keep in mind that fine-tuning will achieve significantly better performance.
Text to command
Translate text into programmatic commands.
Product description rewriting
Generate a new product description based on a given tone.
Named Entity Recognition
Locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, and locations.
Content generation
Generate structured HTML blog content.
Summarization
Summarize complex text into a few words.
Product review generation
Generate a product review based on a product description.
Code generation
Create code based on text instructions.
A unique aspect of GPT-NeoX is that it fills the gap between GPT-3 Curie and Davinci, pushing the edge of how large a language model can be while remaining reasonable to fine-tune without incurring significant training or hosting costs. We’ve seen a majority of our customers get the most value out of fine-tuning GPT-J, and we expect GPT-NeoX to be no different. For this reason, we’re enabling customers to fine-tune GPT-NeoX models for free. Stay tuned for a blog post comparing fine-tuned GPT-NeoX models with the standard GPT-NeoX model.
Our team is currently working to release GPT-NeoX fine-tuning within 24-48 hours. If you already have access to the Forefront platform, you can start using GPT-NeoX here. Or request access here. Please contact our team for specific questions or help related to your use case.
Fine-tuning is a powerful technique to create a new GPT-J model that is specific to your use case. When done correctly, fine-tuning GPT-J can achieve performance that exceeds significantly larger, general models like OpenAI's GPT-3 Davinci.
To fine-tune GPT-J on Forefront, all you need is a set of training examples formatted in a single text file, with each example generally consisting of a single input and its associated output. Fine-tuning can solve a variety of problems, and the optimal way to format your dataset will depend on your specific use case. Below, we'll list the most common use cases for fine-tuning GPT-J, corresponding guidelines, and example text files.
Before diving into the most common use cases, there are a few best practices that should be followed regardless of the specific use case:
Classification is the process of categorizing text into predefined groups. In classification problems, each input in the prompt should be classified into one of your predefined classes.
Choose classes that map to a single token. At inference time, specify the parameter length=1, since you only need the first token for classification.
Let's say you'd like to organize your customer support messages by topic. You may want to fine-tune GPT-J to classify incoming support messages so they can be routed appropriately.
The dataset might look like the following:
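A sketch of what such a file might contain (the categories and messages are illustrative):

Classify the support message into one of the following categories: Billing, Technical, Account.
Message: I was charged twice for my subscription this month.
Category: Billing
<|endoftext|>
Classify the support message into one of the following categories: Billing, Technical, Account.
Message: The app crashes every time I try to upload a file.
Category: Technical
<|endoftext|>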
In the example above, we provide instructions for the model, followed by the input containing the customer support message and the output classifying the message into the corresponding category. As a separator, we use "<|endoftext|>", which clearly separates the different examples. The advantage of using "<|endoftext|>" as the separator is that the model natively uses it to indicate the end of a completion. It also does not need to be set as a stop sequence, because the model automatically stops a completion before outputting "<|endoftext|>".
Now we can query our model by making a Completion request.
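As a minimal sketch using Python's requests library (the endpoint URL and payload field names here are placeholders, not Forefront's documented API; see the docs for the exact format):

import requests

# Placeholder endpoint and field names for illustration only.
ENDPOINT = "https://your-deployment-url.example/completions"

payload = {
    "prompt": (
        "Classify the support message into one of the following categories: "
        "Billing, Technical, Account.\n"
        "Message: How do I update the credit card on file?\n"
        "Category:"
    ),
    "length": 1,         # one token is enough for a single-token class label
    "temperature": 0.0,  # deterministic output for classification
}

response = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
print(response.json())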
Sentiment analysis is the act of identifying and extracting opinions within a given text across blogs, reviews, social media, forums, news, etc. Let's say you'd like to gauge the degree to which a particular product review is positive or negative.
The dataset might look like the following:
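A sketch (the reviews and labels are illustrative):

Review: The fabric feels cheap and the seams ripped after one wash.
Sentiment: Negative
<|endoftext|>
Review: Shipping was fast and the fit is perfect. Highly recommend!
Sentiment: Positive
<|endoftext|>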
Now we can query our model by making a Completion request.
The purpose of a chatbot is to simulate human-like conversations with users via text message or chat. You could fine-tune GPT-J to imitate a specific person or respond in certain ways provided the context of a given conversation to use in a customer support situation. First, let's look at getting GPT-J to imitate Elon Musk.
The dataset might look like the following:
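A sketch in the spirit of the podcast transcript excerpted earlier on this page, relabeled to match the stop sequences discussed below:

User: In your view, is consciousness something that's unique to humans, or is it something that permeates all matter?
Elon Musk: I don't think consciousness permeates all matter.
User: Panpsychics believe that.
Elon Musk: How would you tell?
User: Do you think understanding consciousness is within the reach of the scientific method?
Elon Musk: We can dramatically improve our understanding of consciousness. I'd be hard pressed to say that we understand anything with complete accuracy.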
Here we purposefully left out separators dividing specific examples. Instead, you can opt to compile long-form conversations when attempting to imitate a specific person, since you want to capture a wide variety of responses in an open-ended format.
You could query the model by making a Completion request.
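Again as a sketch, with a placeholder endpoint and field names:

import requests

# Placeholder endpoint and field names for illustration only.
payload = {
    "prompt": "User: What do you think about the future of AI?\nElon Musk:",
    "length": 150,
    "temperature": 0.7,
    "stop": ["User:", "Elon Musk:"],  # stop before the model writes the next turn
}

response = requests.post(
    "https://your-deployment-url.example/completions",
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
print(response.json())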
Notice that we provide "User:" and "Elon Musk:" as stop sequences. It's important to anticipate how the model may continue to provide completions beyond the desired output, and to use stop sequences to stop it from continuing. Given the pattern of the dataset, where the user says something followed by Elon Musk, it makes sense to use "User:" and "Elon Musk:" as stop sequences.
A similar but different chatbot use case would be that of a customer support bot. Here, we'll go back to providing specific examples with separators so the model can identify how to respond in different situations. Depending on your customer support needs, this use case could require a few thousand examples, as it will likely deal with different types of requests and customer issues.
The dataset might look like the following:
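A sketch (the "Agent:" label and the exchanges are illustrative; "#####" is the separator referenced below):

Customer: I'd like to cancel my subscription.
Agent: I'm sorry to hear that! I've processed your cancellation, and you'll keep access until the end of the current billing period. Is there anything else I can help with?
#####
Customer: My order arrived damaged.
Agent: I apologize for the trouble. I've issued a replacement, and it will ship within two business days.
#####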
An optional improvement that could be made to the above dataset would be to provide more context and exchanges leading up to the resolution for each example. However, this depends on the role you're hoping to fill with your customer support chatbot.
Now we can query our model by making a Completion request.
As with the previous example, we're using "Customer:" and "#####" as stop sequences so the model stops after providing the relevant completion.
The main purpose of entity extraction is to extract information from given text to understand the subject, theme, or other pieces of information like names, places, etc. Let's say, you'd like to extract names from provided text.
The dataset might look like the following:
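A sketch (the sentences and names are illustrative):

Text: After the keynote, Sundar Pichai and Satya Nadella spoke with reporters in San Francisco.
Names: Sundar Pichai, Satya Nadella
<|endoftext|>
Text: The paper was co-authored by Ada Lovelace and Charles Babbage.
Names: Ada Lovelace, Charles Babbage
<|endoftext|>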
Now we can query our model by making a Completion request.
A common use case is to use GPT-J to generate ideas provided specific information. Whether it's copy for ads or websites, blog ideas, or products, generating ideas is a useful task for GPT-J. Let's look at the aforementioned use case of generating blog ideas.
The dataset might look like the following:
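A sketch reusing the Netflix example from earlier on this page (the blog idea completion is illustrative):

Company Name: Netflix
Product Description: Netflix is a subscription-based streaming service that allows our members to watch TV shows and movies without commercials on an internet-connected device.
Blog Idea: 10 hidden gems on Netflix to stream this weekend
<|endoftext|>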
Now we can query our model by making a Completion request.
Following the above examples for preparing a dataset should lead to well-performing fine-tuned GPT-J models when you have a sufficient amount of training data (a dataset larger than 100MB). For datasets smaller than 100MB, it is recommended to also provide explicit instructions with each example, like the following dataset for blog idea generation.
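A sketch of the same blog idea example with an explicit instruction prepended to each example:

Generate a blog idea based on the company name and product description.
Company Name: Netflix
Product Description: Netflix is a subscription-based streaming service that allows our members to watch TV shows and movies without commercials on an internet-connected device.
Blog Idea: 10 hidden gems on Netflix to stream this weekend
<|endoftext|>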
If you need custom assistance or support with preparing a dataset for your use case, get in touch with our team.
Generative Pre-trained Transformer (GPT) models, the family to which GPT-J and GPT-3 belong, have taken the NLP community by storm. These powerful language models excel at performing various NLP tasks like question answering, entity extraction, categorization, and summarization without any supervised training. They require very few to no examples to understand a given task and can outperform state-of-the-art models trained in a supervised fashion.
GPT-J is a 6-billion parameter transformer-based language model released by a group of AI researchers called EleutherAI in June 2021. The group's goal since forming in July 2020 has been to open-source a family of models designed to replicate those developed by OpenAI. Their current focus is on replicating the 175-billion parameter language model, GPT-3. But don’t let the difference in parameter size fool you. GPT-J outperforms GPT-3 in code generation tasks and, through fine-tuning, can outperform GPT-3 on a number of common natural language processing (NLP) tasks. The purpose of this article is to outline an array of use cases that GPT-J can be applied to and excel at. For information on how to fine-tune GPT-J for any of the use cases below, check out our fine-tuning tutorial.
The most natural use case for GPT-J is generating code. GPT-J was trained on a dataset called the Pile, an 825GB collection of 22 smaller datasets, including academic sources (e.g., Arxiv, PubMed), communities (StackExchange, Wikipedia), code repositories (GitHub), and more. The inclusion of GitHub in the training data has led to GPT-J outperforming GPT-3 on a variety of code generation tasks. While “vanilla” GPT-J is proficient at this task, it becomes even more capable when fine-tuned on a given programming language.
To get started fine-tuning GPT-J for code generation, check out the CodeSearchNet dataset on Hugging Face, which contains 2 million comment/code pairs from open-source libraries hosted on GitHub for Go, Java, JavaScript, PHP, Python, and Ruby.
Input:
Output:
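As an illustrative sketch, a prompt like "Write a Python function that returns the nth Fibonacci number" might yield a completion along these lines (our own example, not the original):

def fibonacci(n):
    """Return the nth Fibonacci number (0-indexed)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a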
An increasingly common NLP use case is to build a chatbot. A chatbot is software that simulates human-like conversations with users via text message or chat. Since their main commercial use is helping users by answering their questions, chatbots are commonly used in a variety of customer support scenarios. However, chatbots can also be used to imitate specific people like Kanye West.
Regardless of your reason for using a chatbot, it is recommended to fine-tune GPT-J by providing transcripts of the specific task. For instance, let’s say you want a custom chatbot to assist with customer support requests. A simple way to curate a fine-tuning dataset would be to record transcripts of typical customer support exchanges between your team and customers. Somewhere on the order of one hundred examples would be enough for GPT-J to become proficient at your company’s specific customer support tasks.
Story writing is the craft of producing fiction written in an easily understandable grammatical structure with a natural flow of speech.
Story writing with GPT-J becomes interesting, as one could fine-tune on a particular author’s writing style or book series. Imagine having a Stephen King writing bot, or a bot that could help generate books 6 and 7 of Game of Thrones because, let’s be honest, George R.R. Martin is dragging his feet at this point.
Here’s an example of the beginning of a fictitious piece written by GPT-J-6B:
The main purpose of entity extraction is to extract information from given text to understand the subject, theme, or other pieces of information like names, places, etc. Some interesting use cases for entity extraction include:
Financial market analysis: Extract key figures from financial news articles or documents to use as signals for trading algorithms or market intelligence
Email inbox optimization: Notify users of flight times, meeting locations, and credit card charges without having to open emails
Content recommendation: Extract information from articles and media to recommend content based on entity similarity and user preferences
GPT-J shines new light on entity extraction, providing a model that is adaptive to both general text and specialized documents through few-shot learning.
Summarization is the process of summarizing information in given text for quicker consumption without losing its original meaning. GPT-J is quite proficient out-of-the-box at summarization. What follows is an example of taking a snippet of the Wikipedia article for Earth and tasking GPT-J to provide a short summary.
Input:
Output:
Not to be confused with summarization, paraphrasing is the process of rewriting a passage without changing the meaning of the original text. Where summarization attempts to condense information, paraphrasing rewords the given information. While GPT-J is capable of summarization out-of-the-box, paraphrasing with GPT-J is best achieved through fine-tuning. Here is an example of paraphrasing a shorter snippet from the same Earth Wikipedia article in the previous summarization example after training on hand-written paraphrasing examples.
Input:
Output:
A widely used commercial use case for GPT-J and other transformer-based language models is copywriting for websites, ads, and general marketing. Copywriting is a crucial marketing process to increase website, ad, and other conversion rates. Through fine-tuning GPT-J on a given company’s voice or previously successful ad campaigns, GPT-J can automatically provide effective copy at a fraction of the cost of hiring a copywriter.
Input:
Output:
Text classification is the process of categorizing text into organized groups. Unstructured text is everywhere, in emails, text conversations, websites, and social media, and the first step in extracting value from this data is to categorize it into organized groups. This is another use case where fine-tuning GPT-J leads to the best performance. By providing one hundred or more examples of your classification task, GPT-J can perform as well as or better than the largest language models available, like OpenAI’s GPT-3 Davinci.
Sentiment analysis is the act of identifying and extracting opinions within given text like blogs, reviews, social media, forums, news, etc. Perhaps you’d like to automatically analyze thousands of reviews about your products to discover whether customers are happy with your pricing plans, or gauge brand sentiment on social media in real time so you can detect disgruntled customers and immediately respond. The applications of sentiment analysis are endless and applicable to any type of business.
Given the infancy of large transformer-based language models, further experimentation will inevitably lead to more use cases that these models prove to be effective at. As you may have noticed, a number of the use cases are the result of fine-tuning GPT-J. At Forefront, we believe the discovery of more use cases will not only come from increased usage of these models, but by providing a simple experience to fine-tune that allows for easy experimentation and quick feedback loops. For a tutorial on easily fine-tuning GPT-J on Forefront, check out our recent tutorial.
Recent research in Natural Language Processing (NLP) has led to the release of multiple large transformer-based language models like OpenAI’s GPT-[2,3], EleutherAI’s GPT-[Neo, J], and Google’s T5. For those not impressed by the leap to billions of tunable parameters, the ease with which these models can perform a never-before-seen task without training a single epoch is something to behold. While it has become evident that the more parameters a model has, the better it will generally perform, an exception to this rule appears when one explores fine-tuning. Fine-tuning refers to the practice of further training transformer-based language models on a dataset for a specific task. This practice has led to the 6 billion parameter GPT-J outperforming the 175 billion parameter GPT-3 Davinci on a number of specific tasks. As such, fine-tuning will continue to be the modus operandi when using language models in practice, and, consequently, fine-tuning is the main focus of this post. Specifically, how to fine-tune the open-source GPT-J-6B.
The first step in fine-tuning GPT-J is to curate a dataset for your specific task. The specific task for this tutorial will be to imitate Elon Musk. To accomplish this, we compiled podcast transcripts of Elon’s appearances on the Joe Rogan Experience and Lex Fridman Podcast into a single text file. Here’s the text file for reference. Note that the size of the file is only 150kb. When curating a dataset for fine-tuning, the main focus should be to encapsulate an evenly-distributed sample of the given task instead of prioritizing raw size of the data. In our case, these podcast appearances of Elon were great as they encompass multiple hours of him speaking on a variety of different topics.
If you plan on fine-tuning on a dataset of 100MB or greater, get in touch with our team before beginning. For more information on preparing your dataset, check out our guide.
Believe it or not, once you have your dataset, the hard part is done since Forefront abstracts all of the actual fine-tuning complexity away. Let’s go over the remaining steps to train your fine-tuned model.
Create deployment
Once logged in, click “New deployment”.
Select Fine-tuned GPT-J
From here, we’ll add a name and optional description for the deployment then select "Fine-tuned GPT-J".
Upload dataset
Then, we’ll upload our dataset in the form of a single text file. Again, if the dataset is 100MB or greater, get in touch with our team.
Set training duration
A good rule of thumb for smaller datasets is to train 5-10 minutes for every 100KB. For text files on the order of megabytes, you’ll want to train 45-60 minutes for every 10MB. For example, the 150KB transcript dataset above calls for roughly 8-15 minutes of training.
Set number of checkpoints
A checkpoint is a saved model version that you can deploy. You’ll want to set a number of checkpoints that evenly divides the training duration.
Add test prompts
Test prompts are prompts that every checkpoint will automatically provide completions for so you can compare the performance of the different models. Test prompts should be pieces of text that are not found in your training text file. This allows you to see how good the model is at understanding your topic and prevents the model from regurgitating information it has seen in your training set.
You can also customize model parameters for your specific task.
Once your test prompts are set, you can press 'Fine-tune' and your fine-tuned model will begin training. You may notice the estimated completion time is longer than your specified training time. This is because it takes time to load the base weights prior to training.
View test prompts
As checkpoints begin to appear, you can press 'View test prompts' to start comparing performance between your different checkpoints.
Deploy to Playground and integrate in application
Now for the fun part: deploying your best-performing checkpoint(s) for further testing in the Playground or integration into your app.
To see how simple it is to use the Playground and integrate your GPT-J deployment into your app, check out our tutorial on deploying standard GPT-J.
Using Forefront isn’t the only way to fine-tune GPT-J. For a tutorial on fine-tuning GPT-J by yourself, check out Eleuther’s guide. However, it’s important to note that not only do you save time by fine-tuning on Forefront, but it’s absolutely free—saving you $8 per hour of training. Also, when you go to deploy your fine-tuned model you save up to 33% on inference costs with increased throughput by deploying on Forefront.
These tips are meant as loose guidelines and experimentation is encouraged.
At Forefront, we believe building a simple experience for fine-tuning can increase experimentation with quicker feedback loops so companies and individuals can apply language models to a myriad of problems. If you have any ideas on how we can further improve the fine-tuning experience, please get in touch with our team.
More than one year has passed since the public release of OpenAI's API for GPT-3. Since then, thousands of developers and hundreds of companies have started building on the platform to apply the transformer-based language model to a variety of NLP problems.
In its wake, EleutherAI, a team of AI researchers open-sourcing their work, released their first implementation of a GPT-like system, the 2.7B parameter GPT-Neo, and most recently, the 6B parameter GPT-J. Before getting into GPT-J deployments, let's understand why a company or developer would use GPT-J in the first place.
So why would one prefer to use the open-source 6B parameter GPT-J over the 175B parameter GPT-3 Davinci? The answer comes down to cost and performance.
First, let's talk about cost. With GPT-3, you pay per 1,000 tokens. For the unacquainted, you can think of tokens as pieces of words, where 1,000 tokens are about 750 words. So with GPT-3, your costs scale directly with usage. On the other hand, the open-source GPT-J can be deployed to cloud infrastructure, enabling you to get effectively unlimited usage while only incurring the cost of the cloud hardware hosting the model.
Now let's talk about performance. "Bigger is better" has become an adage for a reason, and transformer-based language models are no exception. While a 100B parameter transformer model will always generally outperform a 10B parameter one, the keyword is generally. Unless you're trying to solve general artificial intelligence, you probably have a specific use case in mind. This is where fine-tuning GPT-J, or specializing the model on a dataset for a specific task, can lead to better performance than GPT-3 Davinci.
Now that we've discussed why one would use GPT-J over GPT-3 to lower costs at scale and achieve better performance on specific tasks, we'll discuss how to deploy GPT-J.
For this tutorial, we'll be deploying the standard GPT-J-6B.
Create deployment
Once logged in, you can click "New deployment".
Select Vanilla GPT-J
From here, add a name and optional description for your deployment then select "Vanilla GPT-J".
Press "Deploy"
Navigate to your newly created deployment, and press "Deploy" to deploy your Vanilla GPT-J model.
Replica count
From your deployment, you can control the replica count for your deployments as usage increases to maintain fast response speeds at scale.
Inferencing
To begin inferencing, copy the URL under the deployment name and refer to our docs for a full set of instructions on passing requests and receiving responses.
You can expect all the parameters you'd typically use with GPT-3 like response length, temperature, top P, top K, repetition penalty, and stop sequences.
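As a sketch, a request payload exercising these parameters might look like the following (the field names are placeholders; see the docs for the exact names):

payload = {
    "prompt": "Translate the following into German: What time is breakfast?",
    "length": 64,               # response length in tokens
    "temperature": 0.7,         # sampling randomness
    "top_p": 0.9,               # nucleus sampling cutoff
    "top_k": 40,                # sample only from the top-k tokens
    "repetition_penalty": 1.2,  # discourage repeated phrases
    "stop": ["\n"],             # stop sequence(s)
}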
Playground
You can also navigate to Playground to experiment with your new GPT-J deployment without needing to use Postman or implement any code.
Deploying GPT-J on Forefront takes only a few minutes. On top of the simplicity we bring to the deployment process, we've made several low-level machine code optimizations enabling your models to run at a fraction of the cost compared to deploying on Google's TPU v2 with no loss in throughput. If you're ready to get started deploying GPT-J, get in touch with our team.