Sunday, June 14, 2020

GPT-3

I am not sure if people who haven't been doing machine learning can appreciate how weird GPT-3 is or what "few-shot learner" means.

In conventional NLP machine learning, you might start with a pre-trained language model that encodes the relationships between the words of a language. You then build a training set, typically thousands of items: labels applied to text, correct translations, answers to questions, and things like that. You then train the model to minimize error on that training set. Finally, you send the trained model new samples of text, and it spits out a label, translation, or answer as appropriate.
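As a rough illustration of that workflow, here is a toy stand-in: build a labeled training set, fit a model to it, then use the trained model to label new text. (A real pipeline would fine-tune a pre-trained language model; this bag-of-words counter just sketches the shape of the process.)

```python
from collections import Counter

# Labeled training set: (text, label) pairs.
train_set = [
    ("great movie loved it", "positive"),
    ("wonderful film", "positive"),
    ("terrible boring acting", "negative"),
    ("awful waste of time", "negative"),
    # ...in practice, thousands of labeled items
]

def train(dataset):
    """Fit a toy model: count how often each word appears under each label."""
    counts = {}
    for text, label in dataset:
        for word in text.split():
            counts.setdefault(word, Counter())[label] += 1
    return counts

def predict(model, text):
    """Label new text by summing the per-word label counts."""
    votes = Counter()
    for word in text.split():
        votes.update(model.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else "unknown"

model = train(train_set)
print(predict(model, "loved this wonderful movie"))  # positive
```

The point is the division of labor: all the task knowledge lives in the labeled training set and the fitted model, and new text only arrives after training is done.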

With GPT-3, a much bigger language model trained only to predict the next word, you don't have to do any of that. You construct the whole task at the last step, where you would normally be sending new samples of text to a trained model. The trick is that you send a description of the task along with the text. So you could send in:

 translate from English to French: hat => chapeau, cat => chat, hello => 
and it would send back "bonjour".
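In code, the "learning" is nothing more than string assembly. Here is a minimal sketch of building such a few-shot prompt; the model call itself is left out, since the prompt string is the whole trick:

```python
def build_prompt(task, examples, query):
    """Assemble a few-shot prompt: task description, worked examples, then the query.

    The model never sees gradient updates for this task; the examples
    in the prompt are all the "training" it gets.
    """
    pairs = ", ".join(f"{src} => {tgt}" for src, tgt in examples)
    return f"{task}: {pairs}, {query} => "

prompt = build_prompt(
    "translate from English to French",
    [("hat", "chapeau"), ("cat", "chat")],
    "hello",
)
print(prompt)
# translate from English to French: hat => chapeau, cat => chat, hello => 
```

You would then send that string to the model as an ordinary completion request, and the next words it predicts are the answer.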

It learned enough about language, including what typically follows "translate from English to French", to get good performance on that task. This wouldn't be surprising if it had been trained on that task, but there is no task-specific training (a.k.a. fine-tuning). It was just trained to predict the next word. With a big enough model (a mind-boggling 175 billion parameters), it picks up the whole task as a pattern.

Read the paper.
