GPT-3, or Generative Pre-trained Transformer 3, is an advanced language model created by OpenAI. As the third iteration of the GPT series, it stands out for its unparalleled size, making it the largest non-sparse language model to date. It surpasses its predecessor, GPT-2, as well as Microsoft's Turing NLG, boasting ten times the capacity of the latter. GPT-3 is known for its ability to generate text, such as news articles, and to assist with coding tasks, but it also carries potential misuse risks, such as spreading misinformation or enabling phishing. Various versions of GPT-3 serve different needs, the largest being davinci with 175 billion parameters; the later GPT-3.5 series introduced new models and capabilities. GPT-3 is instrumental in industry and research, underpinning products like GitHub Copilot and several Microsoft products, though its use also prompts ethical and academic concerns.
Generative Pre-trained Transformer 3 (GPT-3) is a large language model released by OpenAI in 2020. Like its predecessor GPT-2, it is a decoder-only transformer, a deep neural network architecture that supersedes recurrence- and convolution-based designs with a technique known as "attention". This attention mechanism allows the model to selectively focus on the segments of input text it predicts to be most relevant. GPT-3 uses a 2,048-token-long context window and float16 (16-bit) precision, and it has a hitherto-unprecedented 175 billion parameters, requiring 350 GB of storage since each parameter occupies 2 bytes. The model has demonstrated strong "zero-shot" and "few-shot" learning abilities on many tasks.
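To make the attention mechanism concrete, the sketch below implements single-head causal ("masked") self-attention, the core operation of a decoder-only transformer, in NumPy. The dimensions and random weights are toy placeholders for illustration only, not GPT-3's actual values; the final line reproduces the storage arithmetic from the paragraph above.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention, as used in decoder-only
    transformers such as GPT-3. x has shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v  # project tokens to queries/keys/values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # scaled relevance of every token pair
    # Causal mask: a token may attend only to itself and earlier tokens.
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -np.inf
    # Softmax over the attended positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                   # weighted sum of value vectors

# Toy sizes; GPT-3's published configuration is far larger
# (d_model = 12288, 96 heads, 96 layers).
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 8, 16, 16
x = rng.standard_normal((seq_len, d_model))
w = lambda: rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
out = causal_self_attention(x, w(), w(), w())
print(out.shape)  # (8, 16)

# Storage arithmetic from the text: 175e9 parameters x 2 bytes (float16).
print(f"{175e9 * 2 / 1e9:.0f} GB")  # 350 GB
```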
Original author(s) | OpenAI |
---|---|
Initial release | June 11, 2020 (beta) |
Predecessor | GPT-2 |
Successor | GPT-3.5, GPT-4 |
Type | Large language model |
Website | openai.com |
On September 22, 2020, Microsoft announced that it had licensed exclusive use of GPT-3. Others can still receive output from its public API, but only Microsoft has access to the underlying model.
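As an illustration of the "few-shot" ability mentioned earlier and of the public API noted above, the sketch below sends a few-shot translation prompt (an example drawn from the GPT-3 paper) to the 175-billion-parameter davinci model. It assumes the legacy pre-1.0 `openai` Python client; the current client exposes a different interface, and the API key shown is a placeholder.

```python
# A minimal sketch of "few-shot" prompting: the task is specified entirely
# in the prompt, with a handful of worked examples and no weight updates.
import openai  # pip install openai (legacy <1.0 interface assumed)

openai.api_key = "sk-..."  # placeholder; supply a real key

few_shot_prompt = """Translate English to French.
sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""

response = openai.Completion.create(
    model="davinci",       # the original 175-billion-parameter GPT-3 model
    prompt=few_shot_prompt,
    max_tokens=5,
    temperature=0,
)
print(response.choices[0].text.strip())  # expected output: "fromage"
```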