GPT-1

GPT-1, also known as Generative Pre-trained Transformer 1, is a machine learning[1] model designed for tasks related to understanding and generating human language. Developed by OpenAI[4], GPT-1 uses a decoder-only transformer structure with 12 layers. Each layer contains twelve masked self-attention heads with 64-dimensional states. The model was trained with the Adam optimization algorithm[2], using a learning rate that is increased linearly during an initial warm-up phase. With 117 million parameters, GPT-1 is a sizable model, yet its architecture requires only minimal changes when applied to different tasks. It is particularly noted for its performance on natural language inference[3], question answering, commonsense reasoning, and semantic similarity tasks. It was pre-trained on the BookCorpus dataset, selected for its long passages of contiguous text, which help the model learn to handle long-range dependencies.
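
For readers who prefer code to prose, the hyperparameters above can be collected into a small configuration sketch. The dataclass and its field names below are hypothetical (not OpenAI's code); the layer count, head count, head dimensionality, hidden size, optimizer, warm-up behaviour, and parameter count follow the description above, while the specific peak learning rate and warm-up length are figures reported in the GPT-1 paper.

```python
# Illustrative configuration for GPT-1 based on the figures above; this
# dataclass and its field names are hypothetical, not OpenAI's actual code.
from dataclasses import dataclass

@dataclass
class GPT1Config:
    n_layers: int = 12            # decoder-only transformer layers
    n_heads: int = 12             # masked self-attention heads per layer
    d_head: int = 64              # dimensionality of each head's states
    d_model: int = 12 * 64        # 768-dimensional hidden states (n_heads * d_head)
    n_params: int = 117_000_000   # total trainable parameters (approximate)
    optimizer: str = "Adam"       # trained with the Adam optimizer
    lr_max: float = 2.5e-4        # peak learning rate reported in the GPT-1 paper
    warmup_steps: int = 2_000     # steps over which the rate is increased linearly

print(GPT1Config())
```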

Definitions of terms
1. machine learning. Machine learning, a term coined by Arthur Samuel in 1959, is a field of study that originated from the pursuit of artificial intelligence. It employs techniques that allow computers to improve their performance over time through experience, a learning process that often mimics human cognition. Machine learning applies to areas such as natural language processing, computer vision, and speech recognition, and finds practical use in sectors like agriculture, medicine, and business for predictive analytics. Theoretical frameworks such as Probably Approximately Correct (PAC) learning and concepts like data mining and mathematical optimization form its foundation. Specialized techniques include supervised and unsupervised learning, reinforcement learning, and dimensionality reduction, among others; a minimal supervised-learning sketch follows these definitions.
2. algorithm. An algorithm is a well-defined sequence of instructions or rules that provides a solution to a specific problem or task. Originating in ancient civilizations, algorithms have evolved over the centuries and are now an integral part of modern computing. They are designed using techniques such as divide and conquer, and their efficiency is evaluated with measures such as "big O" notation. Algorithms can be represented in various forms, such as pseudocode, flowcharts, or programming languages, and they are executed by translating them into a language that computers can understand, with execution speed depending on the instruction set used. Algorithms can be classified by their implementation or by their design paradigm, and their efficiency can significantly affect processing time. Understanding and using algorithms effectively is crucial in fields such as computer science and artificial intelligence; the merge-sort sketch after these definitions illustrates the divide-and-conquer technique and big O notation.
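
As a concrete illustration of the supervised learning mentioned in definition 1, the following minimal Python sketch fits a classifier to a handful of labeled examples and then predicts on an unseen one. The toy data and the choice of scikit-learn's LogisticRegression are assumptions made purely for illustration.

```python
# Minimal supervised-learning example: the model "learns from experience"
# by fitting to labeled data, then generalizes to a new input.
from sklearn.linear_model import LogisticRegression

# Toy labeled data (hypothetical): hours studied -> passed the exam (1) or not (0).
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)                   # training: improve from labeled experience
print(model.predict([[3.5]]))     # predict the label of an unseen example
```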
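
And as a concrete illustration of definition 2, here is a short sketch of merge sort, a classic divide-and-conquer algorithm whose running time is O(n log n); the example itself is generic and not drawn from the glossary source.

```python
# Merge sort: a divide-and-conquer algorithm with O(n log n) running time.
def merge_sort(items):
    if len(items) <= 1:                  # base case: already sorted
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])       # divide: sort each half recursively
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0              # conquer: merge the sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 7]))       # [1, 2, 5, 7, 9]
```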
GPT-1 (Wikipedia)

Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017. In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", in which they introduced that initial model along with the general concept of a generative pre-trained transformer.

Generative Pre-trained Transformer 1 (GPT-1)
Original author(s): OpenAI
Initial release: June 2018
Successor: GPT-2
License: MIT
Website: openai.com/blog/language-unsupervised/
[Figure: Original GPT architecture]

Up to that point, the best-performing neural NLP models primarily employed supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use of datasets that were not well-annotated, in addition to making it prohibitively expensive and time-consuming to train extremely large models; many languages (such as Swahili or Haitian Creole) are difficult to translate and interpret using such models due to a lack of available text for corpus-building. In contrast, a GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage in which a language modeling objective was used to set initial parameters, and a supervised discriminative "fine-tuning" stage in which these parameters were adapted to a target task.
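
The two-stage recipe can be sketched in a few lines of PyTorch. The snippet below is a deliberately tiny, hypothetical stand-in (random tokens, an embedding-plus-linear "model", and an ad-hoc classification head), not GPT-1's actual architecture or training code; it only shows the shape of the two objectives: a next-token language-modeling loss for unsupervised pre-training, and a task-specific loss for supervised fine-tuning that reuses the pre-trained parameters.

```python
# Sketch of the two-stage "semi-supervised" recipe, with a toy model.
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, d_model),   # toy stand-in for a transformer
                      nn.Linear(d_model, vocab_size))

# Stage 1: unsupervised pre-training with a language-modeling objective
# (predict each next token given the previous ones in unlabeled text).
tokens = torch.randint(0, vocab_size, (1, 16))              # stand-in for unlabeled text
lm_loss = nn.functional.cross_entropy(
    model(tokens[:, :-1]).reshape(-1, vocab_size),          # next-token predictions
    tokens[:, 1:].reshape(-1),                              # next-token targets
)

# Stage 2: supervised fine-tuning, reusing the pre-trained parameters and
# adapting them to a labeled target task (here, a 2-class classification head).
classifier = nn.Linear(d_model, 2)
features = model[0](tokens).mean(dim=1)                     # pooled features from pre-trained weights
clf_loss = nn.functional.cross_entropy(classifier(features), torch.tensor([1]))

print(lm_loss.item(), clf_loss.item())
```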

The use of a transformer architecture, as opposed to previous techniques involving attention-augmented RNNs, provided GPT models with a more structured memory than could be achieved through recurrent mechanisms; this resulted in "robust transfer performance across diverse tasks".
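
The masked ("causal") self-attention that gives a decoder-only transformer this structured memory can be illustrated with a small numerical sketch; the single-head formulation, random inputs, and shapes below are assumptions chosen only to keep the example short.

```python
# Masked (causal) self-attention: each position attends only to itself
# and earlier positions, never to future tokens.
import numpy as np

def masked_self_attention(Q, K, V):
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                       # pairwise attention scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)    # True above the diagonal = future positions
    scores = np.where(mask, -np.inf, scores)            # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # weighted sum of value vectors

x = np.random.randn(4, 64)                              # 4 tokens, 64-dimensional states per head
print(masked_self_attention(x, x, x).shape)             # (4, 64)
```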

" Retour à l'index des glossaires
fr_FRFR
Retour en haut