Whisper is a speech recognition[2] system developed by OpenAI[4]. It uses artificial intelligence[1], specifically deep learning, to analyze and transcribe spoken language. Early speech recognition systems were built on statistical methods such as hidden Markov models. Whisper instead belongs to a later generation of approaches that draws on convolutional neural networks, sequence-to-sequence (Seq2seq) learning, and transformer models. It was trained on 680,000 hours of multilingual data using weakly supervised learning, which improves its performance across a variety of datasets. Beyond reducing errors, it can serve as the basis for a unified model for both speech and sound recognition.

Whisper splits input audio into 30-second chunks, converts each chunk into a log-Mel spectrogram, passes the spectrogram through an encoder, and generates text captions with a decoder. Special tokens direct the decoder to perform tasks such as producing phrase-level timestamps. Overall, Whisper is a significant advance in speech recognition technology[3].
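The chunking step described above can be sketched as follows. This is a minimal illustration, assuming mono audio already resampled to 16 kHz (the rate the public openai/whisper package works at); the helper name `split_into_chunks` is hypothetical. It splits a recording into fixed 30-second windows and zero-pads the last one so every window has the same length. It is a simplification and does not reproduce how OpenAI's package advances through long recordings.

```python
from typing import List

SAMPLE_RATE = 16_000          # samples per second (assumed input rate)
CHUNK_SECONDS = 30            # window length described in the text
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window


def split_into_chunks(samples: List[float],
                      chunk_samples: int = CHUNK_SAMPLES) -> List[List[float]]:
    """Hypothetical helper: return consecutive fixed-length windows,
    zero-padding the final window with silence."""
    chunks = []
    for start in range(0, len(samples), chunk_samples):
        chunk = samples[start:start + chunk_samples]
        # Pad a short final window so all windows have identical length.
        chunk = chunk + [0.0] * (chunk_samples - len(chunk))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    # 70 seconds of constant-valued "audio" yields three 30-second windows.
    audio = [0.1] * (70 * SAMPLE_RATE)
    windows = split_into_chunks(audio)
    print(len(windows), [len(w) for w in windows])
```

For a 70-second input, the third window holds 10 seconds of audio followed by 20 seconds of zero padding. Each window would then be converted to a log-Mel spectrogram before reaching the encoder.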
Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022.
Original author(s) | OpenAI |
---|---|
Initial release | September 21, 2022 |
Repository | https://github.com/openai/whisper |
Type | |
It can transcribe speech in English and several other languages, and it can also translate several non-English languages into English. OpenAI claims that the diversity of the training data used in its development has improved its handling of accents, background noise, and jargon compared with previous approaches.
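The choice between transcription and translation is signalled to the decoder through special tokens. The sketch below builds that token prefix as plain strings. The token spellings (`<|startoftranscript|>`, a language tag such as `<|fr|>`, `<|transcribe|>` or `<|translate|>`, and `<|notimestamps|>`) follow the public openai/whisper tokenizer. The helpers `build_decoder_prompt` and `timestamp_token` are hypothetical illustrations, not part of that package's API. Timestamps are assumed to be quantized to 20 ms steps, as described in the Whisper paper.

```python
from typing import List


def build_decoder_prompt(language: str, task: str,
                         timestamps: bool = False) -> List[str]:
    """Hypothetical helper: the special-token prefix that selects
    the spoken language and the task for Whisper's decoder."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    prompt = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Without this token, the decoder is expected to emit timestamp tokens.
        prompt.append("<|notimestamps|>")
    return prompt


def timestamp_token(seconds: float, resolution: float = 0.02) -> str:
    """Hypothetical helper: format a time as a timestamp token,
    rounded to the nearest 20 ms step (the assumed resolution)."""
    steps = round(seconds / resolution)
    return f"<|{steps * resolution:.2f}|>"


if __name__ == "__main__":
    # French speech, output translated into English, no timestamps.
    print(build_decoder_prompt("fr", "translate"))
    # English speech transcribed with timestamps enabled.
    print(build_decoder_prompt("en", "transcribe", timestamps=True))
    print(timestamp_token(1.234))
```

The language tag names the *source* language in both cases. With the `translate` task, the decoder's text output is English.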
Whisper is a weakly supervised deep learning acoustic model built on an encoder-decoder transformer architecture.
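The encoder side of this architecture can be illustrated with some shape arithmetic. The sketch below traces how one 30-second chunk becomes the sequence the transformer encoder attends over. It assumes the defaults of the public openai/whisper code: 16 kHz audio, a 160-sample hop (10 ms) between spectrogram frames, and two 1-D convolutions with kernel size 3 and padding 1, the second with stride 2. These values are stated here as assumptions for illustration, and the function names are hypothetical.

```python
SAMPLE_RATE = 16_000   # assumed input sample rate (Hz)
CHUNK_SECONDS = 30     # chunk length from the text
HOP_LENGTH = 160       # assumed samples between spectrogram frames (10 ms)


def conv1d_out_len(length: int, kernel: int, stride: int = 1,
                   padding: int = 0) -> int:
    """Standard output length of a 1-D convolution."""
    return (length + 2 * padding - kernel) // stride + 1


def encoder_positions(chunk_seconds: int = CHUNK_SECONDS) -> tuple:
    """Hypothetical helper: return (spectrogram frames, encoder positions)
    for one chunk under the assumed hyperparameters."""
    samples = SAMPLE_RATE * chunk_seconds
    frames = samples // HOP_LENGTH
    # First convolution keeps the length (kernel 3, padding 1, stride 1).
    after_conv1 = conv1d_out_len(frames, kernel=3, stride=1, padding=1)
    # Second convolution halves it (kernel 3, padding 1, stride 2).
    after_conv2 = conv1d_out_len(after_conv1, kernel=3, stride=2, padding=1)
    return frames, after_conv2


if __name__ == "__main__":
    frames, positions = encoder_positions()
    print(f"{frames} spectrogram frames -> {positions} encoder positions")
```

Under these assumptions, a 30-second chunk yields 3,000 spectrogram frames, which the convolutional stem reduces to 1,500 positions. The decoder cross-attends to these positions while generating text tokens.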
Whisper V2 was released on December 8, 2022, and Whisper V3 was released in November 2023 at OpenAI DevDay.