Labeled data is a crucial element in the field of artificial intelligence[1] and machine learning[2]. It refers to data that has been tagged or classified with labels that give the data context or meaning. For example, an image of a dog could be labeled “dog” in an image recognition dataset. Labeling can be done manually, as it was for ImageNet, where a team of undergraduates and Amazon Mechanical Turk workers labeled millions of images. Alternatively, it can be automated with machine learning models that predict likely labels for new, unlabeled data. Automation makes data analysis considerably more efficient and allows models to keep learning and adapting. However, the quality of labeled data influences algorithmic decision-making: if the labeled data is not statistically representative, the resulting models can be biased.
Labeled data is a group of samples that have been tagged with one or more labels. Labeling typically takes a set of unlabeled data and augments each piece of it with informative tags. For example, a data label might indicate whether a photo contains a horse or a cow, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, or whether a dot in an X-ray is a tumor.
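As a rough illustration of what "augmenting each piece with informative tags" can look like in practice, the following minimal sketch pairs each unlabeled item with zero or more labels. The `Sample` class, the `attach_labels` helper, and the file names are hypothetical and chosen only for this example; real datasets use many different storage formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """One piece of data plus the labels attached to it (hypothetical structure)."""
    data: str                                   # here: a file name standing in for the raw data
    labels: List[str] = field(default_factory=list)


def attach_labels(unlabeled: List[str], annotations: Dict[str, List[str]]) -> List[Sample]:
    """Augment each unlabeled item with its tags; items without tags stay unlabeled."""
    return [Sample(data=item, labels=list(annotations.get(item, []))) for item in unlabeled]


# Illustrative inputs (made up for this sketch).
unlabeled = ["photo_001.jpg", "photo_002.jpg", "photo_003.jpg"]
annotations = {
    "photo_001.jpg": ["horse"],
    "photo_002.jpg": ["cow", "outdoor"],  # a sample may carry more than one label
}

dataset = attach_labels(unlabeled, annotations)
labeled = [s for s in dataset if s.labels]  # keep only samples that received a label
```

Here `photo_003.jpg` has no annotation, so it remains part of the unlabeled pool and is excluded from `labeled`.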
Labels can be obtained by asking humans to make judgments about a given piece of unlabeled data. Because this requires human effort, labeled data is significantly more expensive to obtain than raw unlabeled data.
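When several people judge the same item, their answers have to be combined into one label. The sketch below uses simple majority voting, which is one common choice and an assumption of this example rather than a method described in the text above; the `majority_label` function and the sample judgments are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def majority_label(judgments: List[str]) -> Tuple[Optional[str], float]:
    """Return the most frequent label and the share of annotators who chose it.

    Ties are resolved in favor of the label that appears first in the list,
    which is an arbitrary choice made for this sketch.
    """
    if not judgments:
        return None, 0.0
    label, votes = Counter(judgments).most_common(1)[0]
    return label, votes / len(judgments)


# Illustrative human judgments about tweet sentiment (made up for this sketch).
judgments = {
    "tweet_1": ["positive", "positive", "negative"],
    "tweet_2": ["negative", "negative", "negative"],
}

resolved = {item: majority_label(votes) for item, votes in judgments.items()}
```

The agreement share returned alongside each label gives a rough signal of how contested an item is; items with low agreement could be sent back for additional judgments, at additional cost.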