Identifying Mislabels for Text Classification

Step 1: Zero-Shot Classification Predictions

Perform zero-shot classification predictions for your data using the three models of your choice. Each model should provide class predictions and probability scores for each instance in your dataset.

| Instance ID | Input Text | Model 1 Prediction | Model 1 Probability | Model 2 Prediction | Model 2 Probability | Model 3 Prediction | Model 3 Probability |
|---|---|---|---|---|---|---|---|
| 1 | Text 1 | Class A | 0.85 | Class B | 0.32 | Class C | 0.74 |
| 2 | Text 2 | Class C | 0.60 | Class C | 0.75 | Class C | 0.42 |
| 3 | Text 3 | Class B | 0.72 | Class A | 0.43 | Class B | 0.55 |
| 4 | Text 4 | Class A | 0.90 | Class A | 0.85 | Class A | 0.88 |
| 5 | Text 5 | Class C | 0.68 | Class B | 0.57 | Class C | 0.61 |
| 6 | Text 6 | Class B | 0.77 | Class B | 0.89 | Class A | 0.76 |
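The collection step above can be sketched as follows. The `predict` function is a hypothetical stand-in for a real zero-shot classifier (for example, a Hugging Face zero-shot classification pipeline); here it returns fixed values so only the shape of the results table is shown.

```python
def predict(model_name, text):
    # Placeholder: a real implementation would run the named model on
    # `text` and return its top class and the associated probability.
    return {"prediction": "Class A", "probability": 0.85}

def build_predictions(texts, model_names):
    """Build one row per instance with each model's prediction and score."""
    rows = []
    for i, text in enumerate(texts, start=1):
        row = {"instance_id": i, "text": text}
        for m in model_names:
            out = predict(m, text)
            row[f"{m}_prediction"] = out["prediction"]
            row[f"{m}_probability"] = out["probability"]
        rows.append(row)
    return rows

rows = build_predictions(["Text 1", "Text 2"], ["model_1", "model_2", "model_3"])
```

Each row then contains an instance ID, the input text, and a prediction/probability pair per model, matching the columns of the table above.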

Step 2: Determine Overlapping Predictions

Compare the class predictions of the three models for each instance in your dataset. Identify the rows where the model predictions do not overlap, meaning at least one model predicts a different class for that instance than the others.

| Instance ID | Input Text | Non-Overlapping Predictions? |
|---|---|---|
| 1 | Text 1 | Yes |
| 2 | Text 2 | No |
| 3 | Text 3 | Yes |
| 4 | Text 4 | No |
| 5 | Text 5 | Yes |
| 6 | Text 6 | Yes |
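The overlap check reduces to asking whether the three predicted labels for an instance are all identical. A minimal sketch, using the example predictions from Step 1:

```python
# Predicted class per model, keyed by instance ID (from the Step 1 table).
instances = {
    1: ["Class A", "Class B", "Class C"],
    2: ["Class C", "Class C", "Class C"],
    3: ["Class B", "Class A", "Class B"],
    4: ["Class A", "Class A", "Class A"],
    5: ["Class C", "Class B", "Class C"],
    6: ["Class B", "Class B", "Class A"],
}

# An instance is non-overlapping when the models produce more than one
# distinct class label.
non_overlapping = {
    instance_id: len(set(predictions)) > 1
    for instance_id, predictions in instances.items()
}
# non_overlapping -> {1: True, 2: False, 3: True, 4: False, 5: True, 6: True}
```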

Step 3: Calculate Probability Gap

For the rows where the model predictions do not overlap, calculate the probability gap: the absolute difference between the highest and second-highest probability scores among the three models.

| Instance ID | Input Text | Model 1 Probability | Model 2 Probability | Model 3 Probability | Probability Gap |
|---|---|---|---|---|---|
| 1 | Text 1 | 0.85 | 0.32 | 0.74 | 0.11 |
| 3 | Text 3 | 0.72 | 0.43 | 0.55 | 0.17 |
| 5 | Text 5 | 0.68 | 0.57 | 0.61 | 0.07 |
| 6 | Text 6 | 0.77 | 0.89 | 0.76 | 0.12 |
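The gap calculation can be written as a small helper applied to the non-overlapping rows from Step 2:

```python
def probability_gap(probs):
    """Absolute difference between the highest and second-highest score."""
    top_two = sorted(probs, reverse=True)[:2]
    return round(top_two[0] - top_two[1], 2)

# Probability scores for the non-overlapping instances (from Step 1).
gaps = {
    1: probability_gap([0.85, 0.32, 0.74]),  # 0.85 - 0.74 = 0.11
    3: probability_gap([0.72, 0.43, 0.55]),  # 0.72 - 0.55 = 0.17
    5: probability_gap([0.68, 0.57, 0.61]),  # 0.68 - 0.61 = 0.07
    6: probability_gap([0.77, 0.89, 0.76]),  # 0.89 - 0.77 = 0.12
}
```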

Step 4: Sort By Probability Gap

Sort the rows in descending order of probability gap. Flag as potential mislabels the rows whose probability gap exceeds a threshold that you define.

| Instance ID | Input Text | Model 1 Prediction | Model 1 Probability | Model 2 Prediction | Model 2 Probability | Model 3 Prediction | Model 3 Probability | Probability Gap |
|---|---|---|---|---|---|---|---|---|
| 3 | Text 3 | Class B | 0.72 | Class A | 0.43 | Class B | 0.55 | 0.17 |
| 6 | Text 6 | Class B | 0.77 | Class B | 0.89 | Class A | 0.76 | 0.12 |
| 1 | Text 1 | Class A | 0.85 | Class B | 0.32 | Class C | 0.74 | 0.11 |
| 5 | Text 5 | Class C | 0.68 | Class B | 0.57 | Class C | 0.61 | 0.07 |
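Sorting and thresholding the gaps from Step 3 is a one-liner; the 0.10 threshold is the one used in this walkthrough and should be tuned per dataset:

```python
THRESHOLD = 0.10  # example value; tune per dataset

# Probability gaps for the non-overlapping instances (from Step 3).
gaps = {1: 0.11, 3: 0.17, 5: 0.07, 6: 0.12}

# Keep instances whose gap exceeds the threshold, largest gap first.
candidates = sorted(
    (iid for iid, gap in gaps.items() if gap > THRESHOLD),
    key=lambda iid: gaps[iid],
    reverse=True,
)
# candidates -> [3, 6, 1]
```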

Step 5: Present Results

The output mislabels table includes the rows with a probability gap greater than the specified threshold of 0.10, along with their aggregate model predictions across the three models.

| Instance ID | Input Text | Probability Gap | Aggregate Prediction |
|---|---|---|---|
| 3 | Text 3 | 0.17 | Class B |
| 6 | Text 6 | 0.12 | Class B |
| 1 | Text 1 | 0.11 | Class A |
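One way to compute the aggregate prediction is a majority vote across the three models, falling back to the most confident model when all three disagree. The document does not specify the aggregation rule, so this tie-breaking choice is an assumption:

```python
from collections import Counter

# (prediction, probability) per model for the flagged instances (Step 1).
flagged = {
    3: [("Class B", 0.72), ("Class A", 0.43), ("Class B", 0.55)],
    6: [("Class B", 0.77), ("Class B", 0.89), ("Class A", 0.76)],
    1: [("Class A", 0.85), ("Class B", 0.32), ("Class C", 0.74)],
}

def aggregate(preds):
    """Majority vote; on a three-way tie, take the most confident model."""
    counts = Counter(label for label, _ in preds)
    top_label, top_count = counts.most_common(1)[0]
    if top_count > 1:
        return top_label
    return max(preds, key=lambda p: p[1])[0]

aggregates = {iid: aggregate(preds) for iid, preds in flagged.items()}
# aggregates -> {3: "Class B", 6: "Class B", 1: "Class A"}
```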

Next Steps

While this approach is useful, traditional zero-shot predictions have accuracy limitations. At Anote, we have built proprietary technology that actively learns, from human input, how to find mislabels within datasets, and that outperforms zero-shot approaches by a growing margin over time. In addition, we have built proprietary self-supervised learning algorithms that can identify mislabels within structured datasets without necessarily requiring human feedback. We hope to release some of these findings to the community soon.