Identifying Mislabels for Text Classification
Step 1: Zero-Shot Classification Predictions
Run zero-shot classification over your data with three models of your choice. Each model should output a class prediction and a probability score for every instance in your dataset (a sketch of this step follows the table below).
Instance ID | Input Text | Model 1 Prediction | Model 1 Probability | Model 2 Prediction | Model 2 Probability | Model 3 Prediction | Model 3 Probability |
---|---|---|---|---|---|---|---|
1 | Text 1 | Class A | 0.85 | Class B | 0.32 | Class C | 0.74 |
2 | Text 2 | Class C | 0.60 | Class C | 0.75 | Class C | 0.42 |
3 | Text 3 | Class B | 0.72 | Class A | 0.43 | Class B | 0.55 |
4 | Text 4 | Class A | 0.90 | Class A | 0.85 | Class A | 0.88 |
5 | Text 5 | Class C | 0.68 | Class B | 0.57 | Class C | 0.61 |
6 | Text 6 | Class B | 0.77 | Class B | 0.89 | Class A | 0.76 |
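As a minimal sketch, the predictions above could be collected with the Hugging Face `transformers` zero-shot pipeline. The model names and class labels below are illustrative placeholders, not a prescribed setup:

```python
from transformers import pipeline

# Illustrative choices -- swap in any three zero-shot-capable models.
MODEL_NAMES = [
    "facebook/bart-large-mnli",
    "roberta-large-mnli",
    "microsoft/deberta-large-mnli",
]
CLASSES = ["Class A", "Class B", "Class C"]

def predict_all(texts):
    """For each text, record every model's top class and its probability score."""
    classifiers = [pipeline("zero-shot-classification", model=m) for m in MODEL_NAMES]
    rows = []
    for text in texts:
        row = {"text": text}
        for i, clf in enumerate(classifiers, start=1):
            result = clf(text, candidate_labels=CLASSES)
            # Labels come back sorted by score, so index 0 is the top prediction.
            row[f"model{i}_pred"] = result["labels"][0]
            row[f"model{i}_prob"] = result["scores"][0]
        rows.append(row)
    return rows

rows = predict_all(["Text 1", "Text 2", "Text 3", "Text 4", "Text 5", "Text 6"])
```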
Step 2: Determine Overlapping Predictions
Compare the class predictions of the three models for each instance in your dataset. Identify the rows where the model predictions do not overlap, meaning at least one model predicts a different class for that instance than the others (a code sketch follows the table below).
Instance ID | Input Text | Non-Overlapping Predictions? |
---|---|---|
1 | Text 1 | Yes |
2 | Text 2 | No |
3 | Text 3 | Yes |
4 | Text 4 | No |
5 | Text 5 | Yes |
6 | Text 6 | Yes |
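Continuing the sketch, a small helper (using the hypothetical field names from Step 1) flags the rows where the three predictions are not unanimous:

```python
def is_non_overlapping(row):
    """True when the three models do not all agree on the predicted class."""
    preds = {row["model1_pred"], row["model2_pred"], row["model3_pred"]}
    return len(preds) > 1

disagreements = [row for row in rows if is_non_overlapping(row)]
```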
Step 3: Calculate Probability Gap
For the rows where the model predictions do not overlap, calculate the probability gap: the absolute difference between the highest and second-highest probability scores among the three models (a code sketch follows the table below).
Instance ID | Input Text | Model 1 Probability | Model 2 Probability | Model 3 Probability | Probability Gap |
---|---|---|---|---|---|
1 | Text 1 | 0.85 | 0.32 | 0.74 | 0.11 |
3 | Text 3 | 0.72 | 0.43 | 0.55 | 0.17 |
5 | Text 5 | 0.68 | 0.57 | 0.61 | 0.07 |
6 | Text 6 | 0.77 | 0.89 | 0.76 | 0.12 |
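In code, the gap is a short computation over the three scores, again reusing the hypothetical row fields from the earlier sketch:

```python
def probability_gap(row):
    """Absolute difference between the highest and second-highest scores."""
    scores = sorted(
        (row["model1_prob"], row["model2_prob"], row["model3_prob"]),
        reverse=True,
    )
    return round(scores[0] - scores[1], 2)

for row in disagreements:
    row["gap"] = probability_gap(row)
```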
Step 4: Sort By Probability Gap
Sort the rows by the calculated probability gap in descending order, then flag as potential mislabels the rows whose gap exceeds a threshold that you define (a code sketch follows the table below).
Instance ID | Input Text | Model 1 Prediction | Model 1 Probability | Model 2 Prediction | Model 2 Probability | Model 3 Prediction | Model 3 Probability | Probability Gap |
---|---|---|---|---|---|---|---|---|
3 | Text 3 | Class B | 0.72 | Class A | 0.43 | Class B | 0.55 | 0.17 |
6 | Text 6 | Class B | 0.77 | Class B | 0.89 | Class A | 0.76 | 0.12 |
1 | Text 1 | Class A | 0.85 | Class B | 0.32 | Class C | 0.74 | 0.11 |
5 | Text 5 | Class C | 0.68 | Class B | 0.57 | Class C | 0.61 | 0.07 |
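Sorting and thresholding can then be expressed directly; the 0.10 threshold mirrors the one used in Step 5 and is only an example value:

```python
GAP_THRESHOLD = 0.10  # example value; tune for your own dataset

flagged = sorted(
    (row for row in disagreements if row["gap"] > GAP_THRESHOLD),
    key=lambda row: row["gap"],
    reverse=True,
)
```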
Step 5: Present Results
The output mislabels table includes the rows with a probability gap greater than the specified threshold of 0.10, along with their aggregate prediction across the three models.
Instance ID | Input Text | Probability Gap | Aggregate Prediction |
---|---|---|---|
3 | Text 3 | 0.17 | Class B |
6 | Text 6 | 0.12 | Class B |
1 | Text 1 | 0.11 | Class A |
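One plausible aggregation rule that reproduces the table above is a majority vote across the three models, with ties broken by the highest-probability vote. The rule itself is an assumption (the choice of aggregation is up to you), sketched here:

```python
from collections import Counter

def aggregate_prediction(row):
    """Majority vote across models; ties broken by the highest-probability vote."""
    votes = [(row[f"model{i}_pred"], row[f"model{i}_prob"]) for i in (1, 2, 3)]
    counts = Counter(pred for pred, _ in votes)
    top_count = max(counts.values())
    tied = {pred for pred, n in counts.items() if n == top_count}
    # Among the most-voted classes, keep the one backed by the highest score.
    return max((v for v in votes if v[0] in tied), key=lambda v: v[1])[0]

for row in flagged:
    row["aggregate"] = aggregate_prediction(row)
```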
Next Steps
While this approach is a useful starting point, traditional zero-shot predictions have limitations in accuracy. At Anote, we have built proprietary technology that actively learns from human input how to find mislabels within datasets, and its performance improves substantially over time relative to zero-shot approaches. In addition, we have built proprietary self-supervised learning algorithms that can identify mislabels within structured datasets without necessarily incorporating human feedback. We hope to release some of these findings to the community soon.