Approach
Design Narrative
Anote's approach leverages the Human Centered AI process, which consists of labeling data, training and fine-tuning models, making predictions across a variety of models, evaluating the results of those predictions, and integrating the best model into your product:
1. Create Project
To use the free version in production, navigate to https://dashboard.anote.ai/ and sign in to reach the dashboard.
From there, click the top-right nav to create a project. A project is a team of people collaborating on an AI initiative; each project includes people with different roles who have access to various datasets and models. Learn more about collaboration in projects.
2. Define Training and Testing Datasets
Choose ground truth testing data that your team would like to measure model performance on.
Testing Dataset: In this example, we will use the provided dataset as our testing dataset.
Training Dataset: In this example, we will synthetically generate the training dataset via data augmentation (a minimal sketch follows below). We can then manually label a few items or apply zero-shot labels, sort by edge cases, and manually adjust the labels for different model versions.
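For illustration, here is a minimal text-augmentation sketch; the random word-dropout and word-swap strategy is an assumption chosen for demonstration, not necessarily the augmentation Anote applies:

```python
import random

def augment_text(text: str, n_variants: int = 3, p_drop: float = 0.1):
    """Generate noisy copies of a sentence by randomly dropping and swapping words."""
    words = text.split()
    variants = []
    for _ in range(n_variants):
        kept = [w for w in words if random.random() > p_drop] or list(words)
        # Swap one random adjacent pair to add word-order noise.
        if len(kept) > 1:
            i = random.randrange(len(kept) - 1)
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
        variants.append(" ".join(kept))
    return variants

# Each unlabeled example yields several synthetic training rows.
synthetic_rows = augment_text("The quarterly report shows strong revenue growth")
```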
3. Define Evaluation Metrics
To evaluate performance across all assessed models, we provide an evaluation dashboard that shows, with metrics, how supervised and unsupervised fine-tuned models improve over the baselines. Below are example evaluation metrics for each task type:
Task Type | Evaluation Metrics |
---|---|
Chatbot / Prompting | Evaluation Example |
Text Classification | Dashboard Example |
Named Entity Recognition | NER Evaluation |
Metrics: For this task, we can measure the time to call each model (time per label), the associated cost to run each model (cost per label), and the accuracy (mean per-class accuracy).
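As a rough sketch of how these three metrics could be computed for a text classification model (the `predict_fn` callable and flat per-call pricing are illustrative assumptions, not part of Anote's API):

```python
import time
import numpy as np
from sklearn.metrics import confusion_matrix

def evaluate_model(predict_fn, texts, y_true, cost_per_call):
    """Benchmark one candidate model on time per label, cost per label, and accuracy."""
    start = time.time()
    y_pred = [predict_fn(t) for t in texts]
    elapsed = time.time() - start

    cm = confusion_matrix(y_true, y_pred, labels=sorted(set(y_true)))
    per_class_acc = cm.diagonal() / cm.sum(axis=1)  # accuracy (recall) for each class
    return {
        "mean_per_class_accuracy": float(np.mean(per_class_acc)),
        "time_per_label": elapsed / len(texts),
        "cost_per_label": cost_per_call,  # assumes flat per-call pricing
    }
```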
4. Obtain Initial Prediction Results
Obtain zero-shot baseline results from the supported models (a minimal detection sketch follows the table below).
Image-Based Tasks:
Model |
---|
YOLOv8 (Ultralytics) |
Detectron2 |
Faster R-CNN |
EfficientDet |
Grounding DINO |
Grounding DINO + SAM |
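For example, a minimal zero-shot detection sketch with the first model in the table, using the Ultralytics package (the checkpoint and image path are placeholders):

```python
# pip install ultralytics
from ultralytics import YOLO

# Load a pretrained YOLOv8 checkpoint and run zero-shot inference on one image.
model = YOLO("yolov8n.pt")
results = model("example.jpg")

for result in results:
    for box in result.boxes:
        class_name = model.names[int(box.cls)]
        confidence = float(box.conf)
        print(f"{class_name}: {confidence:.2f}, bbox={box.xyxy.tolist()}")
```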
5. Obtain Fine-Tuned Results
If the baseline zero-shot results are good enough, you should be good to go. Otherwise, you can leverage our fine-tuning library, which uses active learning and few-shot learning techniques to improve results and make higher-quality predictions. We support supervised fine-tuning, where you fine-tune your LLM on your labeled data using parameter-efficient techniques like LoRA and QLoRA. In addition, we support RLHF / RLAIF fine-tuning, where human feedback refines LLM outputs, making them more accurate and tailored to your specific needs.
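To make the parameter-efficient idea concrete, here is a minimal LoRA setup using the Hugging Face peft library; the base model, rank, and target modules are illustrative assumptions, and this sketch is independent of Anote's own fine-tuning library:

```python
# pip install transformers peft
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Wrap the base model with low-rank adapters so only a small fraction
# of parameters is trained on your labeled data.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...train with a standard supervised fine-tuning loop on your labeled data...
```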
6. Explore Various Labeling Strategies
Once baseline fine-tuning results are established, label data for Reinforcement Learning with Human Feedback (RLHF) to improve model performance. This process explores various labeling strategies, including zero labels with programmatic labeling, all AI- or RLAIF-generated labels, synthetically generated labels, and datasets with N = 100, 200, 300, 400, or 500 manually labeled examples. Labeling follows a four-step process:
Step | Description |
---|---|
Upload | Create a new text-based dataset. This can be done by uploading unstructured data, uploading a structured dataset, connecting to data sources, scraping datasets from websites, or selecting your dataset from the Hugging Face Hub. |
Customize | Choose your task type, such as Text Classification, Named Entity Recognition, or Question Answering. Add the categories, entities, or questions you care about, and insert labeling functions to include subject matter expertise (see the sketch below the table). |
Annotate | Label a few of the categories, entities, or answers to questions, and mark important features. As you annotate a few edge cases, the model actively learns and improves from human feedback to accurately predict the rest of the labels. |
Download | When finished labeling, download the resulting labels and predictions as a CSV or JSON. Export the updated model as an API endpoint to make real-time predictions on future rows of data. |
Note: The annotators can be either humans or AI models.
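For context, here is a toy sketch of the kind of labeling function the Customize step refers to; the categories and keyword heuristics are invented for illustration, and Anote's actual labeling-function interface may differ:

```python
# A labeling function maps a raw example to a label, or abstains (returns None).
ABSTAIN = None

def lf_contains_refund(text):
    """Subject-matter heuristic: mentions of refunds suggest a 'Billing' ticket."""
    return "Billing" if "refund" in text.lower() else ABSTAIN

def lf_contains_password(text):
    return "Account Access" if "password" in text.lower() else ABSTAIN

def programmatic_label(text, labeling_functions):
    """Return the first non-abstaining vote; conflicts would need a label model."""
    for lf in labeling_functions:
        label = lf(text)
        if label is not ABSTAIN:
            return label
    return ABSTAIN

print(programmatic_label("I need a refund for last month's invoice",
                         [lf_contains_refund, lf_contains_password]))  # -> Billing
```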
7. Integrate the Best Performing Model
Each trained model is assigned a unique model ID, which developers can access via the SDK/API. You can take your exported fine-tuned model and plug it into our Chatbot, your accurate enterprise AI assistant. The Chatbot has both a UI for enterprise users and a software development kit for developers. To use the Chatbot, upload your documents, ask questions about them with LLMs like GPT, Claude, Llama 2, and Mistral, and get citations for answers to mitigate the effect of hallucinations.
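As a rough illustration of how a developer might wire the exported model into the Chatbot via the SDK, here is a hypothetical sketch; the import path, client class, and every method and field name below are placeholder assumptions, not the documented SDK surface:

```python
# Hypothetical sketch only: names below are illustrative placeholders.
from anote import AnoteClient  # assumed import path

client = AnoteClient(api_key="YOUR_API_KEY")

# Attach the fine-tuned model (by its model ID) to a chatbot over your documents.
chat = client.create_chat(
    model_id="your-finetuned-model-id",
    documents=["handbook.pdf", "faq.md"],
)
answer = chat.ask("What is our refund policy?")
print(answer.text)
print(answer.citations)  # citations help mitigate hallucinations
```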
8. Iterate on Data and Models to Build Workflows
As you add new training data or adjust the taxonomy, you obtain an updated model ID reflecting the latest model version. This iterative process ensures continual improvement and alignment with your specific use cases.
Output 1: Obtain Final Report
This report contains the results of all benchmarked models (fine-tuned vs. zero-shot) with different numbers of labels, showing how each LLM performed and measuring the effect of fine-tuning. In addition to traditional metrics on accuracy, time, and cost, we provide metrics such as stability (how many labels are needed for the model to be good enough) and certainty (confidence of model predictions).
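The exact definitions of stability and certainty are not spelled out above; one simple way they could be operationalized is sketched below, and both formulas are assumptions made for illustration:

```python
import numpy as np

def certainty(probabilities):
    """Mean top-class probability across predictions (a simple notion of confidence)."""
    probabilities = np.asarray(probabilities)
    return float(np.mean(probabilities.max(axis=1)))

def stability(accuracy_by_num_labels, threshold=0.90):
    """Smallest number of labeled examples at which accuracy first reaches the threshold."""
    for n in sorted(accuracy_by_num_labels):
        if accuracy_by_num_labels[n] >= threshold:
            return n
    return None

# Example: accuracy measured at N = 100..500 labels during benchmarking.
print(stability({100: 0.71, 200: 0.84, 300: 0.91, 400: 0.93, 500: 0.94}))  # -> 300
```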
Output 2: Obtain API Access to Model
We provide an SDK / developer API to use each of the improved model versions. Make predictions with our fine-tuning API, which can be used to maintain and improve models and to integrate the best models into your product offerings. To call our API, you first need an API key; once you have one, you can authenticate the SDK, send requests to your model, and receive its predictions as output.
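A hypothetical usage sketch follows; the package name, client class, and method signatures are placeholder assumptions, so consult the developer documentation for the real interface:

```python
# Hypothetical sketch only: names below are illustrative placeholders.
from anote import AnoteClient  # assumed import path

client = AnoteClient(api_key="YOUR_API_KEY")

# Call the fine-tuned model by its model ID to classify a new example.
prediction = client.predict(
    model_id="your-finetuned-model-id",
    text="Customer cannot log in after resetting their password",
)
print(prediction.label, prediction.confidence)
```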