Document Labeling

Document labeling is the process of assigning one or more labels, or categories, to a document based on its content. It is a common task in natural language processing (NLP) and machine learning. In this example, we demonstrate how to use Anote to assign labels to a collection of documents based on their content.

Dataset

The dataset consists of a collection of documents that need to be labeled. Each document is a piece of text that must be classified into one or more predefined categories.

Example Documents

The documents in the dataset can be in various file formats. Here are a few categories of legal documents we might want to classify:

Legal Document Types
Contracts
Regulatory
Litigation
Legal Opinions

Using Anote

To perform document labeling using Anote, we can follow these steps:

Upload Data: Start by uploading the document dataset to Anote. The documents can be in various formats, such as PDF, Word, or plain text files. Upload the documents in the Unstructured format, choose Text Classification as the NLP task, and choose per-document decomposition.
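Choosing per-document decomposition means each whole document, rather than a smaller chunk of it, becomes one item to classify. The sketch below illustrates that idea for a local folder of plain-text files. It is not part of Anote: the helper name load_documents is hypothetical, and PDF or Word files would first need text extraction with a separate library.

```python
from pathlib import Path
from typing import Dict, List


def load_documents(folder: str) -> List[Dict[str, str]]:
    """Hypothetical helper: one record per .txt file, so each whole document is one item."""
    records = []
    for path in sorted(Path(folder).glob("*.txt")):
        # The entire file content becomes a single text to classify.
        records.append({"doc_id": path.name, "text": path.read_text(encoding="utf-8")})
    return records
```

For example, load_documents("legal_docs"), where legal_docs is a hypothetical folder, returns exactly one record per .txt file, which mirrors how a per-document decomposition treats the dataset.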

After the file is uploaded successfully, you can move on to defining the categories.

Customize Categories: In the annotation interface, set up the categories or labels that you want to assign to the documents. These categories correspond to the legal document types listed above. To add a keyword labeling function for each category:

  1. Click the "+ Add" button to add a category.
  2. Type a keyword into the "Enter Keyword" input box and choose the category it should map to.

The labeling functions table then updates to list each new labeling function together with its coverage.
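Keyword labeling functions and their coverage can be sketched in plain Python. This is not Anote's implementation: it assumes a labeling function fires on a case-insensitive substring match and that coverage is the fraction of documents on which it fires. The class KeywordLF, the function coverage, and the sample keywords and documents are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KeywordLF:
    """Hypothetical keyword labeling function: votes for `category` when `keyword` appears."""

    keyword: str
    category: str

    def apply(self, text: str) -> Optional[str]:
        # Case-insensitive substring match; None means the function abstains.
        return self.category if self.keyword.lower() in text.lower() else None


def coverage(lf: KeywordLF, texts: List[str]) -> float:
    """Fraction of documents on which the labeling function does not abstain."""
    if not texts:
        return 0.0
    hits = sum(1 for text in texts if lf.apply(text) is not None)
    return hits / len(texts)


# Illustrative keywords and documents, not taken from a real dataset.
lfs = [KeywordLF("agreement", "Contracts"), KeywordLF("court", "Litigation")]
docs = [
    "This Agreement is entered into by the parties below.",
    "The court granted the motion to dismiss.",
    "Filed with the court; the settlement agreement is attached.",
]
for lf in lfs:
    print(f"{lf.keyword!r} -> {lf.category}: coverage {coverage(lf, docs):.2f}")
```

With these three sample documents each keyword matches two of them, so both functions report a coverage of 0.67. Because the third document matches both keywords, coverages across labeling functions need not add up to 1.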

Annotate Documents: Begin the annotation process. Anote provides an interface for viewing each document and selecting the appropriate label from the predefined list of categories. Label a few documents by choosing the category that matches each document's content and clicking the "confirm" button.
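The few confirmed labels and the keyword labeling functions give two sources of labels for the same documents. The sketch below shows one simple way to combine them, assuming a confirmed label always takes precedence and otherwise the majority keyword vote is used. Anote's own approach may differ; KEYWORD_RULES and resolve_label are hypothetical names, and the keyword-to-category rules are illustrative.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical keyword -> category rules standing in for the labeling functions.
KEYWORD_RULES = {
    "agreement": "Contracts",
    "contract": "Contracts",
    "court": "Litigation",
    "lawsuit": "Litigation",
    "regulation": "Regulatory",
}


def keyword_votes(text: str) -> List[str]:
    """One vote per matching keyword (case-insensitive substring match)."""
    lowered = text.lower()
    return [category for keyword, category in KEYWORD_RULES.items() if keyword in lowered]


def resolve_label(text: str, confirmed: Optional[str] = None) -> Optional[str]:
    """Prefer a human-confirmed label; otherwise take the majority keyword vote."""
    if confirmed is not None:
        return confirmed
    votes = Counter(keyword_votes(text))
    if not votes:
        return None  # no keyword matched: leave the document for manual review
    (top, top_count), *rest = votes.most_common()
    if rest and rest[0][1] == top_count:
        return None  # tie between categories: too ambiguous to label automatically
    return top
```

Returning None for ties and unmatched documents keeps the sketch conservative: those are exactly the documents worth confirming by hand.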

Export Results: Once the annotation process is complete, export the labeled documents from Anote. Choose the desired output format, such as CSV or JSON, and download the annotated data along with the assigned labels.
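After downloading the export, a quick sanity check is to count how many documents landed in each category. The sketch below reads either a CSV or a JSON export. It assumes each CSV row or JSON object has a field named label; that name is hypothetical, so set label_column to match the header in your actual file.

```python
import csv
import json
from collections import Counter
from typing import List


def load_labels(path: str, label_column: str = "label") -> List[str]:
    """Read the assigned labels from an exported CSV or JSON file."""
    if path.lower().endswith(".json"):
        with open(path, encoding="utf-8") as f:
            rows = json.load(f)  # assumed: a list of objects, one per document
    else:
        with open(path, newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))  # first line is treated as the header
    return [row[label_column] for row in rows]


def label_distribution(path: str, label_column: str = "label") -> Counter:
    """Count how many documents received each label."""
    return Counter(load_labels(path, label_column))
```

For example, label_distribution("labels.csv"), where labels.csv is a hypothetical file name, returns a Counter mapping each category to its number of documents.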
