
Tagging Entities in Text

In programmatic labeling, tagging entities in text, also called named entity recognition (NER), is a core step in many natural language processing (NLP) tasks. It means finding the spans of text that refer to specific entities, such as people's names, dates, locations, and organizations, and assigning each span a label.

Tagging Entities in Text with spaCy

spaCy is an open-source Python library for NLP tasks, including entity recognition. Its pre-trained pipelines recognize and classify common entity types out of the box. Here are some of the entity types spaCy can recognize:

  • PERSON: People's names, including fictional characters
  • DATE: Dates or periods of time
  • ORG: Organizations, companies, or institutions
  • GPE: Countries, cities, or states
  • MONEY: Monetary values
  • PERCENT: Percentage values
  • LANGUAGE: Named languages, such as English or French
  • PRODUCT: Objects, vehicles, or foods
  • EVENT: Named hurricanes, battles, wars, or sports events
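A minimal sketch of running a spaCy pipeline and reading back the entities it finds. It assumes spaCy v3 is installed; the helper names `tag_entities` and `describe_labels` are hypothetical, while `doc.ents`, `ent.label_`, and `spacy.explain` are standard spaCy APIs.

```python
import spacy
from spacy.language import Language


def tag_entities(nlp: Language, text: str) -> list[tuple[str, str, int, int]]:
    """Run a spaCy pipeline over text and return (text, label, start_char, end_char) per entity."""
    doc = nlp(text)
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]


def describe_labels(labels: list[str]) -> dict[str, str]:
    """Look up spaCy's short glossary description for each entity label."""
    # spacy.explain returns None for unknown labels, so fall back to an empty string.
    return {label: spacy.explain(label) or "" for label in labels}


# Example with a pre-trained English pipeline, assuming it was installed first with
#   python -m spacy download en_core_web_sm
#
# nlp = spacy.load("en_core_web_sm")
# for entity in tag_entities(nlp, "Apple opened an office in Paris in March 2023."):
#     print(entity)
```

The spans and labels a pre-trained pipeline returns depend on the model and its version, so treat any particular output as illustrative rather than guaranteed.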

Entity Tagging with Regular Expressions

In addition to the entity recognition provided by libraries like spaCy, our platform supports tagging entities with regular expressions (regex). This lets you define custom patterns and match entities that pre-trained models may not cover.

Here are some common types of entities that can be tagged with regular expressions:

  • Email Addresses: Match patterns that represent email addresses, such as john.doe@example.com or jane.doe@example.com.
  • Phone Numbers: Identify phone number patterns, including various formats like (123) 456-7890, 123-456-7890, or 1234567890.
  • URLs: Detect URLs in text, such as https://www.example.com or www.example.com.
  • Dates: Match different date formats, including MM/DD/YYYY, YYYY-MM-DD, or January 1, 2023.
  • Product Codes: Identify specific product codes or serial numbers using predefined patterns.
  • Custom Keywords: Tag specific words or phrases relevant to your domain, such as brand names, product names, or industry-specific terms.
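The formats above can be sketched with Python's standard `re` module. The patterns below are simplified assumptions written for illustration, not the platform's actual definitions; they cover the example formats listed and will miss other valid variants (international phone numbers, for instance). The label names and `tag_regex_entities` are hypothetical.

```python
import re

# Illustrative, simplified patterns for the formats listed above.
REGEX_ENTITIES = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # (123) 456-7890, 123-456-7890, or 1234567890
    "PHONE_NUMBER": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}\b|\b\d{3}-\d{3}-\d{4}\b|\b\d{10}\b"),
    # Lazy match that stops before trailing punctuation such as a sentence-ending period.
    "URL": re.compile(r"\b(?:https?://|www\.)\S+?(?=[.,;:!?)]*(?:\s|$))"),
    "DATE": re.compile(
        r"\b\d{2}/\d{2}/\d{4}\b"  # MM/DD/YYYY
        r"|\b\d{4}-\d{2}-\d{2}\b"  # YYYY-MM-DD
        r"|\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December) \d{1,2}, \d{4}\b"  # January 1, 2023
    ),
}


def tag_regex_entities(text: str) -> list[tuple[str, str, int, int]]:
    """Return (matched_text, label, start, end) tuples sorted by position in text."""
    matches = [
        (m.group(0), label, m.start(), m.end())
        for label, pattern in REGEX_ENTITIES.items()
        for m in pattern.finditer(text)
    ]
    return sorted(matches, key=lambda item: item[2])
```

For example, `tag_regex_entities("Contact john.doe@example.com by 01/15/2023.")` tags the email address as EMAIL and the date as DATE. Each pattern runs independently, so overlapping matches are not resolved here.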

Custom Entity Tagging with Anote

Anote combines the entity types recognized by spaCy with custom entity types defined through regular expressions, so you can tag entities that pre-trained models alone would miss.
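One way to combine a spaCy pipeline's entities with custom regex matches is to turn each match into a spaCy `Span` and let `spacy.util.filter_spans` resolve overlaps. This is a sketch under stated assumptions (spaCy v3, hypothetical patterns, and a hypothetical `add_regex_entities` helper), not a description of how Anote merges results internally.

```python
import re

import spacy
from spacy.tokens import Doc
from spacy.util import filter_spans

# Hypothetical custom patterns; replace with your own labels and regexes.
CUSTOM_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE_NUMBER": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}\b|\b\d{3}-\d{3}-\d{4}\b|\b\d{10}\b"),
}


def add_regex_entities(doc: Doc, patterns: dict[str, re.Pattern] = CUSTOM_PATTERNS) -> Doc:
    """Add regex matches to doc.ents alongside the entities the pipeline already found."""
    spans = list(doc.ents)
    for label, pattern in patterns.items():
        for match in pattern.finditer(doc.text):
            # "contract" keeps only tokens fully inside the match; char_span
            # returns None when no token fits, so those matches are skipped.
            span = doc.char_span(match.start(), match.end(), label=label, alignment_mode="contract")
            if span is not None:
                spans.append(span)
    # filter_spans drops overlaps, preferring the longest span.
    doc.ents = filter_spans(spans)
    return doc
```

With a pre-trained pipeline you would call `add_regex_entities(nlp(text))`; preferring the longest span is one simple policy, and a real system might instead rank by label priority.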

Beyond the spaCy types listed earlier, the platform supports spaCy's other built-in entity types:

  • CARDINAL: Numerals that do not fall under another type
  • FAC: Buildings, airports, highways, bridges, etc.
  • LAW: Legal documents, laws, regulations, etc.
  • LOC: Non-GPE locations, mountain ranges, bodies of water, etc.
  • NORP: Nationalities, religious or political groups
  • ORDINAL: "first," "second," etc.
  • QUANTITY: Measurements, as of weight or distance
  • TIME: Times smaller than a day
  • WORK_OF_ART: Titles of books, songs, etc.

Beyond the entity types recognized by spaCy, the platform supports many additional entity types defined with regular expressions, such as:

  • PHONE_NUMBER: Phone numbers
  • SOCIAL_SECURITY_NUMBER: Social Security numbers
  • EMAIL: Email addresses
  • COLOR: Colors
  • VEHICLE: Vehicle makes, models, or registration numbers
  • MEDICAL_CONDITION: Medical conditions, diseases, or disorders
  • DRUG: Pharmaceutical drug names
  • MEASUREMENT: Measurements, such as length, weight, or temperature
  • NATURAL_EVENT: Natural disasters, geological events, etc.
  • ANIMAL: Animal species or names
  • PLANT: Plant species or names
  • DISEASE: Specific diseases or medical conditions
  • SHAPE: Geometric shapes
  • CHEMICAL_ELEMENTS: Names of chemical elements
  • SPORTS: Sports names or terms
  • FOODS: Food items, ingredients, or dishes
  • CONTINENTS: Names of continents
  • ASIAN_COUNTRIES: Asian countries
  • EUROPEAN_COUNTRIES: European countries
  • NORTH_AMERICAN_COUNTRIES: North American countries
  • SOUTH_AMERICAN_COUNTRIES: South American countries
  • AFRICAN_COUNTRIES: African countries
  • US_PRESIDENTS: Names of U.S. presidents
  • PLANETS: Names of planets in the solar system
  • RELIGION: Religious beliefs, names, or terms
  • MUSICAL_INSTRUMENT: Names of musical instruments
  • VEGETABLE: Names of vegetables
  • OCCUPATION: Job titles or professions
  • FAMILY: Family relationship terms or names
  • BODY_PART: Names of body parts
  • WEATHER: Weather conditions or terms
  • WEIGHT: Weight measurements
  • SPEED: Speed measurements
  • DATETIME: Date and time representations
  • FACILITY: Facilities, buildings, or venues
  • TITLE: Titles, such as Mr., Mrs., Dr., etc.
  • LATITUDE: Latitude coordinates
  • LONGITUDE: Longitude coordinates
  • ADDRESS: Postal addresses or street names
  • ARTIFACT: Man-made objects or artifacts
  • POPULATION: Population figures or statistics
  • MATHEMATICAL_OBJECT: Mathematical objects or terms
  • PHYSICAL_OBJECT: Physical objects or terms
  • GAME: Names of games or sports
  • TECHNOLOGY: Technological terms or product names
  • HOBBY: Hobbies or recreational activities
  • SOCIAL_EVENT: Social events or gatherings
  • BUSINESS: Business-related terms or entities
  • EDUCATION: Educational terms or institutions
  • MEDIA: Media-related terms or entities
  • HEALTH_AND_WELLNESS: Health and wellness-related terms or entities
  • HISTORICAL_EVENT: Historical events or periods
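Several of the types above, such as PLANETS or CONTINENTS, are closed sets of names. A minimal sketch of tagging such types with regular expressions, assuming each type is backed by a keyword list (an assumption; how Anote defines these types is not described here), compiles each list into one case-insensitive pattern with word boundaries. The keyword lists and helper names are hypothetical.

```python
import re

# Hypothetical, deliberately short keyword lists; real lists would be curated per type.
KEYWORD_LISTS: dict[str, list[str]] = {
    "PLANETS": ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"],
    "CONTINENTS": ["Africa", "Antarctica", "Asia", "Australia", "Europe",
                   "North America", "South America"],
}


def compile_keyword_pattern(keywords: list[str]) -> re.Pattern:
    """Build one case-insensitive regex that matches any keyword as a whole word or phrase."""
    # Try longer keywords first so a multi-word name wins over a shorter
    # keyword that is a prefix of it.
    alternatives = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def tag_keywords(text: str, keyword_lists: dict[str, list[str]] = KEYWORD_LISTS) -> list[tuple[str, str]]:
    """Return (matched_text, label) pairs in the order they appear in text."""
    found = []
    for label, keywords in keyword_lists.items():
        for match in compile_keyword_pattern(keywords).finditer(text):
            found.append((match.start(), match.group(0), label))
    return [(matched, label) for _, matched, label in sorted(found)]
```

Keyword matching ignores context: "Mercury" the chemical element would still be tagged as PLANETS. Word boundaries prevent partial hits such as "Mars" inside "Marseille".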

We recently extended our NER capabilities with more advanced entity types, such as:

  • ANATOMY: Human and animal body structures or components.
  • BOOK: Titles, authors, or topics related to literature.
  • BUILDING: Specific structures, architectural designs, or edifices.
  • CAREER: Professional pathways or fields of work.
  • CELESTIAL_BODY: Astronomical objects outside Earth's atmosphere like stars, asteroids, and comets.
  • CHARACTER: Fictional or real-life persons from literature, movies, or history.
  • COMPUTER: Computer types, brands, or related terminology.
  • CONCEPT: Philosophical, theoretical, or general ideas.
  • CONSTRUCTION_MATERIAL: Materials used for building or crafting.
  • COUNTRY: Names or terms related to nations worldwide.
  • DANCE: Dance styles, famous dancers, or related terms.
  • DOCUMENT: Types or names of official papers or files.
  • EMOTION: Human feelings or emotional states.
  • ETHNICITY: Groups based on shared cultural, ancestral, or historical ties.
  • EXPLORER: Names of individuals known for discovering new places.
  • FOLKLORE: Myths, legends, or stories passed down traditionally.
  • GAME_GENRE: Categories or styles of games.
  • GENRE: Broad categories or styles of artistic works.
  • HISTORICAL_PERIOD: Specific eras or epochs in history.
  • HUMAN_RIGHTS: Rights inherent to every human being.
  • INSTITUTION: Organizations, establishments, or corporations.
  • INVENTOR: Individuals known for creating innovative devices, methods, or concepts.
  • LANDMARK: Notable places, monuments, or sites of interest.
  • LEGEND: Tales or narratives, often rooted in history but embellished over time.
  • LITERARY_GENRE: Categories or styles of literature.
  • MATERIAL: Substances or items from which things are made.
  • MEMORY: Recollections, past events, or cognitive reflections.
  • MILITARY: Pertaining to armed forces, strategies, or related terms.
  • MODE_OF_TRANSPORTATION: Different ways or methods to move from one place to another.
  • MOVIE: Film titles, genres, actors, or related terminologies.
  • MYTH: Traditional stories, often concerning deities or heroes.
  • NATIONAL_PARK: Protected areas of scenic or historic value.
  • NEWSPAPER: Print media titles or related journalistic terms.
  • OCEAN: Large bodies of saltwater covering the Earth's surface.
  • PHOBIA: Irrational fears or aversions to specific things or situations.
  • PHOTOGRAPHY: The art or process of producing images on sensitized surfaces by the action of light.
  • PLACE: Locations, areas, or venues.
  • POLITICAL_SYSTEM: Structures or types of governmental rule.
  • PROFESSION: Occupations or fields of expertise.
  • RELATIONSHIP: Bonds or connections between individuals.
  • RITUAL: Set of actions performed mainly for their symbolic value.
  • SCIENCE: Branches of knowledge or study dealing with a body of facts or truths.
  • SCIENTIST: Individuals known for contributions in scientific fields.
  • SHOW: TV shows, theatre performances, or other entertainment forms.
  • SPACE_MISSION: Expeditions or projects related to space exploration.
  • SPHERE: Round geometrical objects or their symbolic implications.
  • TOOL: Instruments or devices used to perform specific tasks.
  • TOY: Objects for children or adults to play with.
  • TRADE: Business or commerce activities.
  • VEHICLE_PART: Components or parts of vehicles.
  • VIDEO_GAME: Electronic games played on various platforms.
  • BOARD_GAME: Games played on a flat surface using pieces.
  • ARCHITECTURE: The art or practice of designing and constructing buildings.
  • ASTRONOMICAL_OBJECT: Physical entities in outer space like galaxies, black holes.
  • ASTRONOMY: The study of celestial objects and phenomena.
  • BIOLOGY: The science of life or living organisms.
  • CHEMISTRY: The study of substances, their properties, and reactions.
  • CLIMATE: Long-term patterns or averages of weather in a particular region.
  • COMEDY: Entertainment in the form of humor or comedic acts.
  • CRIME_FICTION: Literary genre that fictionalizes crimes, detectives, and criminal underworlds.
  • DREAM: A series of thoughts, images, or emotions occurring during sleep.
  • EDIBLE_FOOD: Food items fit for consumption.
  • ELECTRONICS: Branch of technology dealing with devices operating on principles of electronics.
  • ENGINEERING: The application of science and math to design and produce.
  • FASHION: The prevalent styles and practices, especially in clothing, footwear, and accessories.
  • FICTIONAL_CHARACTER: An imaginary person represented in literature, film, or other storytelling mediums.
  • FLIGHT: The action or process of flying through the air.
  • FOLK_DANCE: A traditional dance that reflects the customs of a particular culture or region.
  • GAMBLING: The act of risking money or valuables in games or bets with the hope of gaining more.
  • GENETICS: The branch of biology that studies genes, genetic variation, and heredity in organisms.
  • GEOGRAPHY: The study of places and the relationships between people and their environments.
  • GHOST_STORY: A tale or account in which ghosts or supernatural beings play a central role.
  • HORROR: A genre of fiction intended to frighten, scare, or disgust the reader or viewer.
  • JOURNALISM: The activity or profession of reporting about, photographing, or editing news stories for one of the mass media.
  • LANGUAGE_FAMILY: A group of related languages that have evolved from a common ancestral language.
  • LAW_ENFORCEMENT: Organizations and individuals responsible for maintaining law and order, preventing and investigating crimes.
  • MAGIC_TRICK: An illusion or act of deception usually performed as entertainment.
  • MARINE_LIFE: Aquatic organisms that live in saltwater environments, including the ocean and sea.
  • MEDICAL_SPECIALTY: A specific field within medicine, focused on a particular discipline or type of patient care.
  • METAL: A type of chemical element characterized by its ability to conduct heat and electricity, typically solid, shiny, and malleable.
  • MILITARY_RANK: A hierarchical system within armed forces, denoting leadership positions and responsibilities.
  • MOUNTAIN: A large natural elevation of the Earth's surface, with steep sides and a peak.
  • MYTHICAL_CREATURE: An imaginary being or creature, often found in myths, legends, and folklore.
  • NATURAL_PHENOMENON: A naturally occurring event or situation, typically one that is impressive or unusual.
  • NONPROFIT_ORGANIZATION: An organization that operates for purposes other than making a profit, often benefiting a particular social cause or community.
  • PAINTING: An artwork created using paint on a surface such as canvas or paper.
  • PHILOSOPHY: The study of fundamental questions about existence, knowledge, values, reason, mind, and language.
  • POETRY: A form of literary expression that uses rhythmic and metaphorical language to evoke emotion and meaning.
  • POLITICAL_LEADER: An individual who holds or aspires to a leadership position within a political organization or government.
  • PSYCHOLOGY: The scientific study of behavior and mental processes.
  • RACE: A categorization of humans based on shared physical or genetic traits.
  • REPTILE: A cold-blooded vertebrate of the class Reptilia, which includes snakes, lizards, crocodiles, and turtles.
  • ROCK_BAND: A group of musicians that primarily play rock music.
  • ROMANCE: A genre of fiction focused on relationships and romantic love; also, passionate affection or desire.
  • SCIENCE_FICTION: A genre of speculative fiction that deals with imaginative concepts not found in the real world.
  • SELF-HELP: Advice and strategies given, usually through books or seminars, intended to help individuals solve personal problems or achieve personal goals.
  • SOCIAL_MEDIA: Digital platforms and websites where users can create, share, or exchange content and interact with each other.
  • SPACE: The vast, seemingly infinite expanse that exists beyond Earth, containing all celestial bodies.
  • STREET_ART: Visual art created in public locations, often unsanctioned, and executed outside traditional art venues.
  • SUPERNATURAL_POWER: Abilities or phenomena that cannot be explained by natural laws, often attributed to divine or mystical forces.
  • TOURIST_ATTRACTION: A place or event of interest where tourists visit, typically for its cultural, historical, or natural significance.
  • TRAVEL: The act of moving from one location to another, particularly over long distances.