spaCy

Supported Labeling Types: Span Labeling

spaCy provides an NLP pipeline optimized for named entity recognition (NER), dependency parsing, and tokenization. It can be used for a variety of information extraction tasks, which makes it well suited to automated text processing and analysis.

Model Details

Suitable for span-based projects, which involve extracting meaningful text spans such as names, dates, and monetary values.
Uses the en_core_web_lg model. The "lg" (large) variant ships with 300-dimensional word vectors (685,000 unique vectors; the exact count depends on the model release), which support word and phrase similarity comparisons.
Trained on written web text such as news articles, blogs, and comments, which gives it coverage across a range of text types.
Hosted locally within the Datasaur Intelligence container, eliminating external dependencies and network latency.
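
Because the vector count varies between model releases, it can be useful to check the figures against the model that is actually installed. The following is a minimal sketch, assuming spaCy and en_core_web_lg are available in a local Python environment. The helper names `load_pipeline` and `vector_table_summary` are hypothetical and are not part of spaCy or Datasaur.

```python
from typing import Any, Dict


def load_pipeline(model_name: str = "en_core_web_lg") -> Any:
    """Load a spaCy pipeline by name (assumes the model package is installed)."""
    import spacy  # imported lazily so the summary helper has no hard dependency

    return spacy.load(model_name)


def vector_table_summary(nlp: Any) -> Dict[str, int]:
    """Summarize the word-vector table of a loaded pipeline.

    `nlp.vocab.vectors.shape` is a (unique vectors, dimensions) pair.
    """
    n_vectors, n_dims = nlp.vocab.vectors.shape
    return {"unique_vectors": int(n_vectors), "dimensions": int(n_dims)}
```

Calling `vector_table_summary(load_pipeline())` returns the two numbers for the installed release, so they can be compared with the values quoted above.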

Usage

en_core_web_lg is a pre-trained large English pipeline that includes word vectors and components for named entity recognition, part-of-speech tagging, and dependency parsing.
The en_core_web_lg model includes the following named entity categories:
- PERSON – People, including fictional characters.
- NORP – Nationalities, religious, or political groups.
- FAC – Facilities such as buildings, airports, highways, and bridges.
- ORG – Organizations such as companies, agencies, institutions.
- GPE – Geopolitical entities like countries, cities, and states.
- LOC – Non-GPE locations, mountain ranges, bodies of water.
- PRODUCT – Objects, vehicles, foods, etc. (Not services.)
- EVENT – Named events such as hurricanes, wars, sports events.
- WORK_OF_ART – Titles of books, songs, paintings, etc.
- LAW – Named laws, treaties, or legal documents.
- LANGUAGE – Any named language.
- DATE – Absolute or relative dates or periods.
- TIME – Specific times of day.
- PERCENT – Percentage values.
- MONEY – Monetary values.
- QUANTITY – Measurements of weight, distance, volume, etc.
- ORDINAL – First, second, third, etc.
- CARDINAL – Numerals that do not fall under other categories.
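
The entity labels listed above map directly onto span labels. As a rough illustration of how entities could be pulled out of text and narrowed to a subset of labels, here is a minimal sketch. It assumes spaCy and en_core_web_lg are installed locally. `SpanLabel`, `extract_spans`, and `keep_labels` are hypothetical names chosen for this example, and the code does not describe how Datasaur calls the model internally.

```python
from typing import Iterable, List, NamedTuple, Set


class SpanLabel(NamedTuple):
    start: int  # character offset where the span starts
    end: int    # character offset just past the end of the span
    label: str  # entity label, for example "ORG" or "MONEY"
    text: str   # the text covered by the span


def extract_spans(text: str, model_name: str = "en_core_web_lg") -> List[SpanLabel]:
    """Run the spaCy pipeline and return its named entities as SpanLabel records."""
    import spacy  # imported lazily; requires spaCy and the model package

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [
        SpanLabel(ent.start_char, ent.end_char, ent.label_, ent.text)
        for ent in doc.ents
    ]


def keep_labels(spans: Iterable[SpanLabel], allowed: Set[str]) -> List[SpanLabel]:
    """Keep only spans whose label is in `allowed`, preserving their order."""
    return [span for span in spans if span.label in allowed]
```

For example, `keep_labels(extract_spans(text), {"PERSON", "ORG"})` would return only the person and organization spans found in `text`. Which entities are detected depends on the input and on the installed model release.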
