# spaCy

**Supported Labeling Types**: `Span labeling`

spaCy provides an NLP pipeline optimized for named entity recognition (NER), dependency parsing, and tokenization. It covers a range of information extraction tasks, which makes it a practical choice for automated text processing and analysis.

<figure><img src="https://448889121-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MbjY0HseEqu7LtYAt4d%2Fuploads%2Fgit-blob-09dc1df21aa4c147c6f0848588c813a6fae4ac71%2FExtension%20-%20ML-assisted%20Labeling%20-%20Span%20labeling%20-%20spaCy%20-%20highlight.png?alt=media" alt="Image of ML Assisted with spaCy"><figcaption><p>ML Assisted with spaCy</p></figcaption></figure>

### Model Details

* Suitable for span labeling projects, which involve extracting meaningful text spans such as names, dates, and monetary values.
* Uses the [`en_core_web_lg`](https://spacy.io/models/en#en_core_web_lg) model. The "lg" (large) variant ships 300-dimensional word vectors (685,000 unique vectors; the exact count varies by model release), which support similarity comparisons between words and phrases.
* Trained on written web text such as news articles, blogs, and comments, which gives it broad coverage across different text types.
* Hosted locally within the Datasaur Intelligence container, so predictions do not depend on external services or incur the network latency of calling them.
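
The vector table size, pipeline components, and label set can be checked against whichever model release is installed. The snippet below is a minimal sketch that assumes spaCy and `en_core_web_lg` are installed in a local Python environment. The helper names `describe_pipeline` and `load_and_describe` are hypothetical and are not part of Datasaur or spaCy.

```python
from typing import Any, Dict


def describe_pipeline(nlp: Any) -> Dict[str, Any]:
    """Summarize a loaded spaCy pipeline: components, vector table, NER labels."""
    # Vectors.shape is (number of vector rows, vector dimensions).
    n_vectors, n_dims = nlp.vocab.vectors.shape
    return {
        "components": list(nlp.pipe_names),
        "unique_vectors": n_vectors,
        "vector_dims": n_dims,
        "ner_labels": sorted(nlp.get_pipe("ner").labels),
    }


def load_and_describe(model_name: str = "en_core_web_lg") -> Dict[str, Any]:
    """Load a locally installed model and describe it.

    Assumes: pip install spacy
             python -m spacy download en_core_web_lg
    """
    import spacy  # imported lazily so describe_pipeline has no hard dependency

    return describe_pipeline(spacy.load(model_name))


# Example usage (requires the model to be installed):
#   info = load_and_describe()
#   print(info["unique_vectors"], info["vector_dims"])
#   print(info["ner_labels"])
```

The reported numbers depend on the installed model version, so they are worth checking rather than assuming.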

### Usage

* `en_core_web_lg` is a pretrained large English pipeline that includes word vectors and components for named entity recognition, part-of-speech tagging, and dependency parsing.
* The `en_core_web_lg` model includes the following named entity categories:
  * `PERSON` – Individuals, including full names.
  * `NORP` – Nationalities or religious or political groups.
  * `FAC` – Facilities such as buildings, airports, highways, bridges, etc.
  * `ORG` – Organizations such as companies, agencies, institutions.
  * `GPE` – Geopolitical entities like countries, cities, and states.
  * `LOC` – Non-GPE locations such as mountain ranges and bodies of water.
  * `PRODUCT` – Objects, vehicles, foods, etc. (Not services.)
  * `EVENT` – Named events such as hurricanes, wars, sports events.
  * `WORK_OF_ART` – Titles of books, songs, paintings, etc.
  * `LAW` – Named laws, treaties, or legal documents.
  * `LANGUAGE` – Any named language.
  * `DATE` – Absolute or relative dates or periods.
  * `TIME` – Times smaller than a day.
  * `PERCENT` – Percentage values.
  * `MONEY` – Monetary values.
  * `QUANTITY` – Measurements of weight, distance, volume, etc.
  * `ORDINAL` – Ordinal numbers such as "first", "second", "third".
  * `CARDINAL` – Numerals that do not fall under other categories.
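
To see how these labels appear in practice, the sketch below runs the model on a sentence and converts the predicted entities into plain span records with character offsets. It illustrates spaCy's own `doc.ents` API under the assumption that spaCy and `en_core_web_lg` are installed locally. It is not Datasaur's internal integration, and `SpanLabel`, `ents_to_spans`, and `label_text` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Set


@dataclass(frozen=True)
class SpanLabel:
    """One labeled span (hypothetical record type for this example)."""
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # entity label such as "ORG" or "DATE"
    text: str   # the covered text


def ents_to_spans(doc: Any, keep: Optional[Set[str]] = None) -> List[SpanLabel]:
    """Convert doc.ents into SpanLabel records, optionally filtering by label."""
    return [
        SpanLabel(ent.start_char, ent.end_char, ent.label_, ent.text)
        for ent in doc.ents
        if keep is None or ent.label_ in keep
    ]


def label_text(text: str, keep: Optional[Set[str]] = None,
               model_name: str = "en_core_web_lg") -> List[SpanLabel]:
    """Run the model on text and return the predicted spans.

    Assumes: pip install spacy
             python -m spacy download en_core_web_lg
    """
    import spacy  # imported lazily so ents_to_spans has no hard dependency

    nlp = spacy.load(model_name)
    return ents_to_spans(nlp(text), keep)


# Example usage (requires the model to be installed):
#   for span in label_text(
#       "Apple paid $1 billion for a London startup on 3 March 2021.",
#       keep={"ORG", "MONEY", "GPE", "DATE"},
#   ):
#       print(span)
```

Filtering with `keep` is a simple way to restrict predictions to the labels a project actually uses. Loading the model once and reusing the `nlp` object would be preferable when processing many documents.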
