SparkNLP NER

Supported Labeling Types: Span Labeling

SparkNLP Named Entity Recognition (NER) is a high-performance, scalable NLP component built on Apache Spark. It supports deep learning-based NER models that can identify and classify entities such as names, locations, organizations, and dates in large volumes of text. SparkNLP NER enables fast and accurate entity suggestions, making it ideal for projects with large datasets or real-time processing needs. It also supports custom model training and multilingual NER, offering flexibility for various labeling tasks.

ML Assisted with SparkNLP NER

Model Details

  • SparkNLP provides a deep learning-based NER system via johnsnowlabs/nlp_server.

  • The pre-trained en.ner model is designed for entity recognition tasks.

  • Models are trained on diverse sources including CoNLL 2003 (Reuters news), OntoNotes 5.0, and proprietary datasets curated by John Snow Labs.

  • The model runs as a service accessible within the Datasaur Intelligence container.

Usage

  • SparkNLP NER is well suited to complex linguistic analysis and tasks requiring detailed syntactic structures.

  • Tag set: LOC (location), ORG (organization), PER (person), MISC (miscellaneous).
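
To illustrate how token-level predictions over this tag set can become span labels, here is a minimal sketch. It assumes the model emits one BIO-style tag per token (for example B-PER, I-PER, O), which is the usual convention for CoNLL-style NER but is not confirmed by this page. The helper name bio_to_spans and the (text, start, end) token format are hypothetical and are not part of SparkNLP or Datasaur.

```python
from typing import List, Tuple

# Tag set listed in this section.
TAG_SET = {"LOC", "ORG", "PER", "MISC"}

Token = Tuple[str, int, int]  # (text, start offset, end offset) -- assumed format
Span = Tuple[str, int, int]   # (label, start offset, end offset)


def bio_to_spans(tokens: List[Token], tags: List[str]) -> List[Span]:
    """Merge per-token BIO tags into labeled character spans (hypothetical helper)."""
    spans: List[Span] = []
    current = None  # [label, start, end] of the span being built

    for (_text, start, end), tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix in ("B", "I") and label in TAG_SET:
            # An I- tag with the same label extends the open span.
            if prefix == "I" and current is not None and current[0] == label:
                current[2] = end
                continue
            # Otherwise close any open span and start a new one.
            if current is not None:
                spans.append((current[0], current[1], current[2]))
            current = [label, start, end]
        else:
            # "O" or an unknown label closes the open span.
            if current is not None:
                spans.append((current[0], current[1], current[2]))
                current = None

    if current is not None:
        spans.append((current[0], current[1], current[2]))
    return spans


# Example input: "Angela Merkel visited Paris"
tokens = [("Angela", 0, 6), ("Merkel", 7, 13), ("visited", 14, 21), ("Paris", 22, 27)]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
spans = bio_to_spans(tokens, tags)
print(spans)  # [('PER', 0, 13), ('LOC', 22, 27)]
```

Each resulting tuple corresponds to one suggested span label. In this sketch, an I- tag that does not continue an open span of the same label starts a new span rather than being discarded, which is a lenient design choice and not necessarily how the service behaves.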
