NLTK

Supported Labeling Types: Span Labeling

NLTK (Natural Language Toolkit) is an open-source Python library for natural language processing (NLP). It provides tools for text preprocessing such as tokenization, stemming, lemmatization, part-of-speech tagging, and more. In the context of our labeling platform, NLTK can be integrated to support various preprocessing tasks that help improve label consistency and model training quality. Its ease of use and rich set of linguistic resources make it a useful option for preparing and analyzing text data before or during the labeling process.

Figure: ML Assisted with NLTK

Model Details

  • POS tagging is performed with nltk.pos_tag, which internally uses nltk.PerceptronTagger, a fast and accurate part-of-speech tagger for English.

  • The underlying models are trained on the Wall Street Journal section of the Penn Treebank, providing strong performance on formal, edited text common in business documents.

  • The tagger assigns grammatical categories to words based on the UPenn Treebank Tagset, which includes categories like nouns, verbs, adjectives, adverbs, and more.

  • The tagger is bundled in the Datasaur Intelligence container, so it runs consistently and needs no additional dependencies to be installed.

The tagger is a greedy averaged perceptron, as implemented by Matthew Honnibal. For implementation details, see https://explosion.ai/blog/part-of-speech-pos-tagger-in-python
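
As a rough illustration of the tagging step described above, the sketch below converts the (token, tag) pairs returned by nltk.pos_tag into character-offset spans suitable for span labeling. The helper names tagged_to_spans and tag_text, the span dictionary layout, and the use of nltk.word_tokenize for tokenization are assumptions made for this example; they are not taken from the Datasaur integration. The NLTK import is deferred so that the offset logic can be tried without NLTK installed.

```python
from typing import Dict, List, Tuple


def tagged_to_spans(text: str, tagged: List[Tuple[str, str]]) -> List[Dict]:
    """Map (token, tag) pairs back to character offsets in `text`.

    Tokens that cannot be found verbatim (for example, quotation marks
    rewritten by a tokenizer) are skipped rather than guessed.
    """
    spans = []
    cursor = 0  # search position; keeps repeated tokens in order
    for token, tag in tagged:
        start = text.find(token, cursor)
        if start == -1:
            continue  # token text was altered by the tokenizer; skip it
        end = start + len(token)
        spans.append({"start": start, "end": end, "text": token, "label": tag})
        cursor = end
    return spans


def tag_text(text: str) -> List[Dict]:
    """Tag `text` with nltk.pos_tag and return span labels.

    Requires NLTK plus its tokenizer and tagger resources, which are
    fetched with nltk.download (resource names vary by NLTK version).
    """
    import nltk  # deferred import: only needed when tagging is requested

    tokens = nltk.word_tokenize(text)
    return tagged_to_spans(text, nltk.pos_tag(tokens))
```

Searching forward from a moving cursor keeps repeated words aligned with their own occurrence, at the cost of dropping tokens whose surface form the tokenizer has changed.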

Usage

  • Text preprocessing for consistency and model training improvements.

  • Supports syntactic analysis in annotation workflows.

  • Tag set: UPenn Treebank Tagset

    • The full tag list is given in the Appendix.
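
One way the tags can support syntactic analysis is by grouping the fine-grained Treebank tags into coarse families before filtering or counting. The sketch below does this by tag prefix. The grouping, the family names, and the function names are hypothetical choices for illustration and are not part of NLTK or the platform.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical coarse grouping by tag prefix (e.g. NN, NNS, NNP, NNPS -> noun).
COARSE_PREFIXES = (
    ("NN", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
)


def coarse_category(tag: str) -> str:
    """Return the coarse family for a Treebank tag, or "other"."""
    for prefix, name in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return name
    return "other"


def category_counts(tagged: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many tokens fall into each coarse family."""
    return Counter(coarse_category(tag) for _, tag in tagged)


def keep_category(tagged: Iterable[Tuple[str, str]], wanted: str) -> List[str]:
    """Return the tokens whose tag belongs to the `wanted` family."""
    return [token for token, tag in tagged if coarse_category(tag) == wanted]
```

For example, keep_category(tagged, "noun") keeps tokens tagged NN, NNS, NNP, or NNPS, which can be a quick way to review candidate entity mentions.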

Appendix

NLTK Treebank

$

dollar e.g. $, -$, --$, A$, C$, HK$, M$, NZ$, S$, U.S.$, US$

''

closing quotation mark e.g. ', ''

(

opening parenthesis e.g. (, [, {

,

comma e.g. ,

--

dash e.g. --

.

sentence terminator e.g. ., !, ?

:

colon or ellipsis e.g. :, ;, ...

``

opening quotation mark e.g. `, ``

Treebank Tagset

Tag
Description

CC

conjunction, coordinating e.g. &, 'n, and, both, but, either, et, for, less, minus, neither, nor, or, plus, so, therefore, times, v., versus, vs., whether, yet

CD

numeral, cardinal e.g. mid-1890, nine-thirty, forty-two, one-tenth, ten, million, 0.5, one, forty-seven, 1987, twenty, '79, zero, two, 78-degrees, eighty-four, IX, '60s, .025, fifteen, 271,124, dozen, quintillion, DM2,000, ...

DT

determiner e.g. all, an, another, any, both, del, each, either, every, half, la, many, much, nary, neither, no, some, such, that, the, them, these, this, those

EX

existential there e.g. there

FW

foreign word e.g. gemeinschaft, hund, ich, jeux, habeas, Haementeria, Herr, K'ang-si, vous, lutihaw, alai, je, jour, objets, salutaris, fille, quibusdam, pas, trop, Monte, terram, fiche, oui, corporis, ...

IN

preposition or conjunction, subordinating e.g. astride, among, uppon, whether, out, inside, pro, despite, on, by, throughout, below, within, for, towards, near, behind, atop, around, if, like, until, below, next, into, if, beside, ...

JJ

adjective or numeral, ordinal e.g. third, ill-mannered, pre-war, regrettable, oiled, calamitous, first, separable, ectoplasmic, battery-powered, participatory, fourth, still-to-be-named, multilingual, multi-disciplinary, ...

JJR

adjective, comparative e.g. bleaker, braver, breezier, briefer, brighter, brisker, broader, bumper, busier, calmer, cheaper, choosier, cleaner, clearer, closer, colder, commoner, costlier, cozier, creamier, crunchier, cuter, ...

JJS

adjective, superlative e.g. calmest, cheapest, choicest, classiest, cleanest, clearest, closest, commonest, corniest, costliest, crassest, creepiest, crudest, cutest, darkest, deadliest, dearest, deepest, densest, dinkiest, ...

LS

list item marker e.g. A, A., B, B., C, C., D, E, F, First, G, H, I, J, K, One, SP-44001, SP-44002, SP-44005, SP-44007, Second, Third, Three, Two, *, a, b, c, d, first, five, four, one, six, three, two

MD

modal auxiliary e.g. can, cannot, could, couldn't, dare, may, might, must, need, ought, shall, should, shouldn't, will, would

NN

noun, common, singular or mass e.g. common-carrier, cabbage, knuckle-duster, Casino, afghan, shed, thermostat, investment, slide, humour, falloff, slick, wind, hyena, override, subhumanity, machinist, ...

NNP

noun, proper, singular e.g. Motown, Venneboerger, Czestochwa, Ranzer, Conchita, Trumplane, Christos, Oceanside, Escobar, Kreisler, Sawyer, Cougar, Yvette, Ervin, ODI, Darryl, CTCA, Shannon, A.K.C., Meltex, Liverpool, ...

NNPS

noun, proper, plural e.g. Americans, Americas, Amharas, Amityvilles, Amusements, Anarcho-Syndicalists, Andalusians, Andes, Andruses, Angels, Animals, Anthony, Antilles, Antiques, Apache, Apaches, Apocrypha, ...

NNS

noun, common, plural e.g. undergraduates, scotches, bric-a-brac, products, bodyguards, facets, coasts, divestitures, storehouses, designs, clubs, fragrances, averages, subjectivists, apprehensions, muses, factory-jobs, ...

PDT

pre-determiner e.g. all, both, half, many, quite, such, sure, this

POS

genitive marker e.g. ', 's

PRP

pronoun, personal e.g. hers, herself, him, himself, hisself, it, itself, me, myself, one, oneself, ours, ourselves, ownself, self, she, thee, theirs, them, themselves, they, thou, thy, us

PRP$

pronoun, possessive e.g. her, his, mine, my, our, ours, their, thy, your

RB

adverb e.g. occasionally, unabatingly, maddeningly, adventurously, professedly, stirringly, prominently, technologically, magisterially, predominately, swiftly, fiscally, pitilessly, ...

RBR

adverb, comparative e.g. further, gloomier, grander, graver, greater, grimmer, harder, harsher, healthier, heavier, higher, however, larger, later, leaner, lengthier, less-perfectly, lesser, lonelier, longer, louder, lower, more, ...

RBS

adverb, superlative e.g. best, biggest, bluntest, earliest, farthest, first, furthest, hardest, heartiest, highest, largest, least, less, most, nearest, second, tightest, worst

RP

particle e.g. aboard, about, across, along, apart, around, aside, at, away, back, before, behind, by, crop, down, ever, fast, for, forth, from, go, high, i.e., in, into, just, later, low, more, off, on, open, out, over, per, pie, raising, start, teeth, that, through, under, unto, up, up-pp, upon, whole, with, you

SYM

symbol e.g. %, &, ', '', ''., ), )., *, +, ,., <, =, >, @, A[fj], U.S, U.S.S.R, *, **, ***

TO

"to" as preposition or infinitive marker e.g. to

UH

interjection e.g. Goodbye, Goody, Gosh, Wow, Jeepers, Jee-sus, Hubba, Hey, Kee-reist, Oops, amen, huh, howdy, uh, dammit, whammo, shucks, heck, anyways, whodunnit, honey, golly, man, baby, diddle, hush, sonuvabitch, ...

VB

verb, base form e.g. ask, assemble, assess, assign, assume, atone, attention, avoid, bake, balkanize, bank, begin, behold, believe, bend, benefit, bevel, beware, bless, boil, bomb, boost, brace, break, bring, broil, brush, build, ...

VBD

verb, past tense e.g. dipped, pleaded, swiped, regummed, soaked, tidied, convened, halted, registered, cushioned, exacted, snubbed, strode, aimed, adopted, belied, speculated, wore, appreciated, contemplated, ...

VBG

verb, present participle or gerund e.g. telegraphing, stirring, focusing, angering, judging, stalling, lactating, hankerin', alleging, veering, capping, approaching, traveling, besieging, encrypting, interrupting, erasing, wincing, ...

VBN

verb, past participle e.g. multihulled, dilapidated, aerosolized, chaired, languished, panelized, used, experimented, flourished, imitated, reunifed, factored, condensed, sheared, unsettled, primed, dubbed, desired, ...

VBP

verb, present tense, not 3rd person singular e.g. predominate, wrap, resort, sue, twist, spill, cure, lengthen, brush, terminate, appear, tend, stray, glisten, obtain, comprise, detest, tease, attract, emphasize, mold, postpone, sever, return, wag, ...

VBZ

verb, present tense, 3rd person singular e.g. bases, reconstructs, marks, mixes, displeases, seals, carps, weaves, snatches, slumps, stretches, authorizes, smolders, pictures, emerges, stockpiles, seduces, fizzes, uses, bolsters, slaps, speaks, pleads, ...

WDT

WH-determiner e.g. that, what, whatever, which, whichever

WP

WH-pronoun e.g. that, what, whatever, whatsoever, which, who, whom, whosoever

WP$

WH-pronoun, possessive e.g. whose

WRB

WH-adverb e.g. how, however, whence, whenever, where, whereby, whereever, wherein, whereof, why
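
The tag list above can also serve as a simple sanity check on tagger output or imported labels. The sketch below collects the tags from this appendix into sets and reports any label that is not among them. The constant and function names are hypothetical, and the sets mirror only the tags listed in this appendix, so a tagger may emit a few tags (such as a closing parenthesis tag) that are flagged here.

```python
from typing import Iterable, List, Tuple

# Word-level tags as listed in the Treebank Tagset table of this appendix.
WORD_TAGS = frozenset(
    "CC CD DT EX FW IN JJ JJR JJS LS MD NN NNP NNPS NNS PDT POS PRP PRP$ "
    "RB RBR RBS RP SYM TO UH VB VBD VBG VBN VBP VBZ WDT WP WP$ WRB".split()
)

# Punctuation and symbol tags as listed at the start of this appendix.
PUNCT_TAGS = frozenset(["$", "''", "(", ",", "--", ".", ":", "``"])

KNOWN_TAGS = WORD_TAGS | PUNCT_TAGS


def unknown_tags(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the sorted, de-duplicated tags that are not in KNOWN_TAGS."""
    return sorted({tag for _, tag in tagged if tag not in KNOWN_TAGS})
```

An empty result means every label matches a tag documented here; a non-empty result is a prompt to inspect the labels, not proof of an error.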
