NLTK
Last updated
Supported Labeling Types: Span Labeling
NLTK (Natural Language Toolkit) is an open-source Python library for natural language processing (NLP). It provides tools for text preprocessing such as tokenization, stemming, lemmatization, part-of-speech tagging, and more. In the context of our labeling platform, NLTK can be integrated to support various preprocessing tasks that help improve label consistency and model training quality. Its ease of use and rich set of linguistic resources make it a useful option for preparing and analyzing text data before or during the labeling process.
NLTK POS tagging is performed with nltk.pos_tag, which internally uses nltk.PerceptronTagger. This is a fast and accurate approach to part-of-speech tagging in English.
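nltk.pos_tag takes a list of tokens and returns a list of (token, tag) tuples. For span labeling, each tag also needs character offsets in the original text. The sketch below shows one way to align them. The function name tags_to_spans and the left-to-right str.find alignment are assumptions for illustration, not part of NLTK or Datasaur, and the tagged input is hand-written so the example runs without downloading NLTK models.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset exclusive, tag)


def tags_to_spans(text: str, tagged: List[Tuple[str, str]]) -> List[Span]:
    """Map (token, tag) pairs, shaped like nltk.pos_tag output, to character offsets.

    Each token is searched for left to right with str.find, so tokens must
    appear verbatim in the text.
    """
    spans: List[Span] = []
    cursor = 0
    for token, tag in tagged:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found at or after offset {cursor}")
        end = start + len(token)
        spans.append((start, end, tag))
        cursor = end  # continue searching after this token
    return spans


if __name__ == "__main__":
    text = "The quick fox jumps."
    # Hand-written (token, tag) pairs standing in for nltk.pos_tag output.
    tagged = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"), (".", ".")]
    for start, end, tag in tags_to_spans(text, tagged):
        print(start, end, tag, repr(text[start:end]))
```

One caveat for real input: nltk.word_tokenize rewrites double quotes as `` and '', so those tokens would need to be mapped back to the original characters before this kind of verbatim alignment works.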
The underlying models are trained on the Wall Street Journal section of the Penn Treebank, providing strong performance on formal, edited text common in business documents.
The tagger assigns grammatical categories to words based on the UPenn Treebank Tagset, which includes categories like nouns, verbs, adjectives, adverbs, and more.
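Because the UPenn tags are fine-grained, a workflow may want to group them into the broad categories named above (nouns, verbs, adjectives, adverbs). A minimal sketch, assuming a simple tag-prefix rule; the COARSE_PREFIXES mapping and the coarse_category name are hypothetical:

```python
# Hypothetical prefix rule grouping UPenn tags into broad classes.
COARSE_PREFIXES = {
    "NN": "noun",       # NN, NNS, NNP, NNPS
    "VB": "verb",       # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "adjective",  # JJ, JJR, JJS
    "RB": "adverb",     # RB, RBR, RBS
}


def coarse_category(tag: str) -> str:
    """Return the broad class for a UPenn tag, or "other" if no prefix matches."""
    for prefix, category in COARSE_PREFIXES.items():
        if tag.startswith(prefix):
            return category
    return "other"


if __name__ == "__main__":
    for t in ["NNPS", "VBD", "JJS", "RBR", "WRB", "DT"]:
        print(t, coarse_category(t))
```

Under this rule WRB (WH-adverb) falls into "other" because it does not start with RB. NLTK also offers nltk.pos_tag(tokens, tagset="universal") for a coarser built-in tag set, which requires the universal_tagset data package.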
The tagger is fully integrated into the Datasaur Intelligence container, so it runs consistently without requiring additional dependencies.
The tagger is a greedy averaged perceptron, as implemented by Matthew Honnibal. Implementation details are described at https://explosion.ai/blog/part-of-speech-pos-tagger-in-python
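To make "greedy averaged perceptron" concrete, the following is a simplified, hypothetical sketch of the approach described in Honnibal's post. It is not NLTK's PerceptronTagger source: the feature set is a reduced assumption, and it omits details such as NLTK's tag dictionary for frequent unambiguous words and the shuffling of training sentences between iterations. "Greedy" means each token is tagged left to right using the previous predicted tags as features; "averaged" means the final weights are the average of the weights over all training steps, which reduces overfitting to the last examples seen.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class AveragedPerceptron:
    """Multiclass perceptron whose final weights are averaged over all updates."""

    def __init__(self) -> None:
        self.weights: Dict[str, Dict[str, float]] = defaultdict(dict)  # feature -> tag -> weight
        self.classes: Set[str] = set()
        self._totals: Dict[Tuple[str, str], float] = defaultdict(float)  # accumulated weight mass
        self._tstamps: Dict[Tuple[str, str], int] = defaultdict(int)  # step of last change
        self.i = 0  # number of training instances seen

    def predict(self, features: List[str]) -> str:
        scores: Dict[str, float] = defaultdict(float)
        for feat in features:
            for label, weight in self.weights.get(feat, {}).items():
                scores[label] += weight
        # Break ties by tag name so predictions are deterministic.
        return max(self.classes, key=lambda label: (scores[label], label))

    def update(self, truth: str, guess: str, features: List[str]) -> None:
        self.i += 1
        if truth == guess:
            return
        for feat in features:
            for label, delta in ((truth, 1.0), (guess, -1.0)):
                param = (feat, label)
                weight = self.weights[feat].get(label, 0.0)
                # Lazily credit the old weight for every step it stayed unchanged.
                self._totals[param] += (self.i - self._tstamps[param]) * weight
                self._tstamps[param] = self.i
                self.weights[feat][label] = weight + delta

    def average_weights(self) -> None:
        if self.i == 0:
            return
        for feat, label_weights in self.weights.items():
            for label, weight in label_weights.items():
                param = (feat, label)
                total = self._totals[param] + (self.i - self._tstamps[param]) * weight
                label_weights[label] = total / self.i


def features(i: int, words: List[str], prev: str, prev2: str) -> List[str]:
    """A reduced, hypothetical feature set: word shape, context words, previous tags."""
    w = words[i].lower()
    prev_word = words[i - 1].lower() if i > 0 else "-START-"
    next_word = words[i + 1].lower() if i + 1 < len(words) else "-END-"
    return [
        "bias", "word=" + w, "suffix3=" + w[-3:], "prefix1=" + w[:1],
        "prev_tag=" + prev, "prev2_tag=" + prev2, "prev_tag+word=" + prev + "+" + w,
        "prev_word=" + prev_word, "next_word=" + next_word,
    ]


def tag(model: AveragedPerceptron, words: List[str]) -> List[Tuple[str, str]]:
    """Greedy left-to-right decoding: each decision feeds the next one's features."""
    prev, prev2 = "-START-", "-START2-"
    tagged = []
    for i, word in enumerate(words):
        guess = model.predict(features(i, words, prev, prev2))
        tagged.append((word, guess))
        prev2, prev = prev, guess
    return tagged


def train(sentences: List[Tuple[List[str], List[str]]], n_iter: int = 5) -> AveragedPerceptron:
    model = AveragedPerceptron()
    for _, tags in sentences:
        model.classes.update(tags)
    for _ in range(n_iter):
        for words, tags in sentences:
            prev, prev2 = "-START-", "-START2-"
            for i in range(len(words)):
                feats = features(i, words, prev, prev2)
                guess = model.predict(feats)
                model.update(tags[i], guess, feats)
                # Condition on the model's own guesses, as it will at tagging time.
                prev2, prev = prev, guess
    model.average_weights()
    return model


if __name__ == "__main__":
    toy = [(["the", "dog", "barks"], ["DT", "NN", "VBZ"]),
           (["a", "cat", "sleeps"], ["DT", "NN", "VBZ"])]
    model = train(toy, n_iter=10)
    print(tag(model, ["the", "cat", "sleeps"]))
```

The lazy _totals/_tstamps bookkeeping avoids summing every weight after every instance: a weight's contribution is only added up when it changes, and once more at averaging time.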
Typical uses include text preprocessing that improves label consistency and model training, and syntactic analysis in annotation workflows.
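As one example of syntactic analysis in an annotation workflow, POS tags can be used to propose noun-phrase spans for annotators to review. The sketch below assumes a simple hypothetical pattern (an optional DT, any number of JJ* tags, then one or more NN* tags); the function name is illustrative and not part of NLTK or Datasaur. For grammar-based chunking, NLTK itself provides nltk.RegexpParser.

```python
from typing import List, Tuple


def noun_phrase_candidates(tagged: List[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return (start, end) token ranges (end exclusive) matching DT? JJ* NN+."""
    spans: List[Tuple[int, int]] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):  # adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:
            spans.append((i, k))
            i = k
        else:
            i += 1  # no noun here; try starting at the next token
    return spans


if __name__ == "__main__":
    tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
              ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
              ("dog", "NN"), (".", ".")]
    for start, end in noun_phrase_candidates(tagged):
        print(" ".join(tok for tok, _ in tagged[start:end]))
```

Token ranges like these can be combined with character offsets from the tokenized text to create pre-annotated spans.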
Tag set: UPenn Treebank tag set
The full list of tags, with descriptions and examples:
$: dollar, e.g. $, -$, --$, A$, C$, HK$, M$, NZ$, S$, U.S.$, US$
'': closing quotation mark, e.g. ', ''
(: opening parenthesis, e.g. (, [, {
): closing parenthesis, e.g. ), ], }
,: comma, e.g. ,
--: dash, e.g. --
.: sentence terminator, e.g. ., !, ?
:: colon or ellipsis, e.g. :, ;, ...
``: opening quotation mark, e.g. `, ``
CC: conjunction, coordinating, e.g. &, 'n, and, both, but, either, et, for, less, minus, neither, nor, or, plus, so, therefore, times, v., versus, vs., whether, yet
CD: numeral, cardinal, e.g. mid-1890, nine-thirty, forty-two, one-tenth, ten, million, 0.5, one, forty-, seven, 1987, twenty, '79, zero, two, 78-degrees, eighty-four, IX, '60s, .025, fifteen, 271,124, dozen, quintillion, DM2,000, ...
DT: determiner, e.g. all, an, another, any, both, del, each, either, every, half, la, many, much, nary, neither, no, some, such, that, the, them, these, this, those
EX: existential there, e.g. there
FW: foreign word, e.g. gemeinschaft, hund, ich, jeux, habeas, Haementeria, Herr, K'ang-si, vous, lutihaw, alai, je, jour, objets, salutaris, fille, quibusdam, pas, trop, Monte, terram, fiche, oui, corporis, ...
IN: preposition or conjunction, subordinating, e.g. astride, among, uppon, whether, out, inside, pro, despite, on, by, throughout, below, within, for, towards, near, behind, atop, around, if, like, until, below, next, into, if, beside, ...
JJ: adjective or numeral, ordinal, e.g. third, ill-mannered, pre-war, regrettable, oiled, calamitous, first, separable, ectoplasmic, battery-powered, participatory, fourth, still-to-be-named, multilingual, multi-disciplinary, ...
JJR: adjective, comparative, e.g. bleaker, braver, breezier, briefer, brighter, brisker, broader, bumper, busier, calmer, cheaper, choosier, cleaner, clearer, closer, colder, commoner, costlier, cozier, creamier, crunchier, cuter, ...
JJS: adjective, superlative, e.g. calmest, cheapest, choicest, classiest, cleanest, clearest, closest, commonest, corniest, costliest, crassest, creepiest, crudest, cutest, darkest, deadliest, dearest, deepest, densest, dinkiest, ...
LS: list item marker, e.g. A, A., B, B., C, C., D, E, F, First, G, H, I, J, K, One, SP-44001, SP-44002, SP-44005, SP-44007, Second, Third, Three, Two, *, a, b, c, d, first, five, four, one, six, three, two
MD: modal auxiliary, e.g. can, cannot, could, couldn't, dare, may, might, must, need, ought, shall, should, shouldn't, will, would
NN: noun, common, singular or mass, e.g. common-carrier, cabbage, knuckle-duster, Casino, afghan, shed, thermostat, investment, slide, humour, falloff, slick, wind, hyena, override, subhumanity, machinist, ...
NNP: noun, proper, singular, e.g. Motown, Venneboerger, Czestochwa, Ranzer, Conchita, Trumplane, Christos, Oceanside, Escobar, Kreisler, Sawyer, Cougar, Yvette, Ervin, ODI, Darryl, CTCA, Shannon, A.K.C., Meltex, Liverpool, ...
NNPS: noun, proper, plural, e.g. Americans, Americas, Amharas, Amityvilles, Amusements, Anarcho-Syndicalists, Andalusians, Andes, Andruses, Angels, Animals, Anthony, Antilles, Antiques, Apache, Apaches, Apocrypha, ...
NNS: noun, common, plural, e.g. undergraduates, scotches, bric-a-brac, products, bodyguards, facets, coasts, divestitures, storehouses, designs, clubs, fragrances, averages, subjectivists, apprehensions, muses, factory-jobs, ...
PDT: pre-determiner, e.g. all, both, half, many, quite, such, sure, this
POS: genitive marker, e.g. ', 's
PRP: pronoun, personal, e.g. hers, herself, him, himself, hisself, it, itself, me, myself, one, oneself, ours, ourselves, ownself, self, she, thee, theirs, them, themselves, they, thou, thy, us
PRP$: pronoun, possessive, e.g. her, his, mine, my, our, ours, their, thy, your
RB: adverb, e.g. occasionally, unabatingly, maddeningly, adventurously, professedly, stirringly, prominently, technologically, magisterially, predominately, swiftly, fiscally, pitilessly, ...
RBR: adverb, comparative, e.g. further, gloomier, grander, graver, greater, grimmer, harder, harsher, healthier, heavier, higher, however, larger, later, leaner, lengthier, less-, perfectly, lesser, lonelier, longer, louder, lower, more, ...
RBS: adverb, superlative, e.g. best, biggest, bluntest, earliest, farthest, first, furthest, hardest, heartiest, highest, largest, least, less, most, nearest, second, tightest, worst
RP: particle, e.g. aboard, about, across, along, apart, around, aside, at, away, back, before, behind, by, crop, down, ever, fast, for, forth, from, go, high, i.e., in, into, just, later, low, more, off, on, open, out, over, per, pie, raising, start, teeth, that, through, under, unto, up, up-pp, upon, whole, with, you
SYM: symbol, e.g. %, &, ', '', ''., ), )., *, +, ,., <, =, >, @, A[fj], U.S, U.S.S.R, *, **, ***
TO: "to" as preposition or infinitive marker, e.g. to
UH: interjection, e.g. Goodbye, Goody, Gosh, Wow, Jeepers, Jee-sus, Hubba, Hey, Kee-reist, Oops, amen, huh, howdy, uh, dammit, whammo, shucks, heck, anyways, whodunnit, honey, golly, man, baby, diddle, hush, sonuvabitch, ...
VB: verb, base form, e.g. ask, assemble, assess, assign, assume, atone, attention, avoid, bake, balkanize, bank, begin, behold, believe, bend, benefit, bevel, beware, bless, boil, bomb, boost, brace, break, bring, broil, brush, build, ...
VBD: verb, past tense, e.g. dipped, pleaded, swiped, regummed, soaked, tidied, convened, halted, registered, cushioned, exacted, snubbed, strode, aimed, adopted, belied, figgered, speculated, wore, appreciated, contemplated, ...
VBG: verb, present participle or gerund, e.g. telegraphing, stirring, focusing, angering, judging, stalling, lactating, hankerin', alleging, veering, capping, approaching, traveling, besieging, encrypting, interrupting, erasing, wincing, ...
VBN: verb, past participle, e.g. multihulled, dilapidated, aerosolized, chaired, languished, panelized, used, experimented, flourished, imitated, reunifed, factored, condensed, sheared, unsettled, primed, dubbed, desired, ...
VBP: verb, present tense, not 3rd person singular, e.g. predominate, wrap, resort, sue, twist, spill, cure, lengthen, brush, terminate, appear, tend, stray, glisten, obtain, comprise, detest, tease, attract, emphasize, mold, postpone, sever, return, wag, ...
VBZ: verb, present tense, 3rd person singular, e.g. bases, reconstructs, marks, mixes, displeases, seals, carps, weaves, snatches, slumps, stretches, authorizes, smolders, pictures, emerges, stockpiles, seduces, fizzes, uses, bolsters, slaps, speaks, pleads, ...
WDT: WH-determiner, e.g. that, what, whatever, which, whichever
WP: WH-pronoun, e.g. that, what, whatever, whatsoever, which, who, whom, whosoever
WP$: WH-pronoun, possessive, e.g. whose
WRB: WH-adverb, e.g. how, however, whence, whenever, where, whereby, whereever, wherein, whereof, why
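When labels are imported or edited by hand, it can help to check that every tag belongs to the UPenn tag set as NLTK documents it (nltk.help.upenn_tagset()). A minimal sketch; the UPENN_TAGS constant and invalid_tags function are hypothetical helpers, and treating these 45 tags as the complete inventory is an assumption, so verify against the tagger's actual output before relying on it.

```python
from typing import Iterable, List, Tuple

# UPenn Treebank tags as documented by nltk.help.upenn_tagset().
UPENN_TAGS = frozenset({
    "$", "''", "(", ")", ",", "--", ".", ":", "``",
    "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD",
    "NN", "NNP", "NNPS", "NNS", "PDT", "POS", "PRP", "PRP$",
    "RB", "RBR", "RBS", "RP", "SYM", "TO", "UH",
    "VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "WDT", "WP", "WP$", "WRB",
})


def invalid_tags(tagged: Iterable[Tuple[str, str]]) -> List[Tuple[int, str, str]]:
    """Return (index, token, tag) for every pair whose tag is not a UPenn tag."""
    return [(i, tok, tag) for i, (tok, tag) in enumerate(tagged) if tag not in UPENN_TAGS]


if __name__ == "__main__":
    labels = [("The", "DT"), ("dog", "NNX"), ("barks", "VBZ")]  # "NNX" is a deliberate typo
    print(invalid_tags(labels))
```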
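UPenn Treebank documentation: https://catalog.ldc.upenn.edu/docs/LDC99T42/
To print the tag set with descriptions and examples from a local NLTK installation (the tag set help data must first be downloaded with nltk.download()), run:
python -c "import nltk; nltk.help.upenn_tagset()"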