Last updated
Supported Labeling Types: Span Labeling
NLTK (Natural Language Toolkit) is an open-source Python library for natural language processing (NLP). It provides tools for text preprocessing such as tokenization, stemming, lemmatization, part-of-speech tagging, and more. In the context of our labeling platform, NLTK can be integrated to support various preprocessing tasks that help improve label consistency and model training quality. Its ease of use and rich set of linguistic resources make it a useful option for preparing and analyzing text data before or during the labeling process.
NLTK POS tagging is performed using nltk.pos_tag, which internally uses the nltk.PerceptronTagger. This is a fast and accurate approach for part-of-speech tagging in English.
The underlying models are trained on the Wall Street Journal section of the Penn Treebank, providing strong performance on formal, edited text common in business documents.
The tagger assigns grammatical categories to words based on the UPenn Treebank Tagset, which includes categories like nouns, verbs, adjectives, adverbs, and more.
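The call described above can be sketched as follows. This is a minimal illustration, not the platform's actual integration code: the helper names `tag_tokens` and `words_with_tag_prefix` and the optional `tagger` parameter are assumptions introduced here. Only `nltk.pos_tag` itself comes from NLTK, and it requires the perceptron tagger model data to be available locally.

```python
from typing import Callable, List, Optional, Tuple

TaggedToken = Tuple[str, str]  # (token, Penn Treebank tag)


def tag_tokens(
    tokens: List[str],
    tagger: Optional[Callable[[List[str]], List[TaggedToken]]] = None,
) -> List[TaggedToken]:
    """Return (token, tag) pairs for a pre-tokenized English sentence.

    `tagger` is a hypothetical injection point for testing; by default
    the function falls back to nltk.pos_tag.
    """
    if tagger is None:
        # Imported lazily: needs NLTK installed plus its tagger model data.
        import nltk
        tagger = nltk.pos_tag
    return tagger(tokens)


def words_with_tag_prefix(tagged: List[TaggedToken], prefix: str) -> List[str]:
    """Keep the tokens whose tag starts with `prefix` (e.g. "NN" for nouns)."""
    return [token for token, tag in tagged if tag.startswith(prefix)]
```

With NLTK and its model data installed, `tag_tokens(["The", "invoice", "was", "paid"])` returns a list of `(token, tag)` tuples whose tags come from the UPenn Treebank tag set. The name of the model resource to download may differ between NLTK versions.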
Fully integrated into the Datasaur Intelligence container for consistent, dependency-free operation.
Greedy Averaged Perceptron tagger, as implemented by Matthew Honnibal. See more implementation details here:
Text preprocessing for consistency and model training improvements.
Supports syntactic analysis in annotation workflows.
Tag set: UPenn Treebank Tag Set
The detailed tag list is available here
$: dollar e.g. $, -$, --$, A$, C$, HK$, M$, NZ$, S$, U.S.$, US$
'': closing quotation mark e.g. ', ''
(: opening parenthesis e.g. (, [, {
,: comma e.g. ,
--: dash e.g. --
.: sentence terminator e.g. ., !, ?
:: colon or ellipsis e.g. :, ;, ...
``: opening quotation mark e.g. `, ``
CC: conjunction, coordinating e.g. &, 'n, and, both, but, either, et, for, less, minus, neither, nor, or, plus, so, therefore, times, v., versus, vs., whether, yet
CD: numeral, cardinal e.g. mid-1890, nine-thirty, forty-two, one-tenth, ten, million, 0.5, one, forty-seven, 1987, twenty, '79, zero, two, 78-degrees, eighty-four, IX, '60s, .025, fifteen, 271,124, dozen, quintillion, DM2,000, ...
DT: determiner e.g. all, an, another, any, both, del, each, either, every, half, la, many, much, nary, neither, no, some, such, that, the, them, these, this, those
EX: existential there e.g. there
FW: foreign word e.g. gemeinschaft, hund, ich, jeux, habeas, Haementeria, Herr, K'ang-si, vous, lutihaw, alai, je, jour, objets, salutaris, fille, quibusdam, pas, trop, Monte, terram, fiche, oui, corporis, ...
IN: preposition or conjunction, subordinating e.g. astride, among, uppon, whether, out, inside, pro, despite, on, by, throughout, below, within, for, towards, near, behind, atop, around, if, like, until, below, next, into, if, beside, ...
JJ: adjective or numeral, ordinal e.g. third, ill-mannered, pre-war, regrettable, oiled, calamitous, first, separable, ectoplasmic, battery-powered, participatory, fourth, still-to-be-named, multilingual, multi-disciplinary, ...
JJR: adjective, comparative e.g. bleaker, braver, breezier, briefer, brighter, brisker, broader, bumper, busier, calmer, cheaper, choosier, cleaner, clearer, closer, colder, commoner, costlier, cozier, creamier, crunchier, cuter, ...
JJS: adjective, superlative e.g. calmest, cheapest, choicest, classiest, cleanest, clearest, closest, commonest, corniest, costliest, crassest, creepiest, crudest, cutest, darkest, deadliest, dearest, deepest, densest, dinkiest, ...
LS: list item marker e.g. A, A., B, B., C, C., D, E, F, First, G, H, I, J, K, One, SP-44001, SP-44002, SP-44005, SP-44007, Second, Third, Three, Two, *, a, b, c, d, first, five, four, one, six, three, two
MD: modal auxiliary e.g. can, cannot, could, couldn't, dare, may, might, must, need, ought, shall, should, shouldn't, will, would
NN: noun, common, singular or mass e.g. common-carrier, cabbage, knuckle-duster, Casino, afghan, shed, thermostat, investment, slide, humour, falloff, slick, wind, hyena, override, subhumanity, machinist, ...
NNP: noun, proper, singular e.g. Motown, Venneboerger, Czestochwa, Ranzer, Conchita, Trumplane, Christos, Oceanside, Escobar, Kreisler, Sawyer, Cougar, Yvette, Ervin, Darryl, Shannon, A.K.C., Meltex, Liverpool, ...
NNPS: noun, proper, plural e.g. Americans, Americas, Amharas, Amityvilles, Amusements, Anarcho-Syndicalists, Andalusians, Andes, Andruses, Angels, Animals, Anthony, Antilles, Antiques, Apache, Apaches, Apocrypha, ...
NNS: noun, common, plural e.g. undergraduates, scotches, bric-a-brac, products, bodyguards, facets, coasts, divestitures, storehouses, designs, clubs, fragrances, averages, subjectivists, apprehensions, muses, factory-jobs, ...
PDT: pre-determiner e.g. all, both, half, many, quite, such, sure, this
POS: genitive marker e.g. ', 's
PRP: pronoun, personal e.g. hers, herself, him, himself, hisself, it, itself, me, myself, one, oneself, ours, ourselves, ownself, self, she, thee, theirs, them, themselves, they, thou, thy, us
PRP$: pronoun, possessive e.g. her, his, mine, my, our, ours, their, thy, your
RB: adverb e.g. occasionally, unabatingly, maddeningly, adventurously, professedly, stirringly, prominently, technologically, magisterially, predominately, swiftly, fiscally, pitilessly, ...
RBR: adverb, comparative e.g. further, gloomier, grander, graver, greater, grimmer, harder, harsher, healthier, heavier, higher, however, larger, later, leaner, lengthier, less-perfectly, lesser, lonelier, longer, louder, lower, more, ...
RBS: adverb, superlative e.g. best, biggest, bluntest, earliest, farthest, first, furthest, hardest, heartiest, highest, largest, least, less, most, nearest, second, tightest, worst
RP: particle e.g. aboard, about, across, along, apart, around, aside, at, away, back, before, behind, by, crop, down, ever, fast, for, forth, from, go, high, i.e., in, into, just, later, low, more, off, on, open, out, over, per, pie, raising, start, teeth, that, through, under, unto, up, up-pp, upon, whole, with, you
SYM: symbol e.g. %, &, ', '', ''., ), )., *, +, ,., <, =, >, @, A[fj], U.S, U.S.S.R, *, **, ***
TO: "to" as preposition or infinitive marker e.g. to
UH: interjection e.g. Goodbye, Goody, Gosh, Wow, Jeepers, Jee-sus, Hubba, Hey, Kee-reist, Oops, amen, huh, howdy, uh, dammit, whammo, shucks, heck, anyways, whodunnit, honey, golly, man, baby, diddle, hush, sonuvabitch, ...
VB: verb, base form e.g. ask, assemble, assess, assign, assume, atone, attention, avoid, bake, balkanize, bank, begin, behold, believe, bend, benefit, bevel, beware, bless, boil, bomb, boost, brace, break, bring, broil, brush, build, ...
VBG: verb, present participle or gerund e.g. telegraphing, stirring, focusing, angering, judging, stalling, lactating, hankerin', alleging, veering, capping, approaching, traveling, besieging, encrypting, interrupting, erasing, wincing, ...
VBN: verb, past participle e.g. multihulled, dilapidated, aerosolized, chaired, languished, panelized, used, experimented, flourished, imitated, reunifed, factored, condensed, sheared, unsettled, primed, dubbed, desired, ...
VBP: verb, present tense, not 3rd person singular e.g. predominate, wrap, resort, sue, twist, spill, cure, lengthen, brush, terminate, appear, tend, stray, glisten, obtain, comprise, detest, tease, attract, emphasize, mold, postpone, sever, return, wag, ...
VBZ: verb, present tense, 3rd person singular e.g. bases, reconstructs, marks, mixes, displeases, seals, carps, weaves, snatches, slumps, stretches, authorizes, smolders, pictures, emerges, stockpiles, seduces, fizzes, uses, bolsters, slaps, speaks, pleads, ...
WDT: WH-determiner e.g. that, what, whatever, which, whichever
WP$: WH-pronoun, possessive e.g. whose
WRB: WH-adverb e.g. how, however, whence, whenever, where, whereby, whereever, wherein, whereof, why
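Penn Treebank tags for related categories share a prefix (for example NN, NNS, NNP, and NNPS for nouns), so fine-grained tags can be collapsed into coarser groups when a labeling project does not need the full distinction. The sketch below shows one way to do this. The group names and the helpers `coarse_tag` and `coarse_counts` are hypothetical and not part of NLTK or the platform.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical prefix-to-group mapping; extend or rename groups as needed.
COARSE_BY_PREFIX: List[Tuple[str, str]] = [
    ("NN", "NOUN"),  # NN, NNS, NNP, NNPS
    ("VB", "VERB"),  # VB, VBG, VBN, VBP, VBZ, ...
    ("JJ", "ADJ"),   # JJ, JJR, JJS
    ("RB", "ADV"),   # RB, RBR, RBS
]


def coarse_tag(tag: str) -> str:
    """Map a Penn Treebank tag to a coarse group, or "OTHER" if none matches."""
    for prefix, group in COARSE_BY_PREFIX:
        if tag.startswith(prefix):
            return group
    return "OTHER"


def coarse_counts(tagged: Iterable[Tuple[str, str]]) -> Counter:
    """Count coarse groups over (token, tag) pairs such as nltk.pos_tag output."""
    return Counter(coarse_tag(tag) for _, tag in tagged)
```

Tags such as RP (particle) or WRB (WH-adverb) do not start with any listed prefix, so they fall into "OTHER" rather than being merged with adverbs.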
UPenn Treebank Docs
python -c "import nltk;"
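Because this integration targets span labeling, tagged tokens eventually need character offsets into the original text. The following is a minimal sketch of that conversion under stated assumptions: the function name `tokens_to_spans` and the dictionary keys `start`, `end`, `label`, and `text` are illustrative only and do not describe the platform's actual label format.

```python
from typing import Dict, List, Tuple


def tokens_to_spans(text: str, tagged: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Locate each tagged token in `text` and return character-offset spans.

    Tokens are searched left to right, so repeated words map to
    successive occurrences rather than all to the first one.
    """
    spans: List[Dict[str, object]] = []
    cursor = 0
    for token, tag in tagged:
        start = text.find(token, cursor)
        if start == -1:
            # The tokenizer may have rewritten the token (e.g. quote marks),
            # so it cannot be found verbatim; skip it instead of guessing.
            continue
        end = start + len(token)
        spans.append({"start": start, "end": end, "label": tag, "text": token})
        cursor = end
    return spans
```

The end offset is exclusive, so `text[span["start"]:span["end"]]` reproduces the token.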