In [1]:
import spacy

Annotations

In [2]:
nlp = spacy.load('en_core_web_md') # "Medium-sized" English model (contains word vectors)
doc = nlp(u"This is a sentence.")
print([(w.text, w.pos_, w.dep_) for w in doc])
[('This', 'DET', 'nsubj'), ('is', 'VERB', 'ROOT'), ('a', 'DET', 'det'), ('sentence', 'NOUN', 'attr'), ('.', 'PUNCT', 'punct')]

Just the tokenizer output (nlp still runs the full pipeline; here we only print each token's text):

In [3]:
doc = nlp(u"Let's make the U.S.A. great again! Whadya say, Bob? #blessed")
for token in doc:
    print(token.text)
Let
's
make
the
U.S.A.
great
again
!
Whadya
say
,
Bob
?
#
blessed

Part of Speech (POS) tags and dependencies

In [4]:
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop)
Let let VERB VB ROOT Xxx True False
's -PRON- PRON PRP nsubj 'x False True
make make VERB VB ccomp xxxx True True
the the DET DT det xxx True True
U.S.A. U.S.A. PROPN NNP nsubj X.X.X. False False
great great ADJ JJ ccomp xxxx True False
again again ADV RB advmod xxxx True True
! ! PUNCT . punct ! False False
Whadya Whadya VERB MD nsubj Xxxxx True False
say say VERB VBP ROOT xxx True True
, , PUNCT , punct , False False
Bob Bob PROPN NNP npadvmod Xxx True False
? ? PUNCT . punct ? False False
# # SYM $ nsubj # False False
blessed bless VERB VBN ROOT xxxx True False
In [5]:
spacy.explain("NNP")
Out[5]:
'noun, proper singular'
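The shape_ column printed above appears to map each character to a class (X for uppercase, x for lowercase, d for a digit, anything else kept as-is) and to truncate long runs of the same class. Below is a minimal pure-Python sketch that reproduces the shapes shown in the output. The function name word_shape_sketch and the max_run parameter are hypothetical, and the run limit of four is inferred from the output ("Whadya" gives "Xxxxx"), so treat this as an approximation rather than spaCy's actual implementation.

```python
def word_shape_sketch(text, max_run=4):
    """Approximate a token shape: X/x/d per character, runs capped at max_run.

    Hypothetical helper for illustration; not part of spaCy.
    """
    shape = []
    last = None  # character class of the previous character
    run = 0      # length of the current run of identical classes
    for ch in text:
        if ch.isalpha():
            cls = "X" if ch.isupper() else "x"
        elif ch.isdigit():
            cls = "d"
        else:
            cls = ch  # punctuation and symbols are kept unchanged
        run = run + 1 if cls == last else 1
        last = cls
        if run <= max_run:
            shape.append(cls)
    return "".join(shape)


# Compare against the shape_ column in the output above.
for word in ["Let", "'s", "U.S.A.", "Whadya", "blessed"]:
    print(word, word_shape_sketch(word))
```

Shapes like these are handy as coarse features: "U.S.A." and "U.S.S.R." differ as strings but share the same alternating X. pattern.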

Visualize the dependency parse:

In [6]:
from spacy import displacy

Outside a Jupyter notebook, use displacy.serve instead of displacy.render:

In [7]:
displacy.render(doc, style="dep")
# displacy.serve(doc, style="dep")
[Figure: displaCy dependency parse of the example text, with arcs labeled nsubj, ccomp, det, advmod and npadvmod]
In [31]:
for token in doc:
    print(token.text, token.dep_, token.head.text, token.head.pos_, list(token.children))
Let ROOT Let VERB [make, !]
's nsubj make VERB []
make ccomp Let VERB ['s, great, again]
the det U.S.A. PROPN []
U.S.A. nsubj great ADJ [the]
great ccomp make VERB [U.S.A.]
again advmod make VERB []
! punct Let VERB []
Whadya nsubj say VERB []
say ROOT say VERB [Whadya, ,, Bob, ?]
, punct say VERB []
Bob npadvmod say VERB []
? punct say VERB []
# nsubj blessed VERB []
blessed ROOT blessed VERB [#]
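The head and children columns above are two views of the same tree: each token points to one head, and a token's children are the tokens that name it as their head. Here is a small sketch in plain Python, using the (token, head) pairs copied from the first sentence of the output. It keys tokens by their text, which only works because every token in that sentence is distinct; with real spaCy tokens you would key by position instead. The helper name path_to_root is hypothetical.

```python
from collections import defaultdict

# (token, head) pairs copied from the first sentence of the output above.
heads = [
    ("Let", "Let"),
    ("'s", "make"),
    ("make", "Let"),
    ("the", "U.S.A."),
    ("U.S.A.", "great"),
    ("great", "make"),
    ("again", "make"),
    ("!", "Let"),
]

head_of = dict(heads)

# Invert the head map to recover each token's children.
children = defaultdict(list)
for token, head in heads:
    if token != head:  # the ROOT token is listed as its own head
        children[head].append(token)


def path_to_root(token):
    """Follow head links upward until reaching the ROOT (its own head)."""
    path = [token]
    while head_of[path[-1]] != path[-1]:
        path.append(head_of[path[-1]])
    return path


print(children["make"])
print(path_to_root("the"))
```

The children recovered for "make" and "Let" should match the last column of the output above.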

Sentence segmentation

In [8]:
review = nlp("It's the 21st century, and a group of space marines are sent to destroy a monster that terrorizes the entire galaxy, the ultimate threat... A Leprechaun! This was a very funny movie, with the Leprechaun teaming up with a dastardly space princess who wants the Leprechaun's gold. Together the killer their way through a group of hilarious characters (but not as hilarious as in Leprechaun 3). Especially the Doctor, which Dr. Evil (From Austin Powers) resembles. Altough this came out in February '97 while the first Austing came out in May '97. Anyway, this was the 4th highest renting horror movie of '97. This is the second best Lep movie, following behind #3. Both of this were directed by Brian Trenchard-Smith, and I think he should direct Leprechaun 5: Lep In The Hood (Which is going to be theatrical and have more comedy than horror, it will star Warwick Davis and Ice-T and will be set in an inner city Los Angeles neighborhood. It will film in late summer, '99.")
In [32]:
for sent in review.sents:
    print(sent.text)
It's the 21st century, and a group of space marines are sent to destroy a monster that terrorizes the entire galaxy, the ultimate threat...
A Leprechaun!
This was a very funny movie, with the Leprechaun teaming up with a dastardly space princess who wants the Leprechaun's gold.
Together the killer their way through a group of hilarious characters (but not as hilarious as in Leprechaun 3).
Especially the Doctor, which Dr. Evil (From Austin Powers) resembles.
Altough this came out in February '97 while the first Austing came out in May '97.
Anyway, this was the 4th highest renting horror movie of '97.
This is the second best Lep movie, following behind #3.
Both of this were directed by Brian Trenchard-Smith, and I think he should direct Leprechaun 5: Lep In The Hood (Which is going to be theatrical and have more comedy than horror, it will star Warwick Davis and Ice-T and will be set in an inner city Los Angeles neighborhood.
It will film in late summer, '99.

Extract "noun chunks" (a noun plus words describing that noun):

In [30]:
for chunk in review.noun_chunks:
    print(chunk.text, ',', chunk.root.text, chunk.root.dep_, chunk.root.head.text)
It , It nsubj 's
the 21st century , century attr 's
a group , group nsubjpass sent
space marines , marines pobj of
a monster , monster dobj destroy
the entire galaxy , galaxy dobj terrorizes
the ultimate threat , threat appos galaxy
A Leprechaun , Leprechaun ROOT Leprechaun
a very funny movie , movie attr was
the Leprechaun , Leprechaun nsubj teaming
a dastardly space princess , princess pobj with
who , who nsubj wants
the Leprechaun's gold , gold dobj wants
the killer , killer pobj Together
a group , group pobj through
hilarious characters , characters pobj of
Leprechaun , Leprechaun pobj in
Especially the Doctor , Doctor ROOT Doctor
Dr. Evil , Evil nsubj resembles
Austin Powers , Powers pobj From
February , February pobj in
the first Austing , Austing nsubj came
the 4th highest renting horror movie , movie attr was
the second best Lep movie , movie attr is
Brian Trenchard-Smith , Smith pobj by
I , I nsubj think
he , he nsubj direct
Leprechaun , Leprechaun dobj direct
Lep , Lep nsubj star
The Hood , Hood pobj In
more comedy , comedy dobj have
horror , horror pobj than
it , it nsubj star
Warwick Davis , Davis dobj star
Ice-T , T conj Davis
an inner city Los Angeles neighborhood , neighborhood pobj in
It , It nsubj film
late summer , summer pobj in

Entity extraction

In [9]:
for ent in review.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
the 21st century 5 21 DATE
Leprechaun 142 152 NORP
Leprechaun 192 202 NORP
Leprechaun 260 270 NORP
Leprechaun 377 387 LANGUAGE
Evil 425 429 PERSON
Austin Powers 436 449 PERSON
February '97 487 499 DATE
first 510 515 ORDINAL
Austing 516 523 ORG
May '97 536 543 DATE
4th 566 569 ORDINAL
97 603 605 CARDINAL
second 619 625 ORDINAL
Lep 631 634 PERSON
3 660 661 MONEY
Brian Trenchard-Smith 693 714 PERSON
Leprechaun 5 745 757 LAW
Lep In The Hood 759 774 PERSON
Warwick Davis 855 868 PERSON
Los Angeles 912 923 GPE
late summer 954 965 DATE
99 968 970 CARDINAL
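The two numbers printed for each entity are character offsets into the original string, so slicing the text with them gives back the entity text. The sketch below checks this for the first entity on a prefix of the review and builds a simple inline annotation from the offsets. The annotate helper is hypothetical, and the bracketed output format is just one possible choice.

```python
# A prefix of the review text and the first entity from the output above,
# as (text, start_char, end_char, label).
text = "It's the 21st century, and a group of space marines"
entities = [("the 21st century", 5, 21, "DATE")]


def annotate(text, entities):
    """Wrap each entity span in brackets together with its label.

    Hypothetical helper; assumes the spans do not overlap.
    """
    out = []
    cursor = 0
    for _, start, end, label in sorted(entities, key=lambda e: e[1]):
        out.append(text[cursor:start])            # text before the entity
        out.append("[" + text[start:end] + " " + label + "]")
        cursor = end
    out.append(text[cursor:])                     # remaining tail
    return "".join(out)


for ent_text, start, end, label in entities:
    # The slice and the reported entity text should be identical.
    print(repr(text[start:end]), text[start:end] == ent_text, label)

print(annotate(text, entities))
```

Because the offsets index characters rather than tokens, they stay valid for highlighting or for exporting annotations to other tools.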
In [10]:
displacy.render(review,style="ent")
[Figure: displaCy entity visualization of the review, highlighting the entity spans and labels listed above]

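Several of the predictions above are clearly wrong (for example, "Leprechaun" tagged as NORP or LANGUAGE, and "3" as MONEY). spaCy lets you add your own entity types and train or update a model's entity predictions.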

Pretrained embeddings and similarity

(Note: the "small" language models in spaCy do not contain word vectors, which is why the medium-sized model was loaded above.)

In [14]:
tokens = nlp(u'dog puppy cat kitten fish banana mango pet run talk yellow blue the !')
for token1 in tokens:
    for token2 in tokens:
        print(token1.text, token2.text, token1.similarity(token2))
dog dog 1.0
dog puppy 0.85852146
dog cat 0.80168545
dog kitten 0.7035338
dog fish 0.40854338
dog banana 0.24327643
dog mango 0.18698534
dog pet 0.8057451
dog run 0.30471605
dog talk 0.2870807
dog yellow 0.28776848
dog blue 0.3140648
dog the 0.2935314
dog ! 0.29852206
puppy dog 0.85852146
puppy puppy 1.0
puppy cat 0.7073982
puppy kitten 0.7881503
puppy fish 0.29882255
puppy banana 0.22722068
puppy mango 0.19454442
puppy pet 0.71200883
puppy run 0.24046384
puppy talk 0.21253219
puppy yellow 0.25740814
puppy blue 0.28407583
puppy the 0.19345692
puppy ! 0.2728943
cat dog 0.80168545
cat puppy 0.7073982
cat cat 1.0
cat kitten 0.8215553
cat fish 0.41806534
cat banana 0.28154364
cat mango 0.20926963
cat pet 0.7505457
cat run 0.2620219
cat talk 0.22100182
cat yellow 0.33163774
cat blue 0.3623063
cat the 0.23351584
cat ! 0.29702345
kitten dog 0.7035338
kitten puppy 0.7881503
kitten cat 0.8215553
kitten kitten 1.0
kitten fish 0.30143547
kitten banana 0.25732034
kitten mango 0.20461525
kitten pet 0.6350592
kitten run 0.15976925
kitten talk 0.13019286
kitten yellow 0.28888
kitten blue 0.3052467
kitten the 0.14646468
kitten ! 0.20587482
fish dog 0.40854338
fish puppy 0.29882255
fish cat 0.41806534
fish kitten 0.30143547
fish fish 1.0
fish banana 0.37149978
fish mango 0.36966476
fish pet 0.39718428
fish run 0.25482512
fish talk 0.20104727
fish yellow 0.34719518
fish blue 0.36439574
fish the 0.2904115
fish ! 0.22938438
banana dog 0.24327643
banana puppy 0.22722068
banana cat 0.28154364
banana kitten 0.25732034
banana fish 0.37149978
banana banana 1.0
banana mango 0.7421036
banana pet 0.17017557
banana run 0.15091617
banana talk 0.13299522
banana yellow 0.41229445
banana blue 0.308636
banana the 0.17734134
banana ! 0.27213
mango dog 0.18698534
mango puppy 0.19454442
mango cat 0.20926963
mango kitten 0.20461525
mango fish 0.36966476
mango banana 0.7421036
mango mango 1.0
mango pet 0.15797481
mango run 0.047063652
mango talk 0.048520673
mango yellow 0.38050345
mango blue 0.29126176
mango the 0.108291835
mango ! 0.1488518
pet dog 0.8057451
pet puppy 0.71200883
pet cat 0.7505457
pet kitten 0.6350592
pet fish 0.39718428
pet banana 0.17017557
pet mango 0.15797481
pet pet 1.0
pet run 0.2093848
pet talk 0.22208202
pet yellow 0.22260952
pet blue 0.2349961
pet the 0.22014074
pet ! 0.24923714
run dog 0.30471605
run puppy 0.24046384
run cat 0.2620219
run kitten 0.15976925
run fish 0.25482512
run banana 0.15091617
run mango 0.047063652
run pet 0.2093848
run run 1.0
run talk 0.36677453
run yellow 0.15468088
run blue 0.19729847
run the 0.43286276
run ! 0.21172273
talk dog 0.2870807
talk puppy 0.21253219
talk cat 0.22100182
talk kitten 0.13019286
talk fish 0.20104727
talk banana 0.13299522
talk mango 0.048520673
talk pet 0.22208202
talk run 0.36677453
talk talk 1.0
talk yellow 0.10973917
talk blue 0.15201746
talk the 0.36570722
talk ! 0.31918067
yellow dog 0.28776848
yellow puppy 0.25740814
yellow cat 0.33163774
yellow kitten 0.28888
yellow fish 0.34719518
yellow banana 0.41229445
yellow mango 0.38050345
yellow pet 0.22260952
yellow run 0.15468088
yellow talk 0.10973917
yellow yellow 1.0
yellow blue 0.81883496
yellow the 0.2708097
yellow ! 0.22664054
blue dog 0.3140648
blue puppy 0.28407583
blue cat 0.3623063
blue kitten 0.3052467
blue fish 0.36439574
blue banana 0.308636
blue mango 0.29126176
blue pet 0.2349961
blue run 0.19729847
blue talk 0.15201746
blue yellow 0.81883496
blue blue 1.0
blue the 0.32360864
blue ! 0.2346622
the dog 0.2935314
the puppy 0.19345692
the cat 0.23351584
the kitten 0.14646468
the fish 0.2904115
the banana 0.17734134
the mango 0.108291835
the pet 0.22014074
the run 0.43286276
the talk 0.36570722
the yellow 0.2708097
the blue 0.32360864
the the 1.0
the ! 0.19884512
! dog 0.29852206
! puppy 0.2728943
! cat 0.29702345
! kitten 0.20587482
! fish 0.22938438
! banana 0.27213
! mango 0.1488518
! pet 0.24923714
! run 0.21172273
! talk 0.31918067
! yellow 0.22664054
! blue 0.2346622
! the 0.19884512
! ! 1.0
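By default, similarity between two tokens is the cosine similarity of their word vectors, which is why every token scores 1.0 against itself and why the table above is symmetric. The sketch below shows that computation in plain Python and prints each unordered pair only once. The three-dimensional toy_vectors are made up for illustration and are not taken from the model; cosine_similarity is a hypothetical helper.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Made-up vectors, for illustration only (real word vectors have
# hundreds of dimensions).
toy_vectors = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.2, 0.9],
}

# combinations() yields each unordered pair once, skipping self-pairs
# and mirrored duplicates.
for w1, w2 in combinations(toy_vectors, 2):
    sim = cosine_similarity(toy_vectors[w1], toy_vectors[w2])
    print(w1, w2, round(sim, 3))
```

Applied to the 14 tokens above, iterating over unordered pairs would give 91 lines instead of 196.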