import spacy
nlp = spacy.load('en_core_web_md') # "Medium-sized" English model (contains word vectors)
doc = nlp(u"This is a sentence.")
print([(w.text, w.pos_, w.dep_) for w in doc])
Tokenization (calling nlp still runs the full pipeline; this cell only prints the tokens):
doc = nlp(u"Let's make the U.S.A. great again! Whadya say, Bob? #blessed")
for token in doc:
    print(token.text)
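spaCy's tokenizer first splits the raw text on whitespace. It then checks each substring against special-case exception rules and peels off prefixes and suffixes. The toy sketch below imitates that idea in plain Python so it runs without a model. The tiny SPECIAL_CASES table and the PREFIX/SUFFIX patterns are hypothetical stand-ins for spaCy's much larger rule tables, and infixes are ignored, so the real tokenizer's output can differ.

```python
import re

# Hypothetical, tiny stand-ins for spaCy's much larger rule tables.
SPECIAL_CASES = {"Let's": ["Let", "'s"], "U.S.A.": ["U.S.A."]}
PREFIX = re.compile(r"^[\"'(#]")
SUFFIX = re.compile(r"[\"')!?,.]$")


def toy_tokenize(text):
    """Whitespace split, then special cases, prefixes and suffixes (no infixes)."""
    tokens = []
    for chunk in text.split():
        suffixes = []
        while chunk:
            if chunk in SPECIAL_CASES:  # exceptions win over punctuation rules
                tokens.extend(SPECIAL_CASES[chunk])
                break
            prefix = PREFIX.search(chunk)
            if prefix and len(chunk) > 1:
                tokens.append(prefix.group())
                chunk = chunk[prefix.end():]
                continue
            suffix = SUFFIX.search(chunk)
            if suffix and len(chunk) > 1:
                suffixes.insert(0, suffix.group())  # keep suffixes in original order
                chunk = chunk[:suffix.start()]
                continue
            tokens.append(chunk)
            break
        tokens.extend(suffixes)
    return tokens


print(toy_tokenize("Let's make the U.S.A. great again! Whadya say, Bob? #blessed"))
```

The special-case check runs before any punctuation stripping. This ordering is what keeps "U.S.A." in one piece while "again!" still loses its "!".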
Part-of-speech (POS) tags and dependencies:
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop)
spacy.explain("NNP")
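The shape_ column printed by the loop above is an orthographic summary of each token. spaCy documents it as follows: letters become X or x, digits become d, and runs of the same symbol are truncated after length 4. The function below is a plain-Python approximation of that documented rule, written for illustration. It is not spaCy's own code.

```python
def word_shape(text):
    """Approximate the shape_ feature: X/x for letters, d for digits,
    other characters kept, runs of the same symbol capped at 4."""
    shape = []
    last, run = "", 0
    for char in text:
        if char.isalpha():
            symbol = "X" if char.isupper() else "x"
        elif char.isdigit():
            symbol = "d"
        else:
            symbol = char
        run = run + 1 if symbol == last else 0  # length of the current run, minus one
        last = symbol
        if run < 4:
            shape.append(symbol)
    return "".join(shape)


for word in ["Bob", "blessed", "U.S.A.", "'97", "Whadya"]:
    print(word, word_shape(word))
```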
Visualize the dependency parse:
from spacy import displacy
Outside of a Jupyter notebook, use displacy.serve instead; it starts a local web server that shows the visualization:
displacy.render(doc, style="dep")
# displacy.serve(doc, style="dep")
for token in doc:
    print(token.text, token.dep_, token.head.text, token.head.pos_, list(token.children))
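In spaCy, every token stores a single head and the sentence root's head is the root itself. token.children and token.subtree can therefore be derived by inverting the head pointers. The plain-Python sketch below does this for "This is a sentence .". The head indices are assigned by hand as a plausible parse. They are not output from en_core_web_md.

```python
from collections import defaultdict

# Hand-assigned heads for "This is a sentence ." (a plausible parse, not model output).
words = ["This", "is", "a", "sentence", "."]
heads = [1, 1, 3, 1, 1]  # heads[i] = index of token i's head; the root points to itself


def children_from_heads(heads):
    """Invert the head array: map each head index to its dependents, left to right."""
    children = defaultdict(list)
    root = None
    for i, head in enumerate(heads):
        if head == i:
            root = i
        else:
            children[head].append(i)
    return root, dict(children)


def subtree(i, children):
    """Token i plus everything below it, in sentence order (like Token.subtree)."""
    indices = [i]
    for child in children.get(i, []):
        indices.extend(subtree(child, children))
    return sorted(indices)


root, children = children_from_heads(heads)
print(words[root], {words[h]: [words[c] for c in cs] for h, cs in children.items()})
print([words[i] for i in subtree(3, children)])
```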
review = nlp("It's the 21st century, and a group of space marines are sent to destroy a monster that terrorizes the entire galaxy, the ultimate threat... A Leprechaun! This was a very funny movie, with the Leprechaun teaming up with a dastardly space princess who wants the Leprechaun's gold. Together the killer their way through a group of hilarious characters (but not as hilarious as in Leprechaun 3). Especially the Doctor, which Dr. Evil (From Austin Powers) resembles. Altough this came out in February '97 while the first Austing came out in May '97. Anyway, this was the 4th highest renting horror movie of '97. This is the second best Lep movie, following behind #3. Both of this were directed by Brian Trenchard-Smith, and I think he should direct Leprechaun 5: Lep In The Hood (Which is going to be theatrical and have more comedy than horror, it will star Warwick Davis and Ice-T and will be set in an inner city Los Angeles neighborhood. It will film in late summer, '99.")
for sent in review.sents:
    print(sent.text)
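With en_core_web_md, review.sents is by default built from the dependency parse rather than from punctuation rules. This lets the model keep an abbreviation such as "Dr." inside its sentence in many cases. For contrast, the sketch below applies a naive regex rule to two sentences of the review: split after ".", "!" or "?" when whitespace follows. It breaks at "Dr.".

```python
import re

# Two sentences copied from the review above (original spelling kept).
text = ("Especially the Doctor, which Dr. Evil (From Austin Powers) resembles. "
        "Altough this came out in February '97 while the first Austing came out in May '97.")

# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
naive = re.split(r"(?<=[.!?])\s+", text)
for piece in naive:
    print(piece)
```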
for chunk in review.noun_chunks:
    print(chunk.text, ',', chunk.root.text, chunk.root.dep_, chunk.root.head.text)
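spaCy derives noun_chunks from the dependency parse, which is why each chunk reports a root and the root's head. A simpler and different technique is to chunk on part-of-speech patterns. The sketch below groups runs of determiners, adjectives and nouns that end in a noun, and treats the last noun as the root. The coarse tags are hand-assigned and hypothetical. They are not model output.

```python
NOMINAL = {"NOUN", "PROPN"}
MODIFIERS = {"DET", "ADJ"}


def toy_noun_chunks(tagged):
    """Group DET/ADJ/NOUN runs that end in a noun; the last noun acts as the root."""
    chunks, current = [], []

    def flush():
        # Keep the run only if it ends in a noun, then start over.
        if current and current[-1][1] in NOMINAL:
            chunks.append((" ".join(word for word, _ in current), current[-1][0]))
        current.clear()

    for word, pos in tagged:
        if pos in MODIFIERS and current and current[-1][1] in NOMINAL:
            flush()  # a determiner or adjective after a noun opens a new phrase
        if pos in NOMINAL or pos in MODIFIERS:
            current.append((word, pos))
        else:
            flush()
    flush()
    return chunks


# Hand-assigned coarse tags (hypothetical, not model output).
tagged = [("a", "DET"), ("group", "NOUN"), ("of", "ADP"), ("space", "NOUN"),
          ("marines", "NOUN"), ("are", "AUX"), ("sent", "VERB"), ("to", "PART"),
          ("destroy", "VERB"), ("a", "DET"), ("monster", "NOUN")]
print(toy_noun_chunks(tagged))
```

Pattern chunking needs only tags, but it knows nothing about which word a phrase attaches to. The parse-based chunks can supply that information through chunk.root.head.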
for ent in review.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
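start_char and end_char are half-open character offsets into the original string, so text[start_char:end_char] gives back the entity text. The sketch below uses hand-made tuples in the same (text, start_char, end_char, label) shape that the loop prints. The spans are located with str.find and the labels are hypothetical, not predicted by a model. The sketch checks each offset pair and then groups entities by label.

```python
from collections import defaultdict

text = "It will star Warwick Davis and Ice-T and will be set in Los Angeles."

# Hypothetical entities located with str.find, not predicted by a model.
ents = []
for surface, label in [("Warwick Davis", "PERSON"), ("Ice-T", "PERSON"), ("Los Angeles", "GPE")]:
    start = text.find(surface)
    ents.append((surface, start, start + len(surface), label))

by_label = defaultdict(list)
for ent_text, start, end, label in ents:
    assert text[start:end] == ent_text  # half-open offsets recover the surface text
    by_label[label].append(ent_text)

print(dict(by_label))
```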
displacy.render(review,style="ent")
spaCy can also add entities from your own rules (for example with its EntityRuler component), and a model can be trained to update its entity predictions.
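Rule-based entity adding boils down to matching known phrases and emitting character spans with a label. The plain-Python gazetteer below is a toy version of that idea. It is not spaCy's EntityRuler API. The labels FRANCHISE and FILM and the phrase list are hypothetical. The result is printed as (start, end, label) triples inside an {"entities": [...]} dict, the layout used in spaCy v2's training examples.

```python
import re


def gazetteer_entities(text, gazetteer):
    """Return sorted, non-overlapping (start_char, end_char, label) spans.
    Longer phrases are matched first so they win over shorter ones."""
    taken = [False] * len(text)
    spans = []
    for phrase, label in sorted(gazetteer.items(), key=lambda item: -len(item[0])):
        for match in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            start, end = match.span()
            if not any(taken[start:end]):  # skip matches overlapping a longer one
                spans.append((start, end, label))
                taken[start:end] = [True] * (end - start)
    return sorted(spans)


# Hypothetical phrase list and labels.
gazetteer = {"Leprechaun": "FRANCHISE", "Leprechaun 5": "FILM", "Warwick Davis": "PERSON"}
text = "I think he should direct Leprechaun 5. It will star Warwick Davis."
spans = gazetteer_entities(text, gazetteer)
print({"entities": spans})
```

Matching longest phrases first is a simple way to stop "Leprechaun" from also claiming part of "Leprechaun 5".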
Word vectors and similarity. (Note: spaCy's "small" models, such as en_core_web_sm, do not include word vectors, which is why this notebook loads en_core_web_md.)
tokens = nlp(u'dog puppy cat kitten fish banana mango pet run talk yellow blue the !')
for token1 in tokens:
    for token2 in tokens:
        print(token1.text, token2.text, token1.similarity(token2))
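By default, token.similarity compares word vectors using cosine similarity: the dot product divided by the product of the two vector norms. The sketch below computes that score in plain Python with made-up 3-dimensional vectors. The real en_core_web_md vectors have 300 dimensions. The choice to return 0.0 for an all-zero vector is this sketch's own convention.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # this sketch treats an all-zero vector as unrelated


# Made-up 3-dimensional vectors, chosen only for illustration.
vectors = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "banana": [0.1, 0.05, 0.95],
}
for w1 in vectors:
    for w2 in vectors:
        print(w1, w2, round(cosine(vectors[w1], vectors[w2]), 3))
```

The printed grid has the same layout as the nested loop above: every word compared with every word, with 1.0 on the diagonal.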