
spaCy study notes[1]

news 2025/8/9 7:09:59

Contents

the foundation of spaCy
references

the foundation of spaCy

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It supports 75+ languages and is designed for industrial-strength use in real products.
Using spaCy requires 64-bit CPython 3.7+ and one of the following operating systems: Unix/Linux, macOS/OS X, or Windows.

pip install -U pip setuptools wheel
pip install -U spacy

To run spaCy on a GPU, install it with the extra that matches your CUDA version, for example one of the following:

spacy[cuda], spacy[cuda102], spacy[cuda112], spacy[cuda113]

To activate GPU computation, call spacy.prefer_gpu() or spacy.require_gpu() before loading any pipelines.
Installing spaCy's trained pipelines is easy: they are installed as ordinary Python packages.
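The GPU activation step can be sketched as follows. This is a minimal illustration, not the only way to do it: the variable name gpu_active and the blank English pipeline are assumptions made for the example. spacy.prefer_gpu() falls back to the CPU when no GPU is available, while spacy.require_gpu() raises an error in that case.

```python
import spacy

# prefer_gpu() returns True if a GPU was activated, and False if spaCy
# falls back to the CPU (for example when no GPU support is installed).
gpu_active = spacy.prefer_gpu()
print("GPU active:", gpu_active)

# Pipelines should be created or loaded only after the GPU call.
# Assumption: a blank English pipeline stands in for a real trained pipeline.
nlp = spacy.blank("en")
```

Use spacy.require_gpu() instead when the program should fail loudly rather than run silently on the CPU.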

python -m spacy download zh_core_web_sm

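import spacy

# Option 1: load the installed pipeline by its package name
nlp = spacy.load("zh_core_web_sm")
# Option 2 (equivalent): import the pipeline package and call its load()
import zh_core_web_sm
nlp = zh_core_web_sm.load()

# Sample sentence: "Today is a good day because the salary has been paid."
doc = nlp("今天是一个好日子因为发薪水了。")
# Print each token together with its coarse-grained part-of-speech tag
print([(w.text, w.pos_) for w in doc])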

PS E:\learn\learnpy> & "D:/Program Files/Python311/python.exe" e:/learn/learnpy/learn1.py
[('今天', 'NOUN'), ('是', 'VERB'), ('一个', 'X'), ('好日子', 'NOUN'), ('因为', 'ADP'), ('发薪', 'VERB'), ('水', 'NOUN'), ('了', 'PART'), ('。', 'PUNCT')]

A trained pipeline can be obtained either by downloading it manually or by installing it with pip.
If a language has a suitable trained pipeline, you can download it with the spacy download command and then load it.
If you need to work with a language that has no suitable trained pipeline, the remaining option is to import its language class directly or to use spacy.blank, as follows.

import spacy
from spacy.lang.yo import Yoruba

nlp = Yoruba()  # instantiate the language class directly
nlp = spacy.blank("yo")  # equivalent: create a blank pipeline from the language code
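The choice between a trained pipeline and a blank one can also be made at run time. The sketch below is one possible approach, not something prescribed by spaCy: the helper load_or_blank and the package name "yo_core_web_sm" are hypothetical and exist only for illustration. It relies on spacy.load raising OSError when the requested package is not installed.

```python
import spacy


def load_or_blank(model_name: str, lang_code: str):
    """Hypothetical helper: load a trained pipeline, else fall back to a blank one."""
    try:
        return spacy.load(model_name)
    except OSError:
        # spacy.load raises OSError when the package cannot be found.
        return spacy.blank(lang_code)


# Assumption: "yo_core_web_sm" is a made-up package name that is not installed,
# so the call falls back to a blank Yoruba pipeline.
nlp = load_or_blank("yo_core_web_sm", "yo")
print(nlp.lang, nlp.pipe_names)
```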

A blank pipeline is essentially just a tokenizer: creating one gives you tokenization and nothing else. Initializing the language object directly is useful when you want to add more components from scratch, or for testing purposes.
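A short sketch of what "only a tokenizer" means in practice, and of adding a component from scratch. The English language code, the sample sentence, and the variable names are assumptions chosen for readability; "sentencizer" is one of spaCy's built-in rule-based components.

```python
import spacy

text = "spaCy splits text into tokens. Nothing else runs yet."

# Assumption: English ("en") is used only for illustration.
nlp = spacy.blank("en")
names_before = list(nlp.pipe_names)  # a blank pipeline has no components
tokens = [t.text for t in nlp(text)]  # tokenization still works
print(names_before, tokens)

# Add the built-in rule-based sentence splitter to the empty pipeline.
nlp.add_pipe("sentencizer")
sentences = [s.text for s in nlp(text).sents]
print(nlp.pipe_names, sentences)
```

Because no tagger is present, attributes such as w.pos_ stay empty in a blank pipeline until a component that sets them is added.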