spaCy study notes[1]

Contents

  • the foundation of spaCy
  • references

the foundation of spaCy

  1. spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It supports 75+ languages and is built for industrial-strength use in production.
  2. Using spaCy requires 64-bit CPython 3.7+ and one of the following operating systems: Unix/Linux, macOS/OS X, or Windows. Install it with pip:
pip install -U pip setuptools wheel
pip install -U spacy
  3. To run spaCy on a GPU, install it with the extra that matches your CUDA version, such as spacy[cuda], spacy[cuda102], spacy[cuda112] or spacy[cuda113]. For example:
pip install -U "spacy[cuda113]"

To activate GPU computation, call spacy.prefer_gpu() or spacy.require_gpu() in your script before any pipelines are loaded.
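
The GPU call can be sketched as follows. This is a minimal illustration rather than a required setup: the blank English pipeline and the sample sentence are placeholders chosen here, and the script is expected to fall back to the CPU when no GPU is available.

```python
import spacy

# prefer_gpu() returns True if a GPU was activated and False otherwise,
# so the same script also runs on CPU-only machines.
using_gpu = spacy.prefer_gpu()
print("Running on GPU:", using_gpu)

# Create or load pipelines only after the GPU call.
nlp = spacy.blank("en")  # placeholder pipeline for illustration
doc = nlp("GPU or CPU, the API is the same.")
print([token.text for token in doc])
```

spacy.require_gpu() is the stricter variant: it raises an error instead of returning False when no GPU can be used.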
  4. spaCy's trained pipelines are easy to install, because they are distributed as Python packages.

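python -m spacy download zh_core_web_sm

import spacy
# Option 1: load the pipeline by its package name
nlp = spacy.load("zh_core_web_sm")

# Option 2: import the installed package and call its load() function
import zh_core_web_sm
nlp = zh_core_web_sm.load()

doc = nlp("今天是一个好日子因为发薪水了。")
# Print each token together with its coarse-grained part-of-speech tag
print([(w.text, w.pos_) for w in doc])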
PS E:\learn\learnpy> & "D:/Program Files/Python311/python.exe" e:/learn/learnpy/learn1.py
[('今天', 'NOUN'), ('是', 'VERB'), ('一个', 'X'), ('好日子', 'NOUN'), ('因为', 'ADP'), ('发薪', 'VERB'), ('水', 'NOUN'), ('了', 'PART'), ('。', 'PUNCT')]
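
Because trained pipelines are ordinary Python packages, a script can check whether one is installed before trying to load it. The sketch below assumes this is a useful guard; load_if_installed is a hypothetical helper name introduced here, not part of spaCy, while spacy.util.is_package is the library function that does the check.

```python
import spacy
from spacy.util import is_package


def load_if_installed(name):
    """Hypothetical helper: return the loaded pipeline, or None if the
    package called `name` is not installed in this environment."""
    if is_package(name):
        return spacy.load(name)
    return None


nlp = load_if_installed("zh_core_web_sm")
if nlp is None:
    print("Not installed; run: python -m spacy download zh_core_web_sm")
else:
    # List the components that the trained pipeline ships with.
    print(nlp.pipe_names)
```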

You can get a trained pipeline either by downloading it manually or by installing it with pip.
If a language has a suitable trained pipeline, you can download it with the spacy download command and then load it.
If you need to work with a language that has no suitable trained pipeline, the only options are to import its language class directly or to use spacy.blank, as follows.

import spacy
from spacy.lang.yo import Yoruba

nlp = Yoruba()  # instantiate the language class directly
nlp = spacy.blank("yo")  # or create a blank pipeline from the language code

A blank pipeline is essentially just a tokenizer, so creating one gives you tokenization and nothing else. Initializing the language object directly is useful when you only need a tokenizer, when you want to add more components from scratch, or for testing purposes.
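
The following sketch illustrates that a blank pipeline starts with no components and can be extended afterwards. The sample text is a placeholder, and the rule-based sentencizer is just one built-in component picked here for illustration.

```python
import spacy

nlp = spacy.blank("yo")

# A blank pipeline has a tokenizer but no processing components yet.
components_before = list(nlp.pipe_names)
print(components_before)

# Tokenization already works without any components.
text = "Hello world. This is spaCy."  # placeholder text
print([token.text for token in nlp(text)])

# Add a component from scratch: the rule-based sentence splitter.
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)

# Sentence boundaries are available once the sentencizer has run.
doc = nlp(text)
print([sent.text for sent in doc.sents])
```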

references

  1. https://spacy.io/