Python NLP: Common Datasets 0.1.012

XNLI dataset

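A corpus for cross-lingual natural language inference (XNLI). It contains premise/hypothesis sentence pairs in multiple languages, each labeled entailment, neutral, or contradiction, and is used to evaluate cross-lingual sentence classification.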

  1. Official site: https://github.com/facebookresearch/XNLI
  2. Download: https://dl.fbaipublicfiles.com/XNLI/XNLI-1.0.zip
  3. Notes: the dataset is provided in both JSON format and TXT format (tab-separated values).
  4. Data format:

TXT format (tab-separated; the first line is the header)

language	gold_label	sentence1_binary_parse	sentence2_binary_parse	sentence1_parse	sentence2_parse	sentence1	sentence2	promptID	pairID	genre	label1	label2	label3	label4	label5	sentence1_tokenized	sentence2_tokenized	match
ar	neutral					وقال، ماما، لقد عدت للمنزل.	اتصل بأمه حالما أوصلته حافلة المدرسية.	1	1	facetoface	neutral	contradiction	neutral	neutral	neutral	وقال ، ماما ، لقد عدت للمنزل .	اتصل بأمه حالما أوصلته حافلة المدرسية .	True
ar	contradiction					وقال، ماما، لقد عدت للمنزل.	لم ينطق ببنت شفة.	1	2	facetoface	contradiction	contradiction	contradiction	contradiction	contradiction	وقال ، ماما ، لقد عدت للمنزل .	لم ينطق ببنت شفة .	True
ar	entailment					وقال، ماما، لقد عدت للمنزل.	أخبر أمه أنه قد عاد للمنزل.	1	3	facetoface	entailment	entailment	neutral	entailment	entailment	وقال ، ماما ، لقد عدت للمنزل .	أخبر أمه أنه قد عاد للمنزل .	True
ar	neutral	
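A minimal Python sketch for reading the tab-separated file with the standard `csv` module, assuming the first line is the header shown above. The function names `read_xnli_tsv` and `label_counts` are illustrative and not part of any XNLI tooling; the path you pass in depends on where you unzipped the archive.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional


def read_xnli_tsv(lines: Iterable[str], language: Optional[str] = None) -> Iterator[Dict[str, str]]:
    # The first line is the header (language, gold_label, sentence1, ...);
    # DictReader turns every following line into a dict keyed by those names.
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Optionally keep only one language, e.g. "ar".
        if language is None or row["language"] == language:
            yield row


def label_counts(path: str, language: Optional[str] = None) -> Counter:
    # Count gold labels in one TSV file (path is wherever the archive was extracted).
    with open(path, encoding="utf-8", newline="") as f:
        return Counter(row["gold_label"] for row in read_xnli_tsv(f, language))
```

For example, `label_counts(path, language="ar")` counts the gold labels of the Arabic rows. `csv.QUOTE_NONE` is chosen so that a sentence starting with a quotation mark is not mistaken for a CSV-quoted field.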

JSON format (one JSON object per line; non-ASCII text is stored as \uXXXX escapes)

{"annotator_labels": ["neutral", "contradiction", "neutral", "neutral", "neutral"], "genre": "facetoface", "gold_label": "neutral", "language": "ar", "match": "True", "pairID": "1", "promptID": "1", "sentence1": "\u0648\u0642\u0627\u0644\u060c \u0645\u0627\u0645\u0627\u060c \u0644\u0642\u062f \u0639\u062f\u062a \u0644\u0644\u0645\u0646\u0632\u0644.", "sentence1_tokenized": "\u0648\u0642\u0627\u0644 \u060c \u0645\u0627\u0645\u0627 \u060c \u0644\u0642\u062f \u0639\u062f\u062a \u0644\u0644\u0645\u0646\u0632\u0644 .", "sentence2": "\u0627\u062a\u0635\u0644 \u0628\u0623\u0645\u0647 \u062d\u0627\u0644\u0645\u0627 \u0623\u0648\u0635\u0644\u062a\u0647 \u062d\u0627\u0641\u0644\u0629 \u0627\u0644\u0645\u062f\u0631\u0633\u064a\u0629.", "sentence2_tokenized": "\u0627\u062a\u0635\u0644 \u0628\u0623\u0645\u0647 \u062d\u0627\u0644\u0645\u0627 \u0623\u0648\u0635\u0644\u062a\u0647 \u062d\u0627\u0641\u0644\u0629 \u0627\u0644\u0645\u062f\u0631\u0633\u064a\u0629 ."}
{"annotator_labels": ["contradiction", "contradiction", "contradiction", "contradiction", "contradiction"], "genre": "facetoface", "gold_label": "contradiction", "language": "ar", "match": "True", "pairID": "2", "promptID": "1", "sentence1": "\u0648\u0642\u0627\u0644\u060c \u0645\u0627\u0645\u0627\u060c \u0644\u0642\u062f \u0639\u062f\u062a \u0644\u0644\u0645\u0646\u0632\u0644.", "sentence1_tokenized": "\u0648\u0642\u0627\u0644 \u060c \u0645\u0627\u0645\u0627 \u060c \u0644\u0642\u062f \u0639\u062f\u062a \u0644\u0644\u0645\u0646\u0632\u0644 .", "sentence2": "\u0644\u0645 \u064a\u0646\u0637\u0642 \u0628\u0628\u0646\u062a \u0634\u0641\u0629.", "sentence2_tokenized": "\u0644\u0645 \u064a\u0646\u0637\u0642 \u0628\u0628\u0646\u062a \u0634\u0641\u0629 ."}
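The JSON version can be read line by line with `json.loads`, which also turns the `\uXXXX` escapes back into the original characters. The sketch below (function names are hypothetical) also compares `gold_label` with the most frequent entry in `annotator_labels`; that the two agree is only an observation from the sample rows above, not a rule taken from the XNLI documentation.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def read_xnli_jsonl(lines: Iterable[str]) -> Iterator[Dict]:
    # One JSON object per line; json.loads decodes the \uXXXX escapes
    # into the original (here Arabic) characters. Blank lines are skipped.
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def majority_label(annotator_labels: List[str]) -> str:
    # Most frequent annotator label; on a tie, the label seen first wins.
    return Counter(annotator_labels).most_common(1)[0][0]
```

Pass an open file (`open(path, encoding="utf-8")`) as `lines` to stream a whole file.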

SQuAD dataset

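  1. Official site: https://rajpurkar.github.io/SQuAD-explorer/
  2. Download: https://rajpurkar.github.io/SQuAD-explorer/
  3. Notes: the test set is not released. To get a test-set score, you submit your model through the official site and the platform runs it on the test set.
  4. Data format: see https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json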

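The data consists of multiple articles.
Each article is identified by its "title".
Each article contains a list of "paragraphs".
Each paragraph has one "context" (the passage text) and a list of question-answer entries, "qas".
Each entry in "qas" has a "question", an "id", and a list of "answers"; each answer gives its "text" and "answer_start", the character offset where that text begins in the context.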
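Following the nesting described above, a minimal sketch that flattens data → paragraphs → qas into one record per question. The function names `load_squad` and `iter_squad` are illustrative, not an official loader.

```python
import json
from typing import Dict, Iterator


def load_squad(path: str) -> Dict:
    # The whole file is a single JSON document with a top-level "data" list.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_squad(squad: Dict) -> Iterator[Dict]:
    # Walk article -> paragraph -> question and yield one flat record per question.
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield {
                    "title": article["title"],
                    "context": context,
                    "id": qa["id"],
                    "question": qa["question"],
                    "answers": [a["text"] for a in qa["answers"]],
                    "answer_starts": [a["answer_start"] for a in qa["answers"]],
                }
```

For example, `sum(1 for _ in iter_squad(load_squad(path)))` counts the questions in a downloaded file.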

Partial data (truncated excerpt):

{"data": [{"title": "Super_Bowl_50","paragraphs": [{"context": "Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50.","qas": [{"answers": [{"answer_start": 177,"text": "Denver Broncos"}, {"answer_start": 177,"text": "Denver Broncos"}, {"answer_start": 177,"text": "Denver Broncos"}],"question": "Which NFL team represented the AFC at Super Bowl 50?","id": "56be4db0acb8001400a502ec"}, {"answers": [{"answer_start": 249,"text": "Carolina Panthers"}, {"answer_start": 249,"text": "Carolina Panthers"}, {"answer_start": 249,"text": "Carolina Panthers"}],"question": "Which NFL team represented the NFC at Super Bowl 50?","id": "56be4db0acb8001400a502ed"}, {"answers": [{"answer_start": 403,"text": "Santa Clara, California"}, {"answer_start": 355,"text": "Levi's Stadium"}, {"answer_start": 355,"text": "Levi's Stadium in the San Francisco Bay Area at Santa Clara, California."}],"question": "Where did Super Bowl 50 take place?","id": "56be4db0acb8001400a502ee"}, {"answers": [{"answer_start": 177,"text": "Denver Broncos"}, {"answer_start": 177,"text": "Denver Broncos"}, {"answer_start": 177,"text": "Denver Broncos"}],"question": "Which NFL team won Super Bowl 50?","id": "56be4db0acb8001400a502ef"}, {"answers": [{"answer_start": 488,"text": "gold"}, 
{"answer_start": 488,"text": "gold"}, {"answer_start": 521,"text": "gold"}],"question": "What color was used to emphasize the 50th anniversary of the Super Bowl?","id": "56be4db0acb8001400a502f0"}
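In the excerpt each question carries several reference answers, and they do not always agree (for "Where did Super Bowl 50 take place?" they range from "Santa Clara, California" to "Levi's Stadium"). Since `answer_start` is a character offset into `context`, the sketch below checks each answer against the span at its offset and collects the distinct reference texts; the helper names are hypothetical.

```python
from typing import Dict, List


def answer_spans(context: str, answers: List[Dict]) -> List[str]:
    # answer_start is a character offset into context, so each answer
    # should equal context[start : start + len(text)].
    return [context[a["answer_start"]:a["answer_start"] + len(a["text"])] for a in answers]


def offsets_ok(context: str, answers: List[Dict]) -> bool:
    # True if every stored answer text matches the span found at its offset.
    return all(span == a["text"] for span, a in zip(answer_spans(context, answers), answers))


def reference_answers(answers: List[Dict]) -> List[str]:
    # Distinct answer texts in first-seen order; annotators often repeat the same text.
    return list(dict.fromkeys(a["text"] for a in answers))
```

Deduplicating by text rather than by offset is a design choice: the "gold" answers above share the same text but start at 488 and 521.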