
RadGraph: Extracting Clinical Entities and Relations from Radiology Reports (Code Walkthrough)

Source: NeurIPS

Category: IE (Information Extraction)


RadGraph is built mainly on DyGIE++, and its main file is inference.py.

inference.py:

1. get_file_list(data_path)

def get_file_list(path):
    file_list = [item for item in glob.glob(f"{path}/*.txt")]
    with open('./temp_file_list.json', 'w') as f:
        json.dump(file_list, f)

This function collects the paths of all reports (.txt files) under data_path and saves the list to temp_file_list.json. For example:

["data/s56075423.txt", "data/s59358936.txt", "data/s58951365.txt"]

2. preprocess_reports()

def preprocess_reports():
    file_list = json.load(open("./temp_file_list.json"))
    final_list = []
    for idx, file in enumerate(file_list):
        temp_file = open(file).read()
        sen = re.sub('(?<! )(?=[/,-,:,.,!?()])|(?<=[/,-,:,.,!?()])(?! )', r' ', temp_file).split()
        temp_dict = {}
        temp_dict["doc_key"] = file
        ## Current way of inference takes in the whole report as 1 sentence
        temp_dict["sentences"] = [sen]
        final_list.append(temp_dict)
        if(idx % 1000 == 0):
            print(f"{idx+1} reports done")
    print(f"{idx+1} reports done")
    with open("./temp_dygie_input.json", 'w') as outfile:
        for item in final_list:
            json.dump(item, outfile)
            outfile.write("\n")

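The function loads the file list from temp_file_list.json, splits each report into individual tokens (with the listed punctuation marks as separate tokens), and builds one dictionary per report of the form {"doc_key": <file path>, "sentences": [<token list>]}. The dictionaries are written to temp_dygie_input.json, one JSON object per line. As the code comment notes, the whole report is currently treated as a single sentence.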
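To make the tokenization step concrete, here is a minimal sketch that applies the same regular expression to a short string and builds one input record. The helper names tokenize_report and build_record and the path data/example.txt are made up for illustration; only the pattern and the record layout come from preprocess_reports(). One detail worth noticing: inside the character class, ",-," parses as a range from comma to comma, so a literal hyphen does not appear to be treated as a split character.

```python
import json
import re

# Same pattern as in preprocess_reports(): insert a space at every boundary
# next to one of / , : . ! ? ( ) where no space is present yet.
PUNCT_BOUNDARY = r'(?<! )(?=[/,-,:,.,!?()])|(?<=[/,-,:,.,!?()])(?! )'


def tokenize_report(text):
    """Hypothetical helper: split a report into whitespace-separated tokens."""
    return re.sub(PUNCT_BOUNDARY, ' ', text).split()


def build_record(doc_key, text):
    """Hypothetical helper: one DyGIE++ input record, whole report as one sentence."""
    return {"doc_key": doc_key, "sentences": [tokenize_report(text)]}


record = build_record("data/example.txt", "No pleural effusion.")
print(json.dumps(record))
# {"doc_key": "data/example.txt", "sentences": [["No", "pleural", "effusion", "."]]}

# The hyphen is not in the effective character class, so this stays one token.
print(tokenize_report("right-sided"))
# ['right-sided']
```

Because every token index in the later steps refers to this token list, any change to the pattern would shift the start and end offsets of the predicted entities.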
3. run_inference(model_path, cuda)

This step uses AllenNLP: the allennlp predict command reads temp_dygie_input.json and writes the predictions to temp_dygie_output.json.

def run_inference(model_path, cuda):
    """
    Args:
        model_path: Path to the model checkpoint
        cuda: GPU id
    """
    out_path = "./temp_dygie_output.json"
    data_path = "./temp_dygie_input.json"
    os.system(f"allennlp predict {model_path} {data_path} \
            --predictor dygie --include-package dygie \
            --use-dataset-reader \
            --output-file {out_path} \
            --cuda-device {cuda} \
            --silent")
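Since os.system passes a single string to the shell and ignores the exit status here, a path containing spaces or a failed prediction run can go unnoticed. The following is a sketch of one possible alternative, not the RadGraph implementation: it passes the same flags as an argument list to subprocess.run. The function names build_predict_command and run_inference_checked and the checkpoint name model.tar.gz are assumptions for illustration.

```python
import subprocess


def build_predict_command(model_path, cuda,
                          data_path="./temp_dygie_input.json",
                          out_path="./temp_dygie_output.json"):
    """Hypothetical helper: the same CLI call as run_inference(), as a list."""
    return [
        "allennlp", "predict", str(model_path), str(data_path),
        "--predictor", "dygie",
        "--include-package", "dygie",
        "--use-dataset-reader",
        "--output-file", str(out_path),
        "--cuda-device", str(cuda),
        "--silent",
    ]


def run_inference_checked(model_path, cuda):
    """Hypothetical variant: raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_predict_command(model_path, cuda), check=True)


# Only print the command here; running it requires allennlp and dygie.
# "model.tar.gz" is a placeholder checkpoint path.
command = build_predict_command("model.tar.gz", 0)
print(" ".join(command))
```

With an argument list, no shell quoting is needed, and check=True turns a failed allennlp run into an exception instead of a missing or stale output file.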

4. postprocess_reports(), which builds final_dict

It reads temp_dygie_output.json line by line and calls postprocess_individual_report(file, final_dict) to process each report individually.

def postprocess_reports():
    """Post processes all the reports and saves the result in train.json format"""
    final_dict = {}
    file_name = f"./temp_dygie_output.json"
    data = []
    with open(file_name, 'r') as f:
        for line in f:
            data.append(json.loads(line))
    for file in data:
        postprocess_individual_report(file, final_dict)
    return final_dict

5. postprocess_individual_report(file, final_dict, data_source=None)

def postprocess_individual_report(file, final_dict, data_source=None):
    """
    Args:
        file: output dict for individual reports
        final_dict: Dict for storing all the reports
    """
    try:
        temp_dict = {}
        temp_dict['text'] = " ".join(file['sentences'][0])
        n = file['predicted_ner'][0]
        r = file['predicted_relations'][0]
        s = file['sentences'][0]
        temp_dict["entities"] = get_entity(n, r, s)
        temp_dict["data_source"] = data_source
        temp_dict["data_split"] = "inference"
        final_dict[file['doc_key']] = temp_dict
    except:
        print(f"Error in doc key: {file['doc_key']}. Skipping inference on this file")

6. get_entity(n, r, s)

def get_entity(n, r, s):
    """Gets the entities for individual reports

    Args:
        n: list of entities in the report
        r: list of relations in the report
        s: list containing tokens of the sentence

    Returns:
        dict_entity: Dictionary containing the entities in the format similar to train.json
    """
    dict_entity = {}
    rel_list = [item[0:2] for item in r]
    ner_list = [item[0:2] for item in n]
    for idx, item in enumerate(n):
        temp_dict = {}
        start_idx, end_idx, label = item[0], item[1], item[2]
        temp_dict['tokens'] = " ".join(s[start_idx:end_idx+1])
        temp_dict['label'] = label
        temp_dict['start_ix'] = start_idx
        temp_dict['end_ix'] = end_idx
        rel = []
        relation_idx = [i for i, val in enumerate(rel_list) if val == [start_idx, end_idx]]
        for i, val in enumerate(relation_idx):
            obj = r[val][2:4]
            lab = r[val][4]
            try:
                object_idx = ner_list.index(obj) + 1
            except:
                continue
            rel.append([lab, str(object_idx)])
        temp_dict['relations'] = rel
        dict_entity[str(idx+1)] = temp_dict
    return dict_entity
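A small worked example may help to see how spans and relations are turned into the final entity dictionary. The sketch below restates get_entity in condensed form so that it runs on its own, and feeds it a hand-written toy input. The tokens, spans, and labels are invented for illustration (they are not model output), and each entity or relation is given only the fields the function actually reads: [start, end, label] for entities and [start1, end1, start2, end2, label] for relations.

```python
import json


def get_entity(n, r, s):
    """Condensed restatement of get_entity() from inference.py."""
    rel_list = [item[0:2] for item in r]  # subject span of every relation
    ner_list = [item[0:2] for item in n]  # span of every entity
    dict_entity = {}
    for idx, item in enumerate(n):
        start_idx, end_idx, label = item[0], item[1], item[2]
        rel = []
        for rel_pos, subject_span in enumerate(rel_list):
            if subject_span != [start_idx, end_idx]:
                continue  # relation does not start at this entity
            obj, lab = r[rel_pos][2:4], r[rel_pos][4]
            if obj in ner_list:  # skip relations whose object is not an entity
                rel.append([lab, str(ner_list.index(obj) + 1)])
        dict_entity[str(idx + 1)] = {
            "tokens": " ".join(s[start_idx:end_idx + 1]),
            "label": label,
            "start_ix": start_idx,
            "end_ix": end_idx,
            "relations": rel,
        }
    return dict_entity


# Toy input, written by hand for illustration.
tokens = ["left", "pleural", "effusion"]
entities = [[0, 0, "ANAT-DP"], [1, 1, "ANAT-DP"], [2, 2, "OBS-DP"]]
relations = [[0, 0, 1, 1, "modify"], [2, 2, 1, 1, "located_at"]]

result = get_entity(entities, relations, tokens)
print(json.dumps(result, indent=2))
```

In the result, entity IDs are 1-based strings following the order of the entity list, and each relation is stored on its subject entity as [label, object ID]. Here "left" (ID 1) and "effusion" (ID 3) both point at "pleural" (ID 2), which itself has an empty relation list.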