
[Epidemiology] Melodi-Presto: a causal association tool


Date: 2022-12-08 (last modified 2022-12-08) · Tags: epidemiology, causal association tool

Introduction

Melodi-Presto: A fast and agile tool to explore semantic triples derived from biomedical literature [1]

Triples: subject–predicate–object statements.

Data source: SemMedDB, a large open knowledge base.

Access points

  • 🚩 Online tool (Web Application)

  • API

  • Jupyter Notebooks

Fetch the results as JSON through the API, then extract the fields of interest:

```shell
curl -X POST 'https://melodi-presto.mrcieu.ac.uk/api/overlap/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ "x": [ "diabetes" ], "y": [ "coronary heart disease" ]}' > 1.json
```
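The same overlap query can also be issued from R. This is a minimal sketch, assuming the jsonlite and httr packages are installed; `overlap_body()` and `fetch_overlap()` are hypothetical helper names, not part of the Melodi-Presto documentation, and the shape of the parsed response is not assumed here.

```r
# Hypothetical helpers (assumption: jsonlite and httr are installed).
# Build the same JSON body as the curl call: {"x": [...], "y": [...]}
overlap_body <- function(x, y) {
  # trimws() strips stray leading/trailing spaces from the terms;
  # toJSON() keeps length-one vectors as arrays by default
  jsonlite::toJSON(list(x = trimws(x), y = trimws(y)))
}

# POST the body to the overlap endpoint and parse the JSON reply
fetch_overlap <- function(x, y,
                          url = "https://melodi-presto.mrcieu.ac.uk/api/overlap/") {
  resp <- httr::POST(url,
                     body = overlap_body(x, y),
                     httr::content_type_json(),
                     httr::accept_json())
  httr::stop_for_status(resp)
  # The structure of the result depends on what the API returns
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}
```

For the usage example below this would be `res <- fetch_overlap("KRAS", "lung cancer")`. The body is serialized before posting because httr's `encode = "json"` option unboxes length-one vectors, which would send `"x": "KRAS"` instead of the array form used in the curl call.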

Usage example

X: KRAS 
Y: lung cancer

Open question: should the input terms be confirmed in MeSH first?

Reproducing a published study

doi: 10.1093/ije/dyab203 [2]

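{{< note >}}
1. Some content has changed since the paper was published.
2. The object is specified down to "chronic" (chronic kidney disease).
3. The predicate is left unrestricted at first.
4. The subject filter removes CRP, although the paper included it.
5. Has the OR calculation been removed?
6. Remove gene names (from a GTF file) and names in the [UniProt protein name list](https://www.uniprot.org/uniprotkb?facets=model_organism%3A9606&query=reviewed%3Atrue).
7. Add a drug library?
{{< /note >}}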
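Item 6 of the note suggests dropping subjects that match gene names from a GTF annotation or protein names from UniProt, rather than relying only on keywords such as "gene|protein|receptor". A minimal base-R sketch, assuming the GTF follows the Ensembl/GENCODE convention of a `gene_name "..."` attribute in column 9 and that the UniProt names have already been loaded into a character vector; both helper names are hypothetical.

```r
# Hypothetical helper: collect gene names from a GTF file
# (assumption: Ensembl/GENCODE-style attributes such as gene_name "KRAS";)
gtf_gene_names <- function(path) {
  gtf <- utils::read.delim(path, header = FALSE, comment.char = "#",
                           quote = "", stringsAsFactors = FALSE)
  # Column 3 (V3) is the feature type, column 9 (V9) the attribute string
  attrs <- gtf$V9[gtf$V3 == "gene"]
  attrs <- attrs[grepl('gene_name "', attrs, fixed = TRUE)]
  unique(sub('.*gene_name "([^"]+)".*', "\\1", attrs))
}

# Hypothetical helper: drop rows whose Subject matches any name in
# `names_to_drop` exactly, ignoring case
drop_named_subjects <- function(df, names_to_drop) {
  keep <- !(tolower(df$Subject) %in% tolower(names_to_drop))
  df[keep, , drop = FALSE]
}
```

A call would look like `df <- drop_named_subjects(df, c(gtf_gene_names("annotation.gtf"), uniprot_names))`, where the file name and `uniprot_names` are placeholders. Exact matching keeps the filter predictable, but it will not catch synonyms or partial names, so the keyword filter in the script below may still be needed.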
```r
library(openxlsx)

# Read the triples exported from Melodi-Presto
df <- read.xlsx("chronic kidney disease.xlsx", sheet = 1, colNames = TRUE, check.names = FALSE)
str(df$Pval)
df$Pval <- as.numeric(df$Pval)

# Keep triples with P value < 0.005
df <- subset(df, Pval < 0.005)

# Remove triples where the subject is a gene or protein
# [warning: this also deletes CRP, which the paper kept]
df$Subject <- tolower(df$Subject)
idx <- stringr::str_which(df$Subject, "gene|protein|receptor")
df$Subject[idx]
# Guard: df[-integer(0), ] would drop every row when nothing matches
if (length(idx) > 0) df <- df[-idx, ]

# "CAUSES" implies causality, "ASSOCIATED_WITH" implies association,
# and "COEXISTS_WITH" implies co-existence
table(df$Predicate)
df <- subset(df, Predicate %in% c("CAUSES", "ASSOCIATED_WITH", "COEXISTS_WITH"))

# Restrict to triples whose object contains "kidney" or "renal"
table(df$Object)
dplyr::count(df, forcats::fct_lump_n(Object, n = 10))
df$Object <- tolower(df$Object)
idx <- stringr::str_which(df$Object, "kidney|renal")
df$Object[idx]
df <- df[idx, ]

# Further removals: subjects containing "|", "factor" or "peptide"
for (pattern in c("\\|", "factor", "peptide")) {
  idx <- stringr::str_which(df$Subject, pattern)
  print(df$Subject[idx])
  if (length(idx) > 0) df <- df[-idx, ]
}

# Retain only unique risk factors (subjects) to avoid duplicates,
# keeping the row with the highest Count (ties broken by smallest Pval)
df <- dplyr::arrange(df, dplyr::desc(Count), Pval)
df <- df[!duplicated(df$Subject), ]
table(df$Count)
# df <- subset(df, Count > 2)
write.xlsx(df, file = "筛选4.xlsx", colNames = TRUE)

# Enrichment odds ratio, computed from
#  (a) the number of these triples,
#  (b) the total number of triples matched to the query,
#  (c) the total number of these triples in the database,
#  (d) the total number of triples in the database.
# Python (SciPy): stats.fisher_exact([[a, b-a], [c, d-c]])
```
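The script's final comment points to SciPy's `stats.fisher_exact`. A minimal R sketch of the same 2x2 test, assuming a, b, c and d are the four counts defined in that comment; `enrichment_or()` is a hypothetical helper and the example counts are made up for illustration. R's `fisher.test()` reports a conditional maximum-likelihood odds ratio, while SciPy returns the sample odds ratio, so the sketch computes the sample odds ratio directly and takes only the p-value from `fisher.test()`.

```r
# Hypothetical helper: enrichment odds ratio for one triple.
# a, b: counts among the query results; c_db, d_db: counts in the database
enrichment_or <- function(a, b, c_db, d_db) {
  # Same layout as stats.fisher_exact([[a, b-a], [c, d-c]])
  tab <- matrix(c(a, b - a, c_db, d_db - c_db), nrow = 2, byrow = TRUE)
  # Sample (unconditional) odds ratio, the value SciPy reports
  sample_or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
  # Two-sided Fisher's exact test p-value (the default in both R and SciPy)
  p <- stats::fisher.test(tab)$p.value
  list(odds_ratio = sample_or, p_value = p)
}

res <- enrichment_or(a = 5, b = 100, c_db = 50, d_db = 100000)
res$odds_ratio  # 5 * 99950 / (95 * 50), about 105.2
```

Applied row by row to the filtered table, this would give the enrichment OR column that the note asks about, provided the database-wide counts c and d are available.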

NHANES

Caveats: see the article reproduction above.


  1. doi: 10.1093/bioinformatics/btaa726

  2. Trans-ethnic Mendelian-randomization study reveals causal relationships between cardiometabolic factors and chronic kidney disease
