当前位置: 首页 > news >正文

TemplateHit中提取query和hit比对上序列索引的映射字典

template_hits(Sequence[TemplateHit]数据格式)来自结构数据库搜索结果 python运行hhsearch二进制命令的包装器类  映射索引计算:TemplateHit 中含有 indices_query,需要换算成在原始query序列中的index,hit 中indices_hit 需要减去最小index(-1 gap 除外)

import pickle 
import dataclasses
from typing import Optional, List, Sequence, Mapping@dataclasses.dataclass(frozen=True)
class TemplateHit:"""Class representing a template hit."""index: intname: straligned_cols: intsum_probs: Optional[float]query: strhit_sequence: strindices_query: List[int]indices_hit: List[int]### 读入Sequence[TemplateHit]数据
with open('test_pdb_hits.pkl', 'rb') as file:# 使用 pickle.load 从文件中加载对象test_pdb_hits = pickle.load(file)#test_pdb_hits.pkl由python运行hhsearch二进制命令的包装器类 的结果 template_hits 保存得到
#import pickle
#with open('test_pdb_hits.pkl', 'wb') as file:
#  pickle.dump(template_hits, file)def build_query_to_hit_index_mapping(hit_query_sequence: str,hit_sequence: str,indices_hit: Sequence[int],indices_query: Sequence[int],original_query_sequence: str) -> Mapping[int, int]:"""Gets mapping from indices in original query sequence to indices in the hit.hit_query_sequence and hit_sequence are two aligned sequences containing gapcharacters. hit_query_sequence contains only the part of the original querysequence that matched the hit. When interpreting the indices from the .hhr, weneed to correct for this to recover a mapping from original query sequence tothe hit sequence.Args:hit_query_sequence: The portion of the query sequence that is in the .hhrhithit_sequence: The portion of the hit sequence that is in the .hhrindices_hit: The indices for each aminoacid relative to the hit sequenceindices_query: The indices for each aminoacid relative to the original querysequenceoriginal_query_sequence: String describing the original query sequence.Returns:Dictionary with indices in the original query sequence as keys and indicesin the hit sequence as values."""# If the hit is empty (no aligned residues), return empty mappingif not hit_query_sequence:return {}# Remove gaps and find the offset of hit.query relative to original query.hhsearch_query_sequence = hit_query_sequence.replace('-', '')hit_sequence = hit_sequence.replace('-', '')hhsearch_query_offset = original_query_sequence.find(hhsearch_query_sequence)print(f"hhsearch_query_offset:{hhsearch_query_offset}")# Index of -1 used for gap characters. Subtract the min index ignoring gaps.min_idx = min(x for x in indices_hit if x > -1)fixed_indices_hit = [x - min_idx if x > -1 else -1 for x in indices_hit]print(f"fixed_indices_hit:{fixed_indices_hit}")min_idx = min(x for x in indices_query if x > -1)fixed_indices_query = [x - min_idx if x > -1 else -1 for x in indices_query]print(f"fixed_indices_query:{fixed_indices_query}")# Zip the corrected indices, ignore case where both seqs have gap characters.mapping = {}for q_i, q_t in zip(fixed_indices_query, fixed_indices_hit):if q_t != -1 and q_i != -1:if (q_t >= len(hit_sequence) orq_i + hhsearch_query_offset >= len(original_query_sequence)):continuemapping[q_i + hhsearch_query_offset] = q_treturn mappinghit = test_pdb_hits[0]
input_fasta_file = 'Q94K49.fasta'
## 从fasta文件提取 query_sequence(str格式)
query_sequence = ""
with open(input_fasta_file) as f:for line in f.readlines():if line.startswith(">"):continuequery_sequence += line.strip()print(f"hit.query:{hit.query}")
print(f"hit.hit_sequence:{hit.hit_sequence}")
print(f"hit.indices_hit:{hit.indices_hit}")
print(f"hit.indices_query:{hit.indices_query}")
print(f"query_sequence:{query_sequence}")##query和hit序列比对上的氨基酸在各自多肽链上索引的对应字典
mapping = build_query_to_hit_index_mapping(hit.query, hit.hit_sequence, hit.indices_hit, hit.indices_query,query_sequence)
print(mapping)

http://www.lryc.cn/news/249758.html

相关文章:

  • 富必达API:一站式无代码开发集成电商平台、CRM和营销系统
  • 聊聊接口最大并发处理数
  • 6.如何利用LIO-SAM生成可用于机器人/无人机导航的二维/三维栅格地图--以octomap为例
  • 【多传感器融合】BEVFusion: 激光雷达和视觉融合框架 NeurIPS 2022
  • kafka中的常见问题处理
  • HarmonyOS(八)——@Styles装饰器:定义组件重用样式
  • 手写VUE后台管理系统5 - 整合状态管理组件pinia
  • 解决webpack打包生成gz格式css/js文件没法在nginx使用的问题--全网唯一正确
  • 传统算法: Pygame 实现快速排序
  • HarmonyOS入门开发(三) 持久化存储Preferences
  • 类和对象——(3)再识对象
  • 【UGUI】实现背包的常用操作
  • 单机zk安装与zk四字命令
  • matlab导入excel数据两种常见的方法
  • 华为全屋智能5.0,无为而“智”
  • Flask 实现Token认证机制
  • MATLAB 和 Simulink 官方文档下载地址
  • 【Element】el-switch开关 点击弹窗确认框时状态先改变----点击弹窗取消框失效
  • Java 中最常用的设计模式之一,工厂模式模式的写法,
  • HTML的学习
  • JS设计模式 — 行为委托
  • Microsoft Expression Web - 网页布局
  • Java SpringBoot Controller常见写法
  • 【驱动】SPI驱动分析(五)-模拟SPI驱动
  • 人工智能_机器学习056_拉格朗日乘子法原理推导_公式由来详解_原理详解---人工智能工作笔记0096
  • 记RocketMQ本地开发环境搭建始末
  • 2023年全国职业院校技能大赛“ 信息安全管理与评估” 测试题2
  • flutter开发实战-readmore长文本展开和收缩控件
  • 如何使用简单的分支策略来保护您的 Git 项目
  • vue3的 nextTick()的使用