当前位置: 首页 > news >正文

AttentionFreeTransformer 源码解析(一):AFTFull、AFTSimple、AFTLocal

我觉得源码写的很好懂,我就不加注释了,直接上计算流程图。

AFTFull

在这里插入图片描述

class AFTFull(nn.Module):def __init__(self, max_seqlen, dim, hidden_dim=64):super().__init__()'''max_seqlen: the maximum number of timesteps (sequence length) to be fed indim: the embedding dimension of the tokenshidden_dim: the hidden dimension used inside AFT FullNumber of heads is 1 as done in the paper'''self.dim = dimself.hidden_dim = hidden_dimself.to_q = nn.Linear(dim, hidden_dim)self.to_k = nn.Linear(dim, hidden_dim)self.to_v = nn.Linear(dim, hidden_dim)self.project = nn.Linear(hidden_dim, dim)self.wbias = nn.Parameter(torch.Tensor(max_seqlen, max_seqlen))nn.init.xavier_uniform_(self.wbias)def forward(self, x):B, T, _ = x.shapeQ = self.to_q(x).view(B, T, self.hidden_dim)K = self.to_k(x).view(B, T, self.hidden_dim)V = self.to_v(x).view(B, T, self.hidden_dim)temp_wbias = self.wbias[:T, :T].unsqueeze(0) # sequences can still be variable length'''From the paper'''Q_sig = torch.sigmoid(Q)temp = torch.exp(temp_wbias) @ torch.mul(torch.exp(K), V)weighted = temp / (torch.exp(temp_wbias) @ torch.exp(K))Yt = torch.mul(Q_sig, weighted)Yt = Yt.view(B, T, self.hidden_dim)Yt = self.project(Yt)return Yt

AFTSimple

在这里插入图片描述

class AFTSimple(nn.Module):def __init__(self, max_seqlen, dim, hidden_dim=64):super().__init__()'''max_seqlen: the maximum number of timesteps (sequence length) to be fed indim: the embedding dimension of the tokenshidden_dim: the hidden dimension used inside AFT FullNumber of Heads is 1 as done in the paper.'''self.dim = dimself.hidden_dim = hidden_dimself.to_q = nn.Linear(dim, hidden_dim)self.to_k = nn.Linear(dim, hidden_dim)self.to_v = nn.Linear(dim, hidden_dim)self.project = nn.Linear(hidden_dim, dim)def forward(self, x):B, T, _ = x.shapeQ = self.to_q(x).view(B, T, self.hidden_dim)K = self.to_k(x).view(B, T, self.hidden_dim)V = self.to_v(x).view(B, T, self.hidden_dim)'''From the paper'''weights = torch.mul(torch.softmax(K, 1), V).sum(dim=1, keepdim=True)Q_sig = torch.sigmoid(Q)Yt = torch.mul(Q_sig, weights)Yt = Yt.view(B, T, self.hidden_dim)Yt = self.project(Yt)return Yt

AFTLocal

在这里插入图片描述

class AFTLocal(nn.Module):def __init__(self, max_seqlen, dim, hidden_dim=64, s=256):super().__init__()'''max_seqlen: the maximum number of timesteps (sequence length) to be fed indim: the embedding dimension of the tokenshidden_dim: the hidden dimension used inside AFT Fulls: the window size used for AFT-Local in the paperNumber of heads is 1 as done in the paper'''self.dim = dimself.hidden_dim = hidden_dimself.to_q = nn.Linear(dim, hidden_dim)self.to_k = nn.Linear(dim, hidden_dim)self.to_v = nn.Linear(dim, hidden_dim)self.project = nn.Linear(hidden_dim, dim)self.wbias = nn.Parameter(torch.Tensor(max_seqlen, max_seqlen))self.max_seqlen = max_seqlenself.s = snn.init.xavier_uniform_(self.wbias)def forward(self, x):B, T, _ = x.shapeQ = self.to_q(x).view(B, T, self.hidden_dim)K = self.to_k(x).view(B, T, self.hidden_dim)V = self.to_v(x).view(B, T, self.hidden_dim)self.wbias = nn.Parameter(torch.Tensor([[self.wbias[i][j] if math.fabs(i-j) < self.s else 0 for j in range(self.max_seqlen)] for i in range(self.max_seqlen)]))temp_wbias = self.wbias[:T, :T].unsqueeze(0) # sequences can still be variable length'''From the paper'''Q_sig = torch.sigmoid(Q)temp = torch.exp(temp_wbias) @ torch.mul(torch.exp(K), V)weighted = temp / (torch.exp(temp_wbias) @ torch.exp(K))Yt = torch.mul(Q_sig, weighted)Yt = Yt.view(B, T, self.hidden_dim)Yt = self.project(Yt)return Yt
http://www.lryc.cn/news/118708.html

相关文章:

  • C++ 计算 拟合优度R^2
  • Springboot-Retrofit HTTP工具框架快速使用
  • 微信小程序实现人脸识别(从一个没有开通人脸核身的小程序跳转到要给开通人脸核身的小程序,进行人脸识别后再跳转回来)
  • CSS-grid布局
  • 【JavaEE进阶】Bean 作用域和生命周期
  • 3分钟自建查分系统?现在每个人都可以实现了
  • 关于APP备案、小程序备案的问题,如何备案?
  • git上传代码后,如何清空历史日志以及文件操作,重新上传?以及上传代码
  • 超导热催生meme,换汤不换药的投机轮回
  • 【HashMap】 73. 矩阵置零
  • Vue-2.nodejs的介绍和安装
  • 分别用Vue和Java来实现的风靡一时的2048 游戏
  • echarts甘特图 一个值多条线
  • 多态性说明
  • 2023-08-04 LeetCode每日一题(不同路径 III)
  • 腾讯云服务器地域怎么选?可用区是什么?
  • 第一百二十三天学习记录:C++提高:STL-vector容器(下)(黑马教学视频)
  • 谈谈Spring与字节码生成技术
  • Java数组详解 -- 基础知识与常用操作
  • (统计学习方法|李航)第五章 决策树——一二三节:决策树模型与学习,特征选择,决策树的生成,
  • qt lamda表达式及捕获变量列表符号说明及示例
  • 第十六章、【Linux】程序管理与SELinux初探
  • ElasticSearch索引生命周期管理--DELETE
  • sentinel简单使用
  • C#小轮子:自动连续Ping网络地址
  • react入门笔记
  • 记录--前端重新部署如何通知用户
  • WPS的excel表格单元格拖动数字日期等 不自增原因
  • 2308C++简单异步懒
  • Linux常规操作命令