当前位置: 首页 > news >正文

[DRAFT] LLVM ThinLTO原理分析

我们在《论文阅读:ThinLTO: Scalable and Incremental LTO》中介绍了ThinLTO论文的主要思想,这里我们介绍下LLVM ThinLTO是如何实现的。本文主要分为如下几个部分:

  • LLVM ThinLTO Object 含有哪些内容?
  • LLVM ThinLTO 是如何做优化的?
  • LLVM ThinLTO 能够enable哪些优化?

LLVM ThinLTO Objects都包含了哪些?

继续使用 Example of link time optimization 中的例子进行分析,在《LLVM full LTO 学习笔记》中我们通过 magic number 作为切入点,简单分析了 full lto 的过程。下面按照这个路子继续该分析

$ clang -flto=thin -c a.c -o a_lto.o
$ clang -flto=thin -c main.c -o main_lto.o
$ hexdump a_lto.o | head
0000000 4342 dec0 1435 0000 0005 0000 0c62 2430
0000010 594d 66be fb8d 4fb4 c81b 4424 3201 0005
0000020 0c21 0000 0266 0000 020b 0021 0002 0000
0000030 0016 0000 8107 9123 c841 4904 1006 3932
0000040 0192 0c84 0525 1908 041e 628b 1080 0245
0000050 9242 420b 1084 1432 0838 4b18 320a 8842
0000060 7048 21c4 4423 8712 108c 9241 6402 08c8
0000070 14b1 4320 8846 c920 3201 8442 2a18 2a28
0000080 3190 b07c 915c c420 00c8 0000 2089 0000
0000090 000e 0000 2232 0908 6220 0046 2b21 9824

我们可以看到 magic number 为 4342 dec0,说明对于 thin LTO 的 objects,其文件格式还是 bitcode file 。通过阅读 ThinLTO 的文档,发现其实文档中早已经说的很详细了。

In ThinLTO mode, as with regular LTO, clang emits LLVM bitcode after the compile phase. The ThinLTO bitcode is augmented with a compact summary of the module. During the link step, only the summaries are read and merged into a combined summary index, which includes an index of function locations for later cross-module function importing. Fast and efficient whole-program analysis is then performed on the combined summary index.

使用 llvm-dis a_lto.o 得到其可读的 IR。我们将其与 full lto 得到的 IR 进行对比后发现,两者差异极小,主要在于最后面的 summary 部分。以 a_lto.o 进行 thinLTO 和 full LTO 的对比如下。

// ---------------- Thin LTO ----------------//
!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.ident = !{!4}!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"uwtable", i32 1}
!2 = !{i32 7, !"frame-pointer", i32 2}
!3 = !{i32 1, !"EnableSplitLTOUnit", i32 0}
!4 = !{!"clang version 14.0.0 (https://github.com/llvm/llvm-project.git 58e7bf78a3ef724b70304912fb3bb66af8c4a10c)"}^0 = module: (path: "a_lto.o", hash: (3489747275, 1762444854, 1461358598, 2667786215, 1835806708))
^1 = gv: (name: "foo2", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), refs: (writeonly ^2)))) ; guid = 2494702099028631698
^2 = gv: (name: "i", summaries: (variable: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), varFlags: (readonly: 1, writeonly: 1, constant: 0)))) ; guid = 2708120569957007488
^3 = gv: (name: "foo1", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 13, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^5)), refs: (readonly ^2)))) ; guid = 7682762345278052905
^4 = gv: (name: "foo4") ; guid = 11564431941544006930
^5 = gv: (name: "foo3", summaries: (function: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^4))))) ; guid = 17367728344439303071
^6 = blockcount: 5// ---------------- Full LTO ----------------//
!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"uwtable", i32 1}
!2 = !{i32 7, !"frame-pointer", i32 2}
!3 = !{i32 1, !"ThinLTO", i32 0}
!4 = !{i32 1, !"EnableSplitLTOUnit", i32 1}
!5 = !{!"clang version 14.0.0 (https://github.com/llvm/llvm-project.git 58e7bf78a3ef724b70304912fb3bb66af8c4a10c)"}^0 = module: (path: "a_lto.o", hash: (0, 0, 0, 0, 0))
^1 = gv: (name: "foo2", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), refs: (^2)))) ; guid = 2494702099028631698
^2 = gv: (name: "i", summaries: (variable: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), varFlags: (readonly: 1, writeonly: 1, constant: 0)))) ; guid = 2708120569957007488
^3 = gv: (name: "foo1", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 13, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^5)), refs: (^2)))) ; guid = 7682762345278052905
^4 = gv: (name: "foo4") ; guid = 11564431941544006930
^5 = gv: (name: "foo3", summaries: (function: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 1, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 2, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 1, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0), calls: ((callee: ^4))))) ; guid = 17367728344439303071
^6 = flags: 8
^7 = blockcount: 5

我们将重点的差别进行 highlight,

DifferenceThin LTOFull LTO
Module Flags!3 = !{i32 1, !"ThinLTO", i32 0}
Global Value Summary module ^0^0 = module: (path: "a_lto.o", hash: (3489747275, 1762444854, 1461358598, 2667786215, 1835806708))^0 = module: (path: "a_lto.o", hash: (0, 0, 0, 0, 0))
Global Value Summary foo2 ^1- notEligibleToImport: 0
- refs: (writeonly ^2)
- notEligibleToImport: 1
- refs: (^2)
Global Value Summary i ^2- notEligibleToImport: 0notEligibleToImport: 1
Global Value Summary foo1 ^3- notEligibleToImport: 0
- refs: (readonly ^2)
- notEligibleToImport: 1
- refs: (^2)
Global Value Summary foo3 ^5notEligibleToImport: 0notEligibleToImport: 1

通过 Metadata 知道,! 后面表示的是 metadata,^表示的是 global value summary。

All metadata are identified in syntax by an exclamation point (‘!’).
Compiling with ThinLTO causes the building of a compact summary of the module that is emitted into the bitcode. The summary is emitted into the LLVM assembly and identified in syntax by a caret (‘^’).

通过 Module Flags Metadata 来对 !3 = !{i32 1, !"ThinLTO", i32 0} 进行解释。module flags metadata 是一组三元组 triplets

  • The first element is a behavior flag, which specifies the behavior when two (or more) modules are merged together.
  • The second element is a metadata string that is a unique ID for the metadata.
  • The third element is the value of the flag.
!3 = !{i32 1, !"ThinLTO", i32 0}

thin lto
ThinLTO 的值为 0, 表示非 ThinLTO,另外一个表明是否为 ThinLTO 或者 FullLTO,GLOBALVAL_SUMMARY_BLOCK 默认是 thin lto。

$ llvm-bcanalyzer -dump a_full_lto.oBlock ID #24 (FULL_LTO_GLOBALVAL_SUMMARY_BLOCK):Num Instances: 1Total Size: 789b/98.62B/24WPercent of file: 3.4924%Num SubBlocks: 0Num Abbrevs: 6Num Records: 7Percent Abbrevs: 57.1429%Record Histogram:Count    # Bits     b/Rec   % Abv  Record Kind3       218      72.7  100.00  PERMODULE1        22                    BLOCK_COUNT1        22                    FLAGS1        22                    VERSION1        38            100.00  PERMODULE_GLOBALVAR_INIT_REFS
$ llvm-bcanalyzer -dump a_thin_lto.oBlock ID #20 (GLOBALVAL_SUMMARY_BLOCK):Num Instances: 1Total Size: 789b/98.62B/24WPercent of file: 3.4727%Num SubBlocks: 0Num Abbrevs: 6Num Records: 7Percent Abbrevs: 57.1429%Record Histogram:Count    # Bits     b/Rec   % Abv  Record Kind3       218      72.7  100.00  PERMODULE1        22                    BLOCK_COUNT1        22                    FLAGS1        22                    VERSION1        38            100.00  PERMODULE_GLOBALVAR_INIT_REFS

在有 global value summary 的情况下,默认是 thin lto,除非 ThinLTO module metadata flag 为 0 。

/// Emit the per-module summary section alongside the rest of
/// the module's bitcode.
void ModuleBitcodeWriterBase::writePerModuleGlobalValueSummary() {// By default we compile with ThinLTO if the module has a summary, but the// client can request full LTO with a module flag.bool IsThinLTO = true;if (auto *MD =mdconst::extract_or_null<ConstantInt>(M.getModuleFlag("ThinLTO")))IsThinLTO = MD->getZExtValue();Stream.EnterSubblock(IsThinLTO ? bitc::GLOBALVAL_SUMMARY_BLOCK_ID: bitc::FULL_LTO_GLOBALVAL_SUMMARY_BLOCK_ID,4);// ...
}

RFC

https://lists.llvm.org/pipermail/llvm-dev/2015-May/085526.html
https://sites.google.com/site/llvmthinlto/

Patches

https://reviews.llvm.org/D13107?id=35761

Function Importer

https://reviews.llvm.org/D14914
https://reviews.llvm.org/D18343

llvm-opt2/llvm-opt相关

关于 SyntheticCount的讨论

  • https://lists.llvm.org/pipermail/llvm-dev/2017-December/119701.html
  • https://reviews.llvm.org/D43521?id=135117#inline-388028
/// Compute synthetic function entry counts.
void computeSyntheticCounts(ModuleSummaryIndex &Index);

相关术语

  • BFI, block frequency inforamtion
  • BPI,probability information
  • CGSCC,call graph scc analysis,https://lists.llvm.org/pipermail/llvm-dev/2016-June/100792.html
http://www.lryc.cn/news/198752.html

相关文章:

  • 使用Gitlab构建简单流水线CI/CD
  • 【AIGC核心技术剖析】用于高效 3D 内容创建生成(从单视图图像生成高质量的纹理网格)
  • nginx平滑升级添加echo模块、localtion配置、rewrite配置
  • 系统架构师备考倒计时19天(每日知识点)
  • 谈谈 Redis 如何来实现分布式锁
  • .NET 6.0 Web API Hangfire
  • 基于java的校园论坛系统,ssm+jsp,Mysql数据库,前台用户+后台管理,完美运行,有一万多字论文
  • Django小白开发指南
  • 保序回归与金融时序数据
  • 基于单片机设计的家用自来水水质监测装置
  • ubuntu20.04运用startup application开机自启动python程序
  • SpringBoot整合Caffeine实现缓存
  • DVWA-弱会话IDS
  • 【C++中cin、cin.get()、cin.getline()、getline() 的区别】
  • SSH连接华为交换机慢
  • Web攻防03_MySQL注入_数据请求
  • JS加密/解密那些必须知道的事儿
  • 搭建伪分布式Hadoop
  • 【C++】特殊类的设计(只在堆、栈创建对象,单例对象)
  • 分类预测 | MATLAB实现基于GRU-AdaBoost门控循环单元结合AdaBoost多输入分类预测
  • 【Spring Boot项目】根据用户的角色控制数据库访问权限
  • EthernetIP 转MODBUS RTU协议网关连接FANUC机器人作为EthernetIP通信从站
  • 如何注册微信小程序
  • 移动App安全检测的必要性,app安全测试报告的编写注意事项
  • DVWA-JavaScript Attacks
  • 算法通关村第二关|白银|链表反转拓展【持续更新】
  • 开发者职场“生存状态”大调研报告分析 - 第四版
  • 代码与细节(一)
  • AI绘画使用Stable Diffusion(SDXL)绘制中国古代神兽
  • 老卫带你学---leetcode刷题(148. 排序链表)