
[TPAMI 2022] HGNN+: General Hypergraph Neural Networks

Paper link: HGNN+: General Hypergraph Neural Networks | IEEE Journals & Magazine | IEEE Xplore

The English here is typed by hand, summarizing and paraphrasing the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments. This post is written as personal notes, so read with caution.

目录

1. 心得

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. Graph Neural Networks

2.3.2. Hypergraph Learning

2.4. Preliminaries of Hypergraphs

2.5. The Framework of Hypergraph Neural Network HGNN+

2.5.1. Hypergraph Modeling

2.5.2. Hypergraph Convolution

2.6. Discussions

2.6.1. Hypergraph vs. Graph

2.6.2. HGNN/HGNN+ vs. GNN

2.6.3. HGNN vs. HGNN+

2.7. Experiments and Discussions

2.7.1. Experimental Settings

2.7.2. Vertex Classification on the Data With Graph Structure

2.7.3. Vertex Classification on the Data Without Graph Structure

2.7.4. Vertex Classification on the Data With Hypergraph Structure

2.7.5. Visualization

2.8. THU-DeepHypergraph: An Open Toolbox of the HGNN+ Framework

2.9. Conclusion

3. Reference

1. 心得

(1) A classic that is still worth reading.

2. 论文逐段精读

2.1. Abstract

        ①Limitation of GNNs: feature extraction on multi-modal/multi-type data

        ②Limitation of existing HGNNs: weights are shared across modalities/types

        ③So, they proposed HGNN+ and THU-DeepHypergraph toolbox

2.2. Introduction

        ①High-order relationships:

        ②Framework of HGNN+:

        ③The concepts of graph and hypergraph:

        ④Hyperedge groups: pairwise edges, k-hop neighbors, attributes, and feature-space neighbors

        ⑤Advantages of HGNN+: not only adaptive aggregation of hyperedge groups, but also extension to directed hypergraphs

2.3. Related Work

2.3.1. Graph Neural Networks

        ①Lists spectral-based and spatial-based GNNs

2.3.2. Hypergraph Learning

        ①Lists the development of hypergraph learning and hypergraph neural networks

2.4. Preliminaries of Hypergraphs

        ①Notations:

For a hypergraph \mathcal{G}=(\mathcal{V},\mathcal{E},\mathbf{W}), its |\mathcal{V}|\times|\mathcal{E}| incidence matrix \mathbf{H} can be written as:

\mathbf{H}(v,e)=\begin{cases}1, & \text{if } v\in e \\ 0, & \text{otherwise}\end{cases}

and the initial node features are \mathbf{X}^0=\{x_1^0,x_2^0,\cdots,x_N^0\},x_i^0\in\mathbb{R}^{C_0}, where C_0 denotes the feature dimension
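
As a concrete illustration (my own sketch, not from the paper), the incidence matrix can be built from a list of hyperedges, each given as a set of vertex indices:

import numpy as np

def build_incidence(num_vertices, hyperedges):
    # H[v, e] = 1 if vertex v belongs to hyperedge e, else 0
    H = np.zeros((num_vertices, len(hyperedges)))
    for e, edge in enumerate(hyperedges):
        for v in edge:
            H[v, e] = 1.0
    return H

# 5 vertices, 3 hyperedges; unlike a graph edge, a hyperedge may join any number of vertices
H = build_incidence(5, [{0, 1, 2}, {1, 3}, {2, 3, 4}])
print(H.shape)  # (|V|, |E|) = (5, 3)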

        ②Optimization goal in the node classification task:

\arg\min_f\left\{\mathcal{R}_{emp}(f)+\Omega(f)\right\}

where \Omega(f) is the regularizer on the hypergraph, \mathcal{R}_{emp}(f) denotes the supervised empirical loss, and f(\cdot) is the classification function

        ③Regularizer \Omega(f) can be represented by:

\Omega(f)=\frac{1}{2}\sum_{e\in\mathcal{E}}\sum_{\{u,v\}\in\mathcal{V}}\frac{w(e)\mathbf{H}(u,e)\mathbf{H}(v,e)}{\delta(e)}\left(\frac{f(u)}{\sqrt{d(u)}}-\frac{f(v)}{\sqrt{d(v)}}\right)^{2}
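
To make the notation concrete, here is a minimal numpy sketch (my own, assuming d(v)=\sum_e w(e)\mathbf{H}(v,e) and \delta(e)=\sum_v\mathbf{H}(v,e)) that evaluates \Omega(f) directly from the definition:

import numpy as np

def omega(H, w, f):
    # d(v): weighted vertex degree; delta(e): hyperedge degree
    d = H @ w
    delta = H.sum(axis=0)
    g = f / np.sqrt(d)  # normalized signal f(v)/sqrt(d(v))
    total = 0.0
    for e in range(H.shape[1]):
        verts = np.nonzero(H[:, e])[0]
        for u in verts:
            for v in verts:
                total += 0.5 * w[e] / delta[e] * (g[u] - g[v]) ** 2
    return total  # equals f^T (I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}) f

H = np.array([[1, 0], [1, 1], [0, 1]], float)  # 3 vertices, 2 hyperedges
print(omega(H, w=np.ones(2), f=np.array([1.0, 0.0, -1.0])))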

2.5. The Framework of Hypergraph Neural Network HGNN+

2.5.1. Hypergraph Modeling

(1)Hyperedge Group Generation

        ①When the data comes with a graph structure:

(a top) pairwise edge

Similar to a traditional graph, each edge becomes a two-vertex hyperedge:

\mathcal{E}_{\mathrm{pair}}=\{\{v_i,v_j\}\mid(v_i,v_j)\in\mathcal{E}_s\}

(a bottom) k-hop neighbor

The k-hop neighbor is:

N_{\mathrm{hop}_k}(v)=\{u\mid\mathbf{A}_{uv}^k\neq0,u\in\mathcal{V}_s\}, where k\in\left[2,n_v\right]. The hyperedge group can be represented by:

\mathcal{E}_{\mathrm{hop}_k}=\{N_{\mathrm{hop}_k}(v)\mid v\in\mathcal{V}_s\}
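A small numpy sketch of the k-hop construction (my own illustration): one hyperedge per vertex, collecting every u with \mathbf{A}_{uv}^k\neq0:

import numpy as np

def khop_hyperedges(A, k):
    # one hyperedge per vertex v: all u connected to v by a length-k walk
    Ak = np.linalg.matrix_power(A, k)
    return [set(np.flatnonzero(Ak[:, v])) for v in range(A.shape[0])]

# path graph 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(khop_hyperedges(A, 2))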

        ②When the data comes without a graph structure:

(b top) attributes

Hyperedges are constructed from shared attributes (geo-location, time, and other specific information). For example, if each vertex is a paper, all survey papers share one hyperedge, all research papers share another, all workshop papers share yet another, and so on. The construction of the hyperedge group:

\mathcal{E}_\mathrm{attribute}=\{N_\mathrm{att}(a)\mid a\in\mathcal{A}\}
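
A sketch of the attribute construction (my own toy example): one hyperedge per attribute value, joining every vertex that carries it:

from collections import defaultdict

def attribute_hyperedges(vertex_attrs):
    # vertex_attrs[v] is the attribute value of vertex v
    groups = defaultdict(set)
    for v, a in enumerate(vertex_attrs):
        groups[a].add(v)
    return list(groups.values())

# papers grouped by type: all surveys share one hyperedge, etc.
print(attribute_hyperedges(["survey", "research", "survey", "workshop"]))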

(b bottom) feature

k nearest neighbors in feature space, or all vertices within distance d:

\begin{cases}\mathcal{E}_{\mathrm{feature}}^{\mathrm{KNN}_k}=\{N_{\mathrm{KNN}_k}(v)\mid v\in\mathcal{V}\} \\ \mathcal{E}_{\mathrm{feature}}^{\mathrm{distance}_d}=\{N_{\mathrm{dis}_d}(v)\mid v\in\mathcal{V}\}\end{cases}
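
A numpy sketch of the KNN variant (my own; brute-force Euclidean distances, with each vertex included in its own hyperedge):

import numpy as np

def knn_hyperedges(X, k):
    # X: (|V|, C) feature matrix; one hyperedge per vertex: v plus its k nearest neighbors
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(dist, axis=1)  # column 0 is v itself (distance 0)
    return [set(order[v, :k + 1]) for v in range(len(X))]

X = np.random.rand(6, 4)
print(knn_hyperedges(X, k=2))  # each hyperedge has k + 1 = 3 vertices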

(2)Combination of Hyperedge Groups

        ①Coequal fusion of each hyperedge group:

\mathbf{H}=\mathbf{H}_1\|\mathbf{H}_2\|\cdots\|\mathbf{H}_K

        ②Adaptive fusion:

\left\{ \begin{array} {l}\mathbf{w}_{k}=\operatorname{copy}(\operatorname{sigmoid}(w_{k}),M_{k}) \\ \mathbf{W}=\operatorname{diag}(\mathbf{w}_{1}^{1},\cdots,\mathbf{w}_{1}^{M_{1}},\cdots,\mathbf{w}_{K}^{1},\cdots,\mathbf{w}_{K}^{M_{K}}) \\ \mathbf{H}=\mathbf{H}_{1}\|\mathbf{H}_{2}\|\cdots\|\mathbf{H}_{K} \end{array}\right.

where w_k\in\mathbb{R} denotes a trainable parameter shared by all hyperedges inside a specified hyperedge group k; \operatorname{copy}(a,b) returns a vector of size b filled with the value a; \mathbf{W}\in\mathbb{R}^{M\times M} denotes the diagonal hyperedge weight matrix, with M=\sum_k M_k the total number of hyperedges.
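
A minimal PyTorch sketch of the adaptive fusion (my own reading of the three equations: one raw scalar w_k per group, squashed by a sigmoid and copied to that group's M_k hyperedges):

import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, num_groups):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(num_groups))  # one trainable w_k per group

    def forward(self, H_list):  # H_list: K incidence matrices, each (|V|, M_k)
        H = torch.cat(H_list, dim=1)  # H = H_1 || H_2 || ... || H_K
        w = torch.cat([torch.sigmoid(wk).repeat(Hk.shape[1])  # copy(sigmoid(w_k), M_k)
                       for wk, Hk in zip(self.w, H_list)])
        return H, torch.diag(w)  # W: diagonal (M x M) hyperedge weight matrix

# usage: two hyperedge groups on 5 vertices
fusion = AdaptiveFusion(num_groups=2)
H, W = fusion([torch.rand(5, 3).round(), torch.rand(5, 4).round()])  # H: (5, 7), W: (7, 7)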

2.5.2. Hypergraph Convolution

(1)Spectral Convolution on Hypergraph

        ①The spectral convolution of signal \mathbf{x} and filter \mathbf{g} can be denoted as:

\mathbf{g}\star\mathbf{x}=\mathbf{\Phi}((\mathbf{\Phi}^\top\mathbf{g})\odot(\mathbf{\Phi}^\top\mathbf{x}))=\mathbf{\Phi}g(\mathbf{\Lambda})\mathbf{\Phi}^\top\mathbf{x}

where \odot denotes the element-wise (Hadamard) product, g(\mathbf{\Lambda})=\mathrm{diag}(g(\lambda_1),\ldots,g(\lambda_n)), and \mathbf{\Phi} is the eigenvector matrix from the eigendecomposition of the hypergraph Laplacian \mathbf{\Delta}=\mathbf{\Phi}\mathbf{\Lambda}\mathbf{\Phi}^\top

        ②Further simplify it with a Chebyshev polynomial expansion:

\mathbf{g}\star\mathbf{x}\approx\sum_{k=0}^K\theta_kT_k(\tilde{\mathbf{\Delta}})\mathbf{x}

        ③Limit the Chebyshev expansion to K=1:

\mathbf{g}\star\mathbf{x}\approx\theta_0\mathbf{x}-\theta_1\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2}\mathbf{x}

        ④Reduce the parameters to avoid overfitting and simplify the graph convolution:

\begin{cases}\theta_1=-\frac{1}{2}\theta \\ \theta_0=\frac{1}{2}\theta\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2}\end{cases}

\begin{gathered}\mathbf{g}\star\mathbf{x}\approx\frac{1}{2}\theta\mathbf{D}_v^{-1/2}\mathbf{H}(\mathbf{W}+\mathbf{I})\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2}\mathbf{x} \\ \approx\theta\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2}\mathbf{x}\end{gathered}
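
Putting the simplified convolution into code: a minimal PyTorch HGNN layer (my own sketch with dense matrices, assuming every vertex and hyperedge has nonzero degree):

import torch
import torch.nn as nn

class HGNNConv(nn.Module):
    # X' = sigma(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta)
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, X, H, w):  # H: (|V|, |E|) incidence; w: (|E|,) hyperedge weights
        dv = (H * w).sum(dim=1)               # d(v) = sum_e w(e) H(v, e)
        de = H.sum(dim=0)                     # delta(e) = sum_v H(v, e)
        Dv = torch.diag(dv.pow(-0.5))
        P = Dv @ H @ torch.diag(w / de) @ H.t() @ Dv
        return torch.relu(P @ self.theta(X))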

(2)General Spatial Convolution on Hypergraph

        ①Inter-Neighbor Relation:

N=\{(v,e)\mid\mathbf{H}(v,e)=1,\ v\in\mathcal{V}\ \text{and}\ e\in\mathcal{E}\}

and the vertex inter-neighbor set of hyperedge is defined as:

\mathcal{N}_v(e)=\{v\mid vNe,\ v\in\mathcal{V}\ \text{and}\ e\in\mathcal{E}\}

the hyperedge inter-neighbor set of vertex is defined as:

\mathcal{N}_e(v)=\{e\mid vNe,\ v\in\mathcal{V}\ \text{and}\ e\in\mathcal{E}\}

        ②Spatial hypergraph convolution (two stages: vertex → hyperedge, then hyperedge → vertex):

\begin{cases}m_\beta^t=\sum_{\alpha\in\mathcal{N}_v(\beta)}M_v^t(x_\alpha^t) \\ y_\beta^t=U_e^t(w_\beta,m_\beta^t) \\ m_\alpha^{t+1}=\sum_{\beta\in\mathcal{N}_e(\alpha)}M_e^t(x_\alpha^t,y_\beta^t) \\ x_\alpha^{t+1}=U_v^t(x_\alpha^t,m_\alpha^{t+1})\end{cases}

where m_\beta^t is the message of hyperedge \beta\in\mathcal{E}; y_\beta^t is the feature of hyperedge \beta, an element of the hyperedge feature set Y^t=\{y_1^t,y_2^t,\cdots,y_M^t\},y_i^t\in\mathbb{R}^{C_t}, in layer t; and M_v^t(\cdot),U_e^t(\cdot),M_e^t(\cdot),U_v^t(\cdot) are the vertex message, hyperedge update, hyperedge message, and vertex update functions of the t-th layer

(3)HGNN+ Convolution Layer Configurations

        ①Functions in HGNN+:

\begin{cases}M_v^t(x_\alpha^t)=\frac{x_\alpha^t}{|\mathcal{N}_v(\beta)|} \\ U_e^t(w_\beta,m_\beta^t)=w_\beta\cdot m_\beta^t \\ M_e^t(x_\alpha^t,y_\beta^t)=\frac{y_\beta^t}{|\mathcal{N}_e(\alpha)|} \\ U_v^t(x_\alpha^t,m_\alpha^{t+1})=\sigma(m_\alpha^{t+1}\cdot\Theta^t)\end{cases}

        ②Hypergraph convolution in matrix form:

\mathbf{X}^{t+1}=\sigma(\mathbf{D}_v^{-1}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{X}^t\mathbf{\Theta}^t)
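
The matrix form maps directly onto the two message-passing stages; a minimal PyTorch sketch (mine) that computes hyperedge features first and then scatters them back to vertices:

import torch
import torch.nn as nn

class HGNNPlusConv(nn.Module):
    # X' = sigma(Dv^{-1} H W De^{-1} H^T X Theta)
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, X, H, w):
        de = H.sum(dim=0)                    # |N_v(e)|: vertices per hyperedge
        dv = (H * w).sum(dim=1)              # d(v): sum of incident hyperedge weights
        Y = torch.diag(w / de) @ H.t() @ X   # stage 1: vertex -> hyperedge features
        return torch.relu(torch.diag(1.0 / dv) @ H @ self.theta(Y))  # stage 2: hyperedge -> vertex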

2.6. Discussions

2.6.1. Hypergraph vs. Graph

        ①"from the random walks’ aspect, a hypergraph with edge-independent vertex weights is equivalent to a weighted graph, and a hypergraph with edge-dependent vertex weights cannot be reduced to a weighted graph."

        ②For a hypergraph with edge-independent vertex weights, \mathbf{H}\in\{0,1\}^{|\mathcal{V}|\times|\mathcal{E}|} and \mathcal{G}_{\mathrm{in}}=\{\mathcal{V},\mathcal{E},\mathbf{W}\}.

        ③For the edge-dependent case, a vertex-weight matrix \mathbf{R}\in\mathbb{R}^{|\mathcal{V}|\times|\mathcal{E}|} is introduced and \mathcal{G}_{\mathrm{de}}=\{\mathcal{V},\mathcal{E},\mathbf{W},\gamma\}, where \gamma_e(\alpha) is the weight of vertex \alpha on hyperedge e. The vertex degree d(v) and edge degree \delta(e) can be represented by:

\begin{cases}d(v)=\sum_{\beta\in\mathcal{N}_e(v)}w(\beta) \\ \delta(e)=\sum_{\alpha\in\mathcal{N}_v(e)}\gamma_e(\alpha)\end{cases}

        ④The paper then proves ① via random walks and Markov chains; I won't transcribe the derivation here (I'm no math whiz)

2.6.2. HGNN/HGNN+ vs. GNN

        ①For a simple hypergraph (a traditional graph, where every hyperedge connects exactly two vertices):

\begin{cases} \mathbf{HH}^\top =\mathbf{A}+\mathbf{D} \\ \mathbf{D}_e^{-1} =1/2\mathbf{I} \\ \mathbf{W} =\mathbf{I} \end{cases}

the HGNN can be:

\begin{aligned} \mathbf{X}^{t+1} & =\sigma(\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2}\mathbf{X}^t\boldsymbol{\Theta}^t) \\ & =\sigma\left(\mathbf{D}_v^{-1/2}\mathbf{H}\left(\frac{1}{2}\mathbf{I}\right)\mathbf{H}^\top\mathbf{D}_v^{-1/2}\mathbf{X}^t\boldsymbol{\Theta}^t\right) \\ & =\sigma\left(\frac{1}{2}\mathbf{D}^{-1/2}(\mathbf{A}+\mathbf{D})\mathbf{D}^{-1/2}\mathbf{X}^t\boldsymbol{\Theta}^t\right) \\ & =\sigma\left(\frac{1}{2}(\mathbf{I}+\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2})\mathbf{X}^t\boldsymbol{\Theta}^t\right) \\ & =\sigma(\mathbf{D}^{-1/2}\hat{\mathbf{A}}\mathbf{D}^{-1/2}\mathbf{X}^t\hat{\boldsymbol{\Theta}}^t) \end{aligned}
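
A quick numpy check (mine) of the identity \mathbf{HH}^\top=\mathbf{A}+\mathbf{D} that this reduction relies on:

import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # a 2-uniform hypergraph = an ordinary graph
n = 4
H = np.zeros((n, len(edges)))
for e, (u, v) in enumerate(edges):
    H[u, e] = H[v, e] = 1
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1
D = np.diag(A.sum(axis=1))
assert np.allclose(H @ H.T, A + D)  # off-diagonal: adjacency; diagonal: vertex degrees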

        ②Uses rooted subtrees to illustrate how aggregation on a 2-uniform hypergraph (GNN) compares with general hypergraph aggregation:

2.6.3. HGNN vs. HGNN+

        ①Difference between two convs:

\begin{cases}\mathbf{X}^{t+1}\xleftarrow{\mathrm{HGNNConv}}\sigma(\underline{\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2}}\,\mathbf{X}^t\mathbf{\Theta}^t) \\ \mathbf{X}^{t+1}\xleftarrow{\mathrm{HGNNConv}^+}\sigma(\underline{\mathbf{D}_v^{-1}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top}\,\mathbf{X}^t\mathbf{\Theta}^t)\end{cases}

The former is symmetric (undirected) message passing, while the latter is asymmetric message passing, which enables the extension to directed hypergraphs below.

        ②Definition of a directed hypergraph:

\hat{\mathbf{H}}(v,e)=\begin{cases}-1 & \text{if } v\in T(e) \\ 1 & \text{if } v\in S(e) \\ 0 & \text{otherwise}\end{cases}

where T(e) and S(e) are the sets of target and source vertices of hyperedge e

        ③Message passing on a directed hypergraph:

\mathbf{X}^{t+1}=\sigma(\mathbf{D}_t^{-1}\hat{\mathbf{H}}_t\mathbf{W}\mathbf{D}_s^{-1}\hat{\mathbf{H}}_s^\top\mathbf{X}^t\mathbf{\Theta}^t)

where 

\begin{cases}\mathbf{D}_s=\operatorname{diag}(\operatorname{col\_sum}(\hat{\mathbf{H}}_s)) \\ \mathbf{D}_t=\operatorname{diag}(\operatorname{col\_sum}(\hat{\mathbf{H}}_t))\end{cases}
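
A numpy sketch of one directed pass (my own reading of the formula, not the authors' code; I assume \mathbf{D}_s averages over each hyperedge's source vertices, \mathbf{D}_t averages over each target vertex's incoming hyperedges, and both are nonzero):

import numpy as np

def directed_hconv(Hs, Ht, w, X, Theta):
    # Hs[v, e] = 1 if v in S(e); Ht[v, e] = 1 if v in T(e); w: hyperedge weights
    Ds = Hs.sum(axis=0)                       # source-vertex count per hyperedge
    Y = (w / Ds)[:, None] * (Hs.T @ X)        # gather: average source features into each edge
    Dt = Ht.sum(axis=1)                       # incoming-hyperedge count per vertex
    return np.maximum((Ht @ Y @ Theta) / Dt[:, None], 0.0)  # scatter to targets + ReLU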

2.7. Experiments and Discussions

2.7.1. Experimental Settings

        ①Learning rate: 0.01 for the three citation-network datasets and 0.001 for the two social-media-network datasets

        ②Dropout: 0.5

        ③Optimizer: Adam with 0.0005 weight decay

        ④Loss: cross-entropy (CE):

\mathcal{L}=-\sum_{v\in V}\sum_{c=1}^CY_{vc}\log(O_{vc})
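
In PyTorch this is the standard cross-entropy over the labeled vertices only; a toy equivalent (my own illustration):

import torch
import torch.nn.functional as F

out = torch.randn(6, 3)                           # logits O for |V| = 6 vertices, C = 3 classes
labels = torch.tensor([0, 2, 1, 0, 1, 2])
train_mask = torch.tensor([1, 1, 0, 0, 1, 0]).bool()
loss = F.cross_entropy(out[train_mask], labels[train_mask])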

2.7.2. Vertex Classification on the Data With Graph Structure

        ①Statistics of the datasets:

        ②Performance on the 5 datasets with 5 training samples per category:

        ③Performance on the 5 datasets with 10 training samples per category:

        ④Comparison on the effectiveness of different hyperedge groups with HGNN+:

        ⑤Comparison of HGNN and HGNN+:

        ⑥Convolution strategy comparison:

2.7.3. Vertex Classification on the Data Without Graph Structure

        ①3D object datasets: ModelNet40 and NTU

        ②Performance:

2.7.4. Vertex Classification on the Data With Hypergraph Structure

        ①Datasets: Cooking-200 and MovieLens2k-v2

        ②Performance:

2.7.5. Visualization

        ①t-SNE visualization on Cooking-200:

2.8. THU-DeepHypergraph: An Open Toolbox of the HGNN+ Framework

        ~

2.9. Conclusion

        ~

3. Reference

@ARTICLE{9795251,
  author={Gao, Yue and Feng, Yifan and Ji, Shuyi and Ji, Rongrong},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={HGNN+: General Hypergraph Neural Networks}, 
  year={2023},
  volume={45},
  number={3},
  pages={3181-3199},
  keywords={Correlation;Convolution;Data models;Task analysis;Social networking (online);Mathematical models;Representation learning;Hypergraph;classification;hypergraph convolution;representation learning},
  doi={10.1109/TPAMI.2022.3182052}}
