【Paper Reading】On clustering using random walks

Reading notes on *On Clustering Using Random Walks*.

1. Problem Formulation

1.1 Problem Description

Let $G(V, E, \omega)$ be a weighted graph, where $V$ is the set of nodes, $E$ is the set of edges between nodes in $V$, and $\omega: E \to \mathbb{R}$ is a function that measures the similarity between pairs of items (a higher value means more similar).

$$p_{ij} = \frac{\omega(i,j)}{d_i}, \qquad d_i = \sum_{k=1}^n \omega(i,k)$$

$M^G \in \mathbb{R}^{n \times n}$ is the associated transition matrix,

$$M^G_{ij} = \begin{cases} p_{ij} & \langle i,j \rangle \in E \\ 0 & \textrm{otherwise} \end{cases}$$

Question:

  1. $\omega$ represents the similarity between nodes, but in practice we often only have an undirected graph that tells us whether two nodes are connected. How can we construct $\omega$ from the information we have?
    Answer: the similarity here can be taken to be the weight of the edge between two nodes, so $M^G_{ij}$ can be viewed as the result of normalizing the adjacency matrix.
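As a concrete illustration of this answer, here is a minimal sketch (my own, not code from the paper; the function name is hypothetical) that builds $M^G$ by row-normalizing a weighted adjacency matrix:

```python
import numpy as np

def transition_matrix(adjacency):
    """Row-normalize a weighted adjacency matrix to obtain M^G."""
    adjacency = np.asarray(adjacency, dtype=float)
    degrees = adjacency.sum(axis=1, keepdims=True)  # d_i = sum_k w(i, k)
    return adjacency / degrees                      # p_ij = w(i, j) / d_i

# For an unweighted, undirected graph the 0/1 adjacency matrix works directly:
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
M = transition_matrix(A)  # each row of M sums to 1
```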

This part is tricky: I could not find anywhere in the paper how $P^{k}_{\textrm{visit}}(i)$ is computed, and I was stuck here for a very long time.

The original paper describes it as follows:

Now, denote by $P^k_{visit}(i) \in \mathbb{R}^n$ the vector whose j-th component is the probability that a random walk originating at $i$ will visit node $j$ in its k-th step. Thus, $P^k_{visit}(i)$ is the i-th row in the matrix $(M^G)^k$, the k'th power of $M^G$.

Now we know how $M^G$ is computed, but what about $(M^G)^k$? The paper describes it as "the k'th power of $M^G$", which I understand to be the matrix $M^G$ raised to the k-th power (repeated matrix multiplication).

$P^k_{visit}(i)$ is the i-th row in the matrix $(M^G)^k$:

$$P^k_{visit}(i) = (M^G)^k_i$$
$$(M^G)^k = \{P^k_{visit}(1)^{\mathbf{T}}, P^k_{visit}(2)^{\mathbf{T}}, \dots, P^k_{visit}(n)^{\mathbf{T}}\}$$
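To make this concrete, a small sketch of my own (using `np.linalg.matrix_power` for the repeated matrix product):

```python
import numpy as np

def p_visit(M, i, k):
    """P^k_visit(i): row i of (M^G)^k, i.e. the distribution over nodes
    of a random walk that started at node i, after exactly k steps."""
    return np.linalg.matrix_power(M, k)[i]
```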

Notice: up to this point, this is the same as the Markov Cluster algorithm (MCL). MCL keeps iterating until the matrix no longer changes; here, since that is computationally expensive, the authors instead use the sum of the first k powers as a substitute.

We now offer two methods for performing the edge separation, both based on deterministic analysis of random walks.

Edge separation (sharpening):

NS: Separation by neighborhood similarity.

CE: Separation by circular escape.

Terms from the paper: the weighted neighborhood; bipartite subgraph.

$$P^{\leq k}_{\textrm{visit}}(v) = \sum_{i=1}^k P^{i}_{\textrm{visit}}(v)$$
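A minimal sketch of this quantity (again my own helper, not the authors' code):

```python
import numpy as np

def p_visit_le(M, v, k):
    """P^{<=k}_visit(v): sum of the first k step distributions from v."""
    total = np.zeros(M.shape[0])
    step = np.eye(M.shape[0])[v]   # one-hot start: the walk begins at v
    for _ in range(k):
        step = step @ M            # advance one step: P^i_visit(v)
        total += step
    return total
```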

2. NS: Separation by neighborhood similarity.

Now, in order to estimate the closeness of two nodes $v$ and $u$, we fix some small $k$ (e.g. $k = 3$) and compare $P^{\leq k}_{\textrm{visit}}(v)$ and $P^{\leq k}_{\textrm{visit}}(u)$. The smaller the difference, the greater the intimacy between $u$ and $v$.

$$NS(G) \xlongequal{dfn} G_s(V, E, \omega_s),$$
where $\forall \langle v, u \rangle \in E,\ \omega_s(u, v) = sim^k\big(P^{\leq k}_{visit}(v), P^{\leq k}_{visit}(u)\big)$.

$sim^k(x, y)$ is some similarity measure of the vectors $x$ and $y$, whose value increases as $x$ and $y$ become more similar.

A suitable choice for $sim^k(x, y)$ is:

$$f^k(x, y) \xlongequal{dfn} \exp\big(2k - \|x - y\|_{L_1}\big) - 1 \tag{1}$$
$$\|x - y\|_{L_1} = \sum_{i=1}^n |x_i - y_i|$$

Another choice is:

$$\cos(x, y) = \frac{(x, y)}{\sqrt{(x, x)} \cdot \sqrt{(y, y)}} \tag{2}$$

where $(\cdot,\cdot)$ denotes the inner product.
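A sketch of the two measures and of the NS reweighting itself (the helper names are mine, and `p_visit_le` is the helper defined earlier; this is an illustration under those assumptions, not the paper's code):

```python
import numpy as np

def f_k(x, y, k):
    # Equation (1): exp(2k - ||x - y||_L1) - 1
    return np.exp(2 * k - np.abs(x - y).sum()) - 1

def cos_sim(x, y):
    # Equation (2): cosine similarity via inner products
    return x @ y / (np.sqrt(x @ x) * np.sqrt(y @ y))

def ns_weights(M, edges, k=3):
    """NS(G): reweight each edge <u, v> by f^k of the P^{<=k}_visit
    vectors of its endpoints (cos_sim could be substituted)."""
    return {(u, v): f_k(p_visit_le(M, u, k), p_visit_le(M, v, k), k)
            for (u, v) in edges}
```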

3. CE: Separation by circular escape.

4. Code Implementation

An MCL implementation on an undirected weighted graph:

```python
import numpy as np


def markovCluster(adjacencyMat, dimension, numIter, power=2, inflation=2):
    columnSum = np.sum(adjacencyMat, axis=0)
    probabilityMat = adjacencyMat / columnSum

    # Expansion: raise the matrix to the `power`-th matrix power.
    def _expand(probabilityMat, power):
        expandMat = probabilityMat
        for i in range(power - 1):
            expandMat = np.dot(expandMat, probabilityMat)
        return expandMat

    expandMat = _expand(probabilityMat, power)

    # Inflation: take the `inflation`-th element-wise power of the
    # matrix, then re-normalize each column.
    def _inflate(expandMat, inflation):
        powerMat = expandMat
        for i in range(inflation - 1):
            powerMat = powerMat * expandMat
        inflateColumnSum = np.sum(powerMat, axis=0)
        inflateMat = powerMat / inflateColumnSum
        return inflateMat

    inflateMat = _inflate(expandMat, inflation)
    for i in range(numIter):
        expand = _expand(inflateMat, power)
        inflateMat = _inflate(expand, inflation)
    print(inflateMat)
    # Boolean mask of the nonzero entries of the converged matrix.
    print(np.zeros(adjacencyMat.shape) != inflateMat)


if __name__ == "__main__":
    dimension = 4
    numIter = 10
    adjacencyMat = np.array([[1, 1, 1, 1],
                             [1, 1, 0, 1],
                             [1, 0, 1, 0],
                             [1, 1, 0, 1]])
    # adjacencyMat = np.array([[1, 1, 1, 1, 0, 0, 0],
    #                          [1, 1, 1, 1, 1, 0, 0],
    #                          [1, 1, 1, 1, 0, 0, 0],
    #                          [1, 1, 1, 1, 0, 0, 0],
    #                          [0, 1, 0, 0, 1, 1, 1],
    #                          [0, 0, 0, 0, 1, 1, 1],
    #                          [0, 0, 0, 0, 1, 1, 1],
    #                          ])
    markovCluster(adjacencyMat, dimension, numIter)
```
```
[[1.00000000e+000 1.00000000e+000 1.00000000e+000 1.00000000e+000]
 [5.23869755e-218 5.23869755e-218 5.23869755e-218 5.23869755e-218]
 [0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000]
 [5.23869755e-218 5.23869755e-218 5.23869755e-218 5.23869755e-218]]
[[ True  True  True  True]
 [ True  True  True  True]
 [False False False False]
 [ True  True  True  True]]
```

From this we can read off the clustering $\{\{1,2,4\},\{3\}\}$: nodes 1, 2 and 4 end up with identical nonzero rows, while node 3's row converges to all zeros.
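A small helper that automates this reading (my own heuristic: group nodes by the nonzero pattern of their rows, which reproduces the grouping above; note that canonical MCL instead reads clusters off attractor rows):

```python
import numpy as np

def clusters_from_rows(mat):
    # Group nodes (1-based) whose rows share the same nonzero pattern,
    # mirroring the boolean matrix printed above; nodes with an all-zero
    # row form singleton clusters.
    groups = {}
    for node, row in enumerate(np.asarray(mat), start=1):
        pattern = tuple(row != 0)
        if not any(pattern):
            groups[("zero", node)] = {node}
        else:
            groups.setdefault(pattern, set()).add(node)
    return list(groups.values())

# On the matrix printed above this returns [{1, 2, 4}, {3}].
```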

Spectral clustering
MCL
MCL GitHub