> 文章列表 > Derivation of Doubly Stochastic clustering

Derivation of Doubly Stochastic clustering

Derivation of Doubly Stochastic clustering

1. Original objective

given coefficient matrix C,
min⁡A12∥∣C∣−ηA∥F2s.t.A∈Ωn,diag(C)=0(1)\\min_\\textbf A \\frac{1}{2}\\||\\textbf C|-\\eta \\textbf A\\|_F^2\\quad s.t. \\ {\\bf A\\in\\Omega}_n, \\mathrm {diag}(\\textbf C)=0 \\quad \\quad (1)Amin21∥∣CηAF2s.t. AΩn,diag(C)=0(1)
where Ωn{\\bf \\Omega}_nΩn is doubly stochastic space. We have:
12∥∣C∣−ηA∥F2=12∥C∥F2+η22∥A∥F2+⟨−∣C∣,ηA⟩(2)\\frac{1}{2}\\||\\textbf C|-\\eta \\textbf A\\|_F^2= \\frac{1}{2}\\|\\textbf C\\|_F^2+ \\frac{\\eta^2}{2}\\|\\textbf A\\|_F^2 +\\langle {\\bf -|C|, \\eta A} \\rangle \\quad (2)21∥∣CηAF2=21CF2+2η2AF2+∣C∣,ηA(2)

Since C\\bf CC is fixed in A-DSSC, we are acturally optimizing:
min⁡A⟨−∣C∣,A⟩+η2∥A∥F2s.t.A∈Ωn(3)\\min_\\textbf A \\langle {\\bf -|C|, A} \\rangle + \\frac{\\eta}{2}\\|\\textbf A\\|_F^2 \\quad s.t. \\ {\\bf A\\in\\Omega}_n \\quad \\quad \\quad \\quad \\quad (3)Amin∣C∣,A+2ηAF2s.t. AΩn(3)

2. Dual objective

Introducing Lagrange multipliers α,β∈Rn\\alpha, \\beta \\in \\mathbb R^nα,βRn and A≥0A\\geq 0A0 for satisfying the doubly stochastic constraint, then we have a minmax problem:
min⁡A≥0max⁡α,β⟨−∣C∣,A⟩+η2∥A∥F2+⟨α,A1−1⟩+⟨β,A⊤1−1⟩(4)\\min_{\\textbf A\\geq 0} \\max_{\\alpha, \\beta} \\langle {\\bf -|C|, A} \\rangle + \\frac{\\eta}{2}\\|\\textbf A\\|_F^2 + \\langle \\alpha,{\\bf A1-1} \\rangle + \\langle \\beta,{\\bf A^\\top 1-1} \\rangle \\quad \\quad(4)A0minα,βmax∣C∣,A+2ηAF2+α,A11+β,A11(4)

内积是拉格朗日法实现矩阵约束的标准表示形式,优化α\\alphaα用于满足行和为1约束,优化β\\betaβ用于满足列和为1约束

Note that
⟨α,A1−1⟩+⟨β,A⊤1−1⟩=⟨α1⊤+1β⊤,A⟩−1⊤(α+β)\\langle \\alpha,{\\bf A1-1} \\rangle + \\langle \\beta,{\\bf A^\\top 1-1} \\rangle=\\langle \\alpha \\textbf 1^\\top + \\textbf 1\\beta^\\top,\\textbf A \\rangle-\\textbf1^\\top(\\alpha+\\beta)α,A11+β,A11=α1+1β,A1(α+β)

P.S. ⟨α,A1⟩=tr(α⊤⋅A1)=tr(A1⋅α⊤)=tr(A⋅1α⊤)=tr(1α⊤⋅A)=⟨α1⊤,A⟩\\langle \\alpha,{\\bf A1}\\rangle=tr(\\alpha^\\top \\cdot {\\bf A1})=tr({\\bf A1} \\cdot \\alpha^\\top)=tr({\\bf A \\cdot 1}\\alpha^\\top)=tr({\\bf 1}\\alpha^\\top \\cdot \\bf A)=\\langle \\alpha \\textbf 1^\\top ,\\textbf A \\rangleα,A1=tr(αA1)=tr(A1α)=tr(A1α)=tr(1αA)=α1,A

Therefore, strong duality holds by Slater’s condition, so this is equivalent to:
max⁡α,β−1⊤(α+β)+min⁡A≥0⟨−∣C∣,A⟩+η2∥A∥F2+⟨α1⊤+1β⊤,A⟩(5)\\max_{\\alpha, \\beta} -\\textbf1^\\top(\\alpha+\\beta) + \\min_{\\textbf A\\geq 0} \\langle {\\bf -|C|, A} \\rangle + \\frac{\\eta}{2}\\|\\textbf A\\|_F^2 + \\langle \\alpha \\textbf 1^\\top + \\textbf 1\\beta^\\top,\\textbf A \\rangle \\quad (5)α,βmax1(α+β)+A0min∣C∣,A+2ηAF2+α1+1β,A(5)

3. Search Optimal A and Dual Solution α,β\\alpha,\\betaα,β

Let K=∣C∣−α1⊤−1β⊤\\bf K=|C|-\\alpha 1^\\top-1\\beta^\\topK=∣C∣α11β, we have:

⟨−∣C∣,A⟩+⟨α1⊤+1β⊤,A⟩=⟨−∣C∣+α1⊤+1β⊤,A⟩=⟨−K,A⟩\\bf \\langle -|C|,A\\rangle+\\langle \\alpha \\textbf 1^\\top + \\textbf 1\\beta^\\top,\\textbf A \\rangle=\\bf \\langle -|C|+\\alpha \\textbf 1^\\top + \\textbf 1\\beta^\\top,A \\rangle=\\langle -K,A\\rangle∣C∣,A+α1+1β,A=∣C∣+α1+1β,A=K,A

Therefore, the inner min⁡\\minmin term becomes:

η⋅min⁡A≥n⟨−Kη,A⟩+η2∥A∥F2(6)\\eta\\cdot{\\bf \\min_{A\\geq n}\\langle -\\frac{K}{\\eta},A\\rangle} + \\frac{\\eta}{2}\\|\\textbf A\\|_F^2 \\quad \\quad \\quad \\quad \\quad (6)ηAnminηK,A+2ηAF2(6)

we can complement Eqn.(6) to a F-norm form:
(6)=−12η∥K∥F2+ηmin⁡A≥n12∥Kη−A∥F2(7)(6)=-\\frac{1}{2\\eta}\\|\\textbf K\\|_F^2+\\eta\\min_{A\\geq n}\\frac{1}{2}\\|{\\bf \\frac{K}{\\eta}-A}\\|_F^2 \\quad (7)(6)=2η1KF2+ηAnmin21ηKAF2(7)

Apparently,the optimal A\\bf AA satisfies A=K/η\\bf A=K/\\etaA=K/η,but requires A≥0\\bf A\\geq 0A0,therefore, A\\bf AA is given as:
A=1η[∣C∣−α1⊤−1β⊤]+{\\bf A}=\\frac{1}{\\eta}{\\bf[|C|-\\alpha 1^\\top-1\\beta^\\top]_+}A=η1[∣C∣α11β]+

Therefore, we have
(7)=−12η∥K∥F2+12η∥K−∥F2=−12η∥K+∥F2(8)(7)=-\\frac{1}{2\\eta}\\|\\textbf K\\|_F^2+\\frac{1}{2\\eta}\\|\\textbf K_-\\|_F^2=-\\frac{1}{2\\eta}\\|\\textbf K_+\\|_F^2 \\quad (8)(7)=2η1KF2+2η1KF2=2η1K+F2(8)

Finally, the version of the dual becomes (See Eqn.5-8):
max⁡α,β−1⊤(α+β)−12η∥K+∥F2\\max_{\\alpha, \\beta} -\\textbf1^\\top(\\alpha+\\beta)-\\frac{1}{2\\eta}\\|K_+\\|_F^2α,βmax1(α+β)2η1K+F2
i.e.,
max⁡α,β−1⊤(α+β)−12η∥[∣C∣−α1⊤−1β⊤]+∥F2\\max_{\\alpha, \\beta} -\\textbf1^\\top(\\alpha+\\beta)-\\frac{1}{2\\eta}\\|{\\bf[|C|-\\alpha 1^\\top-1\\beta^\\top]_+}\\|_F^2α,βmax1(α+β)2η1[∣C∣α11β]+F2
i.e.,
min⁡α,β1⊤(α+β)+12η∥[∣C∣−α1⊤−1β⊤]+∥F2\\min_{\\alpha, \\beta} \\textbf1^\\top(\\alpha+\\beta) + \\frac{1}{2\\eta}\\|{\\bf[|C|-\\alpha 1^\\top-1\\beta^\\top]_+}\\|_F^2α,βmin1(α+β)+2η1[∣C∣α11β]+F2