> 文章列表 > 牛客网算法八股刷题系列(八)K-Means真题描述

牛客网算法八股刷题系列(八)K-Means真题描述

牛客网算法八股刷题系列(八)K-Means真题描述

牛客网算法八股刷题系列——K-Means真题描述

  • 题目描述
  • 正确答案:A\\mathcal AA
  • 题目解析

题目描述

两个种子A(−1,1),B(2,1)A(-1,1),B(2,1)A(1,1),B(2,1),其余样本(0,0),(0,2),(1,1),(3,2),(6,0),(6,2)(0,0),(0,2),(1,1),(3,2),(6,0),(6,2)(0,0),(0,2),(1,1),(3,2),(6,0),(6,2)。利用K-Means\\text{K-Means}K-Means算法,点群中心按坐标平均计算。最终:

  • 种子点AAA需要移动的次数;
  • 种子点BBB需要移动的次数;
  • 属于种子点AAA样本点数(不含AAA);
  • 属于种子点BBB样本点数(不含BBB);

分别是()(\\quad)()

A2,2,3,3\\mathcal A \\quad 2,2,3,3A2,2,3,3
B1,1,3,3\\mathcal B \\quad 1,1,3,3B1,1,3,3
C1,1,2,4\\mathcal C \\quad 1,1,2,4C1,1,2,4
D2,2,2,4\\mathcal D \\quad 2,2,2,4D2,2,2,4

正确答案:A\\mathcal AA

题目解析

初始状态下,上述样本点(蓝色点)与种子点(橙色点)之间的位置关系表示如下:
初始状态
针对种子点A,B\\mathcal A,\\mathcal BA,B,分别求出各自对应其他样本点距离

  • 这里以‘欧式距离’计算点之间的距离信息,以AAA与样本点a1:(0,0)a_1:(0,0)a1:(0,0)之间距离为例,后续同理:
    DistA⇔a1=(−1−0)2+(1−0)2=2\\text{Dist}_{A \\Leftrightarrow a_1} = \\sqrt{(-1 -0)^2 + (1 - 0)^2} = \\sqrt{2}DistAa1=(10)2+(10)2=2
  • 由于计算过程中不包含A,BA,BA,B点,这里直接将它们视作‘虚拟样本’
SamplePoint/InitialCenter\\text{SamplePoint/InitialCenter}SamplePoint/InitialCenter AAA BBB ClusterResult\\text{ClusterResult}ClusterResult
a1:(0,0)a_1:(0,0)a1:(0,0) 2\\sqrt{2}2 3\\sqrt{3}3 AAA
a2:(0,2)a_2:(0,2)a2:(0,2) 2\\sqrt{2}2 5\\sqrt{5}5 AAA
a3:(1,1)a_3:(1,1)a3:(1,1) 222 111 BBB
a4:(3,2)a_4:(3,2)a4:(3,2) 17\\sqrt{17}17 2\\sqrt{2}2 BBB
a5:(6,0)a_5:(6,0)a5:(6,0) 50\\sqrt{50}50 17\\sqrt{17}17 BBB
a6:(6,2)a_6:(6,2)a6:(6,2) 50\\sqrt{50}50 17\\sqrt{17}17 BBB

至此,能够得到距离种子点AAA近的样本点集合AnearA_{near}Anear和距离种子点BBB近的样本点集合BnearB_{near}Bnear
{Anear:{a1,a2}Bnear:{a3,a4,a5,a6}\\begin{cases} A_{near}: \\{a_1,a_2\\} \\\\ B_{near}: \\{a_3,a_4,a_5,a_6\\} \\end{cases}{Anear:{a1,a2}Bnear:{a3,a4,a5,a6}
新的点群中心坐标的计算结果表示如下:
{a1.x+a2.x2=0+02=0a1.y+a2.y2=0+22=1A1:(0,1){a3.x+a4.x+a5.x+a6.x4=1+3+6+64=4a3.y+a4.y+a5.y+a6.y4=1+2+0+24=1.25B1:(4,1.25)\\begin{aligned} & \\begin{cases} \\begin{aligned}\\frac{a_{1.x} + a_{2.x}}{2} = \\frac{0 + 0}{2} = 0\\end{aligned} \\\\ \\begin{aligned} \\frac{a_{1.y} + a_{2.y}}{2} = \\frac{0 + 2}{2} = 1 \\end{aligned} \\end{cases} \\quad A_1:(0,1) \\\\ & \\begin{cases} \\begin{aligned}\\frac{a_{3.x} + a_{4.x} + a_{5.x} + a_{6.x}} {4} = \\frac{1 + 3 + 6 + 6}{4} = 4\\end{aligned} \\\\ \\begin{aligned} \\frac{a_{3.y} + a_{4.y} + a_{5.y} + a_{6.y}}{4} = \\frac{1 + 2 + 0 + 2}{4} = 1.25 \\end{aligned} \\end{cases} \\quad B_1:(4,1.25) \\end{aligned}2a1.x+a2.x=20+0=02a1.y+a2.y=20+2=1A1:(0,1)4a3.x+a4.x+a5.x+a6.x=41+3+6+6=44a3.y+a4.y+a5.y+a6.y=41+2+0+2=1.25B1:(4,1.25)
此时已经得到新的种子点A1,B1A_1,B_1A1,B1,新的位置关系表示如下:
第一次迭代
重新执行上面的步骤:
针对种子点A1,B1A_1,B_1A1,B1,分别求出各自对应其他样本点的距离:

SamplePoint/InitialCenter\\text{SamplePoint/InitialCenter}SamplePoint/InitialCenter A1A_1A1 B1B_1B1 ClusterResult\\text{ClusterResult}ClusterResult
a1:(0,0)a_1:(0,0)a1:(0,0) 111 >1>1>1 A1A_1A1
a2:(0,2)a_2:(0,2)a2:(0,2) 111 >1>1>1 A1A_1A1
a3:(1,1)a_3:(1,1)a3:(1,1) 111 >1>1>1 A1A_1A1
a4:(3,2)a_4:(3,2)a4:(3,2) >54\\begin{aligned}>\\frac{5}{4}\\end{aligned}>45 54\\begin{aligned}\\frac{5}{4}\\end{aligned}45 B1B_1B1
a5:(6,0)a_5:(6,0)a5:(6,0) >894\\begin{aligned}>\\frac{\\sqrt{89}}{4}\\end{aligned}>489 894\\begin{aligned}\\frac{\\sqrt{89}}{4}\\end{aligned}489 B1B_1B1
a6:(6,2)a_6:(6,2)a6:(6,2) >734\\begin{aligned}>\\frac{\\sqrt{73}}{4}\\end{aligned}>473 734\\begin{aligned}\\frac{\\sqrt{73}}{4}\\end{aligned}473 B1B_1B1

能够得到距离种子点A1A_1A1近的样本点集合A1⇒nearA_{1\\Rightarrow near}A1near和距离种子点B1B1B1近的样本点集合B1⇒nearB_{1 \\Rightarrow near}B1near
{A1⇒near:{a1,a2,a3}B1⇒near:{a4,a5,a6}\\begin{cases} A_{1 \\Rightarrow near}: \\{a_1,a_2,a_3\\} \\\\ B_{1 \\Rightarrow near}: \\{a_4,a_5,a_6\\} \\end{cases}{A1near:{a1,a2,a3}B1near:{a4,a5,a6}
此时发现新集合A1⇒near,B1⇒nearA_{1 \\Rightarrow near},B_{1 \\Rightarrow near}A1near,B1near中的样本点与初始状态的集合Anear,BnearA_{near},B_{near}Anear,Bnear样本点存在差异,需要继续计算。新样本点的结果表示为
A2:(13,1),B2:(5,43)A_2:(\\frac{1}{3},1),B_2:(5,\\frac{4}{3})A2:(31,1),B2:(5,34)
对应图像结果表示如下:
第二次迭代

继续迭代。针对种子点A2,B2A_2,B_2A2,B2,分别求出各自对应其他样本点的距离:

SamplePoint/InitialCenter\\text{SamplePoint/InitialCenter}SamplePoint/InitialCenter A2A_2A2 B2B_2B2 ClusterResult\\text{ClusterResult}ClusterResult
a1:(0,0)a_1:(0,0)a1:(0,0) 103\\begin{aligned}\\frac{\\sqrt{10}}{3}\\end{aligned}310 >103\\begin{aligned}>\\frac{\\sqrt{10}}{3}\\end{aligned}>310 A2A_2A2
a2:(0,2)a_2:(0,2)a2:(0,2) 103\\begin{aligned}\\frac{\\sqrt{10}}{3}\\end{aligned}310 >103\\begin{aligned}>\\frac{\\sqrt{10}}{3}\\end{aligned}>310 A2A_2A2
a3:(1,1)a_3:(1,1)a3:(1,1) 23\\begin{aligned}\\frac{2}{3}\\end{aligned}32 >23\\begin{aligned}>\\frac{2}{3}\\end{aligned}>32 A2A_2A2
a4:(3,2)a_4:(3,2)a4:(3,2) >1023\\begin{aligned}>\\frac{10\\sqrt{2}}{3}\\end{aligned}>3102 1023\\begin{aligned}\\frac{10\\sqrt{2}}{3}\\end{aligned}3102 B2B_2B2
a5:(6,0)a_5:(6,0)a5:(6,0) >53\\begin{aligned}>\\frac{5}{3}\\end{aligned}>35 53\\begin{aligned}\\frac{5}{3}\\end{aligned}35 B2B_2B2
a6:(6,2)a_6:(6,2)a6:(6,2) >133\\begin{aligned}>\\frac{\\sqrt{13}}{3}\\end{aligned}>313 133\\begin{aligned}\\frac{\\sqrt{13}}{3}\\end{aligned}313 B2B_2B2

能够得到距离种子点A2A_2A2近的样本点集合A2⇒nearA_{2 \\Rightarrow near}A2near和距离种子点B2B_2B2近的样本点集合B2⇒nearB_{2 \\Rightarrow near}B2near
{A2⇒near:{a1,a2,a3}B2⇒near:{a4,a5,a6}\\begin{cases} A_{2 \\Rightarrow near}: \\{a_1,a_2,a_3\\} \\\\ B_{2 \\Rightarrow near}: \\{a_4,a_5,a_6\\} \\end{cases}{A2near:{a1,a2,a3}B2near:{a4,a5,a6}
此时发现新集合A2⇒near,B2⇒nearA_{2 \\Rightarrow near},B_{2 \\Rightarrow near}A2near,B2near中的样本点与第一次迭代的集合A1⇒near,B1⇒nearA_{1 \\Rightarrow near},B_{1 \\Rightarrow near}A1near,B1near样本点相同,停止迭代。整个迭代过程中,种子点A,BA,BA,B各移动两次,各种子点对应集合均包含333个样本点。A\\mathcal A \\quadA 选项正确。