牛客网算法八股刷题系列(八)K-Means真题描述
牛客网算法八股刷题系列——K-Means真题描述
- 题目描述
- 正确答案:A\\mathcal AA
- 题目解析
题目描述
两个种子点A(−1,1),B(2,1)A(-1,1),B(2,1)A(−1,1),B(2,1),其余样本点为(0,0),(0,2),(1,1),(3,2),(6,0),(6,2)(0,0),(0,2),(1,1),(3,2),(6,0),(6,2)(0,0),(0,2),(1,1),(3,2),(6,0),(6,2)。利用K-Means\\text{K-Means}K-Means算法,点群中心按坐标平均计算。最终:
- 种子点AAA需要移动的次数;
- 种子点BBB需要移动的次数;
- 属于种子点AAA的样本点数(不含AAA);
- 属于种子点BBB的样本点数(不含BBB);
分别是()(\\quad)()
A2,2,3,3\\mathcal A \\quad 2,2,3,3A2,2,3,3
B1,1,3,3\\mathcal B \\quad 1,1,3,3B1,1,3,3
C1,1,2,4\\mathcal C \\quad 1,1,2,4C1,1,2,4
D2,2,2,4\\mathcal D \\quad 2,2,2,4D2,2,2,4
正确答案:A\\mathcal AA
题目解析
初始状态下,上述样本点(蓝色点)与种子点(橙色点)之间的位置关系表示如下:
针对种子点A,B\\mathcal A,\\mathcal BA,B,分别求出各自对应其他样本点的距离:
这里以‘欧式距离’计算点之间的距离信息,以
AAA与样本点
a1:(0,0)a_1:(0,0)a1:(0,0)之间距离为例,后续同理:
DistA⇔a1=(−1−0)2+(1−0)2=2\\text{Dist}_{A \\Leftrightarrow a_1} = \\sqrt{(-1 -0)^2 + (1 - 0)^2} = \\sqrt{2}DistA⇔a1=(−1−0)2+(1−0)2=2由于计算过程中不包含
A,BA,BA,B点,这里直接将它们视作‘虚拟样本’
SamplePoint/InitialCenter\\text{SamplePoint/InitialCenter}SamplePoint/InitialCenter | AAA | BBB | ClusterResult\\text{ClusterResult}ClusterResult |
---|---|---|---|
a1:(0,0)a_1:(0,0)a1:(0,0) | 2\\sqrt{2}2 | 3\\sqrt{3}3 | AAA |
a2:(0,2)a_2:(0,2)a2:(0,2) | 2\\sqrt{2}2 | 5\\sqrt{5}5 | AAA |
a3:(1,1)a_3:(1,1)a3:(1,1) | 222 | 111 | BBB |
a4:(3,2)a_4:(3,2)a4:(3,2) | 17\\sqrt{17}17 | 2\\sqrt{2}2 | BBB |
a5:(6,0)a_5:(6,0)a5:(6,0) | 50\\sqrt{50}50 | 17\\sqrt{17}17 | BBB |
a6:(6,2)a_6:(6,2)a6:(6,2) | 50\\sqrt{50}50 | 17\\sqrt{17}17 | BBB |
至此,能够得到距离种子点AAA近的样本点集合AnearA_{near}Anear和距离种子点BBB近的样本点集合BnearB_{near}Bnear:
{Anear:{a1,a2}Bnear:{a3,a4,a5,a6}\\begin{cases} A_{near}: \\{a_1,a_2\\} \\\\ B_{near}: \\{a_3,a_4,a_5,a_6\\} \\end{cases}{Anear:{a1,a2}Bnear:{a3,a4,a5,a6}
新的点群中心坐标的计算结果表示如下:
{a1.x+a2.x2=0+02=0a1.y+a2.y2=0+22=1A1:(0,1){a3.x+a4.x+a5.x+a6.x4=1+3+6+64=4a3.y+a4.y+a5.y+a6.y4=1+2+0+24=1.25B1:(4,1.25)\\begin{aligned} & \\begin{cases} \\begin{aligned}\\frac{a_{1.x} + a_{2.x}}{2} = \\frac{0 + 0}{2} = 0\\end{aligned} \\\\ \\begin{aligned} \\frac{a_{1.y} + a_{2.y}}{2} = \\frac{0 + 2}{2} = 1 \\end{aligned} \\end{cases} \\quad A_1:(0,1) \\\\ & \\begin{cases} \\begin{aligned}\\frac{a_{3.x} + a_{4.x} + a_{5.x} + a_{6.x}} {4} = \\frac{1 + 3 + 6 + 6}{4} = 4\\end{aligned} \\\\ \\begin{aligned} \\frac{a_{3.y} + a_{4.y} + a_{5.y} + a_{6.y}}{4} = \\frac{1 + 2 + 0 + 2}{4} = 1.25 \\end{aligned} \\end{cases} \\quad B_1:(4,1.25) \\end{aligned}⎩⎨⎧2a1.x+a2.x=20+0=02a1.y+a2.y=20+2=1A1:(0,1)⎩⎨⎧4a3.x+a4.x+a5.x+a6.x=41+3+6+6=44a3.y+a4.y+a5.y+a6.y=41+2+0+2=1.25B1:(4,1.25)
此时已经得到新的种子点A1,B1A_1,B_1A1,B1,新的位置关系表示如下:
重新执行上面的步骤:
针对种子点A1,B1A_1,B_1A1,B1,分别求出各自对应其他样本点的距离:
SamplePoint/InitialCenter\\text{SamplePoint/InitialCenter}SamplePoint/InitialCenter | A1A_1A1 | B1B_1B1 | ClusterResult\\text{ClusterResult}ClusterResult |
---|---|---|---|
a1:(0,0)a_1:(0,0)a1:(0,0) | 111 | >1>1>1 | A1A_1A1 |
a2:(0,2)a_2:(0,2)a2:(0,2) | 111 | >1>1>1 | A1A_1A1 |
a3:(1,1)a_3:(1,1)a3:(1,1) | 111 | >1>1>1 | A1A_1A1 |
a4:(3,2)a_4:(3,2)a4:(3,2) | >54\\begin{aligned}>\\frac{5}{4}\\end{aligned}>45 | 54\\begin{aligned}\\frac{5}{4}\\end{aligned}45 | B1B_1B1 |
a5:(6,0)a_5:(6,0)a5:(6,0) | >894\\begin{aligned}>\\frac{\\sqrt{89}}{4}\\end{aligned}>489 | 894\\begin{aligned}\\frac{\\sqrt{89}}{4}\\end{aligned}489 | B1B_1B1 |
a6:(6,2)a_6:(6,2)a6:(6,2) | >734\\begin{aligned}>\\frac{\\sqrt{73}}{4}\\end{aligned}>473 | 734\\begin{aligned}\\frac{\\sqrt{73}}{4}\\end{aligned}473 | B1B_1B1 |
能够得到距离种子点A1A_1A1近的样本点集合A1⇒nearA_{1\\Rightarrow near}A1⇒near和距离种子点B1B1B1近的样本点集合B1⇒nearB_{1 \\Rightarrow near}B1⇒near:
{A1⇒near:{a1,a2,a3}B1⇒near:{a4,a5,a6}\\begin{cases} A_{1 \\Rightarrow near}: \\{a_1,a_2,a_3\\} \\\\ B_{1 \\Rightarrow near}: \\{a_4,a_5,a_6\\} \\end{cases}{A1⇒near:{a1,a2,a3}B1⇒near:{a4,a5,a6}
此时发现新集合A1⇒near,B1⇒nearA_{1 \\Rightarrow near},B_{1 \\Rightarrow near}A1⇒near,B1⇒near中的样本点与初始状态的集合Anear,BnearA_{near},B_{near}Anear,Bnear样本点存在差异,需要继续计算。新样本点的结果表示为
A2:(13,1),B2:(5,43)A_2:(\\frac{1}{3},1),B_2:(5,\\frac{4}{3})A2:(31,1),B2:(5,34)
对应图像结果表示如下:
继续迭代。针对种子点A2,B2A_2,B_2A2,B2,分别求出各自对应其他样本点的距离:
SamplePoint/InitialCenter\\text{SamplePoint/InitialCenter}SamplePoint/InitialCenter | A2A_2A2 | B2B_2B2 | ClusterResult\\text{ClusterResult}ClusterResult |
---|---|---|---|
a1:(0,0)a_1:(0,0)a1:(0,0) | 103\\begin{aligned}\\frac{\\sqrt{10}}{3}\\end{aligned}310 | >103\\begin{aligned}>\\frac{\\sqrt{10}}{3}\\end{aligned}>310 | A2A_2A2 |
a2:(0,2)a_2:(0,2)a2:(0,2) | 103\\begin{aligned}\\frac{\\sqrt{10}}{3}\\end{aligned}310 | >103\\begin{aligned}>\\frac{\\sqrt{10}}{3}\\end{aligned}>310 | A2A_2A2 |
a3:(1,1)a_3:(1,1)a3:(1,1) | 23\\begin{aligned}\\frac{2}{3}\\end{aligned}32 | >23\\begin{aligned}>\\frac{2}{3}\\end{aligned}>32 | A2A_2A2 |
a4:(3,2)a_4:(3,2)a4:(3,2) | >1023\\begin{aligned}>\\frac{10\\sqrt{2}}{3}\\end{aligned}>3102 | 1023\\begin{aligned}\\frac{10\\sqrt{2}}{3}\\end{aligned}3102 | B2B_2B2 |
a5:(6,0)a_5:(6,0)a5:(6,0) | >53\\begin{aligned}>\\frac{5}{3}\\end{aligned}>35 | 53\\begin{aligned}\\frac{5}{3}\\end{aligned}35 | B2B_2B2 |
a6:(6,2)a_6:(6,2)a6:(6,2) | >133\\begin{aligned}>\\frac{\\sqrt{13}}{3}\\end{aligned}>313 | 133\\begin{aligned}\\frac{\\sqrt{13}}{3}\\end{aligned}313 | B2B_2B2 |
能够得到距离种子点A2A_2A2近的样本点集合A2⇒nearA_{2 \\Rightarrow near}A2⇒near和距离种子点B2B_2B2近的样本点集合B2⇒nearB_{2 \\Rightarrow near}B2⇒near:
{A2⇒near:{a1,a2,a3}B2⇒near:{a4,a5,a6}\\begin{cases} A_{2 \\Rightarrow near}: \\{a_1,a_2,a_3\\} \\\\ B_{2 \\Rightarrow near}: \\{a_4,a_5,a_6\\} \\end{cases}{A2⇒near:{a1,a2,a3}B2⇒near:{a4,a5,a6}
此时发现新集合A2⇒near,B2⇒nearA_{2 \\Rightarrow near},B_{2 \\Rightarrow near}A2⇒near,B2⇒near中的样本点与第一次迭代的集合A1⇒near,B1⇒nearA_{1 \\Rightarrow near},B_{1 \\Rightarrow near}A1⇒near,B1⇒near样本点相同,停止迭代。整个迭代过程中,种子点A,BA,BA,B各移动两次,各种子点对应集合均包含333个样本点。A\\mathcal A \\quadA 选项正确。