
Clustering evaluation method (RI, Rand index)

by hyez 2022. 3. 7.

Rand Index (RI)

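The Rand index, like its chance-corrected variant the adjusted Rand index, is an evaluation method used when the target (ground-truth) cluster labels are known. In other words, it is an external (supervised) evaluation of a clustering.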

Given a set of $n$ elements $S = \{o_1, \ldots, o_n\}$, consider two partitions of $S$:

  1. $X = \{X_1, \ldots, X_r\}$, a partition of $S$ into $r$ subsets
  2. $Y = \{Y_1, \ldots, Y_s\}$, a partition of $S$ into $s$ subsets
Define the following four counts over all unordered pairs of elements of $S$:

  • $a$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in the same subset in $Y$
  • $b$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in different subsets in $Y$
  • $c$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in different subsets in $Y$
  • $d$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in the same subset in $Y$

|  | same subset in $X$ | different subsets in $X$ |
|---|---|---|
| same subset in $Y$ | $a$ | $d$ |
| different subsets in $Y$ | $c$ | $b$ |
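To make the four counts concrete, here is a minimal Python sketch that counts $a$, $b$, $c$, $d$ by brute force over all unordered pairs. It assumes each partition is given as a list of cluster labels, where `x_labels[i]` names the subset of $X$ that contains $o_i$; this label-list encoding and the helper name `pair_counts` are illustrative choices, not something from the original post. It takes $O(n^2)$ time, which is fine for small examples.

```python
from itertools import combinations


def pair_counts(x_labels, y_labels):
    """Return (a, b, c, d) for two partitions given as per-element labels.

    Hypothetical helper: x_labels[i] / y_labels[i] is the subset that
    element o_i belongs to in partition X / Y.
    """
    if len(x_labels) != len(y_labels):
        raise ValueError("both partitions must cover the same n elements")
    a = b = c = d = 0
    # Visit each unordered pair {o_i, o_j} exactly once.
    for i, j in combinations(range(len(x_labels)), 2):
        same_x = x_labels[i] == x_labels[j]
        same_y = y_labels[i] == y_labels[j]
        if same_x and same_y:
            a += 1  # same subset in X and in Y
        elif not same_x and not same_y:
            b += 1  # different subsets in X and in Y
        elif same_x:
            c += 1  # same subset in X, different subsets in Y
        else:
            d += 1  # different subsets in X, same subset in Y
    return a, b, c, d


# X = {{o1, o2, o3}, {o4, o5, o6}}, Y = {{o1, o2}, {o3, o4}, {o5, o6}}
print(pair_counts([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # (2, 8, 4, 1)
```

Here $n = 6$, $r = 2$, $s = 3$, and the four counts add up to ${6 \choose 2} = 15$, the total number of pairs.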

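Every unordered pair falls into exactly one of these four categories, so $a + b + c + d = {n \choose 2}$, and the Rand index is

$$ R = {a + b \over a + b + c + d} = {a + b \over {n \choose 2}} $$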
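Enumerating every pair is not strictly necessary. The sketch below, assuming the same label-list encoding, counts how many elements fall into each subset of $X$, each subset of $Y$, and each intersection $X_i \cap Y_j$. Pairs inside one intersection give $a$, pairs inside one $X_i$ give $a + c$, and pairs inside one $Y_j$ give $a + d$, so $b = {n \choose 2} - (a + c) - (a + d) + a$. The name `rand_index` is illustrative, and rejecting $n < 2$ is a choice made here because the ratio is undefined when there are no pairs.

```python
from collections import Counter
from math import comb


def rand_index(x_labels, y_labels):
    """Rand index R = (a + b) / C(n, 2), computed from subset sizes."""
    n = len(x_labels)
    if n != len(y_labels):
        raise ValueError("both partitions must cover the same n elements")
    if n < 2:
        raise ValueError("need at least two elements to form a pair")
    total = comb(n, 2)  # a + b + c + d
    # a: pairs lying in the same cell X_i ∩ Y_j of the contingency table
    a = sum(comb(m, 2) for m in Counter(zip(x_labels, y_labels)).values())
    same_x = sum(comb(m, 2) for m in Counter(x_labels).values())  # a + c
    same_y = sum(comb(m, 2) for m in Counter(y_labels).values())  # a + d
    b = total - same_x - same_y + a  # pairs split in both partitions
    return (a + b) / total


print(rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # 0.6666666666666666 (= 10/15)
```

Because only label equality matters, renaming clusters does not change the score: `rand_index([0, 0, 1], [1, 1, 0])` is `1.0`.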


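The Rand index can be read as the percentage of correct decisions. Each of the ${n \choose 2}$ pairs is a binary decision (same cluster or not); taking $X$ as the ground truth and $Y$ as the clustering result, $TP = a$, $TN = b$, $FP = d$, and $FN = c$, so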

$$RI = {TP + TN \over TP+FP+FN+TN}$$
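The pair-classification reading can be checked with a short sketch: treat $X$ as the ground truth and $Y$ as the clustering output, label every pair as positive ("same cluster") or negative, and compute plain accuracy. The helper `pair_confusion` is hypothetical; on the running example it should give $TP = a$, $TN = b$, $FP = d$, $FN = c$, and the accuracy should equal $R$.

```python
from itertools import combinations


def pair_confusion(true_labels, pred_labels):
    """Return (TP, TN, FP, FN) over all unordered pairs (hypothetical helper).

    A pair counts as 'positive' when both of its elements share a cluster.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        truly_same = true_labels[i] == true_labels[j]
        pred_same = pred_labels[i] == pred_labels[j]
        if pred_same and truly_same:
            tp += 1  # correctly put together (a)
        elif not pred_same and not truly_same:
            tn += 1  # correctly kept apart (b)
        elif pred_same:
            fp += 1  # put together, but truly apart (d)
        else:
            fn += 1  # kept apart, but truly together (c)
    return tp, tn, fp, fn


tp, tn, fp, fn = pair_confusion([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
ri = (tp + tn) / (tp + fp + fn + tn)
print((tp, tn, fp, fn), ri)  # (2, 8, 1, 4) 0.6666666666666666
```

If scikit-learn (0.24 or later) is installed, `sklearn.metrics.rand_score(labels_true, labels_pred)` should return the same value, and `sklearn.metrics.adjusted_rand_score` computes the chance-corrected variant.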
