Rand Index (RI)
The adjusted Rand index is an evaluation method used when the target (ground-truth) cluster labels are known, i.e. an external, supervised form of evaluation.
Given a set of $n$ elements $S = \{o_1, ..., o_n\}$, consider two partitions of $S$:
- $X=\{X_1, ..., X_r\}$, a partition of $S$ into $r$ subsets
- $Y=\{Y_1, ..., Y_s\}$, a partition of $S$ into $s$ subsets

Every pair of elements in $S$ then falls into exactly one of four counts:
- $a$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in the same subset in $Y$
- $b$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in different subsets in $Y$
- $c$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in different subsets in $Y$
- $d$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in the same subset in $Y$

| | $X_{same}$ | $X_{diff}$ |
|---|---|---|
| $Y_{same}$ | $a$ | $d$ |
| $Y_{diff}$ | $c$ | $b$ |

$$ R = {a + b \over a + b + c + d} = {a + b \over {n \choose 2}}$$
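The pair counts and the ratio above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original post: the function names `pair_counts` and `rand_index` are hypothetical, and each partition is assumed to be given as a list of cluster labels, one per element of $S$.

```python
from itertools import combinations


def pair_counts(labels_x, labels_y):
    """Return (a, b, c, d) for two labelings of the same n elements.

    Assumption: labels_x[i] and labels_y[i] are the subset labels of
    element o_i in partitions X and Y respectively.
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both labelings must cover the same elements")

    a = b = c = d = 0
    # Visit each of the C(n, 2) unordered pairs exactly once.
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1  # same subset in X, same subset in Y
        elif not same_x and not same_y:
            b += 1  # different in X, different in Y
        elif same_x:
            c += 1  # same in X, different in Y
        else:
            d += 1  # different in X, same in Y
    return a, b, c, d


def rand_index(labels_x, labels_y):
    """R = (a + b) / (a + b + c + d), where the denominator is C(n, 2)."""
    a, b, c, d = pair_counts(labels_x, labels_y)
    total = a + b + c + d
    if total == 0:
        raise ValueError("need at least two elements to form a pair")
    return (a + b) / total


# Made-up example labelings of six elements.
labels_x = [0, 0, 0, 1, 1, 1]
labels_y = [0, 0, 1, 1, 2, 2]
print(pair_counts(labels_x, labels_y))  # (2, 8, 4, 1)
print(rand_index(labels_x, labels_y))
```

The loop is quadratic in $n$, which keeps it close to the definition; for large data sets a contingency-table formulation would be the more practical choice.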
The Rand index can also be viewed as a measure of the percentage of correct decisions:
$$RI = {TP + TN \over TP+FP+FN+TN}$$
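As a small worked example (the four-element set and both partitions are made up for illustration and do not come from the original post), take $S = \{o_1, o_2, o_3, o_4\}$, $X = \{\{o_1, o_2\}, \{o_3, o_4\}\}$ and $Y = \{\{o_1, o_2, o_3\}, \{o_4\}\}$. Of the ${4 \choose 2} = 6$ pairs, $(o_1, o_2)$ is together in both partitions; $(o_1, o_4)$ and $(o_2, o_4)$ are separated in both; $(o_3, o_4)$ is together in $X$ but separated in $Y$; and $(o_1, o_3)$ and $(o_2, o_3)$ are separated in $X$ but together in $Y$. Assuming $X$ is the clustering being evaluated and $Y$ is the reference labeling, this gives $TP = 1$, $TN = 2$, $FP = 1$, $FN = 2$:

$$RI = {1 + 2 \over 1 + 1 + 2 + 2} = {3 \over 6} = 0.5$$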