Clustering evaluation method (RI, Rand index)

Rand Index (RI)

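The Rand index, and its chance-corrected variant the adjusted Rand index (ARI), are evaluation methods used when the target (ground-truth) labels of the clusters are known, i.e. they are external, supervised evaluation measures.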

Given a set $S = \{o_1, \ldots, o_n\}$ of $n$ elements and two partitions of $S$, $X$ and $Y$, define the following pair counts:

$a$: the number of pairs of elements in $S$ that are in the same subset in $X$ and in the same subset in $Y$
$b$: the number of pairs of elements in $S$ that are in different subsets in $X$ and in different subsets in $Y$
$c$: the number of pairs of elements in $S$ that are in the same subset in $X$ and in different subsets in $Y$
$d$: the number of pairs of elements in $S$ that are in different subsets in $X$ and in the same subset in $Y$

$$ R = {a + b \over a + b + c + d} = {a + b \over {n \choose 2}}$$
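The definition above can be sketched directly in code by enumerating every pair of elements. This is a minimal, standard-library-only illustration, not an optimized implementation; the function names `pair_counts` and `rand_index` and the example labelings `x` and `y` are made up for this sketch, and each partition is assumed to be given as a list of cluster labels, one per element.

```python
from itertools import combinations
from math import comb


def pair_counts(x, y):
    """Return (a, b, c, d) for two labelings x and y of the same elements."""
    if len(x) != len(y):
        raise ValueError("both labelings must cover the same elements")
    a = b = c = d = 0
    # Visit each unordered pair of elements exactly once.
    for i, j in combinations(range(len(x)), 2):
        same_x = x[i] == x[j]
        same_y = y[i] == y[j]
        if same_x and same_y:
            a += 1  # together in X, together in Y
        elif not same_x and not same_y:
            b += 1  # apart in X, apart in Y
        elif same_x:
            c += 1  # together in X, apart in Y
        else:
            d += 1  # apart in X, together in Y
    return a, b, c, d


def rand_index(x, y):
    """Rand index R = (a + b) / (a + b + c + d)."""
    a, b, c, d = pair_counts(x, y)
    total = a + b + c + d  # equals comb(len(x), 2)
    if total == 0:
        raise ValueError("need at least two elements")
    return (a + b) / total


# Hypothetical example: six elements, two different partitions.
x = [0, 0, 0, 1, 1, 1]
y = [0, 0, 1, 1, 2, 2]
print(pair_counts(x, y))  # (2, 8, 4, 1)
print(rand_index(x, y))   # 10 / 15, about 0.667
```

The pairwise loop is quadratic in $n$, which is fine for small examples; for large data sets a contingency-table based computation is usually preferred.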

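The Rand index can also be viewed as a measure of the percentage of correct decisions made by the clustering. Each pair of elements is one decision: $TP = a$ and $TN = b$ count the pairs on which the two partitions agree, while $FP$ and $FN$ together account for the $c + d$ pairs on which they disagree.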

$$RI = {TP + TN \over TP+FP+FN+TN}$$
