🔢 Digit Recognizer Competition: Building a Handwritten Digit Classifier - 3. Model Evaluation

In this post, we'll briefly evaluate the model we built in the previous post.

📚 Series Contents

1. Data Preparation
2. CNN Model Implementation
3. Model Evaluation and Visualization

🔎 Overview

Using two standard ways of checking model performance, a confusion matrix and a visualization of misclassified samples, we'll look at which digits the model gets wrong.

🗂️ In This Post

1. Model Evaluation and Visualization
- 1.1 Confusion Matrix
- 1.2 Plot Error Preds

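1. Model Evaluation and Visualization

Let's evaluate the CNN model ourselves and visualize the results. This picks up from the last section of the previous post. (I filed it under model training there, but it really belongs with evaluation.)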

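1.1 Confusion Matrix

A confusion matrix goes beyond a single accuracy number and shows at a glance which digits the model tends to get wrong. It lets you spot patterns such as "4 is often misclassified as 9" and decide where to focus data preprocessing or model tuning, which makes it a very important tool for improving a model.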
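To make the idea concrete, here is a minimal sketch that builds a confusion matrix by hand for a toy 3-class problem. The helper name `toy_confusion_matrix` and the labels are made up for illustration and are not part of the competition code; in practice `sklearn.metrics.confusion_matrix` does this for you.

```python
import numpy as np

def toy_confusion_matrix(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs: rows are true labels, columns are predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (row, col) pair repeats
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Hypothetical labels for a 3-class problem (not from the competition data)
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2, 2, 2, 1])

cm = toy_confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)

# Accuracy is the diagonal (correct predictions) divided by the total count
accuracy = np.trace(cm) / cm.sum()
print(f"accuracy = {accuracy:.3f}")
```

Accuracy alone would only say that about two thirds of the samples are right. The matrix also shows that class 1 is mostly being mistaken for class 2, which is the kind of pattern we are after.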

# Look at the confusion matrix
import itertools

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    Plot the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    # Normalize before drawing so the colors match the numbers in the cells
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # Raw counts are printed as integers, normalized values with two decimals
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# `model`, `X_val` and `Y_val` come from the previous post
# Predict class probabilities for the validation dataset
Y_pred = model.predict(X_val)
# Convert predicted probabilities to class indices
Y_pred_classes = np.argmax(Y_pred, axis=1)
# Convert one-hot validation labels to class indices
Y_true = np.argmax(Y_val, axis=1)
# Compute the confusion matrix
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)
# Plot the confusion matrix
plot_confusion_matrix(confusion_mtx, classes=range(10))
plt.show()

Apart from a few 4s that were predicted as 9, the model predicts very well.
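Reading the plot by eye works, but you can also ask the matrix directly which (true, predicted) pairs are confused most often. Below is a rough sketch of one way to do that. The helper `top_confusions` and the small example matrix are hypothetical, made up for illustration.

```python
import numpy as np

def top_confusions(cm, k=3):
    """Return the k largest off-diagonal cells as (true_label, predicted_label, count)."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)  # ignore correct predictions
    # Sort the flattened matrix and keep the k largest counts, largest first
    flat_order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)
    return [(int(r), int(c), int(off_diag[r, c])) for r, c in zip(rows, cols)]

# Hypothetical 3x3 confusion matrix (rows: true label, columns: predicted label)
example_cm = np.array([[50, 1, 0],
                       [2, 45, 7],
                       [0, 3, 40]])

pairs = top_confusions(example_cm, k=2)
for true_label, pred_label, count in pairs:
    print(f"true {true_label} predicted as {pred_label}: {count} times")
```

With the real matrix from above, `top_confusions(confusion_mtx)` should list the same pairs you would pick out from the plot.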

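1.2 Plot Error Preds

The confusion matrix shows 4s being misread as 9s, but when you actually look at those images, some of the 4s may genuinely look like 9s. That's why it's worth inspecting the misclassified images directly.

The code below doesn't just show any misclassifications. It visualizes the cases where the model was most confidently wrong.

A model will sometimes be wrong, but being wrong with low confidence and being wrong with high confidence carry different risks.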
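One simple way to measure "confidently wrong" is the gap between the probability the model gave its predicted class and the probability it gave the true class. Here is a small sketch on made-up softmax outputs; the arrays are hypothetical and only there to show the arithmetic.

```python
import numpy as np

# Hypothetical softmax outputs for 3 misclassified samples over 3 classes
probs = np.array([[0.05, 0.90, 0.05],   # very confident in class 1
                  [0.40, 0.35, 0.25],   # barely prefers class 0
                  [0.10, 0.20, 0.70]])  # fairly confident in class 2
true_labels = np.array([0, 1, 0])       # all three predictions are wrong

# Probability assigned to the predicted (highest-scoring) class
pred_prob = probs.max(axis=1)
# Probability assigned to the true class, picked row by row
true_prob = probs[np.arange(len(true_labels)), true_labels]

# A large gap means the model was confident in the wrong answer
gap = pred_prob - true_prob
worst_first = np.argsort(gap)[::-1]
print(gap)
print(worst_first)
```

The first sample is wrong by a wide margin, while the second is nearly a coin flip between two classes. Sorting by the gap puts the first kind of mistake at the top.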

# Display some error results

# Errors are the samples where the predicted label differs from the true label
errors = (Y_pred_classes != Y_true)

Y_pred_classes_errors = Y_pred_classes[errors]
Y_pred_errors = Y_pred[errors]
Y_true_errors = Y_true[errors]
X_val_errors = X_val[errors]

def display_errors(errors_index, img_errors, pred_errors, obs_errors):
    """Show up to 6 images with their predicted and true labels."""
    nrows = 2
    ncols = 3
    fig, ax = plt.subplots(nrows, ncols, sharex=True, sharey=True)
    # zip stops early if there are fewer than 6 errors to show
    for axis, error in zip(ax.flat, errors_index):
        axis.imshow(img_errors[error].reshape((28, 28)), cmap='gray')
        axis.set_title("Predicted label :{}\nTrue label :{}".format(pred_errors[error], obs_errors[error]))
    plt.show()

# Probabilities of the wrongly predicted classes
Y_pred_errors_prob = np.max(Y_pred_errors, axis=1)

# Predicted probabilities of the true classes in the error set
# (row-wise indexing avoids building the N x N matrix that np.take + np.diagonal would)
true_prob_errors = Y_pred_errors[np.arange(len(Y_true_errors)), Y_true_errors]

# Difference between the probability of the predicted label and the true label
delta_pred_true_errors = Y_pred_errors_prob - true_prob_errors

# Indices of the errors, sorted by that difference in ascending order
sorted_delta_errors = np.argsort(delta_pred_true_errors)

# Top 6 errors (the largest differences are at the end)
most_important_errors = sorted_delta_errors[-6:]

# Show the top 6 errors
display_errors(most_important_errors, X_val_errors, Y_pred_classes_errors, Y_true_errors)

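That's it for this post.

Honestly, I've been hooked on the Dacon rock classification challenge lately, so this upload ran a bit late. (To be honest... I was also feeling a bit lazy, haha.)

As a soldier I don't have access to a good machine for AI training, so the most frustrating part is writing the code and then not being able to properly run the model. 😭

Heavy AI projects really aren't easy during military service... ugh.

Still, I believe the fastest way to learn is to imitate how people who know more than you do things, and then digest it into your own way.

I can't take on a big project myself right now, but I'll keep following how Kaggle experts work through problems, keep studying, and keep writing up what I learn.

See you at the next Kaggle problem! 🙌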
