goormNLP [Self-Supervised Learning]

2022.02.23

Sponsored by Goorm, managed by DAVIAN @ KAIST

Lecture: Self-Supervised Learning


In general, Supervised Learning is well suited to building high-performing models. However, every sample in a large dataset has to be labeled, so datasets are difficult to collect and limited in size.

Semi-Supervised Learning and Unsupervised Learning were proposed to address this problem.

A research direction that has recently drawn attention is Self-Supervised Learning. Self-supervised learning proceeds as follows:

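1. Define a task designed by the researcher (a pretext task).
2. Train a model on an unlabeled dataset, using the pretext task from step 1 as the objective.
   Here, the information in the data is not used directly as a label; instead, it is transformed or repurposed to serve as supervision.
3. Bring the model trained in step 2 to a downstream task and perform transfer learning with its weights frozen.

In other words, the model first learns from supervision it creates itself without any labels. Then, in the transfer learning stage, Supervised Learning is performed on a labeled dataset such as ImageNet to evaluate the model trained in step 2.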
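To make step 2 concrete, here is a minimal sketch of how supervision can be derived from unlabeled images. It uses rotation prediction, a well-known pretext task, as an assumed example; the notes do not name a specific pretext task at this point. The function `make_rotation_pretext` is hypothetical, and images are assumed to be square NumPy arrays so that every rotation keeps the same shape.

```python
import numpy as np

def make_rotation_pretext(images):
    """Build a labeled pretext dataset from unlabeled images (hypothetical helper).

    Each square H x W image is rotated by 0, 90, 180 and 270 degrees, and the
    rotation index (0-3) becomes the label. No human annotation is involved.
    """
    inputs, labels = [], []
    for img in images:
        for k in range(4):
            inputs.append(np.rot90(img, k))  # rotate counter-clockwise by k * 90 degrees
            labels.append(k)                 # supervision derived from the data itself
    return np.stack(inputs), np.array(labels)

rng = np.random.default_rng(0)
unlabeled = rng.random((5, 8, 8))            # five toy 8x8 "images" without labels
x, y = make_rotation_pretext(unlabeled)
print(x.shape, y[:8])                        # (20, 8, 8) [0 1 2 3 0 1 2 3]
```

A model trained to predict `y` from `x` must pick up on object orientation and shape cues, which is the kind of representation the later transfer learning step relies on.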

True to its name, Self-Supervised Learning creates its own supervision from a dataset that lacks direct supervision such as labels, and learns from it.


Evaluate the pre-trained representations through fine-tuning in a transfer learning setting
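A minimal NumPy sketch of evaluating a representation with the encoder frozen (step 3 of the procedure). It assumes a fixed random projection followed by ReLU as a hypothetical stand-in for a pre-trained encoder, and two synthetic Gaussian blobs as the labeled downstream data. Only a linear classifier is trained on top; full fine-tuning would additionally update `W_enc`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an encoder pre-trained on a pretext task:
# a fixed random projection followed by ReLU. Its weights are never updated.
W_enc = rng.normal(0.0, 1.0 / np.sqrt(10), size=(10, 32))

def encoder(x):
    return np.maximum(x @ W_enc, 0.0)

def make_labeled(n):
    """Synthetic labeled downstream data: class 0 centered at -2, class 1 at +2."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(0.0, 1.0, size=(n, 10)) + np.where(y[:, None] == 1, 2.0, -2.0)
    return x, y

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, y):
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

x_train, y_train = make_labeled(200)
x_test, y_test = make_labeled(200)

W_before = W_enc.copy()
feats = encoder(x_train)                       # computed once: the encoder is frozen
W_head = np.zeros((32, 2))                     # only this linear classifier is trained
b_head = np.zeros(2)
onehot = np.eye(2)[y_train]

initial_loss = cross_entropy(softmax(feats @ W_head + b_head), y_train)
for _ in range(300):
    p = softmax(feats @ W_head + b_head)
    grad = (p - onehot) / len(y_train)         # gradient of mean cross-entropy w.r.t. logits
    W_head -= 0.01 * feats.T @ grad
    b_head -= 0.01 * grad.sum(axis=0)
final_loss = cross_entropy(softmax(feats @ W_head + b_head), y_train)

test_acc = np.mean(np.argmax(encoder(x_test) @ W_head + b_head, axis=1) == y_test)
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, test accuracy {test_acc:.2f}")
```

Keeping the encoder frozen means the downstream accuracy reflects only how linearly usable the pre-trained features are, which is why this protocol is a common way to compare pretext tasks.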


It is unclear whether the aforementioned pretext tasks really improve representation quality.
What do we want from the learned representations?
1. Invariant mapping: representations should be stable under slightly transformed versions of an image.
2. Semantic Similarity: semantically related images should be close to each other.
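The two desiderata can be turned into rough diagnostics. This sketch assumes cosine similarity as the closeness measure, small additive Gaussian noise as the "slight transformation", and a random `tanh` projection as a hypothetical encoder on toy 10-dimensional inputs. `invariance_score` and `semantic_gap` are illustrative names, not standard metrics.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def invariance_score(encode, inputs, transform):
    """Mean cosine similarity between f(x) and f(t(x)); near 1 means f is stable under t."""
    return float(np.mean([cosine(encode(x), encode(transform(x))) for x in inputs]))

def semantic_gap(encode, inputs, labels):
    """Mean cosine similarity of same-label pairs minus that of different-label pairs.
    A positive gap means semantically related inputs lie closer together."""
    z = np.stack([encode(x) for x in inputs])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize each representation
    sim = z @ z.T                                       # pairwise cosine similarities
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)         # drop each item's similarity with itself
    return float(sim[same & off_diag].mean() - sim[~same].mean())

rng = np.random.default_rng(0)
W = rng.normal(0.0, 1.0 / np.sqrt(10), size=(10, 32))   # hypothetical encoder weights

def encode(x):
    return np.tanh(x @ W)

def slight_transform(x):
    """Assumed 'slight transformation': small additive Gaussian noise."""
    return x + rng.normal(0.0, 0.05, size=x.shape)

labels = rng.integers(0, 2, size=50)
inputs = rng.normal(size=(50, 10)) + np.where(labels[:, None] == 1, 2.0, -2.0)

inv = invariance_score(encode, inputs, slight_transform)
gap = semantic_gap(encode, inputs, labels)
print(f"invariance {inv:.3f}, semantic gap {gap:.3f}")
```

For a representation learned through a pretext task, one would hope the first score stays close to 1 under realistic augmentations and the second stays clearly positive; the toy numbers here say nothing about any trained model.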