https://arxiv.org/abs/2104.06401

Self-supervised object detection from audio-visual correspondence

We tackle the problem of learning object detectors without supervision. Differently from weakly-supervised object detection, we do not assume image-level class labels. Instead, we extract a supervisory signal from audio-visual data, using the audio compone..

1. Introduction

Goal: without any labels..