Visual Representation Learning with Minimal Supervision

Büchler, Uta

Citation of documents: Please do not cite the URL displayed in your browser's address bar; instead use the DOI, URN, or persistent URL below, whose long-term accessibility we can guarantee.

DOI: 10.11588/heidok.00029205
URN: urn:nbn:de:bsz:16-heidok-292054
URL: http://www.ub.uni-heidelberg.de/archiv/29205

Abstract

Computer vision aims to give computers the human ability to understand and interpret their visual surroundings. An essential step in comprehending the environment is extracting the information from complex visual data that is relevant to the task at hand. For instance, to distinguish cats from dogs, the feature 'body shape' is more relevant than 'eye color' or the 'number of legs'. Traditional computer vision relies on handcrafted functions that extract specific low-level features, such as edges, from visual data. Solving a particular task satisfactorily, however, requires a combination of several features. The traditional approach therefore has the disadvantage that whenever a new task is addressed, a developer must manually specify all the features the computer should look for. For that reason, recent work has focused primarily on algorithms that teach the computer to autonomously detect relevant, task-specific features. Deep learning has been particularly successful in this respect: artificial neural networks automatically learn to extract informative features directly from visual data. Most deep learning strategies require a dataset with annotations that indicate the solution to the desired task. The main bottleneck is that creating such a dataset is tedious and time-intensive, since every sample must be annotated manually. This thesis presents new techniques that keep the amount of human supervision to a minimum while still reaching satisfactory performance on various visual understanding tasks. In particular, it focuses on self-supervised learning algorithms that train a neural network on a surrogate task for which no human supervision is required. We create an artificial supervisory signal by breaking the order of visual patterns and asking the network to recover the original structure.
Besides demonstrating the abilities of our model on common computer vision tasks such as action recognition, we also apply it to biomedical scenarios. Many research projects in medicine involve extensive manual processes that prolong the development of successful treatments. Using the analysis of motor function in neurologically impaired patients as an example, we show that our self-supervised method can help automate tedious, visually based processes in medical research. To perform a detailed analysis of motor behavior and thus provide a suitable treatment, it is important to discover and identify the negatively affected movements. We therefore propose a magnification tool that can detect and enhance subtle changes in motor function, including differences in motor behavior across individuals. In this way, our automatic diagnostic system not only analyzes apparent behavior but also facilitates the perception and discovery of impaired movements. Learning a feature representation without annotations significantly reduces human supervision. However, using annotated datasets generally leads to better performance than self-supervised learning. Hence, we additionally examine semi-supervised approaches that efficiently combine a few annotated samples with large unlabeled datasets. Semi-supervised learning thus represents a good trade-off between annotation time and accuracy.
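
The surrogate task mentioned in the abstract (breaking the order of visual patterns and recovering the original structure) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `make_permutation_task` and `restore_order` are hypothetical, strings stand in for video frames or image patches, and labeling each sample with the index of the applied permutation is only one assumed way to turn shuffling into a supervisory signal.

```python
import itertools
import random


def make_permutation_task(sequence, rng):
    """Shuffle `sequence` and return (shuffled, label).

    The label is the index of the applied permutation within the list of
    all permutations of the positions. It is generated automatically, so
    no human annotation is needed (assumed setup, for illustration only).
    """
    perms = list(itertools.permutations(range(len(sequence))))
    label = rng.randrange(len(perms))
    perm = perms[label]
    shuffled = [sequence[i] for i in perm]
    return shuffled, label


def restore_order(shuffled, label):
    """Invert the permutation identified by `label`.

    A trained network would have to predict `label` from the shuffled
    input; here the true label is used to show what "recovering the
    original structure" means.
    """
    perms = list(itertools.permutations(range(len(shuffled))))
    perm = perms[label]
    restored = [None] * len(shuffled)
    for new_pos, old_pos in enumerate(perm):
        restored[old_pos] = shuffled[new_pos]
    return restored


# Placeholder "frames"; in practice these would be images or video frames.
frames = ["frame_0", "frame_1", "frame_2", "frame_3"]
rng = random.Random(0)  # fixed seed so the example is reproducible
shuffled, label = make_permutation_task(frames, rng)
assert restore_order(shuffled, label) == frames
```

Enumerating every permutation is only practical for very short sequences, since the number of classes grows factorially; a real system would presumably restrict the task to a subset of permutations.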

Document type:	Dissertation
Supervisor:	Ommer, Prof. Dr. Björn
Place of Publication:	Heidelberg
Date of thesis defense:	4 December 2020
Date Deposited:	27 Jan 2021 13:39
Date:	2021
Faculties / Institutes:	The Faculty of Mathematics and Computer Science > Dean's Office of The Faculty of Mathematics and Computer Science; The Faculty of Mathematics and Computer Science > Department of Computer Science
DDC-classification:	004 Data processing Computer science
Controlled Keywords:	Maschinelles Sehen (machine vision), Deep Learning, Unsupervised Learning
Uncontrolled Keywords:	Computer Vision, Visual Representation Learning