Semantic-Aware Image Analysis

Li, Weihao

Citation of documents: Please do not cite the URL displayed in your browser's address bar. Instead, use the DOI, URN, or persistent URL below, as we can guarantee their long-term accessibility.

DOI: 10.11588/heidok.00027681
URN: urn:nbn:de:bsz:16-heidok-276817
URL: http://www.ub.uni-heidelberg.de/archiv/27681

Abstract

Extracting and utilizing high-level semantic information from images is one of the important goals of computer vision. The ultimate objective of image analysis is to understand each pixel of an image with regard to high-level semantics, e.g. the objects, the stuff, and their spatial, functional, and semantic relations. In recent years, thanks to large labeled datasets and deep learning, great progress has been made in solving image analysis problems such as image classification, object detection, and object pose estimation. In this work, we explore several aspects of semantic-aware image analysis. First, we explore semantic segmentation of man-made scenes using fully connected conditional random fields, which can model long-range connections within images of man-made scenes and exploit contextual information about scene structure. Second, we introduce a semantic smoothing method that exploits semantic information to accomplish semantic structure-preserving image smoothing. Semantic segmentation has achieved significant progress recently and has been widely used in many computer vision tasks. We observe that high-level semantic image labeling information can naturally provide a meaningful structure prior for image smoothing. Third, we present a deep object co-segmentation approach for segmenting common objects of the same class within a pair of images. To address this task, we propose a CNN-based Siamese encoder-decoder architecture. The encoder extracts high-level semantic features of the foreground objects, a mutual correlation layer detects the common objects, and finally, the decoder generates the output foreground masks for each image. Finally, we propose an approach to localize common objects from novel object categories in a set of images. We solve this problem using a new common component activation map, in which we treat the class-specific activation maps as components to discover the common components in the image set. Our experiments show that this approach generalizes to novel object categories.
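
As an illustration of the co-segmentation architecture summarized above, the following is a minimal sketch of what a correlation step between two encoder feature maps could look like. It is not taken from the thesis: the function name `mutual_correlation`, the NumPy implementation, and the (C, H, W) feature layout are assumptions made for this example, and the thesis should be consulted for the actual layer definition.

```python
import numpy as np


def mutual_correlation(feat_a, feat_b):
    """Correlate every location of feat_a with every location of feat_b.

    Hypothetical helper for illustration only.
    feat_a, feat_b: arrays of shape (C, H, W), e.g. encoder outputs
    for the two images of a pair.
    Returns an array of shape (H*W, H, W): channel k at position (i, j)
    holds the dot product between feat_a[:, i, j] and the k-th
    (row-major flattened) location of feat_b.
    """
    c, h, w = feat_a.shape
    if feat_b.shape != (c, h, w):
        raise ValueError("feature maps must have the same shape")
    a = feat_a.reshape(c, h * w)  # (C, N) with N = H*W
    b = feat_b.reshape(c, h * w)  # (C, N)
    corr = b.T @ a                # (N, N): rows index feat_b, columns feat_a
    return corr.reshape(h * w, h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fa = rng.standard_normal((4, 3, 5))
    fb = rng.standard_normal((4, 3, 5))
    # Correlation of image A with image B, and the reverse direction.
    corr_ab = mutual_correlation(fa, fb)
    corr_ba = mutual_correlation(fb, fa)
    print(corr_ab.shape, corr_ba.shape)
```

In a Siamese setup of this kind, each correlation volume would typically be passed to the decoder of the corresponding image, so that each branch sees how strongly its locations respond to the other image's features.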

Translation of abstract (from German)

Extracting and using semantic information from images is among the most important applications of computer vision. The overarching goal of image analysis is semantic understanding at the pixel level. This includes, among other things, assigning pixels to objects and surfaces, as well as their spatial, functional, and semantic relationships. In recent years, annotated datasets and deep learning have enabled great progress in image analysis, particularly in classification, object detection, and pose estimation. This thesis investigates a variety of aspects of semantic image analysis. First, we consider semantic segmentation of urban scenes using dense CRFs. Dense CRFs model global relationships in images by exploiting the context of a scene's structure. Second, we introduce a method for semantic structure-preserving image smoothing that uses context information. Semantic segmentation has recently made great progress and is used in many computer vision applications. We observe that semantics can serve as useful prior information for natural-looking image smoothing. Third, we present a method for co-segmenting objects of the same class in a pair of images using a Siamese encoder-decoder CNN architecture. The encoder extracts semantic descriptors of the foreground objects, and a mutual correlation layer detects objects of the same class. Finally, the decoder generates a foreground mask for each object. Fourth, we present a method for localizing common objects from unknown classes in an image dataset. To solve this problem, we introduce a common component activation map, in which class-specific activation maps are used to identify common components in the dataset. We show in experiments that this approach generalizes to new object categories.

Document type:	Dissertation
Supervisor:	Rother, Prof. Dr. Carsten
Place of Publication:	Heidelberg
Date of thesis defense:	18 December 2019
Date Deposited:	05 Feb 2020 08:27
Date:	2020
Faculties / Institutes:	The Faculty of Mathematics and Computer Science > Department of Computer Science
DDC-classification:	004 Data processing Computer science