Image Annotation

Motivation

In many real-life scenarios, an object can belong to several categories at once. For example, a newspaper column can be tagged as "political", "election", and "democracy"; an image may contain "tiger", "grass", and "river". These are instances of multi-label classification, the task of associating multiple labels with a single data sample. It is a difficult problem because one needs to consider the intricate correlations that exist among different labels.

Automatic image annotation is a multi-label classification problem that aims at associating an image with a set of textual labels that describe its semantics. It has potential applications in image retrieval, image description, etc. The recent surge of multimedia content, both on the Internet and in personal collections, has raised the demand for auto-annotation methods, making this an active area of research.

A Modified KNN for Image Annotation [1]

Example 1: {bear, reflection, water, black, river}
Example 2: {field, horses, mare, foals, tree}
Example 3: {green, phone, woman, hair, suit}
Example 4: {fight, grass, game, anime, man}
Example 5: {building, base, horse, statue, man}
Example 6: {fence, mountain, range, airplane, sky}

For a given image, the labels are usually predicted from an annotation vocabulary of a few hundred labels. With such a large vocabulary, label frequencies vary widely ("class-imbalance"). Moreover, due to the limitations of manual annotation, a significant number of available images are not annotated with all the relevant labels ("weak-labelling"). These two issues affect the performance of many existing image annotation models.
In this work, we proposed 2PKNN, a two-step variant of the classical K-nearest neighbour algorithm, that tries to address these two issues. We also proposed a metric learning framework over 2PKNN for learning better distances.
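
This page does not spell out the two steps, so the following is only a minimal sketch of how a two-pass nearest-neighbour annotator could look, based on our reading of the general idea in [1]: the first pass keeps a few nearest training images for every label, so that rare labels are represented, and the second pass lets those images vote for their labels with distance-based weights. The function name `two_pass_knn`, the parameters `k1` and `w`, the plain Euclidean distance (used here in place of a learned metric), and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def two_pass_knn(query, train_feats, train_labels, k1=2, w=1.0):
    """Score labels for `query` with a two-pass nearest-neighbour scheme (sketch)."""

    def dist(a, b):
        # Assumption: plain Euclidean distance instead of a learned metric.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Group training image indices by the labels they carry.
    by_label = defaultdict(list)
    for idx, labels in enumerate(train_labels):
        for label in labels:
            by_label[label].append(idx)

    # Pass 1: keep the k1 nearest training images for every label, so that
    # rare labels contribute as many candidates as frequent ones.
    neighbourhood = set()
    for idxs in by_label.values():
        nearest = sorted(idxs, key=lambda i: dist(query, train_feats[i]))[:k1]
        neighbourhood.update(nearest)

    # Pass 2: every selected image votes for each of its labels with a
    # weight that decays exponentially with its distance to the query.
    scores = defaultdict(float)
    for i in neighbourhood:
        weight = math.exp(-w * dist(query, train_feats[i]))
        for label in train_labels[i]:
            scores[label] += weight
    return dict(scores)


# Toy data (hypothetical): 2-D features and label sets for four images.
train_feats = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0)]
train_labels = [{"grass", "tiger"}, {"grass"}, {"grass", "river"}, {"sky"}]

scores = two_pass_knn((0.0, 0.1), train_feats, train_labels, k1=1)
top_labels = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_labels)
```

In this toy run the frequent label "grass" cannot crowd out the others in the first pass, because each label contributes at most `k1` images; the distant "sky" image still enters the neighbourhood but receives a very small weight in the second pass.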

Generating Image Description [2]

Problem

[Figure: problem illustration]

Results


* A black ferrari is parked in front of a green tree.
* An adult hound is laying on an orange couch.
* A blond woman is posing with an elvis impersonator.
* A small sailboat is passing near a yellow buoy.
* A sporty car is parked on a concrete driveway.
* A sweet cat is curling on a pink blanket.
* An orange fixture is hanging in a messy kitchen.
* An ocean boat is travelling in a narrow water.

In this work, we proposed a method to describe an image in a single sentence.
It is based on annotating an image with linguistically motivated phrases.
These phrases are then combined to generate the image description.
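
As a rough illustration of the last step only, the sketch below joins a handful of phrases into a sentence with a fixed template. The phrase structure (an attribute-noun pair for the subject, a verb, a preposition, and an attribute-noun pair for the object), the template, and the function name `describe` are assumptions chosen to match the example sentences above; the page does not say how phrases are predicted or how they are actually combined.

```python
def describe(subject, verb, preposition, obj):
    """Join (attribute, noun) phrases, a verb and a preposition into one sentence."""

    def noun_phrase(attribute, noun):
        # Pick the indefinite article from the first letter of the attribute.
        article = "An" if attribute[0].lower() in "aeiou" else "A"
        return f"{article} {attribute} {noun}"

    subj_np = noun_phrase(*subject)
    obj_np = noun_phrase(*obj)
    # The object phrase sits mid-sentence, so its article is lower-cased.
    obj_np = obj_np[0].lower() + obj_np[1:]
    return f"{subj_np} is {verb} {preposition} {obj_np}."


# Hypothetical phrases, as if predicted for two images.
sentence_1 = describe(("black", "ferrari"), "parked", "in front of", ("green", "tree"))
sentence_2 = describe(("adult", "hound"), "laying", "on", ("orange", "couch"))
print(sentence_1)
print(sentence_2)
```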

Related Publications

Yashaswi Verma and C. V. Jawahar - Image Annotation using Metric Learning in Semantic Neighbourhoods, Proceedings of the 12th European Conference on Computer Vision, 7-13 Oct. 2012, Print ISBN 978-3-642-33711-6, Vol. ECCV 2012, Part-III, LNCS 7574, pp. 114-128, Firenze, Italy. [PDF]

Ankush Gupta, Yashaswi Verma and C. V. Jawahar - Choosing Linguistics Over Vision to Describe Images, In AAAI, 2012. [paper] [presentation] [poster]

People

Yashaswi Verma
C. V. Jawahar