Sketched Visual Narratives for Content Based Video Retrieval


Digital media impacts all aspects of our society, and a wealth of visual content is searched daily by both professionals and casual users. Digital video in particular has become ubiquitous, with commodity hardware encouraging the casual capture of video both for distribution online and for archival in personal media collections. This explosion of video content motivates new approaches to efficiently search, browse and present video collections.

Video search is commonly performed using text-based search of manually annotated metadata tags. Text offers a concise medium for describing the objects or semantic concepts present in a scene. However, video is a rich visual medium, and it is cumbersome to describe the appearance of objects (e.g. shape, colour), their relative spatial positions, or sequences of actions (events) using a purely textual description. Pictorial representations (e.g. sketches) offer a complementary, intuitive way to describe these scene characteristics.

We propose to explore the combined use of text and sketch to specify queries in a Content Based Video Retrieval (CBVR) system. A user's ability to recall events is typically based upon their episodic memory [73]: the tendency to recall objects and their interactions rather than the precise photometric properties of a scene. Sketches offer an intuitive and convenient way to depict such episodes, and we will harness this in our retrieval system. We will develop systems that accept a text-annotated, free-hand sketched "storyboard" as input, and identify matching video clips using not only the semantic labels in the sketch but also the appearance and motion characteristics of the depicted scene.

Although Sketch Based Retrieval (SBR) of multimedia dates back to the early nineties [25, 68], there has been comparatively little SBR work investigating the use of sketch for video retrieval. The majority of SBR techniques for video are either simple adaptations of image retrieval technologies applied to extracted video key-frames, or focus on only one facet of the scene description (e.g. shape, or motion trajectory). We propose to extend the work of Collomosse et al. [17], who were the first to apply sketched storyboards to video search. In [17], a storyboard sketch depicts both an object's appearance and its motion, using cues borrowed from classical animation (e.g. arrows or streamlines). However, the object model used in [17] was simplistic (considering only shape and colour, under a linear motion constraint) and, most importantly, did not support the matching of sequences of actions, i.e. events. Yet it is precisely the sequencing of actions that underpins the narrative description of an event, and that is most strongly aligned with episodic recall.

Our work will explore the visual depiction of narrative in a storyboard sketch, enabling objects to be matched on the basis of their classification, their appearance and the sequence of actions they participate in. For example, a user might search surveillance footage by sketching a desired object and sequence of actions, or sketch a more complex sequence of movements to find a particular event in a movie or a piece of choreography in a cultural video archive. The key scientific challenge underpinning this goal is that of devising a suitable video representation. Our representation must not only be discriminative but also accommodate the ambiguity inherent in sketched queries. It must also be amenable to scalable matching over the large video collections that are becoming increasingly available to even casual multimedia consumers.

