Designing interaction for browsing media collections (by similarity)

PhD defense, January 14 2015






Christian Frisson
numediart Institute, University of Mons, Belgium
Jury: Thierry Dutoit, Stéphane Dupont, Xavier Siebert,
Jean Vanderdonckt, Marcelo Wanderley, Jef Wijsen
(background designed by Charles-Alexandre Delestage and Willy Yvart)

Context

Browsing sounds

A typical setup for sound design


at Dame Blanche studios in Brussels, Belgium, picture by Christian Frisson

The sound browser takes half of the digital workspace.

Context

Browsing sounds

A long list to scroll and review...


screenshot of the AVID/Digidesign Pro Tools Workspace from the top screen

A concern of this thesis: how can we facilitate this task?

Context

Research Q&A

from the book Designing for Interaction by Dan Saffer, New Riders Pr., 2007

How can we help experts browse collections of sounds?

  • By blending interaction (HCI) + organization (MIR) techniques
  • By thinking beyond media types
  • By conducting user evaluations

Outline

Context

Background

Method

Designs

Experiments

Background

Background

Browsing media content

Which file/media browser to use?

Dame Blanche studios in Brussels, Belgium, picture by Christian Frisson

Background

File browsers

(HCI) trends: grids, lists, rapid serial visual presentation


Apple OS X Finder layouts: icons, list, columns, Cover Flow

(MIR) no content-based organization

Background

Sound browsers (research)

(HCI) trend: (scatter)plots

CataRT, SonicBrowser, SoundTorch

(MIR) content-based organization sometimes

Background

Sound browsers (commercial)

(HCI) trends: lists and grids


Digital/Media Asset Managers: AudioFinder, SoundFisher, SoundMiner

(MIR) content-based organization rarely

Background

Browsers for "big media/data"

(HCI) trend: starfield display among other infoviz techniques


From FilmFinder to SpotFire

((M)IR) more for data science than creative arts!

Background

Video browsers

(HCI) trend: grids (of frames) then scatterplots


ITEC Video Explorer, MediaMill Fork Browser, Panopticon

(MIR) content-based organization frequent

Note: more evaluation campaigns than for sound.

(TRECVID, VideOlympics, Video Browser Showdown)

Background

Media browsers compared


along media type, HCI and MIR techniques, usability evaluation...

Method

Method

Organization (MIR)

Interaction (HCI)

Method

Organization (MIR)

Content-based similarity dataflow

Method

Interaction (HCI): infoviz

Visual variables (foundations)


from: Charles-Eric Dessart, Vivian Genaro Motti, and Jean Vanderdonckt,
Animated Transitions between User Interface Views, Proc ACM AVI 2012

originally: Jock Mackinlay, Automatic design of graphical presentations, PhD, Stanford, 1986

Our focus: test only a few selected variables

Method

Interaction (HCI): infoviz

Visual variables (for similarity)


Colin Ware, Visual thinking for design, Morgan Kaufmann, 2008

Our focus: position and glyphs (shape, color, texture)

Method

Interaction (HCI): infoviz

"Information visualization has not yet proven itself

for search interfaces."


Marti A. Hearst, Search User Interfaces, Cambridge Univ Press, 2009

Our focus: information visualization

Method

Interaction (HCI)

Design recommendation 12: "searchers rarely scroll, so get 'important' information above the first-scroll point."


Max L. Wilson, Search User Interface Design, Morgan & Claypool, 2012

Our choice for experiments: no scroll, pan or zoom

Designs

Designs

Context

3-month numediart projects

Paced towards demos

Designs

Context

numediart projects on media browsing

captured from the numediart video on Vimeo created by Laura Colmenares Guerra

Media types in this thesis: mostly audio, then video

Designs

Framework (MIR)

MediaCycle


(left) content-based dataflow (right) file tree

Designs

Toolbox (HCI)

DeviceCycle for PureData


Novint Falcon, Apple Magic Trackpad, 3Dconnexion Space Navigator, Contour Design Shuttle Pro2

available on github.com/ChristianFrisson/DeviceCycle

Designs

Media browsers compared



2 examples: inter-audio (left), intra-video (right) presented at ACM TEI 2014

Experiments

Experiments

Intra-video

Inter-audio

Experiments

Method (general)

Known-item search evaluation


Ambience at the Video Browser Showdown 2013

Experiments

Intra-video

Inter-audio

Experiments

Research question (inter-audio)

When reviewing the results of a search by tag in a collection of sounds, presented in 2D, is a map organized by content-based similarity more efficient than a grid ordered by filename?

(pictured: a grid or a map?)

Experiments

Dataset (inter-audio)

One Laptop Per Child sound library (Creative Commons)

keywords extracted from filenames, rendered using Jason Davies' Word Cloud Generator (left)

Contains sound effects, searchable by tag on filenames.
Open dataset, towards experimental replicability.

Experiments

Apparatus (inter-audio)

Audio hover (touchpad), target buzzer (Space Navigator)


pictures by Charles-Alexandre Delestage (left) and Willy Yvart (right)

Logged data: times, pointer path, user actions.

Experiments

Intra-video

Inter-audio

Hypothesis

System

Protocol

Results

Experiments

Hypothesis 1 (inter-audio)

Facts from previous research:

No comparison of content-based layouts of sounds
versus a grid ordered by filename.

Solution to evaluate:

Compare both layouts to evaluate the effect of position.

Experiments

System 1-4 (inter-audio)

Dimension reduction

t-SNE (t-distributed stochastic neighbor embedding)


(right) t-SNE applied to the COIL image dataset in Divvy

(MIR) similarity neighborhoods mapped to position (HCI)

Experiments

Protocol 1 (inter-audio)

Grid vs content-based starfield

Layouts compared: grid vs starfield.

Experiments

Results 1 (inter-audio)

Quantitative


Mann-Whitney U tests (grid > cloud) for success times: p=0.66, Z=-0.43, Mgrid=30.72, Mcloud=32.03

Grid performs (not significantly) better than starfield.
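How a Z and a p value of the kind reported above can be derived from two samples of success times is sketched below. This is a minimal illustration and not the analysis script used for the thesis: the success times are made-up placeholders, the function names are hypothetical, and the normal approximation leaves out the tie correction.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney(a, b):
    """Return (U for sample a, z, two-sided p) using the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = rank_with_ties(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p_two_sided


if __name__ == "__main__":
    # Hypothetical success times, NOT the experimental data.
    grid_times = [21.0, 28.5, 30.2, 33.1, 40.7]
    cloud_times = [24.3, 29.9, 31.8, 35.0, 44.6]
    print(mann_whitney(grid_times, cloud_times))
```

A one-sided p value, if that is what the test direction calls for, would be half the two-sided value when z has the expected sign.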

Experiments

Hypothesis 2 (inter-audio)

Facts from the previous experiment:

Mapping content-based organization
to one visual variable (position) didn't help.

Solution to evaluate:

Mapping to more visual variables.

Combining content-based positions
with content-based glyph representations of sounds.

Experiments

System 2-4 (inter-audio)

Adding Perceptual Sharpness as an audio feature (from YAAFE)

(right) by Thomas Grill, Perceptually Informed Organization of Textural Sounds,
PhD, Univ. of Music and Performing Arts Graz, Austria, 2012

(MIR) Perceptual Sharpness mapped to brightness/contour (HCI)

Experiments

Protocol 2 (inter-audio)

Grid vs content-based starfield plus glyphs

Layouts compared: grid vs cloud.

Experiments

Results 2 (inter-audio)

Quantitative


Mann-Whitney U tests (grid > cloud) for success times: p=0.04, Z=-1.78, Mgrid=33.94, Mcloud=40.21

Grid performs (almost significantly) better than cloud.

Experiments

Results 2 (inter-audio)

Qualitative

Users prefer cloud over grid.

Experiments

Hypothesis 3 (inter-audio)

Facts from the previous experiment:

Combining content-based positioning
with content-based glyph representations didn't help.

Solution to evaluate:

Sampling a population closer to the expected users (sound designers): students in audiovisual communication.

Experiments

Protocol 3 (inter-audio)

"Expert" students (same system)

Layouts compared: grid vs cloud.

Experiments

Results 3 (inter-audio)

Quantitative


Unpaired Student's t-tests (grid > cloud) for success times: p=0.02, t=-2.04, Mgrid=50.18, Mcloud=56.29

Grid still performs (almost significantly) better than cloud.

Experiments

Results 3 (inter-audio)

Qualitative

Users prefer cloud over grid.

Experiments

Results 1-3 (inter-audio)

Qualitative


10 most efficient runs per task

Users have a more "directed" search with grid.
(2D progress bar, pathway direction)

Experiments

Hypothesis 4 (inter-audio)

Facts from the previous experiment:

Combining content-based positioning and glyphs
plus a population sample closer to the expected users didn't help.

Solution to evaluate:

Combining the benefits of a grid ("directed search") with the benefits of a content-based cloud (similarity in neighborhoods).

Experiments

System 4 (inter-audio)

The metro layout: discretizing the cloud using a proximity grid

Different layouts on the same OLPC collection filtered by tag "water"
applying Wojciech Basalaj, Proximity visualisation of abstract data, PhD, Univ. Cambridge, 2001

Experiments

System 4 (inter-audio)

The metro layout: highlighting closest neighbors

Metro layout on an OLPC collection filtered by tag "water"
with (right) / without (left) 2D links between high-dimensional (HD) nearest neighbors

Experiments

System 4 (inter-audio)

The metro layout: optimal density preserving direct neighbors

Proportion of direct neighbors vs proximity grid densities (77 OLPC water sounds)

densities:
from the collection size (left) to its ceiled square root (right)

neighbors:
(blue) horizontal, (green) vertical, (red) diagonal

horizontal lines:
densities for the grid ordered by filename

Stacked bars should be above horizontal lines.
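One plausible way to compute such a proportion is sketched below. The exact definition behind the plots is not given on the slide, so this is an assumption: for each sound, take its nearest neighbor in feature space and check whether it sits in a horizontally, vertically or diagonally adjacent cell. The function name and the toy data are hypothetical.

```python
import math


def direct_neighbor_proportions(features, placement):
    """Share of items whose nearest feature-space neighbor is in an adjacent cell.

    features: list of equal-length feature vectors (the high-dimensional space).
    placement: dict mapping item index -> (col, row) grid cell.
    """
    counts = {"horizontal": 0, "vertical": 0, "diagonal": 0}
    n = len(features)
    for i in range(n):
        # Nearest neighbor of item i in feature space (Euclidean distance).
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        dc = abs(placement[i][0] - placement[nearest][0])
        dr = abs(placement[i][1] - placement[nearest][1])
        if (dc, dr) == (1, 0):
            counts["horizontal"] += 1
        elif (dc, dr) == (0, 1):
            counts["vertical"] += 1
        elif (dc, dr) == (1, 1):
            counts["diagonal"] += 1
    return {kind: count / n for kind, count in counts.items()}


if __name__ == "__main__":
    # Toy 1D "features" and a hand-made placement, for illustration only.
    toy_features = [[0.0], [1.0], [10.0], [11.0]]
    toy_placement = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (2, 1)}
    print(direct_neighbor_proportions(toy_features, toy_placement))
```

Running the same function on the placement given by the filename-ordered grid would give the baseline drawn as horizontal lines.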

Experiments

Protocol 4 (inter-audio)

The metro layout

Layouts compared: grid vs album vs metro.

Experiments

Results 4 (inter-audio)

Quantitative


Kruskal-Wallis rank sum test: chi-square=5.26 with p=0.07.
Tukey multiple comparisons of success times at 95% (>: performs better than):
p(metro>grid)=0.01 - p(metro>album)=0.34 - p(album>grid)=0.26

Metro performs significantly better than grid!
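The Kruskal-Wallis figures above can be cross-checked with a short sketch. With three layouts the statistic is compared to a chi-square distribution with 2 degrees of freedom, whose survival function is exp(-x/2), so chi-square=5.26 gives p of about 0.07, consistent with the reported value. The code is a minimal illustration under assumptions: the function names are hypothetical, the sample times are made up, and tied values are rejected rather than corrected for.

```python
import math


def chi2_sf_df2(x):
    """Survival function of the chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2)


def kruskal_wallis_3(groups):
    """Return (H, p) for exactly three groups of distinct values."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = sorted(value for group in groups for value in group)
    if len(set(pooled)) != len(pooled):
        raise ValueError("tied values are not handled in this sketch")
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    n = len(pooled)
    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    weighted = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h, chi2_sf_df2(h)


if __name__ == "__main__":
    # Hypothetical success times per layout, NOT the experimental data.
    grid = [41.2, 38.5, 47.9, 52.3]
    album = [36.1, 44.0, 39.7, 35.2]
    metro = [30.4, 33.8, 29.5, 37.6]
    print(kruskal_wallis_3([grid, album, metro]))
    print(chi2_sf_df2(5.26))
```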

Experiments

Results 4 (inter-audio)

Qualitative

Users prefer metro over album, then grid.

Experiments

Summary (inter-audio)

Promising results for the metro layout!

Contributions

Contributions

Summary

Background

53 media browsers compared

Designs

MediaCycle framework: UI

DeviceCycle toolbox

10 media browsers designed

Experiments

4 inter-audio: the metro layout




Contributions

Publications

International conferences: 13 as first author, 5 as co-author

Journal: 1 as co-author

Future work

Future work

MIR


Descriptors for sound effects

Start with Essentia (UPF-MTG)

Dimension reduction

Reduce projection errors

Proximity grid optimization

Compare with more recent methods (pictured):
SAT-NeRV (top), SelfSortMap (bottom)

Future work

HCI

Visualize similarity in 1D with lists

To sort spreadsheet layouts

Combine tag- and content-based views

To cover the search workflow

Design suitable gestural interaction

To feel media content (pictured: inFORM)

Future work

MIR+HCI

A "real" sound browser for sound designers!



Combining most of the aforementioned future work.

Thanks

Thanks

Acknowledgements

Jury

Colleagues

Research Friends

Open source

Creatives

Friends

Family

Discussion

Experiments

Intra-video

Experiments

Protocol (intra-video)

6 teams, 10 expert and 8 novice tasks

A 10 s target to find in 1 of 10 one-hour videos


Server of the Video Browser Showdown 2013

Experiments

System (intra-video)

VideoCycle VBS 2013 edition

screenshot of VideoCycle VBS 2013 edition

Not content-based! (MIR)
Only rapid serial visual presentation (HCI)

Experiments

Results (intra-video)

Average submissions per task

graph by Schoeffmann et al. IJMIR 2014

Experiments

Results (intra-video)

Time to find targets

graph by Schoeffmann et al. IJMIR 2014

Experiments

Inter-audio

Experiments

System 4 (inter-audio)

The metro layout: discretizing the cloud using a proximity grid

Spiral search for ideal cell assignment.


methods: empty / swap / bump

dots: ideal (black) and actual (white) location
cell color: occupied (gray) / available (white)
arrow: cell content replacement
by Wojciech Basalaj, Proximity visualisation of abstract data, PhD, Univ. Cambridge, 2001

We used the empty method.
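A minimal sketch of the "empty" strategy follows, assuming 2D positions already normalized to [0, 1]. It is not the MediaCycle implementation: the function names are hypothetical, and the spiral is approximated by scanning square rings of growing radius around the ideal cell and keeping the free cell whose center is closest to the ideal location.

```python
import math


def nearest_empty_cell(ideal, target, side, occupied):
    """Scan rings of growing radius around the ideal cell for a free cell."""
    cx, cy = ideal
    for radius in range(side):
        ring = [
            (c, r)
            for c in range(cx - radius, cx + radius + 1)
            for r in range(cy - radius, cy + radius + 1)
            if max(abs(c - cx), abs(r - cy)) == radius
            and 0 <= c < side
            and 0 <= r < side
            and (c, r) not in occupied
        ]
        if ring:
            # Cell centers are at (c + 0.5, r + 0.5) in grid units.
            return min(
                ring,
                key=lambda cell: math.hypot(
                    cell[0] + 0.5 - target[0], cell[1] + 0.5 - target[1]
                ),
            )
    raise RuntimeError("no empty cell left")


def proximity_grid(points, side):
    """Assign each (x, y) in [0, 1]^2 to a distinct cell of a side x side grid."""
    if len(points) > side * side:
        raise ValueError("grid too small for the collection")
    occupied = {}   # (col, row) -> item index
    placement = {}  # item index -> (col, row)
    for index, (x, y) in enumerate(points):
        ideal = (min(int(x * side), side - 1), min(int(y * side), side - 1))
        cell = nearest_empty_cell(ideal, (x * side, y * side), side, occupied)
        occupied[cell] = index
        placement[index] = cell
    return placement


if __name__ == "__main__":
    # Made-up 2D positions standing in for a dimension-reduced collection.
    cloud = [(0.1, 0.1), (0.1, 0.2), (0.9, 0.9), (0.5, 0.5)]
    densest_side = math.isqrt(len(cloud) - 1) + 1  # ceiled square root
    print(proximity_grid(cloud, densest_side))
```

The grid side is the density parameter: the ceiled square root of the collection size gives the densest grid, and larger sides leave more empty cells.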


Experiments

System 4 (inter-audio)

The metro layout: preserving direct neighbors

different layouts on the same collection: album (left), cloud (middle), metro (right)
with (bottom) / without (top) 2D links between high-dimensional (HD) nearest neighbors

Experiments

System 4 (inter-audio)

The metro layout: optimal density

Proportion of direct neighbors vs proximity grid densities on OLPC collections filtered by tag


tags (sounds): (left-right) toy (46), scrape (64), water (77), spring (93), hit (129), metal (147)
densities: from the collection size (left) to its ceiled square root (right)
neighbors: (blue) horizontal, (green) vertical, (red) diagonal
horizontal lines: densities for the grid ordered by filename

Experiments

System 4 (inter-audio)

The metro layout: max density may preserve worse than grid

Proportion of direct neighbors along proximity grid densities
on an OLPC collection filtered by tag water (77 sounds)

densities: (left) from the collection size to its ceiled square root; (right) the 20 densest
neighbors: (blue) horizontal, (green) vertical, (red) diagonal
horizontal lines: densities for the grid ordered by filename

Experiments

System 4 (inter-audio)

The metro layout: max density may preserve worse than grid

Proportion of direct neighbors along proximity grid densities on OLPC collections filtered by tag


tags (sounds): (left-right) toy (46), scrape (64), water (77), spring (93), hit (129), metal (147)
densities: 20 densest values (densest is the ceiled square root of the collection size)
neighbors: (blue) horizontal, (green) vertical, (red) diagonal
horizontal lines: densities for the grid ordered by filename

Experiments

System 4 (inter-audio)

The metro layout: what's new?

Our optimization method to preserve direct neighbors.

Plus, never before applied to audio browsers.

hub paper: Kerry Rodden, Wojciech Basalaj, David Sinclair, and Kenneth Wood, Does organisation by similarity assist image browsing?, ACM CHI 2001, DOI: 10.1145/365024.365097

Manuscript erratum: PhotoMesa doesn't use proximity grids!

Benjamin B. Bederson, PhotoMesa: a zoomable image browser using quantum treemaps and bubblemaps,
ACM UIST 2001, DOI: 10.1145/502348.502359