Worked on by: Bartholomäus Steinmayr

Advisor: Dominikus Baur


Approaches to visualizing music collections usually concentrate on the musical aspects, leaving the other half of a song, its lyrics, aside. This project aims to make the textual connections within a music collection visible to the user. Full-text search allows songs containing specific passages to be found. A cluster analysis serving as the basis of the visualization is meant to reveal groups within the collection (shallow pop lyrics) and allow statements about genres (every gospel song contains at least one "Lord").

Concrete tasks

  • Compilation of a list of related/relevant scientific publications
  • Creation of detailed documentation in the Medieninformatik-Wiki
  • Design and implementation of a working prototype
  • Stepwise refinement of the work
  • Writing of a report of at least 30 pages that describes the background, design, implementation and results and adheres to these guidelines
  • Giving a final presentation in the Oberseminar


Start: 10.8.2009 (part-time, i.e. 10 h/week)

Month 1:

  • Tasks: Familiarization, writing one or more abstracts (max. 250 words), literature research
  • Results: At least two abstracts with different interpretations of the topic, at least 12 relevant (!) scientific (!!) publications

Month 2:

  • Tasks: Design
  • Result: Design document

Month 3:

  • Tasks: Implementation
  • Result: Preliminary prototype

Month 4:

  • Tasks: Refinement of the design, further development
  • Result: Prototype

Month 5:

  • Tasks: Completion of development, user study
  • Results: Finished prototype

Month 6:

  • Tasks: Written report, Oberseminar presentation
  • Result: Finished report


First draft of an overview of the problem statement and the proposed solution of SongWords.

SongWords - Music exploration and discovery using lyrics-based visualization

Recently, broadband internet connections and mass-storage devices of sizes unthinkable just a few years ago have become commonplace. This has given a very broad audience the possibility to amass huge collections of digital music. Furthermore, the ubiquity of the web has given many amateur and semi-professional artists the chance to release their works to a greater public. These changes pose two distinct challenges for users who want to actively choose music to listen to. Firstly, there is a need to navigate one's own collection (containing potentially tens of thousands of songs), finding music suiting the current mood or creating playlists for specific occasions. Secondly, discovering new music from the vast offerings on the web can mean listening to hundreds of songs on various platforms before finding a track to one's taste.

Several attempts have been made to solve the first problem. However, most of these focus on the musical content. Extracting this information is very difficult and potentially inaccurate. In contrast, lyrics are commonly available in a checked form and consist of "plain" text, which makes it possible to process them with proven text-clustering algorithms. To the knowledge of the author, no attempt has been made to solve the second problem in connection with the first one.

SongWords strives to simplify these tasks and make the associated process a pleasant one. To reach this goal, it builds a database of locally available music and downloads lyrics for the tracks from the web. These lyrics are then used to cluster the songs and visualize them in such a way that related songs are placed near each other. Furthermore, publicly available songs from platforms like Magnatune or Jamendo are overlaid in the vicinity of similar songs from the local database, allowing the user to listen to a streamed preview and add songs to a wishlist for later purchase online.

The original problem statement has been extended with the discovery functionality. Does that make it too broad? Is it too flowery overall?

-- BartholomaeusSteinmayr - 10 Aug 2009

SongWords - Synchronized alternative views for music queries

While SongWords is primarily a tool for playful exploration of music collections and discovery of unknown tracks, songs the user knows well are a good starting point for said exploration. Therefore SongWords should support a mechanism for quickly finding a specific song. According to Bainbridge et al., the vast majority of queries to music information retrieval systems include metadata. Users should therefore be given the opportunity to apply these well-known mechanisms. Direct database queries using metadata or lyric fragments would work well on desktop computers, but are much less suitable for tabletop interfaces lacking a physical keyboard. It therefore seems desirable to implement a hierarchically and/or alphabetically ordered visualization in parallel to the main visualization based on lyrical similarity.

One such option would be to alphabetically arrange the albums of a collection and then display the tracks in the proper order at higher zoom levels. The remaining question is how to integrate this view with the primary view without disorienting the user. One option would be the use of Aligned Self-Organizing Maps, which could smoothly interpolate between the two views. However, it seems likely that alphabetical ordering only works efficiently if the ordering is kept precisely (see user study). Therefore, interpolation between the two visualizations does not seem promising.

A binary approach appears to be more feasible: In both visualization modes the user can manually focus a track, or the program does so automatically for the track closest to the center of the screen. With the press of an on-screen button, the user can then switch between the two modes. During this switch, the screen position of the currently focused item and the zoom level remain constant and only the items around this pivot element are changed. This works with both desktop and tabletop interfaces and makes it possible to quickly navigate collections without losing orientation.

-- BartholomaeusSteinmayr - 20 Aug 2009

SongWords: Web-based semantic browsing for lyrics

SongWords: A web-based semantic text-browser for lyrics

Many people like to read the lyrics of a song while listening to it, for example to better understand the content in a foreign language or to be able to sing along. This is underlined by the plethora of websites offering lyrics and the fact that "lyrics" is searched for more often than "sex" on Google [].

The results of these queries are usually viewed either in a web browser or in specialized plugins for audio players. Neither of these interfaces allows the user to further explore similar songs based on lyrical content. This is where SongWords comes into play: SongWords runs in a web browser, parses the user's music collection (either from local files or from websites) and offers an interface similar to online lyric databases in the familiar environment of a web browser. In this environment, proven and commonly known tools for working with bodies of text exist. In contrast to common online databases, SongWords shows similar (in terms of lyrics) songs from the user's collection and new ones from the web to create a personalized music exploration experience. Furthermore, single words or larger passages of the text can be selected and clicked, taking the user directly to the song most similar to the current one that contains the word (keyword hyperlinking). Alternatively, all songs containing the keyword or phrase can be presented, sorted by overall similarity to the current song.

Using common Web 2.0 technologies, SongWords can stream audio from a variety of sites like Magnatune or Jamendo and allow users to directly buy songs they like, letting them enlarge their music collection in a casual way from a homogeneous environment.

-- BartholomaeusSteinmayr - 28 Aug 2009


To test the theory that alphabetical sorting only works well when it is kept exactly, I have written a small script. It displays a list of words that is sorted to a greater or lesser degree (interpolating linearly between alphabetical sorting and random order) and measures the time the user needs to find a given word in the list. A single run can be tried out via the linked page. What remains is to recruit participants for the study and to interpret the results, which I plan to do over the next few days.
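One way such an interpolation between alphabetical and random order could be realized (here via a proportional number of random transpositions) is sketched below in Python; this is only an illustration, the function and parameter names are not taken from the actual script:

<verbatim>
import random
import time

def interpolated_order(words, disorder):
    """Order `words` between fully alphabetical (disorder = 0.0) and fully
    random (disorder = 1.0) by applying a proportional number of random swaps."""
    ordered = sorted(words)
    n = len(ordered)
    for _ in range(int(disorder * n)):
        i, j = random.randrange(n), random.randrange(n)
        ordered[i], ordered[j] = ordered[j], ordered[i]
    return ordered

def run_trial(words, target, disorder):
    """Show the (partially) sorted list and measure the time until the
    participant reports having found the target word."""
    print("\n".join(interpolated_order(words, disorder)))
    start = time.time()
    input("Press enter as soon as you have found '%s': " % target)
    return time.time() - start
</verbatim>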

-- BartholomaeusSteinmayr - 22 Aug 2009


SongWords is planned as a hybrid application, running both on traditional desktop PCs and on multi-touch tabletops. These platforms offer very different human-computer interfaces, which makes it necessary to develop a concept for harnessing their specific capabilities without building two different user interfaces.

The following list compares the capabilities of the two platforms. A positive entry always corresponds to an implicit negative for the other platform. Features available on both platforms are not listed.

Advantages Desktop:
  • High resolution display
  • Proven text entry device (keyboard)
  • Pointing devices can differentiate between hovering and clicking

Advantages Tabletop:
  • Large display area
  • (Multi)-Touch interface
  • Allows potentially large number of simultaneous users

These features make the two platforms viable for different use cases.

On a desktop, detailed text analysis and queries using extensive metadata or full-text searches can easily be accomplished.

On a tabletop, novice users can use intuitive gestures to navigate collections and multiple users can cooperate to swap songs or build playlists (jukebox mode).

The user-interface problems that need to be solved are therefore:

On both platforms:
  • For gestures that require more than one contact point (not possible on desktop) or hovering (not possible on most tabletops), find and implement a preferably natural alternative for the other input device
On the tabletop:
  • Estimate the number of users and detect their position to orient the content towards them (indispensable for lyrics display). If reliable estimation is not possible, create an explicit "start participation" functionality to allow users to join a session.
  • Allow users to share the workspace without interfering with each other. This is especially difficult since input operations cannot be attributed to a specific user and can most likely not be solved perfectly.
  • Display text in a readable fashion (i.e. with sufficient resolution) without consuming too much screen space (which is difficult on a low resolution device such as most tabletops)

On the desktop:
  • Allow the interaction of multiple users. This seems next to impossible to do properly, at least without multiple sets of input devices, which are available on very few machines.

The greater length of the list for tabletop interfaces can be attributed to the fact that desktop interfaces have a decades-long history of application and development, whereas tabletops have yet to conquer the mass market. Furthermore, most of the problems appear in the context of multi-user interaction, which will most likely not be implemented at all on desktop PCs.


User context and motivation

To define the scope of the proposed implementation, assumptions on the requirements and motivation of the target audience and the available technology have to be made.

Technical scope:
  • The size of the music collection is on the order of 10,000 songs
  • The target machine is connected to the internet with a bandwidth of at least 1 Mbit/s
  • The target machine has a recent CPU and graphics adapter
  • Desktop PCs have a mouse and a keyboard and a display resolution of at least 70 dpi
  • Tabletop PCs are multi-touch enabled and have a display resolution of at least 20 dpi

Target audience:
  • As a starting point for the exploration the user should possess a music collection of several hundred songs (which should not be instrumentals), although SongWords will work without such a collection by loading random songs from the web
  • As the owner of such a collection, the user regularly listens to music on their computer and thus has a basic understanding of common music player interfaces on desktop PCs
  • Furthermore, the user can (at least partially) understand the lyrics of the songs and is interested in reading along while a song is playing
  • There are several intended usage scenarios:
    • The user has a collection with significantly more than 1000 songs and does not know all songs in their collection. Instead of just playing random songs, the user can apply SongWords' similarity clustering and keyword hyperlinking to navigate their collection in a playful, yet directed way
    • The user has a collection of arbitrary size and wants to discover new music from the vast collection of songs available on the web. By downloading the lyrics of random songs, SongWords can embed these songs into the visual canvas of the user's collection and stream them on demand
    • The user wants to create a playlist of songs from their collection, which either covers a single topic or smoothly varies the topic over the course of several songs
    • The user enjoys listening to songs and reading lyrics in a visually appealing environment
    • Several users want to collaborate to select what songs to play next (jukebox mode)
    • Several users want to collaboratively compare their respective music collections

Design decisions

In this section several basic design decisions for SongWords will be listed in a loose fashion, before being added to the design document. For every decision the reasoning that led to it is elaborated. In some cases simple prototype implementations will be given to try out the result.

Aligned Worlds: SongWords displays music tracks as icons (depending on the semantic zoom level) laid out in a two-dimensional plane. The primary layout is produced by a self-organizing map based on lyrical similarity. However, this layout is not suited for finding specific tracks with known meta-information. For this task, an alphabetical ordering is preferable (see user study). Furthermore, SongWords will allow the user to search for specific passages in the lyrics of the songs in the database and show the results around the track from which the search originated.

All these views use the same metaphor of icons for the songs, but different strategies for arranging them. Performing a hard switch between these layouts makes it likely that users lose their current "focus" track, which might even leave the currently displayed area. The same effect is likely to happen if the views are interpolated, which furthermore makes the handling a lot less responsive if the interpolation is done slowly enough to be effective.

Therefore the strategy of Aligned Worlds is proposed. "Worlds" denotes the different arrangements of the same set of icons that constitutes a music collection. When the user initiates a switch between the different worlds, it is carried out immediately. However, the viewport into the display space is repositioned so that the user's current focus item stays at the same screen position. Thus the user can navigate the collection quickly without ever losing a reference point for their current position.
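A minimal Python sketch of this viewport repositioning, assuming screen positions are computed as (world position - viewport origin) * zoom and that the focus item's world coordinates in both layouts are known (all names are illustrative):

<verbatim>
def switch_world(viewport_origin, focus_old_pos, focus_new_pos):
    """Return the new viewport origin after a world switch so that the focused
    track keeps its exact screen position; the zoom level is left unchanged,
    so only the world-space offset between focus item and viewport matters."""
    ox, oy = viewport_origin
    fx_old, fy_old = focus_old_pos
    fx_new, fy_new = focus_new_pos
    # Keep the focus item's offset relative to the viewport origin constant.
    return (fx_new - (fx_old - ox), fy_new - (fy_old - oy))
</verbatim>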


Single user operation: Large multi-touch interfaces such as tabletops lend themselves well to multi-user applications. In SongWords, several users could for example explore one or more music collections collaboratively. While most tabletops easily provide sufficient screen space and input capabilities to support at least two simultaneous users, the user interface proposed for SongWords makes implementing truly concurrent interaction difficult:

To visualize large music collections SongWords uses a two-dimensional plane, which can be panned, zoomed and rotated arbitrarily. Since these properties of the viewport span the entire table surface, multiple users cannot effectively work with different parts of the collection. One solution for this problem is proposed by M. Schicker: virtual lenses can be deployed anywhere on the table and visually and semantically enlarge the icons of the songs beneath them.

This approach is not adopted for SongWords for two reasons: For one, users still need physical access to the screen area they are interested in, which can be problematic on large displays or if two users want to work with the same part of a music collection. The other reason is the size of music collections: for SongWords it is assumed that a single collection can contain at least 10,000 songs. On a tabletop with a surface of one m² this amounts to one song per cm². Locally this number can be far higher, since the tracks are most likely not evenly distributed and several users could browse several collections. It would therefore be necessary to position the lenses with a precision of a few millimeters, which most tabletops do not provide. Furthermore, lenses do not solve the problem of how to allow several users to browse different worlds (see above) at the same time.

Another possibility would be to split the display area into several independent viewports. This would, however, eliminate the characteristic advantage of the tabletop's large screen space as well as the feeling of working collaboratively.

It was therefore decided that SongWords is designed primarily as a single-user application. While several users can work with a tabletop to view lyrics and play songs, all of them can pan and zoom, and they have to come up with their own way of not interfering with one another.

Jukebox mode: As defined above, SongWords' user interface is designed for only one simultaneous user. This however does not prohibit multiple sequential users. This will be implemented in SongWords in the "jukebox mode".

This mode is designed with social situations in mind, where multiple people can use a tabletop (or, less desirably, a desktop) computer to collaboratively select songs they want to listen to in a casual fashion fitting these occasions. In this mode, users cannot interrupt the currently playing song, to guarantee a homogeneous listening experience. Instead, users apply the same mechanics normally used for playing a song to vote on what to play next. Once the current song is finished, the highest-rated song is played. Optionally, the implicit playlist defined by the songs' scores can be displayed, giving users feedback on their vote.

The score of the previous song is then reset to zero or to a negative number. By changing this number, the host of the event can penalize the repetition of songs. This decision can still be overridden if enough users want to hear a given song again. Furthermore, negative scores can be increased over time at an adjustable rate, allowing the host to penalize overly frequent repetition of songs.

There is no mechanism to prevent users from voting multiple times, but a real jukebox provides only a financial limit as well. We therefore believe that SongWords can be a viable tool in social situations without a designated DJ.
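A minimal Python sketch of this voting logic is given below; the class and parameter names are purely illustrative and not part of the actual implementation:

<verbatim>
import random

class Jukebox:
    """Minimal sketch of the jukebox voting logic: scores start at zero,
    votes increase them, the highest-scoring song is played next and then
    penalized with a host-configurable negative score that recovers over time."""

    def __init__(self, song_ids, repeat_penalty=-5, recovery_per_song=1):
        self.scores = {song: 0 for song in song_ids}
        self.repeat_penalty = repeat_penalty          # score right after a song has played
        self.recovery_per_song = recovery_per_song    # how quickly negative scores recover

    def vote(self, song_id):
        """Called when a user double-clicks or double-taps a song icon."""
        self.scores[song_id] += 1

    def next_song(self):
        """Pick the next song once the current one has finished playing."""
        # Negative scores slowly recover so that repeats become possible again.
        for song, score in self.scores.items():
            if score < 0:
                self.scores[song] = min(0, score + self.recovery_per_song)
        best = max(self.scores.values())
        candidates = [s for s, score in self.scores.items() if score == best]
        chosen = random.choice(candidates)            # ties are broken at random
        self.scores[chosen] = self.repeat_penalty
        return chosen
</verbatim>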

Dynamic song database: A user's database of tracks in SongWords is most likely not a static entity: new songs will be added either automatically (through the discovery feature) or manually (through the user extending their collection). It is computationally infeasible to recalculate the self-organizing map used for the visualization on every change of the database. Furthermore, this might lead to sudden changes in the display layout, which can easily disorient the user.

One advantage of self-organizing maps is that new data can be added at any time and will be assigned a valid position in the visualization space. There is, however, one problem to be solved before this can be properly used: document feature vectors are used as input to the SOM. These vectors contain the counts of the most frequent (across all documents) words in a given document. This characterization of texts can be improved by dividing term frequencies by the number of documents that contain a given term (term frequency - inverse document frequency).

The problem is that when a new document is added to the database, a) the overall term frequencies change, and with them potentially the terms that constitute the feature vectors, and b) the document frequencies change. This means that potentially the feature vectors of all tracks in the database change, which in turn means the positions of all tracks have to be recalculated. However, this situation becomes increasingly unlikely as the database grows. Therefore, the first attempt will be to recalculate the positions as tracks are added. Since downloading the lyrics for a given track takes at least a few tenths of a second, the time for the calculation should not be a severely limiting factor.

Should this be computationally impossible to do in real-time or lead to sudden changes in the positions of the tracks, an alternative approach will be chosen:
  • On startup, SongWords will query all new files on a user's local computer and recalculate the feature vectors and positions. Only then will the tracks be shown on screen.
  • Tracks that are downloaded or streamed at runtime will have a feature vector created based on the current most frequent terms and have their position assigned based on the current SOM, with the other tracks remaining at their current position
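For reference, the construction of the feature vectors described above could be sketched in Python as follows, here using the common tf * log(N/df) weighting; the whitespace tokenizer, vocabulary size and all names are illustrative assumptions, not the actual implementation:

<verbatim>
import math
from collections import Counter

def tfidf_vectors(lyrics_by_track, vocabulary_size=200):
    """Build tf-idf feature vectors for a set of lyrics.

    `lyrics_by_track` maps a track id to its lyrics as plain text. The
    vocabulary consists of the `vocabulary_size` most frequent terms across
    all documents; each vector entry is tf(t, d) * log(N / df(t))."""
    tokenized = {t: lyrics.lower().split() for t, lyrics in lyrics_by_track.items()}

    # Overall term frequencies determine which terms make up the vectors.
    overall = Counter()
    for tokens in tokenized.values():
        overall.update(tokens)
    vocabulary = [term for term, _ in overall.most_common(vocabulary_size)]

    # Document frequencies: in how many tracks does each term occur?
    df = {term: sum(1 for tokens in tokenized.values() if term in tokens)
          for term in vocabulary}
    n_docs = len(tokenized)

    vectors = {}
    for track, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[track] = [counts[term] * math.log(n_docs / df[term])
                          for term in vocabulary]
    return vocabulary, vectors
</verbatim>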

Design Document

Required backend functionality

  • Query a list of local directories for mp3 files (and possibly other formats such as Ogg Vorbis)
  • Extract meta-information from files and store them in a database
  • Play locally available files
  • Find random songs not in the local collection on the web
  • Find and stream music for these songs from the web
  • Load lyrics and album art for all songs from the web
  • Calculate feature vectors for all songs in the database, based on the lyrics
  • Calculate positions on a self-organizing map for all songs in the database (sketched below)
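The last step of this list could look roughly like the following self-contained Python sketch; grid size, training parameters and names are illustrative and not taken from the actual implementation:

<verbatim>
import numpy as np

class LyricsSOM:
    """Minimal self-organizing map sketch for assigning 2D grid positions to
    tf-idf feature vectors."""

    def __init__(self, width=20, height=20, dim=200, seed=0):
        rng = np.random.default_rng(seed)
        self.width, self.height = width, height
        self.weights = rng.random((width * height, dim))
        # Precompute the grid coordinates of every map node.
        self.coords = np.array([(x, y) for y in range(height) for x in range(width)],
                               dtype=float)

    def train(self, vectors, epochs=10, lr=0.5, radius=5.0):
        for epoch in range(epochs):
            # Learning rate and neighbourhood radius shrink over time.
            decay = 1.0 - epoch / epochs
            for v in vectors:
                v = np.asarray(v, dtype=float)
                bmu = np.argmin(((self.weights - v) ** 2).sum(axis=1))
                dist = ((self.coords - self.coords[bmu]) ** 2).sum(axis=1)
                influence = np.exp(-dist / (2 * (radius * decay) ** 2))
                self.weights += (lr * decay * influence)[:, None] * (v - self.weights)

    def position(self, vector):
        """Grid position of the best-matching unit for one track's feature vector."""
        vector = np.asarray(vector, dtype=float)
        bmu = np.argmin(((self.weights - vector) ** 2).sum(axis=1))
        return tuple(self.coords[bmu])
</verbatim>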

Required frontend functionality

  • Visualisation
    • Provide a virtual 2D canvas on which to display the tracks
    • This canvas can be manipulated:
      • Panned
      • Zoomed
      • On tabletops: Rotated
    • On desktops this manipulation is achieved using the mouse:
      • Dragging the mouse with the left button held (outside of a data item, see next section) pans the canvas, so that the mouse cursor appears to be grabbing it
      • Turning the mouse wheel or dragging the mouse with the right button held zooms the canvas, with an upward motion of wheel or mouse increasing the magnification
    • On tabletops the canvas is manipulated using the (potentially multiple) contact points of the user's hands:
      • When only one contact point exists the canvas is panned in the same way as on a desktop, i.e. without changing zoom or orientation
      • When more than one contact point exists the contact points are treated as if they were fixed onto the canvas and the canvas is moved, rotated and/or stretched to match the movement of the contact points as closely as possible (a sketch of this transform for two contact points is given after this list)

    • Some user controls, such as buttons, will be necessary aside from the icons representing the songs
      • On the desktop, these controls will be rendered at fixed positions, over (in terms of rendering order) the icons of the songs
      • On tabletops, these controls will also be rendered over the icons, but will respond to rotations of the canvas
      • This allows the user to work from any side of a tabletop interface, by simply rotating the entire interface in their direction

    • On this canvas the tracks in the database are rendered as icons
    • The most important attributes for this visualization are the position of all the icons and the appearance of a single icon
    • The appearance of the single icons depends on the local information density and the current zoom level:
      • Each track is visualized as a circle with a given diameter
      • The goal is to achieve a constant information density over the entire screen space. To achieve this, the diameter can be enlarged in sparse areas and diminished in cluttered areas
      • The content that is rendered inside the circles then depends on the size of the circle. With increasing diameter, the following content will be displayed:
        • First, the circle will be filled with a solid color, either based on the automatic classification of the song (see SOM) or a color manually assigned by the user (see brushing)
        • The cover of the album the track appeared on is displayed
        • The name of the song and the artist are overlaid, the album art is rendered with decreased opacity
        • An excerpt of the lyrics is shown
        • Finally, the album art is progressively removed entirely and more of the lyrics are displayed until they are shown in full
        • In this view, the lyrics can be scrolled by dragging the mouse or a finger inside the icon
        • Furthermore, clicking or tapping a word in the text performs a full-text search for this word in all songs
        • The switch between two stages is realized as a smooth fade-over
        • Other stages (for example for song structure) may be added in an experimental fashion

        • Songs that are not in a user's local collection will be displayed in the same fashion, but with reduced overall opacity

        • If a song is double-clicked or double-tapped it is played (see below) and its icon is centered on the screen, enlarged by 50% (relative to the other icons), and the magnification is increased until the lyrics of the song are visible

    • The position of the icons on the canvas depends on the mode selected by the user
        • The icons can be sorted alphabetically, based on track name, artist or album
        • When sorted alphabetically, the icons will be arranged in rows, with each row corresponding to one initial letter
        • The most important ordering of the icons is by lyrical content. A self-organizing map (SOM) is used to assign positions to the tracks, so that songs with similar content are near each other
        • In this mode, overlapping of the icons is likely. Therefore, the icons "repel" each other with basic simulated physics to achieve an even distribution
        • To enable a visualization based on genre definitions, a second SOM is trained using the tags assigned to a given track on external sites as feature vectors
        • The results of a full-text search (see above) are displayed in a spiral around the track the search originated from, with the more similar songs closer to the center

        • In every arrangement there is one track that is focused. By default, this is the track currently closest to the center of the screen, but it can be changed by the user by clicking or tapping any song
        • This focus track is used when the arrangement is changed: the position of this icon remains unchanged, while the other icons move to their new positions in an animated fashion (or fade out if they are no longer visible in the new mode)
        • To analyse the behaviour of larger groups of songs (for example when switching from a view sorted by genre to one sorted by lyrics) the user can mark sets of tracks. After pressing an on-screen button to enable this feature, the user can draw a polygon of arbitrary shape, and all icons within this shape will be marked graphically from then on until a new set is selected

    • Two different modes for playback of songs exist:
      • In normal mode, both locally available songs and songs on the internet can be played by double clicking or double tapping the icon of the song
        • Furthermore, a playlist of songs can be created. Songs can be added to this playlist in two ways:
          • By clicking a single song with the right mouse-button or tapping it with two fingers
          • By clicking or tapping the corresponding button in the user interface and then drawing one or more polylines onto the canvas, using either a single finger or the right mouse button. The songs whose icons these lines cross are added to the playlist
          • This playlist can then be played back in SongWords or exported for use with arbitrary third-party software
      • In jukebox mode, playing of songs is automated:
        • Each song is assigned a score, which is initially zero
        • When a user double clicks or double taps a song, this score is increased by one, with a visual indication of this process
        • The scores of the songs are visualized by changing the color of the icons
        • Optionally, the current playlist of songs as defined by the scores can be overlaid
        • After the currently playing song is over, its score is reduced to zero (or below, depending on whether the administrator wants to allow multiple plays)
        • Then, the song with the highest score is selected and played
        • If multiple songs share the same score, one track is randomly selected from this set
        • If this set is very large (for example when there are no votes), a limit for the distance of the next song to the current song can be imposed, resulting in a random, but smooth (in terms of content) playlist
        • If the score of a song is negative, it is slowly increased over time until it is zero again
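The two-contact-point canvas manipulation referenced above can be expressed as a similarity transform that maps the old finger positions exactly onto the new ones. The following Python sketch (with illustrative names, not the actual implementation) computes scale, rotation and translation from the two contact points:

<verbatim>
import math

def two_finger_transform(p1, p2, q1, q2):
    """Similarity transform (scale, rotation, translation) that maps the old
    contact points p1, p2 exactly onto their new positions q1, q2.

    Points are (x, y) tuples in canvas coordinates; applying the returned
    transform to the canvas makes it appear glued to the user's fingers."""
    (p1x, p1y), (p2x, p2y) = p1, p2
    (q1x, q1y), (q2x, q2y) = q1, q2

    vp = (p2x - p1x, p2y - p1y)
    vq = (q2x - q1x, q2y - q1y)

    scale = math.hypot(*vq) / math.hypot(*vp)
    rotation = math.atan2(vq[1], vq[0]) - math.atan2(vp[1], vp[0])

    # Translation: map p1 through scale and rotation, then move it onto q1.
    cos_r, sin_r = math.cos(rotation), math.sin(rotation)
    tx = q1x - scale * (cos_r * p1x - sin_r * p1y)
    ty = q1y - scale * (sin_r * p1x + cos_r * p1y)
    return scale, rotation, (tx, ty)
</verbatim>

With more than two contact points, a least-squares fit of the same type of transform would be one natural way to match the movement "as closely as possible".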


The current binary version of the implementation can be downloaded from the SVN repository. The documentation for the source code can be viewed online.

The first step in the implementation of SongWords was the creation of a command-line version. This version includes the full song database functionality available in the GUI version. Furthermore, downloading of song lyrics and album artwork (one of the larger challenges of the implementation) was coded and tested.

The next step was the creation of a graphical user interface. This began with the implementation of rendering algorithms for the individual items and a pannable and zoomable user interface. Real-time physical interaction between the individual items was made possible by implementing a quad-tree structure.
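Such a quad-tree restricts the pairwise repulsion checks between icons to items that are actually close to each other; a minimal Python sketch (with illustrative names and thresholds, not the actual implementation) might look like this:

<verbatim>
class QuadTree:
    """Minimal quad-tree sketch for limiting pairwise repulsion checks between
    track icons to nearby items; bounds are (x, y, width, height)."""

    MAX_ITEMS = 8

    def __init__(self, bounds):
        self.bounds = bounds
        self.items = []       # (x, y, payload) tuples
        self.children = None  # four sub-quadrants once the node is split

    def insert(self, x, y, payload):
        bx, by, bw, bh = self.bounds
        if self.children is None:
            self.items.append((x, y, payload))
            if len(self.items) > self.MAX_ITEMS and bw > 1 and bh > 1:
                self._split()
            return
        self._child_for(x, y).insert(x, y, payload)

    def _split(self):
        bx, by, bw, bh = self.bounds
        hw, hh = bw / 2, bh / 2
        self.children = [QuadTree((bx, by, hw, hh)), QuadTree((bx + hw, by, hw, hh)),
                         QuadTree((bx, by + hh, hw, hh)), QuadTree((bx + hw, by + hh, hw, hh))]
        for x, y, payload in self.items:
            self._child_for(x, y).insert(x, y, payload)
        self.items = []

    def _child_for(self, x, y):
        bx, by, bw, bh = self.bounds
        index = (1 if x >= bx + bw / 2 else 0) + (2 if y >= by + bh / 2 else 0)
        return self.children[index]

    def query(self, x, y, radius, found=None):
        """Collect all items whose positions lie within `radius` of (x, y)."""
        found = [] if found is None else found
        bx, by, bw, bh = self.bounds
        # Skip quadrants that cannot contain any item within the radius.
        if x + radius < bx or x - radius > bx + bw or y + radius < by or y - radius > by + bh:
            return found
        for ix, iy, payload in self.items:
            if (ix - x) ** 2 + (iy - y) ** 2 <= radius ** 2:
                found.append(payload)
        if self.children:
            for child in self.children:
                child.query(x, y, radius, found)
        return found
</verbatim>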

Next was the implementation of the feature-vector calculation and the rendering of a self-organizing map (for which two libraries were evaluated, the latter of which was used). Because loading songs and calculating SOMs are time-consuming, a multi-threaded architecture was used for the program. This led to a number of synchronization issues, some of which had to be solved using custom synchronization methods. In retrospect, using asynchronous calls might have been a better approach.

Then, multi-touch interaction was implemented using previously written server software that processes the events created by the touchLib library.



Fujihara, Hiromasa; Goto, Masataka; Ogata, Jun: Hyperlinking Lyrics: A Method for Creating Hyperlinks Between Phrases in Song Lyrics. ISMIR 2008, Session 2d: Social and Music Networks, pp. 281-286.

Identitee - Visualizer

-- DominikusBaur - 03 Aug 2009

Davidson, Han: Synthesis and control on large scale multi-touch sensing displays, Proc. of New Interfaces for Musical Expression (2006)

-- DominikusBaur - 21 Aug 2009

Paley: Textarc

-- DominikusBaur - 24 Aug 2009 - Lists

-- DominikusBaur - 26 Aug 2009

G. Robertson et al.: Animated Visualization of Multiple Intersecting Hierarchies

Related work:
