Visualisation of key findings from N-Screen user testing

You can read a more detailed description of the results in a previous blog post.

Posted in Evaluations, Recommendations, Second screens, Social TV, User Experience | Leave a comment

Two days left to participate in NoTube’s Loudness Web Evaluation!

In NoTube we are investigating loudness normalisation in multimedia environments. To explore different listening situations and user preferences, we launched a web evaluation in October 2011 and have already published some preliminary results.

To gather more test data and improve the results, we have again invited everyone to participate in the web evaluation, which can be carried out online until Friday, December 16th 2011. That means there are now only two days left! Simply go to http://survey.irt.de/notube to take part and to learn more about Loudness Normalisation on the Web!

Posted in Audio, Evaluations, Loudness | 2 Comments

Preliminary findings of N-Screen user testing

Libby and I recently spent two days testing NoTube’s ‘N-Screen’ prototype with members of the public at the BBC R&D user testing lab in London.

As Libby has described previously on this blog, N-Screen is a second screen prototype application designed to help a small group of people explore a collection of on-demand programmes and choose one to watch together in real-time, with participants either being in the same room together or in separate locations. The scenario imagines a future world in which most people will have their own personalised connected device such as a tablet or smartphone.

We recruited ten participants to test the app: five men and five women across a spread of ages between 20 and 64. All participants described themselves as TV enthusiasts, regularly watching at least 2 hours of TV a day.

During each session, following some introductory questions about watching TV in general, we showed the participant a version of N-Screen containing BBC iPlayer catch-up programmes (about 1,000 programmes) and walked them through a group-watching scenario - with Libby and me taking the roles of the participant’s N-Screen ‘friends’!

Programme suggestions and explanations

N-Screen supports different TV recommendation and browsing strategies across the spectrum from cold-start to fully personalised, combined into a single user interface. This provides multiple ways of helping people to find something interesting to watch from a large collection of video content.

Suggestions for you

Each participant in an N-Screen group starts with a different set of personalised programme recommendations based on NoTube’s Beancounter user profiling service. For testing we had to show mock-up examples of these types of suggestions, and we had to ask our participants to imagine these were based on their user profile. Despite this, all the participants liked the concept of seeing programme suggestions based on things they’d done in the past.

Quite a good idea…might bring up programmes that haven’t come to your attention.

Really good – tries to tailor it to me.

"Suggestions for you" in N-Screen

However, they weren’t so keen on the idea of getting recommendations based solely on age and/or gender. Several participants thought that this just wouldn’t work for them because they thought that their TV tastes didn’t match their age or gender profile; others thought that people didn’t like to be “pigeon-holed” in this way.

Tapping on a programme suggestion in N-Screen displays an overlay with a brief programme synopsis, and an explanation as to why it has been recommended – for example: “Recommended because you watched That’s Britain which also has Nick Knowles in it”. The idea here is to present the pathways through the Linked Data graph showing the connection that led to a recommendation being made. Several participants were particularly keen on the idea of being suggested programmes based on links between the people in them, such as actors or TV personalities, as long as those people are considered significant and interesting.

I like the idea of plotting actors through their career… I like it that you’ve gone for the actor – I want to see more of specific actors I like.

That’s a good idea if it’s a particular actor you follow…but if it was an actor in Eastenders I don’t know if that would really appeal to you because you’re watching Eastenders for Eastenders, and not necessarily the actor. But if it was Kelly Holmes and I’d like her as a character and I saw she was in Bargain Hunt, then I’d think let’s watch that.

I like Stephen Fry and I would be interested to see what he’s doing.

I like the idea that it’s also got Ian Hislop in it.

Someone’s in it who you like… A way of trying out something new that you might not know about – I like that.

However, in general, people didn’t seem to care as much about the explanations for recommendations as we’d expected, based on research we’d read about the value of explanations for enhancing users’ trust in recommendation systems. It’s possible, though, that the explanations would have had more resonance with our participants if they’d been based on the individual’s real activities and data.
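To make the mechanism behind these explanations concrete, here is a minimal sketch of how an explanation could be generated from a shared contributor between a watched programme and a candidate recommendation. It is purely illustrative: the data structures and function are hypothetical stand-ins, not the actual NoTube Linked Data queries.

```python
# Illustrative sketch only: generating a "Recommended because you watched X,
# which also has P in it" explanation from a shared contributor. The data
# structures and function here are hypothetical, not the NoTube implementation.

def explain(candidate, watched_programmes):
    """Return an explanation if the candidate shares a contributor with
    something the user has already watched, otherwise None."""
    for watched in watched_programmes:
        shared = set(candidate["people"]) & set(watched["people"])
        if shared:
            person = sorted(shared)[0]
            return ("Recommended because you watched {0} "
                    "which also has {1} in it".format(watched["title"], person))
    return None

watched = [{"title": "That's Britain", "people": ["Nick Knowles"]}]
candidate = {"title": "DIY SOS", "people": ["Nick Knowles"]}
print(explain(candidate, watched))
# Recommended because you watched That's Britain which also has Nick Knowles in it
```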

More like this

Beneath the main programme information, the overlay screen also shows a list of programmes related to the selected programme based on collaborative filtering techniques. The idea here is to expand out the selection of potentially interesting programmes for the user because the list of personalised programmes could be quite small. Again, all our participants thought that these types of suggestions could be useful as another means of bubbling up content that may be of interest, but they didn’t find the associated Amazon-style explanations (“Recommended because people who watched DIY SOS also watched this”) particularly useful.

N-Screen recommendations and explanations

Random selection

N-Screen also offers a “random selection” option as an alternative means of surfacing content buried in the video collection, or for times when a user might reach a dead-end with the recommendations approach. The idea is to add an extra element of serendipity to the experience. Our user trial of the NoTube Archive Browser prototype, conducted earlier this year, suggested that people found interesting new programmes in a BBC archive collection regardless of whether they saw similar or random programmes.

Most participants said they thought they would find this feature useful as another way of finding new programmes, and a couple of them said it was one of the things they liked best about N-Screen - although we did discover a few usability issues with the user interface.

Sometimes you get stuck. It’s like shuffle on iPod - definitely a good idea.

If you’re not sure what you want or what you’re in the mood for, if you didn’t want to watch the usual…yes I’d give it a try.

"Random selection" in N-Screen

Sharing and receiving suggestions from friends

Finding interesting niche video content and using drag and drop to share these ‘hidden gems’ with friends is key to the N-Screen design. Since the earliest iterations of N-Screen, this aspect of the user experience has always appealed to people - together with the accompanying whooshing noise which provides an audio cue that you’ve received a suggestion from a friend. Similarly, once they’d got the hang of it, the majority of our participants also enjoyed dragging and dropping to swap programme suggestions, because they found it “simple”, “fun”, and “instant”.

We discovered a few initial usability issues around grabbing items to drag, and dropping them in the right place, and most people didn’t realise at first that programme items could be dragged. However, it’s possible that this was because 9 out of 10 of them were not tablet owners, and were therefore not familiar with using drag-and-drop, an interaction style that is becoming increasingly common with the rise in numbers of touchscreen devices.

If all tablets are as easy to use as this then l’d be happy to drag and drop things – it’s so simple, really easy.

Easy to use, even if you hadn’t been here I would have figured it out…it’s easy to drag and drop.

Using drag and drop to share programme suggestions with friends

The idea of sharing and receiving suggestions for things to watch with friends in this way was a highlight of the app for many. However, not all participants imagined that this would necessarily be done in real-time; several of them talked about swapping programme ideas to watch later, in the same way that they might currently use email or texts to send links to interesting programmes.

I don’t think it’s on to recommend things for others to watch instantly.

Neither could the majority of the participants see themselves using N-Screen on multiple devices in the same room as other people; they couldn’t see the point.

Sharing…I think it works more if we’re in a different location. I couldn’t imagine using it in the same room. What would be the point of that?

It takes away the point of chatting.

Obviously that would be us being lazy and not wanting to talk to each other.

It’s considered impolite to get your iPad out when you’re in a social group.

Several participants said they could imagine scenarios for using N-Screen with friends or family located remotely, but these tended to be associated with one specific individual rather than a group: with “my mum”, “my friend back home”, or “my best friend”. Again, several participants talked about sharing items that could be watched later, rather than immediately.

Not for watching something instantly, only for making suggestions for things you could choose to watch later if you wanted to.

My mother keeps saying ‘you should watch this’ - and I’m not always able to at the time she suggests…If my mother sends me a text about a show, I might not be looking at the phone, so it would be good if she could use this and we could watch apart or together, and we could watch it now or later.

I’d liken it to a reading group – I wouldn’t say ‘let’s all watch Eastenders now all together’ but I’d imagine creating a list that you watch on your own later.

Sharing with the group

When we asked participants whether they would share different programmes with the whole group in N-Screen, rather than with specific individuals in it, their responses suggested that empathy, in the sense of considering other people’s preferences, strongly influences which programmes they would share with whom.

Yes, it would depend on the friends and their tastes. For some programmes, like Strictly [Come Dancing], that everyone likes this would be great. I’m into scifi but not all my friends are.

There’s only certain people you can recommend things to on this scale. So it would be limited to people I thought who would be interested. Some friends I have nothing in common with taste-wise when it comes to entertainment.

Yes, I know different friends like different things.

Changing the TV display

Once the group has decided what to watch, the idea is that one of the N-Screen participants drags the programme to the TV icon in the top-right to start playing the programme on the TV screen. All the participants liked this feature.

Ah, this is too much, it’s awesome. I love this.

Fantastic, you’re often watching something on the iPlayer and you really want it on the TV, so this really speeds things up. It’s amazingly quick.

That’s magic. I like that.

Changing the TV

For the scenario in which N-Screen friends are remote, our initial idea was that they could watch something ‘together apart’, with their TVs being synchronised – so that dragging a programme to the ‘shared TV’ icon would automatically start playing the programme on everyone’s TV.

Some participants caught on to the idea of watching ‘together apart’ in real-time and thought it could work well.

I like the idea of me being in my house and a friend being in their house and watching something at the same time, but not in the same place.

That we’d all watch together at the same time, simultaneously in real-time is good.

If they had the same set-up, I’d expect them to watch same thing at the same time.

However, nearly all participants were against the idea of programmes on someone else’s TV being changed remotely and the majority felt that each individual should be in control of their own TV, unless explicit permission had been given.

I would hope they’d be in control of what they’re watching. I can’t change their TV can I? I’d like to warn them I’m about to change their TV…

It shouldn’t change the other person’s TV. It would be like a ghost…

I’d like to drag and drop for my own TV but I’d be annoyed if someone else changed my TV – we’d end up having wars! It takes control away.

A couple of participants mentioned that a small alert in the corner of the other person’s TV screen might be a useful compromise.

Some initial conclusions

  • Overall, participants were complimentary about trying out N-Screen; they mostly liked it and found it fun and easy to use, but not necessarily for collaborative browsing and watching TV in real-time with others.
  • They were positive about the different types of programme suggestions and the concept of sharing and receiving suggestions with friends.
  • However, several of the older participants couldn’t see it replacing other ways of sharing TV recommendations such as texting or emailing, and they were also sceptical about the concept of getting together with friends to decide what to watch on TV without having pre-planned it.
  • The idea of dragging a programme to the TV icon as a way of controlling what’s playing on the TV was universally liked, so long as it didn’t also change their friends’ TVs.

Next we’ll be taking a closer look at the implications of these findings.


Posted in Recommendations, Second screens, Showcase, Social TV, User Experience | 1 Comment

First Results of the User Evaluation on Loudness Harmonisation on the Web

The evaluation of different loudness harmonisation methods, previously carried out in NoTube as part of our work on loudness normalisation, clearly showed the excellent performance of EBU-R 128 (see also our previous blog post). However, we have to keep in mind that this test was carried out under quasi-studio listening conditions. Home listening set-ups, by contrast, cover a wide range of different listening conditions, which we are currently investigating in a dedicated Loudness Web Evaluation launched in October 2011.

In this evaluation we consider the actual listening conditions of listeners. Based upon the tools applied by EBU-R 128 and the corresponding technical documents, we want to investigate the interdependence between loudness harmonisation, loudness range characteristics and listening conditions. In the context of NoTube, the user’s listening environment can typically be characterised as a computer-based one with, among others, built-in laptop speakers or small external PC speakers. However, taking into account the ongoing convergence of internet and TV (e.g. HbbTV), we also consider studio-quality stereo systems and home cinema speaker systems.

The evaluation is based upon a set of short video clips covering different genres such as “Movie”, “Commentary”, “Concert”, “Sport”, “Show”, “Commercial” and “News”. The audio of the clips selected for evaluation was varied with respect to Programme Loudness and Loudness Range. In each case three variants are presented to the user, and the task of the participant is simply to indicate the preferred variant.

Although EBU-R 128 defines a Target Programme Loudness, it does not specify any relationship to the reproduced sound pressure level. This relationship depends on individual taste, personal preferences, user habits and the listening environment, which means that loudness evaluations of this kind are strongly influenced by the “individual volume” set by the listener. In order to include this important individual parameter, a dedicated procedure was prepared to adjust and register the “individual volume” (see below). It establishes the individual reference volume, which should remain constant throughout the evaluation; test persons are therefore asked not to change the volume level once it has been adjusted in this procedure. In contrast to other audio evaluations, where a fixed listening level is prescribed, each participant is able to choose their preferred listening level individually.

Loudness Adaptation

For the evaluation of the loudness adaptation short extracts were selected from ten video clips representing different genres. The compilation of three variants of audio adaptation under test was carried out by measurement and adaptation of the Programme Loudness using tools referring to EBU-R 128.
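EBU-R 128 specifies a Target Programme Loudness of -23 LUFS, so adapting a clip essentially means measuring its integrated (programme) loudness and applying a constant gain. The sketch below is only a simplified illustration of that step; it assumes the loudness has already been measured with an EBU-R 128 / ITU-R BS.1770 compliant meter, which is not implemented here.

```python
# Simplified illustration of loudness adaptation towards the EBU-R 128 target
# of -23 LUFS. The programme loudness is assumed to have been measured with an
# EBU-R 128 / ITU-R BS.1770 compliant meter; the meter itself is not shown.

TARGET_LUFS = -23.0

def normalisation_gain_db(measured_lufs, target_lufs=TARGET_LUFS):
    """Constant gain (in dB) that brings the measured programme loudness to the target."""
    return target_lufs - measured_lufs

def apply_gain(samples, gain_db):
    """Scale linear audio samples by a gain given in dB."""
    factor = 10 ** (gain_db / 20.0)
    return [s * factor for s in samples]

# Example: a clip measured at -16.5 LUFS needs -6.5 dB of gain.
print(normalisation_gain_db(-16.5))  # -6.5
```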

Audio adaptation types

Table 1: Audio Adaptation Types

Loudness Range Adaptation

For the evaluation of the Loudness Range adaptation, five extracts from video clips of different genres were selected. The Loudness Range adaptation used for the test is strictly based on the measurement of the loudness descriptor “Loudness Range (LRA)” as specified in EBU-TECH 3342. The following figure shows the resulting LRA values in LU (Loudness Units, as defined in EBU-R 128) for the five loudness range items under test, both uncompressed and after LRA compression.

Loudness Range of items under test (each uncompressed and adjusted compression c1_15 dB/c1_25 dB)

Figure 1: Loudness Range of items under test (each uncompressed and adjusted compression c1_15 dB/c1_25 dB)

Listening level

In order to enable test persons to adjust and indicate their individual listening level (sound pressure), a short extract from a news broadcast was selected as the test item. Our ears are especially familiar with the sound of human speech, so news anchors are well suited to helping listeners adjust to a convenient (or even the ‘correct’, i.e. original) listening level. The listening level finally adjusted by the user is considered the reference volume and should not be changed throughout the complete test procedure. The individually adjusted listening levels are then determined using a special test signal containing announcements at different loudness levels, where each announced listening level corresponds to the loudness level at which it is reproduced.

These level announcements are presented after the individual reference listening level has been adjusted. Participants are asked to indicate the first level announcement they can clearly understand by clicking the corresponding button. This method is well known as the “hearing threshold method” and is considered to be notably precise. The difference between successive announced levels is 7 LU, which from a psychoacoustic point of view can be regarded as “clearly distinguishable”. An estimate of the relationship to the corresponding sound pressure level can be obtained by measuring the reproduced average sound pressure level of the news anchor under an individual listening condition. The relationship between announcement and loudness level is presented in the following table.

Listening level announcements – Relationship between reference loudness and sound pressure listening level

Table 2: Listening level announcements – Relationship between reference loudness and sound pressure listening level

Introduction and evaluation

The adjustment of the listening level is part of the introduction to the evaluation. The introduction contains an additional video sequence, composed of nine short clips with different loudness adaptations, to familiarise the test person with the loudness adaptations under test. The introduction also includes a questionnaire to collect the following details about the test person and their individual listening conditions:

  • age
  • type of speaker
  • size of speaker
  • distance to speaker
  • background noise

The loudness evaluation is arranged as follows: each clip under test is presented in the three variants described above. The participant is asked to listen to each of the three variants of the current clip and then to indicate which loudness/loudness range variant they personally prefer, considering their own taste, habits and individual listening environment. After the preferred variant has been indicated by clicking the corresponding button, the next clip with its three versions is presented for evaluation. The order in which clips and variants are presented is randomised.
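As a small illustration of this procedure, the sketch below shows how the presentation order might be randomised, with both the clip order and the order of the three variants within each clip shuffled per participant. The clip and variant names are placeholders.

```python
import random

# Illustrative sketch of the randomised presentation order: each clip is shown
# in its three variants, with both the clip order and the variant order within
# each clip shuffled per participant. Names here are placeholders.

def presentation_order(clip_ids, variants=("A", "B", "C")):
    clips = list(clip_ids)
    random.shuffle(clips)
    order = []
    for clip in clips:
        vs = list(variants)
        random.shuffle(vs)
        order.append((clip, vs))
    return order

for clip, vs in presentation_order(["movie", "news", "sport", "concert", "show"]):
    print(clip, vs)
```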

Preliminary Results

The preliminary results presented here are based on the evaluation period starting September 19th 2011. Besides partners from the NoTube consortium, we invited participants from other communities, e.g. the EBU audio expert group “FAR-PLOUD” and the “Surround Sound Forum” within the VDT (Verband Deutscher Tonmeister). The results presented below are therefore only a preliminary subset based on the 48 valid test runs collected so far. At this stage of the analysis the data should be considered purely descriptive: only the percentage calculated for each attribute is presented. In this preliminary presentation of the results we focus on the listening level, the listening situation, background noise, and the main results concerning the loudness and loudness range evaluation.

The distribution of the listening level in figure 2 shows that the majority of participants chose a rather high listening level. The preferred level was Level 3, which was selected by almost 40% of the test persons.

Distribution of Listening level

Figure 2: Distribution of Listening level

The distribution of speaker types in figure 3 shows a roughly even split between headphones (both in-ear and on-ear), built-in speakers and external speakers (the subwoofer proportion relates to PC and stereo speakers only), with headphones slightly predominating.

Distribution of Speaker Type

Figure 3: Distribution of Speaker Type

The distribution of the indicated background noise, as presented in figure 4, shows a clear dominance of weak background noise. Only one user reported strong background noise.

Distribution of Background Noise

Figure 4: Distribution of Background Noise

The results of the evaluation of both loudness and loudness range adaptation are presented in the following figures.

Preferences of Loudness Adaptation

Figure 5: Preferences of Loudness Adaptation

Preferences of Loudness Range Adaptations

Figure 6: Preferences of Loudness Range Adaptations

Conclusions

The first conclusions that can be drawn from the descriptive data presented above, with respect to the evaluation of loudness and loudness range adaptation, confirm the excellent performance of EBU-R 128 with respect to Target Programme Loudness. This answers the open question of whether the loudness harmonisation following EBU-R 128, which was clearly preferred in the previous evaluation, depends on individual listening conditions: for the listening conditions covered in this loudness web evaluation, there is clearly no such influence observable.

With respect to the evaluation of the different loudness range adaptations, a tendency is identifiable: participants seem to prefer medium or even strong loudness range compression (compression characteristics c1_15 dB/c1_25 dB in figure 1) to uncompressed audio with a high loudness range. A first analysis of the data with respect to correlations between loudness range adaptation and the type of speaker or background noise showed no noticeable interrelation. On the other hand, this result could be an indication that loudness ranges of 25 LU and more are not well suited to home listening environments or to the expectations of the majority of home listeners. In order to answer these questions, however, analysis of a larger dataset is necessary.

Call for participation

We thus invite you to participate in the web evaluation which will be open for another two weeks. The test can be carried out online. Simply go to http://survey.irt.de/notube to take part and to learn more about Loudness Normalisation on the Web! The evaluation will be open until Friday, December 16th 2011.

(This post was contributed by Gerhard Spikofski, IRT)

Posted in Audio, Evaluations, Loudness | 2 Comments

Designing a new user interface for NoTube’s Beancounter

Managing the large volumes of data generated by the Social Web presents many challenges. In considering the user experience of NoTube’s Beancounter we have been thinking about how to present this kind of data to users in meaningful ways, as well as ensuring we implement robust models of sharing, privacy and ownership.

The importance of getting these things right is reflected in the growing interest outside of the NoTube project in data mining of activity and social data for the provision of social recommendations.

To re-cap, NoTube’s Beancounter technology supports the automatic generation of an implicit user interests profile. This is based on re-use of an individual’s activity on social media services, such as the content of their tweets and Facebook ‘likes’, to determine their interests. The idea is to re-use the scattered and disparate activity data and make it useful by combining it, looking for patterns, and using it to suggest things to watch.

The Beancounter backend is currently being rebuilt. Alongside this we have been working with visual designers from the design agency Fabrique in Amsterdam to develop a new user interface for the Web front-end. This blog post outlines some of the design challenges we have encountered along the way.

Challenge 1: Displaying an overview of your interests

To create your Beancounter profile you need to link one or more social media accounts (Facebook, Twitter, LastFM etc) to your Beancounter account. Beancounter retrieves activity data from these sources, interprets the information contained in the activities and matches it to concepts from the Linked Open Data cloud (DBpedia concepts) to give you an overview of your interests. These interests include programmes, movies, people, locations and genres. Each interest is assigned a weight in the profile; the more instances of an interest in your activities, the higher the weight of the interest - and the more influence it has on your recommendations. These weightings change over time as the Beancounter continually adjusts to your activities. For example, if you watch or listen to lots of coverage of The Olympic Games but aren’t generally interested in sport the rest of the time, the weightings for sport-related interests will temporarily increase during The Olympics.
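As a rough illustration of this weighting idea (not the actual Beancounter algorithm), the sketch below accumulates weights for DBpedia concepts from activities, with older activities contributing less so that the profile drifts with recent behaviour.

```python
from collections import defaultdict

# Illustrative sketch only, not the actual Beancounter weighting algorithm:
# each activity adds weight to the concepts it maps to, and older activities
# count for less, so the profile adjusts to recent behaviour (e.g. the Olympics).

def interest_weights(activities, half_life_days=30.0):
    """activities: iterable of (age_in_days, [dbpedia concepts]) pairs."""
    weights = defaultdict(float)
    for age_days, concepts in activities:
        decay = 0.5 ** (age_days / half_life_days)  # recent activities count more
        for concept in concepts:
            weights[concept] += decay
    top = max(weights.values(), default=1.0)
    return {c: round(w / top, 3) for c, w in weights.items()}  # strongest interest = 1.0

activities = [
    (2, ["dbpedia:Olympic_Games", "dbpedia:Athletics"]),
    (3, ["dbpedia:Olympic_Games"]),
    (90, ["dbpedia:Science_fiction_film"]),
]
print(interest_weights(activities))
```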

This interests profile is useful because it can be automatically used as input for personalised TV recommendations (via NoTube’s N-Screen prototype for example) to help you decide what to watch. And because the profile is portable, it could also be used as input for other applications.

During implementation of the first version of the Beancounter it became clear that there are potentially a very large number of interests to display for any given user, many of which have very low weightings because they have only appeared once or twice in the user’s activities and therefore have very little significance for recommendations.

We wanted a simple way to clearly present the spread of interests, which also immediately conveys the influence (i.e. weight) of each interest for recommendations, and takes account of the fact that the number of interests could quickly become quite large. As a solution, the designers adapted the tag cloud metaphor into a scalable grid, with variable font and cell sizes to indicate relative weightings and hierarchy, so that the most influential concepts are prioritised at the top.

Design mock-up for displaying Beancounter interests

Draft Beancounter design mock-up showing the activities that contributed to a particular interest in the user's Profile

Challenge 2: Showing how each activity affects your interests

Another of the UI design challenges that emerged from implementation of the first version of the Beancounter is how to show the contribution of an activity to the ‘evolution’ of your interests. For example, as described in a video of an earlier demo, clicking on the activity “You watched Timecop…” in the user’s activity stream opens a pop-up (shown in the screenshot below) using bar charts to show the ‘before’ and ‘after’ status of the user’s interests based on watching the film Timecop. The two existing concepts ‘1990s science fiction films’ and ‘Films shot anamorphically’ now have a greater weighting than they did before. The other concepts (where only the blue bar is displayed) are new interests associated with Timecop, which didn’t already exist in the profile.

Early Beancounter UI showing the effects on a user's interests of watching the film Timecop

We wanted to simplify the presentation of this information, and to make it easier to understand what is happening and why it might be interesting. We’re getting there with the new design, although we’re still thinking about the best way to convey the idea that the length of the bar really represents ‘the influence of this concept on your recommendations’, and of providing a relative scale to measure this level of influence against.

Draft design mock-up for Beancounter showing the effects of an activity on the user's interests

This design also allows the possibility for the functionality to be extended so that you could manually adjust the weighting of a particular concept (for example, by sliding the bar to the left or the right) to give it more or less influence over recommendations.

Challenge 3: Displaying on-the-fly data analytics

In addition to displaying your interests, it has always been our intention to offer some analysis of the data that the Beancounter has collected about you. This is based on the premise that people are usually interested in information about themselves, and the initial inspiration came from the Dopplr annual report and the BBC’s RadioPop prototype for social radio listening.

Beancounter offers the potential for a range of detailed analytics, including what you’ve watched and listened to most often and when, the things you are most interested in now and at previous points in time, people who have similar watching and listening habits, and those who are least similar. For more design inspiration we looked at many examples of beautiful data visualisations. We particularly liked the infographics from Hunch.com and The Feltron Annual Reports. However, many of these were hand-crafted, and our requirement is for attractive design modules that can be adapted for automated on-the-fly presentation.

Draft Beancounter design mock-up displaying an analysis of the user's data

Challenge 4: Interacting with multiple layers of information

The way that data is stored in the new Beancounter allows any activity (e.g. listening to a Tom Waits track on Last.fm), interest (e.g. Tom Waits), type of interest (e.g. all people) or type of activity (e.g. all the things you’ve listened to) in the UI to be linked to the relevant analytics relating to that object, providing timelines, comparative views, explanations and statistics. While this enables users to delve deeper and gain extra insight into their data should they wish to, we want to make sure that the UI doesn’t become cluttered and confused with all these additional overlays. We’re therefore working with the visual designers to determine the most elegant model for interacting with these multiple layers of information without being overwhelmed by them.

Next steps

We’re still finalising the design work. Over the next few months we will be integrating these design mock-ups into a new Beancounter UI so that you will be able to try it out, get personalised TV recommendations in NoTube’s N-Screen, and perhaps discover some interesting new things about yourself…

Posted in Beancounter, User Experience | 1 Comment

Algorithms for recommendations in various N-Screen implementations

We currently have three different versions of N-Screen running:

They all have the same basic design with small tweaks for image size, and they interoperate - you can drag and drop between them. The main differences lie in how the data for the backends is collected and in how similarity between videos is calculated: we use a different similarity technique for each of the three datasets.

The Redux version was our first experiment in this area. BBC Redux is a BBC research video-on-demand testbed, and we were lucky enough to be able to obtain anonymised watching data for programmes in a five-month subset of the period it covers. Our first experiment, led by Dan Brickley, was to take that watching data - around 1.2 million observations over 12,000 programmes - and use open source tools to generate similarity indexes. We used a standard function in Mahout, Apache’s machine learning and data mining software, to build the indexes using a Tanimoto Coefficient model. This function essentially uses people as links between programmes (“Bob watched both ‘Cash in the Attic’ and ‘Bargain Hunt'”), and sorts programme pairs according to how many people watched them both. With this dataset, this technique produced some nice examples of clearly related clusters (for example what you might call ‘daytime home-related factual’; see picture below).

A cluster of 'daytime home-related factual'
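For binary watched/not-watched data, the Tanimoto coefficient is the same as the Jaccard index over the sets of people who watched each programme. The following is a minimal from-scratch sketch of the idea; Mahout is what actually did this at scale over the Redux data.

```python
from collections import defaultdict
from itertools import combinations

# Minimal sketch of item-item similarity using the Tanimoto coefficient
# (equivalent to the Jaccard index for binary watched/not-watched data).
# Mahout did this at scale; this only shows the arithmetic.

def tanimoto(viewers_a, viewers_b):
    both = len(viewers_a & viewers_b)
    either = len(viewers_a) + len(viewers_b) - both
    return both / either if either else 0.0

def similar_programmes(observations):
    """observations: iterable of (user, programme) pairs."""
    viewers = defaultdict(set)
    for user, programme in observations:
        viewers[programme].add(user)
    scores = {}
    for a, b in combinations(sorted(viewers), 2):
        scores[(a, b)] = tanimoto(viewers[a], viewers[b])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

obs = [("bob", "Cash in the Attic"), ("bob", "Bargain Hunt"),
       ("alice", "Bargain Hunt"), ("alice", "Cash in the Attic"),
       ("carol", "Newsnight")]
print(similar_programmes(obs)[0])
# (('Bargain Hunt', 'Cash in the Attic'), 1.0)
```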

It is quite rare to have access to this kind of data about what people have watched. It’s both valuable and private, and may not be readily available; it may not exist at all if no-one has watched anything yet. For the TED dataset we therefore took a different approach. TED talks are a diverse set of talks by people prominent in their field, licensed under the Creative Commons BY-NC-ND license. From our point of view, the advantage of using this dataset was that transcripts were available for all the talks. To calculate similarity between the talks for N-Screen we were able to use a tf-idf algorithm. This technique treats each programme as a document, finds the most characteristic words for each document within the total corpus, and can then be used to match documents based on the words selected. We were lucky enough to be able to use some Ruby software open sourced by a colleague at the BBC to do this.
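As a rough illustration of the approach (the real backend used the Ruby code mentioned above), here is a minimal from-scratch sketch of tf-idf weighting over transcripts, with cosine similarity used to match talks that share characteristic vocabulary.

```python
from collections import Counter
from math import log, sqrt

# Minimal from-scratch sketch of tf-idf similarity between transcripts.
# The real N-Screen backend used existing Ruby software for this step.

def tfidf_vectors(docs):
    """docs: {doc_id: list of words}. Returns {doc_id: {word: tf-idf weight}}."""
    n = len(docs)
    df = Counter()
    for words in docs.values():
        df.update(set(words))  # document frequency per word
    return {doc_id: {w: tf * log(n / df[w]) for w, tf in Counter(words).items()}
            for doc_id, words in docs.items()}

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

docs = {
    "talk1": "ocean plastic pollution ocean fish".split(),
    "talk2": "plastic waste ocean cleanup".split(),
    "talk3": "neuroscience memory brain".split(),
}
vecs = tfidf_vectors(docs)
print(cosine(vecs["talk1"], vecs["talk2"]))  # > 0: shared characteristic words
print(cosine(vecs["talk1"], vecs["talk3"]))  # 0.0: no overlap
```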

This technique produced clearly similar clusters within the 600-video dataset; in the selection below, for example, you can see items relating to women, and others relating to drawing and art:

Our third example is an iPlayer version of N-Screen. On any given day, there are about 1,000 TV and radio programmes available to UK viewers on iPlayer, the BBC’s on-demand service. This is an interesting dataset for us because of its high-quality metadata, including categories, formats, channel and the people featured. We were curious as to whether we could generate interestingly similar programme connections using only metadata. Our first approach was to try a Tanimoto similarity over the structured metadata, but the results were not particularly satisfactory - many programmes had no similar items. We then tried tf-idf over the metadata descriptions. This seemed to pick up characteristics of the text rather than of the programmes (for example repeated quirks in the phrasing of the descriptions). The best approach we have tried (evaluated only informally) is tf-idf over a combination of metadata and the results of an entity-recognition technique.

We used the existing metadata from the /programmes JSON format (for example http://www.bbc.co.uk/programmes/b00k7pvx.json or http://www.bbc.co.uk/programmes/b015ms3r.json). As you can see from those examples, some have descriptions of people who are in the programme, with mappings to DBpedia where available. We can get more of these by using a service to extract entities from the description text; for this we used Lupedia, which was developed by Ontotext in the NoTube project. We took this data, coupled with the channel and the categories, to produce a list of keywords, and then ran tf-idf over the top of that. The results can be variable:

Example of a not very good similarity match

but in many cases, reasonably good:

Example of a good selection of similar material

and occasionally throws up an interesting surprise:

Unexpected link between programmes

The next stage is to evaluate these results formally.
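For reference, here is a rough sketch of the keyword-building step described above. It assumes the channel, categories, contributor names and Lupedia-extracted entity labels have already been collected for each programme; the field names are illustrative, not the actual /programmes JSON structure. The resulting keyword lists can then be fed into the same kind of tf-idf similarity used for the TED transcripts.

```python
# Rough sketch of the keyword-building step: the field names below are
# illustrative placeholders, not the actual /programmes JSON structure.

def keyword_document(programme):
    keywords = [programme["channel"]]
    keywords.extend(programme["categories"])
    keywords.extend(programme["people"])              # from /programmes metadata
    keywords.extend(programme["extracted_entities"])  # from entity extraction over the synopsis
    return keywords

prog = {
    "channel": "bbc_one",
    "categories": ["factual", "homes_and_gardens"],
    "people": ["Nick Knowles"],
    "extracted_entities": ["dbpedia:Do_it_yourself"],
}
# keyword_document(prog) is then treated as a 'document' for tf-idf,
# with keywords standing in for transcript words.
print(keyword_document(prog))
```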

Posted in Recommendations, Second screens, Social TV | 1 Comment

N-Screen backend: XMPP/Jabber and group chats

The idea of N-Screen (demo) is to have real-time small-group non-text communication - so for example, sharing a programme (or perhaps a specific point in a programme) with a person, with a TV, or with a group, using drag and drop.


N-Screen related content screenshot

We had a number of very specific requirements:

  • Real time communication
  • Different types of receivers (people, TV/video players, others)
  • Structured data transfer
  • Anonymous usage

We also needed good, open tools and libraries available because of the limited amount of time we had to implement.

Like several other groups, we’ve been using XMPP (Jabber) for the backend because it works in real time and has plenty of tools and libraries. Others have been using the PubSub framework to broadcast synchronised content to connected devices, but it was integral to our plan that anyone watching should also be able to share. I had a surprising amount of success with a central negotiator that allowed ad-hoc groups to be formed from anonymous users, populating each user’s roster with the other people it knew about. However, a much less error-prone approach has been to use ad-hoc XMPP group chats, and this has enabled us to make a pure HTML/Javascript implementation with no backend dependencies apart from an XMPP server and some simple APIs to the database of content.

I’ll talk a little about the requirements in more detail, mentioning some implementation issues as we go.

Requirements

Real-time communication

This is essential for drag and drop between devices to be ‘realistic’ - i.e. for a good user experience. Network issues can always be tricky here, particularly under demonstration (rather than real-life) conditions.

Different types of receivers

A ‘TV’ listens for ‘play’ and ‘pause’ messages and does something with them. A ‘person’ listens for ‘drop’ messages and displays them appropriately. There might also be other kinds of listeners - loggers perhaps, or bots that enhance or modify content dropped to them. All types need to take account of who is joining the group and what kind of thing each joiner is, so that they can respond and display them appropriately.

Structured data transfer

For user experience reasons a fair bit of data needs to be sent on most interactions. A shared item needs to carry basic metadata (identifier, URL, title, description, image) and also who shared it. Other kinds of message include announcements about the kind of thing you are. We chose JSON as the body of the XMPP message (which is itself XML), although XML would also have been fine, or arguably better. One issue is that ‘IQ’ (hidden data) messages cannot be sent to group chats, so all group messages are visible in a standard chat room if you connect to it with, say, PSI.
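To give a flavour of what such a message body might look like, here is an illustrative example of the JSON carried when a programme is dropped onto a friend or a TV. The field names and values are examples only, not the exact N-Screen wire format.

```python
import json

# Illustrative only: the kind of JSON body carried in an XMPP groupchat message
# when a programme is dropped onto a friend or a TV. Field names and values are
# examples, not the exact N-Screen wire format.

shared_item = {
    "type": "drop",              # a 'TV' would instead listen for "play" / "pause"
    "from": "libby",             # nickname of the person sharing
    "item": {
        "id": "b00k7pvx",
        "url": "http://www.bbc.co.uk/programmes/b00k7pvx",
        "title": "Example programme",
        "description": "Short synopsis shown in the overlay",
        "image": "http://example.org/image.jpg",
    },
}
body = json.dumps(shared_item)   # sent as the body of a groupchat <message>
```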

Anonymous usage

Although there is plenty of potential for connecting N-screen with Twitter and / or Facebook, we didn’t want to require it. In N-Screen you need to give a name so that other people using the application can refer to you, but that’s the limit of the requirement for identification. For scalability and maintainability reasons we didn’t want to create a lot of named users on the XMPP server. Fortunately, XMPP allows you to create group chats with anonymous users, which is perfect for our needs.

The setup

We’ve been through many iterations to get here but I’m now pretty happy with the setup we have.

Ejabberd server with Bosh and group chat enabled

Ejabberd is not particularly simple to set up, but once it is up it seems pretty stable. I’ve put some tips on troubleshooting here (scroll to the end). PSI is a great tool for debugging, as you can set it to log the XML messages going past.


PSI view of a groupchat created behind the scenes of N-Screen

PSI XML view of a groupchat in N-Screen

One thing to note is that, for ejabberd at least, the group chat address takes the form

[room_name]@conference.[server]/[nick]

e.g.

default@conference.localhost/libby

APIs to the content

I used a simple Ruby server and MySQL backend to provide JSON search and random-selection APIs. For content-to-content recommendations for TED we have used tf-idf analysis of the transcripts, using this code by my BBC colleague Chris Lowis.

The workflow is as follows:

  • The user goes to a webpage, and gets an alert requesting their name
  • Based on the window hash (the bit after ‘#’), the Javascript chooses which group chat to join or create, uses Strophe over Bosh to make the connection, and announces itself to the room using a presence message with the name provided by the user
  • The eJabberd server then automatically tells the user about the other participants in the room, and the Javascript renders them either as users or as a TV
  • The ‘TV’ is also a piece of Javascript / Strophe that additionally announces itself to all joiners of the room as a particular type of thing (a ‘TV’). Multiple TVs are allowed in the room.
  • All user pages keep a list of all TVs, and dropping a programme onto the TV icon sends it to all of them
  • On leaving the page the user is disconnected from the eJabberd server - this can take a few seconds to percolate to the user interface.

The rest is client-side, which I’ll talk about further in another post. Feel free to try out N-Screen here.

Posted in Uncategorized | Leave a comment

N-Screen: a second screen application for small-group exploration of on-demand content

For our latest social prototype in NoTube we return to the problem of finding interesting things to watch within large video collections, and investigate how working together might help people find something interesting.

As we’ve seen, the problem with on-demand video is that too much choice is exhausting and demotivating, leading to satisficing behaviour and sometimes to no choice at all, particularly in group-choice situations.

It looks as if watching together apart (watching the same thing in different physical locations) is going to be a big deal in the future. Both Google and Facebook are putting in place tools that allow people to hang out while watching videos together.

Let’s think about a group choosing what to watch from scratch. What sorts of things do they say?

  • who is s/he? (who is that actor / participant?)
  • who directed it? (who made it?)
  • what’s it about?

but also:

  • what do you want to watch?

The first set of questions are the kinds of questions metadata can answer: who is in it, who created it, what sort of thing you can expect from it, what it is similar to.

The second type of question is much harder. We have preferences about each other’s future mental states; or to put it another way, we would usually like everyone to enjoy the content we will watch together, without fully knowing the other participants’ preferences or states of mind. It’s a hard decision problem, and it’s no wonder people give up quickly.

N-Screen is a second screen HTML / Javascript web application that allows people to express their preferences to each other directly, by dragging and dropping content to each other individually or as a group, directly answering the second kind of question. When someone receives some content like this in N-Screen, they can click on it to see more information about it, answering the first kind of question.

The system is designed to be used in conjunction with an out-of-band communications channel (e.g. face to face chat, Skype, or IRC) for the direct negotiations, as much depends on the subtleties of communication - understanding how people are feeling - and this is best done using some familiar channel. It’s called ‘N-screen’ because it might be the primary screen, or one of a bunch of equals; it could play video locally or remotely (in theory).

It’s primarily for tablets and laptops, but runs on anything with a modern Web browser, from smartphones to touch-tables and desktop PCs. It works very nicely on a desktop PC with a touch screen, whereas it serves only as a proof of concept on an iPhone or Android phone right now. Similarly, it can run on a touch table, but doesn’t yet make the most of its potential.

Once people have found something interesting to watch together one of them can drag and drop it to the TV and it will play.

One of the design aspirations behind this work was to explore practical ‘hands on’ notions of collective intelligence, particularly from groups (perhaps professionals with a common goal, perhaps school children) who are intensely exploring some collection or topic together.

We have a demonstration, which you are welcome to try, that uses some of the wonderful TED Talks videos. It’s a work in progress and we’ll be adding features over the next few weeks. If you want to try playing the videos, use this URL in another window (uses Flash).

Do let us know if you have any comments.

Posted in Uncategorized | 4 Comments

Loudness Web Evaluation now online - be part of it!

At the end of 2010, we conducted a user evaluation comparing different ways of normalising the loudness of video clips. It showed a significant improvement for videos loudness-normalised following EBU-R 128 compared to the original video clips without any normalisation, to the “Max PPM=0 dBFS” normalisation used in CD production, and to the former European broadcast recommendation “Max QPPM=-9 dBFS” (see the Research Topics pages for more information).

Following the results of this test, we decided to perform another user evaluation to investigate variation of both Programme Loudness and Loudness Range (LRA) under the different listening situations that can occur in the context of NoTube, i.e. using a computer, a mobile device or a hybrid TV. We prepared different versions of a number of video clips to evaluate the application of loudness and LRA adaptation in typical user listening situations.

The test can be carried out via the Web, and to acquire the largest possible number of test cases for this evaluation we invite everyone to participate! Go to http://survey.irt.de/notube or scan the QR code with your smartphone to take part and to learn more about Loudness Normalisation on the Web! The evaluation will be open until Wednesday, October 12th 2011.

Posted in Evaluations, Loudness | 2 Comments