While EPGs (electronic programme guides) already provide a considerable amount of metadata, much valuable meaning is “hidden” in free text and needs to be extracted.
A basic form of semantic annotation is tagging text with formal references to concepts, e.g. DBpedia URIs. This can be achieved automatically by means of text-mining (information extraction) techniques which recognise references to concepts in a string of text. These concepts can then be used to form semantic descriptions of programme content.
Why was this of interest to NoTube?
As part of the automatic semantic enrichment of programme and user metadata, NoTube was interested in techniques for automatically recognising references to concepts in a string of text (e.g. a programme synopsis, or subtitles). Extracting key entities such as people’s names, locations, events, dates, and specialised terms from free-form text is the first step towards matching that text to concepts which provide background knowledge about a programme. Once a concept has been identified, it can be mapped to its identifier (its URI). For example, a person such as Sir Robin Knox-Johnston can be mapped to http://dbpedia.org/page/Robin_Knox-Johnston. Extracted entities in turn have relations with other entities in the Linked Data cloud, and those relations can provide further information about a programme.
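The lookup step described above can be sketched as a simple dictionary (gazetteer) match. This is only an illustration under stated assumptions, not NoTube's or Lupedia's actual implementation: the names `GAZETTEER`, `build_matcher` and `annotate` are hypothetical, the Amsterdam entry is an illustrative addition, and a real service would draw its names from a full Linked Data repository and would also need to disambiguate between concepts that share a name.

```python
import re

# Hypothetical gazetteer: lower-cased surface form -> concept URI.
# The first entry is the example from the text; the second is illustrative only.
GAZETTEER = {
    "robin knox-johnston": "http://dbpedia.org/page/Robin_Knox-Johnston",
    "amsterdam": "http://dbpedia.org/page/Amsterdam",
}


def build_matcher(gazetteer):
    """Compile one case-insensitive pattern that matches any known name."""
    # Longer names come first so the alternation prefers the longest match.
    names = sorted(gazetteer, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # The lookarounds stop a name from matching inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def annotate(text, gazetteer=GAZETTEER):
    """Return one annotation (character span, matched text, URI) per mention."""
    matcher = build_matcher(gazetteer)
    return [
        {
            "start": match.start(),
            "end": match.end(),
            "text": match.group(0),
            "uri": gazetteer[match.group(0).lower()],
        }
        for match in matcher.finditer(text)
    ]


if __name__ == "__main__":
    synopsis = "Sir Robin Knox-Johnston sets sail from Amsterdam."
    for annotation in annotate(synopsis):
        print(annotation)
```

Each annotation keeps the character offsets of the mention alongside the URI, so the original synopsis or subtitle text can be left unchanged while the concept references are stored separately.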
What NoTube has done in this area
One of the project partners, Ontotext, has developed Lupedia, a multilingual web service for looking up entity names in free-form text, to support annotation of TV content using references to data from existing Linked Data repositories such as DBpedia, Freebase, GeoNames and WordNet.
Find out more: See the Things to use section of this site
Who was involved? Vrije Universiteit in Amsterdam, and Ontotext, in collaboration with other project partners.