Ask the experts: the future for media archives

Stefano Cavaglieri

Published 1st June 2014

by Stefano Cavaglieri Issue 89 - May 2014

Semantic linking is a term coined by Tim Berners-Lee and used to describe a framework of syntax that allows computers to understand complex statements of the kind humans are able to deal with easily. If all the information online were to be accessible through semantic linking, computers would be able to make use of it in much more subtle ways, and this would greatly increase the power of data search and retrieval. While constructing a framework and system for internet-wide semantic linking is a massive and complex undertaking, one area in which it could more readily be implemented is in media archiving. Here it would allow much more versatile and efficient retrieval of media assets, using a far wider range of search criteria. The commercial possibilities opened up by this development would create far greater revenue for holders of media archives.
What's different about semantic linking?
Sentences like "The Beatles were a popular band from Liverpool", "John Lennon was a member of the Beatles", and "Let It Be was recorded by the Beatles" are easily understood by people. But how can they be understood by computers? Statements are built according to a syntax, the set of rules that defines how valid statements in a language are formed. But how can syntax become understandable to computers? This is what the Semantic Web is all about: describing things in a way that computer applications can understand.
The Semantic Web is not about links between web pages; instead, it describes the relationships between things (for example, "A is a part of B" and "Y is a member of Z") and the properties of things (such as the format, dimensions, replay speed, equalization, etc.).
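As a rough illustration, the statements about the Beatles can be written as subject-predicate-object triples, which is the basic shape of an RDF statement. The sketch below uses plain Python tuples rather than a real RDF library; the predicate names (`memberOf`, `recordedBy`, and so on), the `tape-0042` item, and the `match` helper are all invented for this example.

```python
# Each statement is a (subject, predicate, object) triple.
# Predicate names are illustrative, not taken from any published vocabulary.
triples = {
    ("The Beatles", "type", "Band"),
    ("The Beatles", "origin", "Liverpool"),
    ("John Lennon", "memberOf", "The Beatles"),
    ("Let It Be", "recordedBy", "The Beatles"),
    # Properties of things are triples too (hypothetical archive item).
    ("tape-0042", "replaySpeed", "38 cm/s"),
}

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )

# "Who was a member of the Beatles?"
members = [s for (s, _, _) in match(triples, predicate="memberOf", obj="The Beatles")]
print(members)
```

Because relationships and properties share the same three-part shape, a single pattern-matching function is enough to ask both kinds of question.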
Berners-Lee puts it like this: "If HTML and the Web made all the online documents look like one huge book, RDF (Resource Description Framework), schema, and inference languages will make all the data in the world look like one huge database."
If information about music, events, preservation, and so on could be stored in RDF files, intelligent web applications could then collect information from any source, combining the information and presenting it to users in a more meaningful way. This could bring query results closer to the kind of correctness guarantee that a relational database offers.
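The idea of collecting and combining information from several sources can be sketched in a few lines. This is a toy model under stated assumptions, not a real RDF toolchain: the two sources, the predicate names, the `tape-0042` item, and the `tapes_featuring` helper are hypothetical.

```python
# Two hypothetical sources publishing triples about the same things.
music_source = {
    ("John Lennon", "memberOf", "The Beatles"),
    ("Let It Be", "recordedBy", "The Beatles"),
}
archive_source = {
    ("tape-0042", "contains", "Let It Be"),
    ("tape-0042", "format", "quarter-inch tape"),
}

# Because both sources use the same triple shape, combining them is a set union.
combined = music_source | archive_source

def tapes_featuring(store, person):
    """Find archive items holding recordings by any band the person belonged to."""
    bands = {o for (s, p, o) in store if s == person and p == "memberOf"}
    works = {s for (s, p, o) in store if p == "recordedBy" and o in bands}
    return sorted(s for (s, p, o) in store if p == "contains" and o in works)

print(tapes_featuring(combined, "John Lennon"))
```

Neither source alone can answer the question; the answer only appears once the two sets of statements are merged, which is the kind of gain the paragraph above describes.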
Is a Semantic Web just around the corner?
The Semantic Web is not a very fast-growing technology. One of the reasons for this is the very steep learning curve. RDF was developed by people with academic backgrounds in logic and artificial intelligence, making it difficult for the rest of us to understand. Another is the current lack of standards. RDF is data about data, or metadata. Often RDF files describe other RDF files. Will it ever be possible to link all these RDF files together and build a Semantic Web?
The promise of the Semantic Web has raised a number of different expectations. These expectations can be traced to three different perspectives on the Semantic Web. The Semantic Web is portrayed as: a universal library, to be readily accessed and used by humans in a variety of information use contexts; the backdrop for the work of computational agents completing sophisticated activities on behalf of their human counterparts; and a method for federating particular knowledge bases and databases to perform anticipated tasks for humans and their agents.

Some of the challenges for the Semantic Web include vastness, vagueness, uncertainty, inconsistency, and deceit. Automated reasoning systems will have to deal with all of these issues in order to deliver on the promise of the Semantic Web.
It is not very likely that owners of media archives will be able to catalog their multimedia documents just by putting an RDF file on the Internet. Various applications will have to be developed, including a search engine database for all the items, and someone will have to develop a standard for it.
It might be eBay, it might be Microsoft, it might be Google. But eventually we will see marketplaces based on RDF. Publishing information about things on the Internet will be much easier than before. One day we will be able to collect information about almost everything on the web in a standardized RDF format.
What other snags are there?
The advantages that the Semantic Web brings in terms of reuse, dynamism, flexibility, and openness also bring the risk of inefficiencies such as complexity and performance degradation. Then there's the human factor: people may insert spurious metadata (i.e. metacrap) into web pages in an attempt to mislead Semantic Web engines that naively assume the metadata's veracity.
Enthusiasm about the Semantic Web could be tempered by concerns regarding censorship and privacy. For instance, text-analyzing techniques can now be easily bypassed by substituting other words, such as metaphors, or by using images in place of words. An advanced implementation of the Semantic Web would make it much easier for governments to control the viewing and creation of online information, as this information would be much easier for an automated content-blocking machine to understand.
Another criticism of the Semantic Web is that it would be much more time-consuming to create and publish content because there would need to be two formats for one piece of data: one for human viewing and one for machines. However, many web applications in development are addressing this issue by creating a machine-readable format upon the publishing of data or the request of a machine for such data.
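One way to picture this approach is a single stored record from which both views are generated on demand. The sketch below is only an illustration: the record fields, the `example.org` URIs, and the `to_html` and `to_ntriples` helpers are placeholders, and the machine-readable output merely follows the general line shape of N-Triples rather than claiming full conformance.

```python
from html import escape

# One hypothetical catalogue record; the field names and the example.org
# URIs are placeholders, not part of any real archive schema.
record = {
    "id": "http://example.org/item/let-it-be",
    "title": "Let It Be",
    "recordedBy": "http://example.org/artist/the-beatles",
}
VOCAB = "http://example.org/vocab/"

def to_html(rec):
    """Human-readable view of the record."""
    return f'<p><a href="{escape(rec["id"])}">{escape(rec["title"])}</a></p>'

def to_ntriples(rec):
    """Machine-readable view: one N-Triples-style line per field."""
    lines = []
    for key, value in rec.items():
        if key == "id":
            continue
        # Values that look like URIs become resources; everything else a literal.
        if value.startswith("http://"):
            obj = f"<{value}>"
        else:
            text = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{text}"'
        lines.append(f'<{rec["id"]}> <{VOCAB}{key}> {obj} .')
    return "\n".join(lines)

print(to_html(record))      # served to a browser
print(to_ntriples(record))  # served to a machine client
```

The publisher maintains only the record itself; neither output format has to be written by hand.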
Is it all too difficult then?
Where Semantic Web technologies have found a greater degree of practical adoption, it has tended to be among core specialized communities and organizations for intra-company projects. The practical constraints on adoption appear less challenging where domain and scope are more limited than those of the general public and the World Wide Web.
Media archiving could be an ideal application. However, the IASA (International Association of Sound and Audiovisual Archives) is not yet committed to the Semantic Web. Documentation practices in libraries and archives are well established. They are supported by international regulations and extensive know-how. Common resources provide trusted information for aggregating data in the traditional way. A number of consortia at various levels are successfully covering different topics, and the dissemination of information is well developed. At this stage, most of the organizations represented within the IASA do not really feel they need the Semantic Web.
So at the current stage of development, the Semantic Web is not a priority for the IASA. But this does not mean it will be ignored. A very soft, although practical, introduction to some of the beauties of the Semantic Web might be offered by companies such as NOA-Audio, which have systems that are particularly adaptable to a Semantic Web extension, and which have representatives on IASA technical committees.
The commercial potential of media archives which are searchable through Semantic Web technology could be the argument that spurs development in this direction. An IASA implementation of Semantic technologies would transform the way archive holders could exploit their assets, and for the world at large, it would open up access to archived material in exciting new ways.

© KitPlus (tv-bay limited). All trademarks recognised. Reproduction of this content is strictly prohibited without written consent.