A new tool built by IBM Research in Israel and the European Union can help identify people, places and things in large collections of audio-visual content on the Web, even if they haven’t been tagged or indexed.
Finding spare parts or identifying a landmark could soon be as easy as uploading and analyzing a digital photo, thanks to a new European Union project led by Israeli researchers.
Called SAPIR (Search in Audio visual content using Peer-to-peer Information Retrieval), the new Web-based technology can analyze and identify the pixels in large-scale collections of audio-visual content, even if that content hasn’t been tagged or indexed with descriptive information.
The analytics engine, which can be used to sort through video, pictures, music and multimedia, was produced by a consortium of researchers from the EU, led by Yossi Mass, a scientist at IBM Research in Haifa. The engine is driven by the content itself rather than by keywords or tags.
Unstructured, uncategorized, untagged
The need for such technology is clear. Today, multimedia makes up the largest share of information stored on the Net. And, according to a May report by IDC, a subsidiary of the International Data Group, 95 percent of electronic information on the Internet is unstructured and has not been categorized, or even tagged.
Images make up the biggest part of this digital universe. The number of cell phone pictures alone reached nearly 100 billion in 2006, a figure that is expected to grow to 500 billion by 2010.
“SAPIR is a potential ‘game-changer’ when it comes to scalability in search and analyses,” explains Mass. “It approaches the problem from a fundamentally different perspective and opens up a universe of new possibilities for using multimedia to analyze the vast visual and auditory world in which we now live.”
According to the developers from Israel, Italy, Germany, France, the Czech Republic, Spain and Norway, SAPIR can index audio-visual content and sift through collections of millions of multimedia items by extracting “low-level descriptors” from the photographs or videos. These descriptors include features such as color, layout, shapes or sounds.
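To make the idea of a “low-level descriptor” concrete, here is a minimal sketch of one common kind, a color histogram. It is an illustration only: the article does not say which descriptors or algorithms SAPIR actually uses, and the function name, the bin count and the tiny sample image below are all assumptions.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized color histogram for a list of (r, g, b) pixels.

    Each 0-255 channel value is quantized into `bins_per_channel` buckets,
    so the descriptor has bins_per_channel ** 3 entries that sum to 1.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    n = bins_per_channel
    counts = Counter()
    for r, g, b in pixels:
        # Map each channel to a bucket index in the range 0..n-1.
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        counts[rb * n * n + gb * n + bb] += 1
    total = len(pixels)
    return [counts[i] / total for i in range(n ** 3)]


# Hypothetical four-pixel "image": three reddish pixels and one blue pixel.
image = [(255, 0, 0), (250, 10, 5), (240, 0, 0), (0, 0, 255)]
descriptor = color_histogram(image)
```

The descriptor summarizes what the image looks like without any tag or caption, which is the property the developers describe.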
Identification and analysis
If a tourist uses her mobile phone to photograph a statue, for example, SAPIR identifies the image’s low-level descriptors, compares them to existing photographs, and helps to identify the statue.
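The comparison step can be sketched as a nearest-neighbor lookup over descriptors. This is a simplified illustration, not SAPIR's method: the distance measure, the file names and the three-value descriptors are hypothetical, and a linear scan like this one would not scale to the millions of items the project targets.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, collection, k=1):
    """Return the names of the k items whose descriptors are closest to `query`."""
    ranked = sorted(collection.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical collection of already-indexed photographs.
collection = {
    "statue_front.jpg": [0.70, 0.20, 0.10],
    "beach.jpg": [0.10, 0.30, 0.60],
    "statue_side.jpg": [0.60, 0.30, 0.10],
}

# Descriptor extracted from the tourist's new photo (made-up values).
query = [0.68, 0.22, 0.10]
matches = most_similar(query, collection, k=2)
```

Here the two statue photographs rank ahead of the beach photograph because their descriptors lie closer to the query.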
With further research, the developers believe more features could be analyzed. You could photograph a bag you saw someone carrying on the street, say, and find out which stores carry the item.
In the future, scientists might even be able to expand the technology’s reach so that it could be used to analyze medical images and rich media patient records to suggest a likely medical diagnosis, by comparing the combined results with historical data from distributed medical repositories.
“SAPIR taps into the vast – and rapidly growing – electronic repository of multimedia and has exceptional reliability and nearly unlimited capacity,” says an IBM Israel spokesman. “It uses the same type of self-organizing, peer-to-peer technology currently used for swapping audio and video over the Internet. With this approach, there is no central point of potential failure, and server hardware can be added for additional capacity when the collection grows.”