So what’s going on at the Security Laboratory I’m working for? One major project is summarized below; another is currently too sensitive for publication.
Search engines: The problem of lexical ambiguity
This summary is based on Truran et al. (2005).
Problems may occur when search queries contain words that have more than one meaning. For example, Gavin the gardener tries to find pictures or information about apples (the fruit) using the search term ‘apple’. Google delivers two result pages exclusively with information about the company Apple (computers). Gavin has never heard of a company called Apple before. That was not what he was looking for. When looking for pictures, it gets a bit better, although more than half of the pictures are still related to the company.
To understand those results, one should understand the PageRank algorithm. Google works with the opinion of the masses. Someone puts a link to a certain website S on their own website. The more people link to S from their own websites, the higher Google rates the relevance and importance of S in a certain search context.
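The link-counting idea described above can be illustrated with a simplified PageRank iteration. This is only a minimal sketch, not Google's actual implementation: the function name `pagerank`, the toy link graph, the damping factor of 0.85 and the fixed number of iterations are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    links: dict mapping each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the same rank for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small base rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three blogs link to S, only one links to "orchard".
toy_web = {
    "blog_a": ["S"],
    "blog_b": ["S"],
    "blog_c": ["S", "orchard"],
    "S": [],
    "orchard": [],
}
ranks = pagerank(toy_web)
```

In this toy graph, S ends up with a higher rank than "orchard" simply because more pages link to it, which is the effect Gavin runs into with ‘apple’.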
So obviously, a lot of people are linking to Apple (the company) or related sites. (In fact, the ranking might also depend on some marketing aspects. Are companies paying for higher rankings?) One couldn’t say the delivered search results are wrong. They just don’t match what Gavin had in mind when he searched for ‘apple’. Moreover, Gavin, who has never heard of the company Apple, gives up crawling through the result pages after the third one.
How to deal with that problem? As we figured out, to humans ‘apple’ has at least two meanings. Therefore, some sort of classification should be applied. And why don’t we use the opinion of the masses again? ‘Co-active intelligence’ systems, of which PageRank is one, try to address this issue. They rely on user-generated information. So why don’t we collect user information concerning the different meanings of certain search terms? Assuming every search session addresses only one meaning of a word or combination of words, we can use the collected user interaction information to build up a collection of classes that are logically connected to specific search terms.
For example, Gavin searches for ‘apple’ with the fruit in mind. The search engine delivers certain result pages. Gavin chooses several pages, pictures, … The search engine now internally connects the search query with the selected pages, assuming that Gavin knows which pages, images, … fit the query best. The more people participate, the more accurate the results will get. Users can then choose which meaning actually fits best in that specific search context.
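The bookkeeping described above could be sketched roughly as follows. This is not the method from the cited paper: the class `ClickLog`, the use of Jaccard overlap to decide whether two sessions share a meaning, the threshold of 0.2 and the example host names are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap of two sets: size of intersection divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ClickLog:
    """Groups the results users select for a query into classes (meanings)."""

    def __init__(self, threshold=0.2):
        # Minimum overlap for a session to join an existing class (assumed value).
        self.threshold = threshold
        # query -> list of classes; each class maps a result to its click count.
        self.classes = defaultdict(list)

    def record_session(self, query, clicked):
        """Store one search session, assumed to address a single meaning."""
        clicked = set(clicked)
        if not clicked:
            return
        # Find the existing class that overlaps most with this session.
        best, best_score = None, 0.0
        for cls in self.classes[query]:
            score = jaccard(clicked, set(cls))
            if score > best_score:
                best, best_score = cls, score
        # No sufficiently similar class yet: treat the session as a new meaning.
        if best is None or best_score < self.threshold:
            best = defaultdict(int)
            self.classes[query].append(best)
        for result in clicked:
            best[result] += 1

    def meanings(self, query):
        """Return one list of results per class, most-clicked results first."""
        return [
            sorted(cls, key=lambda result: (-cls[result], result))
            for cls in self.classes.get(query, [])
        ]


log = ClickLog()
log.record_session("apple", ["fruit-recipes.example", "orchard.example"])
log.record_session("apple", ["apple-store.example", "mac-review.example"])
log.record_session("apple", ["orchard.example", "apple-pie.example"])
groups = log.meanings("apple")
```

With these three sessions, the first and third share a selected page and fall into one class, while the second forms a class of its own. A search engine could then offer both classes and let the user pick the meaning they had in mind.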
Co-active intelligence systems are able to reveal associations that may not have been discovered yet, a phenomenon known as ‘emergent semantics’. For example, ‘big apple’ could also be associated with New York. In general, emergent semantics can also be used in other contexts, such as scientific cooperation or service discovery (Aberer et al., 2004).
 Truran, M., Goulding, J., and Ashman, H. 2005. Co-active intelligence for image retrieval. In Proceedings of the 13th Annual ACM international Conference on Multimedia (Hilton, Singapore, November 06 – 11, 2005). MULTIMEDIA ’05. ACM, New York, NY, 547-550. DOI= http://doi.acm.org/10.1145/1101149.1101273
 K. Aberer, P. Cudre-Mauroux, A. M. Ouksel, T. Catarci, M.-S. Hacid, A. Illarramendi, V. Kashyap, M. Mecella, E. Mena, E. J. Neuhold, O. D. Troyer, T. Risse, M. Scannapieco, F. Saltor, L. de Santis, S. Spaccapietra, S. Staab, and R. Studer. 2004. Emergent Semantics Principles and Issues. In Proceedings of the 9th International Conference on Database Systems for Advanced Applications (DASFAA 2004).