Exploiting universal search - part 1

By Andreas Pouros | 08 Sep 2007

Last month I focused on Google's evolving user interface and how the incorporation of multimedia elements into Google's search results (called Universal or Integrated Search) represents the biggest series of user interface changes at Google since the launch of AdWords. This evolution has brought with it an increasingly aggressive incorporation of images, videos, maps, and more into Google's standard search results pages, making it imperative that websites optimise those different data types to exploit the increasing levels of traffic they are delivering.

This month the focus is on how to optimise your images so that you can get the best possible visibility within vertical Image Search and also within the standard search results that are increasingly pulling images in.

Why should you care about Image Search?

Firstly, the more of a search results page your brand holds, the more likely it is that you will win the click-through. One way to hold more of the page is to ensure your images appear when a search engine pulls images into its standard results. Google, Yahoo! and MSN all now integrate images into their results.

Secondly, vertical Image Search, accessed typically through a tab called 'Images' within all the major search engine interfaces, is becoming increasingly popular. Hitwise data shows that there has been a 90% growth in Image Search usage year-on-year, with over 360 million Image Searches made each month across the big 5 search engines (Google, Yahoo!, Ask, MSN, and AOL). It's important to note that approximately 72% of all image searches are conducted with Google. Within Google, 'Image Search' is the most popular vertical by a significant margin.

So, how do you optimise images?

There are a number of things you should do, but these five are the most important:

1. File Naming Conventions

All the images you use on your site should be given descriptive names. 'pete-sampras.jpg' is infinitely better than 'Image.jpg', assuming it's an image of Pete Sampras of course.
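As a rough illustration of the naming convention above, the sketch below turns a short description into a lowercase, hyphen-separated file name and keeps the original extension. The function name descriptive_filename is hypothetical, and the lowercase-with-hyphens style is an assumption based on the 'pete-sampras.jpg' example rather than a rule any search engine publishes.

```python
import re
import unicodedata
from pathlib import PurePosixPath


def descriptive_filename(description: str, original_name: str) -> str:
    """Build a lowercase, hyphen-separated file name that keeps the original extension.

    Hypothetical helper: the naming style is an assumption, not a published rule.
    """
    # Strip accents so the name stays plain ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", description)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    extension = PurePosixPath(original_name).suffix.lower()
    return f"{slug}{extension}"


print(descriptive_filename("Pete Sampras", "Image.jpg"))  # pete-sampras.jpg
```

Running something like this over an image library before upload is one way to avoid generic names such as 'Image.jpg' slipping through.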

2. Use ALT Tags, Title Tags and 'long descriptions'

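HTML allows you to add three descriptive attributes to an image - alt, title, and longdesc (a 'long description') - all of which help the search engines contextualise your images. Although they are commonly called 'tags', all three are attributes set on the image's img element alongside its src.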

Essentially, the alt attribute is for a short description, the longdesc attribute points to the URL of a page carrying a more comprehensive description, and the title attribute is for whatever you want (though something in between the other two in size and scope is recommended).
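The markup described above can be sketched with a small helper that assembles an img element carrying all three attributes. This is only an illustration: the function name img_tag, the title wording and the 'pete-sampras-description.html' page are hypothetical placeholders, and the sketch assumes longdesc is used as HTML 4 defines it, as a link to a separate page.

```python
from html import escape


def img_tag(src: str, alt: str, title: str, longdesc_url: str) -> str:
    """Return an <img> element carrying alt, title and longdesc attributes."""
    attributes = {
        "src": src,
        "alt": alt,                # short description
        "title": title,            # medium-length description
        "longdesc": longdesc_url,  # URL of a page holding the full description
    }
    # Escape each value so quotes or ampersands cannot break the markup.
    rendered = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attributes.items()
    )
    return f"<img {rendered}>"


# All values below are placeholders chosen for illustration.
print(img_tag(
    src="pete-sampras.jpg",
    alt="Pete Sampras",
    title="Pete Sampras serving at Wimbledon",
    longdesc_url="pete-sampras-description.html",
))
```

The printed element shows the three attributes side by side: a short alt, a slightly longer title, and a longdesc that links out to the full description.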

3. Text Relevancy and Proximity

The major search engines will look at the text on the page an image resides on as an indication, and for confirmation, of the image's theme. It is therefore imperative, at the very least, to ensure that textual content in close proximity to the image is relevant to the image and is aligned with the image's alt, title, and long description text. A highly relevant 'caption' under the image is often the best way of achieving this.
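One rough way to check that a caption is aligned with an image's alt text is to measure how many of the alt text's words also appear in the caption. The sketch below is a home-made sanity check, not how any search engine scores relevance; the function names, the stop-word list and the sample captions are all assumptions made for illustration.

```python
import re

# A deliberately tiny stop-word list, assumed for this example only.
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "with"}


def keywords(text: str) -> set:
    """Lowercase the text and return its words, minus a few common stop words."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def caption_overlap(alt_text: str, caption: str) -> float:
    """Share of the alt text's keywords that also appear in the caption (0.0 to 1.0)."""
    alt_words = keywords(alt_text)
    if not alt_words:
        return 0.0
    return len(alt_words & keywords(caption)) / len(alt_words)


print(caption_overlap("Pete Sampras", "Pete Sampras serving at Wimbledon"))  # 1.0
print(caption_overlap("Pete Sampras", "Our summer sale starts today"))       # 0.0
```

A low score is only a prompt to look at the page again; a caption can be perfectly relevant while using different words.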

4. Term Selection

Relevant to most of the above points is making sure that you are using words and phrases in your optimisation of images that people are likely to be searching for. For example, more people search for 'nokia mobile phone' than 'nokia handset' so the first one should be used in your optimisation.
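If you have search volume estimates from a keyword research tool, the choice between candidate phrases can be reduced to a simple ranking. The sketch below assumes such estimates are available as a dictionary; the function name rank_terms is hypothetical and the figures are invented placeholders, not real search data.

```python
def rank_terms(monthly_searches: dict) -> list:
    """Return candidate phrases ordered from most searched to least searched."""
    return sorted(monthly_searches, key=monthly_searches.get, reverse=True)


# Placeholder volumes for illustration only; substitute figures from your own research.
volumes = {
    "nokia handset": 2500,
    "nokia mobile phone": 9000,
}

ranked = rank_terms(volumes)
print(ranked[0])  # nokia mobile phone
```

The first phrase in the ranking is the one to favour in file names, alt text and captions, provided it still describes the image accurately.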

5. Understand the impact of Enhanced Image Search

Enhanced Image Search is a Google service that can be enabled by webmasters through their Google Webmaster Tools account. Once enabled, the webmaster's site is entered into an 'Image Program' whereby participants can tag other sites' images with descriptive search terms. Several participants will tag the same image during a session, which helps Google learn what the image is about.

Enhanced Image Search is important as, by opting in, you allow Google to use the input from Image Labeler participants to help in determining whether your images are worthy of ranking, and for what specific search terms.

This has implications for Webmasters optimising their own images as an optimiser will need to make sure that his/her images are clearly about something specific so that Google Image Labeler participants tag the image with the same unambiguous terms.

For example, let's say you have a website about the HBO TV series 'Curb Your Enthusiasm' starring Larry David. If you have an image of the cast and are participating in the Image Program, people are likely to tag it with the words 'Curb Your Enthusiasm', 'Curb Your Enthusiasm Cast', 'CYE Cast members', 'Larry David and the Curb cast', etc, which will communicate to Google that the image is relevant (and should rank) for those specific terms. What if, however, a huge number of people are searching for 'Larry David'? Your image of the cast is unlikely to rank for that search term, as not many Image Labeler participants would have tagged it with that specific term.

If you had another image on your site that was just a picture of Larry David, most human taggers would probably tag it as 'Larry David' (assuming they know who he is) and nothing else, making it incredibly well tagged to rank for that term. The bottom line is that a website needs high quality images that are clear, unambiguous and are unequivocally about the things you want them to rank for. And you'll need a quantity of images on your site commensurate with your Image Search ambitions.
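The difference between the two images can be made concrete by counting how often each tag is applied. The sketch below is a simplified model of tag agreement, not a description of how Google actually weighs Image Labeler input; the function name tag_shares is hypothetical and the tag lists simply reuse the examples above.

```python
from collections import Counter


def tag_shares(tags: list) -> dict:
    """Map each distinct tag (case-insensitive) to its share of all tags given."""
    normalised = [tag.strip().lower() for tag in tags]
    if not normalised:
        return {}
    counts = Counter(normalised)
    return {tag: count / len(normalised) for tag, count in counts.most_common()}


# Illustrative tag sets: four taggers per image.
cast_photo = [
    "Curb Your Enthusiasm",
    "Curb Your Enthusiasm Cast",
    "CYE Cast members",
    "Larry David and the Curb cast",
]
portrait = ["Larry David", "larry david", "Larry David", "Larry David"]

print(tag_shares(cast_photo).get("larry david", 0.0))  # 0.0
print(tag_shares(portrait)["larry david"])             # 1.0
```

In this toy model the cast photo spreads its tags across four different phrases and never earns the exact term 'Larry David', while the portrait concentrates every tag on it.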

The future of Image Search

The optimisation techniques above reflect the fact that the methods the major search engines use to index images are currently primitive at best; they rely on textual cues and, in Google's case, a degree of human tagging. The status quo is heavily flawed, for two main reasons.

Firstly, what if an image exists on the Web that has no textual cues to identify what it is? No alt tags, no description and no data about it from Google Image Labeler or other image tagging websites? With current Image Search technology those images would never be found by the search engines.

Secondly, current Image Search technology is poor at analysing images within the context of even semantically relevant textual cues. For example, an image of the famous racehorse Shergar would likely use the horse's name on the page, in tags, in captions, etc, which would result in the image being deemed highly relevant for a search for 'Shergar', but not particularly relevant for the word 'horse' or even 'racehorse'. The same is true for other famous animals, which are often the most famous in their species but wouldn't appear when the species is explicitly searched for. Other famous horses such as Desert Orchid and Red Rum, and Ming Ming the Panda would be other good examples.

It isn't just animals either. A search for 'mountain' in Image Search rarely serves back any pictures of Mount Everest or K2, the two highest mountains in the world. A search for 'guitarist' doesn't return any images of Jimi Hendrix, Eric Clapton, or Jimmy Page, arguably the three most accomplished guitarists of all time.

The problem is that search engines are not currently capable of determining that a picture of Ming Ming is also a picture of a panda, or that a picture of Jimi Hendrix playing guitar at Woodstock is also a picture of a guitarist. They are too literal in their computations, and this limits the scope and value of Image Search results.

Intelligent ways of solving the above two problems are likely to be employed by the major search engines within the next few years. In fact, the technology is pretty much there for that to happen quite soon. For example, researchers at the University of California, San Diego are using clever image indexing techniques that allow a system to be trained to identify each part of an image's composition. They call this Supervised Multiclass Labelling (http://www.jacobsschool.ucsd.edu/news/news_releases/release.sfe?id=650). It works by exposing the system to hundreds of pictures of different things - cars, tigers, blossom, rainfall, etc - which allows the system to be 'trained' to automatically understand and identify the visual characteristics of those things, so when it's presented with a new image it can determine what the image is about.

This kind of approach would instantly identify that Shergar is (was) a horse, that Ming Ming is a panda, and that a picture of Jimi Hendrix holding a guitar makes him a guitarist and not only 'Jimi Hendrix'.
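The general idea of training a system on labelled examples and then asking it to label something new can be shown with a toy classifier. The sketch below is not Supervised Multiclass Labelling; it swaps in a much simpler nearest-centroid method, and it assumes each image has already been reduced to two made-up numeric features. All names and numbers are hypothetical.

```python
from math import dist


def train(examples: list) -> dict:
    """Average the feature vectors seen for each label (one centroid per label)."""
    grouped = {}
    for label, features in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids: dict, features: tuple) -> str:
    """Return the label whose centroid lies closest to the new feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Invented two-number feature vectors standing in for real image features.
training_examples = [
    ("horse", (0.9, 0.2)),
    ("horse", (0.8, 0.3)),
    ("panda", (0.1, 0.9)),
    ("panda", (0.2, 0.8)),
]

centroids = train(training_examples)
print(classify(centroids, (0.85, 0.25)))  # horse
```

A system along these lines labels a new picture from its visual features alone, which is why it would not need the word 'horse' to appear anywhere on Shergar's page.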

Once the data processing requirements of these more sophisticated approaches become more manageable, there will be no reason for the major search engines not to employ them. The need for websites to have a good breadth of high quality and unique images will then become even more important than it is now.

Next Month: Exploiting Universal Search Part 2: Video
