Enriching web content with semantic formatting

By Ryan Siddle | 06 Sep 2011

Looking back over the past 10 years, it is clear that the web has evolved into the most powerful and ubiquitous source of information in the world. This richness is changing the way we live and search. In the late 1990s and at the start of the new millennium, web pages were merely basic content, hyperlinked with information that had little meaning to search engines. It took seconds, if not tens of seconds, for a single webpage to load over a 56k dial-up modem. That was cutting-edge technology!

With the development of broadband, faster speeds and lower prices allow more users to engage online, leading to even more information. The public face of Google's business model has always been to provide users with the most relevant search experience. With so much information, and no way for search engines to understand what a search phrase means in relation to a page's content, a solution was badly needed to help them interpret that information.

Understanding List Snippets

List snippets are the latest addition to the Google arsenal. Google mentioned on their Insider Blog in August that they are still testing list snippets on a select number of search results and have not yet rolled them out to all results. Although Google have not released the markup, the HTML source of VoucherCodes.co.uk suggests that list structures are created by repeating block elements of nested code. It is also highly likely that Google will factor in semantic relationships between similar list titles to show the most relevant lists of data.

  

The screen capture above shows two results with Google list snippets for the search term "food discount vouchers". Some of the lists do not contain the keyword "food" directly; however, there is a relationship between food and restaurant, which also links to "main courses".

Existing formatting for vertical searches

Applying a logical and documented semantic XHTML structure allows Google to provide users with relevant content types through its various vertical markets using different data types, including: web links, images and products.

A search for 'Duracell AA batteries' yields various media types within one search query.

  

Recent additions to the formatting families allow additional "things" to be marked up, allowing them to be represented as "objects" on the web. People, events, recipes, navigation breadcrumbs and many other listed items can be described in the form of attributes. 

At present there are three options for marking up content semantically:

  • Microdata
  • Microformats
  • RDFa

Google was one of the first search engines to support HTML mark-up with microformatting to produce rich snippets of data with a 'universal search' experience.

 

Types of objects to mark-up

Each mark-up type uses its own method of classification, usually either a documented schema that needs to be referenced or a set of predefined identifiers. These describe an object through its attributes. For example, every webpage on the internet may have the following attributes:

  • Description
  • URL
  • Name
  • Image

A Product type may carry additional information, such as:

  • Condition (New/Second-hand)
  • Brand
  • Manufacturer
  • Model
  • SKU
  • MPN (manufacturer part number, similar to a SKU or barcode)
  • Description

Using one of the rich snippet mark-up types allows this information to be represented and potentially listed in the SERPs. Search engines such as Google will still use their algorithmic techniques for normal SERPs to decide whether the marked up content is relevant to the search query being performed and rank it accordingly within the SERPs.
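To make the idea of describing an object by its attributes more concrete, here is a minimal Python sketch that turns a product record into microdata mark-up. It is an illustration only: the product values are made up, the function name product_to_microdata is hypothetical, and it assumes the schema.org/Product type with flat text properties, which is simpler than what a real schema allows.

```python
from html import escape

# Made-up product record for illustration; the keys are assumed to match
# schema.org/Product property names.
product = {
    "name": "Duracell AA batteries",
    "brand": "Duracell",
    "model": "AA",
    "sku": "EXAMPLE-SKU-001",
}


def product_to_microdata(item):
    """Render a flat dict of text properties as a schema.org/Product block."""
    lines = ['<div itemscope itemtype="http://schema.org/Product">']
    for prop, value in item.items():
        # Escape both the property name and the value so the output stays valid HTML.
        lines.append(f'  <span itemprop="{escape(prop)}">{escape(str(value))}</span>')
    lines.append("</div>")
    return "\n".join(lines)


markup = product_to_microdata(product)
print(markup)
```

Each attribute of the object becomes one itemprop element inside a single itemscope container, which is the pattern the mark-up types below share in one form or another.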

 

What are microformats?

Microformats emerged in the mid-2000s as a community-driven convention layered on top of the existing World Wide Web Consortium ("W3C") HTML 4 and XHTML mark-up, with the aim of adding meaningful information to a webpage; Google began reading them for rich snippets in 2009. The microformatting code resides within standard HTML code and no schema declaration needs to be made. Instead, it uses predefined values in standard attributes such as class and rel.

Microformats.org is the main information source, providing examples and detailed explanations of the various kinds of information that can be marked up.

A typical example of microformatting in practice is a company looking to promote its business address(es):

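   <div class="vcard">
      <span class="fn org">L'Amourita Pizza</span>
      Located at
      <div class="adr">
         <span class="street-address">123 Main St</span>,
         <span class="locality">Albuquerque</span>,
         <span class="region">NM</span>.
      </div>
      Phone: <span class="tel">206-555-1234</span>
      <a href="http://pizza.example.com" class="url">http://pizza.example.com</a>
   </div>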

Source: http://www.google.com/support/webmasters/bin/answer.py?answer=146861  
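As a rough illustration of how a consumer might read microformatted content, the Python sketch below pulls hCard properties out of a page by looking at class names. The HCARD string is an assumed rendering of the address example using the standard hCard class names, and HCardParser is a hypothetical helper, not how Google or any other search engine actually processes pages. It only handles properties on leaf elements.

```python
from html.parser import HTMLParser

# Assumed hCard mark-up for the example business (standard hCard class names).
HCARD = """
<div class="vcard">
  <span class="fn org">L'Amourita Pizza</span>
  <div class="adr">
    <span class="street-address">123 Main St</span>,
    <span class="locality">Albuquerque</span>,
    <span class="region">NM</span>.
  </div>
  Phone: <span class="tel">206-555-1234</span>
  <a href="http://pizza.example.com" class="url">http://pizza.example.com</a>
</div>
"""

WANTED = {"fn", "street-address", "locality", "region", "tel", "url"}


class HCardParser(HTMLParser):
    """Collect the text of leaf elements whose class list names an hCard property."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = []  # wanted classes on the element opened most recently

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._current = [c for c in classes if c in WANTED]

    def handle_data(self, data):
        text = data.strip()
        if text:
            for name in self._current:
                self.fields[name] = text

    def handle_endtag(self, tag):
        # Text after a closing tag no longer belongs to that property.
        self._current = []


hcard_parser = HCardParser()
hcard_parser.feed(HCARD)
print(hcard_parser.fields)
```

Because the meaning lives entirely in class values, the same HTML renders exactly as before for visitors; only a parser that knows the hCard names sees the extra structure.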

Microformats do not appear to be as popular as they once were, because of the lack of development in recent times. The terms microformatting and rich snippets are often used interchangeably; however, rich snippets are the end result of microformatting or other semantic mark-up. An alternative to microformats is marking up with microdata.

Microdata mark-up

Microdata has gained the most momentum recently, with Google, Bing and Yahoo officially announcing their support for the format and Google's support pages linking directly to a schema website. The microdata specification was designed with HTML5 in mind and provides a great deal of functionality, including its own Document Object Model (DOM) API.

Data-vocabulary.org and Schema.org are two of a select few trusted sites hosting microdata schema specifications and reference material. Their numerous examples of different object types and relations make them excellent resources for any business to use.

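    <div itemscope itemtype="http://data-vocabulary.org/Organization">
      <span itemprop="name">L'Amourita Pizza</span>
      Located at
      <span itemprop="address" itemscope
        itemtype="http://data-vocabulary.org/Address">
        <span itemprop="street-address">123 Main St</span>,
        <span itemprop="locality">Albuquerque</span>,
        <span itemprop="region">NM</span>.
      </span>
      Phone: <span itemprop="tel">206-555-1234</span>.
      <a href="http://pizza.example.com" itemprop="url">http://pizza.example.com</a>.
    </div>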

Source: http://www.google.com/support/webmasters/bin/answer.py?answer=146861  
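Microdata differs from microformats in that items can nest: an itemscope inside an itemprop becomes the value of that property. The Python sketch below shows one way to walk that structure. The MICRODATA string is an assumed rendering of the address example using data-vocabulary.org property names, and MicrodataParser is a hypothetical, simplified reader. It ignores void elements such as meta and img and treats only the href of a link as a URL value.

```python
from html.parser import HTMLParser

# Assumed microdata mark-up for the example business.
MICRODATA = """
<div itemscope itemtype="http://data-vocabulary.org/Organization">
  <span itemprop="name">L'Amourita Pizza</span>
  Located at
  <span itemprop="address" itemscope
    itemtype="http://data-vocabulary.org/Address">
    <span itemprop="street-address">123 Main St</span>,
    <span itemprop="locality">Albuquerque</span>,
    <span itemprop="region">NM</span>.
  </span>
  Phone: <span itemprop="tel">206-555-1234</span>.
  <a href="http://pizza.example.com" itemprop="url">http://pizza.example.com</a>.
</div>
"""


class MicrodataParser(HTMLParser):
    """Build a nested dict of items and properties (simplified sketch)."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._open = []   # one (item, prop) pair per open element
        self._items = []  # stack of items whose itemscope is still open

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop, item = a.get("itemprop"), None
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "properties": {}}
            if prop and self._items:
                self._items[-1]["properties"][prop] = item  # nested item
            elif self.root is None:
                self.root = item
            self._items.append(item)
            prop = None  # text inside belongs to the nested item's properties
        elif prop and tag == "a" and self._items:
            self._items[-1]["properties"][prop] = a.get("href")
            prop = None
        self._open.append((item, prop))

    def handle_data(self, data):
        text = data.strip()
        if text and self._open and self._open[-1][1] and self._items:
            self._items[-1]["properties"][self._open[-1][1]] = text

    def handle_endtag(self, tag):
        if self._open:
            item, _ = self._open.pop()
            if item is not None:
                self._items.pop()


md_parser = MicrodataParser()
md_parser.feed(MICRODATA)
print(md_parser.root)
```

The address ends up as an item in its own right, with its own type, rather than as a flat set of fields on the organisation.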

 
 

Resource Description Framework-in-attributes (RDFa) mark-up

Resource Description Framework-in-attributes (RDFa) is another mark-up approach, rooted in XHTML, that uses a limited number of attributes to describe an object. The main difference between RDFa and microdata is syntactic: RDFa refers to vocabulary terms through XML namespace prefixes rather than plain property names. RDFa grew out of the XHTML 2.0 working drafts as a way to provide metadata for any XML-based structure; it was later standardised as XHTML+RDFa, which extends XHTML 1.1.

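   <div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Organization">
      <span property="v:name">L'Amourita Pizza</span>
      Located at
      <div rel="v:address">
         <div typeof="v:Address">
            <span property="v:street-address">123 Main St</span>,
            <span property="v:locality">Albuquerque</span>,
            <span property="v:region">NM</span>.
         </div>
      </div>
      Phone: <span property="v:tel">206-555-1234</span>
      <a href="http://pizza.example.com" rel="v:url">http://pizza.example.com</a>
   </div>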

Source: http://www.google.com/support/webmasters/bin/answer.py?answer=146861
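What sets RDFa apart in practice is that its terms are prefixed names that resolve to full identifiers through a namespace declaration. The Python sketch below illustrates that resolution step only. The RDFA string is an assumed, shortened rendering of the example using the rdf.data-vocabulary.org namespace, CurieCollector is a hypothetical helper, and prefixes are treated as document-wide, whereas real RDFa scopes them to the declaring element.

```python
from html.parser import HTMLParser

# Assumed, shortened RDFa mark-up for the example business.
RDFA = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Organization">
  <span property="v:name">L'Amourita Pizza</span>
  Phone: <span property="v:tel">206-555-1234</span>
  <a href="http://pizza.example.com" rel="v:url">http://pizza.example.com</a>
</div>
"""


class CurieCollector(HTMLParser):
    """Expand prefixed RDFa terms (such as v:name) into full identifiers."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}  # prefix -> namespace, treated as document-wide here
        self.terms = []     # expanded identifiers in document order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Register any namespace prefixes declared on this element first.
        for name, value in a.items():
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value
        # Then expand the prefixed terms found in the RDFa attributes.
        for name in ("typeof", "property", "rel"):
            for curie in (a.get(name) or "").split():
                prefix, _, local = curie.partition(":")
                if prefix in self.prefixes:
                    self.terms.append(self.prefixes[prefix] + local)


curies = CurieCollector()
curies.feed(RDFA)
for term in curies.terms:
    print(term)
```

The short v: prefix keeps the mark-up compact while every term still points at an unambiguous identifier in the vocabulary.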

 

So what other types of information can you mark-up as rich content?

Almost anything really. In recent months, microdata has been leading the way for rich snippet content, with its specifications rapidly evolving and being adopted. Google announced in June 2011 that, although they have historically supported all three mark-ups, future-proofing should be done with microdata.

Whichever method is chosen, the Google rich snippet testing tool should be used to ensure the formatting is applied correctly.