Such a little thing can have such a big impact -- your sitemap. I don't mean your navigation bar or a web page listing your pages. I mean the XML file you generate every time you re-index your blog. You are generating an XML sitemap, right? And submitting it to the major search engines -- Google, Yahoo!, and MSN?
If not, read on.
Don't worry if you haven't been creating sitemaps and submitting them. Even though Google sitemaps have been around for a spell, there has been recent news that has made sitemaps much more important:
In an encouraging act of collaboration, Google, Yahoo and Microsoft announced tonight that they will all begin using the same Sitemaps protocol to index sites around the web. Now based at Sitemaps.org, the system instructs web masters on how to install an XML file on their servers that all three engines can use to track updates to pages. This should make it easier to get your pages indexed in a simple and standardized way. People who use Google Sitemaps don’t need to change anything, those maps will now be indexed by Yahoo and Microsoft.
So what is a sitemap?
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.
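To make that format concrete, here is a minimal sketch in Python that turns a list of URLs plus optional metadata into a Sitemap 0.90 document using only the standard library. The build_sitemap helper and the example.com URLs are hypothetical placeholders, not part of any particular sitemap tool; only the namespace and the loc/lastmod/changefreq/priority tags come from the protocol.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the Sitemap 0.90 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Optional per-URL tags, written in this order after <loc>.
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Return a Sitemap XML document as a string.

    entries is a list of dicts; "loc" is required, while "lastmod",
    "changefreq" and "priority" are optional, as in the protocol.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Saving the result as sitemap.xml in your site's root directory lets one file cover every URL on the site.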
So for my blog, I've created a sitemap template (customized from a Movable Type sitemap template I found on the web) that generates a fresh sitemap every time I rebuild my index pages. This sitemap includes the URLs for the major splash pages, as well as the URLs for each individual posting, each index, and each month and category archive. Sounds like a lot, right? As the protocol allows, I've weighted the pages for relative importance -- individual posts and the home page rank as most important, then the month and category archives, with the other splash pages last. Similarly, each page has an update frequency specified (individual posts rarely, indices most often).
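That weighting scheme boils down to a lookup from page type to a priority and a change frequency. The sketch below uses hypothetical page-type names and illustrative numbers (they are guesses, not the values from the template); only the changefreq vocabulary and the 0.0 to 1.0 priority range come from the protocol.

```python
# Values allowed for <changefreq> by the Sitemap 0.90 protocol.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}

# Hypothetical weights: posts and the home page highest, month and
# category archives next, other splash pages last. Posts rarely
# change; the home page (an index) changes most often.
PAGE_POLICY = {
    "home":    {"priority": 1.0, "changefreq": "daily"},
    "post":    {"priority": 1.0, "changefreq": "yearly"},
    "archive": {"priority": 0.7, "changefreq": "weekly"},
    "splash":  {"priority": 0.4, "changefreq": "monthly"},
}


def entry_for(url, page_type, lastmod=None):
    """Build a sitemap entry dict for url using its page type's policy."""
    policy = PAGE_POLICY[page_type]
    # Guard against values the protocol does not allow.
    if not 0.0 <= policy["priority"] <= 1.0:
        raise ValueError(f"priority out of range for {page_type!r}")
    if policy["changefreq"] not in VALID_CHANGEFREQ:
        raise ValueError(f"bad changefreq for {page_type!r}")
    entry = {"loc": url, **policy}
    if lastmod is not None:
        entry["lastmod"] = lastmod  # W3C datetime, e.g. "2006-11-20"
    return entry
```

Because priority only ranks pages relative to other pages on the same site, setting everything to 1.0 gains nothing; the spread between page types is what carries the hint.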
Now Google and Yahoo! have a map of what my blog looks like, which pages to crawl first in case they don't crawl the whole site, and how often to come back (as a suggestion, of course).
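On the crawler's side, those hints might be consumed something like this: parse the sitemap, fill in the protocol's default priority of 0.5 where a page has none, and visit higher-priority URLs first. How Google or Yahoo! actually schedule crawls isn't public, so treat the hypothetical crawl_order function as an illustration of the idea, not their algorithm.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
DEFAULT_PRIORITY = 0.5  # protocol default when <priority> is omitted


def crawl_order(sitemap_xml):
    """Return (loc, priority, changefreq) tuples, highest priority first."""
    root = ET.fromstring(sitemap_xml)
    pages = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS).strip()
        priority = float(url.findtext("sm:priority",
                                      default=str(DEFAULT_PRIORITY),
                                      namespaces=NS))
        # None when the sitemap gives no <changefreq> for this URL.
        changefreq = url.findtext("sm:changefreq", namespaces=NS)
        pages.append((loc, priority, changefreq))
    # sorted() is stable, so ties keep their order from the sitemap.
    return sorted(pages, key=lambda page: page[1], reverse=True)
```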
Next, submit that sitemap to the search engines themselves:
Once you have created the Sitemap file and placed it on your webserver, you need to inform the search engines that support this protocol of its location by submitting it to them via the search engine's submission interface or an HTTP request.
The search engines can then retrieve your Sitemap and make the URLs available to their crawlers.
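For the HTTP-request route, the sitemaps.org protocol describes a ping of the form <searchengine_URL>/ping?sitemap=<sitemap_url>, with the sitemap URL URL-encoded. Here is a minimal sketch; searchengine.example is a placeholder host and ping_url/send_ping are hypothetical helpers. Each engine publishes (and can change or retire) its own ping address, so check its documentation for the real one.

```python
from urllib.parse import quote
from urllib.request import urlopen


def ping_url(engine_base, sitemap_url):
    """Build a sitemaps.org-style ping URL for a (placeholder) engine."""
    # safe="" makes quote() encode ":" and "/" too, so the whole
    # sitemap URL travels as a single query-string value.
    encoded = quote(sitemap_url, safe="")
    return f"{engine_base.rstrip('/')}/ping?sitemap={encoded}"


def send_ping(engine_base, sitemap_url, timeout=10):
    """Send the ping as a plain HTTP GET and return the status code.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    with urlopen(ping_url(engine_base, sitemap_url), timeout=timeout) as resp:
        return resp.status
```

For example, ping_url("https://searchengine.example", "https://example.com/sitemap.xml") yields https://searchengine.example/ping?sitemap=https%3A%2F%2Fexample.com%2Fsitemap.xml.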
"How?" you ask.
For Google, log in to your Webmaster Tools page and use the submission form there. What? You don't have an account for webmaster tools?
Then go to Google's Webmaster Tools page right now and get an account. From there you can submit a sitemap, check it for errors once it has been crawled, then start poking around to see when Google last crawled you, what problems it found with your links, which words in your content are guiding its understanding of your site's subject matter -- you get the idea.
For Yahoo!, log in to Yahoo! Site Explorer. Again, explore the interface to see how to get your site recognized, then how to associate the sitemap with it. Then poke around with the cool toys.
In both cases, Google and Yahoo! will verify your ownership of the site before giving you access to more functions. Verification is quick and easy. Google requires you to put a meta tag in your home page HTML, which it then tries to detect. Yahoo! requires you to upload a special file to the root directory of your site and then checks for it. (Google offers this form of verification as an option as well.)
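Google's meta tag check amounts to fetching your home page and looking for the tag. A rough approximation with Python's built-in HTML parser is below; the tag name site-verification and the token abc123 are hypothetical placeholders, since each engine issues its own name and value. It's handy for confirming the tag actually made it into your rebuilt templates before asking the engine to verify.

```python
from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Records the content of the first <meta> whose name matches."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.content = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <meta ... /> tags are routed here as well.
        if tag == "meta" and self.content is None:
            attributes = dict(attrs)
            if attributes.get("name") == self.name:
                self.content = attributes.get("content")


def has_verification_tag(html, name, token):
    """True if the page has a <meta name=name content=token> tag anywhere."""
    finder = MetaTagFinder(name)
    finder.feed(html)
    finder.close()
    return finder.content == token
```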
MSN is another matter. I have not been able to find out how a webmaster is supposed to submit a sitemap to MSN. Several postings on messageboards state that there is no mechanism yet. That should change soon. There is an MSN submission page, but it asks for your home page URL for the standard web crawl. Given the smaller importance of MSN in terms of search engine market share (Google 49.2%, Yahoo 23.8%, MSN 9.6% as of July 2006), I wouldn't sweat it too much.
Has it made a difference for me? Traffic-wise, it might be too early to tell (though Rob Hyndman posted to my blog to say it helped him out). Nevertheless, the exercise has been great as an audit of my web pages, and it has given me the tools, via Webmaster Tools and Site Explorer, to see exactly what the search engines see and when they saw it.