I like to think about the kinds of analysis of web sites that I would do if there were no limitations on data availability and quality. It’s a fun kind of pipe dream for a steamy summer evening. Earlier this evening, I had a conversation with a friend that turned to search log analysis; he liked one of my “standard” ideas (the first item below) for an interesting analysis based around on-site search.
Pretending for a moment that the data were properly logged, here are a few things I’d like to use as points of site evaluation:
- From what pages are the most on-site searches initiated?
- What terms are used in searches from which pages? Is there a logical reason that people would search for A from page B?
- How are queries initiated from the home page qualitatively different from queries made from other pages in the site?
- For sites that serve content that is segmented for different audiences, which audience’s pages generate the most searches? Are the searches initiated from each segment’s pages substantially different in character?
- How do visits that include successful on-site searches differ from visits that include unsuccessful searches and visits that do not involve searching, in terms of visit length, quality events, and conversion?
- If there is an advanced search tool, how often is it used? Are the visits that include this type of search different from those with basic searches? Which advanced search fields are used the most?
- How many visits are abandoned at null search results?
- Are more searches conducted by new visitors or returning visitors? Are the queries of new versus returning visitors qualitatively different?
- Visitors may initiate searches within one or two pages of arriving at the site, or after navigating through many pages of content. What are the differences in search terms used and the pages from which searches are initiated for queries that occur early in the visit versus late in the visit?
- Most visitors will make only one query in a visit, but some visitors will make multiple queries. What terms do repeat searchers use, and how are they different from searchers who make only one query?
- A/B and multivariate tests can compare how two or more page variants perform against each other in an experimental trial. Using these methods, how do visit length and quality vary when more or less information is provided to the user in the search results?
- How does providing “guideposts” or other additional help on the null results page change the page abandonment rate? Can you re-engage a “nulled” visitor?
- What languages are represented in the site’s search logs? Can this information inform the choice of content for internationalization or localization?
- Are there seasonal patterns to search queries on the site? I would like to go so far as to assert that there must be seasonal patterns in the queries on every site, as the lives of searchers worldwide are profoundly affected by seasonal changes. But that’s a bit much to claim without a proper survey of annual search logs (preferably covering the same several consecutive years) from a variety of sites across various sectors. That would be a monumental project, and good luck getting permission to publish on the data…
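To make a couple of these concrete, here is a minimal Python sketch of the first question (which pages launch the most searches) and the null-results abandonment question. It assumes the pipe dream has come true and every search is logged with its originating page and result count. The `SearchEvent` record, its field names, and the sample events are all invented for illustration; real search logs would need their own parsing.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SearchEvent:
    # Hypothetical log record; real search logs rarely carry all of these fields.
    visit_id: str
    origin_page: str   # page the search was launched from
    query: str
    result_count: int  # 0 means a null results page
    visit_ended: bool  # True if the visit ended on the results page


def searches_by_origin(events):
    """Count searches per originating page, most frequent first."""
    return Counter(e.origin_page for e in events).most_common()


def null_abandonment_rate(events):
    """Share of null-result searches after which the visit ended."""
    nulls = [e for e in events if e.result_count == 0]
    if not nulls:
        return 0.0
    return sum(e.visit_ended for e in nulls) / len(nulls)


# Invented sample data, purely for illustration.
events = [
    SearchEvent("v1", "/", "opening hours", 12, False),
    SearchEvent("v2", "/products", "warranty", 0, True),
    SearchEvent("v3", "/products", "returns", 0, False),
    SearchEvent("v4", "/", "careers", 3, False),
    SearchEvent("v5", "/products", "manual", 5, False),
]

print(searches_by_origin(events))
print(null_abandonment_rate(events))
```

Most of the other questions in the list come down to the same pattern: group the events by some attribute (page, audience segment, new versus returning, position in the visit) and compare the groups.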
Site search is just fascinating stuff, neatly contained within the context of a single site, which provides some boundaries within which to try to understand the user queries. If I had access to a mountain of search engine query data from Google or Yahoo or even MSN, I’m not sure that I would know where to start. Context? The whole world, and everything. How do you sift through that?






