Spotlight: Why can't I find what I want on the Web?

To kick off the spotlight series I have intentionally picked a subject with broad general appeal.  This article takes us back to the basics of searching the web and explains why we can’t always find what we are looking for: it is often embedded in the invisible/deep web.  I also offer some tips and updates for searching the web.

Synopsis

A couple of months ago I was reading an article about the nation’s online searching habits (1).  Specifically, it reported that most people who search the web are “top of the heap” searchers.  That is, we scan the first page of results and follow the links that best suit our needs, and very few of us bother with any further pages.  I pondered this for a moment and dismissed it as a “lay searcher’s” technique.  After all, we information professionals pride ourselves on being able to locate information quickly, efficiently and effectively.  It is our business to be ahead of the game in search and retrieval skills.  However, upon reflection I had to hold up my hand and admit to the frustration I feel when my “quick fix search” yields very little from the “top of the heap”.  In truth I believe that we (information professionals) have become blasé (dare I say lazy) about searching the web, hoping to find what we need with the fewest click-throughs.  So why not hold up your hand too, revisit the basics, and learn some new tips and ideas on the way?

How do we find stuff on the web?

There are basically four ways to find stuff on the web.

·         Browsing

This is the act of following a trail of hypertext links.  When the web was small, browsing was adequate, but the growth in size and diversity of the web has made this an inefficient method.

·         Search engines

This usually involves keyword searching.  You are not searching the web itself but a database containing an index of web pages.  The database is built by a web crawler (or spider) that travels round the web collecting pages.

·         General web directories

These are generally collections of links to web pages organised by subject.  Yahoo (http://www.yahoo.co.uk) is an example of this.

·         Targeted specialised directories

These are guides that focus on specialised topic areas and are generally compiled by subject matter experts.  Resources are examined for quality, authority, and reliability.
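The search engine entry above describes a database of indexed pages built by a crawler.  As a rough illustration only, the sketch below builds a tiny index over three made-up pages and answers a keyword query against it.  The page addresses, page text, and function names are all hypothetical, and real search engines are far more elaborate.

```python
from collections import defaultdict

# Hypothetical pages a crawler might have collected (URL -> page text).
PAGES = {
    "http://example.org/a": "deep web resources for librarians",
    "http://example.org/b": "searching the web with search engines",
    "http://example.org/c": "librarians searching the deep web",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "deep web")))
```

The point of the sketch is that a query only ever consults the index.  A page that the crawler never collected cannot be returned, however relevant it is.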

What stops us from finding stuff on the web?

·         Stellar growth of the web

The financial costs associated with the rapid growth of the web make indexing every page economically impractical.  Consider this: once a web page is indexed, a crawler may never visit that website again, so the indexed copy can fall out of date.

·         Sites that are difficult to index

Real-time content such as news feeds and weather updates can be hard to capture.  Pages consisting of PDFs, compressed files, or Shockwave can also be ignored.  Database content and password-controlled sites are generally not indexed, either.

·         The web is big business

Don’t forget the web is one big money-making machine, so why should search engine providers go the extra mile?  Providers try to be everything to everyone and that’s fine up to a point.  As a consequence, though, they may be less rigorous about comprehensiveness.

All of the above points help to explain the invisible or deep web: the content that search engines do not index.
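To make the idea concrete, here is a minimal sketch of a crawler that can only follow hypertext links.  The site map and page names are invented for illustration.  Pages that are generated from a database on request, or that sit behind a password, have no links pointing to them and so are never reached.

```python
from collections import deque

# Hypothetical site map: each page lists the pages it links to.
LINKS = {
    "home": ["about", "catalogue-search-form"],
    "about": ["home"],
    "catalogue-search-form": [],  # results appear only after a query is typed in
    "catalogue-record-123": [],   # generated from a database on request
    "members-only-report": [],    # behind a password
}


def crawl(links, seed):
    """Breadth-first crawl: visit every page reachable by following links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


visible = crawl(LINKS, "home")
invisible = set(LINKS) - visible

print("Visible:", sorted(visible))
print("Invisible:", sorted(invisible))
```

In this toy example the crawler reaches the search form but not the catalogue record behind it, which is the situation for a great deal of database content.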

What kind of stuff is in the invisible/deep web?

Much of it comes from non-profit-making organisations such as academic institutions, government departments, and think tanks.  Although their subject coverage is narrow, they can provide depth, authority, and comprehensiveness in a given subject area.

So, invisible web resources tend to have more specialised content.  Generally you will find that they have a more advanced search interface that gives you more control over your search, perhaps including Boolean operators.  They also generally offer better precision and recall.  Check out the INTUTE (http://www.intute.co.uk) specialised directory, as this is a good example of an authoritative, timely, and exhaustive resource.
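Precision and recall have standard definitions in information retrieval: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved.  The short sketch below works through one made-up search; the document names and the function name are assumptions for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four results returned, three relevant items exist.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc5"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four results are relevant (precision 0.50), and two of the three relevant items were found (recall about 0.67).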

Search Software Road Tests

Ultimately we use the web to satisfy an information need, so let’s have a more focused and planned method of attack.  This is not a case for abandoning our old trusted search engines; it is a case of expanding the tools available to you and appreciating where to start, taking invisible/deep web resources into account.

For the record, I still like Google for a “quick and dirty” search, but because I make a living out of delivering knowledge to people (and so do you), sometimes we need to be a little more exhaustive.  I have been test-driving some new search software recently, and here is a selection of what I found to be consistently good.

·         Rankingthumbshots: http://ranking.thumbshots.com

A search engine that presents results showing overlapping links, unique links, and total links.  Very good when you need to make sure that all the bases have been covered.

·         Exalead: http://www.exalead.com

I really liked this meta search engine and it has consistently given me good results.

·         Jux2: http://www.jux2.com

Another favourite meta search engine that consistently gives good results and ranks relevant results highly.

·         Twingine: http://twingine.com

This is a search engine I have used with very good results.  It compares Google and Yahoo on a split page.
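Several of the tools above compare the results of one engine with those of another.  The underlying idea can be sketched with simple set operations, as below.  This is not how any of the named tools is implemented; the result lists and the function name are invented for illustration.

```python
def compare_results(results_a, results_b):
    """Summarise how two engines' result lists overlap."""
    a, b = set(results_a), set(results_b)
    return {
        "overlap": a & b,  # links returned by both engines
        "only_a": a - b,   # links unique to the first engine
        "only_b": b - a,   # links unique to the second engine
        "total": a | b,    # every distinct link returned
    }


# Hypothetical top results from two engines for the same query.
engine_a = ["site1.example", "site2.example", "site3.example"]
engine_b = ["site2.example", "site3.example", "site4.example"]

summary = compare_results(engine_a, engine_b)
for name, links in summary.items():
    print(name, len(links), sorted(links))
```

A small overlap and a large number of unique links suggest that relying on a single engine would miss material.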

A more recent development in looking for good content on the web is searching blogs.  I found both of the following very useful.

·         Blogdigger: http://www.blogdigger.com

·         Bloglines: http://www.bloglines.com

Top tips for searching the web

·         If you can’t find what you are looking for after ten minutes, change tack.

·         Every month try out a new search engine or directory.

·         Seek, don’t search.  Find out where the experts in your topic area “hang out” and use this as a starting point.

·         Compare search engines, especially for narrow subject areas.

·         Use subject-specific directories, as these are more likely to reach the invisible and deep web.

·         Professional online services generally allow for more complex search queries than web search engines do.

·         Search for sources, not just information.

Conclusions

For information professionals it is always worthwhile revisiting our most basic skills and services.  What is more basic than web searching?  It is this real value-added skill of efficient and effective web searching that sets us apart.  There are still some people who think “it’s all free and easily available on the web”, so it’s worth reminding our clients and ourselves again that “top of the heap” style searching is bound to cause frustration and annoyance.  We need to advocate an understanding of the web (visible/invisible/deep) and keep our clients up to date with new techniques for getting the best from web searching.

References

1.  Arthur, C.  Top of the heap.  The Guardian 2006 August 31.

Further Reading

This is a small selection of texts I have found useful.

Hartman, K and Ackermann, EC. 2004. Searching and researching on the Internet and the World Wide Web.  Franklin, Beedle and Associates.

Hock, R. 2004. The extreme searcher’s Internet handbook. A guide for the serious searcher. CyberAge Books.

Melvin, M and Thurow, S. 2003. Search engine visibility.  New Riders.

Sherman, C and Price, G. 2001. The invisible web. Uncovering information sources search engines can’t see. CyberAge Books.

Joanna Ptolomey is a qualified librarian and works as a freelance information professional.  She has held positions in the business sector and the NHS as a librarian.  She can be contacted at joanna.ptolomey@ntlworld.com.