Net Beat


Lawyers, Start Your Search Engines


You're on the Internet, and need to find some information related to a file you're working on, say, pertaining to your client's specific medical condition. Unfortunately, your law firm homepage doesn't point you in the right direction, and your son's personal webpage seems to have links only to sites of basketball stars that wear Reeboks and to fans of the rock group Tragically Hip.

So you head for a search engine. What's your favourite: Webcrawler? Alta Vista? HotBot?

If your first thought is Yahoo!, then I'm afraid you're on the wrong track: Yahoo! is not a search engine but, in actuality, a subject index.

The difference is more than semantic. True, they are both websites dedicated to pointing you somewhere else, but approaches to constructing them differ along many axes, among them scope, content, report style, method of indexing, method of search, and data currency.

If Yahoo! seems to be the most organized among the different sites listed above, that's not an accident. Yahoo! is a subject index, a site that strives to make chaos look organized by categorizing its listings by subject (thereby suggesting that unconnected sites are somehow ultimately connected).

Other examples of general subject indices are the World Wide Web Virtual Library and Magellan.

Subject indices are popular on law-related websites. Your favourite page of legal links listed by area of law or by geography is a common example of a subject index, determined to break down each heading logically into finer points. For a sample, check out the World Wide Legal Information Association's Canadian law pages.

A subject index is a three-dimensional table of contents, drawing lines and creating sequences through various topics. But that orderliness comes at a price: a good subject index requires a considerable amount of human intervention to make it work properly.

The need for human participation is not surprising. Human brains are far better at grasping concepts and grouping associated notions than machines are, since the latter are programmed by the former, who don't really understand how it is that they grasp concepts and associate notions in the first place.

Moreover, there are devious human beings out there, who try to trick computers into thinking that their websites are bigger, more important or more relevant than they really are. These impostor entries can be spotted more easily by other human beings, who can at least try to understand the motivation for devious behaviour, another concept the computer has yet to grasp.

A search engine is a totally different animal. (The computer is mineral; you and I are vegetable.)

You may already be familiar with the likes of Alta Vista, HotBot and Lycos. Basically, a search engine consists of two elements, each of which must be evaluated separately.

The first element is a searchable database, maintained at the search site itself. This point is key: a search engine doesn't respond to your query by going `out' to the Internet to fetch things to show you, any more than a radio disc jockey responds to a call-in request by going out to the stores to buy the right disc. The gathering and marshalling of information is all done beforehand; you're only searching through the booty.
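For readers who like to see the gears turn, here is a minimal sketch of that idea in Python. The page addresses and text are invented for illustration, and no real engine is claimed to work exactly this way; the point is only that the index is built ahead of time and the query never touches the Internet.

```python
from collections import defaultdict

# Hypothetical pages, gathered by the engine *before* any query arrives.
CRAWLED_PAGES = {
    "http://example.com/asthma": "asthma is a chronic medical condition of the lungs",
    "http://example.com/hoops": "basketball stars and their shoes",
    "http://example.com/law": "medical malpractice law for the practising lawyer",
}


def build_index(pages):
    """Map each word to the set of page addresses whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, word):
    """Answer a query from the stored index alone; nothing is fetched."""
    return sorted(index.get(word.lower(), set()))


# The gathering is done once, up front.
INDEX = build_index(CRAWLED_PAGES)

# A query only looks through what was already collected.
print(search(INDEX, "medical"))
```

A page that was never gathered, or that changed after the gathering, simply cannot turn up in the results, which is why the scope and freshness of the database matter so much.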

A search engine is useless to you if you don't know what scope its database purports to represent. It may encompass the whole of the Internet, or be confined to pages on the World Wide Web. It might sample only text documents or text and graphics. It might hold newsgroup postings, e-mail addresses, or whatever.

Alta Vista extensively catalogues the Web and newsgroups, whereas Excite's database retains only two weeks' worth of newsgroup postings. Lycos claims to index 91% of the Web, and maintains a separate database called Pictures & Sounds; however, it is one of the slowest to reflect changes.

The second element is a series of search algorithms. These determine, inter alia, the flexibility of the search options, such as the sophistication of Boolean operators and the availability of a truncation or `wildcard' character, and the indexing method for the data. They vary from engine to engine.
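To make those options concrete, the following Python sketch shows Boolean operators and a truncation character at work on a tiny, made-up index. The index contents, the page numbers and the choice of `*` as the wildcard are all assumptions for illustration, not the syntax of any particular engine.

```python
# Hypothetical index: each word points to the numbers of the pages containing it.
INDEX = {
    "medical": {1, 3},
    "medicine": {2},
    "malpractice": {3},
    "basketball": {4},
}


def lookup(term):
    """Return the pages for a term; a trailing '*' matches any word with that prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        pages = set()
        for word, hits in INDEX.items():
            if word.startswith(prefix):
                pages |= hits
        return pages
    return set(INDEX.get(term, set()))


def boolean_and(a, b):
    """Pages containing both terms."""
    return lookup(a) & lookup(b)


def boolean_or(a, b):
    """Pages containing either term."""
    return lookup(a) | lookup(b)


def boolean_not(a, b):
    """Pages containing the first term but not the second."""
    return lookup(a) - lookup(b)


print(boolean_and("medical", "malpractice"))
print(lookup("medic*"))
```

An engine that offers only OR will hand back every page mentioning either word, and one without truncation will make you type out `medical' and `medicine' as separate searches.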

Lycos, for example, uses only the first 250 content words of text to prepare its indices. And instead of Boolean operators, it offers what can only be described as variations on an OR search.
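As a rough illustration of what indexing only the leading content words means, consider the Python sketch below. The list of ignored words and the function name are invented here; the actual list Lycos uses is not given in this column.

```python
# Assumed list of words too common to count as "content"; purely illustrative.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}


def content_words(text, limit=250):
    """Return only the first `limit` content words of a page's text."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return words[:limit]


# With a limit of 2, the word "sea" never makes it into the index.
print(content_words("the law of the land and the sea", limit=2))
```

The practical consequence is that anything you bury deep in a long page may be invisible to an engine that indexes this way.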

HotBot, on the other hand, has full Boolean capabilities but won't support truncated search terms. And what constitutes `relevance' for the purposes of ranking varies from one engine to another, so equivalent searches on different engines produce different results.
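The effect of differing notions of relevance can be seen in a small Python sketch. Both scoring rules below are hypothetical stand-ins, chosen only to show that the same query over the same pages can come back in a different order; neither is claimed to be the rule any named engine applies.

```python
# Two invented pages that both mention the search term.
PAGES = [
    {"title": "Asthma and the law", "body": "a short note on asthma claims"},
    {"title": "Clinic news", "body": "asthma asthma asthma treatment updates for asthma patients"},
]


def score_by_frequency(page, term):
    """Hypothetical rule 1: the more often the term appears in the body, the better."""
    return page["body"].lower().split().count(term)


def score_by_title(page, term):
    """Hypothetical rule 2: a page scores only if the term appears in its title."""
    return 1 if term in page["title"].lower().split() else 0


def rank(pages, term, scorer):
    """Return page titles ordered from most to least relevant under the given rule."""
    ordered = sorted(pages, key=lambda p: scorer(p, term), reverse=True)
    return [p["title"] for p in ordered]


print(rank(PAGES, "asthma", score_by_frequency))
print(rank(PAGES, "asthma", score_by_title))
```

The first rule puts the clinic page on top; the second puts the legal page on top. Neither is wrong, which is why it pays to run an important search on more than one engine.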

Yahoo!, as mentioned earlier, is a subject index, not a search engine. True, it does offer you a search function to do word searches in its database, but the search function is weak, and the database is limited to the World Wide Web; even there it only tracks URLs, page titles and short descriptions.

Subject indices and search engines each have their own obvious strengths. A subject index works best when you understand the framework binding it. As a lawyer, for example, you'd have a much easier time with a legal subject index than, say, a medical or geological one.

If the structural gridwork of the index isn't helping you, though, you need a tool to break through the `top-down' viewpoint of the hierarchy and allow lateral exploration. A search engine will provide that functionality, and won't need a lot of human intervention to help it along. There is a toll exacted for that lack of organization though: you'll have to sift through lots of cyber-hay to find your needle.

The greater challenge, more bedeviling than searching, is figuring out how to set up your webpages so that they appear properly in someone else's search. Subject indices like Yahoo! make it easy: you simply submit your entry and tell them how you want it indexed. The search engines are harder: they each require different coddling to make them respond properly. We'll talk more about that in a future column.

******************

The Canadian Society for the Advancement of Legal Technology (CSALT) has moved to a permanent home on the World Wide Web. The site is being completely revamped by Webmaster (and Past-President) Harvey Ash; when completed, it will contain links to various resources on legal technology, links to members' homepages and articles of interest. Check it out at http://www.csalt.on.ca/.

 



Lewis S. Eisen is a computer trainer and consultant to the legal profession, and the author of The Canadian Lawyer's Internet Guide. He can be contacted at leisen@pfx.ca, or http://www.magma.ca/~leisen/index.html.
