November 1997

Searches On The Intranet

The same software that is used on the Internet can be used to distribute information within a business. A private network built this way is called an intranet.

We recently faced the task of making a law firm library's card catalog available to the PCs on a local area network (LAN). The goal was to make this information available through the Web browser that was already installed on each PC. This led us to look at ways to search a local database.

A Small Text Database

The library's card catalog database consists of several thousand small records, each representing a book or other publication. The records are maintained in a database program and can be exported to a text file, with a separator line (containing only a dollar sign) following each record. Nearly all the exported records were fewer than 20 lines long.
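To make the export format concrete, here is a minimal Python sketch of how such a file could be split back into records. It is an illustration only, not the code we used; the function name and the sample records are made up for the example.

```python
def split_records(text):
    """Split exported text into records.

    A line containing only a dollar sign marks the end of each record.
    """
    records = []
    current = []
    for line in text.splitlines():
        if line.strip() == "$":
            if current:
                records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:  # tolerate a final record with no trailing separator
        records.append("\n".join(current))
    return records


# Invented sample data in the exported layout described above.
sample = (
    "Smith, Joan\n"
    "Contracts in Practice\n"
    "1994\n"
    "$\n"
    "Jones, Alan\n"
    "Evidence Handbook\n"
    "$\n"
)
records = split_records(sample)
```

Each element of `records` is the complete text of one card, which is the unit we want to show in a list of hits.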

We wanted to search the records without regard to fields, such as title, author, and publication date, so that a user could search just by entering a few words.

Because each record is small, we wanted the list of hits to include the entire record, not just a link to each record.

Web Page Searches

Many people have used search engines on the World Wide Web. The outstanding feature of these search engines, we feel, is their ease of use. You type in a few words and get back a list of hits, which are Web pages containing those words.

The database you are searching, in this case, consists of Web pages residing on Internet servers. You can purchase some of the well-known search engines, including Alta Vista and Excite, to use on your PC. They search the files on your hard disk, and each hit is a file instead of a Web page.

How would you search a card catalog? You could put each "card" on a separate Web page or in a disk file, but then your list of hits would be a list of the cards and a link to each one. It would be tedious to have to examine each hit by clicking on a link to see the entire card, then clicking the browser's Back button to return to the list, and repeating this for every hit. Of course, we do exactly that for Web searches, but in that case the links lead to a great deal of information and the hit list shows a brief summary of what is available. It seems that a Web search engine is not the appropriate tool for a card catalog search.

Relational Database Searches

Many businesses have valuable information in relational databases. Software products are available which permit you to request information using a Web browser, which then generates a "database query" to extract information from the database and display it with the browser.

In fact, we already had in place at this same firm the software to do such queries. Attorneys and secretaries can enter a client number and get a list of all the matters for that client. Or they can enter a matter number and get a list of all the file folders for that matter. In both cases the information is obtained from the same relational database used for the firm's accounting and records management systems.

At first glance, a card catalog is just a very simple database. You could, therefore, put the records in a relational database, but this approach has some disadvantages.

A Web search engine indexes the words on each Web page, but a relational database typically does not index the individual words in each field of a record. It is very inefficient to search for a string within a field when the words are not indexed. For example, if the author field contains "Joan Smith" and you search for "Smith," a relational database must do the search by reading every record in the database and checking the author field.
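The difference can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions (the function names and sample records are invented, and no particular database product works exactly this way): `scan` reads every record for each search, while `build_index` reads every record once so that later searches are a single lookup.

```python
import re
from collections import defaultdict


def words(text):
    """Return the lowercased words in a piece of text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan(records, word):
    """Unindexed search: read every record and check it for the word."""
    target = word.lower()
    return [i for i, record in enumerate(records) if target in words(record)]


def build_index(records):
    """Word index: map each word to the set of record numbers containing it."""
    index = defaultdict(set)
    for i, record in enumerate(records):
        for w in words(record):
            index[w].add(i)
    return index


def lookup(index, word):
    """Indexed search: one dictionary lookup, no records are read."""
    return sorted(index.get(word.lower(), ()))


# Invented sample records; "Smith" appears in different fields.
records = [
    "Author: Joan Smith\nTitle: Contracts in Practice",
    "Author: Alan Jones\nPublisher: Smith Press",
]
index = build_index(records)
```

Both `scan(records, "Smith")` and `lookup(index, "Smith")` find the same records, but the cost of `scan` grows with the size of the database on every search.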

Also, a relational database forces you to specify what field you want to search. You can't search for "Smith" anywhere in the record, for example. You have to specify whether you want to find Smith in the author field or the publisher field.

If you want to search specific fields, the relational database is a good choice. You can search only the title field, or only the author field, and ignore every other field. Many online library card catalogs are like this, but we think they are difficult to use for precisely this reason.

Customizing Softfile

A few years ago, RTG developed a text database called Softfile which meets some of our requirements. It maintains an index of all the words in the database. It searches for one or more words and returns a list of records which contain all of the words. It can do boolean searches using and, or, and not. It can also import a text file.
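For readers curious how a word index supports boolean searches, the following Python sketch shows the general idea using set operations. It is not Softfile's implementation; the function names and sample records are hypothetical.

```python
import re
from collections import defaultdict


def build_index(records):
    """Map each lowercased word to the set of record numbers containing it."""
    index = defaultdict(set)
    for number, record in enumerate(records):
        for word in re.findall(r"[a-z0-9]+", record.lower()):
            index[word].add(number)
    return index


def search_all(index, terms):
    """AND: records that contain every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def search_any(index, terms):
    """OR: records that contain at least one term."""
    result = set()
    for term in terms:
        result |= index.get(term.lower(), set())
    return result


def search_without(hits, index, term):
    """NOT: remove from hits any record that contains the term."""
    return hits - index.get(term.lower(), set())


# Invented sample records.
records = [
    "Joan Smith Contracts in Practice 1994",
    "Alan Jones Evidence Handbook 1990",
    "Joan Jones Contracts and Evidence 1996",
]
index = build_index(records)
both = search_all(index, ["joan", "contracts"])
either = search_any(index, ["smith", "handbook"])
not_smith = search_without(both, index, "smith")
```

Because each word maps to a set of record numbers, "and" becomes an intersection, "or" a union, and "not" a set difference.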

However, Softfile's user interface is rather complicated compared with a Web search engine. Modifications were required to make it work like a search engine, which lets you fill out a form and click the Search button.

Standard Web forms contain controls, like a text box for entering words and a button for submitting the form. When you click the button, the browser sends the information to the Web server.

At the server, Web forms are handled by the Common Gateway Interface (CGI), an Internet standard that can also be used on intranet Web servers. The information sent by the browser tells the server what program to run in order to process the form. The program creates a page that responds to the request.

We modified Softfile to receive a list of words (or a more complex boolean expression) from a form. In response, it creates a page containing the records selected from the database, in the format (known as HTML) which browsers understand.
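The CGI exchange described above can be sketched in Python as follows. This is only an illustration of the mechanism, not the modified Softfile: it assumes the form is submitted with the GET method, that the text box is named `words` (a name we made up), and it uses a simple scan of invented records in place of a real index.

```python
import html
import os
import re
import sys
from urllib.parse import parse_qs


def matches(record, terms):
    """True if the record contains every one of the search terms."""
    found = set(re.findall(r"[a-z0-9]+", record.lower()))
    return all(term in found for term in terms)


def respond(query_string, records):
    """Build a CGI response: a header, a blank line, then an HTML page."""
    fields = parse_qs(query_string)
    # "words" is a hypothetical form field name.
    terms = fields.get("words", [""])[0].lower().split()
    hits = [r for r in records if terms and matches(r, terms)]
    body = ["<html><body>", "<h1>%d record(s) found</h1>" % len(hits)]
    for record in hits:
        # Show the entire record; escape it so it displays as plain text.
        body.append("<pre>%s</pre>" % html.escape(record))
    body.append("</body></html>")
    return "Content-Type: text/html\n\n" + "\n".join(body)


def main(records):
    """Entry point when run by a Web server: GET form data is in QUERY_STRING."""
    sys.stdout.write(respond(os.environ.get("QUERY_STRING", ""), records))


# Invented sample records and a sample request.
CATALOG = [
    "Smith, Joan\nContracts in Practice\n1994",
    "Jones, Alan\nEvidence Handbook\n1990",
]
page = respond("words=smith+contracts", CATALOG)
```

The browser encodes the words typed into the form as `words=smith+contracts`; the program decodes them, selects the matching records, and writes the page to standard output, which the server returns to the browser.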

We also provided two other capabilities. You can search for all the words in the index which begin with certain letters, and you can search for words that are similar to a specified word. These searches are useful when you are not certain of the exact spelling of a word, when you want to ignore the ending of a word, and as a tool to find misspelled words.
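Both kinds of lookup operate on the list of indexed words rather than on the records. The Python sketch below shows one way they might work; it is an assumption-laden illustration, since this article does not describe the method Softfile uses to judge similarity. Here the standard `difflib` module stands in for it, and the word list is invented.

```python
import bisect
import difflib


def words_with_prefix(sorted_words, prefix):
    """Return every indexed word that begins with prefix.

    The word list must be sorted, so two binary searches find the range.
    """
    start = bisect.bisect_left(sorted_words, prefix)
    end = bisect.bisect_left(sorted_words, prefix + "\uffff")
    return sorted_words[start:end]


def similar_words(sorted_words, word, limit=5):
    """Return indexed words that closely resemble word (difflib stand-in)."""
    return difflib.get_close_matches(word, sorted_words, n=limit, cutoff=0.75)


# Invented vocabulary, including one misspelling ("evidense").
vocabulary = sorted(
    ["contract", "contracts", "contractor", "evidence", "evidense", "handbook"]
)
starts_with_contr = words_with_prefix(vocabulary, "contr")
near_evidence = similar_words(vocabulary, "evidence")
```

A prefix search for "contr" finds all three forms of "contract," and a similarity search for "evidence" also turns up the misspelled "evidense," which is how this kind of search helps locate typing errors in the catalog.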

A Windows 95 Web Server

If you already have PCs on a LAN running Windows 95, you can add a Web server which also runs Windows 95. A Web server program called OmniHTTPd has the required CGI interface. You can learn more about it, and download it for free, from www.fas.harvard.edu/~glau/httpd.

[Note: This is now a commercial product from Omnicron Technologies]

On each PC you need to install TCP/IP software, which is provided with Windows 95, and a Web browser such as Netscape Navigator or Internet Explorer.


RTG Bills and RTG Timer are trademarks of RTG Data Systems. Other company and product names may be trademarks of the companies with which they are associated.
