Search engines on the Internet: yesterday, today, tomorrow

The need to search Internet materials first appeared in the "Pre-Web era". That is why the Archie, Veronica and WAIS systems and, of course, Finger were created. In the late 1980s and early 1990s these tools helped users find their way around FTP and Gopher sites and obtain the addresses of friends and colleagues, among other things. These programs were simple mechanisms that required little or no prior experience from their users (most of whom were computer specialists or programmers). Even so, many people never used them, because these Internet resources were not very user-friendly and often simply did not work from remote locations.

The situation finally changed at the beginning of the 1990s, when a new application, the World Wide Web, started to grow. The WWW revolutionized the way we look at digital information and the virtual environment. The ability to easily send and receive high-quality color images, video and sound files in addition to plain text made the Net attractive to everybody. The following two or three years brought a huge number of new Net users. People suddenly recognized the importance of the Internet for their business, research, education, entertainment and recreation. The Internet has come to almost every company, university and scientific laboratory, and to many private homes. Thousands upon thousands of people wanted to place their own materials on the Net.

As a result, the quantity of information grew so fast that the Internet soon turned into a Cyber Jungle. Navigating this vast virtual universe demanded further development of retrieval tools. The demand was there, and the supply of such systems became a reality. These search mechanisms became known as "search engines"; there are now several hundred of them, and more are being developed every day.

This report is not an ordinary search engine evaluation paper. There are plenty of those in computer and Internet journals and on the Web. My goal is to analyze how search tools have evolved over the last several years, identify the major trends and discuss their likely development in the near future.

The report is based on a survey of the best-known retrieval tools. I have analyzed both established systems (WWW Worm, WebCrawler, Lycos, Harvest, Galaxy, Yahoo) and newer ones (Alta Vista, HotBot, Ultra InfoSeek, OpenText, Excite, Magellan, MetaCrawler). Many specialist articles written by authors from different countries have been reviewed as well.

This research shows that search engines are changing constantly. They are improving almost as fast as everything else on the Net. There are improvements in all aspects: database content, the search interface, search features, output of results, and additional services. Systems that go a long time without improving go out of business.

The main trend of recent development is two-fold: growth in the number of documents indexed by search robots, and growth in the number of words (terms) indexed inside each document. Older tools were usually able to index only a few million sources (which included web pages, Usenet articles, Gopher materials and FTP files). Contemporary engines cover about 50 million documents on average (from approximately 30 million in Alta Vista up to 68 million in the new version of Lycos). In spite of that, search response time has not increased, since newer systems use more powerful hardware and software.

Perhaps even more important, engines of the latest generation frequently index every word on a web page or in a Usenet newsgroup message. Early tools usually restricted their focus to titles, headings, URLs, summaries, or the first n sentences of a document's text. This technique substantially limited the chances of retrieving relevant material on narrow topics and consequently gave a distorted picture in the search results. Modern search engines are therefore much more reliable than the older ones.
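The difference between title-only and full-text indexing can be illustrated with a minimal sketch of an inverted index. This is not how any of the engines named here is actually implemented; the sample pages and the names build_index, PAGES, title_index and fulltext_index are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages, fields):
    """Map each word to the set of page ids whose chosen fields contain it."""
    index = defaultdict(set)
    for page_id, page in pages.items():
        for field in fields:
            for word in tokenize(page[field]):
                index[word].add(page_id)
    return index


# Hypothetical sample pages, invented for this illustration only.
PAGES = {
    "p1": {"title": "Gardening tips",
           "body": "How to protect roses from aphids in spring."},
    "p2": {"title": "Rose varieties",
           "body": "A catalogue of climbing and shrub roses."},
}

# An "early" index sees only titles; a full-text index sees every word.
title_index = build_index(PAGES, ["title"])
fulltext_index = build_index(PAGES, ["title", "body"])

print(sorted(title_index.get("aphids", set())))     # []
print(sorted(fulltext_index.get("aphids", set())))  # ['p1']
```

A query on the narrow term "aphids" finds nothing in the title-only index, although page p1 discusses it; the full-text index retrieves the page.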

It is therefore fair to say that comprehensive full-text indexing will soon be a compulsory requirement for any search engine that wants to be competitive. I regard this as a very important principle in the evolution of Internet search mechanisms.

The second clear trend is interface improvement. First of all, text interfaces have finally been replaced by graphical ones. Interfaces have become much more intuitive than they were two or three years ago. The developers of the first search systems did not worry much about user comfort; at that time the brute force of the mechanism mattered far more. There were only empty boxes for typing in queries. Good help screens with examples of how to use the syntax or Boolean operators were absent, so users had a very hard time figuring out how to form a query correctly. Things are very different with the newest generation of systems. In most cases they provide detailed multi-stage menus, which make composing queries much easier. The OpenText Power and HotBot Modified query forms are good examples of progress in this area. Many search engines come with detailed Help, Examples or FAQ pages that assist users. Their interfaces spare users the need for advanced experience with Boolean logic, so no special training is required to use these systems.

Growth in the number of operators and other query elements is also very important. Several years ago only the two or, at best, three classical Boolean operators were used: AND, OR and NOT. Now we have NEAR in Alta Vista and FOLLOWED BY in OpenText, very useful proximity operators that allow a query to be narrowed. Many contemporary systems allow truncation, date limiting, restriction by web-page field and even by source type, and offer case sensitivity. Many also have multilingual capabilities, which allow retrieval of information in different languages. All this makes it possible to form a query as precisely as we wish.
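To show what proximity operators and truncation add to plain Boolean matching, here is a rough sketch. The function names, the trailing "*" truncation syntax and the default window of 10 words are assumptions made for this example, not the documented behavior of Alta Vista or OpenText.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(words, term):
    """Return the word positions matching term.

    A trailing '*' is treated as truncation, i.e. a prefix match
    (an assumed syntax for this sketch).
    """
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return [i for i, w in enumerate(words) if w.startswith(prefix)]
    return [i for i, w in enumerate(words) if w == term]


def near(text, a, b, window=10):
    """True if a and b occur within `window` words of each other, in any order."""
    words = tokenize(text)
    pa, pb = positions(words, a), positions(words, b)
    return any(abs(i - j) <= window for i in pa for j in pb)


def followed_by(text, a, b, window=10):
    """True if b occurs after a, at most `window` words later."""
    words = tokenize(text)
    pa, pb = positions(words, a), positions(words, b)
    return any(0 < j - i <= window for i in pa for j in pb)


text = "Search engines index millions of documents"
print(near(text, "index", "search", window=2))         # True
print(followed_by(text, "index", "search", window=2))  # False
print(near(text, "engin*", "document*", window=4))     # True
```

NEAR ignores word order, while FOLLOWED BY also requires the second term to come after the first, so it narrows the query further.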

The next meaningful development is the improvement of search results. Here it is fair to say that the systems have become smarter. Today their artificial intelligence is able to rank search results according to their value for a particular query. Systems analyze where key words are located in the document, how many there are and their relative frequency. On this basis the AI ranks the results so that the references of greatest value rise to the top of the list. There is no doubt that work on making search engines even more intelligent will continue, since only systems supported by a powerful AI element will be able to master this ocean of full-text data and survive the strong competition.
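A ranking of this kind can be sketched with a simple heuristic that combines the relative frequency of the query words in the body with a bonus for words found in the title. The weights, the sample documents and the names score, rank and DOCS are assumptions for illustration; the engines discussed here do not publish their actual formulas.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, title, body, title_weight=3.0):
    """Relative frequency of each query term in the body,
    plus a fixed bonus (an assumed weight) when the term is in the title."""
    body_words = tokenize(body)
    title_words = set(tokenize(title))
    total = 0.0
    for term in tokenize(query):
        if body_words:
            total += body_words.count(term) / len(body_words)
        if term in title_words:
            total += title_weight
    return total


def rank(query, docs):
    """Return document ids by descending score, dropping zero-score documents."""
    scored = [(score(query, d["title"], d["body"]), doc_id)
              for doc_id, d in docs.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ordered if s > 0]


# Hypothetical documents, invented for this illustration only.
DOCS = {
    "a": {"title": "Cooking at home",
          "body": "Fresh pasta needs flour, eggs and patience."},
    "b": {"title": "Pasta recipes",
          "body": "Pasta with tomato sauce. Pasta with pesto."},
    "c": {"title": "Travel notes",
          "body": "We visited Rome and ate pasta once in a small cafe."},
    "d": {"title": "Weather",
          "body": "Rain is expected tomorrow."},
}

print(rank("pasta", DOCS))  # ['b', 'a', 'c']
```

Document b comes first because the term appears in its title and twice in its body; document d never mentions the term and is left out of the list.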

There is another important role for AI. As the Internet, and especially the World Wide Web, expands, a lot of material of little value will appear on it. "Junk data" prevents us from retrieving the materials that are really important. There will be a serious tension between the desire for comprehensive coverage and the reporting of "trash" documents. Only AI-based systems that can evaluate and select appropriate sites or, at least, place them at the top of the result list can protect users from this information overload.

The appearance of additional functions is one of the latest major trends. Many systems of the newest generation have extra capabilities that make them more convenient to use. Some offer the opportunity to search other special databases, such as worldwide e-mail addresses, company directories, current news, etc. Many engines have a "Similar pages" function built into the result list, which helps to refine the results and weed out irrelevant data. Others place on their pages links to the most recognized reference sources on the Internet, such as electronic dictionaries, thesauri and encyclopedias. Thus a search engine interface becomes a handy virtual reference desk for everyone who works with Internet resources.

In addition to technological changes, strategic tendencies are also evident. Among them is the splitting up of huge search engines. Within a short time we have gained many regional and subject-specific retrieval mechanisms. Almost every European country has at least 5-6 such systems. The reason is clear: it is difficult to index and register absolutely all data on the Net. Smaller-scale search tools should assist with this task of comprehensive coverage for particular areas, such as individual countries and the most important fields of knowledge. Perhaps soon we will have as many Internet directories and search engines as we have paper indexes today.

There is no doubt that all the trends mentioned above will continue in the near future. As Internet resources grow, the potential of search engines will expand as well. Retrieval tools can develop in two ways: incremental improvement of the major existing search mechanisms, and the creation of new major systems under new names. The first approach is demonstrated by Lycos, WebCrawler and Alta Vista. The second can be seen in HotBot and Ultra InfoSeek, which have replaced Inktomi and InfoSeek respectively.

It may seem a little funny, but the fundamental reason for improving search mechanisms is the expansion of commercial activity on the Internet. Many people have already understood that creating and supporting retrieval engines is a lucrative business. As search engines become some of the most frequently visited servers (almost as popular as sex servers), they have become attractive places for advertisers. It is fair to say that advertising is the mover and shaker of search engines.

As matters stand, the developers of search servers must constantly make their products more attractive and easier to use for as many people as possible. There is no other way to do this than to provide search capabilities with more sophisticated parameters and to make their interfaces more user-friendly. Competition in this area is a solid guarantee of the further development of search engines, for our common good.

Last Updated: Tuesday, February 04, 1997