The Super Search Engine: The Best Indexing and Searching.

By Ion Saliu, Search Engineer At-Large

The search engines must be the brains of the Internet, as the browsers must be the heart of the Net. Without searching, the Internet would not have had any future.

I believe Yahoo deserves the award of Internet search pioneer. They put together a list of Web pages — by category. That was a start. The Net surfers had places to go. Yahoo was called a directory or portal. The portals are manmade. The directories are still useful. But they have serious limitations. First, it's impossible to categorize all Web pages, given the extraordinarily fast pace of Web changes. There are billions of Web pages, and counting! No group of human editors can keep up with the immensity of the Internet. Another serious limitation of the portals is the bias. The selection and inclusion in a directory are very subjective decisions. It was made much worse by the pay-for-inclusion models. Money has become a substitute for quality. That would kill the Internet, as in "Kaput Internet!"

Next, a huge search engine took over the Internet: Alta Vista. It was a powerful search engine, capable of indexing and listing millions of Web pages. Alta Vista shed light on huge dark areas of the Net, invisible before the search engine came to life. Alta Vista was quantitatively a power, but it was a poor performer when it came to the quality of the search results. I remember searches where Alta Vista listed one URL over and over again! I remember one Web page listed more than twenty times. The listing included every modification of the page! The relevancy of the search was also very poor in the old Alta Vista. (By the way, Alta Vista is now part of Yahoo search and uses the same search technology: Inktomi.)

Then, a smaller search engine defined the concept of relevancy: Hotbot. I remember Hotbot getting all the awards of a notable computing publication, PC Magazine. It was still the pioneering era of searching on the Internet. O tempora! O mores!

The Internet was taken by storm with the introduction of Google and its famous Beta. Google picked up where Hotbot left off. Relevancy became the major focus of the search technology. Only the Google insiders know the exact algorithm. It is assumed that the keystone is back-linking. That is, if a Web page (or its parent site) is linked from many other pages, that page must have merit. If it is good, other people refer other Net surfers to the same resource. Such a concept does have a logical foundation.
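The back-linking idea can be sketched in a few lines. This is only an illustration of counting inbound links, not Google's actual algorithm, which the text notes is known only to insiders. The function names and the sample URLs are hypothetical.

```python
from collections import defaultdict

def count_backlinks(links):
    """Count the distinct pages that link to each target page.

    `links` is an iterable of (source_url, target_url) pairs.
    Duplicate links and self-links are ignored.
    """
    sources = defaultdict(set)
    for source, target in links:
        if source != target:
            sources[target].add(source)
    return {target: len(srcs) for target, srcs in sources.items()}

def rank_by_backlinks(links):
    """Return target pages ordered by backlink count, highest first."""
    counts = count_backlinks(links)
    return sorted(counts, key=lambda url: (-counts[url], url))

# Hypothetical link graph, for illustration only.
sample_links = [
    ("a.example/1", "c.example/stats"),
    ("b.example/2", "c.example/stats"),
    ("b.example/2", "c.example/stats"),      # duplicate, counted once
    ("a.example/1", "d.example/other"),
    ("c.example/stats", "c.example/stats"),  # self-link, ignored
]

print(rank_by_backlinks(sample_links))
```

A plain count like this treats every linking page as an equal vote, which is exactly why paid links and link exchanges can distort it.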

One of my Web pages ranks very high on the keyword "deviation standard". Yet it ranks very low on the keyword "standard deviation"! Isn't it the very same concept? Sure is. In both situations, my page offers the most comprehensive treatise of the subject, including pertinent software. The page is also naturally integrated in a Web site of related articles.

The above anomaly is also caused by the scholastic syndrome. The search engines consider that the established educational outlets must be the best in treating a subject. Problem is, the schools tend to be overly conservative, even mummified. Most notable advancements in knowledge have occurred outside traditional institutions. Matter of fact, mummified education makes advancement in knowledge impossible. As in that Pink Floyd song:

“We don't need no education,
We don't need no thought control!”

There are several problems with back-linking, especially after many Internet authors "discovered" the Google algorithm. First, back-linking can be the result of pay-for-inclusion. Again, money speaks, not quality; therefore relevancy can be a moot point. Second, more and more Web authors exchange links. One thousand good friends but lousy Web authors can beat, at any time, one genius Web author. Back-linking has seriously damaged the relevancy of Internet searching. I have stumbled upon miserable Web pages with high ranking in all major search engines. Such pages of misery succeed in ranking high because of keyword manipulation and back-linking.

Further search improvements of the moment

Heuristics

I wrote previously about a serious weakness of the search engines. For example, they treated "search" and "searches" as totally different concepts. I tried the search engines on keywords that my Web site deals with. I noticed that "lexicographic" and "lexicographical" were also treated as totally different concepts. Logically, such variants should have been treated as the same entity. Thus, heuristic (logical) grouping of concepts (keywords) should be an urgent priority. I have noticed, lately, that Google improved on that key paradigm.
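A minimal sketch of such grouping follows. It assumes a deliberately crude set of suffix rules, chosen only to cover the examples in the text; it does not describe how any real search engine normalizes keywords. The names `SUFFIX_RULES`, `normalize`, and `group_variants` are hypothetical.

```python
# Crude suffix rules (an assumption for illustration): each pair maps a
# word ending to its replacement.
SUFFIX_RULES = [
    ("ical", "ic"),   # lexicographical -> lexicographic
    ("ches", "ch"),   # searches -> search
    ("shes", "sh"),
    ("sses", "ss"),
    ("xes", "x"),
]

def normalize(word):
    """Reduce a keyword to a rough base form shared by its variants."""
    w = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require a few letters before the suffix so short words survive.
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[:-len(suffix)] + replacement
    # Plain plural: drop a single trailing "s", but keep "ss" endings.
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w

def group_variants(words):
    """Group words that share the same normalized form."""
    groups = {}
    for word in words:
        groups.setdefault(normalize(word), []).append(word)
    return groups

print(group_variants(["search", "searches", "Lexicographic", "lexicographical"]))
```

An index keyed on the normalized form would answer a query for "searches" with pages that only say "search", and vice versa. Hand-written rules like these misfire on many English words, so a production system would need a far more careful approach.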

Integration: the book paradigm

Apparently, the search engines favor pages with very short content. They favor one page with the keywords repeated many times, instead of the book paradigm. The book paradigm is a collection of related pages: several pages at the same website dealing with the keywords. More pages dedicated to the subject means a more thorough analysis of the respective topic.

The book model could seriously impede spamming. Laziness is the creator of short Web pages with the keywords repeated again and again, without any meaning at all. Writing several pages dedicated to the same topic and closely related topics indicates seriousness. It is a good indicator that the keywords are treated in a more thorough manner. The integration should be measured only within the same Web site. Otherwise, it would be very easy for any sucker to write down a few lines of keywords and then offer hundreds of links to external Web pages of high quality! Every Web idiot would rank higher than geniuses who don't offer links to any external websites!

It takes more quality effort to write a book than scribble a sketchy page. The effort is even higher if the book is accompanied by a CD (e.g. a site dedicated to software downloads). Complexity should be valued, not punished.
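One way to picture the book paradigm is to score a page by how many pages on its own site cover the keyword, instead of by how often the keyword is repeated on one page. The sketch below is only an assumption about how such a score might be computed; `site_coverage`, `book_score`, and the sample pages are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def site_coverage(pages, keyword):
    """Count, per host, the pages whose text mentions the keyword.

    `pages` maps a URL to the text of that page.
    """
    keyword = keyword.lower()
    coverage = defaultdict(int)
    for url, text in pages.items():
        if keyword in text.lower():
            coverage[urlsplit(url).hostname] += 1
    return dict(coverage)

def book_score(pages, keyword, url):
    """Score a page by the number of same-site pages covering the keyword.

    A page that does not mention the keyword itself scores 0. Repeating
    the keyword on one page adds nothing, and external links are ignored.
    """
    if keyword.lower() not in pages[url].lower():
        return 0
    return site_coverage(pages, keyword).get(urlsplit(url).hostname, 0)

# Hypothetical pages, for illustration only.
sample_pages = {
    "http://book.example/deviation": "Standard deviation explained in depth",
    "http://book.example/software": "Software that computes standard deviation",
    "http://book.example/about": "About the author",
    "http://spam.example/one": "standard deviation standard deviation standard deviation",
}

for page_url in sample_pages:
    print(page_url, book_score(sample_pages, "standard deviation", page_url))
```

Because only pages on the same host are counted, a thin page that links out to hundreds of good external pages gains nothing, which matches the restriction described above.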

The future of search engines

I do not believe that the search engines have a long-lived future. Kind of like the portals (directories). The portals made a lot of noise for a couple of years. Not any longer — every Internet service provider is a portal nowadays. In the near term, I foresee high quality search engines installed by just about every Internet service provider. It is not hard at all to create high quality search technology — unbiased and relevant. Searching has long been a major function of databases. Also, the word processors rely heavily on searching. There is a lot of search knowledge out there — and experience too.

Look at the search facility of my Web site. I still don't call it a search engine. It only indexes my Web pages. I am close to 100% satisfied with the relevancy of searches on my Web site. If such technology could be expanded to index the entire Internet, it could become a high quality Internet search engine. Perhaps some programmer of Internet searching will decide to open-source his/her script or program. Hundreds, if not thousands, of Internet programmers could chip in and improve the original to the highest quality of Web searching. I name that level:

The Super Search Engine of the Internet

At this point of Internet infancy - the year of grace 2006 - the three major Internet search engines look very much alike. It looks as if Google, Yahoo, and MSN (now Bing) apply the same search algorithm to the same search index. The difference is in the IT capacity. Google beats them all because of a far larger computer capacity. Many more computers and far larger storage capacity (hard drives) translate to many, many more pages indexed. It also looks like the Big Three are now applying my concept of the book paradigm - to some extent. I can teach them even better tricks...

Ion Saliu
Doctor in Occult Science of Searching (OssD)
