Search engines and how they got that way

A sentimental and humorous journey through 20 years of technology.

Once upon a time, a time long forgotten, there were no Search Engines. No Google. No Yahoo! And no MSN. (This is where kids look at you for the first time in astonishment and disbelievingly utter "You're kidding.") It was a world without the World Wide Web, yes, a world where there was no Internet. Where not everybody had a computer. No email. No chatting. No YouTube. No MySpace. (The kids have now left the room, but not before shouting "Get outta here! That's impossible. I'm gonna text my BFF that UR NSANE.") How fast they forget. I didn't say this would be easy.

In this gray and long forgotten world, what we now know as The Internet had emerged from an idea initially conceived during the Cold War: a network of computers operating more or less independently of each other, yet connected and therefore able to share data and communicate with each other.

Access to this network was, although public, still limited, restricted mostly to universities (who else had a computer in their home in 1990? A Modem? Well, back then we actually used Couplers.), and the information available on all of these computers was still somewhat manageable.

The early days

In 1990, a bright student at McGill University in Montreal named Alan Emtage created a script which, based on regular expressions, gathered file names from public FTP servers and indexed them into a searchable database: the first Search Engine, the Grandfather of them all, Archie. The name was short for "archives", and that was all it did: it archived the file names and where to find them. It didn't scan the contents of a file and didn't care if it was text, an image, or something else.
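For the curious, the idea can be sketched in a few lines of Python. This is only a toy illustration of a file-name index queried with regular expressions, not Archie's actual code; the host names, the paths, and the helper names build_index and search are all made up for the example.

```python
import re

# Hypothetical listings (host -> file paths), standing in for what a script
# might collect from public FTP servers. None of these hosts are real.
LISTINGS = {
    "ftp.example.edu": ["/pub/tools/gzip-1.0.tar", "/pub/docs/readme.txt"],
    "ftp.example.org": ["/mirror/gzip-1.0.tar", "/images/logo.gif"],
}

def build_index(listings):
    """Map each bare file name to the (host, path) pairs where it was seen."""
    index = {}
    for host, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1]  # keep only the file name
            index.setdefault(name, []).append((host, path))
    return index

def search(index, pattern):
    """Return the sorted locations of every file whose name matches the regex."""
    regex = re.compile(pattern)
    hits = []
    for name, locations in index.items():
        if regex.search(name):
            hits.extend(locations)
    return sorted(hits)

index = build_index(LISTINGS)
print(search(index, r"^gzip"))
```

Note that only the names are looked at; the contents of the files never enter the picture, which is exactly the limitation described above.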

Searching by content came only about a year later with the introduction of Gopher by Mark McCahill and his team at the University of Minnesota. Although Gopher only indexed text documents, it was quite an improvement, since now one could actually search the content of a file without having to download it first just because the file name sounded promising. Gopher could also take you to your desired file's address (similar to clicking on a search result in Google today), read the file and send it to your email address (Admit it, it was a cool feature. Do you know how to do that today?).

Archie and Gopher were pretty successful and so it's no surprise that in 1992 a tool named Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives or named after Archie's girlfriend in the comic books - whichever you like best), and another year later, in 1993 Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) appeared on the (still very gray) horizon. Both provided searches within the Gopher listings, menu listings and menu information.

In the meantime, somewhere in Switzerland

At the CERN institute in Switzerland, a man named Tim Berners-Lee had a dream: a project, a network based on the concept of Hypertext, which, when combined with two other Internet Protocols (TCP and DNS) would create a network in which users would be able to share and browse information - something we know now as The World Wide Web.

We can without a doubt assume his project was a success (although not in the early days), and the first web page ever was published at http://info.cern.ch/ on August 6, 1991. And yes, the address is still working. But sorry, it's not the original page anymore, and I no longer have a copy of it. (If someone does, please send me the link or the screenshot, and I will be more than happy to include it here!) However, the CERN Institute does provide the earliest version saved here.

To read this and other web pages, special software was needed, and among the first browsers aside from the one developed by CERN (called WorldWideWeb) were libwww, Viola and in 1992 Lynx, developed within Academic Computing Services at the University of Kansas. Lynx had not been created for the World Wide Web; it was already a Hypertext browser, and became a web browser when an Internet interface was added in 1993. Lynx was already able to access and read Gopher files and, unlike Gopher, is still around, available on every UNIX or Linux system (there might even be a Windows version...), and it hasn't changed too much over the years. It is still a text-based browser, ignoring everything but pure text content (sometimes a really nice way to show people how modern Search Engines see their pages).

All it took was Mosaic

1993, the year the World Wide Web exploded. Figuratively.
Explode it did, and it did so because of Marc Andreessen and Eric Bina, who programmed a piece of software at the National Center for Supercomputing Applications (NCSA), called it Mosaic and released it that same year, first for UNIX and later for Apple and Microsoft® Windows®. Unlike most of the others, Mosaic had a GUI (Graphical User Interface), and it was the first graphical browser to become widely popular. Marc Andreessen soon left NCSA, and within a short time, together with Jim Clark, the founder of SGI, started Mosaic Communications Corp, which later became Netscape Communications Corporation, which of course released Netscape and Mozilla. Mosaic's development was discontinued in 1997.

On August 24, 1995 Windows 95® was released by Microsoft®, including its own version of a web browser called Internet Explorer. After the ensuing browser war, Internet Explorer finally outranked Netscape as the leading web browser in 1999.

A new era begins

Back to the year 1993, the year of the official release of the first graphical web browser:
That same year, MIT student Matthew Gray developed the World Wide Web Wanderer, whose index was later called Wandex, the first "real" web Search Engine Robot. At first, its purpose was to measure and track the actual size of the web; later it gathered URLs, thus creating the first database of websites. Over a period of 3 years, the World Wide Web Wanderer measured the number of websites from the initial 130 in June 1993 to over 100,000 in January 1996, estimating the number to be 230,000 only six months after that.

Also in 1993, Martijn Koster in England created ALIWEB (Archie-Like Indexing of the Web), the first engine allowing users to submit their pages to be indexed. This was also the only way to get indexed by ALIWEB, since it did not have any robot or spidering technology to index pages on its own. It is still online today.

Although not publicly introduced until 1995, Excite (initially called Architext) was created in 1993 by six Stanford students. It used statistics and analytics to support the search process, and a special version for webmasters was made available.

Enter the big players

1994 was a big year for Search Engines:
Galaxy, initially created as a directory containing Gopher and telnet search features, was developed by the MCC Research Consortium at the University of Texas, in Austin. It is still around.

And another major player entered the stage in 1994: Yahoo!. Jerry Yang and David Filo at Stanford University started out by creating a list of their own favorite websites, initially called Jerry's Guide to the World Wide Web. What set Yahoo! apart from other Search Engines was the fact that it also contained a description of the page itself and was for a very long time a directory (and parts of it still are).

Around the same time in 1994, Brian Pinkerton at the University of Washington introduced WebCrawler, the first full-text Search Engine, indexing every word on a web page.
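What "indexing every word" means can be shown with a small Python sketch of an inverted index. It is a generic textbook illustration, not WebCrawler's real implementation; the two pages, their URLs, and the function names are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical pages (URL -> text); a real engine would fetch these itself.
PAGES = {
    "http://example.com/a": "Archie indexed file names",
    "http://example.com/b": "WebCrawler indexed every word on a page",
}

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def lookup(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

index = build_inverted_index(PAGES)
print(lookup(index, "every word"))
```

The point of the structure is that a query never has to reread the pages: each word leads straight to the list of pages containing it.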

Lycos completes the list of newcomers in 1994. Named after the Latin word for the wolf spider, Lycos was created by Dr. Michael Mauldin at the Center for Machine Translation at Carnegie Mellon University in Pittsburgh. It added relevance, prefix matching and proximity to the search algorithm, providing users with even more exact search results. At its start, Lycos' catalog contained information about approximately 54,000 pages on the web, growing to 60 million by November 1996, making it the largest Search Engine at that time. Lycos was sold to Terra Networks in 2000 for $5.4 (!) billion, becoming Terra Lycos. In 2004, Lycos again changed ownership, this time being purchased by Daum Communications Corporation, Korea's second largest Internet portal, changing its name back to Lycos, Inc.

Although it first appeared online in 1994, many didn't consider the first version of Steven Kirsch's InfoSeek Guide a real Search Engine, as it started as a pay-per-use service. It dropped the fees in August 1994 and re-emerged as a true Search Engine in 1995.

Take-off: The dot-com boom

1995 - the year of the Search Engines (and millions of other start-up-dot-coms):
DejaNews, MetaCrawler, SavvySearch, InfoSeek (relaunch), Excite (public version), Inktomi and Magellan all started in 1995. Inktomi later provided search results for Microsoft's® MSN Search for quite some time and was acquired by Yahoo! in 2003; Magellan was later taken over by Excite. And then there was the biggest of them all: AltaVista.

SavvySearch by Daniel Dreilinger at Colorado State University and MetaCrawler by University of Washington student Erik Selberg and his advisor Oren Etzioni were Search Engines that allowed users to search in up to 20 other Search Engines simultaneously, hence the name Meta-Search Engines.
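The meta-search idea, passing one query to several engines and merging what comes back, can be sketched roughly as follows. The ranking rule (URLs returned by more engines come first) is an assumption chosen for simplicity, not how SavvySearch or MetaCrawler actually ranked results, and engine_a and engine_b are made-up stand-ins for real engines.

```python
def metasearch(query, engines):
    """Query every engine and merge the results into one ranked list.

    Ranking is a simple vote count: a URL returned by more engines ranks
    higher; ties are broken alphabetically.
    """
    votes = {}
    for engine in engines:
        # set() so an engine that repeats a URL still gets only one vote
        for url in set(engine(query)):
            votes[url] = votes.get(url, 0) + 1
    return sorted(votes, key=lambda url: (-votes[url], url))

# Hypothetical engines: each takes a query and returns a list of URLs.
def engine_a(query):
    return ["http://example.com/1", "http://example.com/2"]

def engine_b(query):
    return ["http://example.com/2", "http://example.com/3"]

print(metasearch("search engines", [engine_a, engine_b]))
```

Here http://example.com/2 comes out on top because both engines returned it.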

DejaNews, started in 1995 by Steve Madere in Austin, Texas, was a special Search Engine for Usenet postings. Unfortunately, in 1999 DejaNews became deja.com and reinvented itself as mainly a shopping comparison website (although still providing the Usenet search feature), and during the transition many messages in the archive were lost. After the dot-com crash, financial troubles followed, and in 2001 the service was discontinued, leaving Usenet readers with no way to search for postings across newsgroups short of downloading all available message headers of a given group into their readers. The archives were later acquired by Google and relaunched as Google Groups. Although the archives now allegedly extend back to the year 1981, many on the Internet will always miss DejaNews.

AltaVista, originally owned and created by DEC (Digital Equipment Corporation), was the first Search Engine to introduce natural language queries and, backed by DEC's 64-bit Alpha servers, had capabilities unparalleled in the Search Engine world of 1995. The site received 300,000 hits on its first day, and more than 80 million two years later. In 1998, AltaVista served around 13 million search queries per day.
In 1998, DEC was sold to Compaq, and Compaq relaunched AltaVista in 1999 as a portal, confusing many users with the new features that suddenly appeared on the home page instead of the minimalist search interface.
Compaq agreed to sell a majority of AltaVista (83%) to an investment company named CMGI in 1999 for the allegedly outrageous sum of 2.3 billion (!) dollars in cash and stock, but in the wake of the dot-com fallout the deal was cancelled and never materialized.
While in 1996 AltaVista was the sole provider for Yahoo! search results and undisputedly the largest, fastest and best Search Engine at that time, in 2003 AltaVista was acquired by Overture (formerly goto.com) for a "measly" 140 million dollars in cash and stock; Overture itself was in turn bought by Yahoo! the same year.

In 1996, many more Search Engines appeared: Dogpile, another meta-crawler, created by Aaron Flin and eventually sold to Go2net (which was in turn acquired by Infospace); LookSmart, originally an Australian venture called NetGet, which later provided search results for MSN Search®; HotBot by Eric Brewer and Paul Gauthier at the University of California at Berkeley; AskJeeves by Garrett Gruener and David Warthen from Berkeley, California, now known as ask.com (it acquired Excite in 2004 and Excite Europe in 2005); and Alexa by Brewster Kahle and Bruce Gilliat, which was acquired by amazon.com in 1999. Alexa is best known for the Alexa Toolbar, which the company uses to track and store user behavior and to rank the websites visited.

BackRub also was introduced in January 1996. "BackRub?" I hear you say. "Never heard of it" you say. A friendly reminder, Dear Reader: Search Engines change names. Check out these links before you proceed: Backrub Link 1 and BackRub Link 2. Still don't believe me? Well, they confirm it here. Both pages are not the originals according to avid observers (many factors like background colors suggest they are not, so do the dates), but the same people also agree they are pretty close.

1997 introduced more interesting players like FAST, developed at the Department of Computer and Information Science at NTNU, Norway. In April 2008, FAST was acquired by Microsoft®, and is now known as FAST, A Microsoft Subsidiary.

Northern Light, by C. David Seuss in Cambridge, Massachusetts, was maybe one of the best Search Engines ever. However, after using up all of its 70 million dollars in funding, Northern Light was bought in 2002 for $16 million by Divine Inc, a company that went bankrupt shortly afterwards. Seuss bought the assets of his original brainchild at an auction for $81,000 in 2003, and it is back online again.

GoTo.com, an Idealab spin-off and the first company to successfully include paid listings in its search service, became the basis for all other PPC services. In 2001 goto.com was renamed Overture Services, Inc and partnered with MSN and Yahoo!. In 2003, Yahoo! acquired Overture for $1.6 billion and rebranded it as Yahoo! Search Marketing.

The age of Google or how to survive the dot-com bubble

Yes, it's 1998, and this can only mean one thing: it's the year Google was incorporated and officially introduced as a Search Engine. But it wasn't until Sept 21, 1999 that the company finally removed the "Beta" label from its website. Everything else about Google (well, except for its Search Engine algorithm) can be found on the web. I suggest going to www.google.com.

But more things happened in 1998: Microsoft® wanted its piece of the cake and released MSN Search in late 1998 to compete with the other big players at the time. Back then, however, MSN Search used search results from other Search Engines: Inktomi, LookSmart and AltaVista. In 2006, the official name was changed to Live Search and Microsoft® AdCenter® was integrated into Live Search. Before Microsoft® started showing exclusively its own AdCenter® ads, it was also displaying Yahoo! Search Marketing results on MSN Search pages.

What happened next is history. Most of the other, smaller Search Engines either went out of business during the dot-com crash or were picked up for pennies on the dollar by the survivors in the aftermath.

No new big player has appeared on the Search Engine playing field since then. Some have tried, and we can't even remember their names. Most of you have probably been using Google all of your lives, and imagining a world without it is simply impossible.

What's next?

So what has been happening? What's new? And: What's next?
All right, most of the Search Engine websites have had facelifts to comply with the new, hip Web 2.0 look and feel. But that's not all. The technology in the background, unseen and unnoticed by most users, has undergone dramatic changes. Search Engines know more about us than our spouses, our friends or parents. Where we shop. What we eat. Who we talk to. Where we live. When we sleep. Big Brother has long become a reality. How long until Minority Report is as well?

Every move we make on our computers, every click with the mouse, every purchase we make is being stored somewhere, analyzed, interpreted. This information is not new; it has always been available. But looking at and interpreting log files, once considered geeky, has evolved into an invaluable and necessary task: finally, the value and power of this information have become apparent.

At the checkout at my grocery store a video monitor plays commercials 24/7. The same ones for everybody, as of now. My phone plays videos, too. And Google lets me develop ads especially for mobile devices.

In every browser window, on every monitor in my office there is at least one open Google Search tab at all times.

Yet, the other day, I reached for my 30-year-old Oxford English Reader's dictionary. Google is not in there.
Archie anyone?

Most of this information was compiled during my teaching years (before 2004) for educational purposes.
I dug it up a couple of days ago, after it had been collecting virtual dust on one of my backup drives for quite a while. I verified the original information and researched and updated the newer parts.
Forgive me, but I simply cannot mention every resource I used over the years. The original document still lists them, but in the meantime most of them have ceased to exist, been replaced, renamed or rebranded, or simply vanished into the dark of cyberspace.
I want to thank the countless programmers and developers for their tireless efforts over the years.
As always, when in doubt, start here: www.google.com