
Investigating Search Engines and Directories
The term search engine has become the predominant term for search system or search site, but before reading any further, you need to understand the different types of search, um, thingies, you’re going to run across. Basically, you need to know about four thingies.
Search indexes or search engines
Search indexes or engines are the predominant type of search tools you’ll run across. Originally, the term search engine referred to some kind of search index, a huge database containing information from individual Web sites.
Large search-index companies own thousands of computers that use software known as spiders or robots (or just plain bots) to grab Web pages and read the information stored in them. These systems don’t always grab all the information on each page or all the pages in a Web site, but they grab a significant amount of information and use complex algorithms (calculations based on complicated formulae) to index that information. Google, shown in Figure 1-1, is the world’s most popular search engine, closely followed by Yahoo! and MSN.
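To make the idea of a search index a little more concrete, here is a toy sketch in Python. It is not how Google or any real search engine works; the page URLs and text are made up, and the "index" is just a lookup from each word to the pages that contain it. Real spiders download pages from the Web and real indexes use far more complex algorithms.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a spider might have downloaded.
PAGES = {
    "example.com/shoes": "Running shoes and hiking boots for sale",
    "example.com/boots": "Leather boots and boot care tips",
}

def build_index(pages):
    """Map each lowercase word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, term):
    """Return the sorted URLs whose text contains the search term."""
    return sorted(index.get(term.lower(), set()))

index = build_index(PAGES)
print(search(index, "boots"))
```

Searching for "boots" returns both made-up pages, because the word appears in the text of each one. The point is only that the index is built from the words on the pages themselves.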
Index envy
Late in 2005, Yahoo! (www.yahoo.com) claimed that its index contained information about almost 20 billion pages, along with almost 2 billion images and 50 million audio and video pages. Google (www.google.com) used to actually state on its home page how many pages it indexed (they reached 15 billion or so at one point) but decided not to play the “mine is bigger than yours” game with Yahoo!
Search directories
A directory is a categorized collection of information about Web sites. Rather than containing information from Web pages, it contains information about Web sites.
The most significant search directories are owned by Yahoo! (dir.yahoo.com) and the Open Directory Project (www.dmoz.org). (You can see an example of Open Directory Project information, displayed in Google at dir.google.com, in Figure 1-2.) Directory companies don’t use spiders or bots to download and index pages on the Web sites in the directory; rather, for each Web site, the directory contains information, such as a title and description, submitted by the site owner. The two most important directories, Yahoo! and Open Directory, have staff members who examine all the sites in the directory to make sure they’re placed into the correct categories and meet certain quality criteria. Smaller directories often accept sites based on the owners’ submission, with little verification.
Here’s how to see the difference between Yahoo!’s search results and the Yahoo! directory:
1. Go to www.yahoo.com.
2. Type a word into the Search box.
3. Click the Search button.
The list of Web sites that appears is called the Yahoo! Search results, which are currently provided by Google.
4. Notice the Directory tab at the top of the page.
You see a line that says something like Category: Footwear Retailers. You also see the line underneath some of the search results.
5. Click either the tab or link.
You end up in the Yahoo! directory. (You can go directly to the directory by using dir.yahoo.com .)
Non-spidered indexes
I wasn’t sure what to call these things, so I made up a name: non-spidered indexes. A number of small indexes, less important than major indexes such as Google, don’t use spiders to examine the full contents of each page in the index. Rather, the index contains background information about each page, such as titles, descriptions, and keywords. In some cases, this information comes from the meta tags pulled off the pages in the index. In other cases, the person who enters the site into the index provides this information.
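As a rough illustration of what "background information" pulled from meta tags looks like, here is a minimal Python sketch that reads only the title, description, and keywords from a page and ignores the body text. The sample page and the MetaTagParser and extract_listing names are made up for this example; it assumes the page uses the common name="description" and name="keywords" meta tags, and it doesn't represent any particular index's software.

```python
from html.parser import HTMLParser

# Made-up sample page; a real index would fetch this from the Web.
SAMPLE_HTML = """<html><head>
<title>Rodent Racing Supplies</title>
<meta name="description" content="Gear for competitive rodent racing.">
<meta name="keywords" content="rodent racing, hamster wheels">
</head><body><p>Full page text that this kind of index would not read.</p></body></html>"""

class MetaTagParser(HTMLParser):
    """Collect the title plus the description and keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.info = {"title": "", "description": "", "keywords": ""}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.info[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; body text is ignored.
        if self.in_title:
            self.info["title"] += data.strip()

def extract_listing(html):
    """Return the background information a non-spidered index might store."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.info

listing = extract_listing(SAMPLE_HTML)
print(listing)
```

Notice that the paragraph in the page body never makes it into the listing. That is the key difference from a full search index, which reads the page content as well.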
Pay-per-click systems
Some systems provide pay-per-click listings. Advertisers place small ads into the systems, and when users perform their searches, the results contain some of these sponsored listings, typically above and to the right of the free listings.
Keeping the terms straight
Here are a few additional terms that you will see scattered throughout the blog:
Search site: This Web site lets you search through some kind of index or directory of Web sites, or perhaps both an index and directory. (In some cases, search sites known as meta indexes allow you to search through multiple indices.) Google.com, AOL.com, and EarthLink.com are all search sites. DogPile.com and Mamma.com are meta-index search sites.
Search system: This organization possesses a combination of software, hardware, and people that indexes or categorizes Web sites; they build the index or directory you search through at a search site. The distinction is important, because a search site may not actually own a search index or directory. For instance, Google is a search system (it displays results from the index that it creates for itself), but AOL.com and EarthLink.com aren’t. In fact, if you search at AOL.com or EarthLink.com, you actually get Google search results. Google and the Open Directory Project provide search results to hundreds of search sites.
Search term: This is the word, or words, that someone types into a search engine when looking for information.
Search results: Results are the information returned to you (the results of your search term) when you go to a search site and search for something. As just explained, in many cases the search results you see don’t come from the search site you’re using, but from some other search system.
Natural search results: A Web page can appear on a search-results page two ways: The search engine may place it on the page because the site owner paid to be there (pay-per-click ads), or it may pull the page out of its index because it thinks the page matches the search term well. These free placements are often known as natural search results; you’ll also hear the term organic and sometimes even algorithmic.
Search engine optimization (SEO): Search engine optimization (also known as SEO) refers to “optimizing” Web sites and Web pages to rank well in the search engines . . . the subject of this blog, of course.
Why bother with search engines?
Why bother using search engines for your marketing? Because search engines represent the single most important source of new Web site visitors.
You may have heard that most Web site visits begin at a search engine. Well, this isn’t true. It was true several years ago, and many people continue to use these outdated statistics because they sound good — “80 percent of all Web site visitors reach the site through a search engine,” for instance. However, in 2003, that claim was finally put to rest. The number of search-originated site visits dropped below the 50-percent mark. Most Web site visitors reach their destinations by either typing a URL — a Web address — into their browsers and going there directly or by clicking a link on another site that takes them there. Most visitors don’t reach their destinations by starting at the search engines.
However, search engines are still extremely important for a number of reasons:
The proportion of visits originating at search engines is significant. Not so long ago, one survey put the number at almost 50 percent. Sure, it’s not 80 percent, but it’s still a lot of traffic.
According to a report by eMarketer published early in 2005, 21 percent of American Internet users use a search engine four or more times each day; Pew Internet estimated that 38 million Americans use search engines every day.
A study by iCrossing in the summer of 2005 found that 40 percent of people do online research prior to purchasing products.
Of the visits that don’t originate at a search engine, a large proportion are revisits: people who know exactly where they want to go. This isn’t new business; it’s repeat business. Most new visits come through the search engines; that is, search engines are the single most important source of new visitors to Web sites.
Some studies indicate that a large number of buyers begin at the search engines. That is, of all the people who go online planning to buy something or looking for product information, perhaps over 67 percent use a search engine, according to a study in 2005 by iCrossing.
The search engines represent a cheap way to reach people. In general, you get more bang for your buck going after free search-engine traffic than almost any other form of advertising or marketing.
Where Do People Search?
You can search for Web sites at many places. Literally thousands of sites, in fact, provide the ability to search the Web. (What you may not realize, however, is that many sites search only a small subset of the World Wide Web.)
However, most searches are carried out at a small number of search sites. How do the world’s most popular search sites rank? That depends on how you measure popularity:
Percentage of site visitors (audience reach)
Total number of visitors
Total number of searches carried out at a site
Total number of hours visitors spend searching at the site
Each measurement provides a slightly different ranking, though all provide a similar picture, with the same sites generally appearing on the list, though some in slightly different positions.
The following list runs down the world’s most popular search sites, based on one month of searches during 2005 — 4.5 billion searches — according to a Nielsen/NetRatings study. These statistics are for U.S. Internet users:
Search Site          Share of Searches
Google.com           46.2%
Yahoo.com            22.5%
MSN.com              12.6%
AOL.com              5.4%
My Way               2.2%
Ask (AskJeeves)      1.6%
Netscape.com         1.6%
iWon                 0.9%
Earthlink            0.8%
DogPile              0.9%
Others               5.3%
Remember, this is a list of search sites, not search systems. In some cases, the sites own their own systems. Google provides its own search results, but AOL doesn’t. (AOL gets its results from Google.)
The fact that some sites get results from other search systems means two things.
The numbers in the preceding list are somewhat misleading. They suggest that Google has around 46.2 percent of all searches. But Google also feeds AOL its results; add AOL’s searches to Google’s, and you’ve got 51.6 percent of all searches. In addition, Google feeds Netscape (another 1.6 percent according to NetRatings) and EarthLink (0.8 percent), which brings the total to about 54 percent. And DogPile is a meta search engine: Search at DogPile, and you see results from Google, Yahoo!, MSN, and Ask.
You can ignore some of these systems. At present, for example, and for the foreseeable future, you don’t need to worry about AOL.com. Even though it’s one of the world’s top search sites, you can forget about it. Sure, keep it in the back of your mind, but as long as you remember that Google feeds AOL, you need to worry about Google only.
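If you want to check the arithmetic yourself, here is a small Python sketch that rolls the site-level shares from the preceding list up to the search system that feeds each site. The FED_BY mapping only covers the relationships stated in the text (Google feeds AOL, Netscape, and EarthLink); DogPile is left out because it blends results from several systems, and the variable names are just for illustration.

```python
from collections import defaultdict

# Share of searches per search site, in percent, from the list above.
SITE_SHARE = {
    "Google.com": 46.2,
    "Yahoo.com": 22.5,
    "MSN.com": 12.6,
    "AOL.com": 5.4,
    "Netscape.com": 1.6,
    "EarthLink.com": 0.8,
}

# Which search system supplies each site's results (as described in the text).
FED_BY = {
    "Google.com": "Google",
    "AOL.com": "Google",
    "Netscape.com": "Google",
    "EarthLink.com": "Google",
    "Yahoo.com": "Yahoo!",
    "MSN.com": "MSN",
}

def share_by_system(site_share, fed_by):
    """Add up site shares under the search system that feeds each site."""
    totals = defaultdict(float)
    for site, share in site_share.items():
        totals[fed_by[site]] += share
    # Round to one decimal place to hide floating-point noise.
    return {system: round(total, 1) for system, total in totals.items()}

system_share = share_by_system(SITE_SHARE, FED_BY)
print(system_share)
```

Counted this way, Google's 46.2 percent grows to about 54 percent once AOL, Netscape, and EarthLink are included, which is why you can concentrate on Google and not worry about the sites it feeds.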
Now reexamine the preceding list of the world’s most important search sites and see what you remove so you can get closer to a list of sites you care about.
The Top Search Sites:
Search Site
Google.com
Yahoo.com
MSN.com
AOL.com
MyWay.com
Ask.com (also known as AskJeeves.com)
Netscape.com
iWon.com
EarthLink.com
DogPile.com
To summarize, five important systems are left:
Google
Yahoo
MSN
Ask
Open Directory Project
That’s not so bad, is it? You’ve just gone from thousands of sites down to five. Note, by the way, that the top three positions may shift around a little. Google has already lost a large proportion of its share (when I wrote the first edition of this book Google had around three quarters of the market . . . now it’s probably a little over one half), and a big battle’s brewing between the top three; in fact 2006 is turning out to be the year of bribing people to bring them to search. Take a look at Google partner Blingo ( www.blingo.com ) and at MSN Search and Win ( www.MSNSearchAndWin.com ).
Now, some of you may be thinking, “Aren’t you missing some sites? What happened to HotBot, Mamma.com, WebCrawler, Lycos, and all the other systems that were so well known a few years ago?” A lot of them have disappeared or have turned over a new leaf and are pursuing other opportunities.
For example, Northern Light, a system well known in the late 1990s, now sells search software. And in the cases in which the search sites are still running, they’re generally fed by other search systems. Mamma.com, DogPile, and MetaCrawler get search results from the top four systems, for instance, and HotBot gets results from Ask. Altavista and AllTheWeb get their data from Yahoo! If the search site you remember isn’t mentioned here, it’s either out of business, being fed by someone else, or simply not important in the big scheme of things.