Wednesday, January 13, 2010

Crawlers, spiders, and robots - Databases - Quality considerations

Crawlers, spiders, and robots

The query interface and search results pages truly are the only parts of a search engine that the user ever sees. Every other part of the search engine is behind the scenes, out of view of the people who use it every day. That doesn’t mean it’s not important, however. In fact, what’s in the back end is the most important part of the search engine, and it’s what determines how you show up in the front end.

If you’ve spent any time on the Internet, you may have heard a little about spiders, crawlers, and robots. These little creatures are programs that literally crawl around the Web, cataloging data so that it can be searched. In the most basic sense, all three programs — crawlers, spiders, and robots — are essentially the same. They all collect information about each and every web URL.

This information is then cataloged according to the URL at which it is located and stored in a database. Then, when a user uses a search engine to locate something on the Web, the references in the database are searched and the search results are returned.

Databases

Every search engine contains or is connected to a system of databases where data about each URL on the Web (collected by crawlers, spiders, or robots) is stored. These databases are massive storage areas that contain multiple data points about each URL.

The data might be arranged in any number of different ways and is ranked according to a method of ranking and retrieval that is usually proprietary to the company that owns the search engine.
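
To make the catalog-then-search idea concrete, here is a minimal sketch in Python. It assumes the crawl has already happened: the `CRAWLED_PAGES` dictionary, its example.com URLs, and the `build_index` and `search` helpers are all hypothetical stand-ins for illustration, not how any real search engine stores its data.

```python
from collections import defaultdict

# Hypothetical crawl output: URL -> page text (made-up pages for illustration).
CRAWLED_PAGES = {
    "http://example.com/coffee": "fresh roasted coffee beans",
    "http://example.com/tea": "loose leaf tea and fresh herbs",
    "http://example.com/brew": "how to brew coffee at home",
}

def build_index(pages):
    """Catalog each word under the set of URLs where it appears."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query, sorted by URL."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)

index = build_index(CRAWLED_PAGES)
print(search(index, "coffee"))
print(search(index, "fresh coffee"))
```

The sketch only finds matching URLs. Deciding the order in which those URLs are shown is the ranking step, and that is the proprietary part.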

You’ve probably heard of the method of ranking called PageRank (for Google) or even the more generic term quality scoring. This ranking or scoring determination is one of the most complex and secretive parts of SEO. How those scores are derived, exactly, is a closely guarded secret, in part because search engine companies change the weight of the elements used to arrive at the score according to usage patterns on the Web.

The idea is to score pages based on the quality that site visitors derive from the page, not on how well web site designers can manipulate the elements that make up the quality score. For example, there was a time when the keywords that were used to rank a page were one of the most important factors in obtaining a high quality score.

That’s no longer the case. Don’t get me wrong. Keywords are still vitally important in web page ranking. However, they’re just one of dozens of elements that are taken into consideration, which is why a large portion of Part II of this book is dedicated to using keywords to your advantage. They do have value; and more important, keywords can cause damage if not used properly — but we’ll get to that.

Quality considerations

When you’re considering the importance of databases, and by extension page quality measurements, in the mix of SEO, it might be helpful to equate it to something more familiar: customer service. What constitutes good customer service is not any one thing. It’s a conglomeration of different factors (greetings, attitude, helpfulness, and knowledge, just to name a few) that come together to create a pleasant experience. A web page quality score is the same.

The difference with a quality score is that you’re measuring elements of design, rather than actions of an individual. For example, some of the elements that are known to be weighted to develop a quality score are as follows:

Domain names and URLs

Page content

Link structure

Usability and accessibility

Meta tags

Page structure

It’s a melding of these and other factors, sometimes very carefully balanced, that is used to create the quality score. Exactly how much weight is given to each factor is known only to the mathematicians who create the algorithms that generate the quality score, but one thing is certain: The better the quality score your site generates, the better your search engine results will be, which means the more traffic you will have coming from search engines.
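
One way to picture a score built from weighted factors is a simple weighted average. The sketch below is only an illustration: the weights in `WEIGHTS`, the factor names, and the 0.0 to 1.0 scale are all invented assumptions, because the real weights are not public.

```python
# Hypothetical weights for the factors listed above; real engines keep theirs secret.
WEIGHTS = {
    "domain_and_url": 0.10,
    "page_content": 0.30,
    "link_structure": 0.25,
    "usability": 0.15,
    "meta_tags": 0.05,
    "page_structure": 0.15,
}

def quality_score(factor_scores, weights=WEIGHTS):
    """Weighted average of per-factor scores, each on a 0.0 to 1.0 scale.

    A factor missing from factor_scores counts as 0.0.
    """
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * factor_scores.get(name, 0.0) for name in weights)
    return weighted / total_weight

# A page that maxes out one factor but neglects content...
keyword_stuffed = {name: 0.5 for name in WEIGHTS}
keyword_stuffed.update({"meta_tags": 1.0, "page_content": 0.2})

# ...versus a page that is consistently good across the board.
balanced = {name: 0.7 for name in WEIGHTS}

print(quality_score(keyword_stuffed))
print(quality_score(balanced))
```

Under these made-up weights the balanced page scores higher, which mirrors the point that no single element carries the score on its own.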

Sunday, January 10, 2010

Search engine results pages

The other side of the query interface, and the only other part of a search engine that’s visible to users, is the search engine results pages (SERPs). This is the collection of pages that are returned with search results after a user enters a search term or phrase and clicks the Search button. This is also where you ultimately want to end up; and the higher you are in the search results, the more traffic you can expect to generate from search. Specifically, your goal is to end up on the first page or two of results, in the top 10 or 20 results that are returned for a given search term or phrase. Getting there can be a mystery, however. We’ll decode the clues that lead you to that goal throughout the book, but right now you need to understand a bit about how users see SERPs.

Let’s start with an understanding of how users view SERPs. Pretend you’re the searcher. You go to your favorite search engine — we’ll use Google for the purposes of illustration because that’s everyone’s favorite, isn’t it? Type in the term you want to search for and click the Search button. What’s the first thing you do when the page appears? Most people begin reading the titles and descriptions of the top results. That’s where you hook searchers and entice them to click through the links provided to your web page. But here’s the catch: You have to be ranked close enough to the top for searchers to see those results page titles and descriptions and then click through them, which usually means you need to be in the top 10 or 20 results, which translates into the first page or two of results. It’s a tough spot to hit.
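
The arithmetic behind "top 10 or 20 results" and "the first page or two" can be sketched in a couple of lines of Python. The `page_for_rank` function and its default of 10 results per page are assumptions for illustration; the actual number of results per page varies by search engine and user settings.

```python
def page_for_rank(rank, results_per_page=10):
    """Return the 1-based results page on which a 1-based ranking position appears."""
    if rank < 1 or results_per_page < 1:
        raise ValueError("rank and results_per_page must be at least 1")
    return (rank - 1) // results_per_page + 1

for rank in (1, 10, 11, 20, 21):
    print(rank, page_for_rank(rank))
```

With 10 results per page, positions 1 through 10 land on page one and positions 11 through 20 land on page two, so position 21 is already on a page many searchers never reach.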

There is no magic bullet or formula that will garner you those rankings every time. Instead, it takes hard work and consistent effort to push your site as high as possible in SERPs. At the risk of sounding repetitive, that’s the information you’ll find moving forward. There’s a lot of it, though, and to truly understand how to land good placement in SERPs, you really need to understand how search engines work. There is much more to them than what users see.

Thursday, January 7, 2010

Structure of a search engine

By now you probably have a fuzzy idea of how a search engine works, but there’s much more to it than just the basic overview you’ve seen so far. In fact, search engines have several parts. Unfortunately, it’s rare that you find an explanation describing just how a search engine is made — that’s proprietary information that search companies hold very close to their vests — and that information is vitally important to succeeding with search engine optimization (SEO).

Query interface

The query interface is what most people are familiar with, and it’s probably what comes to mind when you hear the term “search engine.” The query interface is the page, or user interface, that users see when they navigate to a search engine to enter a search term. There was a time when the search engine interface looked very much like the page shown in figure. This interface was a simple page with a search box and a button to activate the search, and not much more.

Today, many search engines on the Web have added much more personalized content in an attempt to capitalize on the real estate available to them. For example, Yahoo! Search, shown in Figure, is just one of the search services that now enable users to personalize their pages with a free e-mail account, weather information, news, sports, and many other elements designed to make users want to return to that site to conduct their web searches. One other option users have for customizing the interfaces of their search engines is a capability like the one Google offers. The Google search engine has a customizable interface to which users can add different gadgets. These gadgets enable users to add features to their customized Google search home page that meet their own personal needs or tastes.

Search has even extended onto the desktop. Google and Microsoft both have search capabilities that, when installed on your computer, enable you to search your hard drive for documents and information in the same way you would search the Web. These capabilities aren’t of any particular use to you where SEO is concerned, but they do illustrate the prevalence of search and the value that users place on being able to quickly find information using searching capabilities.

When it comes to search engine optimization, Google’s user interface offers the most potential for you to reach your target audience, because it does more than just optimize your site for search: If a useful tool or feature is available on your site, you can enable users to have access to this tool or feature through the Application Programming Interface (API) made available by Google. Using the Google API, you can create a gadget that users can install on their Google Desktop, iGoogle page, or Firefox or Chrome browser. This enables you to have your name in front of users on a daily basis.

For example, one company offers a Google gadget that enables users to turn their documents into PDF files right from their Google home page once the gadget has been added. If the point of search engine optimization is ultimately to get your name in front of as many people as possible, as often as possible, then making a gadget available for addition to Google’s personalized home page can only further that goal.

Sunday, January 3, 2010

Google PageRank - Read More About The Concept

PageRank is one of those mysteries that may never be completely unraveled. Volumes have been written about it, but probably the only two people in the world who understand it completely are Larry Page and Sergey Brin. That’s because it was their brainchild.

PageRank actually started as part of a research project that Page and Brin were working on at Stanford University. The project involved creating a new search engine that ranked pages in a democratic fashion with a few weights and measures thrown in for accuracy. Hence, the term. (What else would you call a ranking system for web pages that was developed by Larry Page?)

The interesting thing about PageRank is that although Page and Brin conceived the idea and created the algorithm that arrives at a PageRank, it didn’t belong to them. Stanford University actually owned the patent on the PageRank algorithm until Google purchased the exclusive right to use the algorithm for 1.8 million shares of the company (which were sold in 2005 for $336 million).

PageRank is a method by which web pages are ranked in Google search results. A combination of factors creates the actual rank of a web page. Google explains it this way: “PageRank relies on the uniquely democratic nature of the Web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves ‘important’ weigh more heavily and help to make other pages ‘important.’” In other words, it’s a mystery. A page that receives more links (with equal votes) might rank lower than a page that receives a single link from a “more important” page. The lesson? Create pages for visitors, not for search engines.
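
The "links as weighted votes" idea can be sketched with a simplified, textbook-style PageRank iteration. This is an illustration under stated assumptions, not Google's production ranking: the tiny link graph, the page names, the damping factor of 0.85, and the fixed 50 iterations are all choices made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is shared evenly by all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1.0 - damping) / n + damping * dangling / n for page in pages}
        for page, targets in links.items():
            if targets:
                # Each page splits its vote equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: four fans make "hub" important, and "hub" casts one vote.
links = {
    "fan1": ["hub"], "fan2": ["hub"], "fan3": ["hub"], "fan4": ["hub"],
    "hub": ["one_strong_link"],
    "minor1": ["two_weak_links"], "minor2": ["two_weak_links"],
    "one_strong_link": [], "two_weak_links": [],
}

ranks = pagerank(links)
for page in ("one_strong_link", "two_weak_links"):
    print(page, round(ranks[page], 4))
```

In this made-up graph, the page with a single inbound link from the well-linked hub ends up with a higher score than the page with two inbound links from pages nobody links to, which is the behavior the quotation describes.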

