Before I landed my cushy job as a magazine editor, I spent three years under the hood at Hotbot as an engineer and manager. Between days reading our log files and nights schmoozing with other search engineers, I learned more than I'd ever wanted to know about where search traffic comes from, and where it goes. I even wrote an article about it for Webmonkey.
But I had put all that behind me ... until my lovely wife, Christina, asked me about search engine optimization for Artloop, her fine art research and location service.
Dozens of companies had pitched their optimization services to her, but Christina, a former MSN manager as smart about database schema as she is about business plans, balked. Why pay someone to set up bogus domains, build huge farms of gateway pages, and cram hundreds of keywords like "britney spears" into Artloop's HTML? The very idea ran contrary to the information architecture and site layout her staff had worked so hard to make as clean and clear as possible for their visitors. Moreover, as a Web user herself, she'd learned to recognize these traffic-grabbing methods and had become wary of sites using tricks to get her to click. Why should she assume her own customers would behave differently?
And she was right: Trying to fool search engine users with keywords and trick tags makes sense only if your goal is to flash a lot of ad banners, return traffic be damned. That used to be the business model for an entire industry. But most sites in business today hope to convert first-time visitors into loyal customers by building long-term relationships. Sure, searchers need to find your site, but the results on Hotbot's Top Ten lists show that the only results people stick with are the ones that don't try to scam them. Trap doors, redirects, keyword spam, and multiple domains that host the same pages are more likely to make people reach for the back button (a move the Direct Hit technology behind Top Ten results can detect), not their credit cards.
So, rather than waste money on consultants, Christina and I decided to create our own search optimization spec. Using data gleaned from representatives of leading search engines, insider data, and old-fashioned trial and error, we came up with our own strategy for getting traffic from search engines and portals without having to fake people out. In the process, we encountered so many dubious "experts" with something for sale (software, books, services) that we decided to raise the bar on them and publish our notes for free.
Imagine our surprise when Google's engineers read this article (when it was first published in early June 2001) and invited us to visit their offices to dig even deeper into the workings of their gigapage Web index. Of course we took them up on the offer, and we've updated this article with our notes from those meetings. We've also included answers to the best questions from the hundreds of emails we've received over the past couple of months.
But before you can benefit from the sweat off of our brows, you have to get your priorities straight.
Set Your Priorities
Christina and I soon discovered how critical it is to figure out which goals absolutely need to be met, and which aren't worth the bother. When it comes to search engine optimization, it's easy to become obsessed with doing every little thing, but the rewards for that kind of attention to detail are far too small. You need to stick to chasing the big game.
The Biggest Fish to Fry: Yahoo, Google, and Inktomi
The Yahoo directory accounts for half the traffic referred to most sites. So get your site listed on Yahoo, and your traffic can literally double overnight. Beyond that, most search engine traffic comes from two places: Google and Inktomi.
Traffic from Google has increased at an astonishing rate over the past year: Jakob Nielsen's search engine referrals to his Useit site confirm this, as do the unpublished reports from retail sites like Stylata. Google, once considered a niche site for nerds, is the Wall Street Journal's pick for best search engine on the Net, and the traffic numbers seem to agree.
Inktomi, the number two traffic generator, doesn't run its own search site. Instead, the company provides the technology behind MSN Search and AOL Search, two top referrers, as well as Hotbot and over a dozen more.
Portal sites like Excite, Lycos, and AltaVista still draw lots of traffic, but together Google and Inktomi outweigh the entire rest of the field. Add it up and it's pretty clear how to maximize your traffic for the least effort:
- Get yourself into Yahoo's directory.
- Make sure your site is thoroughly crawled by Google and Inktomi.
- Get lots of links to your site from domains that a lot of other sites link to: that's how Google and Inktomi determine relevance when ranking search results.
- For all other search engines, implement a blanket strategy that gets you reasonable results. By not chasing each one of them separately, you can put your company's time and money to more important uses.
All of this can be accomplished with a single three-step process. And it really is as easy as 1-2-3.
Step 1: Get Crawled
There are quite a few things you can do to grab the attention of search engines and directories:
Clean Up Your URLs
Frames used to be the biggest roadblock to getting crawled, but no more: Both Google and Inktomi now crawl them (the section of Inktomi's support FAQ that claims this isn't so is out of date, according to the company). Instead, the problem with most e-commerce sites today is that their product pages are dynamically generated. While Google will crawl any URL that a browser can read, most of the other search engines balk at links with "?" and "&" characters that separate CGI variables (such as "artloop.com/store?sku=123&uid=456"). As a result, many individual product pages don't show up outside of Google.
One way to circumvent this difficulty is to create static versions of your site's dynamic pages for search engines to crawl. Unfortunately, duplicating your pages is a huge amount of extra work and a constant maintenance chore, plus the resulting pages are never quite up-to-date: all the headaches dynamic pages were designed to eliminate.
A far better strategy is to follow the lead of Amazon and rewrite your dynamic URLs in a syntax that search engines will gladly crawl. So URLs that look like this ...
amazon.com/store?shop=cd&sku=B00004WFIZ&ref=p_ir_m&sessionID=107-6571839-6268523
... become ...
Amazon's application server knows the fields in the URL are actually CGI parameters in a certain order, and processes them accordingly.
J.K. Bowman's Spider Food site explains how to fix URLs for most popular e-commerce servers. One of Artloop's Web programmers learned Apache rewrite rules that tell Apache how to translate slash-separated URLs into a format used by their Netzyme application server. On the back end, Netzyme is passed something like this:
artloop.com/cgi-bin/CssP.exe?CsspApp=ArtLoopClient1&CssServer=localhost%3A32401&CsspFn=@/details/ArtistDetail.html:@:getForm&ObjectLocation=ART&ArtistID=3918
But users and search engines see the tidier, Apache-served URLs, which look something like this:
Not only are the rewritten URLs crawlable by all search engines, they're also more human-friendly, making them easier to pass around the Net.
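To illustrate what such a translation layer does, here is a minimal Python sketch. The public path layout (/artist/ followed by a numeric ID), the function name, and the split between fixed and variable parameters are all hypothetical; only the back-end parameter names are taken from the Netzyme URL quoted above. Artloop's actual Apache rewrite rules are not reproduced here.

```python
import re
from urllib.parse import urlencode

# Hypothetical public URL pattern: /artist/<numeric id>
ARTIST_PATH = re.compile(r"^/artist/(\d+)/?$")

# Back-end script and fixed parameters, copied from the Netzyme example URL.
BACKEND_SCRIPT = "/cgi-bin/CssP.exe"
FIXED_PARAMS = {
    "CsspApp": "ArtLoopClient1",
    "CssServer": "localhost:32401",
    "CsspFn": "@/details/ArtistDetail.html:@:getForm",
    "ObjectLocation": "ART",
}

def rewrite(path):
    """Translate a crawlable, slash-separated path into the back-end URL.

    Returns None when the path does not match the known pattern.
    """
    match = ARTIST_PATH.match(path)
    if match is None:
        return None
    # The only part that varies per page is the artist ID.
    params = dict(FIXED_PARAMS, ArtistID=match.group(1))
    return BACKEND_SCRIPT + "?" + urlencode(params)

print(rewrite("/artist/3918"))
```

The visitor and the search engine only ever see the short path; the query string with its "?" and "&" characters exists only between the Web server and the application server.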
Many readers have written in to ask if the search engines will begin crawling and indexing Flash content soon. The answer, as you might guess, is no. Unlike PDF files, Flash files rarely contain information in text format. Search developers don't want to clutter up their indexes with a million "Skip Intro" pages.
Submit Your Site
There are a lot of automated search engine submission services that you can use to submit your site to as many search engines as possible. The one most recommended by people I talked to is Submit It, an early player that did so well that Microsoft bought it. Submit It is now part of MSN bCentral, and it charges a minimum fee of US$59 to keep a few URLs submitted for a year.
You can avoid the fees by simply submitting to individual search engines on your own. Start with UseIt's list of top referrers: that's where most of the traffic you can get will come from. And while you'd think submitting your site to one Inktomi-powered site would work for all of them, optimization experts have told us it works better if you hit them all.
Don't Forget the Directories
Submit It does submit your site to the busiest directory sites, except for the biggies: Yahoo, LookSmart (which MSN serves under its logo), and the Open Directory Project (which powers Lycos, Hotbot, and Netcenter categories). Some of these directories charge for submission, but $400-500 total will get your most important pages into the most trafficked places.
Yahoo still offers free submissions, except for business categories, which cost $199. But even the fee doesn't guarantee they'll accept your site, just that they'll decide on it within a week. With free submissions, you don't even get the promise that they'll ever get around to evaluating it, given the incredible volume of submissions.
Once you've submitted your pages, be ready to wait a month, two, or three before they're crawled and indexed. It's frustrating, but processing a billion Web pages takes time: even at a nonstop rate of one hundred per second, it would still take almost four months.
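The arithmetic behind that estimate is easy to check. The page count and crawl rate below come from the paragraph above; the 30-day month is an assumption made only to convert days into months.

```python
PAGES = 1_000_000_000    # one billion Web pages
RATE_PER_SECOND = 100    # pages processed per second, nonstop

seconds = PAGES / RATE_PER_SECOND
days = seconds / (60 * 60 * 24)
months = days / 30       # assumption: a 30-day month

print(f"{days:.0f} days, about {months:.1f} months")
# prints: 116 days, about 3.9 months
```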
Make a Crawler Page
It isn't necessary to submit every page on your site to the search engines. Just make sure they can find all the pages that matter by hopping links from your front door. To do that, make a "crawler page" that contains nothing but a link to every page you want search engines to crawl. Use the page's TITLE info as the link text; this helps improve your site score. For an example, check out Artloop's crawler page.
Basically, the crawler page is a site map that lists all the pages on your site. It may be a bit too big for humans to read through, but it will be no problem for a search engine. Add an obscure link to the crawler page on one of your site's top-level pages, using a small amount of text. MSN used to use 1x1 images for this trick, but the Google geeks warned us to avoid such obviously invisible tags. "Why not just label it 'site map?'" one asked. Search engine spiders will find it as soon as they get to your site, and suck down all the pages they find on it.
Don't worry, the crawler page won't show up in search results. It does get pulled into the search engine's index, but because it has no text or tags to match a query, it isn't listed as a result. The pages it links to, however, will appear because the search engine's spider found them right after it visited the crawler page. Wired News, for example, uses hierarchical sets of crawler pages to make sure every story ever published is crawlable from the top of the site.
For Artloop, we decided to break the crawler pages down into 100KB pages or smaller, just to be careful: we wanted to prevent search spiders from timing out or deciding the pages were too big to crawl.
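A crawler page is simple enough to generate with a short script. The sketch below is one possible approach in Python, not Artloop's actual tooling: the function name, the bare-bones HTML wrapper, and the sample URLs are hypothetical. It uses each page's title as the link text and starts a new crawler page whenever the next link would push the current one past the 100KB ceiling.

```python
from html import escape

MAX_BYTES = 100 * 1024  # the 100KB ceiling mentioned above

def build_crawler_pages(pages, max_bytes=MAX_BYTES):
    """Return a list of HTML documents made of nothing but links.

    `pages` is a list of (url, title) pairs; each title becomes the link
    text. Documents stay within max_bytes unless a single link alone is
    larger than the limit.
    """
    header = "<HTML><BODY>\n"
    footer = "</BODY></HTML>\n"
    overhead = len((header + footer).encode("utf-8"))

    documents, links, size = [], [], overhead
    for url, title in pages:
        link = f'<A HREF="{escape(url, quote=True)}">{escape(title)}</A><BR>\n'
        link_size = len(link.encode("utf-8"))
        # Start a new page when adding this link would exceed the limit.
        if links and size + link_size > max_bytes:
            documents.append(header + "".join(links) + footer)
            links, size = [], overhead
        links.append(link)
        size += link_size
    if links:
        documents.append(header + "".join(links) + footer)
    return documents

# Hypothetical URLs, for illustration only.
sample = [("/artist/3918", "Andy Warhol (Warhola)"), ("/artist/1", "Pablo Picasso")]
for number, document in enumerate(build_crawler_pages(sample), start=1):
    print(number, len(document.encode("utf-8")), "bytes")
```

For a large site, the resulting documents can themselves be linked from a top-level crawler page, which gives the hierarchical arrangement described above.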
Pay to Play?
Not too long ago, in response to years of complaints from commercial site owners who demanded their pages be indexed and up to date, Inktomi announced a new service that lets site owners pay to have individual URLs crawled and indexed quickly. If you're wondering whether paid listings are worth it, I suggest trying just a couple of your URLs first (pick the ones you feel are poised to make the most money) to see if the return on investment meets your needs.
Remember that Inktomi will rank search results largely on the links to your page from other domains. And if no one is linking to you, expect to see your page appear at the end of the results list, not at the top.
There are ways, however, to get your site moving up through the ranks.
Step 2: Get Ranked
Most people who are concerned with search engine optimization focus obsessively on keywords and HTML tags. But when it comes to getting ranked by search engines, the only tags that matter are TITLE, and the META tags KEYWORDS and DESCRIPTION. And you have to be very careful about how you handle each one.
TITLE makes a big difference, especially with Google. It should be short (less than 40 characters seems to work best) and, most importantly, should match the search queries people will be using to find your site. This could lead to a struggle with the marketing managers: They'll want your site's page titles to contain the company name and/or a positioning statement. Ask them what good that will do if no one ever sees the pages.
This is a good TITLE tag that will generate traffic from people searching for "picasso":
This is a mediocre one:
<TITLE>Artstuff: Pablo Picasso</TITLE>
This one will put you out of business:
<TITLE>Artstuff: Your Number One Online Resource for Fine Art Solutions!!!</TITLE>
Keyword spamming is the number one favorite trick for search engine optimization. But many of the sites that stuff a zillion keywords into their pages are hoping to get clicks just to show ads; they don't care if they get any repeat business. If you want to draw real customers, focus on the keywords you think your users will be searching for.
For our Picasso page, something like this would work (note that uppercase letters don't matter):
<META NAME="keywords" content="Pablo Picasso, Pablo, Picasso, painting, cubist, painting, ceramics, collage, Spain, Guernica, Paris, 20th century, Girl Before a Mirror">
Repeating the most important keyword twice seems to work with some search engines, but repeating more than that will cause some of them to ignore the whole page. Although none of the representatives from the search companies would confirm specific behavior, it seems that they tend to ignore keyword lists longer than 1024 characters.
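Those two rules of thumb are easy to check automatically before a page goes live. The following Python sketch is a hypothetical helper, not something any search engine publishes: the limits are the unconfirmed figures from the paragraph above, and treating each comma-separated entry as one keyword is an assumption.

```python
from collections import Counter

MAX_LENGTH = 1024   # longer lists seem to be ignored (unconfirmed)
MAX_REPEATS = 2     # more than two repeats risks the whole page being ignored

def check_keywords(content):
    """Return a list of warnings for a META keywords content string."""
    warnings = []
    if len(content) > MAX_LENGTH:
        warnings.append(f"list is {len(content)} characters (limit {MAX_LENGTH})")
    # Keywords are comma-separated; uppercase letters don't matter.
    terms = [term.strip().lower() for term in content.split(",") if term.strip()]
    for term, count in Counter(terms).items():
        if count > MAX_REPEATS:
            warnings.append(f"'{term}' appears {count} times")
    return warnings

picasso = ("Pablo Picasso, Pablo, Picasso, painting, cubist, painting, ceramics, "
           "collage, Spain, Guernica, Paris, 20th century, Girl Before a Mirror")
print(check_keywords(picasso))          # no warnings expected for this list
print(check_keywords("art, art, art"))  # flags the triple repeat
```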
What keywords are people searching for? It's important to focus on the right ones. Zipf's Law predicts that traffic for any particular keyword on a search engine will be inversely proportional to its popularity rank. That is, the number of queries (and hence potential clickthroughs to your site) for the most popular keyword will be ten times greater than that for the tenth most popular term. And traffic to term #10 will be 1,000 times higher than traffic to term number 10,000. Search engine logs don't quite match Zipf's curve, and they vary from one engine to the next. But the lesson remains: If you're not matching the top keywords, forget it.
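Both ratios follow directly from the idealized form of the law, in which traffic is proportional to 1/rank. A short Python sketch makes the relationship explicit; the function name is hypothetical, and real query logs will deviate from this curve, as noted above.

```python
def traffic_ratio(rank_a, rank_b):
    """How many times more traffic rank_a gets than rank_b if traffic ~ 1/rank."""
    # (1 / rank_a) / (1 / rank_b) simplifies to rank_b / rank_a.
    return rank_b / rank_a

print(traffic_ratio(1, 10))        # 10.0
print(traffic_ratio(10, 10_000))   # 1000.0
```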
Where to find the top keywords? Two free resources are searchterms.com and a weekly emailing from Wordtracker. Keyword popularity varies from search engine to search engine, but across the Web (and according to a few well-placed contacts at search engines) these listings are close enough. For a more interactive approach, try GoTo's Search Term suggestion tool, which lets you enter keywords and then shows you how popular similar search terms are on the site.
The DESCRIPTION field gets used for the page summary on Inktomi and some other engines, so don't cram it with keywords: A scary-looking description on a search engine's results page could discourage people from clicking through to your page, even if it scores high. (We'll cover more on descriptions in Step 3.)
It never hurts to have the search terms you want to match near the top of the page. But cramming in a list of spam-style keywords can also backfire: Google will display them under the page title on its results page, and Inktomi (like many others) will show them if there is no DESCRIPTION tag.
Stuffing long strings of repeated keywords into pages used to magically get them to the top of search engine results, but that was before the search engineers realized what was going on and learned how to prevent this from happening. Once in a while you'll see a "spamdexed" page near the top of your results, but this trick works less and less frequently these days.
Links from Other Domains
Look at the top results for the terms you most want to match. Will those sites link to you from their domain? If they do, some of their relevance will rub off on your pages. There are ways to use this dishonestly (see "How to Cheat Honestly" below), but usually sites only link to other sites they're comfortable being associated with.
Even if your site does manage to claw its way to a plum position in the search results, that doesn't guarantee that users will follow the link: that still takes some convincing.
Step 3: Get Clicked
All of the work you've done to get your site crawled and placed at the top of the rankings is meaningless if you neglect the final step: Getting the searcher to click through to your site. These days, few users will click on a page described as "Pablo Picasso Pablo Picasso Pablo Picasso art art art art" in search engine results. But if you use TITLE to specify the most likely search term that matches the page, and DESCRIPTION to provide a quick (50 words max) synopsis of the info on the page, your site will attract a lot more clicks.
For Artloop's artist profile pages, we specified that TITLE contain the artist's name, and the DESCRIPTION would hold the summary that appears later in the page, like this:
<TITLE>Artloop: Andy Warhol (Warhola)</TITLE>
<META NAME="description" content="American painter, born in Pittsburgh, and a leading figure in Pop">
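If you generate pages from templates, it can be worth checking these limits automatically. The sketch below uses Python's standard html.parser module to pull out the TITLE and the META description and compare them with the rules of thumb from this article (TITLE under 40 characters, DESCRIPTION of 50 words at most). The class and function names are hypothetical, and the limits are guidelines rather than anything the search engines have confirmed.

```python
from html.parser import HTMLParser

MAX_TITLE_CHARS = 40        # "less than 40 characters seems to work best"
MAX_DESCRIPTION_WORDS = 50  # "50 words max"

class HeadTags(HTMLParser):
    """Collect the TITLE text and META description from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def check_head(html_text):
    """Return warnings for a page head that breaks the guidelines above."""
    parser = HeadTags()
    parser.feed(html_text)
    parser.close()
    warnings = []
    if len(parser.title.strip()) >= MAX_TITLE_CHARS:
        warnings.append("TITLE is 40 characters or longer")
    if len(parser.description.split()) > MAX_DESCRIPTION_WORDS:
        warnings.append("DESCRIPTION is longer than 50 words")
    return warnings

print(check_head("<TITLE>Artstuff: Your Number One Online Resource "
                 "for Fine Art Solutions!!!</TITLE>"))
```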
Don't Scare Them Away
This is where gateway pages, redirects, shadow domains, and other trickery often fail: The would-be customer gets to your site only to discover it contains confusing pages, poor navigation, gratuitous redirects, or exactly the same content as the last site they looked at. Huh? When users find pages of such a dubious nature, do you think they're going to trust the site with their credit card number on, say, a $1400 order for two DJ turntables? I sure didn't: When I landed at a site like that recently, I immediately clicked Back and wound up dropping my money on a pair of pricey Technics decks at a site that looked like a real, honest company, rather than a network of sites designed to capture me.
Another mistake new Web marketers make is trying to stop search engines from sending users directly to individual pages on the site, something they huffily call "deep linking." They'll force their Webmaster to redirect anyone who hasn't come through the site's front door back to the home page, as if the site were a brick-and-mortar store. This is usually justified as "customer experience" and "branding," but all it really says is the site doesn't trust its customers to know what they want.
I'm guessing most sites abandon this practice once they look at their log files and see their would-be customers abandoning the site after being pulled away from a product they were ready to buy.
All that said, there are ways to beat the system, as long as you don't mind getting your hands a little dirty.
How to Cheat Honestly
As much as I talk up Google, their ranking system isn't foolproof. In short, it ranks individual URLs based on which other URLs link to them, which URLs link to those, and so on. That's the simplified explanation; you can read about eigenvectors and normal link matrices in this paper written by Google's creators.
While the system works better than old search engine rankings based on keywords and page content, it's not perfect. Links from popular sites can count more than they should, or not enough if the link comes from an obscure page.
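To get a feel for how "which URLs link to those, and so on" turns into a number, here is a toy Python sketch of link-based scoring by repeated averaging (power iteration). It is a simplified illustration of the general idea, not Google's production ranking: the damping value of 0.85 is the figure suggested in the published paper, and the tiny link graph and page names are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate link-based scores by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores

# Made-up graph: "target" is linked from two well-connected pages and one
# page that nobody else links to.
example_links = {
    "hub": ["a", "b"],
    "a": ["target"],
    "b": ["target"],
    "obscure": ["target"],
    "target": ["hub"],
}
scores = pagerank(example_links)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:8s} {score:.3f}")
```

In this toy graph the page nobody links to ends up with only the baseline score, so its link to "target" carries far less weight than the links from pages that themselves receive links.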
But when Google's engineers read the original version of this article, they bristled at some of our suggestions even though we'd tested them. Emails led to phone calls, and eventually we spent a caffeinated afternoon at the Googleplex in Mountain View, CA, using whiteboards and napkins to sketch out what actually raises your rankings, and what doesn't. We came away with some solid suggestions for where to invest your time wisely:
- Make sure your dynamic pages are crawlable (see above), and make sure the URLs remain constant. If you use one URL on the site map, another for the dynamically generated page, and yet another after giving the user a cookie, the URLs other sites use to link to your pages may not be the same as the one Google indexes. URL inconsistency keeps your pages from being ranked as high as they should be.
- Google crawls the Web in descending order of PageRank, meaning the highest ranked pages are crawled first and most often. So while a crawler page will make your pages findable, getting other sites to link to the individual pages will get them crawled more completely, and thus raise their scores.
- Focus on getting pages that are considered the authority on the topic you cover to link to your pages. Notice we said pages, not sites. For example, I have a page that's listed by Yahoo, but it's on an obscure part of the directory that no one else links to, so it doesn't help me as much as that link from Dave Winer's blog.
- Ranking trickles down through popular domains with lots of interpage links, raising the value of all pages on a popular site and hence any page it links to. This is something all bloggers have realized. For example, let's say a post on my blog gets Slashdotted. Not many Web pages will link to the actual Slashdot post, so you'd think it wouldn't do much for my site's scores. But the value of the many links to Slashdot's home page trickles down through to the navigable links inside the site, and eventually to the posting about my page.
- Creating fake domains is a popular trick people use to try to raise their Google scores, hoping to make it appear that other domains are linking to them. The Google guys giggle at this obvious scam: If you understand how vectors work, spreading your pages across multiple domains, or building duplicate sites, does no better than if you'd simply added those pages to your original domain. That's because it's the number of inbound links from elsewhere on the Web that raises your overall score, and it's unlikely that fake domains will make that number go up. Google does make some score adjustments concerning URLs within the same domain to improve the overall results quality, but spreading your pages across ten domains won't do much. And according to Google's anti-spam cop, duplicate domains are the easiest scam to spot.
- You can find authoritative pages by using Google's "link:" operator on pages that come up at the top of the search results. For example, search Google for "link:www.webmonkey.com". The result is a list of the pages that link to Webmonkey's front door, listed in descending PageRank order. Who'd have guessed that Lycos's "Legal Terms and Conditions" page was such a hot property?
See? There are a lot of ways to improve your site ranking, and they're all relatively easy. So why on earth would you ever pay someone else to do it?
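To make the URL-consistency suggestion above concrete, here is a minimal Python sketch that reduces several variants of a dynamic URL to one stable form you could use everywhere: on the site map, in internal links, and after a visitor gets a cookie. The function name is hypothetical, and so is the list of per-visitor parameters; "uid" and "sessionID" are borrowed from the example URLs earlier in this article, and your own application will have its own names.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical names of parameters that vary per visitor rather than per page.
SESSION_PARAMS = {"sessionid", "uid"}

def canonical_url(url):
    """Return one stable form of `url` for site maps and internal links."""
    parts = urlsplit(url)
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    query.sort()  # the same parameters in any order give the same URL
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(query),
        "",  # drop the fragment; it is never sent to the server
    ))

print(canonical_url("http://Artloop.com/store?uid=456&sku=123"))
print(canonical_url("http://artloop.com/store?sku=123&sessionID=9"))
```

Both calls print the same URL, which is the point: every inbound link then counts toward a single indexed page rather than being split across near-duplicates.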
Don't Buy the Snake Oil
The beat-the-system allure of search engine optimization seems to draw the same sort of folks who populate Las Vegas. That's what motivated me to write this article: Too many self-marketing site owners I talked to had bought into a cops-and-robbers game at the expense of their budget and mental focus.
What's more, some of the software solutions we looked at, which stuff tags and create gateway pages, did nothing ... or worse. You can guess why: Search engine developers buy copies of the same software, learn how to recognize its output, and then demote your site or block it altogether when they spot that pattern in your pages. At that point, the software's maker offers an upgrade (for a fee, of course) so you can get around the blocking. As soon as the search engineers figure out the upgrade, you're out again and need another upgrade. Great business model, isn't it?
What's even more repelling, some of the consultants for optimization were outright con men. One claimed to be CEO of a huge consulting firm (its website lists a dozen unfamiliar but big-sounding companies as clients), but if you call the office numbers listed on their sites, guess who answers the phone?
We're not saying all SEO consultants are crooks, but we do recommend that you research their results skeptically before you spend, especially since you can probably do just as well by yourself. This article entered Google's index at #7 for "search engine optimization" a month after it was published, ahead of hundreds of professional SEO sites, mostly because we took our own advice. (On that note: If you love this article, the best way to thank us is to link to it!)
While playing with Google's "link:" feature, we found a wealth of sites and mailing lists offering even more free advice. We emailed and called the best ones with a burning question: Where to send the 100 questions a month we're getting about this article? And a short list of starting points emerged ....
Final Tips and Resources
If you're looking for more search-optimization guidance, try these resources:
- Since 1995, Danny Sullivan has been tracking and reporting on search engines at Search Engine Watch. He offers this advice: "Start by designing search-engine-friendly pages. Use good titles and good copy (i.e., text on the page) to match popular search terms and tap into natural traffic. People only use tricks to make up for the fact they don't have good copy."
- Danny suggests you tap into email forums like I-Search and the Rank Write Roundtable, where Webmasters discuss what works and doesn't, and share news and tips.
- Try the search engines themselves: they offer all kinds of information about search engine optimization, rankings, relevancy, submission, keywords, META tags, and everything else. Or just look at the pages whose rankings you wish you had, and see if you can reverse-engineer what they've done.
- Browse your local computer book store. There are many books on search engine optimization, but read them skeptically: I ordered one for $50, and it turned out to be a 500-page infomercial for products and services with suspicious ties to the author.
- Check out Artloop's source code. Christina and company will continue to think up new ways to achieve search optimization, and our very latest strategies will be implemented in that site's pages.
Between this article and the other resources listed, you can get more traffic to your pages than most of the $5,000-plus consulting jobs we looked at while writing our spec for Artloop, and you'll wind up with a lot less cruft cluttering up your site. But it's easy to get obsessed with search engine optimization. Just like with the lottery or the ponies, people get sucked in, hoping to find that magic META tag that saves them from ever having to work again.
Don't Get Obsessed
Search engine optimization can do a lot for your traffic, just like a good retail location. But it's a game of rapidly diminishing returns. That's why the people who do it professionally are consultants who keep moving from client to client.
The best strategy is to design your site to be crawled and ranked well from the start, rather than tacking on keyword-laden gateway pages and shadow domains after the fact. And if you're looking to become a world-class business, quit nit-picking your search engine ranking and look at successful Web merchants, like my hometown favorite, L.L. Bean: Their success comes from being what people are searching for in the first place.
Paul Boutin is a senior editor at Wired magazine. He discovered the Internet in 1980 as an MIT freshman and hasn't slept since.