Real Story Group. Make Better Technology Decisions.

Adriaan Bloem

Scared of Spiders?

3-Sep-2008

Tags: Enterprise Search, Implementation, Industry Standards, Information Architecture, Google Search Appliance

My girlfriend really doesn't like spiders, so I regularly get to roll up my newspaper and squash one. I'm not so much afraid of them, but that's starting to change: they can be lethal to a website.

In Enterprise Search, we tend to use "spider" and "crawler" as synonyms for the piece of software that discovers a website's URLs and fetches the pages. The thing is, though, the spider should crawl your websites; it shouldn't bring your websites to a crawl. I was reminded of this when reading about Twiceler, the spider employed by the new public web search engine Cuil. It seems that in an effort to be a Google-killer, Cuil has been hammering websites into submission.

Google itself has created its fair share of problems as well (you may remember the spider of doom which deleted a complete site). And often, it's reasonable to point the finger at your CMS or website implementation -- your website shouldn't run away frightened when confronted with one of these arachnids.

But more importantly, it's very easy to do the same thing yourself with your own site search engine. I recently demonstrated this when I ran a proof of concept for a consulting customer: the four-threaded spider I let loose on their webserver brought the site down, and most of the resulting index consisted of the dreaded "500 Internal Server Error." I apologized, even though the outage lasted less than a minute and otherwise went unnoticed, and counted myself lucky to be working with a product that actually let me tune the crawler to less aggressive settings.
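To make "less aggressive settings" concrete, here is a minimal sketch in Python of what such tuning usually amounts to: one request at a time, a minimum pause between requests, and giving up when the server keeps answering with 5xx errors. It is not the configuration of the product mentioned above; the names `Throttle`, `crawl`, `min_interval` and `max_errors` are hypothetical, and the `fetch` function is passed in so the sketch stays independent of any particular HTTP library.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


class Throttle:
    """Enforce a minimum gap (in seconds) between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], int],  # hypothetical: returns the HTTP status code
    throttle: Throttle,
    max_errors: int = 3,
) -> List[Tuple[str, int]]:
    """Fetch URLs sequentially; stop if the server fails max_errors times in a row."""
    results: List[Tuple[str, int]] = []
    consecutive_errors = 0
    for url in urls:
        throttle.wait()
        status = fetch(url)
        results.append((url, status))
        if status >= 500:
            consecutive_errors += 1
            if consecutive_errors >= max_errors:
                break  # the site is struggling: stop instead of indexing error pages
        else:
            consecutive_errors = 0
    return results
```

With a real `fetch` function, `crawl(urls, fetch, Throttle(min_interval=2.0))` would request at most one page every two seconds. Stopping on repeated 5xx responses also keeps the index from filling up with error pages.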

If you're going to run your own search engine, it's important to know what its spider does and whether you can keep it under control. You don't want it to interfere with normal operations and you probably don't want it to crawl away where you can't find it anymore. And this is something you'll want to know even before testing the software: read up on the details first.

And in case all of this arachnophobia gets you into the sci-fi mood: remember that robots don't always deter spiders, even though technically, they should...
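Assuming the "robots" above refers to the robots.txt convention, a well-behaved spider checks that file before fetching anything. Below is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the user agent name `my-site-search` and the example.com URLs are made up for illustration. Nothing forces a crawler to run this check, which is exactly why robots don't always deter spiders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer access questions."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    agent = "my-site-search"  # hypothetical crawler name
    print(allowed(parser, agent, "http://www.example.com/products/"))
    print(allowed(parser, agent, "http://www.example.com/admin/users"))
    print(parser.crawl_delay(agent))  # seconds requested between fetches, if any
```

Against a live site, `parser.set_url(...)` followed by `parser.read()` would load the real file instead of the inlined text.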

Copyright Real Story Group 2001 - 2012. All rights reserved.
