Team Blog
Bloem

Scared of Spiders?

Added by Adriaan Bloem on 3-Sep-2008 | Twitter: @adriaanbloem

My girlfriend really doesn't like spiders, so I regularly get to roll up my newspaper and squash one. I'm not so much afraid of them, but that's starting to change: they can be lethal to a website.

In Enterprise Search, we tend to use "spider" and "crawler" as synonyms: the piece of software that discovers a website's URLs and fetches the pages. The thing is, though, the spider should crawl your websites; it shouldn't bring your websites to a crawl. I was reminded of this when reading about Twiceler, the spider employed by the new public web search engine Cuil. It seems that in an effort to be a Google-killer, Cuil has been hammering websites into submission.

Google itself has created its fair share of problems as well (you may remember the spider of doom which deleted a complete site). And often, it's reasonable to point the finger at your CMS or website implementation -- your website shouldn't run away frightened when confronted with one of these arachnids.

But more importantly, it's very easy to do the same thing yourself with your own site search engine. I recently demonstrated this when I ran a proof of concept for a consulting customer: the four-threaded spider I let loose on their webserver brought the site down, and most of the resulting index consisted of the dreaded "500 internal server error." I apologized -- even though it went otherwise unnoticed (the outage lasted for less than a minute) -- and counted myself lucky to be working with a product that actually let me tune the crawler to less aggressive settings.
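To make "less aggressive settings" a bit more concrete, here is a minimal sketch of the two knobs that would have saved me: a pause between requests, and a circuit breaker that stops the crawl once the server starts answering with 5xx errors. It is not how any particular search product does it. The names Throttle, crawl, min_interval and max_errors are made up for illustration, and fetch stands in for whatever your spider uses to request a page and return the HTTP status code.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


class Throttle:
    """Enforce a minimum pause between successive requests (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Sleep until min_interval has passed since the last call; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], int],
    throttle: Throttle,
    max_errors: int = 3,
) -> List[Tuple[str, int]]:
    """Fetch URLs one at a time; give up after max_errors consecutive 5xx responses."""
    results: List[Tuple[str, int]] = []
    consecutive_errors = 0
    for url in urls:
        throttle.wait()  # single-threaded and paced, unlike my four-threaded run
        status = fetch(url)
        results.append((url, status))
        if status >= 500:
            consecutive_errors += 1
            if consecutive_errors >= max_errors:
                break  # the server is struggling: stop instead of indexing error pages
        else:
            consecutive_errors = 0
    return results
```

Calling crawl(urls, fetch, Throttle(min_interval=1.0)) would request at most one page per second. The clock and sleep parameters exist only so the pacing can be tried out without actually waiting.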

If you're going to run your own search engine, it's important to know what its spider does and whether you can keep it under control. You don't want it to interfere with normal operations and you probably don't want it to crawl away where you can't find it anymore. And this is something you'll want to know even before testing the software: read up on the details first.

And in case all of this arachnophobia gets you into the sci-fi mood: remember that robots don't always deter spiders, even though technically, they should...
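The "robots" here are robots.txt files, and they are advisory: nothing stops a spider from ignoring them, so it is up to the crawler to check. As a rough sketch of what a well-behaved spider does before fetching a page, the following uses Python's standard urllib.robotparser module. The robots.txt content, the ExampleSiteSearchBot user agent and the example.com URLs are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: our own site search spider may crawl everything
# except /admin/ and is asked to pause 5 seconds between requests; every
# other robot is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: ExampleSiteSearchBot
Crawl-delay: 5
Disallow: /admin/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
agent = "ExampleSiteSearchBot"

# A polite spider asks before every fetch and skips disallowed URLs.
may_crawl_products = parser.can_fetch(agent, "https://www.example.com/products/")
may_crawl_admin = parser.can_fetch(agent, "https://www.example.com/admin/login")

# Crawl-delay is a non-standard but widely used directive; None if absent.
delay_seconds = parser.crawl_delay(agent)
```

Against a live site you would call set_url() and read() on the parser instead of parsing a string. Either way, the check only helps if the spider actually performs it, which is exactly the catch.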


Categories: Adriaan Bloem, Search and Information Access, Implementation, Industry Standards, Information Architecture, Google Search Appliance


Copyright, 2001 - 2010, Real Story Group. All rights reserved.
