Contents of this page
Search engine robots and others
Link Checkers, Link monitors and bookmark managers
Validators
FTP clients and download managers
Browsers
Offline browsers and other agents
Other miscellaneous agents
Sites that regularly visit
Awards for this page
Search engines and other sites send robots to read and index your pages. This page
reverses that process and indexes the robots. This information has been
gleaned by looking at the server logs for www.jafsoft.com. Whenever a page is
read from a web site, the log file records a number of details including the
time, the IP address and usually the referrer page and the user agent. You
can see this in our analysis of a server log sample.
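As a rough illustration of the details a log entry records, the sketch below pulls the time, IP address, referrer and user agent out of a single line. It assumes the widely used Apache "combined" log format, which this page does not confirm as the format of the www.jafsoft.com logs. The sample line and the name `parse_log_line` are made up for illustration.

```python
import re

# Assumed layout: Apache "combined" log format (an assumption, not
# something stated on this page).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample entry, for illustration only.
sample = (
    '192.0.2.10 - - [05/Jul/2001:10:15:32 +0000] '
    '"GET /index.html HTTP/1.0" 200 5120 '
    '"http://www.example.com/links.html" "ExampleBot/1.0"'
)
entry = parse_log_line(sample)
print(entry["time"], entry["ip"], entry["referrer"], entry["agent"])
```

Lines that do not fit the assumed layout return `None`, so they can be set aside and inspected by hand.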
Unlike many pages that list web robots, this page actually tries to visit the robots themselves. Where possible, links are provided to the robots' home pages, along with descriptions of what they're up to. This page is updated regularly as more information is found (the last update was on 5-Jul-2001).
Well-behaved robots will identify themselves, often supplying web or email addresses you can contact. In any case, the pattern of pages being read and the IP addresses being used soon sorts the men from the robots.
Good robots will read robots.txt to see what your site policy is, but there are other ways of spotting robots. In addition to the search engine robots, other "user agents" will visit your site, e.g. to validate links to your site from other people's pages. Often these will just access the HEAD of the file, rather than doing a GET on the whole file.
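The two tell-tale signs just mentioned (a request for robots.txt, or HEAD requests with no GET) can be checked mechanically. This is a minimal sketch, assuming the log has already been reduced to `(ip, method, path)` tuples. That tuple shape, the function name `likely_robots` and the sample entries are all hypothetical.

```python
from collections import defaultdict


def likely_robots(entries):
    """Return {ip: reason} for visitors that behave like robots.

    `entries` is an iterable of (ip, method, path) tuples, a hypothetical
    shape assumed to have been extracted from the server log already.
    """
    methods = defaultdict(set)
    read_robots_txt = set()
    for ip, method, path in entries:
        methods[ip].add(method.upper())
        if path.lower() == "/robots.txt":
            read_robots_txt.add(ip)

    flagged = {}
    for ip, used in methods.items():
        if ip in read_robots_txt:
            flagged[ip] = "requested robots.txt"
        elif used == {"HEAD"}:
            flagged[ip] = "only sent HEAD requests"
    return flagged


# Made-up entries, for illustration only.
entries = [
    ("192.0.2.1", "GET", "/robots.txt"),
    ("192.0.2.1", "GET", "/index.html"),
    ("192.0.2.2", "HEAD", "/index.html"),
    ("192.0.2.3", "GET", "/index.html"),
]
print(likely_robots(entries))
```

Neither sign is proof on its own, so the result is best treated as a shortlist of addresses to look at more closely.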
You can also visit our page describing the engines in some detail.
This page is regularly converted from this text file by the author's own text-to-HTML converter, AscToHTM, which is available as shareware (cost $40). The last update was on 5-Jul-2001.
The following table lists the search engines that spider the web, the IP addresses that they use, and the robot names they send out to visit your site. Version numbers are usually included in the robot names, but are omitted here except where they imply a visit from a different IP address or (as with Inktomi) a different search engine.
Often multiple IP addresses are used, in which case we just give a flavour of the names or numbers. Inktomi is a company that offers search engine technology, which is used by a number of sites (e.g. www.snap.com and www.hotbot.com).
Wherever <nn> appears, it indicates that a number of different digits may be used.
Home page/search engine |
Robot identifier |
IP address(es) |
---|---|---|
www.aesop.com |
AESOP_com_SpiderMan |
209.189.115.49 |
www.alexa.com |
ia_archiver |
green.alexa.com sarah.alexa.com |
www.altavista.com |
Scooter |
test-scooter.pa.alta-vista.net brillo.pa.alta-vista.net av-dev4.pa.alta-vista.net scooter.aveurope.co.uk bigip1-snat.sv.av.com |
Mercator |
mercator.pa-x.dec.com scooter.pa.alta-vista.net election2000crawl-complaints-to-admin.webresearch.pa-x.dec.com |
|
Scooter2_Mercator_3-1.0 |
scooter.sv.av.com |
|
roach.smo.av.com-1.0 |
avfwclient.sv.av.com |
|
Tv<nn>_Merc_resh_26_1_D-1.0 |
tv<nn>.sv.av.com |
|
www.altavista.co.uk |
AltaVista-Intranet jan.gelin@av.com |
host-119.altavista.se |
www.alltheweb.com |
FAST-WebCrawler crawler@fast.no |
209.67.247.154 |
www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html |
||
Wget |
ext-gw.trd.fast.no |
|
www.acoon.de |
Acoon Robot |
194.231.42.178 |
www.atomz.com |
Atomz |
router-sc.atomz.com |
www.crawler.de |
Crawler admin@crawler.de |
crawlit.crawler.de |
www.daum.net |
RaBot Agent-admin/ phortse@hanmail.net |
210.183.28.46 |
contact/jylee@kies.co.kr |
211.50.57.6 |
|
RaBot Agent-admin/ webmaster@kisco.go.kr |
202.30.94.34 |
|
www.excite.com |
ArchitextSpider |
Musical instruments are used in the names, such as viola.excite.com cello.excite.com piano.excite.com kazoo.excite.com ride.excite.com sabian.excite.com sax.excite.com bugle.excite.com snare.excite.com ziljian.excite.com bongos.excite.com maturana.excite.com mandolin.excite.com piccolo.excite.com kettle.excite.com ichiban.excite.com (and the rest of the band). More recently, first names are being used, like philip.excite.com peter.excite.com perdita.excite.com macduff.excite.com agouti.excite.com |
(excite) |
ArchitectSpider |
crimpshrine.atext.com ichiban.atext.com |
www.euroseek.net |
Arachnoidea arachnoidea@euroseek.net |
212.209.54.134 |
www.ezresults.com |
EZResult |
216.28.23.59 |
www.findsame.com |
DIIbot |
207.230.106.188 |
(see also www.powerinter.net below) |
robot@digital-integrity.com |
|
www.fireball.de |
KIT-Fireball |
???? |
www.geckobot.com |
geckobot |
???.rdc1.az.coxatwork.com |
www.gendoor.com (Genealogical Search Engine) |
GenCrawler |
???? |
www.google.com |
Googlebot googlebot@googlebot.com http://googlebot.com/ |
c<nn>.googlebot.com |
www.goo.ne.jp |
moget/2.0 moget@goo.ne.jp |
202.229.31.13 |
(inktomi) |
Slurp.so/1.0 |
q2004.inktomisearch.com |
slurp@inktomi.com |
j5006.inktomisearch.com |
|
(inktomi) |
Slurp/2.0j |
202.212.5.34 |
slurp@inktomi.com www.inktomisearch.com |
goo313.goo.ne.jp |
|
(inktomi) |
Slurp/2.0-KiteHourly slurp@inktomi.com; www.inktomi.com/slurp.html |
y400.inktomi.com |
(inktomi) |
Slurp/2.0-OwlWeekly spider@aeneid.com www.inktomi.com/slurp.html |
209.185.143.198 |
(inktomi) |
Slurp/3.0-AU slurp@inktomi.com www.inktomisearch.com |
j6000.inktomi.com |
www.hubat.com |
Hubater |
209.114.176.250 |
www.infoseek.com |
UltraSeek |
cde2c923.infoseek.com cde2c91f.infoseek.com |
InfoSeek Sidewinder |
cca26215.infoseek.com |
|
www.informatch.com/mediabot/ |
MP3Bot |
212.204.169.52 |
www.ip3000.com |
C-PBWF-ip3000.com-crawler ip3000.com-crawler |
www.ip3000.com |
www.lexis-nexis.com |
LNSpiderguy |
firewall5.lexis-nexis.com |
www.looksmart.com |
MantraAgent |
fjupiter.looksmart.com |
www.lycos.com |
Lycos_Spider_(T-Rex) |
bos-spider<n>.bos.lycos.com 216.35.194.188 |
www.mirago.co.uk |
HenryTheMiragoRobot |
194.202.39.46 |
www.northernlight.com |
Gulliver |
marvin.northernlight.com taz.northernlight.com |
www.portaljuice.com |
PJspider |
timber.nextopia.com |
www.powerinter.net but it won't let us in :-( |
DIIbot |
node-d8e93393.powerinter.net |
http://navi.ocn.ne.jp/ |
nttdirectory_robot super-robot@super.navi.ocn.ne.jp |
lilis00.navi.ocn.ne.jp |
griffon griffon@super.navi.ocn.ne.jp |
lilis04.navi.ocn.ne.jp |
|
www.maxbot.com |
Spider/maxbot.com admin@maxbot.com |
search.wport.com |
??? |
various (fakes agent on each access) |
pool0058.cvx2-bradley.dialup.earthlink.net |
??? |
gazz/1.0 |
deleuze.infobee.ne.jp |
gazz@nttrd.com |
derrida.infobee.ne.jp |
|
??? |
??? |
search-8.xift.com |
www.nationaldirectory.com |
NationalDirectory-SuperSpider |
spider.nationaldirectory.com 209.116.58.143 |
www.pinpoint.com |
CrawlerBoy Pinpoint.com |
nitrogen.pinpoint.com |
www.petersnews.com |
user<n>.ip3000.com |
news<n>.petersnews.com |
http://www.vestris.com/alkaline |
AlkalineBOT |
host130.uv-ray.com |
www.singingfish.com |
asterias |
grouper.singingfish.com |
www.speedfind.de |
speedfind ramBot xtreme |
BWEB.highway.telekom.at |
www.surfnomore.com |
Surfnomore Spider v1.1 |
165.90.194.245 |
www.supersnooper.com |
Robot@SuperSnooper.Com |
207.8.212.162 |
www.travel-finder.com |
ESISmartSpider |
202.46.33.15 |
www.uksearcher.co.uk |
UK Searcher Spider |
- |
www.walhello.com |
appie |
...speed.planet.nl |
www.websmostlinked.com |
Nazilla |
- |
www.webwombat.com.au |
www.WebWombat.com.au |
202.139.99.131 |
www.webtop.com |
MuscatFerret |
ferret<nn>.webtop.com |
www.whizbanglabs.com |
WhizBang! Lab |
216.250.143.108 |
www.wisenut.com |
ZyBorg |
- |
(in beta) |
(info@WISEnut.com) |
|
www.wire.co.uk |
WIRE WebRefiner: webrefiner@wire.co.uk |
brighton.wire.co.uk |
www.worldsearchcenter.com |
WSCbot |
??? |
<client sites> |
libwww-perl |
www.linpro.no/lwp/ |
http://verno.ueda.info.waseda.ac.jp/ |
||
Iron33 |
207.18.183.251 |
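Entries in the table above such as c<nn>.googlebot.com can be turned into patterns for matching host names found in a log. The sketch below is one way to do that. It assumes that `<n>` and `<nn>` both stand for one or more decimal digits, which is an interpretation of the note about <nn> rather than something the page spells out, and `placeholder_to_regex` is a hypothetical helper name.

```python
import re


def placeholder_to_regex(pattern):
    """Turn a table entry such as 'c<nn>.googlebot.com' into a compiled regex.

    Assumption: <n> and <nn> both stand for one or more decimal digits.
    """
    parts = re.split(r"(<n+>)", pattern)
    regex = "".join(
        r"\d+" if re.fullmatch(r"<n+>", part) else re.escape(part)
        for part in parts
    )
    return re.compile(regex + r"\Z", re.IGNORECASE)


googlebot = placeholder_to_regex("c<nn>.googlebot.com")
print(bool(googlebot.match("c12.googlebot.com")))
print(bool(googlebot.match("www.googlebot.com")))
```

Everything outside the placeholders is escaped, so the dots in a host name match only literal dots.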
Link checkers and bookmark managers are run by people wanting to keep their pages and bookmarks up to date. Being visited by a link checker is good news as it means that someone has linked to you, and cares that you're still alive. Link monitors regularly check your pages for changes, usually because someone has selected your page as "one to watch".
(pause for warm glow :-)
If you have access to the server log, check the referrer page to try and get the URL from which you are linked. Sometimes these URLs are inside password protected parts of sites, so you won't be able to view the page.
If you build up a list of sites that link to you, these are the guys you should tell when you move (moral - never move).
It's also quite common for the link checker to give no indication of which URL it's coming from. Some link checkers always come from the same IP address; more usually they come from the client's site. It depends on whether the site owner has purchased a copy of the link checking software, or signed up to some centralized link checking service. If they blank the referrer URL field but you have the client's IP address, you can always try visiting that address and surfing their site.
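Building the list of linking pages from the referrer field, and noting which visitors sent no referrer at all, might be sketched as follows. The `(ip, referrer)` input shape, the function name `linking_pages` and the sample data are hypothetical, and treating "" or "-" as a blank referrer is an assumption about how the log records it.

```python
from collections import Counter
from urllib.parse import urlsplit


def linking_pages(entries, own_host):
    """Count external referrer URLs and collect IPs that sent no referrer.

    `entries` is an iterable of (ip, referrer) pairs; a referrer of "" or "-"
    is treated as blank (an assumption about how the log records it).
    """
    external = Counter()
    no_referrer = set()
    for ip, referrer in entries:
        if referrer in ("", "-"):
            no_referrer.add(ip)
            continue
        host = urlsplit(referrer).hostname or ""
        if host.lower() != own_host.lower():
            external[referrer] += 1
    return external, no_referrer


# Made-up entries, for illustration only.
entries = [
    ("192.0.2.1", "http://www.example.org/links.html"),
    ("192.0.2.2", "http://www.example.org/links.html"),
    ("192.0.2.3", "http://www.jafsoft.com/index.html"),
    ("192.0.2.4", "-"),
]
external, no_referrer = linking_pages(entries, "www.jafsoft.com")
print(external.most_common())
print(sorted(no_referrer))
```

The addresses in `no_referrer` are the ones worth visiting directly, as suggested above.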
Some of these tools appear to imply they're extracting email addresses (e.g. EmailSiphon). As such they're probably unwelcome visitors, since these addresses are probably being collected for spammers. You can read more about this at www.csc.ncsu.edu/~brabec/antispam.html
A page listing various link checkers (and other tools) can be found at www.softwareqatest.com/qatweb1.html#LINK
Robot identifier | IP address(es) | Link Checker home page |
---|---|---|
LinkWalker | lw.seventwentyfour.com 209.167.50.23 | www.seventwentyfour.com |
LinkAlarm | linkalarm.com | www.linkalarm.com |
NetMind-Minder | marvin.netmind.com (retired) gary.netmind.com meg.netmind.com inyanga.netmind.com leo.netmind.com gemini.netmind.com | www.netmind.com |
Check&Get | <client site> | http://checkget.udm.net/ (also shown as referrer page) |
CheckWeb | <client site> | www.asi.fr/~duby/chkweb.htm |
CNET_Snoop | www.download.com (only if you have software listed at that site) | |
EmailSiphon | <client site> | <email collector> We don't list information like this on this site. |
EmailWolf | <client site> | www.pixeltech.com.au/~msw/ewolf/index.html |
The Informant The Intraformant | cosmo.dartmouth.edu | http://informant.dartmouth.edu/ |
jdwhatsnew.cgi | <client site> | www.jdrowell.com/Linux/Projects/jdwhatsnew |
LinkLint-checkonly | -- | www.goldwarp.com/bowlin/linklint/ |
javElink | salix.ingetech.com | www.dailydiffs.com |
Lambda LinkCheck | 195.139.70.25 | www.stud.ifi.uio.no/~lmariusg/download/python/LinkCheck.html |
LinkScan Server | <client site> | www.elsop.com |
LinkSweeper | <client site> | www.lss.com.au/lss/windows/ls/linksweeper.htm |
LinkVerify Spider | frances.yourwebhost.com | www.enduser.co.uk/linkverify/ |
Linkbot | <client site> | www.tetranetsoftware.com/products/linkbot.htm |
Morning Paper | <client site> | www.boutell.com/morning/ |
NetLookout | -- | www.frugalsoft.com/lookout/ |
NetMechanic www.elsop.com | gamma.netmechanic2.com | www.netmechanic.com |
Rational SiteCheck | <client site> | www.rational.com/products/teamtest/prodinfo/sitecheck.jtmpl |
Robozilla | h-206-<n>-<n>-<n>.netscape.com | http://directory.mozilla.org/ (checks links in the dmoz directory) |
SyncIT | <client site> | www.bookmarksync.com |
WatzNew Agent | <client site> | www.watznew.com |
WebTrends Link Analyzer | <client site> | www.webtrends.com |
Xenu's Link Sleuth | <client site> | www.snafu.de/~tilman/xenulink.html |
Validators check your web pages for HTML correctness and standards compliance. Since other people are unlikely to send a validator to your site, you don't usually see much of this.
However, if you choose to validate your own site, the validation attempts will appear in your logs. Consequently the list below is limited to the on-line validator I use (and recommend) and a URL submission service that I use.
Robot Identifier | IP address | Validator home page |
---|---|---|
W3C_Validator | abyss.w3.org | http://validator.w3.org/ |
Tooter | selfpromotion.com | www.selfpromotion.com. This is used as part of a link submission agent (trebor@animeigo.com) |
If you offer files for download, then you'll start to be visited by various FTP clients. Clients like Go!Zilla and GetRight are smart in that they can resume downloads that have been interrupted. This relies on your web server supporting the necessary protocol, but that's fairly standard these days.
If your download files are over 1Mb in size (or if your server is slow), you'll often see the same IP address make multiple partial downloads of your file (look at the file size). In the case of clients like Go!Zilla and GetRight, if these add up to the right number of bytes, then chances are the download succeeded.
Client Identifier | FTP Client home page |
---|---|
BatchFTP | www.dynamicnet.net/products/batchftp.htm |
ChinaClaw | http://go2.163.com/~22787/chinaclaw.htm (Chinese) (Chinese download utility) |
DA | www.lidan.com www.downloadaccelerator.com |
Download Demon | www.netzip.com |
Download Wonder | www.forty.com |
Go!Zilla | www.gozilla.com |
GetRight MyGetRight | www.getright.com |
GetSmart | http://members.xoom.com/m507/ |
JetCar (or FlashGet) | www.amazesoft.com |
LeechFTP | http://stud.fh-heilbronn.de/~jdebis/leechftp/ |
Mass Downloader | www.geocities.com/SiliconValley/Vista/2865/md.htm |
NetZip Downloader SmartDownload | www.netzip.com |
NetAnts | www.netants.com |
Net Vampire | www.netvampire.com |
Octopus | http://moskalyuk.com/octopus/ |
RealDownload | http://service.real.com/help/faq/rdown4/rdownfaqa01.html |
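The check described above, adding up the partial downloads from one IP address and comparing the total with the file size, can be sketched like this. The `(ip, path, bytes_sent)` input shape, the `file_sizes` mapping, the function name and the sample figures are all hypothetical.

```python
from collections import defaultdict


def completed_downloads(entries, file_sizes):
    """Return (ip, path) pairs whose partial transfers add up to the file size.

    `entries` is an iterable of (ip, path, bytes_sent) tuples and `file_sizes`
    maps each download path to its size in bytes; both are hypothetical inputs.
    """
    totals = defaultdict(int)
    for ip, path, bytes_sent in entries:
        if path in file_sizes:
            totals[(ip, path)] += bytes_sent
    return sorted(
        key for key, total in totals.items()
        if total == file_sizes[key[1]]
    )


# Made-up figures, for illustration only.
entries = [
    ("192.0.2.7", "/files/tool.zip", 400000),
    ("192.0.2.7", "/files/tool.zip", 600000),
    ("192.0.2.8", "/files/tool.zip", 250000),
]
file_sizes = {"/files/tool.zip": 1000000}
print(completed_downloads(entries, file_sizes))
```

An exact match is only a strong hint. A client that fetches an overlapping range after resuming would overshoot the total, so a mismatch does not prove the download failed.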
Most browsers identify themselves with a string that begins "Mozilla...". I've chosen not to document those (as yet). Here are a few of the rarer browser identifiers that I've seen.
Browser identifier | Information |
---|---|
xChaos_Arachne | http://browser.arachne.cz/ (DOS-compatible browser. Linux version under development) |
IBrowse | http://www.hisoft.co.uk/ (search for IBrowse) Amiga-based browser |
ICab | http://www.icab.de/index.html (Macintosh-only) |
Konqueror | http://www.konqueror.org/konq-browser.html (Linux KDE browser) |
Lynx | http://lynx.browser.org/ (Cross-platform text-based browser) |
OmniWeb | http://www.omnigroup.com/products/omniweb/ (Macintosh-only) |
Opera | http://www.opera.com/ (Cross-platform, small, efficient and standards-led browser) |
pwWebSpeak | http://www.prodworks.com/issound/catalog/catalog_pwwebspeak.html Audio Browser |
QWeb | http://sunsite.auc.dk/qweb/ (Linux browser) (see also http://browserwatch.internet.com/news/story/qweb8.html) |
VMS_Mosaic | http://vaxa.wvnet.edu/vmswww/vms_mosaic.html (OpenVMS only version of Mosaic, a pre-Netscape browser) |
WannaBe | http://mindstory.com/wb2/ (Macintosh text-only browser) |
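Since most browsers send an identifier beginning "Mozilla", one simple way to surface the rarer identifiers is to count every user agent string that does not start that way. This is a minimal sketch. The function name `rare_agents` is hypothetical and the sample strings are made up rather than taken from real logs.

```python
from collections import Counter


def rare_agents(user_agents):
    """Count user agent strings that do not begin with 'Mozilla'."""
    return Counter(
        agent for agent in user_agents
        if not agent.startswith("Mozilla")
    )


# Made-up user agent strings, for illustration only.
seen = [
    "Mozilla/4.0 (compatible; ExampleBrowser 1.0)",
    "ExampleTextBrowser/2.8",
    "ExampleTextBrowser/2.8",
    "ExampleAudioBrowser/1.2",
]
for agent, count in rare_agents(seen).most_common():
    print(count, agent)
```

The same filter will also catch robots and download clients, so the output still needs sorting by hand into the categories used on this page.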
Agent Identifier |
Agent home page |
---|---|
AnswerChase |
www.answerchase.com/advan.html a personal search robot. |
beholder or e-sense |
www.vigiltech.com/esensedisclaim.html |
contype |
Possibly Adobe Acrobat or Reader or Adobe Acrobat Reader used with MSIE (I have been unable to confirm this) |
DaviesBot |
www.wholeweb.net/web/ |
DigOut4U |
www.arisem.com/Enu/ |
DISCoFinder |
www.ars.ru/eng/products/discof.asp |
eCatch |
www.ecatch.com |
EirGrabber |
http://www2p.biglobe.ne.jp/~eir/index.htm (Japanese software from the "Eir Project") |
Excalibur Internet Spider |
www.excalib.com/products/ispi/index.shtml |
ExtractorPro |
-- |
FairAd Client |
www.hager.co.at/fordelka/fairad.htm (German) A German pay-to-surf client |
FavOrg |
http://www.zdnet.com/pcmag/stories/solutions/0,8224,2649295,00.html A utility written by PC Magazine to fetch icons files (favicon.ico) for your IE favorites |
Favorites Sweeper |
www.manitoolssoftware.cjb.net. Another "favorites" tidy-up utility |
GigaBaz GigaBazVStheWeb crawler@brainbot.com |
http://brainbot.com/web/en/ |
Giskard |
http://212.145.12.170/ (Spanish) www.oralco.com (Trivia note: Giskard is probably named after the Isaac Asimov robot) |
infoGIST |
www.infogist.com |
iSiloWeb |
www.isilo.com/screensh.htm (for palm pilot) |
larbin |
http://pauillac.inria.fr/~ailleret/prog/larbin/index-eng.html |
LexiBot |
www.lexibot.com |
Links |
http://gossamer-threads.com/scripts/links/ (Link management cgi script) |
logikabot |
www.logika.net |
Kenjin Spider |
www.kenjin.com/kenjin/info.html |
Mata Hari (Internet search agent) |
www.thewebtools.com |
MoveAnnouncer |
www.moveannouncer.com (notifies webmasters when your pages have moved) |
MSIECrawler MSProxy |
(Microsoft IE4.0) |
NEC Research Agent |
http://heavenly.nj.nec.com/ Research "Inquirus" (meta?) search engine |
NexTools WebAgent |
www.igsnet.com/igs/wagent.html |
Offline Explorer |
www.metaproducts.com/OE.html |
Oxxbot1 |
www.oxxfordinfo.com (Data mining bot on IP 216.0.86.75) |
NetAttache |
Offline browser www.tympani.com/store/NAProDownload.html |
ParaSite |
www.ianett.com/parasite/ |
Phoaks |
www.phoaks.com/index.html. An index of web resources listed in UseNet. See also www.public.iastate.edu/~CYBERSTACKS/Aristotle.htm |
Pita (Chub.Stanford.EDU) |
-- |
PolyBot |
http://cis.poly.edu/polybot/ crawls from weasel.poly.edu and grampus.poly.edu |
PureSight |
www.puresight.com/Products/PureSightHomeDescription.htm |
Searchworks Spider |
www.nedesign.com/Phipps/products.html |
SilentSurf |
http://www4.silentsurf.com/ |
SiteMapper |
www.trellian.com/mapper/index.html |
SiteSnagger |
www.zdnet.com/pcmag/pctech/content/17/04/ut1704.001.html |
SpaceBison |
http://members.tripod.com/Proxomitron/features.html A web filter that is "ShonenWare", i.e. you should purchase a Shonen Knife CD if you use it. Shonen Knife are a great Japanese band, much loved by the late Kurt Cobain. Sometimes this sets the referrer page to the band's home page at http://www.mmjp.or.jp/knife/ (or maybe the users just happen to go there themselves). |
SpotOn |
www.spoton.com (IE add-on that organizes your browsing) |
SQ Webscanner |
http://macinsearch.com/users/webscanner/ (on holiday last time I looked) |
SuperBot |
www.sparkleware.com/superbot/index.html |
Teleport Pro |
www.tenmax.com/teleport/pro/home.htm |
teoma_agent1 teoma_admin@hawkholdings.com |
www.teoma.com Another coming soon search tool. Crawls from IP address 63.236.92.148. Hawk holdings is the holding company. The venture is between qwest.net and Baxter Investments |
UCmore |
www.ucmore.com A browser plug-in (initially IE only) that searches for related pages and categories. In my experience this seems to entail accessing a favicon.ico file on a daily basis (presumably to refresh the "favorites" list) |
UdmSearch |
http://search.mnogo.ru/ Search engine technology, as used at sites such as www.maplesearch.com. Now called mnoGoSearch. |
vspider |
www.verity.com/products/intspider/ A commercial spidering product. |
Webbandit |
http://softwaresolutions.net/webbandit/index.htm |
Webclipping.com |
www.Webclipping.com |
webcollage |
Forms a collage from randomly selected web images. www.jwz.org/webcollage/ Pet project of one of the authors of Netscape. Seems to come from differing IP nodes. |
WebCompass |
??? (quarterdeck search engine software) |
WebCopier |
www.maximumsoft.com |
WebFetch |
www.webfetch.com |
WebGather |
http://pccms.pku.edu.cn:8000/ Chinese search project |
Webpush |
www.webhauler.com/webpush.htm |
WebReaper |
www.otway.com/webreaper/ |
Webrobot |
www.multimania.com/dilletb/WebRobot/ |
WebVCR |
www.netresultscorp.com/fs_webvcr_info.html |
WebStripper |
www.solentsoftware.com/webstripper/ |
WebTwin |
www.WebTwin.com Convert websites into help files. |
webwasher |
www.webwasher.com/en/products/wwash/functions.htm (browser filter) |
WebZIP |
www.spidersoft.com |
Zeus 1500 Webster Pro Zeus 2500 Webster Pro Zeus 4300 Webster Pro |
www.homepagesw.com/webster_overview.htm |
These agents are ones that we've seen, but been unable to get information for, or which are slightly unusual in origin. If you have any additional information on any of these, feel free to send it to search@jafsoft.com
User Agent |
Information |
---|---|
Albert Indexer |
www.albert.com/papers.htm Multi-lingual search technology |
Aranha |
Seems to be from a yet-to-be launched site www.girafa.com. Spiders using IP 212.150.51.90 which also seems to be Aranha.girafa.com |
AVSearch |
Seems to be the AltaVista personal search agent. The crawling site is sometimes referred to in the agent name |
Checkbot |
Seems to come from www.oxxfordinfo.com who offer B2B services |
Digimarc WebReader |
Digimarc searches images on the web looking for digital watermarks. More details at www.digimarc.com/about/index.shtml |
EchO!/2.0 |
Spiders from 194.254.160.3, which would seem to be part of www.voila.com, a French-based search engine. |
FinaleRobot robot-master@expressus.com |
The www.expressus.com site describes an Interactive Natural Language encyclopedia that will become a search engine at www.final-e.com. Good name, but at present it just maps back onto the ExpressUs site (not such a good name). Crawls from IP address 64.114.34.115 |
GentleSpider |
Some sort of spider that usually visits using an IP address from within www.research.att.com or crawler.tivra.com |
Gulper Web Bot |
www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot (Open research project to produce opinion-based search engine) |
InterGO |
www.teachersoft.com http://browserwatch.internet.com/news/story/intergo1.html This was a child-safe browser, but it seems no associated page remains |
InternetArchive |
Presumably www.internetarchive.com, but that's in "stealth mode" |
Internet Ninja |
www.ifour.co.jp (Japanese Macintosh browser?) |
InternetSeer |
A web monitoring service. More details at www.internetseer.com/support/faq.jsp |
Katriona |
Something to do with the European Regional Internet Registry (RIPE) Browses using IP address 213.219.19.148 |
larbin sebastien.ailleret@inria.fr ghi@lcs.mit.edu cosmos |
And from the people that brought you xyro (see below) comes another, newer bot. This one seems to crawl from the IP address cremant.inria.fr. Update: more recently it's also been seen coming from barracutta.lcs.mit.edu. And then there was "cosmos", crawling from pomelos.inria.fr. Seems these people are a webbot factory. Cosmos doesn't offer an email address. |
LEIA |
Unable to find (Too many "Star Wars" references get in the way) |
libwww-perl |
The PERL programming language comes with a number of routines for constructing web-aware scripts. This and related strings are the default user agent identifiers, although it's perfectly easy to change this to be whatever you want. |
MultiText |
Research project to index the last week's news items http://multitext.uwaterloo.ca/NetSearch.html |
NetCruiser |
www.netcruiser-software.com/products.html It's not clear to me which of these products this might be, but I'm assuming it's one of them. |
ORA_checksite |
http://www.oreilly.com/openbook/webclient/ch06.html Identifier used in a sample perl program in the online book "Web Client Programming with Perl". The program is used to check links. Obviously people have tried it, and it works :-) |
PintaSpider |
Unable to find, but the spider came from www.cnet.fr |
PitSpyder Thread<n>0 |
Unable to find |
psbot |
www.picsearch.org/bot.html A bot indexing pictures. Crawls from ps.direct2internet.com |
RepoMonkey Bait & Tackle |
A bit of detective work here. Recent entries in the log file link this to the site www.hungryhippo.com, although the robot always appears to come from an IP address at backflip.com (a bookmarking service). Visiting www.hungryhippo.com reveals a "coming soon" site. Looking at the HTML source leads to another page at http://www.mezzaluna.net/hungryhippo.com/ (appears identical). The META tags for this page all appear to be references to day trading, futures, training and the like, although we did spot the word "fibonacci" (our favourite :-). So... possibly a future search engine related to stock trading? Or maybe the Monkey and Hippo are just feeding me a red herring? There's more. The picture on the Kenjin site at www.kenjin.com/kenjin/info.html is currently the same as that at HungryHippo. Kenjin is an Autonomy company. |
Robot2.0(PingSoft) |
There are several "PingSoft"s around, but I suspect that this belongs to one of the products listed at http://pingsoft.com.cn/english/e_index.html (e.g. SmartHunter) since I was visited from a Chinese IP address. |
ru-robot 0.1_hseo(at)cs.rutgers.edu |
Unable to find details on this, but I'm guessing it's a research spider from www.rutgers.edu. Crawls using the IP teal.rutgers.edu |
TaWWWantula |
Unable to find |
TeraCrawl |
Unable to find |
unlostBot unlostBot@unlost.com |
www.unlost.com is "under construction". The robot came from IP address 212.37.219.147 which is in France. |
utopy crawler@utopy.com |
Coming soon at www.utopy.com (requires flash). This venture-capital funded site is "running in stealth mode" before launching the "new new thing" (is that a typo?). One of the Flash pages defines Utopia (geddit?), and some of the browsing is done by IP addresses at ...myutopy.com. |
UtilMind HTTPGet |
Probably the perl-based (uses the httpget library) web page grabber "Web Thief", described at www.utilmind.com/scripts/webthief.html |
UrlScope |
Unable to find |
VCI WebViewer |
Web browser object, that may be incorporated into software www.homepagesw.com/webster_dl.htm |
WAVETools |
A set of Delphi components offered to build Internet applications from www.transerve.com |
Web Hound |
Unable to find. Or rather, I found several different "web hounds", so can't tell which this was. |
Web Magnet |
www.webmagnet.com this appears to be a tool used by this web consultancy. |
WebSymmetrix |
Originates in Korea, and is possibly related to their National Computerization Agency. Uses IP address 210.183.28.39 |
WhosTalking |
http://softwaresolutions.net/whostalking/ Software that tracks Trademark usage |
xyro xcrawler@inria.fr |
Seems to be a spider associated with a French research institute. Usually crawls using the IP address vamos.inria.fr |
Some IP addresses or sites may visit you regularly, although the user agent may be obscure, or even change.
Here are a few that I've been able to work out.
Site address(es) | Description |
---|---|
proxy.netsetter.org | This is a site that offers a speed-up to your surfing, in return for being able to monitor people's surfing habits. The speed-ups are achieved through a variety of techniques, and the monitoring info is sold on, although your privacy is protected. Visit www.netsetter.org for more details. |
pwoshoes.transport.com | Not known |
...lightrealm.com | Each day this site reads any XML files submitted to a shareware site in PAD format. PAD is a means for describing shareware devised by the Association of Shareware Professionals (www.asp-shareware.org). This site is performing daily checks, looking to automatically update its lists with any changes. |
All awards gratefully received :-)
This page is © 2000-2001 John A Fotheringham. It may not be
reproduced without permission,
although you are welcome to save a copy for personal use to your hard disk.