HTTP USER AGENT keywords for bad spiders, search-engine bots, and crawlers (deduplicated, continuously updated)

September 7, 2014 · 5 Comments

The array below lists crawlers, spiders, and bots that bring no practical benefit to a website.

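Whenever HTTP_USER_AGENT contains any keyword from the array below, the request can be rejected outright. Spiders that bring traffic, such as Baidu, Google, and 360, have been excluded from the list. Yandex brings essentially no traffic to Chinese-language sites, so it is on the list as well.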

This array is updated continuously! In the months it has been in use, it has never blocked a legitimate visitor!

$bad_spiders_array=array(
'Crawler','Barkrowler','CakePHP','GarlikCrawler','Go-http-client','ias_crawler','ICC-Crawler','PotPlayer','Riddler','Scrapy','WINAMP','viz/viz','ZXing','Castro','Jakarta Commons','ltx71','NativeHost','SalesIntelligent','Xenu Link Sleuth','Y!J-ASR','BUbiNG','CRAZYWEBCRAWLER','http Cnrdn','Lavf','NSPlayer','spray-can','stagefright','voltron','LibVLC','A6-Indexer','crawler4j','wsr-agent','DigitalPebble Crawler','MBCrawler','AhrefsBot','GrapeshotCrawler','proximic','SemrushBot','ahoy!','alkaline','ananzi','anthill','arachnophilia','arale','araneo','aretha','ariadne','arks','askjeeves','atn worldwide','auresys','backrub','big brother','bjaaland','blackwidow','bloodhound','calif','cassandra','christcrawler.com','churl','cienciaficcion.net','cmc/0.01','collective','combine system','computingsite robi/1.0','crawler.feedback','cusco','cyberspyder link test','katalog/index','die blinde kuh','digger','direct hit grabber','download express','dwcp','ebiness','e-collector','emacs-w3 search engine','esculapio','esther','evliya celebi','fastcrawler','felix ide','fetchrover','fido','fish search','fouineur','freecrawl','funnelweb','gazz','gcreep','getterroboplus puu','geturl','golem','grapnel/0.01 experiment','griffon','gromit','Gluten','hämähäkki','harvest','havindex','hi (html index) search','hku www octopus','ht://dig','html_analyzer','htmlgobble','hyper-decontextualizer','ia_archiver','ibm_planetwide','image.kapsi.net','imagelock','incywincy','informant','infoseek sidewinder','ingrid','inktomi slurp','inspector web','intelliagent','internet shinchakubin','iron33','israeli-search','javabee','jcrawler','jumpstation','katipo','kdd-explorer','kilroy','kit-fireball','labelgrabber','larbin','legs','link validator','linkscan','linkwalker','lockon','logo.gif crawler','lycos','mac wwwworm','magpie','marvin/infoseek','mattie','mediafox','merzscope','mindcrawler','mnogosearch search engine software','moget','monster','motor','muncher','muninn','muscat ferret',
'mwd.search','nec-meshexplorer','nederland.zoek','netcarta webmap engine','netmechanic','netscoop','newscan-online','nhse web forager','nomad','northern light gulliver','nzexplorer','objectssearch','occam','OOZBOT','openfind data gatherer','orb search','pack rat','pageboy','parasite','patric','pegasus','perlcrawler 1.0','pgp key agent','phpdig','piltdownman','pioneer','plumtreewebaccessor','poppi','popular iconoclast','raven search','roadhouse crawling system','robofox','robozilla','rules','scooter','search.aus-au.com','searchprocess','senrigan','sg-scout','shagseeker','sift','site searcher','site valet','sitetech-rover','skymob.com','slcrawler','sleek','snooper','suke','suntek search engine','sven','sygol','tach black widow','tarantula','templeton','the peregrinator','the web moose','the web wombat','the world wide web wanderer','the world wide web worm','titan','titin','ucsd crawl','udmsearch','unnamed','url check','valkyrie','verticrawl','victoria','vision-search','voyager','w3m2','w3mir','walhello appie','wallpaper (alias crawlpaper)','web core / roots','webcatcher','webcopy','webfetcher','webinator','weblayers','weblinker','weblog monitor','webmirror','webquest','webreaper','websnarf','webstolperer','webvac','webwalk','webwalker','webwatch','webzinger','wget','whatuseek winona','wild ferret web hopper','wired digital','wwwc ver','xget','daumoa','jobo','echo!','linkchecker','bloglines','twiceler','appie','sun4u','httrack','sisi','robi','webster pro','webster','zeus','scirus','picosearch','plucker','disco pump','gulliver','emailsiphon','teleport pro','fetch','pamuk','webcopier','webcapture','mass downloader','awv0.8d','crescent internet toolpak','webstripper','sitesucker','webdup','python-urllib','python','franklin locator','ck-sillydog','pockethttp','java','kototoi.org','teragramwebcrawler','vagabondo','nogoop-httpclient','myoperatb','accoona-ai-agent','arachmo','b-l-i-t-z-b-o-t','boitho.com-dc','cerberian drtrs',
'charlotte','converacrawler','cosmos','covario ids','dataparksearch','earthcom.info','fast enterprise crawler','fast-webcrawler','findlinks','g2crawler','holmes','htdig','iccrawler','ichiro','igdespyder','issuecrawler','l.webis','lwp-trivial','mabontland','magpie-crawler','mnogosearch','mogimogi','morning paper','mvaclient','netresearchserver','netseer crawler','newsgator','ng-search','nutchcvs','nymesis','oegp','orbiter','peew','pompos','postpost','pycurl','qseero','radian6','sandcrawler','sbider','scoutjet','scrubby','searchsight','seekbot','semanticdiscovery','sensis web crawler','shim-crawler','shopwiki','snappy','sqworm','stackrambler','teoma','tineye','truwogps','updated','vortex','vyu2','webcollage','websquash.com','wf84','womlpefactory','yacy','yahooseeker','yahooseeker-testing','yandeximages','yandexmetrika','yeti','yooglifetchagent','zyborg','wordpress','a6-indexer','Microsoft Office','JDatabaseDriver','facebookexternalhit','The Knowledge AI','Twitterbot','VenusCrawler','aria2','GetCode','CCBot','NetTrack','IAS crawler','POE-Component','VelenPublicWebCrawler','www.ru','Nutch Master Test','Wotbox','orion-semantics.com','lwp-request','ShortLinkTranslate','mj12bot','WinHttpRequest','Exabot','Auto Spider','Applebot','DuckDuckGo','SeznamBot','moatbot','DotBot','SurdotlyBot','28logsSpider','zgrab','Windows-Media-Player','spbot','Mail.RU_Bot','Backlink','SiteExplorer','SEOkicks','linkdexbot','Qwantify','DataXu','ExtLinksBot','gvfs/','evc-batch','Cliqzbot','YandexBot','YandexMobileBot','newspaper','Clickagy','Chicken laser','coccocbot','Microsoft Windows Network Diagnostics','spuhex.com','smtbot','Dataprovider','HybridBot','Sky-Wapproxy','SafeDNSBot','HatenaBookmark','Meta_Bot','ToutiaoSpider','HttpComponents','ips-agent','yandex.com/bots','(ziva)','Jersey','Auto Shell Spider','User-Agent','curl/','MPlayer','internal request',
'Grammarly','package','TrendsmapResolver','PaperLiBot','startmebot','WebFuck','GStreamer','httpsrc','AntennaPod','panscient.com','webscan','Screaming Frog','WFilter Live','trendictionbot','nsrbot','PlurkBot','Mojolicious','AlphaBot','tracemyfile','VCTestClient','heritrix','MiniRedir','Iframely','rest-client','Cappuccino','FirmsBot','BOT for JCE','Nimbostratus-Bot','Emacs-w3m','WordupinfoSearch','Dispatch','Paracrawl','Mr.4x3','axios','Typhoeus','tools.random','WhatCMSBot','InetURL','NetpeakCheckerBot','Goose','lua-resty','WhatWeb','special_archiver','XoviBot','Wappalyzer','OK-Search-Bot','abot','Mechanize','uipbot','GnowitNewsbot','PostmanRuntime','HoneyBee','gobuster','Bidtellect','Sonos','RankingBot','Uptimebot','Synapse','Re-re Studio','Mappy','Statastico','Linguee Bot','PocketImageCache','colly','YunSecurityBot','archive.org_bot','CheckMarkNetwork','Jooblebot','ZoomBot','Linkbot','Streamline3Bot','LetsearchBot','Linguee-Bot','Thither.Direct','Bose/','PPBot','IndeedBot','Everyonedomainsbot','MixnodeCache','NetpeakSpiderBot','TagVisit','RestSharp','Symfony','Needle','kubectl','vuhuvBot','Staddlebot','ddline.cn','AdsrvrContextual','_zbot','PagePeeker','OutclicksBot','Kozmosbot','PicoFeed','Mediatoolkitbot','netdisk','ESP32','Traackr.com','Discordbot','AndersPinkBot','Validator','SemanticJuice','aiHitBot','Zoxh.Com','foobar2000','bitlybot','beegoServer','MFC_Tear_Sample','Quantcastbot','Bytespider','HeiKe','ManicTime','News','Windows 95','Windows 98','WebPictures','SBL-BOT','DreamPassport','Blazer','RealMedia','Liberate DTV','Cyberdog','Fuzz Faster','portalmmm','WannaBe','bluefish','Utopia WebWasher','Offline Explorer','Visicom','Barca','ANTFresco','Hotzonu','Wfuzz','Dillo','iSiloX','Commerce Browser Center',
'W3CLineMode','Pandalytics','LinkpadBot','daum.net','NewTV','GigablastOpenSource','MAZBot','pilicanbot','EchoboxBot','Cincraw','ScraperBot','admantx','AspiegelBot','BDCbot','LogStatistic','CheckHost','7Siters','BorneoBot','HuaweiWebCatBot'
);
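
The post does not show how the array is applied, so the following is only a minimal sketch of one way to do it. The function name `is_bad_spider`, the case-insensitive substring comparison via `stripos`, and the 403 response are all assumptions rather than the author's actual code, and the short `$sample_keywords` list stands in for the full `$bad_spiders_array`.

```php
<?php
// Hypothetical helper (not from the original post): returns true when any
// keyword occurs anywhere in the User-Agent string.
// Case-insensitive matching is an assumption; the post does not say which
// comparison the author uses.
function is_bad_spider($userAgent, array $keywords)
{
    foreach ($keywords as $keyword) {
        // Skip empty keywords so that stripos() never receives an empty needle.
        if ($keyword !== '' && stripos($userAgent, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

// Short excerpt for illustration; in practice pass the full $bad_spiders_array.
$sample_keywords = array('AhrefsBot', 'SemrushBot', 'mj12bot', 'Scrapy');

// A missing header is treated as an empty string and is not blocked here.
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (is_bad_spider($user_agent, $sample_keywords)) {
    // Assumed response: refuse the request before any page logic runs.
    header('HTTP/1.1 403 Forbidden');
    exit;
}
```

With case-insensitive matching, short generic keywords in the full list (for example 'java', 'python', or 'News') match more user agents than an exact-case comparison would, so it may be worth testing against real access logs before enabling it.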

Class C (/24) address ranges recommended for blocking at the server level with iptables:
iptables -I INPUT -s 54.36.148.0/24 -j DROP
iptables -I INPUT -s 54.36.149.0/24 -j DROP
iptables -I INPUT -s 47.74.240.0/24 -j DROP
iptables -I INPUT -s 46.229.168.0/24 -j DROP
iptables-save > /etc/sysconfig/iptables
service iptables restart

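1. 54.36.148.* and 54.36.149.* are AhrefsBot IP ranges.
2. 47.74.240.* is an Alibaba Cloud Singapore node range. Hosts in this range continuously scan for .rar and .zip files under the site root (for example /www.rar and /web.zip) while posing as baiduspider.
3. 46.229.168.* is the SemRush bot IP range.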
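
Where iptables is not available (shared hosting, for instance), the same four /24 ranges could in principle be checked at the application level. The sketch below is one possible approach and not part of the original post: `ip_in_cidr` is a hypothetical helper, it handles IPv4 only, and it assumes a 64-bit PHP build where `ip2long` returns non-negative integers.

```php
<?php
// Hypothetical helper (not from the original post): true when the IPv4
// address $ip lies inside the CIDR block $cidr, e.g. "54.36.148.0/24".
function ip_in_cidr($ip, $cidr)
{
    list($network, $bits) = explode('/', $cidr);
    $ip_long  = ip2long($ip);
    $net_long = ip2long($network);
    if ($ip_long === false || $net_long === false) {
        return false; // not a valid IPv4 address (IPv6 clients end up here)
    }
    $bits = (int) $bits;
    // Build the netmask: the top $bits bits set, the rest cleared.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ip_long & $mask) === ($net_long & $mask);
}

// The four ranges from the iptables rules above.
$blocked_cidrs = array(
    '54.36.148.0/24',
    '54.36.149.0/24',
    '47.74.240.0/24',
    '46.229.168.0/24',
);

$client_ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';

foreach ($blocked_cidrs as $cidr) {
    if (ip_in_cidr($client_ip, $cidr)) {
        // Assumed response; iptables DROP would silently discard instead.
        header('HTTP/1.1 403 Forbidden');
        exit;
    }
}
```

Unlike the iptables DROP rules, this check only runs after the web server has accepted the connection, so it saves less load than blocking at the firewall.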

// Huawei Cloud:
159.138.1.* ~ 159.138.159.*
159.138.224.* ~ 159.138.250.*
114.119.128.* ~ 114.119.191.*
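
The Huawei Cloud ranges are written as wildcard spans rather than single /24 blocks. The post does not say how they are used, so the sketch below simply assumes they are meant to be blocked as well. It expands each span into an inclusive start and end address and tests membership with `ip2long`. The helper names `ip_in_range` and `ip_in_any_range` are hypothetical, and the code handles IPv4 only.

```php
<?php
// Hypothetical helper (not from the original post): true when the IPv4
// address $ip lies between $start and $end, inclusive.
function ip_in_range($ip, $start, $end)
{
    $ip_long = ip2long($ip);
    if ($ip_long === false) {
        return false; // not a valid IPv4 address
    }
    return $ip_long >= ip2long($start) && $ip_long <= ip2long($end);
}

// Hypothetical helper: true when $ip falls inside any [start, end] pair.
function ip_in_any_range($ip, array $ranges)
{
    foreach ($ranges as $range) {
        if (ip_in_range($ip, $range[0], $range[1])) {
            return true;
        }
    }
    return false;
}

// Each "a.b.c.* ~ a.b.d.*" span above, expanded to first and last address.
$huawei_cloud_ranges = array(
    array('159.138.1.0',   '159.138.159.255'),
    array('159.138.224.0', '159.138.250.255'),
    array('114.119.128.0', '114.119.191.255'),
);

$client_ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';

if (ip_in_any_range($client_ip, $huawei_cloud_ranges)) {
    // Assumed response, mirroring the treatment of the other ranges.
    header('HTTP/1.1 403 Forbidden');
    exit;
}
```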

Last edited by 傅老师 on 2020-06-13

Comments

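5 Responses to “HTTP USER AGENT keywords for bad spiders, search-engine bots, and crawlers (deduplicated, continuously updated)”
  1. Web says:

    Bookmarked.

  2. 范明明 says:

    Thanks to the author. Scans from these junk UAs have been bothering me for a long time. Blocking them all at the firewall!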
