HTTP User-Agent Keywords for Bad Web Spiders, Search Engine Bots, and Crawlers (Deduplicated, Continuously Updated)

September 7, 2014 · 5 Comments

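The array below lists crawlers, spiders, and bots that bring no real value to a website.

If any keyword from the array appears in HTTP_USER_AGENT, the request can be blocked outright. Spiders that bring traffic, such as Baidu, Google, and 360, have already been excluded. Yandex brings essentially no traffic to Chinese-language sites, so it is included in the list.

This array is continuously updated. In several years of use, it has never blocked a legitimate visitor.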
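
The check described above can be sketched in shell. This is only an illustration, not the author's implementation: the function name `is_bad_ua` and the variable `BAD_UA_KEYWORDS` are hypothetical, only five keywords from the list are included, and case-insensitive substring matching is an assumption, since the post does not say whether case matters.

```shell
#!/bin/sh
# A short sample of keywords taken from the list in this post, one per line.
# Extend it with the full list. Do not leave an empty line here: an empty
# pattern would match every User-Agent.
BAD_UA_KEYWORDS='AhrefsBot
SemrushBot
Scrapy
Go-http-client
curl/'

# Hypothetical helper: succeeds (exit status 0) when the User-Agent string
# passed as $1 contains any keyword as a substring.
is_bad_ua() {
    # -q: no output, -i: ignore case (an assumption), -F: fixed strings,
    # so characters such as "." and "/" are matched literally.
    printf '%s\n' "$1" | grep -qiF -e "$BAD_UA_KEYWORDS"
}

# Usage example with a typical SemrushBot User-Agent string.
if is_bad_ua 'Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)'; then
    echo "blocked"
fi
```

Fixed-string matching is used so that keywords are never interpreted as regular expressions. Short, generic keywords in the full list (for example `java` or `python`) match any User-Agent containing them, so it may be worth testing the list against real access logs before enforcing it.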

headless
Bytespider
Crawler
Barkrowler
CakePHP
GarlikCrawler
Go-http-client
ias_crawler
ICC-Crawler
PotPlayer
Riddler
Scrapy
WINAMP
viz/viz
ZXing
Castro
Jakarta Commons
ltx71
NativeHost
SalesIntelligent
Xenu Link Sleuth
Y!J-ASR
BUbiNG
CRAZYWEBCRAWLER
http Cnrdn
Lavf
NSPlayer
spray-can
stagefright
voltron
LibVLC
A6-Indexer
crawler4j
wsr-agent
DigitalPebble Crawler
MBCrawler
AhrefsBot
GrapeshotCrawler
proximic
SemrushBot
ahoy!
alkaline
ananzi
anthill
arachnophilia
arale
araneo
aretha
ariadne
arks
askjeeves
atn worldwide
auresys
backrub
big brother
bjaaland
blackwidow
bloodhound
calif
cassandra
christcrawler.com
churl
cienciaficcion.net
cmc/0.01
collective
combine system
computingsite robi/1.0
crawler.feedback
cusco
cyberspyder link test
katalog/index
die blinde kuh
digger
direct hit grabber
download express
dwcp
ebiness
e-collector
emacs-w3 search engine
esculapio
esther
evliya celebi
fastcrawler
felix ide
fetchrover
fido
fish search
fouineur
freecrawl
funnelweb
gazz
gcreep
getterroboplus puu
geturl
golem
grapnel/0.01 experiment
griffon
gromit
Gluten
hämähäkki
harvest
havindex
hi (html index) search
hku www octopus
ht://dig
html_analyzer
htmlgobble
hyper-decontextualizer
ia_archiver
ibm_planetwide
image.kapsi.net
imagelock
incywincy
informant
infoseek sidewinder
ingrid
inktomi slurp
inspector web
intelliagent
internet shinchakubin
iron33
israeli-search
javabee
jcrawler
jumpstation
katipo
kdd-explorer
kilroy
kit-fireball
labelgrabber
larbin
legs
link validator
linkscan
linkwalker
lockon
logo.gif crawler
lycos
mac wwwworm
magpie
marvin/infoseek
mattie
mediafox
merzscope
mindcrawler
mnogosearch search engine software
moget
monster
motor
muncher
muninn
muscat ferret
mwd.search
nec-meshexplorer
nederland.zoek
netcarta webmap engine
netmechanic
netscoop
newscan-online
nhse web forager
nomad
northern light gulliver
nzexplorer
objectssearch
occam
OOZBOT
openfind data gatherer
orb search
pack rat
pageboy
parasite
patric
pegasus
perlcrawler 1.0
pgp key agent
phpdig
piltdownman
pioneer
plumtreewebaccessor
poppi
popular iconoclast
raven search
roadhouse crawling system
robofox
robozilla
rules
scooter
search.aus-au.com
searchprocess
senrigan
sg-scout
shagseeker
sift
site searcher
site valet
sitetech-rover
skymob.com
slcrawler
sleek
snooper
suke
suntek search engine
sven
sygol
tach black widow
tarantula
templeton
the peregrinator
the web moose
the web wombat
the world wide web wanderer
the world wide web worm
titan
titin
ucsd crawl
udmsearch
unnamed
url check
valkyrie
verticrawl
victoria
vision-search
voyager
w3m2
w3mir
walhello appie
wallpaper (alias crawlpaper)
web core / roots
webcatcher
webcopy
webfetcher
webinator
weblayers
weblinker
weblog monitor
webmirror
webquest
webreaper
websnarf
webstolperer
webvac
webwalk
webwalker
webwatch
webzinger
wget
whatuseek winona
wild ferret web hopper
wired digital
wwwc ver
xget
daumoa
jobo
echo!
linkchecker
bloglines
twiceler
appie
sun4u
httrack
sisi
robi
webster pro
webster
zeus
scirus
picosearch
plucker
disco pump
gulliver
emailsiphon
teleport pro
fetch
pamuk
webcopier
webcapture
mass downloader
awv0.8d
crescent internet toolpak
webstripper
sitesucker
webdup
python-urllib
python
franklin locator
ck-sillydog
pockethttp
java
kototoi.org
teragramwebcrawler
vagabondo
nogoop-httpclient
myoperatb
accoona-ai-agent
arachmo
b-l-i-t-z-b-o-t
boitho.com-dc
cerberian drtrs
charlotte
converacrawler
cosmos
covario ids
dataparksearch
earthcom.info
fast enterprise crawler
fast-webcrawler
findlinks
g2crawler
holmes
htdig
iccrawler
ichiro
igdespyder
issuecrawler
l.webis
lwp-trivial
mabontland
magpie-crawler
mnogosearch
mogimogi
morning paper
mvaclient
netresearchserver
netseer crawler
newsgator
ng-search
nutchcvs
nymesis
oegp
orbiter
peew
pompos
postpost
pycurl
qseero
radian6
sandcrawler
sbider
scoutjet
scrubby
searchsight
seekbot
semanticdiscovery
sensis web crawler
shim-crawler
shopwiki
snappy
sqworm
stackrambler
teoma
tineye
truwogps
updated
vortex
vyu2
webcollage
websquash.com
wf84
womlpefactory
yacy
yahooseeker
yahooseeker-testing
yandeximages
yandexmetrika
yeti
yooglifetchagent
zyborg
wordpress
a6-indexer
Microsoft Office
JDatabaseDriver
facebookexternalhit
The Knowledge AI
Twitterbot
VenusCrawler
aria2
GetCode
CCBot
NetTrack
IAS crawler
POE-Component
VelenPublicWebCrawler
www.ru
Nutch Master Test
Wotbox
orion-semantics.com
lwp-request
ShortLinkTranslate
mj12bot
WinHttpRequest
Exabot
Auto Spider
DuckDuckGo
SeznamBot
moatbot
DotBot
SurdotlyBot
28logsSpider
zgrab
Windows-Media-Player
spbot
Mail.RU_Bot
Backlink
SiteExplorer
SEOkicks
linkdexbot
Qwantify
DataXu
ExtLinksBot
gvfs/
evc-batch
Cliqzbot
YandexBot
YandexMobileBot
newspaper
Clickagy
Chicken laser
coccocbot
Microsoft Windows Network Diagnostics
spuhex.com
smtbot
Dataprovider
HybridBot
Sky-Wapproxy
SafeDNSBot
HatenaBookmark
Meta_Bot
ToutiaoSpider
HttpComponents
ips-agent
yandex.com/bots
(ziva)
Jersey
Auto Shell Spider
User-Agent
curl/
MPlayer
internal request
Grammarly
package
TrendsmapResolver
PaperLiBot
startmebot
WebFuck
GStreamer
httpsrc
AntennaPod
panscient.com
webscan
Screaming Frog
WFilter Live
trendictionbot
nsrbot
PlurkBot
Mojolicious
AlphaBot
tracemyfile
VCTestClient
heritrix
MiniRedir
Iframely
rest-client
Cappuccino
FirmsBot
BOT for JCE
Nimbostratus-Bot
Emacs-w3m
WordupinfoSearch
Dispatch
Paracrawl
Mr.4x3
axios
Typhoeus
tools.random
WhatCMSBot
InetURL
NetpeakCheckerBot
Goose
lua-resty
WhatWeb
special_archiver
XoviBot
Wappalyzer
OK-Search-Bot
abot
Mechanize
uipbot
GnowitNewsbot
PostmanRuntime
HoneyBee
gobuster
Bidtellect
Sonos
RankingBot
Uptimebot
Synapse
Re-re Studio
Mappy
Statastico
Linguee Bot
PocketImageCache
colly
YunSecurityBot
archive.org_bot
CheckMarkNetwork
Jooblebot
ZoomBot
Linkbot
Streamline3Bot
LetsearchBot
Linguee-Bot
Thither.Direct
Bose/
PPBot
IndeedBot
Everyonedomainsbot
MixnodeCache
NetpeakSpiderBot
TagVisit
RestSharp
Symfony
Needle
kubectl
vuhuvBot
Staddlebot
ddline.cn
AdsrvrContextual
_zbot
PagePeeker
OutclicksBot
Kozmosbot
PicoFeed
Mediatoolkitbot
netdisk
ESP32
Traackr.com
Discordbot
PinkBot
Validator
Semantic
aiHitBot
Zoxh.Com
foobar2000
bitlybot
beegoServer
MFC_Tear_Sample
Quantcastbot
HeiKe
ManicTime
News
Windows 95
Windows 98
WebPictures
SBL-BOT
DreamPassport
Blazer
RealMedia
Liberate DTV
Cyberdog
Fuzz Faster
portalmmm
WannaBe
bluefish
Utopia WebWasher
Offline Explorer
Visicom
Barca
ANTFresco
Hotzonu
Wfuzz
Dillo
iSiloX
Commerce Browser Center
W3CLineMode
Pandalytics
LinkpadBot
daum.net
NewTV
GigablastOpenSource
MAZBot
pilicanbot
EchoboxBot
Cincraw
ScraperBot
admantx
AspiegelBot
BDCbot
LogStatistic
CheckHost
7Siters
BorneoBot
HuaweiWebCatBot
PetalBot
ZoominfoBot
Pinterestbot
MojeekBot
SeoBot
l9explore
FMODStudio
AndroidDownloadManager
Nutch Spider
DomainStats
seostar
omgili
webprosbot
ThinkChaos
WellKnown
Punkspider
DataForSeo
Keybot
Baispider
Turnitin
github
FlfBaldrBot
Dataprovider.com
Qwarrybot
SummalyBot
inetdex
zaldamo
Pleroma
Mastodon
AlexandriaOrgBot
Leikibot
MRGbot
WellKnownBot
Pixalate
Slackbot
Swisscows
domainsproject
meta search ray
seekport
GoFrame
Expanse
tchelebi
Netcraft
CensysInspect

Class C address ranges (/24) recommended for blocking at the server level with iptables:
iptables -I INPUT -s 54.36.148.0/24 -j DROP
iptables -I INPUT -s 54.36.149.0/24 -j DROP
iptables -I INPUT -s 47.74.240.0/24 -j DROP
iptables -I INPUT -s 46.229.168.0/24 -j DROP
iptables-save > /etc/sysconfig/iptables
service iptables restart

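1. 54.36.148.* and 54.36.149.* are AhrefsBot IP ranges.
2. 47.74.240.* is an Alibaba Cloud Singapore IP range. Hosts in this range continuously scan the site root for .rar and .zip files (such as /www.rar and /web.zip) while posing as baiduspider.
3. 46.229.168.* is the SemRush bot IP range.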

// Huawei Cloud:
159.138.1.* ~ 159.138.159.*
159.138.224.* ~ 159.138.250.*
114.119.128.* ~ 114.119.191.*
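
The Huawei Cloud ranges above each span many /24 blocks, so they cannot be written as a single `-s x.x.x.0/24` rule. One possible approach, assuming these ranges are meant to be dropped like the /24 blocks earlier and that the iptables `iprange` match module is available on the server, is sketched below. The function name `print_range_rules` is hypothetical, and the script only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) one iptables DROP rule per
# address range. Each argument has the form "first_ip-last_ip".
print_range_rules() {
    for range in "$@"; do
        # -m iprange --src-range matches any source address in the range.
        printf 'iptables -I INPUT -m iprange --src-range %s -j DROP\n' "$range"
    done
}

# The three Huawei Cloud ranges listed in this post, with ".*" expanded
# to .0 for the first address and .255 for the last.
print_range_rules \
    159.138.1.0-159.138.159.255 \
    159.138.224.0-159.138.250.255 \
    114.119.128.0-114.119.191.255
```

Printing instead of executing keeps the sketch safe to try on any machine. The output can be piped to `sh` once the rules have been checked.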

Last edited by 傅老师 on 2022-10-01

Comments

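5 Responses to "HTTP User-Agent Keywords for Bad Web Spiders, Search Engine Bots, and Crawlers (Deduplicated, Continuously Updated)"
  1. Web said:

    Bookmarked.

  2. 范明明 said:

    Thanks to the author. These junk UA scans have been bothering me for a long time. Blocking them all at the firewall!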
