Ticket #112 - Enhanced bot detection. #125

Merged
Zodiac1978 merged 4 commits into master from issue/112-refine-user-agent-check on Mar 22, 2019

Conversation

mahype
Contributor

@mahype mahype commented Mar 22, 2019

No description provided.

@MatzeKitt
Member

What about the wget identifier? I think this would also be a valid “bot-like” request.

@Zodiac1978
Member

I have re-checked this against the other bot check. User agents matching the following patterns would still be counted as a view, even though they are bots:

    "pattern": "Mediapartners-Google"
   "pattern": "APIs-Google"
   "pattern": "[wW]get"
   "pattern": "Python-urllib"
   "pattern": "python-requests"
   "pattern": "libwww-perl"
   "pattern": "httpunit"
   "pattern": "nutch"
   "pattern": "Go-http-client"
   "pattern": "phpcrawl"
   "pattern": "BIGLOTRON"
   "pattern": "Teoma"
   "pattern": "convera"
   "pattern": "Gigablast"
   "pattern": "ia_archiver"
   "pattern": "webmon "
   "pattern": "HTTrack"
   "pattern": "grub.org"
   "pattern": "netresearchserver"
   "pattern": "speedy"
   "pattern": "fluffy"
   "pattern": "findlink"
   "pattern": "panscient"
   "pattern": "ips-agent"
   "pattern": "yanga"
   "pattern": "YandexImages"
   "pattern": "CyberPatrol"
   "pattern": "postrank"
   "pattern": "page2rss"
   "pattern": "linkdex"
   "pattern": "ezooms"
   "pattern": "heritrix"
   "pattern": "findthatfile"
   "pattern": "europarchive.org"
   "pattern": "mappydata"
   "pattern": "eright"
   "pattern": "Apercite"
   "pattern": "Aboundex"
   "pattern": "summify"
   "pattern": "ec2linkfinder"
   "pattern": "Yeti"
   "pattern": "RetrevoPageAnalyzer"
   "pattern": "Sogou"
   "pattern": "wotbox"
   "pattern": "ichiro"
   "pattern": "drupact"
   "pattern": "coccoc"
   "pattern": "integromedb"
   "pattern": "siteexplorer.info"
   "pattern": "proximic"
   "pattern": "changedetection"
   "pattern": "WeSEE:Search"
   "pattern": "CC Metadata Scaper"
   "pattern": "g00g1e.net"
   "pattern": "binlar"
   "pattern": "A6-Indexer"
   "pattern": "ADmantX"
   "pattern": "MegaIndex"
   "pattern": "ltx71"
   "pattern": "BUbiNG"
   "pattern": "Qwantify"
   "pattern": "lipperhey"
   "pattern": "Y!J"
   "pattern": "AddThis"
   "pattern": "MetaURI"
   "pattern": "Scrapy"
   "pattern": "Livelap[bB]ot"
   "pattern": "CapsuleChecker"
   "pattern": "[email protected]"
   "pattern": "DeuSu\\/"
   "pattern": "Sonic"
   "pattern": "Sysomos"
   "pattern": "Trove"
   "pattern": "deadlinkchecker"
   "pattern": "Slack-ImgProxy"
   "pattern": "Embedly"
   "pattern": "iskanie"
   "pattern": "SkypeUriPreview"
   "pattern": "Google-Adwords-Instant"
   "pattern": "WhatsApp"
   "pattern": "electricmonk"
   "pattern": "BingPreview\\/"
   "pattern": "Yahoo Link Preview"
   "pattern": "Daum\\/"
   "pattern": "Xenu Link Sleuth"
   "pattern": "Pcore-HTTP"
   "pattern": "pingdom"
   "pattern": "AppInsights"
   "pattern": "PhantomJS"
   "pattern": "Jetslide"
   "pattern": "newsharecounts"
   "pattern": "Barkrowler"
   "pattern": "TinEye"
   "pattern": "LinkArchiver"
   "pattern": "YaK\\/"
   "pattern": "Digg Deeper"
   "pattern": "dcrawl"
   "pattern": "Snacktory"
   "pattern": "NING\\/"
   "pattern": "okhttp"
   "pattern": "Nuzzel"
   "pattern": "omgili"
   "pattern": "PocketParser"
   "pattern": "um-LN"
   "pattern": "MuckRack"
   "pattern": "AHC\\/"
   "pattern": "NetcraftSurveyAgent"
   "pattern": "Apache-HttpClient"
   "pattern": "AppEngine-Google"
   "pattern": "Jetty"
   "pattern": "Upflow"
   "pattern": "Thinklab"
   "pattern": "Traackr.com"
   "pattern": "Twurly"
   "pattern": "Mastodon"
   "pattern": "http_get"
   "pattern": "BrandVerity"
   "pattern": "check_http"
   "pattern": "EZID"
   "pattern": "^LCC "
   "pattern": "Buck\\/"
   "pattern": "Genieo"
   "pattern": "MeltwaterNews"
   "pattern": "Moreover"
   "pattern": "newspaper\\/"
   "pattern": "ScoutJet"
   "pattern": "(^| )sentry\\/"
   "pattern": "seoscanners"
   "pattern": "Hatena"
   "pattern": "Google Web Preview"
   "pattern": "adscanner"
   "pattern": "Netvibes"
   "pattern": "Baidu-YunGuanCe"
   "pattern": "BTWebClient"
   "pattern": "Disqus"
   "pattern": "Feedly"
   "pattern": "Fever"
   "pattern": "Flamingo_SearchEngine"
   "pattern": "FlipboardProxy"
   "pattern": "G2 Web Services"
   "pattern": "vkShare"
   "pattern": "Siteimprove.com"
    "pattern": "DareBoost"
    "pattern": "Miniflux\\/"
    "pattern": "Feedspot"
    "pattern": "SEOkicks"
    "pattern": "tracemyfile"
    "pattern": "zgrab"
    "pattern": "PR-CY.RU"
   "pattern": "Datafeedwatch"
   "pattern": "Zabbix"
   "pattern": "google-xrawler"
   "pattern": "axios"
   "pattern": "Amazon CloudFront"
   "pattern": "Pulsepoint"
   "pattern": "CloudFlare-AlwaysOnline"
  "pattern": "Google-Structured-Data-Testing-Tool"
  "pattern": "WordupInfoSearch"
   "pattern": "WebDataStats"
   "pattern": "HttpUrlConnection"
   "pattern": "outbrain"
   "pattern": "W3C_Validator"
   "pattern": "Validator\\.nu"
   "pattern": "W3C-checklink"
   "pattern": "W3C-mobileOK"
   "pattern": "W3C_I18n-Checker"
   "pattern": "FeedValidator"
   "pattern": "W3C_CSS_Validator"
   "pattern": "W3C_Unicorn"
   "pattern": "Google-PhysicalWeb"
   "pattern": "Blackboard"
   "pattern": "BazQux"
   "pattern": "Twingly"
   "pattern": "Rivva"
   "pattern": "Dataprovider.com"
   "pattern": "GroupHigh\\/"
   "pattern": "theoldreader.com"
   "pattern": "AnyEvent"
   "pattern": "Nmap Scripting Engine"
   "pattern": "2ip.ru"
   "pattern": "Clickagy"
   "pattern": "Google Favicon"

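For illustration, here is a minimal sketch of how patterns like the ones above could be tested against a user-agent string. It is not the code from this PR: the names `BOT_PATTERNS`, `BOT_REGEX`, and `is_bot` are hypothetical, only a handful of the listed patterns are included, and the `\\/` from the JSON strings is written as `\/` in the raw Python strings.

```python
import re

# Hypothetical excerpt of the patterns listed above; the full list is much longer.
BOT_PATTERNS = [
    r"[wW]get",
    r"python-requests",
    r"Go-http-client",
    r"BingPreview\/",
    r"^LCC ",
    r"(^| )sentry\/",
]

# Wrap each pattern in a non-capturing group and join them into one alternation,
# so every user agent is scanned by a single compiled regex.
BOT_REGEX = re.compile("|".join(f"(?:{p})" for p in BOT_PATTERNS))


def is_bot(user_agent: str) -> bool:
    """Return True if the user agent matches any of the bot patterns."""
    return BOT_REGEX.search(user_agent) is not None


if __name__ == "__main__":
    for ua in (
        "Wget/1.20.3 (linux-gnu)",
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    ):
        print(ua, "->", "bot" if is_bot(ua) else "view")
```

The patterns are matched case-sensitively, as written in the list. Anchored entries such as `^LCC ` still work inside the alternation because each pattern sits in its own group.
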
@Zodiac1978
Member

I think we can start with this PR and then iterate. It is much better than our status quo, so I will merge it. New ideas should be discussed in a new issue.

@Zodiac1978 Zodiac1978 merged commit c1161a4 into master Mar 22, 2019
@stklcode stklcode deleted the issue/112-refine-user-agent-check branch April 13, 2020 12:12
@stklcode stklcode added this to the 1.7.0 milestone Apr 22, 2020
4 participants