Posts tagged with crawler
The domain mirage.foxb612.com and IP address 65.108.53.178 have been blocked (defederated) from Enby.Life. These are part of a fediverse crawler system that indexes servers based on the country where they are physically located. This wouldn't normally be against our rules, but the crawler goes to great lengths to de-anonymize instances, including sending fake-signed ActivityPub probes to obtain the server's true IP address. Requests from the crawler use a web browser's User-Agent to evade filters, and documentation on the website mentions that Cloudflare bypasses are also in use.
Given the complexity of setting up something like this, we believe that the crawler is likely operating with bad intentions. While there could be some use for an index of instances based on community region, tracking the actual physical location of the server backends is highly suspicious. I'd encourage all instance admins to consider whether something like this poses a threat, and to take appropriate action.
For anyone interested in going beyond a simple domain block, please see these log excerpts typical of being crawled via AP probes. Logs are taken from a non-standard Sharkey deployment and may not directly translate to other software, but I've tried to include as much detail as possible anyway.
Sharkey admins can check whether their instance has been scanned by searching the backend logs for patterns like this (substitute your own instance hostname where appropriate):
Feb 17 20:10:21 campsite run-sharkey.sh[241576]: INFO * [apserv sigcheck] req-yzi /users/9fpwmts9tv (by Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0) apparently from mirage.foxb612.com: we don't know the user for keyId https://mirage.foxb612.com/kiite/key/enby.life/1739823020/NHc8pVYoNGmLk3My/main-key, trying to fetch via https://mirage.foxb612.com/kiite/key/enby.life/1739823020/NHc8pVYoNGmLk3My/main-key
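For anyone who would rather script the search than grep by hand, here is a minimal Python sketch. It assumes the log line format matches the excerpt above and that the crawler's key URLs keep the same host and path layout; the function name `find_probe_lines` is made up for illustration and is not part of Sharkey. The number in the key path looks like a Unix timestamp, but that is only an observation from these samples.

```python
import re

# Assumption: key URLs follow the layout seen in the excerpt above:
# https://mirage.foxb612.com/kiite/key/<target host>/<number>/<token>/main-key
CRAWLER_KEY_RE = re.compile(
    r"\[apserv sigcheck\].*?keyId "
    r"(?P<key>https://mirage\.foxb612\.com/kiite/key/"
    r"(?P<target>[^/\s]+)/(?P<stamp>\d+)/[^/\s]+/main-key)"
)


def find_probe_lines(lines, instance_host):
    """Return (stamp, key_url) pairs for sigcheck lines that target instance_host."""
    hits = []
    for line in lines:
        match = CRAWLER_KEY_RE.search(line)
        if match and match.group("target") == instance_host:
            hits.append((int(match.group("stamp")), match.group("key")))
    return hits
```

Feed it the lines of your backend log (for example from an open file object) together with your own hostname; an empty result means no matching probe was found in that log.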
Alternately, anyone with Activity Logging in place can check for AP fetch errors like this:
id,at,duration,host,request_uri,object_uri,accepted,result,object,context_hash
a4n23pddff,2025-02-24 20:10:24.433000 +00:00,894.86,mirage.foxb612.com,https://mirage.foxb612.com/kiite/key/enby.life/1740427823/Y93ZjgZHZlxNSuxa/main-key,,false,Error: invalid content type of AP response - content type is not application/activity+json or application/ld+json: https://mirage.foxb612.com/kiite/key/enby.life/1740427823/Y93ZjgZHZlxNSuxa/main-key,,
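If the activity log can be exported as CSV with the column names shown above, the rejected fetches can be filtered with a few lines of Python. This is only a sketch: the export format is assumed from the excerpt, and `failed_crawler_fetches` is a hypothetical helper name.

```python
import csv
import io

CRAWLER_HOST = "mirage.foxb612.com"


def failed_crawler_fetches(csv_text, host=CRAWLER_HOST):
    """Return rows whose host matches and whose fetch was not accepted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if row["host"] == host and row["accepted"].strip().lower() == "false"
    ]
```

Each returned row is a dict keyed by the header names, so `row["at"]` and `row["request_uri"]` give the time and key URL of each probe.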
A final indicator is reverse-proxy logs showing this domain as part of an HTTP Signature header. Here's an example from our Caddy server:
Feb 24 20:10:25 campsite caddy[916]: 2025/02/24 20:10:25.329 ERROR http.log.access.log0 handled request {
  "request": {
    "remote_ip": "65.108.53.178",
    "remote_port": "53964",
    "client_ip": "65.108.53.178",
    "proto": "HTTP/1.1",
    "method": "GET",
    "host": "enby.life",
    "uri": "/users/9fpwmts9tv",
    "headers": {
      "Accept-Encoding": [
        "gzip, deflate"
      ],
      "Accept": [
        "application/activity+json"
      ],
      "Connection": [
        "keep-alive"
      ],
      "Content-Type": [
        "application/activity+json"
      ],
      "Date": [
        "Mon, 24 Feb 2025 20:10:23 GMT"
      ],
      "Signature": [
        "keyId=\"https://mirage.foxb612.com/kiite/key/enby.life/1740427823/Y93ZjgZHZlxNSuxa/main-key\",algorithm=\"rsa-sha256\",headers=\"(request-target) host date\",signature=\"5umGzjOXHeV8DdI4NjQqwbag6ChMKYS6\""
      ],
      "User-Agent": [
        "Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0"
      ]
    },
    "tls": {
      "resumed": false,
      "version": 772,
      "cipher_suite": 4865,
      "proto": "http/1.1",
      "server_name": "enby.life"
    }
  },
  "bytes_read": 0,
  "user_id": "",
  "duration": 0.901198418,
  "size": 254,
  "status": 500,
  "resp_headers": {
    "Date": [
      "Mon, 24 Feb 2025 20:10:25 GMT"
    ],
    "Access-Control-Allow-Origin": [
      "*"
    ],
    "Alt-Svc": [
      "h3=\":443\"; ma=2592000"
    ],
    "Content-Type": [
      "application/json; charset=utf-8"
    ],
    "Strict-Transport-Security": [
      "max-age=15552000; preload"
    ],
    "Access-Control-Allow-Methods": [
      "GET, OPTIONS"
    ],
    "Content-Length": [
      "254"
    ],
    "Access-Control-Allow-Headers": [
      "Accept"
    ],
    "Server": [
      "Caddy"
    ],
    "Access-Control-Expose-Headers": [
      "Vary"
    ],
    "Cache-Control": [
      "private, max-age=0, must-revalidate"
    ]
  }
}
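To check reverse-proxy logs in bulk, something like the following Python sketch could work. It assumes each access-log entry has already been parsed into a dict with the same shape as the Caddy example above (a `request` object holding `remote_ip` and a `headers` map of lists); the helper names and the default block lists are illustrative, not anything Caddy provides.

```python
import re
from urllib.parse import urlsplit

# Pulls the keyId parameter out of an HTTP Signature header value.
KEY_ID_RE = re.compile(r'keyId="([^"]+)"')


def signature_key_hosts(entry):
    """Return the hostnames found in keyId values of the request's Signature headers."""
    headers = entry.get("request", {}).get("headers", {})
    hosts = []
    for value in headers.get("Signature", []):
        match = KEY_ID_RE.search(value)
        if match:
            hosts.append(urlsplit(match.group(1)).hostname)
    return hosts


def is_crawler_request(
    entry,
    blocked_hosts=("mirage.foxb612.com",),
    blocked_ips=("65.108.53.178",),
):
    """True if the request came from a listed IP or signed with a listed key host."""
    if entry.get("request", {}).get("remote_ip") in blocked_ips:
        return True
    return any(host in blocked_hosts for host in signature_key_hosts(entry))
```

Checking the keyId host as well as the IP means a probe would still be flagged if the crawler moved to a different address but kept signing with keys from the same domain.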
#FediBlock #BlockRecommendation #Moderation #Crawler #Scraper
RE: https://enby.life/notes/a4vj8c2xq1
Hazelnoot (@hazelnoot)
found a weird new fedi crawler thing, will post details and block instructions soon