I looked at my website's logs recently and noticed that a particular organization is hammering it. Sometimes multiple requests in the same second. Thousands of requests per day. Note: my website has a few hundred pages that change very infrequently. Not cool!
The organization is "internet-measurement.com". I know that their constant requests are reducing the website's responsiveness for the few humans that visit. Their description of what they're doing doesn't make me think they're benefiting me in any way, but it's not clear.
For organizations that follow the Robots Exclusion Protocol, one can control which URLs are hit by a web spider, and how frequently, using a "robots.txt" file:
robots.txt:
# robots.txt for www.EricBT.com
User-agent: *
Crawl-Delay: 20
...
User-agent: InternetMeasurement
Disallow: /
The Robots Exclusion Protocol is a "scout's honor" protocol. There is nothing forcing a third party to honor it.
I don't know if internet-measurement.com is honoring my robots.txt file. Even if it is, I'm pretty sure it is not honoring the unofficial "Crawl-Delay" that I specified.
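One way to check is to measure, directly from an access log, how closely spaced the crawler's requests actually are. Below is a minimal sketch, not part of the website's code: the file name (check-crawl-delay.js), the log path, and the combined log format are all assumptions.
check-crawl-delay.js:
// Hypothetical helper. Assumes a combined-format access log at ./access.log, e.g.:
// 203.0.113.7 - - [07/Jan/2026:10:15:32 +0000] "GET / HTTP/1.1" 200 512 "-" "internet-measurement.com bot"
const fs = require('fs');

const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5, Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// Parse "[07/Jan/2026:10:15:32 +0000]" into milliseconds since the epoch. The timezone
// offset is ignored, which is fine as long as every log entry uses the same offset.
const parseTimestamp = (line) => {
    const match = line.match(/\[(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2})/);

    if (!match) {
        return null;
    }

    const [, day, month, year, hour, minute, second] = match;
    return Date.UTC(+year, MONTHS[month], +day, +hour, +minute, +second);
};

const timestamps = fs.readFileSync('./access.log', 'utf8')
    .split('\n')
    .filter(line => /internet-measurement/i.test(line))
    .map(parseTimestamp)
    .filter(timestamp => timestamp !== null)
    .sort((a, b) => a - b);

// Find the smallest gap, in seconds, between consecutive requests from the crawler.
let minimumGapSeconds = Infinity;

for (let i = 1; i < timestamps.length; i++) {
    minimumGapSeconds = Math.min(minimumGapSeconds, (timestamps[i] - timestamps[i - 1]) / 1000);
}

console.log(`${timestamps.length} requests; smallest gap between requests: ${minimumGapSeconds} seconds`);
If the smallest gap is well under the Crawl-Delay value, the crawler clearly isn't honoring it.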
For organizations like internet-measurement.com that provide a user-agent header in requests, one can write Express middleware to block requests:
libs/config.js:
...
exports.blockedUserAgentRegex = /internet-measurement/i;
...
To match multiple user-agent values, use a regular expression like this:
/internet-measurement|useragent1|useragent2|useragent3/i
libs/blockRequests.js:
const config = require('./config');
exports.blockByUserAgent = (req, res, next) => {
// user-agent header may not be present, so default to empty string.
const userAgent = (req.headers['user-agent'] || '');
const shouldBlock = config.blockedUserAgentRegex.test(userAgent);
if (shouldBlock) {
global.logger.info(`blockByUserAgent: blocking request: url: "${req.url}" method: "${req.method}" ip address: ${req.ip} user-agent: "${userAgent}"`);
return res.status(403).send('Forbidden');
}
else {
next();
}
};
Note: The above code should be reasonably efficient if the regular expression is kept simple. Since the regular expression is evaluated on every request, an inefficient regex will degrade the performance of the entire website.
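If the blocklist grows, one alternative is to match against plain lowercase substrings instead of a regular expression. This is just a sketch (the site's code uses the regex above, and blockedUserAgentSubstrings / isBlockedUserAgent are hypothetical names):
// Sketch of an alternative: plain substring matching instead of a regular expression.
const blockedUserAgentSubstrings = ['internet-measurement', 'useragent1', 'useragent2'];

const isBlockedUserAgent = (userAgent) => {
    const value = (userAgent || '').toLowerCase();

    // String.prototype.includes is a simple linear scan, and Array.prototype.some
    // stops at the first match, so the cost stays predictable as the list grows.
    return blockedUserAgentSubstrings.some(substring => value.includes(substring));
};

console.log(isBlockedUserAgent('Mozilla/5.0 (compatible; internet-measurement.com)')); // true
console.log(isBlockedUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64)'));          // false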
app.js:
const express = require('express');
const blockRequests = require('./libs/blockRequests');
...
const app = express();
...
// Block requests with specific user-agent headers
app.use(blockRequests.blockByUserAgent);
...
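Middleware runs in registration order, so registering the blocker before express.static and the site's routes rejects a blocked crawler before any file reads or rendering happen. Here is a self-contained sketch of that ordering; the static directory, route, port, and logger fallback are assumptions, not the site's actual app.js:
// ordering-sketch.js: assumed setup, not the site's actual app.js.
const express = require('express');
const blockRequests = require('./libs/blockRequests');

// The middleware logs via global.logger; fall back to console for this sketch.
global.logger = global.logger || console;

const app = express();

app.use(blockRequests.blockByUserAgent);      // 1. reject blocked user-agents immediately
app.use(express.static('public'));            // 2. then serve static assets
app.get('/', (req, res) => res.send('home')); // 3. then the site's routes

app.listen(3000);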
The beautiful curl utility makes this code easy to test:
eric@eric-VirtualBox:~$ curl -v -H "User-Agent: internet-measurement" https://ericbt.com
* Host ericbt.com:443 was resolved.
...
< HTTP/1.1 403 Forbidden
...
Windows users can test with the PowerShell Invoke-WebRequest command:
PS C:\Users\erict> Invoke-WebRequest -UserAgent "internet-measurement" -Uri https://ericbt.com
Invoke-WebRequest : Forbidden
At line:1 char:2
+ Invoke-WebRequest -UserAgent "internet-measurement" -Uri https://eri ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
+ FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
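The same check can also be automated. Here is a minimal sketch that assumes the supertest package is installed (npm install --save-dev supertest) and that app.js exports the Express app with module.exports = app; neither is part of the article's code as shown:
test-blockByUserAgent.js:
// Sketch only: assumes app.js ends with "module.exports = app;" and that
// supertest is installed as a development dependency.
const request = require('supertest');
const app = require('./app');

// The middleware logs via global.logger; fall back to console when testing.
global.logger = global.logger || console;

(async () => {
    const blocked = await request(app)
        .get('/')
        .set('User-Agent', 'internet-measurement');

    console.assert(blocked.status === 403, 'expected 403 for a blocked user-agent');

    const allowed = await request(app)
        .get('/')
        .set('User-Agent', 'Mozilla/5.0');

    console.assert(allowed.status !== 403, 'expected a non-403 status for an ordinary user-agent');
})();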
Photo by Alan Murray-Rust