Eric Bergman-Terrell's Blog

Node.js + Express: How to Block Requests by User-Agent Headers
January 7, 2026

My website's logs show that a particular organization is hammering it. Sometimes with multiple requests in the same second. With *thousands* of requests per day. Not cool!

The organization is "internet-measurement.com". Their constant requests reduce the website's responsiveness for the humans who visit. Their description of what they're doing doesn't convince me that they're benefiting me in any way, although it's not entirely clear.

For organizations that follow the Robots Exclusion Protocol, one can control which URLs a web spider hits, and how frequently, using a "robots.txt" file:

robots.txt:
# robots.txt for www.EricBT.com
User-agent: *
Crawl-Delay: 20

...

User-agent: InternetMeasurement
Disallow: /

The Robots Exclusion Protocol is just a set of completely voluntary rules. There is nothing forcing a third party to honor it.

I don't know if internet-measurement.com is honoring my robots.txt file. Even if it is, I'm pretty sure it is not honoring the unofficial "Crawl-Delay" that I specified.

For organizations like internet-measurement.com that provide a user-agent header in requests, one can write Express middleware to block requests:

libs/config.js:
...

exports.blockedUserAgentRegex = /InternetMeasurement/i;

...

To match multiple user-agent values, use a regular expression like this:

/InternetMeasurement|useragent1|useragent2|useragent3/i
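
When the list of blocked names grows, the alternation can also be built from an array instead of being typed by hand. The following is only a sketch: the array, the "ExampleBot/1.0" entry, and the escapeRegex helper are hypothetical names that are not part of the website's code. Escaping each entry keeps characters such as "." from being treated as regex operators.

```javascript
// Hypothetical list of blocked names. Only "InternetMeasurement" comes from this post.
const blockedUserAgents = ['InternetMeasurement', 'ExampleBot/1.0'];

// Escape regex metacharacters so that each name is matched literally.
const escapeRegex = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Join the escaped names into one case-insensitive alternation.
const blockedUserAgentRegex = new RegExp(
    blockedUserAgents.map(escapeRegex).join('|'),
    'i'
);

// Sample header value, made up for illustration.
console.log(blockedUserAgentRegex.test('Mozilla/5.0 (compatible; internetmeasurement/1.0)'));
```

In libs/config.js, the resulting RegExp object could be exported as blockedUserAgentRegex in place of the literal.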

libs/blockRequests.js:
const config = require('./config');

exports.blockByUserAgent = (req, res, next) => {
    // user-agent header may not be present, so default to empty string.
    const userAgent = (req.headers['user-agent'] || '');

    const shouldBlock = config.blockedUserAgentRegex.test(userAgent);

    if (shouldBlock) {
        global.logger.info(`blockByUserAgent: blocking request: url: "${req.url}" method: "${req.method}" ip address: ${req.ip} user-agent: "${userAgent}"`);

        return res.status(403).send('Forbidden');
    }
    else {
        next();
    }
};

The above code should be reasonably efficient if the regular expression is kept simple. Since the regular expression is used to process each request, an inefficient regex will reduce the performance of the entire website.
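
If the blocked values are plain names rather than patterns, one way to avoid regex cost concerns altogether is a case-insensitive substring check. This is a sketch of an alternative, not the code running on the website: blockedSubstrings and isBlockedUserAgent are hypothetical names.

```javascript
// Hypothetical list of blocked names, lowercased once at startup.
const blockedSubstrings = ['InternetMeasurement'].map((name) => name.toLowerCase());

// Returns true when the user-agent contains any blocked name, ignoring case.
// A missing header is treated as an empty string, so it is never blocked.
const isBlockedUserAgent = (userAgent) => {
    const lowered = (userAgent || '').toLowerCase();

    return blockedSubstrings.some((name) => lowered.includes(name));
};

// Sample header values, made up for illustration.
console.log(isBlockedUserAgent('Mozilla/5.0 (compatible; InternetMeasurement/1.0)'));
console.log(isBlockedUserAgent(undefined));
```

The work per request is proportional to the number of blocked names and the header length, with no possibility of regex backtracking.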

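app.js:
const express = require('express');
// Assumes app.js sits in the project root, next to the libs directory.
const blockRequests = require('./libs/blockRequests');
...

const app = express();
...

// Block requests with specific user-agent headers.
// Register this before routes and static-file middleware so blocked requests are rejected early.
app.use(blockRequests.blockByUserAgent);

...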

The beautiful curl utility makes this code easy to test:

eric@eric-VirtualBox:~$ curl -v -H "User-Agent: InternetMeasurement" https://ericbt.com
* Host ericbt.com:443 was resolved.
...
< HTTP/1.1 403 Forbidden
...

Windows users can test with the PowerShell Invoke-WebRequest command:

PS C:\Users\erict>  Invoke-WebRequest -UserAgent "InternetMeasurement" -Uri https://ericbt.com
Invoke-WebRequest : Forbidden
At line:1 char:2
+  Invoke-WebRequest -UserAgent "InternetMeasurement" -Uri https://eri ...
+  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand
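
The middleware can also be exercised without a running server by calling it with stub request and response objects. The sketch below is self-contained, so it repeats the blocking logic with the regex inlined and the logging left out. The callMiddleware helper and the stub objects are hypothetical, and they implement only the parts of Express's req and res that the middleware touches.

```javascript
// Stand-in for config.blockedUserAgentRegex from libs/config.js.
const blockedUserAgentRegex = /InternetMeasurement/i;

// Same logic as blockByUserAgent above, minus the logging call.
const blockByUserAgent = (req, res, next) => {
    const userAgent = req.headers['user-agent'] || '';

    if (blockedUserAgentRegex.test(userAgent)) {
        return res.status(403).send('Forbidden');
    }

    next();
};

// Hypothetical helper: runs the middleware against stubs and records what happened.
const callMiddleware = (userAgent) => {
    const result = { statusCode: null, body: null, nextCalled: false };

    // Omit the header entirely when no user-agent is supplied.
    const req = { headers: userAgent === undefined ? {} : { 'user-agent': userAgent } };

    // Only status() and send() are stubbed, and both are chainable.
    const res = {
        status(code) { result.statusCode = code; return this; },
        send(body) { result.body = body; return this; }
    };

    blockByUserAgent(req, res, () => { result.nextCalled = true; });

    return result;
};

console.log(callMiddleware('InternetMeasurement'));
console.log(callMiddleware('Mozilla/5.0'));
```

A blocked request should record status 403 and never reach next(), while any other request should pass through to next() with no response sent.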

[Image: Carcassonne walls. Caption: HTTP Status Code 403: Forbidden!]

Keywords: Node.js, Express, User-Agent, middleware, robots.txt, Robots Exclusion Protocol, spider, Crawl-Delay, internet-measurement.com, internet-measurement, InternetMeasurement, curl, PowerShell
