HAProxy

The DataDome HAProxy module detects and protects against bot activity.

Before the regular HAProxy process starts, the module makes a call to one of our Regional Endpoints using a KeepAlive connection.

Depending on the response, the module either blocks the request or lets HAProxy proceed with the regular process.
The module has been developed to protect the visitors' experience: if an error occurs during the process, or if the timeout is reached, the module automatically disables blocking and lets those hits through.

Compatibility

The DataDome module has been tested with HAProxy versions 1.8 and higher.

Due to a bug found in HAProxy SPOE, the following minor versions are not compatible: 1.8.9, 1.8.21, and 1.9.8 through 1.9.11 (inclusive).

📘

Unsupported HAProxy versions

Most non-LTS (Long Term Support) versions are unmaintained. If you are running one of the following versions, which HAProxy no longer supports, we encourage you to upgrade your setup:

  • 1.7+
  • 1.8+
  • 1.9+
  • 2.1+
  • 2.3+
  • 2.5+

Install HAProxy with Lua support

If you already have an HAProxy binary with Lua support, you can skip this section.

# CentOS / RHEL
yum install https://centos7.iuscommunity.org/ius-release.rpm
yum install haproxy18u

# Debian / Ubuntu
apt-get install haproxy

Configuration

You need to follow the steps below:

  • Download the latest DataDome module here and unzip it in your HAProxy configuration directory. The archive includes the following files:
    • spoe-datadome.conf: configuration of the SPOE filter
    • datadome.lua: a Lua script that handles the transformation of the HTTP request
  • Edit the spoe-datadome.conf file and replace DATADOME_API_KEY with your own API Key
  • Update your HAProxy configuration file by replacing <PATH> with the actual path where you placed the files, and by adding the blocks shown below:
global
    [...]
    lua-load <PATH>/datadome.lua 
    [...]

# Example of frontend which will be protected
frontend http
    [...]
    # Insert these lines on each frontend you want to protect
    http-request set-var(txn.dummy1) var(txn.dd.x_datadome_request_headers)
    http-request set-var(txn.dummy2) var(txn.dd.x_datadome_headers)
    http-request set-var(txn.dummy3) var(txn.dd.x_datadome_response)
    http-request set-var(txn.dummy4) var(txn.dd.body)
    http-request set-var(txn.dummy5) var(txn.dd.error)
    filter spoe engine datadome config <PATH>/spoe-datadome.conf
    http-request lua.Datadome_request_hook
    http-response lua.Datadome_response_hook
    # Insert this line before all default_backend / use_backend directives
    use_backend failure_backend if { var(txn.dd.status) -m str "blocked" }
    default_backend [...]

# Backend to serve the "blocked page"
backend failure_backend
    mode http
    http-request    use-service     lua.failure_service 

# Backend to contact the DataDome API
backend spoe-datadome
    mode tcp
    timeout connect 1s
    option tcp-check
    tcp-check connect ssl
    server datadome-spoe1 api.datadome.co:12346 check ssl verify none

Optional: To maximize availability, our endpoints rely on several IPs. To benefit from this IP resolution, we suggest adding a "resolvers" section to your HAProxy configuration. The full documentation for HAProxy v1.8 is available at the following link: https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#5.3.2
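As an illustration of the optional step above, here is a minimal sketch of a "resolvers" section and of a server line that uses it. The section name datadome_dns, the nameserver label dns1, the nameserver address and the timing values are placeholders chosen for this example, not values provided by DataDome. Replace them with your own DNS servers and preferences.

```haproxy
# Hypothetical resolvers section: name, nameserver address and timings are placeholders
resolvers datadome_dns
    # Replace with the address of a DNS server reachable from your HAProxy host
    nameserver dns1 192.0.2.53:53
    resolve_retries 3
    timeout retry 1s
    # Keep a valid DNS answer for 10 seconds before resolving again
    hold valid 10s

backend spoe-datadome
    [...]
    # Same server line as above, with runtime DNS resolution enabled
    server datadome-spoe1 api.datadome.co:12346 check ssl verify none resolvers datadome_dns resolve-prefer ipv4
```

With this in place, HAProxy re-resolves api.datadome.co at runtime instead of only once at startup.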

🚧

Keep the failure_backend declaration first

HAProxy uses backends in the order they are defined in the configuration file.

backend failure_backend should remain first so that HAProxy uses it when a request is blocked by DataDome. Otherwise, blocked requests will be let through and will reach whichever backend you declared ahead of it.

Reference documentation from HAProxy here

All of these rules are evaluated in their declaration order, and the first one which matches will
assign the backend.

Note: The TCP connection to DataDome is based on the values set in the global and default sections.

Settings

  • API endpoint URL
    • Description: URL of the closest endpoint. More info here
    • Default value: api.datadome.co
  • API endpoint port
    • Default value: 12345 (plain TCP), 12346 (SSL)
  • Timeout hello (spoe-datadome.conf)
    • Description: Timeout for the SPOE for beginning handshake. Should be at least 4 times the latency RTT with DataDome (1 for TCP, 2 for TLS, 1 for SPOE) + 10 ms.
    • Default value: 100 ms
  • Timeout idle (spoe-datadome.conf)
    • Description: Maximum time to wait for an agent to close an idle connection. Value must be smaller than the "timeout server" of the SPOE backend.
    • Default value: 10 minutes
  • Timeout processing (spoe-datadome.conf)
    • Description: Maximum time to wait for a stream to process an event. A hit is generated if the upper-bound limit of DataDome latency overhead is reached. You can find the number of timed-out connections by logging the txn.dd.error variable. On timeout, this variable is set to 1 (see below for other codes).
    • Default value: 50 ms
  • ACL static_file url_reg
    • Description: Using HAProxy ACL. By default no calls will be made to DataDome for static assets.
    • Default value: .(js|css|jpg|jpeg|png|ico|gif|tiff|svg|woff|woff2|ttf|eot|mp4|otf)$
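As a worked example of the "timeout hello" rule, assume a measured round-trip time of 20 ms to the DataDome endpoint: 4 × 20 ms + 10 ms = 90 ms, which fits within the default of 100 ms. With an assumed round-trip time of 30 ms, the minimum becomes 4 × 30 ms + 10 ms = 130 ms, so the default would need to be raised. The sketch below shows where the three timeouts might sit in spoe-datadome.conf. <AGENT_NAME> stands for whatever agent name the shipped file uses, and the 130 ms value is only the result of the assumed 30 ms round-trip time.

```haproxy
spoe-agent <AGENT_NAME>
    [...]
    # 4 x 30 ms RTT (assumed) + 10 ms = 130 ms
    timeout hello      130ms
    # Default from the settings above; must stay below the "timeout server" of the SPOE backend
    timeout idle       10m
    # Default from the settings above
    timeout processing 50ms
```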

FAQ

Can I have DataDome response status in the log?

📘

Module compatibility

Only supported on HAProxy18 1.8.0+ and HAPEE 1.5.1+ modules.

The specific HAProxy variables are set as below:

  • When the request is correctly handled by DataDome, the variable txn.dd.x_datadome_response contains the HTTP response status returned by the API
  • When there is an issue in the call to DataDome, the variable txn.dd.error contains the SPOE error code:
    • The complete code list can be found in the link below: https://www.haproxy.org/download/1.8/doc/SPOE.txt
    • The main codes are as follows:
      • 1: A timeout occurred during the event processing
      • 2: An error was triggered during the resource allocation
      • 5: The frame processing has been interrupted by HAProxy
      • 255: An unknown error occurred during the event processing
      • Higher than 256: An SPOP error occurred during the event processing (refer to the SPOE documentation)
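To make these variables visible, they can be added to a log format with the var() sample fetch. The following is a minimal sketch rather than a recommended format: the leading fields are standard HAProxy log variables picked for illustration, and the labels dd_response and dd_error are arbitrary names chosen for this example.

```haproxy
frontend http
    [...]
    # Client address, request date, frontend, backend/server, status code and bytes sent,
    # followed by the two DataDome variables described above
    log-format "%ci:%cp [%tr] %ft %b/%s %ST %B dd_response=%[var(txn.dd.x_datadome_response)] dd_error=%[var(txn.dd.error)]"
```

Counting the log lines where dd_error=1 then gives the number of requests that hit the processing timeout.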

Can I get Bot Name, Bot Type and Bot/Human flags in my application?

From version 1.8.0 of this module, you can log the values of the DataDome headers by configuring the log format in your HAProxy configuration. The list of all exposed headers is available on our Log Enrichment page.

# frontend settings with DataDome integration
http-request lua.Datadome_request_hook
http-response lua.Datadome_response_hook

# Custom log for DataDome Enrich headers
log-format "X-DataDome-botname: %{+Q}[lua.ddHeaders(X-DataDome-botname)] | X-DataDome-isbot: %{+Q}[lua.ddHeaders(X-DataDome-isbot)] | X-DataDome-ruletype: %{+Q}[lua.ddHeaders(X-DataDome-ruletype)]"

use_backend failure_backend if { var(txn.dd.status) -i -m str blocked }

I see that some requests are blocked in the DataDome dashboard, but the captcha is not displayed

HAProxy evaluates the use_backend directive in declaration order, and picks the first one matching the rules defined.

The failure_backend must be declared first to display the captcha. It is triggered only when the request is blocked by DataDome.
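A minimal sketch of a frontend with the expected ordering is shown below. The backends api_backend and web_backend and the /api path rule are hypothetical and only stand in for your own routing rules.

```haproxy
frontend http
    [...]
    # Evaluated first: blocked requests are served by the DataDome failure backend
    use_backend failure_backend if { var(txn.dd.status) -m str "blocked" }
    # Hypothetical application routing rules, declared afterwards
    use_backend api_backend if { path_beg /api }
    default_backend web_backend
```

If the api_backend rule were declared before the failure_backend rule, a blocked request to /api would match it first and reach the application instead of the captcha.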

Exclude pages from DataDome protection

In the spoe-datadome.conf file, the option to call DataDome is managed by an HAProxy ACL.
By default, we exclude requests with paths ending in js, css, jpg, jpeg, png, ico, gif, tiff, svg, woff, woff2, ttf, eot, mp4, otf.
You can use the complete HAProxy ACL rules set to choose which requests will be sent to DataDome.
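As an illustration, the sketch below adds a second ACL to skip DataDome for health-check requests. It assumes that the shipped file gates the SPOE message on the static_file ACL through a condition on its event line, so check your copy of spoe-datadome.conf before adapting it. <MESSAGE_NAME> stands for the message name used in the shipped file, and the health_check ACL and the /health path are hypothetical.

```haproxy
spoe-message <MESSAGE_NAME>
    [...]
    # Hypothetical extra ACL: requests whose path starts with /health
    acl health_check path_beg /health
    # Assumed condition: call DataDome only when neither ACL matches
    event on-frontend-http-request if !static_file !health_check
```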