HAProxy
The DataDome HAProxy module detects bot activity and protects against it.
Before the regular HAProxy process starts, the module makes a call to one of our Regional Endpoints using a KeepAlive connection.
Depending on the response, the module will either block the query or let HAProxy proceed with the regular process.
The module is designed to protect the visitor experience: if an error occurs during the process, or if the timeout is reached, the module automatically disables blocking and allows those hits.
Compatibility
The DataDome module has been tested with HAProxy versions 1.8 and higher.
Due to a bug in HAProxy SPOE, the following minor versions are not compatible: 1.8.9, 1.8.21, and 1.9.8 through 1.9.11.
The HAProxy 1.7, 1.8, 1.9, 2.1, 2.3, and 2.5 branches are no longer supported by the HAProxy project and are not “Long Term Support” (LTS) versions. We encourage our customers to upgrade to a supported version.
Get HAProxy with Lua through a package manager
If you already have an HAProxy binary with Lua support, you can skip this section.
# CentOS 7 (IUS repository)
yum install https://centos7.iuscommunity.org/ius-release.rpm
yum install haproxy18u
# Debian 9 "stretch" (backports repository)
apt-get -t stretch-backports install haproxy
Configuration
Follow the steps below:
- Download the latest DataDome module here and unzip it in your HAProxy configuration directory. The archive includes the following files:
  - spoe-datadome.conf: configuration of the SPOE filter
  - datadome.lua: a Lua script that handles the transformation of the HTTP request
- Edit the spoe-datadome.conf file and replace DATADOME_API_KEY with your own API key.
- Update your HAProxy configuration file, replacing <PATH> with the actual path where you placed the files, and add the blocks shown below:
global
    [...]
    lua-load <PATH>/datadome.lua
    [...]

# Example of a frontend which will be protected
frontend http
    [...]
    # Insert these lines on each frontend you want to protect
    http-request set-var(txn.dummy1) var(txn.dd.x_datadome_request_headers)
    http-request set-var(txn.dummy2) var(txn.dd.x_datadome_headers)
    http-request set-var(txn.dummy3) var(txn.dd.x_datadome_response)
    http-request set-var(txn.dummy4) var(txn.dd.body)
    http-request set-var(txn.dummy5) var(txn.dd.error)
    filter spoe engine datadome config <PATH>/spoe-datadome.conf
    http-request lua.Datadome_request_hook
    http-response lua.Datadome_response_hook
    # Insert this line before all default_backend / use_backend directives
    use_backend failure_backend if { var(txn.dd.status) -i blocked }
    default_backend [...]

# Backend to serve the "blocked page"
backend failure_backend
    mode http
    http-request use-service lua.failure_service

# Backend to contact the DataDome API
backend spoe-datadome
    mode tcp
    timeout connect 1s
    option tcp-check
    tcp-check connect ssl
    server datadome-spoe1 api.datadome.co:12346 check ssl verify none
Optional: To maximize availability, our endpoints rely on several IPs. To benefit from this IP resolution, we suggest adding a "resolvers" section to your HAProxy configuration. The full documentation for HAProxy v1.8 is available at the following link: https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#5.3.2
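As an illustration of the optional "resolvers" setup, a minimal sketch could look like the following. The resolver name datadome_dns, the nameserver address, and all timing values are assumptions chosen for the example, not DataDome recommendations; adapt them to your own DNS infrastructure.

```haproxy
# Hypothetical resolvers section: name, nameserver address and timings are illustrative
resolvers datadome_dns
    nameserver dns1 192.0.2.53:53     # placeholder address, replace with your own DNS server
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s
    hold valid 10s

backend spoe-datadome
    mode tcp
    timeout connect 1s
    option tcp-check
    tcp-check connect ssl
    # Same server line as in the configuration above, with run-time DNS resolution enabled
    server datadome-spoe1 api.datadome.co:12346 check ssl verify none resolvers datadome_dns
```

With the resolvers keyword on the server line, HAProxy re-resolves api.datadome.co at run time instead of only once at startup.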
Note: The TCP connection to DataDome uses the values set in the global and defaults sections.
Settings
Settings | Description | Default value |
---|---|---|
API endpoint URL | URL of the closest endpoint. More info here | api.datadome.co |
API endpoint port | Plain TCP: 12345; SSL: 12346 | |
Timeout hello (spoe-datadome.conf) | Timeout for the initial SPOE handshake. Should be at least 4 times the round-trip latency (RTT) to DataDome (1 for TCP, 2 for TLS, 1 for SPOE) + 10 ms. | 100 ms |
Timeout idle (spoe-datadome.conf) | Maximum time to wait for an agent to close an idle connection. Value must be smaller than the "timeout server" of the SPOE backend. | 10 minutes |
Timeout processing (spoe-datadome.conf) | Maximum time to wait for a stream to process an event. A hit is generated if the upper-bound limit of DataDome latency overhead is reached. You can find the number of timed-out connections by logging the txn.dd.error variable. On timeout, this variable is set to 1 (see below for other codes). | 50 ms |
ACL static_file url_reg | Uses an HAProxy ACL. By default, no calls are made to DataDome for static assets. | .(js|css|jpg|jpeg|png|ico|gif|tiff|svg|woff|woff2|ttf|eot|mp4|otf)$ |
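
To show where the timeout settings live, here is a hedged sketch of the agent section of spoe-datadome.conf using the default values from the table. The agent name datadome-agent is hypothetical, `[...]` marks lines omitted here, and the var-prefix and set-on-error options are inferred from the txn.dd.* variable names used on this page. The file shipped in the archive remains the reference.

```haproxy
# spoe-datadome.conf (sketch; the shipped file may differ)
[datadome]                          # scope name, matches "filter spoe engine datadome"
spoe-agent datadome-agent           # hypothetical agent name
    [...]
    option var-prefix dd            # assumed: variables are exposed as txn.dd.*
    option set-on-error error       # assumed: error code is stored in txn.dd.error
    # Rule of thumb from the table: 4 x RTT + 10 ms.
    # Example: RTT = 20 ms -> 4 x 20 + 10 = 90 ms, so the 100 ms default fits.
    timeout hello 100ms
    timeout idle 10m
    timeout processing 50ms
    use-backend spoe-datadome
```

Because "timeout idle" must stay below the "timeout server" of the SPOE backend, a 10-minute idle timeout implies a "timeout server" above 10 minutes for the spoe-datadome backend, whether it is inherited from the defaults section or set on the backend itself.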
FAQ
Can I get the DataDome response status in the logs?
The relevant HAProxy variables are set as follows:
- When the request is handled correctly by DataDome, txn.dd.x_datadome_response contains the response status returned by the DataDome API.
- When there is an issue in the call to DataDome, txn.dd.error contains the SPOE error code:
  - The complete code list is available at the following link: https://www.haproxy.org/download/1.8/doc/SPOE.txt
  - The main codes are as follows:
    - 1: A timeout occurred during the event processing
    - 2: An error was triggered during the resource allocation
    - 5: The frame processing has been interrupted by HAProxy
    - 255: An unknown error occurred during the event processing
    - Higher than 256: A SPOP error occurred during the event processing (refer to the SPOE documentation)
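
For example, both variables can be appended to a custom log format with sample-fetch expressions. This is only a sketch: the leading fields are a shortened HTTP log layout chosen for illustration, and the dd_response and dd_error labels are arbitrary names, not part of the module.

```haproxy
frontend http
    [...]
    # %[var(...)] expands a variable inside a log-format string
    log-format "%ci:%cp [%tr] %ft %b/%s %ST %B %{+Q}r dd_response=%[var(txn.dd.x_datadome_response)] dd_error=%[var(txn.dd.error)]"
```

Counting log lines with dd_error=1 then gives the number of requests that hit the processing timeout.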
Can I get Bot Name, Bot Type and Bot/Human flags in my application?
From version 1.8.0 of this module, you can log the values of the DataDome headers by configuring your log format.
You can find more information here.
Exclude pages from DataDome protection
In the spoe-datadome.conf file, the decision to call DataDome is controlled by an HAProxy ACL.
By default, requests whose paths end in js, css, jpg, jpeg, png, ico, gif, tiff, svg, woff, woff2, ttf, eot, mp4, or otf are excluded.
You can use the full HAProxy ACL syntax to choose which requests are sent to DataDome.
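As a sketch of how an extra exclusion could be added, assuming the ACL is declared in the message section of spoe-datadome.conf: the message name check-request, the excluded_path ACL, and the /health and /internal/ prefixes are hypothetical, and `[...]` marks lines omitted here. Only static_file and its pattern come from the settings table.

```haproxy
# spoe-datadome.conf (sketch; the shipped file may differ)
spoe-message check-request          # hypothetical message name
    [...]
    # Default exclusion of static assets (pattern copied from the settings table)
    acl static_file url_reg .(js|css|jpg|jpeg|png|ico|gif|tiff|svg|woff|woff2|ttf|eot|mp4|otf)$
    # Hypothetical extra exclusion: skip DataDome for these path prefixes
    acl excluded_path path_beg /health /internal/
    # Send the request to DataDome unless one of the ACLs matches
    event on-frontend-http-request unless static_file || excluded_path
```

Requests matching either ACL are never sent to DataDome, so they are neither analyzed nor blocked.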