How to Install and Configure HAProxy, the Reliable, High Performance TCP/HTTP Load Balancer

Whether you are researching load balancers for your own needs or the needs of your employer, you will definitely come across HAProxy. As such, you may be asking what it is and how it can benefit you or your company.

When researching load balancers, you will find your options usually fall into one of two categories: hardware-based and software-based. In the hardware realm, you will find vendors such as F5 (BIG-IP), Citrix (NetScaler) and Kemp Technologies, which offer dedicated appliances running proprietary software.

On the opposite side, you have software-based solutions that let you use commodity hardware that fits your needs, independent of the load balancing software being used. In this realm, you will find solutions such as NGINX and HAProxy, with the latter being the focus of this guide.

The goal of the team behind HAProxy is to provide a “free, very fast and reliable solution” for load-balancing TCP and HTTP-based applications. Because of this, HAProxy is considered by many to be the de facto standard when it comes to software-based load balancing, and it is currently being used by sites such as GitHub, Reddit, Twitter and Tumblr, to name a few.
It has been designed to run on Linux, Solaris, FreeBSD, OpenBSD and AIX platforms. While it is designed to run on most x86-64 hardware with limited resources, it will perform best when provided with enterprise-grade hardware such as 10+ Gig NICs and Xeon-class CPUs or similar.

Installing HAProxy on CentOS 7

As a fast-developing open source application, the HAProxy version available in the CentOS default repositories might not be the latest release. To find out which version is being offered through the official channels, enter the following command.

sudo yum info haproxy  

The output lists the package details, including the version number on offer.

HAProxy always has three active stable versions: the two latest versions in development, plus a third, older version that is still receiving critical updates. You can always check the newest stable version listed on the HAProxy website and then decide which version you wish to go with.
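To make the comparison between the packaged release and the one you want concrete, here is a minimal sketch. The helper name version_lt and the two version strings are illustrative assumptions, not values taken from your system. It relies on the -V (version sort) option of GNU sort, which is part of coreutils on CentOS 7.

```shell
# Succeeds (exit 0) when version $1 is strictly older than version $2.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example values only: substitute the version reported by "yum info haproxy"
# and the release you intend to install.
packaged="1.5.18"
wanted="1.7.9"

if version_lt "$packaged" "$wanted"; then
    echo "Repository version $packaged is older than $wanted"
else
    echo "Repository version $packaged is recent enough"
fi
```

If the repository version turns out to be recent enough for your needs, installing the package with yum is the simpler route; the rest of this guide assumes it is not.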

In this guide, we will be installing the latest stable version at the time of writing, 1.7 (2017/08/18), which was not yet available in the standard repositories. Instead, you will need to install it from source. But first, check that you have the prerequisites to download and compile the program.

sudo yum install gcc pcre-static pcre-devel -y  

NOTE: The kernel-headers package may be missing. By default, the CentOS 7 cloud servers have kernel-headers excluded in yum, so you will need to install the package manually:

sudo yum install -y kernel-headers --disableexcludes=all  

Download the source code with the command below. You can check whether a newer version is available on the HAProxy download page at http://www.haproxy.org/#down

wget http://www.haproxy.org/download/1.7/src/haproxy-1.7.9.tar.gz -O ~/haproxy.tar.gz  
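Before unpacking a downloaded tarball, it can be worth checking its integrity. The sketch below shows one way to compare a file against an expected SHA-256 digest. The function name verify_sha256 is hypothetical, and the demonstration uses a throwaway file so that it runs anywhere.

```shell
# Succeeds when the SHA-256 digest of file $1 equals the expected hex digest $2.
verify_sha256() {
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Demonstration on a throwaway file so the sketch runs anywhere.
demo=$(mktemp)
printf 'hello\n' > "$demo"
expected=$(sha256sum "$demo" | cut -d ' ' -f 1)

if verify_sha256 "$demo" "$expected"; then
    echo "checksum OK"
else
    echo "checksum MISMATCH" >&2
fi
rm -f "$demo"
```

For the real download, you would call verify_sha256 on ~/haproxy.tar.gz with the digest published by the HAProxy project. This assumes a checksum file is offered next to the tarball on the download page, which you should confirm there.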

Once the download is complete, extract the files using the command below.

tar -xvf ~/haproxy.tar.gz -C ~/

Change into the extracted source directory.

cd ~/haproxy-1.7.9

Then compile the program for your system.

make TARGET=linux2628  

And finally, install HAProxy itself.

sudo make install  


With that done, HAProxy is now installed but requires some additional steps to get it operational. Continue below with setting up the software and services.

Setting up HAProxy for your server

Now, since we have chosen the "hard" way of building from source, we need to create the following directories and the statistics file for HAProxy ourselves.

sudo mkdir -p /etc/haproxy  
sudo mkdir -p /var/lib/haproxy  
sudo touch /var/lib/haproxy/stats  

Next, we will create a symbolic link for the binary to allow you to run HAProxy commands as a normal user.

sudo ln -s /usr/local/sbin/haproxy /usr/sbin/haproxy  

If you want to add the proxy as a service to the system, copy the haproxy.init file from the examples to your /etc/init.d directory. Change the file permissions to make the script executable and then reload the systemd daemon.

sudo cp ~/haproxy-1.7.9/examples/haproxy.init /etc/init.d/haproxy  
sudo chmod 755 /etc/init.d/haproxy  
sudo systemctl daemon-reload  

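For general usage, it is also recommended to add a dedicated system user for HAProxy to run under. A service account like this does not strictly need a password; if you want to assign one, run passwd with sudo.

sudo useradd -r haproxy
sudo passwd haproxy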

You can double check the installed version number with the following command.

haproxy -v  

The output should report version 1.7.9.

Lastly, the firewall on CentOS 7 is quite restrictive for this project by default. Whichever firewall you are using (the built-in firewalld, iptables, or csf/apf), you need to allow HTTP traffic and open TCP port 8181, which will serve the statistics page later in this guide. With firewalld, use the following commands to allow the required services and reload the firewall:

sudo firewall-cmd --permanent --zone=public --add-service=http  
sudo firewall-cmd --permanent --zone=public --add-port=8181/tcp  
sudo firewall-cmd --reload  
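To confirm that the port rule took effect, you can list the open ports of the zone. The sketch below wraps that check in a small helper; the name port_is_open is hypothetical. Note that it only covers ports added with --add-port, not services added with --add-service.

```shell
# Succeeds when port/protocol $1 (for example 8181/tcp) is listed as open
# in the public zone. "firewall-cmd --list-ports" prints the open ports
# separated by spaces.
port_is_open() {
    sudo firewall-cmd --zone=public --list-ports | tr ' ' '\n' | grep -qx "$1"
}

# Only run the check on machines that actually have firewalld installed.
if command -v firewall-cmd >/dev/null 2>&1; then
    if port_is_open 8181/tcp; then
        echo "8181/tcp is open"
    else
        echo "8181/tcp is NOT open"
    fi
fi
```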

Note: If you don't have firewalld installed, you can install it with yum install firewalld, then run service firewalld restart and proceed with the commands above. Otherwise, you can use iptables as mentioned earlier.

Load balancing is a common solution for distributing web applications horizontally across multiple hosts while providing the users with a single point of access to the service. HAProxy is one of the most popular open source load balancers, and it also offers high availability and proxy functionality.

HAProxy aims to optimise resource usage, maximise throughput, minimise response time, and avoid overloading any single resource. It is available for install on many Linux distributions, such as CentOS 7 in this guide, but also Debian 8 and Ubuntu 16 systems.

HAProxy is particularly suited for very high traffic websites and is therefore often used to improve web service reliability and performance for multi-server configurations. This guide lays out the steps for setting up HAProxy as a load balancer on its own CentOS 7 cloud host, which then directs the traffic to your web servers.

As a prerequisite for the best results, you should have a minimum of two web servers and a server for the load balancer. The web servers need to be running at least a basic web service such as nginx or httpd to test out the load balancing between them.

Configuring the load balancer

Setting up HAProxy for load balancing is quite a straightforward process. Basically, all you need to do is tell HAProxy what kind of connections it should be listening for and where the connections should be relayed to.

This is done by creating a configuration file /etc/haproxy/haproxy.cfg with the defining settings. You can read about the configuration options in the HAProxy documentation at https://cbonte.github.io/haproxy-dconv/1.7/configuration.html if you are interested in finding out more.

1) Load balancing at layer 4
We can now start with the basic setup and create the haproxy.cfg file with the popular editor nano.

sudo nano /etc/haproxy/haproxy.cfg

Add the following sections to the file. Replace <server name> with whatever you want to call your servers on the statistics page, and <private IP> with the private IPs of the servers you wish to direct the web traffic to. You may network your servers using a VPN, a private LAN, or whatever your datacenter provides:

global  
   log /dev/log local0
   log /dev/log local1 notice
   chroot /var/lib/haproxy
   stats timeout 30s
   user haproxy
   group haproxy
   daemon

defaults  
   log global
   mode http
   option httplog
   option dontlognull
   timeout connect 5000
   timeout client 50000
   timeout server 50000

frontend http_front  
   bind *:80
   stats uri /haproxy?stats
   default_backend http_back

backend http_back  
   balance roundrobin
   server <server name> <private IP>:80 check
   server <server name> <private IP>:80 check

In my particular case, I route port 80 traffic to two other servers that I have already set up. The configuration defines a load balancer with a frontend named http_front listening on port 80, which then directs the traffic to the default backend named http_back. Every request goes to the same pool of servers regardless of its content, which is why this setup is described here as layer 4; note that a strictly layer 4 (TCP-level) configuration would use mode tcp, whereas mode http is what allows HAProxy to serve the statistics page. The additional stats uri /haproxy?stats enables the statistics page at that address.
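If you prefer to script this step instead of editing by hand, the sketch below writes the same configuration to a scratch file and, when the haproxy binary is present, asks it to check the file (haproxy -c -f parses a configuration without starting the proxy). The server names and private IPs are made-up placeholders that you need to replace with your own.

```shell
# Hypothetical names and private IPs; replace them with your own.
SERVER1_NAME="web1"; SERVER1_IP="10.0.0.11"
SERVER2_NAME="web2"; SERVER2_IP="10.0.0.12"

# Write to a scratch file first so a typo never touches the live config.
cfg=$(mktemp)
cat > "$cfg" <<EOF
global
   log /dev/log local0
   log /dev/log local1 notice
   chroot /var/lib/haproxy
   stats timeout 30s
   user haproxy
   group haproxy
   daemon

defaults
   log global
   mode http
   option httplog
   option dontlognull
   timeout connect 5000
   timeout client 50000
   timeout server 50000

frontend http_front
   bind *:80
   stats uri /haproxy?stats
   default_backend http_back

backend http_back
   balance roundrobin
   server $SERVER1_NAME $SERVER1_IP:80 check
   server $SERVER2_NAME $SERVER2_IP:80 check
EOF

# Check the draft if HAProxy is installed on this machine.
if command -v haproxy >/dev/null 2>&1; then
    haproxy -c -f "$cfg" || echo "Check reported problems; review $cfg" >&2
fi
echo "Draft written to $cfg"
```

Once the check passes, the draft can be copied into place with sudo cp "$cfg" /etc/haproxy/haproxy.cfg.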

Different load balancing algorithms

Configuring the servers in the backend section allows HAProxy to use these servers for load balancing according to the roundrobin algorithm whenever available.

The balancing algorithms are used to decide which server at the backend each connection is transferred to. Some of the useful options include the following:

  • Roundrobin: Each server is used in turns according to their weights. This is the smoothest and fairest algorithm when the servers’ processing time remains equally distributed. This algorithm is dynamic, which allows server weights to be adjusted on the fly.

  • Leastconn: The server with the lowest number of connections is chosen. Round-robin is performed between servers with the same load. Using this algorithm is recommended with long sessions, such as LDAP, SQL, TSE, etc, but it is not very well suited for short sessions such as HTTP.

  • First: The first server with available connection slots receives the connection. The servers are chosen from the lowest numeric identifier to the highest, which defaults to the server’s position on the farm. Once a server reaches its maxconn value, the next server is used.

  • Source: The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request. This way the same client IP address will always reach the same server while the servers stay the same.
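To build some intuition for the first and last entries of the list above, here is a toy simulation, not HAProxy's actual implementation. It assumes equal server weights, uses hypothetical server names, and uses the cksum utility (CRC-32) as a stand-in for HAProxy's own hash function.

```shell
# Print the server at position (index mod server count) from the argument list.
# Usage: pick_by_index INDEX SERVER...
pick_by_index() {
    i=$1
    shift
    shift $(( i % $# ))
    printf '%s\n' "$1"
}

# Source hashing: the same client IP always maps to the same server while the
# server list is unchanged. cksum stands in for HAProxy's own hash.
# Usage: pick_by_source CLIENT_IP SERVER...
pick_by_source() {
    ip=$1
    shift
    h=$(printf '%s' "$ip" | cksum | cut -d ' ' -f 1)
    pick_by_index "$h" "$@"
}

SERVERS="web1 web2 web3"   # hypothetical backend names

# Round robin with equal weights: request N goes to server N mod count.
# $SERVERS is intentionally unquoted so that it splits into separate arguments.
for n in 0 1 2 3 4; do
    echo "request $n -> $(pick_by_index "$n" $SERVERS)"
done

echo "192.0.2.10 -> $(pick_by_source 192.0.2.10 $SERVERS)"
```

The round robin loop cycles web1, web2, web3 and then starts over, while repeated calls to pick_by_source with the same address keep returning the same server.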

In this article, I will do my best to explain the terminology and a few of the key terms and concepts you should understand when working with HAProxy. When working with load balancers, these are the key concepts that will apply to all solutions.

2) Configuring load balancing for layer 7

Another possibility is to configure the load balancer to work on layer 7, which is useful when parts of your web application are located on different hosts. This can be accomplished by conditioning the connection transfer on, for example, the URL.

What does layer 7 mean?

Layer 7 refers to the seventh and topmost layer of the Open Systems Interconnection (OSI) model, known as the application layer. This is the highest layer, which supports end-user processes and applications. Layer 7 identifies the communicating parties and the quality of service between them, considers privacy and user authentication, and identifies any constraints on the data syntax. This layer is wholly application-specific.

Open the HAProxy configuration file with a text editor, nano in our case:

sudo nano /etc/haproxy/haproxy.cfg  

Then set the frontend and backend segments according to the example below. Once again, replace the <server name> and <private IP> fields with your real settings:

frontend http_front  
   bind *:80
   stats uri /haproxy?stats
   acl url_blog path_beg /blog
   use_backend blog_back if url_blog
   default_backend http_back

backend http_back  
   balance roundrobin
   server <server name> <private IP>:80 check
   server <server name> <private IP>:80 check

backend blog_back  
   server <server name> <private IP>:80 check

Here is an explanation of each term. The frontend declares an ACL rule named url_blog that applies to all connections with paths that begin with /blog. The use_backend line defines that connections matching the url_blog condition should be served by the backend named blog_back, while all other requests are handled by the default backend.

On the backend side, the configuration sets up two server groups: http_back like before, and a new one called blog_back that specifically serves connections to example.com/blog. This is very useful if you plan to separate an application's admin backend from its public frontend, as with Magento for example, where many concurrent backend users make a lot of changes.
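The routing decision described above is a plain prefix match on the request path. The sketch below mimics it with a shell case statement so you can see which paths would land on which backend; the function name choose_backend and the sample paths are illustrative only.

```shell
# Mirror of the frontend rules: a prefix match on the request path.
choose_backend() {
    case "$1" in
        /blog*) echo "blog_back" ;;   # acl url_blog path_beg /blog
        *)      echo "http_back" ;;   # default_backend http_back
    esac
}

for p in / /blog /blog/2017/post /blogger /about; do
    echo "$p -> $(choose_backend "$p")"
done
```

Note that /blogger also matches, because path_beg only looks at the beginning of the path. A stricter pattern such as /blog/ would avoid that, at the cost of no longer matching the bare /blog path.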

After making the configurations, save the file and restart HAProxy with the next command.

sudo systemctl restart haproxy  

A successful restart prints nothing and returns you to the prompt.

If you get any errors or warnings at start up, check the configuration for typos and that you have created all the necessary files and folders, then try restarting again.
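One way to catch configuration mistakes before they take the proxy down is to let HAProxy check the file first and only restart when the check passes. The helper below is a minimal sketch of that idea; the function name restart_if_valid is hypothetical, while haproxy -c -f is HAProxy's own configuration check mode.

```shell
# Restart HAProxy only if the configuration file parses cleanly.
# "haproxy -c -f FILE" checks the file and exits non-zero on errors.
restart_if_valid() {
    cfg=${1:-/etc/haproxy/haproxy.cfg}
    if haproxy -c -f "$cfg"; then
        sudo systemctl restart haproxy
    else
        echo "Configuration check failed; HAProxy was not restarted." >&2
        return 1
    fi
}
```

Call restart_if_valid without arguments to check the default configuration file, or pass a different path as the first argument.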

Testing the setup

Now we can test the setup and make sure it is working as expected. With HAProxy configured and running, open your load balancer server’s public IP in a web browser and check that you get connected to your backend correctly. The stats uri parameter in the configuration enables the statistics page at the defined address.

http://<load balancer public IP>/haproxy?stats  

When you load the statistics page and all of your servers are listed in green, your configuration was successful!

In my case that is http://54.229.71.221/haproxy?stats. When I visit just http://54.229.71.221 I see the Nginx default page, and pressing F5 or CTRL-R repeatedly shows that HAProxy is doing its job and honoring the "roundrobin" algorithm.

The statistics page contains some helpful information to keep track of your web hosts including up and down times and session counts. If a server is listed in red, check that the server is powered on and that you can ping it from the load balancer machine.

In case your load balancer does not reply, check that HTTP connections are not getting blocked by the firewall. Also, confirm that HAProxy is running with the command below.

sudo systemctl status haproxy  

Password protecting the statistics page

Having the statistics page simply listed at the frontend, however, leaves it publicly open for anyone to view, which might not be such a good idea. Instead, you can move it to its own port number by adding the example below to the end of your haproxy.cfg file. Replace 'username' and 'password' with something secure.

listen stats  
   bind *:8181
   stats enable
   stats uri /
   stats realm Haproxy\ Statistics
   stats auth username:password
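If you need a hard-to-guess password for the stats auth line, one option is to generate it from the system's random source. This is only a sketch: the helper name gen_password is hypothetical, and 'username' remains a placeholder for the name you choose.

```shell
# Generate a 24-character random password from 18 random bytes.
# Base64 output never contains ':', which separates the user name from the
# password in the "stats auth" line.
gen_password() {
    head -c 18 /dev/urandom | base64
}

pw=$(gen_password)
echo "stats auth username:$pw"
```

The printed line can be pasted into the listen section in place of the example credentials. Keep in mind that the password is stored in plain text in haproxy.cfg, so restrict who can read that file.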

After adding the new listen group, remove the old reference to the stats uri from the frontend group. When done, save the file and restart HAProxy again.

sudo systemctl restart haproxy  

Then open the load balancer again with the new port number, and log in with the username and password you set in the configuration file.

http://<load balancer public IP>:8181

Check that your servers are still reporting all green, and then open just the load balancer IP without any port number in your web browser. You can also use a domain name instead.

http://<load balancer public IP>/

If your backend servers have at least slightly different landing pages, you will notice that each time you reload the page you get the reply from a different host. You can try out different balancing algorithms in the configuration section, or take a look at the full documentation at https://cbonte.github.io/haproxy-dconv/
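The same reload test can be done from the command line. The sketch below requests a URL several times with curl and tallies how many times each distinct response body came back; the function name tally_responses is hypothetical, and it assumes your backends serve pages that differ at least slightly.

```shell
# Request URL $1 N times ($2, default 6) and tally identical response bodies.
# Each output line shows a count followed by the cksum fingerprint of one
# distinct response body.
tally_responses() {
    url=$1
    n=${2:-6}
    i=0
    while [ "$i" -lt "$n" ]; do
        curl -s "$url" | cksum
        i=$(( i + 1 ))
    done | sort | uniq -c
}
```

For example, tally_responses "http://<load balancer public IP>/" 10 run against two backends with different landing pages should print two lines with roughly equal counts when the roundrobin algorithm is in use.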