Go, Raspberry Pi Cluster and Performance

Posted by Riomhaire Research on Tuesday, September 19, 2017

Introduction

It’s been a busy few weeks here at Riomhaire Research since we decided to embrace the DevOps development culture. There is a lot to learn; although many of the techniques have been around for years, it is the combination of Agile development with test automation, instrumentation, virtualization and clustering that makes it so potent.

One of the early decisions we made was to look at software for managing distributed systems and automating the deployment of applications - especially from the perspective of a cluster. There are a number of big players in this sphere: Ansible, Kubernetes, Puppet and Chef to name a few, and we plan to look at several of them to see how they stack up against each other. As a company we don’t have a big server farm (although we do have a number of desktops which could be used), and at the moment we don’t want to pay for AWS cloud instances - although prices are quite reasonable. So is there a ‘cheap’ alternative? The answer is ‘yes’ if your application’s needs and demands are quite modest.

Theign - A Raspberry Pi Cluster

The Raspberry Pi is a small single-board computer which, in its version 3 incarnation, has:

  1. 4× ARM Cortex-A53 CPU cores running at 1.2GHz
  2. 1GB of RAM
  3. GPU: Broadcom VideoCore IV
  4. 10/100 Ethernet
  5. 2.4GHz 802.11n wireless
  6. Bluetooth 4.1
  7. microSD storage up to 32GB
  8. HDMI
  9. 4× USB 2.0

All in a package around the size of a credit card and at a price of around 30 Euro. Ideal for the hobbyist and tinkerer alike.

There are many sites which show you how to build a Raspberry Pi based cluster: Instructables and Thundax to name a few. To cut a potentially long story short, for the cost of a morning’s work and 360 Euro we built ‘Theign’, a six-node Pi cluster:

[Photo: Theign, the six-node Raspberry Pi cluster]

Creating the boot images and then, after the first boot, setting each machine’s name and enabling remote SSH access with key-based auto-login are the steps needed before Ansible can manage the nodes.
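The key-based login is the piece Ansible actually relies on. A minimal sketch of setting it up from the control machine, assuming the default ‘pi’ user on each node (the user name and key type are assumptions; the node names are the ones used throughout this post):

# Generate a key pair once on the control machine, then push the public key to every node.
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa
for node in huscarl001 huscarl002 huscarl003 huscarl004 huscarl005 huscarl006; do
  ssh-copy-id pi@$node
done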

LightAuth

The next thing was to see what kind of performance we would get when deploying an application to each of the nodes in the cluster. From colleagues we had heard a number of good things about Ansible, so we thought we would use it for a first try at installing and distributing an application. LightAuth is the application we chose as the test application, for a number of reasons:

  1. We wrote and maintain it as an application for learning Go and its community.
  2. It is not a trivial ‘hello world’ web service; the Authenticate method uses PBKDF2 to generate hashes, for example.
  3. It has few dependencies.
  4. Its configuration files are external - not everything is baked into the application.
  5. There is a JMeter test script which generates configurable call rates, which we can use to gather performance figures.

In short what we are trying to create is:

[Diagram: the test environment]

Each node within the cluster will run an instance of the LightAuth application, and the API router will route calls to the instances according to its own logic (see the Linkerd documentation).

The API router is a common pattern that we have described before, and we configured ours (using linkerd) to point to each of the huscarl nodes within the Theign cluster for the routes defined in the LightAuth documentation:

   /http/1.1/*/*/api/v1/session => /#/api/v1/session;          
   /http/1.1/*/*/api/v1/authentication => /#/api/v1/authentication;           

The files for ‘authentication’ and ‘session’ then list the hosts in the cluster together with port 3000, the default used by LightAuth.
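Assuming linkerd’s file-based service discovery is used, each of those files simply lists one ‘host port’ pair per line, something along these lines:

huscarl001 3000
huscarl002 3000
huscarl003 3000
huscarl004 3000
huscarl005 3000
huscarl006 3000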

Ansible and Service Installation

We have an application (LightAuth), it has a configuration file (the list of users), and we want the application to start when the server starts up - in other words, a service. As described in our Mail Server post, defining a service which is understood by Linux is fairly straightforward. The basic steps are:

  1. Compile the application for the target Linux version/architecture - in our case Raspbian on ARM (see the sketch after this list).
  2. Put the application binary in some well-known location - say /usr/bin.
  3. LightAuth’s list of users is a configuration file, so naturally that lives somewhere under /etc.
  4. Define a “lightauth.service” file and store it with the other service definitions under /etc/systemd/system.
  5. Tell Linux to enable and start the service.
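Step 1 is a one-liner thanks to the Go toolchain. A minimal sketch, cross-compiled from the development machine for the 32-bit ARM Raspbian image on the Pi 3 (the output name and GOARM value are our assumptions for this target):

# Build a Linux/ARM binary of lightauth on the development machine.
# GOARM=7 targets the ARMv7 instruction set the Pi 3 supports under 32-bit Raspbian.
GOOS=linux GOARCH=arm GOARM=7 go build -o lightauth .

The resulting ‘lightauth’ binary is what the playbook below copies to /usr/bin on each node.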

First of all we created a directory to hold all our deployment artifacts. Cross-compiling a Go application for a specific architecture took a jiffy, and copying the example users file from the lightauth repository removed the need to generate our own for testing purposes. As for the service definition, with only minimal requirements what we need is:

[Unit]
Description=LightAuth Service

[Service]
Restart=always
ExecStart=/usr/bin/lightauth -usersFile /etc/lightauth/users.csv
ExecStop=/usr/bin/killall lightauth

[Install]
WantedBy=default.target

Since this is useful we have now added it to the lightauth project.
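For a single node, step 5 would just be a couple of systemctl commands run on the Pi itself; the playbook below does the same thing across the whole cluster using Ansible’s service module:

# Manual equivalent of enabling and starting the service on one node.
sudo systemctl daemon-reload
sudo systemctl enable lightauth
sudo systemctl start lightauth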

Defining the basic Ansible playbook took a little more work and research, but the Ansible documentation is an excellent source of information. Using the “hosts” group feature of the Ansible inventory it is fairly easy to group your machines/hosts/nodes into logical groupings such as ‘dev’, ‘integration’ and ‘production’, so you can replay playbooks in each environment. For this experiment we want to use all the nodes in Theign, so we created a group called ‘cluster’ which points to the nodes within Theign.
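The inventory itself is tiny; something along these lines in the default INI format (the file path is whatever you pass to ansible-playbook with -i, and is an assumption here):

# Theign nodes grouped as 'cluster'
[cluster]
huscarl001
huscarl002
huscarl003
huscarl004
huscarl005
huscarl006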

It is good practice to add comments to your playbooks, though to be honest they are pretty readable without them. The playbook to install and start LightAuth is:

---
- hosts: cluster
  become: yes
  become_method: sudo
  gather_facts: yes

  tasks:
    # Kill/Stop any existing lightauth process
    - name: Stop Existing LightAuth
      command: /usr/bin/killall lightauth
      no_log: True
      ignore_errors: yes      

    # Create the lightauth config directory within /etc and don't error if it already exists
    - name: Creates lightauth config folder
      file: path=/etc/lightauth state=directory
      no_log: True

    # Copy across users file to the config directory
    - name: Copy across config/users file
      copy: src=users.csv dest=/etc/lightauth/users.csv owner=root group=root mode=0644 backup=yes

    # Copy lightauth application
    - name: Copy across lightauth application
      copy: src=lightauth dest=/usr/bin/lightauth owner=root group=root mode=0755

    # Copy across service definition file
    - name: Copy across lightauth service
      copy: src=lightauth.service dest=/etc/systemd/system owner=root group=root mode=0777

    # Start and enable the lightauth service 
    - name: Start and enable the lightauth service
      service: name=lightauth enabled=yes state=started

The playbook to remove the service is equally simple within Ansible:

---
- hosts: cluster
  become: yes
  become_method: sudo
  gather_facts: yes

  tasks:
    # Stop and disable the lightauth service 
    - name: Stop and disable the lightauth service
      service: name=lightauth enabled=no state=stopped

    # Remove service definition file
    - name: Remove lightauth service
      file: path=/etc/systemd/system/lightauth.service state=absent

    # Remove lightauth application
    - name: Remove lightauth application
      file: path=/usr/bin/lightauth state=absent

    # Remove the lightauth config directory within /etc
    - name: Remove lightauth config folder
      file: path=/etc/lightauth state=absent
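Tearing everything back down is another one-liner (the playbook file name is ours, shown here purely for illustration):

ansible-playbook -l cluster remove-lightauth-playbook.yml   # playbook file name is illustrative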

What surprised us when we executed the deploy playbook:

 time ansible-playbook  -l cluster deploy-lightauth-playbook.yml

was how little time it took to execute - as this copy-and-paste snippet shows:

....
PLAY RECAP *********************************************************************
huscarl001                 : ok=7    changed=2    unreachable=0    failed=0   
huscarl002                 : ok=7    changed=2    unreachable=0    failed=0   
huscarl003                 : ok=7    changed=2    unreachable=0    failed=0   
huscarl004                 : ok=7    changed=2    unreachable=0    failed=0   
huscarl005                 : ok=7    changed=2    unreachable=0    failed=0   
huscarl006                 : ok=7    changed=2    unreachable=0    failed=0   


real	0m20.682s
user	0m7.620s
sys	0m2.760s

Twenty seconds to deploy and install LightAuth across six nodes!!! Wow is all we can say.

Performance Test and Results

As we mentioned before, we used the JMeter test script provided with LightAuth to test the cluster. The script was altered to target the API gateway router rather than a single node. Other than that, it uses 50 threads (users) to call the ‘Authenticate’ method to retrieve the JWT access token, then makes 50 calls to the ‘Verify’ method to simulate the calls other services would make to check a token is valid, and finally a call to gather some statistics. This is then repeated a thousand times.
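For the record, the runs themselves are just JMeter in non-GUI mode; a sketch, where the test plan and result file names are simply placeholders for whatever you use locally:

# Run the (modified) LightAuth test plan headless and log the samples for analysis.
jmeter -n -t lightauth-load-test.jmx -l results.jtl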

The results we found were quite surprising…

[Images: performance test results for the Theign cluster]

In short we were getting 2700 calls/second sustained over time from what is in effect the same CPU used in a phone.