24-core Odroid-U2 cluster

This cluster is composed of 6 Odroid-U2 boards. Each node is equipped with:
- a quad-core ARM Cortex-A9 CPU (running at 1.7GHz, with 1MB L2 cache)
- a Mali-400 GPU (which we do not use)
- a 10/100 Ethernet network card
- a 16GB (class 10) micro SD card that hosts the system filesystem

Each node is powered using a 5V 2A power adaptor.
The nodes are connected to a Gigabit Ethernet switch.

Installing the system

Download and install the Ubuntu image from the HardKernel website.
This cluster is connected to a server that serves as a frontend. This server (an old Xeon desktop machine) hosts:
- a NFS server that exports /home and /opt (where compiled software is installed)
- a LDAP server
- a DNS+DHCP server
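
For the NFS part, for instance, the exports on the frontend may look like the sketch below (the 192.168.1.0/24 subnet is only an assumption, adapt it to the actual cluster network):

# /etc/exports on the frontend: export /home and /opt to the nodes
/home  192.168.1.0/24(rw,sync,no_subtree_check)
/opt   192.168.1.0/24(rw,sync,no_subtree_check)
# then reload the export table
$ exportfs -ra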

Once the server is configured, the installation of the nodes is straightforward since most of the required Debian packages are available (nfs-common, libpam-ldapd, etc.).
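
On each node, the setup then boils down to a few commands like the sketch below (the frontend hostname "frontend" is a placeholder; the LDAP and DNS/DHCP client configuration is omitted):

# install the NFS and LDAP client packages on a node
$ apt-get install nfs-common libpam-ldapd libnss-ldapd
# mount /home and /opt from the frontend (the corresponding lines go in /etc/fstab)
$ mount -t nfs frontend:/home /home
$ mount -t nfs frontend:/opt /opt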

Installing HPC software

Since we mainly work on MPI, we installed the latest versions of MPICH and Open-MPI from source without any problem. Both are installed in /opt so that the installation is shared by all the nodes of the cluster.
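
The builds follow the usual autotools procedure; here is a sketch for Open-MPI 1.6.3 (the exact prefix under /opt is an assumption; MPICH 3.0.1 is built the same way with its own prefix):

$ tar xzf openmpi-1.6.3.tar.gz && cd openmpi-1.6.3
$ ./configure --prefix=/opt/openmpi-1.6.3
$ make -j 4 && make install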

We also installed performance analysis tools like EZTrace in order to analyze the behavior of applications (pthreads, memory consumption, MPI messages, etc.) on the ARM processors.
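
As an illustration, tracing an MPI run with EZTrace looks roughly like the sketch below (the module names, the wrapper options and the default trace location depend on the EZTrace version, and ./my_app is a placeholder, so treat this as indicative):

# list the available tracing modules (mpi, pthread, memory, ...)
$ eztrace_avail
# run the application through the eztrace wrapper, recording MPI and pthread events
$ mpirun -np 16 -hostfile hosts eztrace -t "mpi pthread" ./my_app
# merge the per-process traces into a trace that can be visualized (e.g. with ViTE)
$ eztrace_convert /tmp/*eztrace*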

Quick performance evaluation

Disclaimer: the performance of the network on the Stark cluster is really poor (we still need to investigate why). These figures are only given to get a rough idea of the performance of the cluster.

We compare the performance obtained on the Odroid cluster with the performance obtained on the Stark cluster.
The Stark cluster is composed of 4 nodes connected through a Gigabit Ethernet network (see the disclaimer above about its poor network performance). Each node is equipped with:
- a quad-core Intel Xeon E5-2603 (Sandy Bridge) CPU running at 1.80GHz (10MB L3 cache)
- 8 GB of RAM

NAS Parallel Benchmark

We ran the MPI version of the NAS Parallel Benchmark (version 3.3) using both MPICH (version 3.0.1) and Open-MPI (version 1.6.3). Here are the results we obtained for Open-MPI. The results with MPICH are similar.
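
For reference, each kernel is built and launched roughly as follows (the hostfile name is a placeholder; config/make.def has to be set up beforehand for the target compiler and MPI installation):

# in the NPB3.3-MPI directory
$ make bt CLASS=A NPROCS=16
# run on 4 nodes x 4 cores
$ mpirun -np 16 -hostfile hosts bin/bt.A.16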

Performance for Class=A, Nprocs=16. Only 4 nodes of each cluster were used.

Kernel   Execution time on Stark (s)   Execution time on Odroid (s)
BT       127.78                        99.19
CG       4.76                          6.39
EP       1.8                           4.73
FT       24.48                         24.14
IS       12.00                         8.09
LU       25.67                         91.59
MG       4.35                          4.75
SP       203.66                        142.69

Performance for Class=A, Nprocs=4. Only 1 node of each cluster was used.

Kernel   Execution time on Stark (s)   Execution time on Odroid (s)
BT       27.94                         159.63
CG       0.66                          8.58
EP       6.29                          18.74
FT       1.89                          16.66
IS       0.34                          1.88
LU       19.94                         208.58
MG       0.75                          9.45
SP       23                            270.37

Running Linpack

On both clusters, the ATLAS library is installed.
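
Assuming the standard HPL implementation of Linpack, the build and run look roughly like the sketch below (the HPL version, the "odroid" architecture name and the hostfile are placeholders; the actual tuning is done in HPL.dat):

# build HPL with a Make.<arch> file whose LAdir/MPdir point to the ATLAS and MPI installations in /opt
$ cd hpl-2.1
$ cp setup/Make.Linux_PII_CBLAS Make.odroid
$ make arch=odroid
# run on the whole cluster (6 nodes x 4 cores)
$ cd bin/odroid && mpirun -np 24 -hostfile hosts ./xhpl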

Performance on one node (4 cores):
Odroid: 3.79 GFLOPS
Stark: 33.92 GFLOPS

Performance on the whole cluster:
Odroid (6 nodes, 24 cores): 16.13 GFLOPS.
Stark (4 nodes, 16 cores): 56.06 GFLOPS.

Energy consumption

Using a simple wattmeter, we measured the power consumption of the nodes:
1 node:
- when idle: 2 W
- when computing: approx. 7 W

4 nodes:
- when idle: 8 W
- when computing: approx. 24 W

Changing CPU frequency

On the Odroid-U2 boards, it is possible to set the minimum and maximum CPU frequencies in order to control the power consumption.
The CPU frequency varies from 200MHz (when idle) to 1.7GHz (when computing). The CPU frequency can be raised up to 2GHz, but the board then overheats with its passive cooling.

$ apt-get install cpufrequtils
$ cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
  driver: exynos_cpufreq
  CPUs which run at the same hardware frequency: 0 1 2
  CPUs which need to have their frequency coordinated by software: 0 1 2
  maximum transition latency: 11.0 us.
  hardware limits: 200 MHz - 2.00 GHz
  available frequency steps: 2.00 GHz, 1.92 GHz, 1.80 GHz, 1.70 GHz, 1.60 GHz, 1.50 GHz, 1.40 GHz, 1.30 GHz, 1.20 GHz, 1.10 GHz, 1000 MHz, 900 MHz, 800 MHz, 700 MHz, 600 MHz, 500 MHz, 400 MHz, 300 MHz, 200 MHz
  available cpufreq governors: conservative, userspace, powersave, ondemand, performance
  current policy: frequency should be within 200 MHz and 1.30 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.70 GHz (asserted by call to hardware).
  cpufreq stats: 2.00 GHz:0.00%, 1.92 GHz:0.00%, 1.80 GHz:0.00%, 1.70 GHz:0.23%, 1.60 GHz:0.00%, 1.50 GHz:0.04%, 1.40 GHz:0.04%, 1.30 GHz:0.04%, 1.20 GHz:0.04%, 1.10 GHz:0.05%, 1000 MHz:0.04%, 900 MHz:0.05%, 800 MHz:0.02%, 700 MHz:0.02%, 600 MHz:0.02%, 500 MHz:0.02%, 400 MHz:0.03%, 300 MHz:0.04%, 200 MHz:99.33% (33410)
$ cpufreq-set -u 1.30GHz
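
To apply the same frequency cap on every node of the cluster, a small loop over the nodes can be used (the hostnames odroid1..odroid6 are placeholders; cpufreq-set has to be run as root):

# cap the maximum frequency to 1.3 GHz on all 6 nodes
$ for n in odroid1 odroid2 odroid3 odroid4 odroid5 odroid6; do
>   ssh root@$n cpufreq-set -u 1.30GHz
> done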

The effect of the CPU frequency on the power consumption still needs to be studied.

For any information about this cluster, please contact:
François Trahay ( francois.trahay@it-sudparis.eu )