Design

Stressant is shipped in the Debian GNU/Linux distribution. It has also been part of the Grml project since August 2017, so it benefits from Grml's extensive list of utilities, which covers most of what other rescue systems out there offer (e.g. Debian Live and Debirf; see below for a more thorough comparison).

The Grml distribution ships as an ISO image that can be burned to a CD/DVD or copied to a USB drive; net-bootable images are also available. Grml can perform:

  • memory tests with memtest86
  • hardware detection and inventory with HDT

There are also many more options, for example loading the whole system to RAM or making it read-only; see the cheatcodes list for more details.
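
For example, appending cheatcodes to the boot-prompt entry will load Grml entirely into RAM and keep local disks untouched (a sketch; the toram and forensic option names are taken from the Grml documentation, so double-check them against the current cheatcodes list):

grml toram forensic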

The stressant tool

Stressant itself is a Python program that calls other UNIX utilities and collects their output, reporting it on the screen, in a logfile, and/or over email.

The objective of this software is to automate a stress-testing suite that, once started, runs through a series of CPU, memory, disk and network tests and reports any errors and failures.

This is done through the stressant script, which performs the following tests:

  • lshw and smartctl for hardware inventory
  • dd, hdparm, fio and smartctl for disk testing - fio can also overwrite disk drives with the proper options (--overwrite and --size=100%; see the sketch after this list)
  • stress-ng for CPU testing
  • iperf3 for network testing
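
For instance, a destructive fio pass over a whole drive might look like the following (a hypothetical invocation built from the flags mentioned above; /dev/sdX is a placeholder, and this will destroy all data on that drive):

# sequential write over the entire device, bypassing the page cache
sudo fio --name=wipe --filename=/dev/sdX --readwrite=write --bs=1M --direct=1 --overwrite=1 --size=100%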

Here is an example test run:

$ sudo ./stressant --email anarcat@anarc.at --writeSize 1M --cpuBurnTime 1s --iperfTime 1
INFO: Starting tests
INFO: CPU cores: 4
INFO: Memory: 16 GiB (16715816960 bytes)
INFO: Hardware inventory
DEBUG: Calling lshw -short
OUTPUT: H/W path             Device     Class          Description
OUTPUT: ==========================================================
OUTPUT: system         Desktop Computer
OUTPUT: /0                              bus            NUC6i3SYB
OUTPUT: /0/0                            memory         64KiB BIOS
OUTPUT: /0/22                           memory         64KiB L1 cache
OUTPUT: /0/23                           memory         64KiB L1 cache
OUTPUT: /0/24                           memory         512KiB L2 cache
OUTPUT: /0/25                           memory         3MiB L3 cache
OUTPUT: /0/26                           processor      Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz
OUTPUT: /0/27                           memory         16GiB System Memory
OUTPUT: /0/27/0                         memory         16GiB SODIMM DDR4 Synchronous 2133 MHz (0.5 ns)
OUTPUT: /0/27/1                         memory         [empty]
OUTPUT: /0/100                          bridge         Skylake Host Bridge/DRAM Registers
OUTPUT: /0/100/2                        display        HD Graphics 520
OUTPUT: /0/100/14                       bus            Sunrise Point-LP USB 3.0 xHCI Controller
OUTPUT: /0/100/14/0          usb1       bus            xHCI Host Controller
OUTPUT: /0/100/14/0/1        scsi3      storage        USB to ATA/ATAPI Bridge
OUTPUT: /0/100/14/0/1/0.0.0  /dev/sdb   disk           500GB 00ABYS-01TNA0
OUTPUT: /0/100/14/0/1/0.0.1  /dev/sdc   disk           500GB 00ABYS-01TNA0
OUTPUT: /0/100/14/0/3                   input          Dell USB Keyboard
OUTPUT: /0/100/14/0/4                   input          Kensington Expert Mouse
OUTPUT: /0/100/14/0/7                   communication  Bluetooth wireless interface
OUTPUT: /0/100/14/1          usb2       bus            xHCI Host Controller
OUTPUT: /0/100/14.2                     generic        Sunrise Point-LP Thermal subsystem
OUTPUT: /0/100/16                       communication  Sunrise Point-LP CSME HECI #1
OUTPUT: /0/100/17                       storage        Sunrise Point-LP SATA Controller [AHCI mode]
OUTPUT: /0/100/1c                       bridge         Sunrise Point-LP PCI Express Root Port #5
OUTPUT: /0/100/1c/0                     network        Wireless 8260
OUTPUT: /0/100/1e                       generic        Sunrise Point-LP Serial IO UART Controller #0
OUTPUT: /0/100/1e.6                     generic        Sunrise Point-LP Secure Digital IO Controller
OUTPUT: /0/100/1f                       bridge         Sunrise Point-LP LPC Controller
OUTPUT: /0/100/1f.2                     memory         Memory controller
OUTPUT: /0/100/1f.3                     multimedia     Sunrise Point-LP HD Audio
OUTPUT: /0/100/1f.4                     bus            Sunrise Point-LP SMBus
OUTPUT: /0/100/1f.6          eno1       network        Ethernet Connection I219-V
OUTPUT: /0/1                 scsi2      storage
OUTPUT: /0/1/0.0.0           /dev/sda   disk           500GB WDC WDS500G1B0B-
OUTPUT: /0/1/0.0.0/1         /dev/sda1  volume         511MiB Windows FAT volume
OUTPUT: /0/1/0.0.0/2         /dev/sda2  volume         244MiB EFI partition
OUTPUT: /0/1/0.0.0/3         /dev/sda3  volume         465GiB EFI partition
INFO: SMART information for /dev/sda
DEBUG: Calling smartctl -i /dev/sda
OUTPUT: smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-1-amd64] (local build)
OUTPUT: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
OUTPUT:
OUTPUT: === START OF INFORMATION SECTION ===
OUTPUT: Device Model:     WDC WDS500G1B0B-00AS40
OUTPUT: Serial Number:    XXXXXXXXXXXX
OUTPUT: LU WWN Device Id: XXXXXXXXXXXX
OUTPUT: Firmware Version: XXXXXXXXXXXX
OUTPUT: User Capacity:    500,107,862,016 bytes [500 GB]
OUTPUT: Sector Size:      512 bytes logical/physical
OUTPUT: Rotation Rate:    Solid State Device
OUTPUT: Form Factor:      M.2
OUTPUT: Device is:        Not in smartctl database [for details use: -P showall]
OUTPUT: ATA Version is:   ACS-2 T13/2015-D revision 3
OUTPUT: SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
OUTPUT: Local Time is:    Fri Mar 17 10:24:52 2017 EDT
OUTPUT: SMART support is: Available - device has SMART capability.
OUTPUT: SMART support is: Enabled
OUTPUT:
INFO: Basic disk bandwidth tests
INFO: Writing 1MB file
DEBUG: Calling dd bs=1M count=512 conv=fdatasync if=/dev/zero of=test
OUTPUT: 512+0 records in
OUTPUT: 512+0 records out
OUTPUT: 536870912 bytes (537 MB, 512 MiB) copied, 1.39591 s, 385 MB/s
INFO: Reading 1MB file
DEBUG: Calling dd bs=1M count=512 of=/dev/null if=test
OUTPUT: 512+0 records in
OUTPUT: 512+0 records out
OUTPUT: 536870912 bytes (537 MB, 512 MiB) copied, 0.0848588 s, 6.3 GB/s
INFO: Hdparm test
DEBUG: Calling hdparm -Tt /dev/sda
OUTPUT:
OUTPUT: /dev/sda:
OUTPUT: Timing cached reads:   12406 MB in  2.00 seconds = 6207.39 MB/sec
OUTPUT: Timing buffered disk reads: 1504 MB in  3.00 seconds = 501.13 MB/sec
INFO: Disk stress test
DEBUG: Calling fio --name=stressant --readwrite=randrw --numjob=4 --sync=1 --direct=1 --group_reporting --size=1M --output=/tmp/tmpo2QJnR
OUTPUT: stressant: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
OUTPUT: ...
OUTPUT: fio-2.16
OUTPUT: Starting 4 processes
OUTPUT:
OUTPUT: stressant: (groupid=0, jobs=4): err= 0: pid=978: Fri Mar 17 10:25:07 2017
OUTPUT:   read : io=2160.0KB, bw=5669.3KB/s, iops=1417, runt=   381msec
OUTPUT:     clat (usec): min=141, max=2197, avg=486.59, stdev=344.70
OUTPUT:      lat (usec): min=141, max=2198, avg=486.96, stdev=344.71
OUTPUT:     clat percentiles (usec):
OUTPUT:      |  1.00th=[  145],  5.00th=[  153], 10.00th=[  161], 20.00th=[  175],
OUTPUT:      | 30.00th=[  189], 40.00th=[  217], 50.00th=[  278], 60.00th=[  676],
OUTPUT:      | 70.00th=[  748], 80.00th=[  828], 90.00th=[  916], 95.00th=[  980],
OUTPUT:      | 99.00th=[ 1384], 99.50th=[ 1800], 99.90th=[ 2192], 99.95th=[ 2192],
OUTPUT:      | 99.99th=[ 2192]
OUTPUT:   write: io=1936.0KB, bw=5081.4KB/s, iops=1270, runt=   381msec
OUTPUT:     clat (usec): min=618, max=6602, avg=2566.13, stdev=1029.39
OUTPUT:      lat (usec): min=619, max=6602, avg=2566.71, stdev=1029.39
OUTPUT:     clat percentiles (usec):
OUTPUT:      |  1.00th=[  732],  5.00th=[  900], 10.00th=[  964], 20.00th=[ 1672],
OUTPUT:      | 30.00th=[ 1976], 40.00th=[ 2384], 50.00th=[ 2640], 60.00th=[ 3152],
OUTPUT:      | 70.00th=[ 3312], 80.00th=[ 3440], 90.00th=[ 3568], 95.00th=[ 3856],
OUTPUT:      | 99.00th=[ 4704], 99.50th=[ 4960], 99.90th=[ 6624], 99.95th=[ 6624],
OUTPUT:      | 99.99th=[ 6624]
OUTPUT:     lat (usec) : 250=24.41%, 500=4.88%, 750=7.91%, 1000=19.34%
OUTPUT:     lat (msec) : 2=10.84%, 4=30.66%, 10=1.95%
OUTPUT:   cpu          : usr=0.80%, sys=2.39%, ctx=1945, majf=0, minf=35
OUTPUT:   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
OUTPUT:      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
OUTPUT:      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
OUTPUT:      issued    : total=r=540/w=484/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
OUTPUT:      latency   : target=0, window=0, percentile=100.00%, depth=1
OUTPUT:
OUTPUT: Run status group 0 (all jobs):
OUTPUT:    READ: io=2160KB, aggrb=5669KB/s, minb=5669KB/s, maxb=5669KB/s, mint=381msec, maxt=381msec
OUTPUT:   WRITE: io=1936KB, aggrb=5081KB/s, minb=5081KB/s, maxb=5081KB/s, mint=381msec, maxt=381msec
OUTPUT:
OUTPUT: Disk stats (read/write):
OUTPUT:     dm-3: ios=207/493, merge=0/0, ticks=120/296, in_queue=416, util=58.78%, aggrios=540/1527, aggrmerge=0/0, aggrticks=288/752, aggrin_queue=1040, aggrutil=75.30%
OUTPUT:     dm-0: ios=540/1527, merge=0/0, ticks=288/752, in_queue=1040, util=75.30%, aggrios=540/1326, aggrmerge=0/201, aggrticks=264/704, aggrin_queue=968, aggrutil=74.49%
OUTPUT:   sda: ios=540/1326, merge=0/201, ticks=264/704, in_queue=968, util=74.49%
INFO: CPU stress test for 1s
DEBUG: Calling stress-ng --timeout 1s --cpu 0 --ignite-cpu --metrics-brief --log-brief --tz --times --aggressive
OUTPUT: dispatching hogs: 4 cpu
OUTPUT: cache allocate: default cache size: 3072K
OUTPUT: successful run completed in 1.05s
OUTPUT: stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
OUTPUT: (secs)    (secs)    (secs)   (real time) (usr+sys time)
OUTPUT: cpu                 453      1.04      2.80      0.00       437.62       161.79
OUTPUT: cpu:
OUTPUT: acpitz   27.80 °C
OUTPUT: pch_skylake   32.77 °C
OUTPUT: acpitz   31.78 °C
OUTPUT: x86_pkg_temp   34.40 °C
OUTPUT: for a 1.05s run time:
OUTPUT: 4.22s available CPU time
OUTPUT: 2.81s user time   ( 66.64%)
OUTPUT: 0.01s system time (  0.24%)
OUTPUT: 2.82s total time  ( 66.87%)
OUTPUT: load average: 0.34 0.58 2.52
INFO: Running network benchmark
DEBUG: Calling iperf3 -c iperf.he.net -t 1
OUTPUT: iperf3: error - the server is busy running a test. try again later
ERROR: Command failed: Command 'iperf3 -c iperf.he.net -t 1' returned non-zero exit status 1
INFO: all done
INFO: sent email to ['anarcat@anarc.at'] using anarc.at

Note that there are nice colors in an actual console; the above is just a dump of the logfile.

We currently use the iperf.he.net server from Hurricane Electric as a default server for our tests, but users are encouraged to change that to a local server using the --iperfServer argument to get more accurate results. Notice how that performance test failed, above, because the HE server wasn’t available: this is just another hint that you should use your own server.
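
Running a local server is a one-liner with iperf3, and stressant can then be pointed at it (the hostname below is a placeholder):

# on another machine on the local network
iperf3 -s
# then, on the machine under test
sudo ./stressant --iperfServer iperf.example.com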

A number of public iPerf servers are available; several community-maintained lists of them can be found online.

Background

This project grew out of an effort to package a custom Linux distribution called ‘breakin’. It turned into a simple Python program that reuses existing stress-testing programs packaged in Debian.

Stressant used to be built as a standalone Debian derivative, a pure blend based on Debirf. But in 2017, the project was rearchitected to be based on Grml and to focus on developing a standalone stress-testing tool. While Stressant could still become its own Debian derivative, it seems futile for now to make yet another one. Instead, we focus our energy on contributing to the Grml project, without needing to create the heavy infrastructure of another Linux distribution.

The “Is Your Computer Stable?” post from Jeff Atwood was a motivation to get back into the project. It outlines a few basic tools for making sure your computer is stable:

  • memtest86 - shipped with Grml
  • install Ubuntu - we assume you’ll do that anyway
  • MPrime to stress the CPU - not free software; stress-ng was chosen instead and gives similar results here
  • badblocks test (with -sv) - this is covered by the fio test
  • smartctl -i/-a/-t to identify and test harddrives
  • dd and hdparm to get quick stats - done
  • bonnie++ for more extensive benchmarks - Grml people suggested we use fio instead
  • iperf for network testing - this assumes a local server; we instead use iperf3 and public servers
  • furmark for testing the GPU - Windows-only with no Linux equivalent; the Phoronix test suite uses ffmpeg tests for that purpose

The idea was to combine all of this into a single tool that would perform all those tests, without reinventing the wheel of course.

Stressant was also tightly coupled with Koumbit’s infrastructure, as this is where the Debirf recipes were originally developed. It needed a CI system to build the images, which was originally done with Jenkins. This was later moved to Gitlab CI, which failed to build images because of issues with debirf and, ultimately, docker itself. This is why Grml was used as a basis for future development.

Remaining work

Stressant could run in a tmux or screen session that would show the current task in one pane and syslog (or journalctl) in another. This would cram more information into a single display while also making remote access (e.g. through SSH) easier.
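
This is not implemented yet, but a rough sketch of that layout, using stock tmux commands, could look like this:

# start stressant in a detached session
tmux new-session -d -s stressant 'sudo stressant'
# add a pane following the system log next to it
tmux split-window -t stressant -h 'journalctl -f'
# attach locally, or from an SSH session
tmux attach -t stressant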

Note

Parallelism is discussed as part of a larger redesign in issue #3.

Finally, we need clear and better documentation on the various testing tools out there, a bit like what TAILS does. For example, we used to ship with diskscan, but I didn’t even remember that, and I am not sure what to use it for or when to use it. A summary description of the available tools, maybe through a menu system or at least a set of HTML files, would be useful. I use Sphinx and RST for this because of their simplicity, the availability of tools like readthedocs.org, and the ease of creating offline documentation (PDF and ePUB). A rendered copy of the documentation is available on stressant.readthedocs.io and in the stressant-doc package. The metapackage (stressant-meta) lists the relevant recovery tools, and some of those are documented in the stressant manual page.
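
With Sphinx, each of the offline formats mentioned above is a single build step (a sketch; the docs/ path is a guess at the source layout):

# HTML, as published on readthedocs.org and in the stressant-doc package
sphinx-build -b html docs/ docs/_build/html
# ePUB, for offline reading
sphinx-build -b epub docs/ docs/_build/epub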

Similar software

In the meantime, here’s a list of software that’s similar to stressant or that could be used by stressant.

Test suites

Those are fairly similar to stressant in that they perform multiple benchmarks.

Purpose-specific tools

  • chipsec - framework for analyzing the security of PC platforms including hardware, system firmware (BIOS/UEFI), and platform components
  • FWTS - Ubuntu’s Firmware Test Suite - performs sanity checks on Intel/AMD PC firmware. It is intended to identify BIOS and ACPI errors and, if appropriate, it will try to explain the errors and give advice to help work around or fix firmware bugs
  • Power Stress and Shaping Tool (PSST) - a “controlled power tool for Intel SoC components such as CPU and GPU. PSST enables very fine control of stress function without its own process overhead”, packaged in Debian as psst
  • The stress terminal (s-tui) - mostly for testing CPU, temperature and power usage, now included in the meta-package
  • tinymembench - memory bandwidth userland tester
  • stressdisk - “Stress test your disks / memory cards / USB sticks before trusting your valuable data to them”

Building images by hand

Note

Since August 2017, stressant is part of Grml, and it’s usually superfluous to build your own image unless you’re into that kind of kinky stuff. These notes are kept mostly for historical purposes.

There is a handy build-iso.sh script that will set up APT repositories and run all the right commands to build a Grml Stressant ISO image on a recent Debian release (tested on jessie and stretch). Note that you can pass extra flags to the grml-live command with the $GRML_LIVE_FLAGS environment variable. What follows is basically a description of what that script does.
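
For example, extra flags could be injected like this (a sketch, assuming the script forwards the variable to grml-live verbatim; the -u update flag is described below):

sudo GRML_LIVE_FLAGS="-u" ./build-iso.sh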

To build an image by hand, you will need to first install the grml-live package which is responsible for building Grml images. For this, you will need to add the Grml Debian repository to your sources.list file. Instructions for doing so are available in the files section of the Grml site.

Once this is done, you should be able to build an image using:

sudo grml-live -c DEBORPHAN,GRMLBASE,GRML_FULL,RELEASE,AMD64,IGNORE,STRESSANT \
  -s unstable -a amd64 \
  -o $PWD/grml -U $USER \
  -v $(date +%Y.%m) -r gossage -g grml64-full-stressant

This will build a “full” Grml release (-c) based on Debian unstable (-s) on a 64-bit architecture (-a) in the ./grml subdirectory (-o). The files will be owned (-U) by the current user ($USER). The version number (-v), release name (-r) and flavor (-g) are just cargo-culted from the upstream official release. See the grml-live documentation for further options, but do note the -u option, which can be used to rerun a build when, for example, you only want to update the image to the latest release.

The resulting ISO will be in ./grml/grml_isos/grml64-full_$(date +%Y.%m%d).iso. To make a multi-arch ISO, you should use the grml2iso command. For example, this is how upstream builds the grml96 ISO, which includes both the 32-bit and 64-bit architectures:

grml2iso -o grml96-small_2014.11.iso grml64-small_2014.11.iso grml32-small_2014.11.iso

Build system review

The following is a summary evaluation of the different options considered by the Stressant project to build live images. This problem space is currently in flux: at the time of writing, the tools used to build the Debian Live images were changing and the future of that project was uncertain. Keep this in mind when you read this in the future. The options that were considered are each evaluated in detail below.

The Debian cloud team also considered a few tools to generate their cloud images, and some are relevant here (FAI and vmdebootstrap), see this post for details. There’s also this more exhaustive list of tools to build Debian systems.

DebIRF

Debirf stands for Debian InitRamFs: it builds the entire live image into the initrd file. It was originally used by Stressant because it was simple and easy to modify. It also made booting from the network easy, since we only had to load a kernel and initrd and didn’t have to bother with loading ISO images or NFS, unlike other options.

In the end, however, Debirf proved too limited for our needs: it doesn’t provide a way to embed arbitrary boot-level binaries like memtest86, because it is too tightly coupled with the Linux kernel. Furthermore, we were having serious issues building debirf images on newer releases, either on Debian 8 (bug #806377) or 9 (bug #848834).

FAI

I tried to use FAI to follow the lead of the Debian cloud team. Unfortunately, I stumbled upon a few bugs. First, fai-diskimage would fail to build an image if the host uses LVM. This was fixed in FAI 5.3.3 (or maybe 5.3.4?). Also, FAI seems to fetch base files from a cleartext URL, a dubious security choice.

After finding this tutorial, I figured I would give it another try. Unfortunately, after asking on IRC (#debian-cloud on OFTC), I was told (by Noah!) that "fai-diskimage is probably not what you want for an iso image" and that I should use fai-cd instead. However, fai-cd works completely differently: it doesn’t support the --class system that fai-diskimage was built with, so we can’t reuse those already mysterious recipes. fai-cd seems geared towards creating installation media rather than live images.

All this seems to make FAI mostly unusable for the task at hand, although it should be noted that grml-live uses FAI to build its images…

vmdebootstrap

vmdebootstrap is a minimal image-building tool, written in Python, used for Debian live images. It requires root (to create loopback filesystems) and no longer supports shipping the Debian installer.

We have had good results with vmdebootstrap, but the fact that it requires a loop device has made it difficult to use with Gitlab’s CI system: Docker has a bug that makes it impossible to use loop devices and kpartx commands inside containers. Building images through Gitlab’s CI would therefore require full virtualization instead of just Docker, something that Gitlab.com does not provide right now. This problem is probably shared by all image-building tools, however.

Worse: VirtualBox did not make it into stretch at all, which makes it difficult to deploy new builders for it.

live-build and live-wrapper

live-build is a set of tools used by Debian Live and other blends (e.g. the PGP cleanroom uses live-build). It used to be a set of shell scripts, but it now uses live-wrapper, which in turn uses vmdebootstrap.

It is unclear whether I am better off using live-build or vmdebootstrap directly. The PGP cleanroom build uses live-build, so maybe I should do that as well…

Notes: live-wrapper is where the idea of using HDT comes from. Unfortunately, it looks like the boot menus don’t actually work yet (bug #813527). Furthermore, live-wrapper doesn’t support initrd-style netboot, so we would need documentation on how to boot from ISO files over PXE.

Grml

Grml is a project quite similar to the original goal of stressant: a Debian-based bootable live system aimed at system administrators, with a focus on rescue and deployment tasks.

The project is really interesting, and we have therefore switched the focus of Stressant towards creating an integrated stress-testing tool on top of Grml, instead of trying to fix all the issues those guys are already struggling with… We use grml-debootstrap, a tool similar to vmdebootstrap, to build stressant images.

Grml ships most of the packages we had in our dependencies, except the following:

blktool
bonnie++
chntpw
diskscan
e2tools
fatresize
foremost
hfsplus
i7z
lm-sensors
mtd-utils
scrub
smp-utils
stress-ng
tofrodos
u-boot-tools
wodim

Of those, only stress-ng is actually required by stressant.

The remaining issues with Grml integration are:

  1. add stressant to the Grml build (pull request #34 - done!)
  2. review the above packages we collected from various rescue modes and see if they are relevant
  3. hook stressant into the magic Grml menu so it can be started directly from the boot menu - we can use the scripts=path-name argument for this; it looks in the “DCS dir”, which is / or whatever myconfig= points at
  4. figure out how to chain into memtest86 to complete the test suite

Because stressant has been accepted into Debian, we should not need to set up our own build system, unless Grml refuses to integrate the package directly. In any case, we may want to set up our own Continuous Integration (CI) system to build feature branches and the like. The proper way to do this seems to be to add the .deb file directly at the root (/) of the live filesystem with the “install local files” technique and the debs boot-time argument.
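
A sketch of what that boot entry might look like, assuming the debs cheatcode behaves as described in the Grml documentation (installing all .deb files found under the given path at boot):

grml debs=/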

mkosi

mkosi is another tool, from the systemd folks, that we may want to consider.