Design
Stressant is shipped in the Debian GNU/Linux distribution. It has also been part of the Grml project since August 2017, so it benefits from Grml's extensive list of utilities, which covers most of the rescue systems out there (e.g. Debian Live and Debirf; see below for a more thorough comparison).
The Grml distribution ships as an ISO image that can be burned to CD/DVD or copied to a USB drive, as well as net-bootable images. Grml can perform:
- memory tests with memtest86
- hardware detection and inventory with HDT
There are also many more options, for example loading the system to RAM or making it read-only; see the cheatcodes list for more details.
The stressant tool
Stressant itself is a Python program that calls other UNIX utilities and collects their output on the screen, in a logfile, and/or over email.
The objective of this software is to automate a basic stress-testing suite that, once started, runs through a CPU/memory/disk/network test sequence and reports any errors and failures.
This is done through the stressant script, which performs the following tests:
- lshw and smartctl for hardware inventory
- dd, hdparm, fio and smartctl for disk testing - fio can also overwrite disk drives with the proper options (--overwrite and --size=100%)
- stress-ng for CPU testing
- iperf3 for network testing
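The wrapper pattern behind that script (run a command, capture its output, log it with a level prefix) can be sketched as follows. This is an illustrative sketch only, not stressant's actual code, and the run_check helper name is made up:

```python
import subprocess
import sys


def run_check(cmd):
    """Run one test command and report it the way the stressant logs
    below do (DEBUG for the invocation, OUTPUT for each line, ERROR on
    failure).  Illustrative sketch, not stressant's implementation."""
    print("DEBUG: Calling %s" % " ".join(cmd))
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT,
                                      universal_newlines=True)
    except subprocess.CalledProcessError as error:
        print("ERROR: Command failed: %s" % error)
        return None
    for line in out.splitlines():
        print("OUTPUT: %s" % line)
    return out


# hypothetical usage: a trivial command stands in for lshw or smartctl
run_check([sys.executable, "-c", "print('hello')"])
```

A real runner would also tee the output to a logfile and an email body, but the control flow is the same.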
Here is an example test run:
$ sudo ./stressant --email anarcat@anarc.at --writeSize 1M --cpuBurnTime 1s --iperfTime 1
INFO: Starting tests
INFO: CPU cores: 4
INFO: Memory: 16 GiB (16715816960 bytes)
INFO: Hardware inventory
DEBUG: Calling lshw -short
OUTPUT: H/W path Device Class Description
OUTPUT: ==========================================================
OUTPUT: system Desktop Computer
OUTPUT: /0 bus NUC6i3SYB
OUTPUT: /0/0 memory 64KiB BIOS
OUTPUT: /0/22 memory 64KiB L1 cache
OUTPUT: /0/23 memory 64KiB L1 cache
OUTPUT: /0/24 memory 512KiB L2 cache
OUTPUT: /0/25 memory 3MiB L3 cache
OUTPUT: /0/26 processor Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz
OUTPUT: /0/27 memory 16GiB System Memory
OUTPUT: /0/27/0 memory 16GiB SODIMM DDR4 Synchronous 2133 MHz (0.5 ns)
OUTPUT: /0/27/1 memory [empty]
OUTPUT: /0/100 bridge Skylake Host Bridge/DRAM Registers
OUTPUT: /0/100/2 display HD Graphics 520
OUTPUT: /0/100/14 bus Sunrise Point-LP USB 3.0 xHCI Controller
OUTPUT: /0/100/14/0 usb1 bus xHCI Host Controller
OUTPUT: /0/100/14/0/1 scsi3 storage USB to ATA/ATAPI Bridge
OUTPUT: /0/100/14/0/1/0.0.0 /dev/sdb disk 500GB 00ABYS-01TNA0
OUTPUT: /0/100/14/0/1/0.0.1 /dev/sdc disk 500GB 00ABYS-01TNA0
OUTPUT: /0/100/14/0/3 input Dell USB Keyboard
OUTPUT: /0/100/14/0/4 input Kensington Expert Mouse
OUTPUT: /0/100/14/0/7 communication Bluetooth wireless interface
OUTPUT: /0/100/14/1 usb2 bus xHCI Host Controller
OUTPUT: /0/100/14.2 generic Sunrise Point-LP Thermal subsystem
OUTPUT: /0/100/16 communication Sunrise Point-LP CSME HECI #1
OUTPUT: /0/100/17 storage Sunrise Point-LP SATA Controller [AHCI mode]
OUTPUT: /0/100/1c bridge Sunrise Point-LP PCI Express Root Port #5
OUTPUT: /0/100/1c/0 network Wireless 8260
OUTPUT: /0/100/1e generic Sunrise Point-LP Serial IO UART Controller #0
OUTPUT: /0/100/1e.6 generic Sunrise Point-LP Secure Digital IO Controller
OUTPUT: /0/100/1f bridge Sunrise Point-LP LPC Controller
OUTPUT: /0/100/1f.2 memory Memory controller
OUTPUT: /0/100/1f.3 multimedia Sunrise Point-LP HD Audio
OUTPUT: /0/100/1f.4 bus Sunrise Point-LP SMBus
OUTPUT: /0/100/1f.6 eno1 network Ethernet Connection I219-V
OUTPUT: /0/1 scsi2 storage
OUTPUT: /0/1/0.0.0 /dev/sda disk 500GB WDC WDS500G1B0B-
OUTPUT: /0/1/0.0.0/1 /dev/sda1 volume 511MiB Windows FAT volume
OUTPUT: /0/1/0.0.0/2 /dev/sda2 volume 244MiB EFI partition
OUTPUT: /0/1/0.0.0/3 /dev/sda3 volume 465GiB EFI partition
INFO: SMART information for /dev/sda
DEBUG: Calling smartctl -i /dev/sda
OUTPUT: smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-1-amd64] (local build)
OUTPUT: Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
OUTPUT:
OUTPUT: === START OF INFORMATION SECTION ===
OUTPUT: Device Model: WDC WDS500G1B0B-00AS40
OUTPUT: Serial Number: XXXXXXXXXXXX
OUTPUT: LU WWN Device Id: XXXXXXXXXXXX
OUTPUT: Firmware Version: XXXXXXXXXXXX
OUTPUT: User Capacity: 500,107,862,016 bytes [500 GB]
OUTPUT: Sector Size: 512 bytes logical/physical
OUTPUT: Rotation Rate: Solid State Device
OUTPUT: Form Factor: M.2
OUTPUT: Device is: Not in smartctl database [for details use: -P showall]
OUTPUT: ATA Version is: ACS-2 T13/2015-D revision 3
OUTPUT: SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
OUTPUT: Local Time is: Fri Mar 17 10:24:52 2017 EDT
OUTPUT: SMART support is: Available - device has SMART capability.
OUTPUT: SMART support is: Enabled
OUTPUT:
INFO: Basic disk bandwidth tests
INFO: Writing 1MB file
DEBUG: Calling dd bs=1M count=512 conv=fdatasync if=/dev/zero of=test
OUTPUT: 512+0 records in
OUTPUT: 512+0 records out
OUTPUT: 536870912 bytes (537 MB, 512 MiB) copied, 1.39591 s, 385 MB/s
INFO: Reading 1MB file
DEBUG: Calling dd bs=1M count=512 of=/dev/null if=test
OUTPUT: 512+0 records in
OUTPUT: 512+0 records out
OUTPUT: 536870912 bytes (537 MB, 512 MiB) copied, 0.0848588 s, 6.3 GB/s
INFO: Hdparm test
DEBUG: Calling hdparm -Tt /dev/sda
OUTPUT:
OUTPUT: /dev/sda:
OUTPUT: Timing cached reads: 12406 MB in 2.00 seconds = 6207.39 MB/sec
OUTPUT: Timing buffered disk reads: 1504 MB in 3.00 seconds = 501.13 MB/sec
INFO: Disk stress test
DEBUG: Calling fio --name=stressant --readwrite=randrw --numjob=4 --sync=1 --direct=1 --group_reporting --size=1M --output=/tmp/tmpo2QJnR
OUTPUT: stressant: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
OUTPUT: ...
OUTPUT: fio-2.16
OUTPUT: Starting 4 processes
OUTPUT:
OUTPUT: stressant: (groupid=0, jobs=4): err= 0: pid=978: Fri Mar 17 10:25:07 2017
OUTPUT: read : io=2160.0KB, bw=5669.3KB/s, iops=1417, runt= 381msec
OUTPUT: clat (usec): min=141, max=2197, avg=486.59, stdev=344.70
OUTPUT: lat (usec): min=141, max=2198, avg=486.96, stdev=344.71
OUTPUT: clat percentiles (usec):
OUTPUT: | 1.00th=[ 145], 5.00th=[ 153], 10.00th=[ 161], 20.00th=[ 175],
OUTPUT: | 30.00th=[ 189], 40.00th=[ 217], 50.00th=[ 278], 60.00th=[ 676],
OUTPUT: | 70.00th=[ 748], 80.00th=[ 828], 90.00th=[ 916], 95.00th=[ 980],
OUTPUT: | 99.00th=[ 1384], 99.50th=[ 1800], 99.90th=[ 2192], 99.95th=[ 2192],
OUTPUT: | 99.99th=[ 2192]
OUTPUT: write: io=1936.0KB, bw=5081.4KB/s, iops=1270, runt= 381msec
OUTPUT: clat (usec): min=618, max=6602, avg=2566.13, stdev=1029.39
OUTPUT: lat (usec): min=619, max=6602, avg=2566.71, stdev=1029.39
OUTPUT: clat percentiles (usec):
OUTPUT: | 1.00th=[ 732], 5.00th=[ 900], 10.00th=[ 964], 20.00th=[ 1672],
OUTPUT: | 30.00th=[ 1976], 40.00th=[ 2384], 50.00th=[ 2640], 60.00th=[ 3152],
OUTPUT: | 70.00th=[ 3312], 80.00th=[ 3440], 90.00th=[ 3568], 95.00th=[ 3856],
OUTPUT: | 99.00th=[ 4704], 99.50th=[ 4960], 99.90th=[ 6624], 99.95th=[ 6624],
OUTPUT: | 99.99th=[ 6624]
OUTPUT: lat (usec) : 250=24.41%, 500=4.88%, 750=7.91%, 1000=19.34%
OUTPUT: lat (msec) : 2=10.84%, 4=30.66%, 10=1.95%
OUTPUT: cpu : usr=0.80%, sys=2.39%, ctx=1945, majf=0, minf=35
OUTPUT: IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
OUTPUT: submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
OUTPUT: complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
OUTPUT: issued : total=r=540/w=484/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
OUTPUT: latency : target=0, window=0, percentile=100.00%, depth=1
OUTPUT:
OUTPUT: Run status group 0 (all jobs):
OUTPUT: READ: io=2160KB, aggrb=5669KB/s, minb=5669KB/s, maxb=5669KB/s, mint=381msec, maxt=381msec
OUTPUT: WRITE: io=1936KB, aggrb=5081KB/s, minb=5081KB/s, maxb=5081KB/s, mint=381msec, maxt=381msec
OUTPUT:
OUTPUT: Disk stats (read/write):
OUTPUT: dm-3: ios=207/493, merge=0/0, ticks=120/296, in_queue=416, util=58.78%, aggrios=540/1527, aggrmerge=0/0, aggrticks=288/752, aggrin_queue=1040, aggrutil=75.30%
OUTPUT: dm-0: ios=540/1527, merge=0/0, ticks=288/752, in_queue=1040, util=75.30%, aggrios=540/1326, aggrmerge=0/201, aggrticks=264/704, aggrin_queue=968, aggrutil=74.49%
OUTPUT: sda: ios=540/1326, merge=0/201, ticks=264/704, in_queue=968, util=74.49%
INFO: CPU stress test for 1s
DEBUG: Calling stress-ng --timeout 1s --cpu 0 --ignite-cpu --metrics-brief --log-brief --tz --times --aggressive
OUTPUT: dispatching hogs: 4 cpu
OUTPUT: cache allocate: default cache size: 3072K
OUTPUT: successful run completed in 1.05s
OUTPUT: stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
OUTPUT: (secs) (secs) (secs) (real time) (usr+sys time)
OUTPUT: cpu 453 1.04 2.80 0.00 437.62 161.79
OUTPUT: cpu:
OUTPUT: acpitz 27.80 °C
OUTPUT: pch_skylake 32.77 °C
OUTPUT: acpitz 31.78 °C
OUTPUT: x86_pkg_temp 34.40 °C
OUTPUT: for a 1.05s run time:
OUTPUT: 4.22s available CPU time
OUTPUT: 2.81s user time ( 66.64%)
OUTPUT: 0.01s system time ( 0.24%)
OUTPUT: 2.82s total time ( 66.87%)
OUTPUT: load average: 0.34 0.58 2.52
INFO: Running network benchmark
DEBUG: Calling iperf3 -c iperf.he.net -t 1
OUTPUT: iperf3: error - the server is busy running a test. try again later
ERROR: Command failed: Command 'iperf3 -c iperf.he.net -t 1' returned non-zero exit status 1
INFO: all done
INFO: sent email to ['anarcat@anarc.at'] using anarc.at
Note that there are nice colors in an actual console; the above is just a dump of the logfile.
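As an aside, the dd throughput figures in such a logfile can be extracted mechanically. A minimal sketch, assuming GNU dd's summary-line format (the function name is made up and this is not part of stressant itself):

```python
import re


def dd_throughput(line):
    """Extract the transfer rate from a GNU dd summary line such as
    '536870912 bytes (537 MB, 512 MiB) copied, 1.39591 s, 385 MB/s'.
    Returns (value, unit) or None if the line does not match."""
    match = re.search(r"copied,\s+[\d.]+\s+s,\s+([\d.]+)\s+(\S+/s)", line)
    if not match:
        return None
    return float(match.group(1)), match.group(2)


# example taken from the log above
print(dd_throughput(
    "536870912 bytes (537 MB, 512 MiB) copied, 1.39591 s, 385 MB/s"))
# -> (385.0, 'MB/s')
```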
We currently use the iperf.he.net server from Hurricane Electric as the default server for our tests, but users are encouraged to change that to a local server using the --iperfServer argument to get more accurate results. Notice how that performance test failed, above, because the HE server wasn't available: this is just another hint that you should use your own server.
A number of public iPerf server lists are available online.
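One way to make the network test more robust against busy public servers is to try several in turn and use the first one that answers. A sketch of that logic follows; the helper name and server names are hypothetical, and the runner is injectable so the logic can be exercised without iperf3 installed (a real runner would invoke iperf3 -c <server> and return its exit status):

```python
def first_available_server(servers, run):
    """Try each iperf3 server in turn and return the first one for
    which run(server) reports success (exit status 0), or None if
    none answer.  'run' abstracts the actual iperf3 invocation."""
    for server in servers:
        if run(server) == 0:
            return server
    return None


# hypothetical usage with a fake runner: pretend the first server is
# busy (non-zero exit status) and the second one is free
servers = ["iperf.he.net", "iperf.example.org"]
print(first_available_server(
    servers, lambda s: 1 if s == "iperf.he.net" else 0))
# -> iperf.example.org
```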
Background
This project emanates from an effort to package a custom Linux distribution called 'breakin'. It turned into a simple Python program that reuses existing stress-testing programs packaged in Debian.
Stressant used to be built as a standalone Debian derivative, a Pure Blend based on Debirf. But in 2017, the project was rearchitected to be based on Grml and to focus on developing a standalone stress-testing tool. While Stressant could still become its own Debian derivative, it seems futile for now to make yet another one. Instead, we focus our energy on contributing to the Grml project, without needing to create the heavy infrastructure required for another Linux distribution.
The Is your Computer Stable? post from Jeff Atwood was a motivation to get back into the project. It outlines a few basic tools to make sure your computer is stable:
- memtest86 - shipped with Grml
- install Ubuntu - we assume you'll do that anyway
- MPrime to stress the CPU - not free software; stress-ng was chosen instead and gives similar results here
- badblocks test (with -sv) - this is covered by the fio test
- smartctl -i/-a/-t to identify and test harddrives
- dd and hdparm to get quick stats - done
- bonnie++ for more extensive benchmarks - Grml people suggested we use fio instead
- iperf for network testing - this assumes a local server; we instead use iperf3 and public servers
- furmark for testing the GPU - Windows-only with no Linux equivalent; the Phoronix test suite uses ffmpeg tests for that purpose
The idea was to regroup all of this into a single tool that would perform all those tests, without reinventing the wheel of course.
Stressant was also highly coupled with Koumbit's infrastructure, as this is where the Debirf recipes were originally developed. It needed a CI system to build the images, which was originally done with Jenkins. The builds were later moved to GitLab CI, but failed because of issues with debirf and, ultimately, Docker itself. This is why Grml was used as the basis for future development.
Remaining work
Stressant could run in a tmux or screen session that would show the current task in one pane and syslog (or journalctl) in another. This would allow more information to be crammed into a single display while making remote access (e.g. through SSH) easier to switch to.
Note
Parallelism is discussed as part of a larger redesign in issue #3.
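The two-pane layout described above could be driven by a handful of tmux calls. Here is an illustrative sketch that only assembles the command lines; the session name, helper name and pane commands are all made up, and actually executing the commands is left to the caller:

```python
def tmux_layout(session, main_cmd, log_cmd):
    """Build the tmux invocations for a two-pane layout: the test run
    in one pane, a log follower in the other.  Returns a list of argv
    lists without executing anything."""
    return [
        ["tmux", "new-session", "-d", "-s", session, main_cmd],
        ["tmux", "split-window", "-t", session, "-h", log_cmd],
        ["tmux", "attach", "-t", session],
    ]


# hypothetical usage: stressant in one pane, journalctl in the other
for cmd in tmux_layout("stressant", "sudo stressant", "journalctl -f"):
    print(" ".join(cmd))
```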
Finally, we need clear and better documentation on the various testing tools that are out there, a bit like Tails is doing. For example, we used to ship with diskscan, but I didn't even remember that and I am not sure what to use it for or when to use it. A summary description of the available tools, maybe through a menu system or at least a set of HTML files, would be useful. I use Sphinx and RST for this because of the simplicity and availability of tools like readthedocs.org and the ease of creating offline documentation (PDF and ePUB). A rendered copy of the documentation is available on stressant.readthedocs.io and in the stressant-doc package. The metapackage (stressant-meta) lists the relevant recovery tools, and some of those are documented in the stressant manual page.
Similar software
In the meantime, here’s a list of software that’s similar to stressant or that could be used by stressant.
Test suites
Those are fairly similar to stressant in that they perform multiple benchmarks:
- Breakin - stress-test and hardware diagnostics tool
- Checkbox - Ubuntu’s certification tool, shipped with Debian stretch but removed because upstream switched to snaps, new RFP
- Inquisitor - hardware testing suite
- OpenBenchmarking.org - a good source of benchmarking tools
- PerfKit Benchmarker - GCP’s benchmarking tool
- Phoronix test suite - far-ranging benchmarking suite
- Stressapptest - Stressful Application Test, userspace memory and IO test - similar to stressant
- bench-scripts - a review of many benchmarking scripts that provide a nice and simple interface for basic benchmarks
- sys_basher - another stress-testing tool
- Ars Technica - has an interesting post detailing a few key fio commands that should be run
- hardware - python module for hardware inventory; reuses lshw, pciutils, smartmontools, etc.; not in Debian yet
Purpose-specific tools
- chipsec - framework for analyzing the security of PC platforms including hardware, system firmware (BIOS/UEFI), and platform components
- FWTS - Ubuntu's Firmware Test Suite - performs sanity checks on Intel/AMD PC firmware. It is intended to identify BIOS and ACPI errors and, if appropriate, it will try to explain the errors and give advice to help work around or fix firmware bugs
- Power Stress and Shaping Tool (PSST) - a “controlled power tool for Intel SoC components such as CPU and GPU. PSST enables very fine control of stress function without its own process overhead”, packaged in Debian as psst
- The stress terminal (s-tui) - mostly for testing CPU, temperature and power usage, now included in the meta-package
- tinymembench - memory bandwidth userland tester
- stressdisk - “Stress test your disks / memory cards / USB sticks before trusting your valuable data to them”
Building images by hand
Note
Starting from August 2017, stressant is part of Grml, and it's usually superfluous to build your own image unless you're into that kind of kinky stuff. These notes are kept mostly for historical purposes.
There is a handy build-iso.sh script that will set up APT repositories and run all the right commands to build a Grml Stressant ISO image on a recent Debian release (tested on jessie and stretch). Note that you can pass extra flags to the grml-live command with the $GRML_LIVE_FLAGS environment variable. What follows is basically a description of what that script does.
To build an image by hand, you will first need to install the grml-live package, which is responsible for building Grml images. For this, you will need to add the Grml Debian repository to your sources.list file. Instructions for doing so are available in the files section of the Grml site.
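For illustration only, the repository entry typically looks something like the line below; the exact suite name and URL here are assumptions, so check the Grml site for the authoritative line and the matching archive key:

```
deb http://deb.grml.org/ grml-stable main
```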
Once this is done, you should be able to build an image using:
sudo grml-live -c DEBORPHAN,GRMLBASE,GRML_FULL,RELEASE,AMD64,IGNORE,STRESSANT \
-s unstable -a amd64 \
-o $PWD/grml -U $USER \
-v $(date +%Y.%m) -r gossage -g grml64-full-stressant
This will build a "full" Grml release (-c) based on Debian unstable on a 64-bit architecture (-a) in the ./grml subdirectory (-o). The files will be owned (-U) by the current user ($USER). The version number (-v), the release name (-r) and flavor (-g) are just cargo-culted from the upstream official release.
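Those conventions can be wrapped in a small helper that derives the version number from the current month, as the upstream releases do. A sketch only (the function name and defaults are made up, mirroring the command shown earlier); it builds the argument list but does not run grml-live:

```python
import datetime


def grml_live_args(classes, suite="unstable", arch="amd64",
                   output="./grml", owner="builder",
                   release="gossage", flavor="grml64-full-stressant"):
    """Assemble a grml-live argument list, deriving the version
    number (-v) from the current year and month."""
    version = datetime.date.today().strftime("%Y.%m")
    return ["grml-live",
            "-c", ",".join(classes),
            "-s", suite, "-a", arch,
            "-o", output, "-U", owner,
            "-v", version, "-r", release, "-g", flavor]


# hypothetical usage, reproducing the invocation above
print(" ".join(grml_live_args(
    ["DEBORPHAN", "GRMLBASE", "GRML_FULL", "RELEASE",
     "AMD64", "IGNORE", "STRESSANT"])))
```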
See the grml-live documentation for further options, but do note the -u option, which can be used to rerun the build if you only want to update the image to the latest release, for example.
The resulting ISO will be in ./grml/grml_isos/grml64-full_$(date +%Y.%m%d).iso. To make a multi-arch ISO, you should use the grml2iso command. For example, this is how upstream builds the 96 ISO, which features both the 32-bit and 64-bit architectures:
grml2iso -o grml96-small_2014.11.iso grml64-small_2014.11.iso grml32-small_2014.11.iso
Build system review
The following is a summary evaluation of the different options considered by the Stressant project to build live images. This problem space is in flux: at the time of writing, the tools used to build the Debian Live images were changing and the future of that project was uncertain. Keep this in mind when you read this in the future. The options that were considered are each evaluated in detail below.
The Debian cloud team also considered a few tools to generate their cloud images, and some are relevant here (FAI and vmdebootstrap), see this post for details. There’s also this more exhaustive list of tools to build Debian systems.
DebIRF
Debirf stands for Debian InitRamFs; it builds the live image into the initrd file. It was originally used by Stressant because it was simple and easy to modify. It also allowed booting from the network easily, as we only had to load the kernel and didn't have to bother with loading ISO images or NFS, like other options.
In the end, however, Debirf proved too limited for our needs: it doesn't provide a way to embed arbitrary boot-level binaries like memtest86, because it is too tightly coupled with the Linux kernel. Furthermore, we were having serious issues building debirf images on newer releases, either in Debian 8 (bug #806377) or 9 (bug #848834).
FAI
I have tried to use FAI to follow the lead of the Debian cloud team. Unfortunately, I stumbled upon a few bugs. First, fai-diskimage would fail to build an image if the host uses LVM; this was fixed in FAI 5.3.3 (or maybe 5.3.4?). Also, FAI seems to fetch base files over a cleartext URL, which seems like a dubious security choice.
After finding this tutorial, I figured I would give it another try. Unfortunately, after asking on IRC (#debian-cloud on OFTC), I was told (by Noah!) that "fai-diskimage is probably not what you want for an iso image", and they suggested I use fai-cd instead. Unfortunately, fai-cd works completely differently: it doesn't support the --class system that fai-diskimage was built with, so we can't reuse those already mysterious recipes. fai-cd seems geared towards creating installation media, not live images.
All this seems to make FAI mostly unusable for the task at hand, although it should be noted that grml-live uses FAI to build its images…
vmdebootstrap
vmdebootstrap is a minimal image-building tool, written in Python, used for Debian live images. It requires root (to create loop filesystems), and doesn't support shipping the Debian installer anymore.
We have had good results with vmdebootstrap, but the fact that it requires a loop device has made it difficult to use with GitLab's CI system. Docker has a bug that makes it impossible to use loop devices and kpartx commands inside it, so building images through GitLab CI would require full virtualization instead of just Docker, something that GitLab.com doesn't provide right now. This problem is probably shared by all image-building tools, however.
Worse: VirtualBox did not make it into stretch at all, which makes it difficult to deploy new builders for it.
live-build and live-wrapper
live-build is a set of tools used by Debian Live and other blends (e.g. the PGP cleanroom uses live-build). It used to be a set of shell scripts, but it now uses live-wrapper, which in turn uses vmdebootstrap.
It is unclear if I am better off using live-build or vmdebootstrap directly. The PGP cleanroom build uses live-build, so maybe I should do that as well…
Note: live-wrapper is where the idea of using HDT comes from. Unfortunately, it looks like the boot menus don't actually work yet (bug #813527). Furthermore, live-wrapper doesn't support initrd-style netboot, so we would need documentation on how to boot from ISO files over PXE.
Grml
Grml is a project quite similar to the original goal of stressant:
- based on Debian
- provides rescue tools
- live CD/USB image
- also provides support for netboot through grml-terminalserver, which can use remote squashfs
The project is really interesting, and we have therefore switched the focus of Stressant towards creating an integrated stress-testing tool on top of Grml instead of trying to fix all the issues the Grml developers are already struggling with… We use grml-debootstrap, a tool similar to vmdebootstrap, to build stressant images.
Grml has most of the packages we had in our dependencies, except those:
blktool
bonnie++
chntpw
diskscan
e2tools
fatresize
foremost
hfsplus
i7z
lm-sensors
mtd-utils
scrub
smp-utils
stress-ng
tofrodos
u-boot-tools
wodim
Of those, only stress-ng is actually required by stressant.
The remaining issues with Grml integration are:
- add stressant to the Grml build (pull request #34 - done!)
- review the above packages we collected from various rescue modes and see if they are relevant
- hook stressant into the magic Grml menu to start directly from the boot menu - we can use the scripts=path-name argument for this; it looks in the "DCS dir", which is / or whatever myconfig= points at
points at - figure out how to chain into memtest86 to complete the test suite
Because stressant has been accepted into Debian, we should not need to set up our own build system, unless Grml refuses to integrate the package directly. In any case, we may want to set up our own Continuous Integration (CI) system to build feature branches and the like. The proper way to do this seems to be to add the .deb file directly at the root (/) of the live filesystem with the install local files technique and the debs boot-time argument.