Register a Self-Hosted Benchmark Runner
Audience: SKaiNET maintainers. This page is for the engineer operating the self-hosted CI runner that publishes full engine benchmark results. Project users (teams consuming SKaiNET as a library) never need to read or run any of this.
Why this exists
The engine benchmark workflow has
two jobs. The smoke job runs on ubuntu-latest for every PR and
push to develop; it verifies that the harness, JSON schema, and PTS
profiles still build. The full job is the one whose numbers we
publish — but a shared, virtualized GitHub-hosted runner is too noisy
to produce repeatable results. The full job therefore runs on a
self-hosted Linux x86 box that the SKaiNET project controls.
That box must be registered with GitHub Actions and given a stable
label set (self-hosted,linux,x86_64,skainet-bench-linux-x86) so the
workflow’s runs-on: clause can route the job to it.
Prerequisites
- Linux x86_64 host (Ubuntu 22.04 LTS or newer recommended).
- Outbound HTTPS access on port 443 (see "Running behind NAT" below for what the runner actually talks to).
- sudo on the box, needed once to install the runner’s systemd service unit.
- A SKaiNET checkout on the box, or at least the scripts/ directory containing register_bench_runner.sh and install_pts.sh.
- JDK 21+ on the box. The workflow assumes a system JDK is on PATH; install one with sudo apt install temurin-21-jdk (a package from the Adoptium apt repository) or equivalent.
- Optional but recommended: Phoronix Test Suite, installed via ./scripts/install_pts.sh. The benchmark workflow installs it itself if missing, but pre-installing avoids a per-run download.
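The checklist above can be verified before registration with a small read-only preflight. This is a minimal sketch, not part of the SKaiNET scripts: the function names java_major and check_prereqs are hypothetical, and it assumes it is run from the root of the SKaiNET checkout.

```shell
#!/usr/bin/env bash
# Preflight sketch for the prerequisites listed above. Read-only: installs nothing.
# java_major and check_prereqs are illustrative names, not SKaiNET tooling.

# Print the major version found in a `java -version` banner,
# e.g. 'openjdk version "21.0.4" ...' gives 21 and 'java version "1.8.0_412"' gives 8.
java_major() {
  local ver
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.} ;;   # legacy 1.x numbering (Java 8 and older)
  esac
  printf '%s\n' "${ver%%[._-]*}"
}

# Report each prerequisite as ok or MISSING; returns non-zero if any is missing.
check_prereqs() {
  local rc=0 major=""
  report() {
    local label="$1"; shift
    if "$@" >/dev/null 2>&1; then echo "ok:      $label"; else echo "MISSING: $label"; rc=1; fi
  }

  report "Linux x86_64 host" test "$(uname -s)-$(uname -m)" = "Linux-x86_64"
  report "sudo" command -v sudo
  report "scripts/register_bench_runner.sh" test -f scripts/register_bench_runner.sh
  report "scripts/install_pts.sh" test -f scripts/install_pts.sh
  if command -v java >/dev/null 2>&1; then
    major=$(java_major "$(java -version 2>&1)")
  fi
  report "JDK 21+ on PATH (found: ${major:-none})" test "${major:-0}" -ge 21
  return "$rc"
}
```

Source the file and call check_prereqs from the checkout root; it only reads system state, so it is safe to repeat after fixing each missing item.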
Running behind NAT — what you don’t need to configure
Self-hosted GitHub Actions runners work fine behind NAT — that is in fact their normal deployment mode. No port forwarding, no public IP, no inbound firewall rules.
How it works: the runner agent opens an outbound long-poll HTTPS
connection to api.github.com and holds it open. GitHub pushes new
job assignments back down that already-open connection. There is no
inbound traffic; the NAT box just sees a regular outbound HTTPS
session, indistinguishable from a browser.
What the runner needs to reach (all outbound, all on TCP/443):
| Destination | Purpose |
|---|---|
| | control plane, job dispatch, token refresh |
| | job descriptors, runner self-update |
| | artifact upload/download, action tarballs |
| | checkouts via |
| | only if a workflow uses container actions or pulls images |
| | only if a workflow hits GitHub Packages |
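Outbound reachability can be spot-checked from the runner host with curl. The following is a sketch under stated assumptions: probe_outbound is a hypothetical helper, and the hostnames to pass are whichever destinations your deployment needs (api.github.com is the one named in the text above).

```shell
#!/usr/bin/env bash
# Reachability sketch: try an HTTPS HEAD request against each host given.
# probe_outbound is an illustrative name; pass the destinations you need to reach.
probe_outbound() {
  local host rc=0
  for host in "$@"; do
    # Any HTTP response counts as reachable; only connection-level failures
    # (DNS, TCP, TLS, timeout) make curl exit non-zero here.
    if curl --silent --show-error --head --max-time 10 --output /dev/null "https://${host}/"; then
      echo "reachable: ${host}"
    else
      echo "UNREACHABLE: ${host}"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, probe_outbound api.github.com prints one line per host and returns non-zero if any host could not be reached, which makes it usable as a gate in a provisioning script.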
Practical implications for this skainet-bench-linux-x86 runner:
- No router config required. A home NAT just needs to allow outbound 443, which it almost certainly does by default.
- No DDNS, no public IP. GitHub never connects to the runner; the runner always initiates.
- actions/checkout@v6 works normally: it is an outbound git fetch over HTTPS, NAT-friendly.
- actions/upload-artifact@v7 works normally: outbound HTTPS to Azure blob storage.
- External model downloads inside the benchmark job are also outbound; same story.
- Corporate egress firewalls. If the host sits behind one that whitelists destinations, the hostnames above must be allowed. GitHub publishes its IP ranges at https://api.github.com/meta if stricter rules are required.
The one thing NAT does affect: the runner’s long-poll connection
occasionally drops and reconnects (NAT session timeout, ISP-side
reset). The systemd service installed by
register_bench_runner.sh (via ./svc.sh install / start)
handles this automatically: the agent retries the long-poll on
disconnect. Brief gaps will appear in the output of
journalctl -u 'actions.runner.*', but no jobs are lost.
Step-by-step
1. Generate a runner registration token
Tokens are short-lived (~60 minutes). Generate one immediately before running the script.
- Open https://github.com/SKaiNET-developers/SKaiNET/settings/actions/runners/new?arch=x64&os=linux
- Find the line beginning ./config.sh --url … --token in the page’s "Configure" section.
- Copy the value after --token. Treat it as a secret: anyone holding it for the next hour can register a runner on this repo.
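As an alternative to copying the token from the UI, GitHub's REST API exposes a registration-token endpoint that the gh CLI can call. This is a sketch that assumes gh is installed and authenticated as a user with admin rights on the repository; fetch_registration_token is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: request a short-lived runner registration token through the REST API.
# Assumes an authenticated gh CLI with admin access to the repository.
# fetch_registration_token is an illustrative name, not part of SKaiNET.
fetch_registration_token() {
  local repo="$1"   # owner/name, e.g. SKaiNET-developers/SKaiNET
  gh api --method POST "repos/${repo}/actions/runners/registration-token" --jq '.token'
}
```

The printed value is the same kind of secret as the one shown in the UI, so capture it straight into a variable, for example GH_RUNNER_TOKEN="$(fetch_registration_token SKaiNET-developers/SKaiNET)", rather than echoing it to a terminal or log.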
2. Run the registration script
From the SKaiNET checkout on the runner host:
GH_RUNNER_TOKEN="<paste-token-here>" \
REPO=SKaiNET-developers/SKaiNET \
./scripts/register_bench_runner.sh
What the script does:
- Downloads the actions/runner release tarball (RUNNER_VERSION=2.328.0 by default) into $HOME/actions-runner.
- Calls ./config.sh with name $(hostname)-skainet-bench and labels self-hosted,linux,x86_64,skainet-bench-linux-x86.
- Installs and starts a systemd service via sudo ./svc.sh install and sudo ./svc.sh start. (This is the only step that prompts for the sudo password.)
Override the runner name with RUNNER_NAME=… if you want
something other than the hostname, or the install directory with
RUNNER_DIR=…. The label set is fixed in the script because the
workflow’s runs-on: clause expects it verbatim.
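For orientation, the three steps described above correspond roughly to the command sequence below. This is an illustrative re-creation, not the contents of scripts/register_bench_runner.sh, which may differ; the run and register_runner helpers and the DRY_RUN switch are assumptions made for this sketch. It defaults to printing the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Illustrative re-creation of the documented steps; NOT the actual
# scripts/register_bench_runner.sh. run, register_runner and DRY_RUN are
# hypothetical names introduced for this sketch.

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

register_runner() {
  local version="${RUNNER_VERSION:-2.328.0}"
  local dir="${RUNNER_DIR:-$HOME/actions-runner}"
  local name="${RUNNER_NAME:-$(hostname)-skainet-bench}"
  local labels="self-hosted,linux,x86_64,skainet-bench-linux-x86"
  local tarball="actions-runner-linux-x64-${version}.tar.gz"
  local url="https://github.com/actions/runner/releases/download/v${version}/${tarball}"

  if [ -z "${GH_RUNNER_TOKEN:-}" ] || [ -z "${REPO:-}" ]; then
    echo "GH_RUNNER_TOKEN and REPO must be set" >&2
    return 1
  fi

  # Note: the dry-run output includes the token, so do not paste it into logs.
  # When executed for real, `run cd` changes the caller's working directory.
  run mkdir -p "$dir" &&
  run cd "$dir" &&
  run curl -fsSL -o "$tarball" "$url" &&
  run tar -xzf "$tarball" &&
  run ./config.sh --unattended --url "https://github.com/${REPO}" \
      --token "$GH_RUNNER_TOKEN" --name "$name" --labels "$labels" &&
  run sudo ./svc.sh install &&
  run sudo ./svc.sh start
}
```

Running register_runner with DRY_RUN left at its default shows what would be executed for a given RUNNER_VERSION, RUNNER_DIR and RUNNER_NAME, which is a convenient way to see the effect of the overrides before touching the host.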
3. Confirm the runner is online
In the GitHub UI, the runner should appear at
https://github.com/SKaiNET-developers/SKaiNET/settings/actions/runners
with status Idle. From the host itself:
systemctl status 'actions.runner.*'
journalctl -u 'actions.runner.*' -n 50 --no-pager
The journal should show a Connected to GitHub line and then a
heartbeat-style sequence of Listening for Jobs messages.
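The same check can be scripted from any machine with the gh CLI, using the REST endpoint that lists a repository's self-hosted runners. This is a sketch assuming an authenticated gh with admin access to the repository; runner_status and runner_is_online are hypothetical helper names. The API reports online or offline, while the UI's Idle corresponds to an online runner that is not busy.

```shell
#!/usr/bin/env bash
# Sketch: query a runner's status through the REST API.
# Assumes an authenticated gh CLI with admin access to the repository.
# runner_status and runner_is_online are illustrative names.

# Print the API status ("online" or "offline") of the named runner.
runner_status() {
  local repo="$1" name="$2"
  gh api "repos/${repo}/actions/runners" \
    --jq ".runners[] | select(.name == \"${name}\") | .status"
}

# Succeed only if the named runner is currently online.
runner_is_online() {
  [ "$(runner_status "$1" "$2")" = "online" ]
}
```

For example, runner_is_online SKaiNET-developers/SKaiNET "$(hostname)-skainet-bench" can gate a provisioning script. An empty result means no runner with that name was found in the first page of results.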
4. Fire the full lane to validate end-to-end
Trigger the workflow manually:
gh workflow run engine-benchmarks.yml --ref develop
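(This requires gh to be authenticated with a token that may dispatch
workflows: the repo scope for a classic PAT, or Actions read/write
permission for a fine-grained one. Alternatively, use the "Run workflow"
button on the workflow’s page in the GitHub UI.)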
Within ~30 seconds the runner journal should pick up the job. The
full lane completes in 10–20 minutes; the published artifacts land
at engine-full-records-<run-id> on the workflow run page.
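Following the run and collecting its artifact can also be done from the command line. The sketch below assumes an authenticated gh CLI and assumes that the <run-id> in the artifact name is the numeric run ID that gh reports as databaseId; latest_run_id and fetch_full_records are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: wait for the most recent engine-benchmarks.yml run on develop and
# download its artifact. Assumes the artifact suffix is gh's databaseId.
# latest_run_id and fetch_full_records are illustrative names.

# Print the numeric ID of the newest run of the workflow on develop.
latest_run_id() {
  gh run list --workflow engine-benchmarks.yml --branch develop --limit 1 \
    --json databaseId --jq '.[0].databaseId'
}

# Block until that run finishes, then download its artifact into $1 (default: .).
fetch_full_records() {
  local run_id
  run_id=$(latest_run_id) || return 1
  if [ -z "$run_id" ]; then
    echo "no run found for engine-benchmarks.yml" >&2
    return 1
  fi
  gh run watch "$run_id" --exit-status || return 1
  gh run download "$run_id" --name "engine-full-records-${run_id}" --dir "${1:-.}"
}
```

A freshly dispatched run can take a few seconds to show up in gh run list, so call fetch_full_records shortly after triggering rather than immediately.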
Optional hardening
The default registration script runs the agent as the invoking user in their home directory. For tighter isolation:
- Create a dedicated gha-runner system user and re-run the script as that user (sudo -u gha-runner -H ./scripts/register_bench_runner.sh). Under sudo’s default env_reset policy, GH_RUNNER_TOKEN and REPO are not inherited, so pass them explicitly, for example with sudo -u gha-runner -H env GH_RUNNER_TOKEN=… REPO=… ./scripts/register_bench_runner.sh.
- Place ~gha-runner/actions-runner on a partition mounted without noexec, with sufficient inodes and at least 20 GiB of free space.
- Set the CPU governor to performance via a systemd-cpu-affinity unit so benchmark runs are not throttled (the run script already warns if cpu0 is in powersave).
- Restrict the runner to specific workflows by removing the skainet-bench-linux-x86 label from workflows that should not run here.
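Whether the governor setting has taken effect can be checked on every core, not just cpu0. This is a minimal sketch assuming the standard Linux cpufreq sysfs layout; it only reads the current governors and does not change them. non_performance_cpus is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list CPUs whose cpufreq governor is not "performance".
# Assumes the standard sysfs layout under /sys/devices/system/cpu.
# non_performance_cpus is an illustrative name; the optional argument
# overrides the sysfs root.
non_performance_cpus() {
  local root="${1:-/sys/devices/system/cpu}" f cpu gov
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue   # glob did not match, or cpufreq is unavailable
    gov=$(cat "$f")
    cpu=${f#"$root"/}         # strip the root prefix
    cpu=${cpu%%/*}            # keep only the cpuN component
    if [ "$gov" != "performance" ]; then
      echo "$cpu $gov"
    fi
  done
}
```

Empty output means every core that exposes cpufreq reports performance. On hosts without cpufreq (some virtual machines) the output is also empty, so treat this as a check for bare-metal benchmark boxes only.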
Removing or rotating the runner
To take the host out of rotation:
cd ~/actions-runner
sudo ./svc.sh stop
sudo ./svc.sh uninstall
# Generate a removal token via the GitHub UI (same place as registration),
# then:
./config.sh remove --token <removal-token>
The GitHub UI also shows an Offline badge after ~10 minutes of
disconnection; a runner that will stay offline for longer can be
deleted directly from the UI without running config.sh remove.
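The removal token can likewise be requested through the REST API instead of the UI. This is a sketch assuming an authenticated gh CLI with admin access to the repository; fetch_removal_token is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: request a short-lived runner removal token through the REST API.
# Assumes an authenticated gh CLI with admin access to the repository.
# fetch_removal_token is an illustrative name, not part of SKaiNET.
fetch_removal_token() {
  local repo="$1"   # owner/name, e.g. SKaiNET-developers/SKaiNET
  gh api --method POST "repos/${repo}/actions/runners/remove-token" --jq '.token'
}
```

On the runner host, after stopping and uninstalling the service, this could be used as ./config.sh remove --token "$(fetch_removal_token SKaiNET-developers/SKaiNET)".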
Related
- Engine benchmark program: the workflow this runner serves.
- Reading the matmul benchmark: how to interpret the numbers the runner publishes.
- Build from source: the build the runner executes.