
systemslab-agent Config Reference

The agent configuration file is located at /etc/systemslab/agent.toml. A file containing the default configuration is also installed at /usr/share/systemslab/agent.toml.

Options

server_url required

The URL that this agent will use to connect to the systemslab server.
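A minimal sketch of this required entry in /etc/systemslab/agent.toml. The hostname is a hypothetical placeholder, and the scheme and port depend on how your server is deployed:

```toml
# Hypothetical server address; replace with your systemslab server's URL.
server_url = "https://systemslab.example.com"
```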

agent_addr

The IP or hostname at which the server can find this agent.

If not specified, the server will use the remote address of the agent's connection when the agent registers itself. This will not work if the server is behind a reverse proxy or on a different network from the agent.

INFO

If a hostname is provided here, DNS resolution will be performed by the server. Resolving the name on the server may not produce the same IP address as resolving it on the agent.
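For an agent on a different network from the server, the address can be set explicitly. The address below is a hypothetical placeholder:

```toml
# Hypothetical address at which the server can reach this agent.
# A hostname also works, but the server performs the DNS lookup.
agent_addr = "10.0.0.12"
```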

port

The port on which this agent will listen. By default, this is 1427.
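Setting the port explicitly looks like this; the value shown is simply the documented default:

```toml
# The port on which this agent listens (1427 is the default).
port = 1427
```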

register

Whether this agent should immediately register itself with the systemslab server.

By default, this is set to true, so the agent will register itself with the systemslab server at startup. If set to false, the agent will instead wait for an external service to register it with the systemslab server.
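To leave registration to an external service, disable it in the config file. A minimal sketch:

```toml
# Do not register at startup; wait for an external service to register this agent.
register = false
```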

name

A human-readable name for this agent. If not specified, the current hostname will be used.
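A sketch of overriding the hostname-derived name; the value is a hypothetical example:

```toml
# Hypothetical human-readable name; the hostname is used if this is omitted.
name = "bench-host-01"
```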

actions_dir

The directory that this agent will look in for step binaries.

By default, step binaries are installed in /usr/lib/systemslab-actions.
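Pointing the agent at a different directory might look like this; the path is a hypothetical example:

```toml
# Hypothetical alternative location for step binaries
# (the default is /usr/lib/systemslab-actions).
actions_dir = "/opt/systemslab/actions"
```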

cgroup_dir

A path to the current cgroup in a mounted cgroupv2 filesystem.

This option is set by the launch script, which overrides any value set in the config file.

tags

Tags to apply to this agent.

In SystemsLab, tags act as constraints. A job can only run on a host that has a superset of the tags it requires. Configure the tags here in order to constrain which jobs can be run on this host.

An example tag set looks like this:

```toml
tags = [
    "gcs",
    "service-a",
    "nvme"
]
```

A host annotated with those tags would be able to run jobs with tags ["gcs", "nvme"] or ["service-a"], but not a job with tags ["service-b", "nvme"], because the host lacks the service-b tag.

pidfile

Creates a set of interface files under /run/systemslab-agent.

These files are meant to be used by other programs to discover information about the systemslab agent running on this machine.

This option is set by the launch script, so any value set in the config file will be ignored.

metrics_scrape_interval

The interval at which the agent will scrape metrics from rezolus, in seconds.

This can be any floating-point value. Values lower than 10ms (0.01) are clamped to 10ms.
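Because the value is in seconds, sub-second intervals are written as fractions. The value below is an illustrative choice, not a recommendation:

```toml
# Scrape rezolus every 500ms. Values below 0.01 (10ms) are clamped to 0.01.
metrics_scrape_interval = 0.5
```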

parquet_batch_size

The maximum number of metrics rows which will be processed as a single record batch. Higher values result in more efficient processing, but use more system memory during the parquet conversion phase. Lower values use less memory, but may take longer to process the metrics artifact after the run has completed.

The best value will depend upon how many rezolus metrics are enabled and how much system memory is available.
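No default is documented here, so the number below is a hypothetical starting point. Lower it on memory-constrained hosts, or raise it when memory is plentiful:

```toml
# Hypothetical batch size. Larger batches process more efficiently but use more
# memory during parquet conversion; smaller batches use less memory but may be slower.
parquet_batch_size = 8192
```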

parquet_histogram_type 0.0.94

The representation used to store histogram columns amongst the metrics.

Available values:

  • standard - all histogram buckets are stored in a single column.
  • sparse - only non-zero histogram buckets are stored, as a pair of columns holding the index and the count of each non-zero bucket.
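Selecting the sparse representation might look like this, assuming the value is written as a TOML string:

```toml
# Store only non-zero histogram buckets, as index/count column pairs.
parquet_histogram_type = "sparse"
```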

isolation_strategy

The isolation strategy used to clean up processes spawned by jobs.

Available values:

  • none - no attempt is made to clean up orphaned processes after the job ends.
  • cgroup - processes belonging to the job are isolated in a single cgroup and orphaned processes are killed after the job completes. This option requires a cgroupv2 cgroup to be provided via the cgroup_dir config option, which the systemd launch script does automatically.
  • systemd - the job is launched as a systemd scope and the scope is terminated after the job completes. This option requires that a D-Bus session bus is available for the agent to connect to.

INFO

If a process is launched with sudo, it may still outlive the job if it ignores signals. The agent does not have the permissions required to kill such processes. In addition, sudo ignores SIGTERM, so it does not normally shut down during cleanup.

By default, the agent will attempt to use the cgroup isolation strategy. If no cgroupv2 filesystem is available on the current system, the none isolation strategy will be used instead.

WARNING

The systemd isolation strategy is not currently suitable for use within a systemd service. Attempting to set it in your config file will likely result in the agent failing to start.
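Setting the strategy explicitly might look like this, assuming the value is written as a TOML string. The cgroup strategy is shown because it is the default and, per the warning above, systemd is unsuitable when the agent itself runs as a systemd service:

```toml
# Kill orphaned job processes via a dedicated cgroup.
# Requires cgroup_dir, which the systemd launch script sets automatically.
isolation_strategy = "cgroup"
```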

experiment_timeout 0.0.86

The maximum time, in seconds, that a job is allowed to run before it is terminated.

The default value is 43200 (12 hours).
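For example, to allow jobs to run for up to 24 hours (24 × 3600 = 86400 seconds), assuming an integer number of seconds is accepted:

```toml
# Terminate jobs that run longer than 24 hours (the default is 43200, i.e. 12 hours).
experiment_timeout = 86400
```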

[log]

This section controls logging and log messages emitted by the agent.

level

A log filter that controls which log statements are enabled within the agent.

The filter expression can be either a single level (e.g. info or debug) or a more complex expression that sets levels for specific targets. The full syntax is documented at https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html

If not specified, the RUST_LOG environment variable will be used to determine the log level.
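A sketch of the [log] section using an EnvFilter directive string: a global info level plus debug output for one target. The target name systemslab_agent is an assumption about the agent's crate name, not a documented value:

```toml
[log]
# Global level "info"; "debug" for the (assumed) systemslab_agent target.
level = "info,systemslab_agent=debug"
```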

cluster_id

A cluster ID that is used when reporting error telemetry.

This does not affect any of the functionality of systemslab, but allows error telemetry to be traced back to the customer cluster that emitted it.

This should take the form <company name>/<cluster name>. For example, a cluster within IOP Systems might be named iop/cluster-1.
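Using the naming example above, and assuming cluster_id belongs in the [log] section where it is listed:

```toml
[log]
# Identifies which cluster emitted error telemetry; does not change agent behavior.
cluster_id = "iop/cluster-1"
```

If level is also set, both keys go in a single [log] table, since TOML does not allow the same table header to appear twice.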