systemslab-agent
Config Reference
The agent configuration file is always found at /etc/systemslab/agent.toml. There is also a file with default configuration at /usr/share/systemslab/agent.toml.
Options
server_url
required
The URL that this agent will use to connect to the systemslab server.
agent_addr
The IP or hostname at which the server can find this agent.
If not specified then the server will use the remote address when the agent connects to register itself. This will not work if the server is behind a reverse proxy or in a different network than the agent.
INFO
If a hostname is provided here, then DNS resolution will be performed by the server. DNS resolution on the server may not result in the same IP address as it would if performed on the agent.
port
The port on which this agent will listen. By default, this is 1427.
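As a minimal sketch, the connection options above might be combined as follows. The server URL and agent address are hypothetical placeholders, and the options are assumed to be top-level keys in agent.toml.

```toml
# Hypothetical server URL; replace with the address of your systemslab server.
server_url = "http://systemslab.example.com"

# Hypothetical address at which the server can reach this agent.
# Worth setting when the server is behind a reverse proxy or on another network.
agent_addr = "10.0.0.12"

# Port the agent listens on (1427 is the documented default).
port = 1427
```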
register
Whether this agent should immediately register itself with the systemslab server.
By default, this is set to true, so the agent will register itself with the systemslab server at startup. Otherwise, the agent will wait for an external service to register it with the systemslab server first.
name
A human-readable name for this server. If not specified then the current hostname will be used.
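For illustration, an agent that waits for an external service to register it, and that reports a custom name, might be configured like this. The name shown is a hypothetical placeholder, and both keys are assumed to sit at the top level of agent.toml.

```toml
# Do not register at startup; an external service registers this agent instead.
register = false

# Hypothetical human-readable name; the hostname is used when this is omitted.
name = "bench-host-01"
```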
actions_dir
The directory that this agent will look in for step binaries.
By default, step binaries are installed in /usr/lib/systemslab-actions.
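If step binaries live somewhere other than the default location, the option could be set along these lines. The path below is a hypothetical example, not a location that systemslab creates.

```toml
# Hypothetical custom directory containing step binaries.
actions_dir = "/opt/systemslab/actions"
```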
cgroup_dir
A path to the current cgroup in a mounted cgroupv2 filesystem.
This option is set by the launch script, so any value set in the config file will be overridden.
tags
Tags to apply to this agent.
In SystemsLab, tags act as constraints. A job can only run on a host that has a superset of the tags it requires. Configure the tags here in order to constrain which jobs can be run on this host.
An example tag set would look like this:

```toml
tags = [
    "gcs",
    "service-a",
    "nvme"
]
```
A host annotated with those tags would be able to run jobs with tags ["gcs", "nvme"] or ["service-a"], but it would not be able to run a job with tags ["service-b", "nvme"].
pidfile
Create a set of interface files under /run/systemslab-agent.
This is meant to be used by other programs to discover information about the current systemslab agent on this machine.
This is set by the launch script, so any value set in the config file will be ignored.
metrics_scrape_interval
The interval at which the agent will scrape metrics from rezolus, in seconds.
This can be any floating-point value. Values lower than 10 ms (0.01 seconds) will be clamped to 10 ms.
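Because the interval is expressed in seconds, sub-second scraping is written as a fraction. The value below is only an illustrative choice, not a recommended setting.

```toml
# Scrape rezolus metrics every 500 ms (illustrative value, in seconds).
metrics_scrape_interval = 0.5
```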
parquet_batch_size
The maximum number of metrics rows which will be processed as a single record batch. Higher values result in more efficient processing, but use more system memory during the parquet conversion phase. Lower values use less memory, but may take longer to process the metrics artifact after the run has completed.
The best value will depend upon how many rezolus metrics are enabled and how much system memory is available.
parquet_histogram_type
0.0.94
The representation used to store histogram columns amongst the metrics.
Available values:
standard - all histogram buckets are stored in a single column.
sparse - only non-zero histogram buckets are stored, as a pair of columns with the index and the count of the non-zero buckets.
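A sketch of the two parquet options together is shown below. The batch size is a hypothetical value chosen for illustration (the default is not stated here), and the histogram type is assumed to be written as a quoted string.

```toml
# Hypothetical batch size; larger values trade memory for faster conversion.
parquet_batch_size = 10000

# Store only non-zero histogram buckets as index/count column pairs.
parquet_histogram_type = "sparse"
```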
isolation_strategy
The isolation strategy used to clean up spawned process jobs.
Available values:
none - no attempt is made to clean up orphaned processes after the job ends.
cgroup - processes belonging to the job are isolated in a single cgroup and orphaned processes are killed after the job completes. This option requires that a cgroupv2 controller is provided via the cgroup_dir config option, which is done automatically by the systemd launch script.
systemd - the job is launched as a systemd scope and the scope is terminated after the job completes. This option requires that a D-Bus session bus is available for the agent to connect to.
INFO
If a process is launched with sudo permissions then it may still outlive the job if it ignores signals. The agent does not have the permissions required to kill such processes. In addition, sudo ignores SIGTERM so it does not normally shut down during cleanup.
By default, the agent will attempt to use the cgroup isolation strategy. If there is no cgroupv2 filesystem available on the current system then the none isolation strategy will be used.
WARNING
The systemd isolation strategy is not currently suitable for use within a systemd service. Attempting to set it in your config file will likely result in the agent failing to start.
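To pin the strategy rather than rely on the automatic fallback, the option might be set as in this sketch. It assumes the value is written as a quoted string matching one of the names listed above.

```toml
# Isolate job processes in a cgroup and kill orphans when the job completes.
# Relies on cgroup_dir, which the systemd launch script provides.
isolation_strategy = "cgroup"

# Alternative: disable cleanup of orphaned processes entirely.
# isolation_strategy = "none"
```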
experiment_timeout
0.0.86
The maximum time, in seconds, that a job is allowed to run before it is terminated.
The default value is 43200 (12 hours).
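As an example of lowering the limit, the following sketch caps jobs at two hours. The value is illustrative and is assumed to be written as an integer number of seconds.

```toml
# Terminate jobs that run longer than 2 hours (2 * 60 * 60 = 7200 seconds).
experiment_timeout = 7200
```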
[log]
This section controls logging and log messages emitted by the agent.
level
A log filter that controls which log statements are enabled within the agent.
The filter expression can be either a single level (e.g. info or debug) or a more complicated expression that specifies locations and their levels. The full syntax is documented at https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html
If not specified then the RUST_LOG environment variable will be used to determine the log level.
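A sketch of both filter forms is shown below. The target name systemslab_agent is an assumption made for illustration; the actual module or crate names emitted by the agent may differ.

```toml
[log]
# Single level: enable info and above everywhere.
level = "info"

# Alternative, using EnvFilter directive syntax: info globally, debug for one
# target. The target name "systemslab_agent" is hypothetical.
# level = "info,systemslab_agent=debug"
```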
cluster_id
A cluster id that is used when reporting error telemetry.
This does not affect any of the functionality of systemslab but allows error telemetry to be traced back to the customer cluster that emitted it.
This should be named something like <company name>/<cluster name>. For example, a cluster within IOP systems might be named iop/cluster-1.