Nomad Autoscaler Telemetry
The Nomad Autoscaler agent collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten-second interval and are retained for one minute. To configure the telemetry output, see the agent configuration.
This data can be accessed through the `/v1/metrics` HTTP endpoint, by sending a
signal to the Nomad Autoscaler process, or through a number of integrations.
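A minimal Python sketch of reading the endpoint follows. It assumes the agent's HTTP API is reachable at `http://127.0.0.1:8080` and that the JSON response uses the go-metrics in-memory summary layout, in which `Gauges` is a list of objects with `Name` and `Value` keys. The helper names `fetch_metrics` and `gauges_by_name` are hypothetical; check both assumptions against your agent's configuration and an actual response.

```python
import json
import urllib.request

# Assumption: the agent's HTTP API listens here; adjust to match your agent's http block.
DEFAULT_ADDR = "http://127.0.0.1:8080"


def fetch_metrics(addr: str = DEFAULT_ADDR, timeout: float = 5.0) -> dict:
    """Fetch the current telemetry snapshot from the /v1/metrics endpoint."""
    with urllib.request.urlopen(f"{addr}/v1/metrics", timeout=timeout) as resp:
        return json.load(resp)


def gauges_by_name(summary: dict) -> dict:
    """Map each gauge name to its value.

    Assumes the go-metrics in-memory summary layout, where "Gauges" is a
    list of objects with "Name" and "Value" keys.
    """
    return {g["Name"]: g["Value"] for g in summary.get("Gauges") or []}
```

Calling `gauges_by_name(fetch_metrics())` against a running agent should return a dictionary keyed by full metric name, such as the `nomad-autoscaler.pathfinder.runtime.num_goroutines` gauge that appears in the sample dump on this page.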
To dump this data with a signal, send USR1 to the Nomad Autoscaler process on Unix,
or BREAK on Windows. When the Nomad Autoscaler receives the signal, it writes the
current telemetry information to the agent's stderr.
This output is useful for debugging or for getting a better view of what the Nomad Autoscaler is doing.
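On Unix, the signal can be sent from a script as well as from a shell. The sketch below uses Python's `os.kill` to send `SIGUSR1`. The function name `request_telemetry_dump` is hypothetical, looking up the agent's process ID is left to the caller, and the Windows BREAK case is not covered.

```python
import os
import signal


def request_telemetry_dump(pid: int) -> None:
    """Ask a running Nomad Autoscaler process to dump telemetry to its stderr.

    Unix only: sends SIGUSR1. On Windows the agent expects BREAK instead,
    which this sketch does not handle.
    """
    if not hasattr(signal, "SIGUSR1"):
        raise OSError("SIGUSR1 is not available on this platform")
    os.kill(pid, signal.SIGUSR1)
```

The dump is written to the agent's own stderr, not to the output of the script that sends the signal.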
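Below is sample output of a telemetry dump. Gauges are marked `[G]` and aggregated samples, such as timers, are marked `[S]`. The gauge names also include the host name, `pathfinder` in this example, which is why they differ from the metric names listed in the tables below: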
```
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 219856.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4316568.000
[2020-08-25 10:01:20 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 36243.000
[2020-08-25 10:01:20 +0100 BST][S] 'nomad-autoscaler.runtime.gc_pause_ns': Count: 5 Min: 38083.000 Mean: 69764.400 Max: 122291.000 Stddev: 31487.808 Sum: 348822.000 LastUpdated: 2020-08-25 10:01:26.574809 +0100 BST m=+1.241576679
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.alloc_bytes': 4370504.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.malloc_count': 220853.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.free_count': 183613.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.policy.total_num': 0.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.num_goroutines': 12.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_pause_ns': 348822.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.total_gc_runs': 5.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.sys_bytes': 74793216.000
[2020-08-25 10:01:30 +0100 BST][G] 'nomad-autoscaler.pathfinder.runtime.heap_objects': 37240.000
```
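If you capture this output from the agent's stderr, a small parser can turn each line into structured data. The sketch below is inferred from the sample lines only: it treats `[G]` lines as gauges with a single value, and for any other entry type, such as `[S]`, it extracts the `Count`, `Min`, `Mean`, `Max`, `Stddev` and `Sum` fields while ignoring `LastUpdated`. `parse_dump_line` and `DumpEntry` are hypothetical names, not part of the Nomad Autoscaler.

```python
import re
from typing import NamedTuple, Optional

# Line shape assumed from the sample dump:
# [<timestamp>][<kind>] '<metric name>': <value or aggregate fields>
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\[(?P<kind>[A-Z])\] '(?P<name>[^']+)': (?P<rest>.*)$")
# Numeric aggregate fields; LastUpdated is deliberately not matched.
FIELD_RE = re.compile(r"\b(Count|Min|Mean|Max|Stddev|Sum): (-?[0-9.]+(?:e[+-]?[0-9]+)?)")


class DumpEntry(NamedTuple):
    timestamp: str
    kind: str                # "G" = gauge, "S" = sample; other letters are kept as-is
    name: str
    value: Optional[float]   # gauge value, or None for aggregated entries
    fields: dict             # aggregate fields (Count, Min, ...) for non-gauge entries


def parse_dump_line(line: str) -> Optional[DumpEntry]:
    """Parse one telemetry dump line, returning None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    kind, rest = m["kind"], m["rest"]
    if kind == "G":
        return DumpEntry(m["ts"], kind, m["name"], float(rest), {})
    fields = {key: float(val) for key, val in FIELD_RE.findall(rest)}
    return DumpEntry(m["ts"], kind, m["name"], None, fields)
```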
Runtime Metrics
Runtime metrics help you understand the memory usage and load of the Nomad Autoscaler agent.
| Metric | Description | Type |
|---|---|---|
| nomad-autoscaler.runtime.num_goroutines | Number of running goroutines | Gauge |
| nomad-autoscaler.runtime.alloc_bytes | Number of allocated heap bytes | Gauge |
| nomad-autoscaler.runtime.sys_bytes | Total bytes of memory obtained from the OS | Gauge |
| nomad-autoscaler.runtime.malloc_count | Cumulative count of heap objects allocated | Gauge |
| nomad-autoscaler.runtime.free_count | Cumulative count of heap objects freed | Gauge |
| nomad-autoscaler.runtime.heap_objects | Number of allocated heap objects | Gauge |
| nomad-autoscaler.runtime.total_gc_pause_ns | Cumulative nanoseconds in GC stop-the-world pauses | Gauge |
| nomad-autoscaler.runtime.total_gc_runs | Number of completed GC cycles | Gauge |
| nomad-autoscaler.runtime.gc_pause_ns | Stop-the-world pause time of each completed GC cycle, in nanoseconds | Timer |
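Several of these gauges are related, which makes for useful sanity checks. In the sample dump above, `malloc_count` minus `free_count` (219856 - 183613) equals `heap_objects` (36243), and `total_gc_pause_ns` divided by `total_gc_runs` (348822 / 5) equals the 69764.4 mean reported for `gc_pause_ns`. The hypothetical helper below computes these values, plus the share of OS-obtained memory currently held by allocated heap bytes. It assumes a dictionary of gauge values keyed by short metric name, with the `nomad-autoscaler` prefix and host name already stripped.

```python
def derived_runtime_stats(g: dict) -> dict:
    """Derive convenience values from runtime gauges.

    `g` maps short metric names (e.g. "malloc_count") to gauge values.
    """
    runs = g["total_gc_runs"]
    sys_bytes = g["sys_bytes"]
    return {
        # Live heap objects: cumulative allocations minus cumulative frees.
        # This should match the heap_objects gauge.
        "live_objects": g["malloc_count"] - g["free_count"],
        # Average stop-the-world pause per completed GC cycle, in nanoseconds.
        "mean_gc_pause_ns": g["total_gc_pause_ns"] / runs if runs else 0.0,
        # Fraction of memory obtained from the OS that holds allocated heap bytes.
        "heap_fraction_of_sys": g["alloc_bytes"] / sys_bytes if sys_bytes else 0.0,
    }
```

With the values from the first interval of the sample dump, `heap_fraction_of_sys` comes out at roughly 5.8%.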
Policy Metrics
Policy metrics provide insights into the performance of the Nomad Autoscaler's policy handling.
| Metric | Description | Type | Labels |
|---|---|---|---|
| nomad-autoscaler.policy.total_num | The number of policies currently held within the autoscaler | Gauge | |
| nomad-autoscaler.policy.queue.horizontal | The number of scaling actions currently being executed for horizontal scaling | Gauge | |
| nomad-autoscaler.policy.queue.cluster | The number of scaling actions currently being executed for cluster scaling | Gauge | |
| nomad-autoscaler.policy.source.error_count | Tracks the number of errors generated by the policy sources | Counter | policy_source |
Scaling Metrics
Scaling metrics provide insight into the performance of scaling actions as well as overall success and failure counters.
| Metric | Description | Type | Labels |
|---|---|---|---|
| nomad-autoscaler.scale.evaluate_ms | The time taken, in milliseconds, to evaluate the checks within a single policy | Timer | policy_id, target_name |
| nomad-autoscaler.scale.invoke_ms | The time taken, in milliseconds, to invoke scaling based on the scaling evaluations | Timer | policy_id, target_name |
| nomad-autoscaler.scale.invoke.success_count | Tracks the number of successful scaling actions triggered | Counter | policy_id, target_name |
| nomad-autoscaler.scale.invoke.error_count | Tracks the number of unsuccessful scaling actions triggered | Counter | policy_id, target_name |
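The success and error counters can be combined into a per-policy success ratio. The sketch below assumes the `/v1/metrics` JSON layout from go-metrics, where `Counters` entries carry `Name`, `Sum` and a `Labels` mapping, and that the response covers a single aggregation interval, so the ratio reflects recent activity rather than all-time totals. `success_ratio_by_policy` is a hypothetical helper.

```python
from collections import defaultdict

SUCCESS = "nomad-autoscaler.scale.invoke.success_count"
ERROR = "nomad-autoscaler.scale.invoke.error_count"


def success_ratio_by_policy(summary: dict) -> dict:
    """Return {policy_id: successes / (successes + errors)}.

    Assumes "Counters" entries have "Name", "Sum" and a "Labels" mapping
    (which may be missing or null for unlabeled counters).
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # policy_id -> [successes, errors]
    for c in summary.get("Counters") or []:
        policy = (c.get("Labels") or {}).get("policy_id")
        if policy is None:
            continue
        if c["Name"] == SUCCESS:
            totals[policy][0] += c["Sum"]
        elif c["Name"] == ERROR:
            totals[policy][1] += c["Sum"]
    # Skip policies with no scaling actions to avoid dividing by zero.
    return {p: ok / (ok + err) for p, (ok, err) in totals.items() if ok + err > 0}
```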
Plugin Metrics
Plugin metrics provide insight into the performance of Nomad Autoscaler plugins and help identify potential bottlenecks or latency issues.
| Metric | Description | Type | Labels |
|---|---|---|---|
| nomad-autoscaler.plugin.manager.access_ms | The time taken, in milliseconds, to dispense a plugin | Timer | |
| nomad-autoscaler.target.status.invoke_ms | The time taken, in milliseconds, to perform the target plugin status call | Timer | policy_id, plugin_name |
| nomad-autoscaler.target.scale.invoke_ms | The time taken, in milliseconds, to perform the target plugin scale call | Timer | policy_id, plugin_name |
| nomad-autoscaler.apm.query.invoke_ms | The time taken, in milliseconds, to perform the APM plugin query call | Timer | policy_id, plugin_name |
| nomad-autoscaler.strategy.run.invoke_ms | The time taken, in milliseconds, to perform the strategy plugin run call | Timer | policy_id, plugin_name |
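To look for bottlenecks, the labeled plugin timers can be ranked by mean latency. This sketch assumes the `/v1/metrics` JSON layout from go-metrics, where `Samples` entries carry `Name`, `Count`, `Sum` and a `Labels` mapping; `slowest_plugin_calls` is a hypothetical helper.

```python
# The plugin timers from the table above that carry plugin_name labels.
PLUGIN_TIMERS = {
    "nomad-autoscaler.target.status.invoke_ms",
    "nomad-autoscaler.target.scale.invoke_ms",
    "nomad-autoscaler.apm.query.invoke_ms",
    "nomad-autoscaler.strategy.run.invoke_ms",
}


def slowest_plugin_calls(summary: dict, top: int = 3) -> list:
    """Rank plugin timer entries by mean latency in milliseconds, slowest first.

    Returns (mean_ms, metric_name, plugin_name, policy_id) tuples. Assumes
    "Samples" entries have "Name", "Count", "Sum" and a "Labels" mapping.
    """
    rows = []
    for s in summary.get("Samples") or []:
        if s["Name"] not in PLUGIN_TIMERS or not s.get("Count"):
            continue  # not a plugin timer, or no calls recorded
        labels = s.get("Labels") or {}
        mean_ms = s["Sum"] / s["Count"]
        rows.append((mean_ms, s["Name"], labels.get("plugin_name"), labels.get("policy_id")))
    # Sort on the mean only, so None labels are never compared with strings.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top]
```

A consistently high mean on `nomad-autoscaler.apm.query.invoke_ms`, for example, would point at the APM plugin's queries rather than the strategy or target plugins.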