Create a task driver plugin

This page provides conceptual information for creating a task driver plugin to extend Nomad's workload execution functionality.

Task drivers in Nomad are the runtime components that execute workloads. For a real-world example of a Nomad task driver plugin implementation, refer to the exec2 driver.

Authoring a task driver plugin

Authoring a task driver consists of implementing the BasePlugin and DriverPlugin interfaces and adding a main package to launch the plugin.

The nomad-skeleton-driver-plugin project exists to help bootstrap the development of new task driver plugins. It provides most of the boilerplate necessary for a task driver plugin, along with detailed comments.

Lifecycle and state

A task driver plugin is long-lived and its lifetime is not bound to the Nomad client. This means that the Nomad client can restart without restarting the driver. Nomad ensures that one instance of the task driver is running. If the task driver crashes or otherwise terminates, Nomad launches another instance of it.

Task drivers should maintain as little state as possible. State for a task is stored by the Nomad client on task creation. This enables a pattern where the task driver can maintain an in-memory state of the running tasks, and if necessary the Nomad client can recover tasks into the task driver state.
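The in-memory state described above can be as small as a mutex-guarded map keyed by task ID. The following is a minimal sketch under stated assumptions: `taskStore` and `taskHandle` are hypothetical names used for illustration, not types from the Nomad plugin API.

```go
package main

import (
	"fmt"
	"sync"
)

// taskHandle is a hypothetical stand-in for the per-task runtime state a
// driver tracks, such as a process handle.
type taskHandle struct {
	pid int
}

// taskStore is a concurrency-safe, in-memory map from task ID to handle.
// Nothing here is persisted; after a restart the map starts empty and is
// repopulated as the Nomad client recovers tasks.
type taskStore struct {
	mu    sync.RWMutex
	tasks map[string]*taskHandle
}

func newTaskStore() *taskStore {
	return &taskStore{tasks: make(map[string]*taskHandle)}
}

// Set records or replaces the handle for a task ID.
func (s *taskStore) Set(id string, h *taskHandle) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tasks[id] = h
}

// Get returns the handle for a task ID and whether it is tracked.
func (s *taskStore) Get(id string) (*taskHandle, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	h, ok := s.tasks[id]
	return h, ok
}

// Delete forgets a task ID. Deleting an unknown ID is a no-op.
func (s *taskStore) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.tasks, id)
}

func main() {
	store := newTaskStore()
	store.Set("task-1", &taskHandle{pid: 4242})
	if h, ok := store.Get("task-1"); ok {
		fmt.Println("tracking pid", h.pid)
	}
	store.Delete("task-1")
	_, ok := store.Get("task-1")
	fmt.Println("still tracked:", ok)
}
```

Keeping the store this thin means that losing it in a crash costs nothing that the client cannot hand back during recovery.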

Base plugin API

`PluginInfo() (*PluginInfoResponse, error)`

A PluginInfoResponse contains metadata about the plugin.

PluginInfoResponse{
    // Type is the plugin type which is implemented
    Type: PluginTypeDriver,
    // Plugin API versions supported by the plugin
    PluginApiVersions: []string{drivers.ApiVersion010},
    // Version of the plugin
    PluginVersion: "0.1.0",
    // Name of the plugin
    Name: "foodriver",
}

`SetConfig(config *Config) error`

The SetConfig function is called when starting the plugin for the first time. The Config given has two different configuration fields. The first, PluginConfig, is an encoded configuration from the plugin block of the client config. The second, AgentConfig, is the Nomad agent's configuration, which is given to all plugins.

`ConfigSchema() (*hclspec.Spec, error)`

The ConfigSchema function allows a plugin to tell Nomad the schema for its configuration. This configuration is given in a plugin block of the client configuration. The schema is defined with the hclspec package.

Task driver plugin API

`TaskConfigSchema() (*hclspec.Spec, error)`

This function returns the schema for the task driver configuration of the task. For more information on hclspec.Spec, refer to the HCL specifications section.

`Capabilities() (*Capabilities, error)`

Capabilities define what features the task driver implements.

type Capabilities struct {
    // SendSignals marks the task driver as being able to send signals
    SendSignals bool

    // Exec marks the task driver as being able to execute arbitrary commands
    // such as health checks. Used by the ScriptExecutor interface.
    Exec bool

    // FSIsolation indicates what kind of filesystem isolation the task driver supports.
    FSIsolation fsisolation.Mode

    // NetIsolationModes lists the set of isolation modes supported by the task driver
    NetIsolationModes []NetIsolationMode

    // MustInitiateNetwork tells Nomad that the task driver must create the network
    // namespace and that the CreateNetwork and DestroyNetwork RPCs are implemented.
    MustInitiateNetwork bool

    // MountConfigs tells Nomad which mounting config options the task driver supports.
    MountConfigs MountConfigSupport

    // DisableLogCollection indicates this driver has disabled log collection
    // and the client should not start a logmon process.
    DisableLogCollection bool

    // DynamicWorkloadUsers indicates this driver is capable (but not required)
    // of making use of a UID/GID not backed by a user known to the operating
    // system. The allocation of a unique, not-in-use UID/GID is managed by the
    // Nomad client ensuring no overlap.
    DynamicWorkloadUsers bool
}

The file system isolation options are the following:

fsisolation.Image: The task driver isolates tasks as machine images.
fsisolation.Chroot: The task driver isolates tasks with chroot or pivot_root.
fsisolation.Unveil: The task driver isolates tasks with the Landlock LSM or another unveil-like system.
fsisolation.None: The task driver has no filesystem isolation.

The network isolation modes are the following:

NetIsolationModeHost: The task driver supports disabling network isolation and using the host network.
NetIsolationModeGroup: The task driver supports using the task group network namespace.
NetIsolationModeTask: The task driver supports isolating the network to just the task.
NetIsolationModeNone: There is no network to isolate. This is used for tasks that the client manages remotely.

`Fingerprint(context.Context) (<-chan *Fingerprint, error)`

This function is called by the client when the plugin is started. It allows the driver to indicate its health to the client. The channel returned should immediately send an initial Fingerprint, then send periodic updates at an interval that is appropriate for the task driver until the context is canceled.

The fingerprint consists of a HealthState and a HealthDescription that inform the client about the driver's health. Additionally, an Attributes field is available for the task driver to add attributes to the client node. The fingerprint HealthState can be one of the following states:

HealthStateUndetected: Indicates that the necessary dependencies for the driver are not detected on the system. For example, the Java runtime is missing for the Java driver.
HealthStateUnhealthy: Indicates that something is wrong with the task driver runtime. For example, the Docker daemon is stopped for the Docker driver.
HealthStateHealthy: All systems go

`StartTask(*TaskConfig) (*TaskHandle, *DriverNetwork, error)`

This function takes a TaskConfig that includes all of the configuration needed to launch the task. Additionally, the task driver configuration can be decoded from the TaskConfig by calling *TaskConfig.DecodeDriverConfig(t interface{}), passing in a pointer to the task driver specific configuration struct. The TaskConfig includes an ID field that all future operations on the task use to reference it.

Drivers return a *TaskHandle that contains the information the task driver needs to reattach to the running task if the plugin crashes or restarts. Some of this state is specific to the task driver implementation, so a DriverState field exists to let the task driver encode custom state into the struct. The GetDriverState and SetDriverState helper methods on the TaskHandle remove the need for the task driver to handle serialization.

A *DriverNetwork can optionally be returned to describe the network of the task if it is modified by the task driver. An example of this is in the Docker driver where tasks can be attached to a specific Docker network.

If an error occurs, it is expected that the task driver cleans up any created resources prior to returning the error.

Logging

Nomad handles all rotation and plumbing of task logs. In order for task stdout and stderr to be received by Nomad, they must be written to the correct location. Prior to starting the task through the task driver, the Nomad client creates FIFOs for stdout and stderr. These paths are given to the task driver in the TaskConfig. Use the fifo package to support cross-platform writing to these paths.

Dynamic workload users

Nomad is capable of dynamically allocating unused UID/GID values for use by task drivers when launching a task. These UID/GID values are deallocated when the task is destroyed. The pool of available UID/GID values can be controlled in client config via the users block.

TaskHandle schema versioning

A Version field is available on the TaskHandle struct to facilitate backwards compatible recovery of tasks. This field is opaque to Nomad, but it allows the driver to recover tasks that were created by an older version of the plugin.
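One way to use the version is to switch on it when decoding the stored driver state, and to upgrade older layouts to the current one. The sketch below rests on several assumptions: `handle`, `stateV0`, and `stateV1` are hypothetical types, and JSON stands in for whatever encoding the driver state actually uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// handle is a hypothetical stand-in for TaskHandle: a version number plus
// opaque, driver-encoded state.
type handle struct {
	Version     int
	DriverState []byte
}

// stateV1 is the hypothetical current state layout.
type stateV1 struct {
	Pid       int    `json:"pid"`
	StartedAt string `json:"started_at"`
}

// stateV0 is a hypothetical older layout that used a different field name
// and did not record a start time.
type stateV0 struct {
	ProcessID int `json:"process_id"`
}

// decodeState returns the state in the current layout, upgrading handles
// written by an older plugin version and rejecting unknown versions.
func decodeState(h *handle) (*stateV1, error) {
	switch h.Version {
	case 1:
		var s stateV1
		if err := json.Unmarshal(h.DriverState, &s); err != nil {
			return nil, err
		}
		return &s, nil
	case 0:
		var old stateV0
		if err := json.Unmarshal(h.DriverState, &old); err != nil {
			return nil, err
		}
		// StartedAt is unknown for version 0 handles and stays empty.
		return &stateV1{Pid: old.ProcessID}, nil
	default:
		return nil, fmt.Errorf("unsupported task handle version %d", h.Version)
	}
}

func main() {
	old := &handle{Version: 0, DriverState: []byte(`{"process_id": 4242}`)}
	s, err := decodeState(old)
	if err != nil {
		fmt.Println("recover failed:", err)
		return
	}
	fmt.Println("recovered pid", s.Pid)
}
```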

`RecoverTask(*TaskHandle) error`

When a driver is restarted, it is not expected to persist any internal state to disk. To support this, Nomad attempts to recover a task that was previously started if the task driver does not recognize the task ID. During task recovery, Nomad calls RecoverTask, passing the TaskHandle that was returned by the StartTask function. If no error is returned, it is expected that the task driver can now operate on the task by referencing the task ID. If an error occurs, the Nomad client marks the task as lost.
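A recovery routine usually has three outcomes: the task is already tracked, the task can be reattached, or the task is gone. The following sketch assumes a hypothetical `driver` type that tracks process IDs and a hypothetical `processAlive` probe; a real driver would reattach to its own runtime instead.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// taskHandle is a hypothetical stand-in for the handle returned by
// StartTask, reduced to the fields this sketch needs.
type taskHandle struct {
	ID  string
	Pid int
}

// driver tracks running tasks in memory only.
type driver struct {
	mu    sync.Mutex
	tasks map[string]int // task ID -> pid

	// processAlive is a hypothetical probe for whether a pid still exists.
	processAlive func(pid int) bool
}

// RecoverTask re-registers a task from its handle. Returning an error
// tells the caller that the task could not be reattached.
func (d *driver) RecoverTask(h *taskHandle) error {
	if h == nil {
		return errors.New("handle cannot be nil")
	}
	d.mu.Lock()
	defer d.mu.Unlock()

	if _, ok := d.tasks[h.ID]; ok {
		return nil // already tracked, nothing to recover
	}
	if !d.processAlive(h.Pid) {
		return fmt.Errorf("task %q: process %d is no longer running", h.ID, h.Pid)
	}
	d.tasks[h.ID] = h.Pid
	return nil
}

func main() {
	d := &driver{
		tasks:        make(map[string]int),
		processAlive: func(pid int) bool { return pid == 4242 },
	}
	fmt.Println(d.RecoverTask(&taskHandle{ID: "web", Pid: 4242}))
	fmt.Println(d.RecoverTask(&taskHandle{ID: "db", Pid: 1}))
}
```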

`WaitTask(context.Context, id string) (<-chan *ExitResult, error)`

The WaitTask function is expected to return a channel that either delivers an *ExitResult when the task exits or is closed when the context is canceled. Calling WaitTask on a task that has already exited is also expected to send an *ExitResult on the returned channel immediately.

`StopTask(taskID string, timeout time.Duration, signal string) error`

The StopTask function is expected to stop a running task by sending the given signal to it. If the task does not stop within the given timeout, the task driver must forcefully kill the task.

StopTask does not clean up resources of the task or remove it from the driver's internal state. A call to WaitTask after StopTask is valid and should be handled.

`DestroyTask(taskID string, force bool) error`

The DestroyTask function cleans up and removes a task that has terminated. If force is set to true, the task driver must destroy the task even if it is still running. If WaitTask is called after DestroyTask, it should return drivers.ErrTaskNotFound as no task state should exist after DestroyTask is called.
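The force flag and the "no state after destroy" rule can be sketched as follows. The `driver` and `task` types and the `lookup` helper are hypothetical, and `errTaskNotFound` is a local stand-in for drivers.ErrTaskNotFound.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTaskNotFound is a local stand-in for drivers.ErrTaskNotFound.
var errTaskNotFound = errors.New("task not found")

// task is a hypothetical record of a workload the driver tracks.
type task struct {
	running bool
}

type driver struct {
	mu    sync.Mutex
	tasks map[string]*task
}

// lookup is what later calls such as WaitTask would use to find a task.
func (d *driver) lookup(id string) (*task, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	t, ok := d.tasks[id]
	if !ok {
		return nil, errTaskNotFound
	}
	return t, nil
}

// DestroyTask removes all state for a task. A running task is destroyed
// only when force is true.
func (d *driver) DestroyTask(id string, force bool) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	t, ok := d.tasks[id]
	if !ok {
		return errTaskNotFound
	}
	if t.running {
		if !force {
			return fmt.Errorf("cannot destroy running task %q", id)
		}
		t.running = false // a real driver would kill the workload here
	}
	delete(d.tasks, id)
	return nil
}

func main() {
	d := &driver{tasks: map[string]*task{"web": {running: true}}}
	fmt.Println(d.DestroyTask("web", false))
	fmt.Println(d.DestroyTask("web", true))
	_, err := d.lookup("web")
	fmt.Println(err)
}
```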

`InspectTask(taskID string) (*TaskStatus, error)`

The InspectTask function returns detailed status information for the referenced taskID.

`TaskStats(context.Context, id string, time.Duration) (<-chan *cstructs.TaskResourceUsage, error)`

The TaskStats function returns a channel on which the task driver sends resource usage stats for the task. The driver must send stats at the given interval until the given context is canceled or the task terminates.

`TaskEvents(context.Context) (<-chan *TaskEvent, error)`

The Nomad client publishes events associated with an allocation. The TaskEvents function allows the task driver to publish driver specific events about tasks and the Nomad client associates them with the correct allocation.

An Eventer utility, available in the github.com/hashicorp/nomad/drivers/shared/eventer package, implements an event loop and publishing mechanism for use in the TaskEvents function.

`SignalTask(taskID string, signal string) error`

Optional - can be skipped by embedding drivers.DriverSignalTaskNotSupported

The SignalTask function is used by drivers that support sending OS signals (SIGHUP, SIGKILL, SIGUSR1 etc.) to the task. It is an optional function and is listed as a capability in the task driver Capabilities struct.

`ExecTask(taskID string, cmd []string, timeout time.Duration) (*ExecTaskResult, error)`

Optional - can be skipped by embedding drivers.DriverExecTaskNotSupported

The ExecTask function is used by the Nomad client to execute commands inside the task execution context. For example, the Docker driver executes commands inside the running container. ExecTask is called for Consul script checks.

HCL specifications

*hclspec.Spec is a struct that defines the schema to validate an HCL entity against. Find the full documentation of the different HCL attribute types in the hclspec godoc.

For a basic example, review the driver configuration for the raw_exec driver.

job "example" {
...
      driver = "raw_exec"
      config {
        command = "/bin/sleep"
        args = ["100"]
      }
}

The config block is what is validated against the hclspec.Spec. The command key takes a string attribute, and the args key takes an array attribute. The corresponding *hclspec.Spec would be:

    spec := hclspec.NewObject(map[string]*hclspec.Spec{
        "command": hclspec.NewAttr("command", "string", true),
        "args":    hclspec.NewAttr("args", "list(string)", false),
    })
