Create a device plugin
This page provides conceptual information for creating a device driver plugin to extend Nomad's workload execution functionality.
Nomad has built-in support for scheduling compute resources such as CPU, memory, and networking. Use Nomad device driver plugins to support scheduling tasks with other devices, such as GPUs. Device driver plugins are responsible for fingerprinting these devices and working with the Nomad client to make them available to assigned tasks.
For a real-world example of a Nomad device plugin implementation, refer to the Nvidia GPU plugin.
Authoring a device plugin
Authoring a device plugin in Nomad consists of implementing the BasePlugin and DevicePlugin interfaces alongside a main package to launch the plugin.
The nomad-skeleton-device-plugin exists to help bootstrap the development of new device plugins. It provides most of the boilerplate necessary for a device plugin, along with detailed comments.
Lifecycle and state
A device plugin is long-lived. Nomad ensures that one instance of the plugin is running. If the plugin crashes or otherwise terminates, Nomad launches another instance of it.
However, unlike task driver plugins, device plugins do not currently have an interface for persisting state to the Nomad client. Instead, the device plugin API emphasizes fingerprinting devices and reporting their status. After helping to provision a task with a scheduled device, a device plugin does not have any responsibility, or ability, to monitor the task.
Base plugin API
PluginInfo() (*PluginInfoResponse, error)
A PluginInfoResponse contains metadata about the plugin.
PluginInfoResponse{
    // Type is the plugin type which is implemented
    Type: PluginTypeDevice,
    // Plugin API versions supported by the plugin
    PluginApiVersions: []string{device.ApiVersion010},
    // Version of the plugin
    PluginVersion: "0.1.0",
    // Name of the plugin
    Name: "foodevice",
}
SetConfig(config *Config) error
Nomad calls the SetConfig function when it starts the plugin for the first time. The Config it passes has two configuration fields. The first, PluginConfig, is the encoded configuration from the plugin block of the client configuration. The second, AgentConfig, is the Nomad agent's configuration, which is given to all plugins.
ConfigSchema() (*hclspec.Spec, error)
The ConfigSchema function allows a plugin to tell Nomad the schema for its configuration. This configuration is given in a plugin block of the client configuration. The schema is defined with the hclspec package.
Device driver plugin API
Fingerprint(context.Context) (<-chan *FingerprintResponse, error)
The client calls the Fingerprint function when the plugin is started. This function allows the plugin to provide Nomad with a list of discovered devices, along with their attributes, so that Nomad can schedule workloads that use those devices. The returned channel should immediately send an initial FingerprintResponse, then send periodic updates at an appropriate interval until the context is canceled.
Each fingerprint response consists of either an error or a list of device groups. A device group is a list of detected devices that are identical for the purpose of scheduling, which means they have identical attributes.
Stats(context.Context, time.Duration) (<-chan *StatsResponse, error)
The Stats function returns a channel on which the plugin should emit device statistics, at the specified interval, until either an error is encountered or the specified context is canceled. The StatsResponse object allows dimensioned statistics to be returned for each device in a device group.
Reserve(deviceIDs []string) (*ContainerReservation, error)
The Reserve function accepts a list of device IDs and returns the information necessary for the client to make those devices available to a task. Currently, the ContainerReservation object allows the plugin to specify environment variables for the task, as well as a list of host devices and files to be mounted into the task's filesystem. Any orchestration required to prepare the device for use should also be performed in this function.
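As a rough illustration of the reservation step, the sketch below builds a reservation from a list of device IDs. The Mount, DeviceSpec, and ContainerReservation types are simplified stand-ins for the real Nomad types. The reserve helper, the known lookup table, the /dev/example paths, and the EXAMPLE_VISIBLE_DEVICES variable are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the Nomad device plugin reservation types.
type Mount struct {
	TaskPath, HostPath string
	ReadOnly           bool
}

type DeviceSpec struct {
	TaskPath, HostPath string
	CgroupPerms        string
}

type ContainerReservation struct {
	Envs    map[string]string // environment variables set for the task
	Mounts  []*Mount          // host files mounted into the task (unused here)
	Devices []*DeviceSpec     // host devices exposed to the task
}

// reserve looks up each requested ID in known (a hypothetical map from
// fingerprinted device ID to host device path) and rejects unknown IDs.
func reserve(known map[string]string, deviceIDs []string) (*ContainerReservation, error) {
	if len(deviceIDs) == 0 {
		return &ContainerReservation{}, nil
	}
	res := &ContainerReservation{Envs: map[string]string{}}
	for _, id := range deviceIDs {
		path, ok := known[id]
		if !ok {
			return nil, fmt.Errorf("unknown device ID %q", id)
		}
		res.Devices = append(res.Devices, &DeviceSpec{
			TaskPath:    path,
			HostPath:    path,
			CgroupPerms: "rw",
		})
	}
	// Hypothetical variable telling the workload which devices it may use.
	res.Envs["EXAMPLE_VISIBLE_DEVICES"] = strings.Join(deviceIDs, ",")
	return res, nil
}

func main() {
	known := map[string]string{"gpu-0": "/dev/example0", "gpu-1": "/dev/example1"}
	res, err := reserve(known, []string{"gpu-1"})
	if err != nil {
		fmt.Println("reserve failed:", err)
		return
	}
	fmt.Println(res.Envs["EXAMPLE_VISIBLE_DEVICES"], res.Devices[0].HostPath)
}
```

Rejecting unknown IDs up front keeps a task from starting with a partial set of devices. Any device preparation would happen before the reservation is returned.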
HCL specifications
*hclspec.Spec is a struct that defines the schema to validate an HCL entity against. Find the full documentation of the different HCL attribute types in the hclspec godoc.
For a basic example, review the driver configuration for the raw_exec driver.
job "example" {
  ...
  driver = "raw_exec"
  config {
    command = "/bin/sleep"
    args    = ["100"]
  }
}
The config block is what is validated against the hclspec.Spec. The command key takes a string attribute, and the args key takes an array attribute. The corresponding *hclspec.Spec would be:
spec := hclspec.NewObject(map[string]*hclspec.Spec{
    "command": hclspec.NewAttr("command", "string", true),
    "args":    hclspec.NewAttr("args", "list(string)", false),
})