buildbox runner

Intro

In the buildbox architecture, a worker machine runs two processes: a worker, which is in charge of fetching work from a remote execution server, and a runner, which is invoked by that worker to carry out the task.

A runner will be spawned with an Action. It will execute the command contained in it and return an ActionResult message.

Having buildbox-worker as an intermediary between the remote execution service and the runner makes it possible to separate concerns and reuse code, sparing runner developers from having to implement RWAPI logic.

When launching buildbox-worker, the path of a buildbox-runner binary is specified:

./buildbox-worker \
  --buildbox-run=/usr/bin/buildbox-run-hosttools \
  --bots-remote=$REMOTE_EXECUTION_SERVER_ADDRESS \
  --cas-remote=$CAS_SERVER_ADDRESS

Once the worker receives a request, it will invoke the specified runner with the necessary parameters:

./buildbox-run-hosttools \
  --action=$WORKDIR/action_file \
  --remote=$CAS_SERVER_ADDRESS \
  --action-result=$WORKDIR/action_result

(Note: in the terminology used by the Remote Worker API, a worker is a machine capable of being used to execute a command.)
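
As a rough illustration of the runner invocation shown above, the argument list that a worker passes to a runner could be assembled as follows. This is only a sketch: buildRunnerArgv is a hypothetical helper, not part of buildbox-worker, and it assumes nothing beyond the three flags that appear in the example invocation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a buildbox API): builds the argument vector for
// spawning a runner, using only the flags shown in the example invocation.
std::vector<std::string> buildRunnerArgv(const std::string &runnerPath,
                                         const std::string &workDir,
                                         const std::string &casRemote)
{
    return {runnerPath,
            "--action=" + workDir + "/action_file",
            "--remote=" + casRemote,
            "--action-result=" + workDir + "/action_result"};
}
```

The resulting vector would typically be converted to a char* array and handed to one of the exec() family of functions.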

buildboxcommon::Runner class

The Runner class in the buildbox-common library provides facilities to easily create a new type of runner. An implementation can inherit from that class and implement its key functionality by overriding its execute() method.

This class also provides helper methods to parse command line arguments, stage and capture directories, fetch a Command from the CAS server, write an ActionResult file and read the output written by the command to stdout and stderr.

class Runner {
  public:
    virtual ActionResult execute(const Command &command, const Digest &inputRootDigest);

    virtual bool parseArg(const char *);

    virtual void printSpecialUsage();

    virtual void printSpecialCapabilities();

    virtual ~Runner(){};

    ...
};

The execute() method is the most important method of the runner: it implements the actual execution of a command. The Runner interface also allows runners to define their own command line arguments and report special capabilities that they might offer.
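
To make the inheritance pattern concrete, here is a minimal sketch of a runner that overrides execute() and parseArg(). It does not use the real library: Command, Digest, ActionResult and Runner below are simplified stand-ins so that the example is self-contained, NoopRunner and its --dry-run flag are hypothetical, and the convention that parseArg() returns true for a recognized argument is an assumption.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Stand-in types for illustration only. The real Command, Digest and
// ActionResult are protobuf messages.
struct Command { std::vector<std::string> arguments; };
struct Digest { std::string hash; long size_bytes; };
struct ActionResult { int exit_code; };

// Stand-in for buildboxcommon::Runner, reduced to two of its virtual
// methods. execute() is declared pure virtual here for brevity.
class Runner {
  public:
    virtual ActionResult execute(const Command &command,
                                 const Digest &inputRootDigest) = 0;
    // Assumed convention: return true if the argument was recognized.
    virtual bool parseArg(const char *) { return false; }
    virtual ~Runner() {}
};

// Hypothetical runner: accepts one extra flag and "executes" a command by
// only checking that its argument list is non-empty.
class NoopRunner : public Runner {
  public:
    ActionResult execute(const Command &command, const Digest &) override
    {
        ActionResult result;
        result.exit_code = command.arguments.empty() ? 1 : 0;
        return result;
    }

    bool parseArg(const char *arg) override
    {
        if (std::strcmp(arg, "--dry-run") == 0) {
            d_dryRun = true;
            return true;
        }
        return false;
    }

    bool d_dryRun = false;
};
```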

$PATH lookups

The Remote Execution API [2] forbids commands that do not specify a path to an executable. That is, gcc should be invoked as /usr/bin/gcc or ./gcc (if gcc is contained in the input root).

Consequently, runners should never perform $PATH lookups for remote execution commands.

Runners can, however, search for commands that they might need to invoke in preparation for the execution of a user-provided command. For that, buildbox-common provides the SystemUtils::getPathToCommand() method.
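
The rule for user-provided commands can be expressed as a simple check on the first argument of the command. The function below is a hypothetical sketch, not a buildbox-common API: it assumes that an executable name needs a $PATH lookup exactly when it contains no path separator.

```cpp
#include <cassert>
#include <string>

// Hypothetical check (not a buildbox API): returns true if running `argv0`
// would require a $PATH lookup, i.e. it contains no path separator.
bool needsPathLookup(const std::string &argv0)
{
    return argv0.find('/') == std::string::npos;
}
```

A runner could use such a check to reject a command like gcc while accepting /usr/bin/gcc and ./gcc.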

Logs

Methods that are defined inside a class that inherits from buildboxcommon::Runner and need to log messages should use the BUILDBOX_RUNNER_LOG(level, message) logging macro, where the level argument is a level from the regular logging macros provided by buildbox-common.

(Currently, this runner-specific macro will attach the id of the Action being executed to every log line.)

Error causes

When encountering errors, a runner exits with a non-zero status code.

However, it can be useful for diagnostics to propagate information about the cause of the error. For that, buildbox-worker and buildboxcommon::Runner follow a convention where the latter, when aborting, will attempt to write a Status [4] protobuf file containing a descriptive error code and a message string.

Therefore the process that invokes a runner can, when detecting that the runner exited unsuccessfully, attempt to gather more information by reading that file.

The Runner::errorStatusCodeFilePath(actionResultPath) helper takes the path where the ActionResult should be written and returns the path where the error status file is expected to be present, if created.
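
On the caller's side, the convention amounts to attempting to read a file that may or may not exist. The following sketch shows that step only; readErrorStatusFile is a hypothetical helper, and the path it receives is assumed to have been obtained beforehand (for example from Runner::errorStatusCodeFilePath()).

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical caller-side helper (not a buildbox API). Reads the raw bytes
// of the error status file into `contents`. Returns false if the file could
// not be opened, e.g. because the runner did not create it.
bool readErrorStatusFile(const std::string &statusPath, std::string *contents)
{
    std::ifstream file(statusPath.c_str(), std::ios::binary);
    if (!file.is_open()) {
        return false;
    }
    std::ostringstream buffer;
    buffer << file.rdbuf();
    *contents = buffer.str();
    return true;
}
```

The bytes read this way would then be parsed as a Status message, for instance with the ParseFromString() method that protobuf generates for every message type.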

LocalCAS protocol

It is expected that most runners will rely on the LocalCAS protocol [1] to stage directories and capture the produced outputs. Therefore, Runner::d_use_localcas_protocol is set to true by default.

For the case where a casd instance is not available, Runner::parseArguments() offers the --disable-localcas CLI option.

ExecutedActionMetadata

The REAPI ActionResult message contains an execution_metadata field of type ExecutedActionMetadata [3], which holds timestamps for different operations. Those related to a worker are:

  1. worker_start_timestamp: received the action.

  2. worker_completed_timestamp: completed the action, including all stages.

  3. input_fetch_start_timestamp: started fetching action inputs.

  4. input_fetch_completed_timestamp: finished fetching action inputs.

  5. execution_start_timestamp: started executing the action command.

  6. execution_completed_timestamp: completed executing the action command.

  7. output_upload_start_timestamp: started uploading action outputs.

  8. output_upload_completed_timestamp: finished uploading action outputs.

buildboxcommon::Runner automatically sets 1, 2, 5 and 6 (that is, all timestamps except those related to fetching inputs and uploading outputs).

Implementations of a runner must use the void Runner::metadata_mark_{input_download, output_upload}_{start, end}(ExecutedActionMetadata*) functions to set the timestamps of the input fetching and output upload operations.
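
The brace notation above expands to four functions, one for each end of the two operations. The sketch below shows where each call would sit in an implementation. It uses stand-ins only: the struct and the four free functions merely borrow the names of the real protobuf message and Runner helpers so that the example is self-contained, and runAction is a hypothetical skeleton.

```cpp
#include <cassert>
#include <chrono>

typedef std::chrono::system_clock::time_point TimePoint;

// Stand-in for the four ExecutedActionMetadata fields that a runner
// implementation is responsible for. The real type is a protobuf message.
struct ExecutedActionMetadata {
    TimePoint input_fetch_start_timestamp;
    TimePoint input_fetch_completed_timestamp;
    TimePoint output_upload_start_timestamp;
    TimePoint output_upload_completed_timestamp;
};

// Stand-ins named after the Runner helpers described above.
void metadata_mark_input_download_start(ExecutedActionMetadata *m)
{
    m->input_fetch_start_timestamp = std::chrono::system_clock::now();
}
void metadata_mark_input_download_end(ExecutedActionMetadata *m)
{
    m->input_fetch_completed_timestamp = std::chrono::system_clock::now();
}
void metadata_mark_output_upload_start(ExecutedActionMetadata *m)
{
    m->output_upload_start_timestamp = std::chrono::system_clock::now();
}
void metadata_mark_output_upload_end(ExecutedActionMetadata *m)
{
    m->output_upload_completed_timestamp = std::chrono::system_clock::now();
}

// Hypothetical skeleton showing where each mark belongs.
void runAction(ExecutedActionMetadata *metadata)
{
    metadata_mark_input_download_start(metadata);
    // ... stage the input root here ...
    metadata_mark_input_download_end(metadata);

    // ... run the command here ...

    metadata_mark_output_upload_start(metadata);
    // ... capture and upload the outputs here ...
    metadata_mark_output_upload_end(metadata);
}
```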

auxiliary_metadata

When the --collect-execution-stats command-line option is set, buildboxcommon::Runner will also generate an execution_stats.proto message with metrics from the execution of the command. Currently those are values reported by getrusage(2).

That message will be uploaded to CAS wrapped in a protobuf.Any [5] message, and its digest will be attached to the execution_metadata.auxiliary_metadata field of the ActionResult.
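
For readers unfamiliar with getrusage(2), the sketch below shows one way such values can be gathered for a child process on a POSIX system. It is an illustration under assumptions, not the code used by buildboxcommon::Runner: runAndCollectUsage and busyWork are hypothetical names, and the choice of RUSAGE_CHILDREN after waitpid() is an assumption about how a runner might measure the command it spawned.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `work` in a child process, waits for it, and
// fills `usage` with the accumulated resource usage of the waited-for
// children of this process. Returns true on success.
bool runAndCollectUsage(void (*work)(), struct rusage *usage)
{
    const pid_t pid = fork();
    if (pid < 0) {
        return false; // fork() failed
    }
    if (pid == 0) {
        work();
        _exit(0); // child: leave without running the parent's cleanup
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return false;
    }
    // Usage is only accounted to the parent once the child has been waited for.
    return getrusage(RUSAGE_CHILDREN, usage) == 0;
}

// Placeholder workload standing in for the user-provided command.
void busyWork()
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000; ++i) {
        sum = sum + i;
    }
}
```

Fields such as ru_utime, ru_stime and ru_maxrss of the returned structure are the kind of values that a statistics message could carry. Note that the unit of ru_maxrss differs between platforms (kilobytes on Linux, bytes on macOS).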

In order to access the metrics from the metadata, a client can do:

if (!actionResult.execution_metadata().auxiliary_metadata().empty()) {
  const auto &metadataEntry =
      actionResult.execution_metadata().auxiliary_metadata(0);

  Digest d;
  if (metadataEntry.UnpackTo(&d) && d.size_bytes() > 0) {
    // The entry contains a Digest; fetch the blob it references:
    const auto any = casClient.fetchMessage<google::protobuf::Any>(d);

    // Check whether the blob contains an `ExecutionStatistics` message:
    build::buildbox::ExecutionStatistics stats;
    if (any.UnpackTo(&stats)) {
      // `stats` is a valid ExecutionStatistics message.
    }
  }
}

Note that ExecutionStatistics is not packed directly into the auxiliary_metadata field (which is of type Any) because doing so can cause fatal errors when converting an ActionResult message to its JSON representation. Some converters need the definitions of every protobuf message packed in a field of type Any. That is not the case when an ActionResult is handled by an execution server or by other tools that have no need to know about execution_stats.proto and rely on obtaining a JSON representation of protobufs [6].

[1] https://gitlab.com/BuildGrid/buildbox/buildbox-common/blob/master/protos/build/buildgrid/local_cas.proto

[2] https://github.com/bazelbuild/remote-apis/blob/178b756a22d441d8d06873a70bcd0ef01d876467/build/bazel/remote/execution/v2/remote_execution.proto#L445

[3] https://github.com/bazelbuild/remote-apis/blob/178b756a22d441d8d06873a70bcd0ef01d876467/build/bazel/remote/execution/v2/remote_execution.proto#L788

[4] https://github.com/googleapis/googleapis/blob/master/google/rpc/status.proto

[5] https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/any.proto#L48

[6] https://gitlab.com/BuildGrid/buildbox/buildbox-common/-/issues/81