Remote execution overview

Remote execution aims to speed up builds by relying on two separate but related concepts: remote caching and remote execution. Remote caching allows users to share build outputs, while remote execution runs operations on a remote cluster of machines, which may be more powerful than (or configured differently from) what the user has access to locally.

The Remote Execution API (REAPI) describes a gRPC + protocol-buffers interface that has three main services for remote caching and execution:

A ContentAddressableStorage (CAS) service: a remote storage endpoint where content is addressed by digests, a digest being a pair of the hash and size of the data stored or retrieved.
An ActionCache (AC) service: a mapping between build actions already performed and their resulting artifacts (usually lives with the CAS service).
An Execution service: the main endpoint, allowing one to request that build jobs be performed against the build farm.
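To make the digest idea concrete, here is a minimal sketch of computing one for a blob. The `Digest` tuple and `compute_digest` helper are illustrative names, not BuildGrid code, and SHA-256 is assumed as the hash function; a given deployment may be configured differently.

```python
import hashlib
from typing import NamedTuple


class Digest(NamedTuple):
    """Illustrative stand-in for a digest: content hash plus size in bytes."""

    hash: str
    size_bytes: int


def compute_digest(data: bytes) -> Digest:
    # Assumption: SHA-256 is the hash function in use.
    return Digest(hashlib.sha256(data).hexdigest(), len(data))


blob = b"hello"
digest = compute_digest(blob)
print(f"{digest.hash}/{digest.size_bytes}")
```

Because the address is derived purely from the content, two clients uploading the same bytes end up referring to the same CAS entry.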

The Remote Worker API (RWAPI) describes another gRPC + protocol-buffers interface that allows a central BotsService to manage a farm of pluggable workers.

BuildGrid combines these two interfaces in order to provide a complete remote caching and execution service. The high-level architecture can be represented like this:

$digraph remote_execution_overview { node [shape = record, width=2, height=1]; ranksep = 2 compound=true edge[arrowtail="vee"]; edge[arrowhead="vee"]; client [label = "Client", color="#0342af", fillcolor="#37c1e8", style=filled, shape=box] database [label = "Database", color="#8a2be2", fillcolor="#9370db", style=filled, shape=box] subgraph cluster_controller{ label = "Controller"; labeljust = "c"; fillcolor="#42edae"; style=filled; controller [label = "{ExecutionService|BotsInterface\n}", fillcolor="#17e86a", style=filled]; } subgraph cluster_worker0 { label = "Worker 1"; labeljust = "c"; color="#8e7747"; fillcolor="#ffda8e"; style=filled; bot0 [label = "{Bot|Host-tools}" fillcolor="#ffb214", style=filled]; } subgraph cluster_worker1 { label = "Worker 2"; labeljust = "c"; color="#8e7747"; fillcolor="#ffda8e"; style=filled; bot1 [label = "{Bot|BuildBox}", fillcolor="#ffb214", style=filled]; } client -> controller [ dir = "both", headlabel = "REAPI", labelangle = 20.0, labeldistance = 9, labelfontsize = 15.0, lhead=cluster_controller]; database -> controller [ dir = "both", headlabel = "SQL", labelangle = 20.0, labeldistance = 9, labelfontsize = 15.0, lhead=cluster_controller]; controller -> bot0 [ dir = "both", labelangle= 340.0, labeldistance = 7.5, labelfontsize = 15.0, taillabel = "RWAPI ", lhead=cluster_worker0, ltail=cluster_controller]; controller -> bot1 [ dir = "both", labelangle= 20.0, labeldistance = 7.5, labelfontsize = 15.0, taillabel = " RWAPI", lhead=cluster_worker1, ltail=cluster_controller]; }$

BuildGrid can be split up into separate endpoints. The ActionCache and CAS can be deployed separately from the Controller. The following diagram shows a typical setup.

$digraph remote_execution_overview { node [shape=record, width=2, height=1]; compound=true graph [nodesep=1, ranksep=2] edge[arrowtail="vee"]; edge[arrowhead="vee"]; client [label="Client", color="#0342af", fillcolor="#37c1e8", style=filled, shape=box] database [label = "Database", color="#8a2be2", fillcolor="#9370db", style=filled, shape=box] cas [label="CAS", color="#840202", fillcolor="#c1034c", style=filled, shape=box] subgraph cluster_controller{ label="Controller"; labeljust="c"; fillcolor="#42edae"; style=filled; controller [label="{ExecutionService|BotsInterface\n}", fillcolor="#17e86a", style=filled]; } actioncache [label="ActionCache", color="#133f42", fillcolor="#219399", style=filled, shape=box] subgraph cluster_worker0 { label="Worker"; labeljust="c"; color="#8e7747"; fillcolor="#ffda8e"; style=filled; bot0 [label="{Bot}" fillcolor="#ffb214", style=filled]; } client -> controller [ dir="both"]; database -> controller [ dir="both"]; client -> cas [ dir="both", lhead=cluster_controller]; controller -> bot0 [ dir="both", lhead=cluster_worker0]; //ltail=cluster_controller]; cas -> bot0 [ dir="both", lhead=cluster_worker0]; actioncache -> controller [ dir="both"]; client -> actioncache [ dir="both", constraint=false, ]; }$

The flow of BuildGrid requests

BuildGrid uses various threads to achieve different tasks. The following diagram is an overview of the interactions between components of BuildGrid in response to a gRPC request.

Light green boxes signify distinct threads; entities outside the green boxes are shared among all threads.

$digraph buildgrid_overview { node [shape=record, width=2, height=1]; fontsize=16; compound=true; graph [nodesep=0.1, ranksep=0] edge [arrowtail="vee", arrowhead="vee", fontsize=16, fontcolor="#02075D", color="#02075D",]; splines=polyline; rankdir=LR; subgraph cluster_clients{ label="GRPC Clients\n(REAPI/RWAPI)"; labeljust="c"; fillcolor="#ffccdd"; style=filled; clients [label="Remote Execution Clients|Bots|CAS Clients\n", fillcolor="#ff998e", style=filled] } subgraph cluster_bgd { label="BuildGrid Process"; labeljust="c"; fillcolor="#ffda8e"; style=filled; subgraph cluster_bgd_services { label="BuildGrid Services"; labeljust="c"; fillcolor="#ffb214"; fontsize=14; bgd_services [ label="Execution|Bots|CAS\n", fillcolor="#ffb214", style=filled] } subgraph cluster_data { label="Persistent Data"; labeljust="c"; fillcolor="#9370db"; data [label="CAS Backend";shape=cylinder;] data_store [label="DataStore";shape=cylinder;] } jobwatcher [ label="Job Watcher Thread", labeljust="c", fillcolor="#42edae", fontsize=14, style=filled, ] pruner [ label="Scheduler Pruner\nThread", fillcolor="#42edae", fontsize=14, style=filled, ] deferredwrites [ label="Deferred Writer\nThreads", fillcolor="#42edae", fontsize=14, style=filled, ] subgraph cluster_mainthread { label="Main Thread"; fillcolor="#42edae"; fontsize=14; subgraph cluster_asyncioloop { label="asyncio loop"; labeljust="c"; fillcolor="#00A572"; style=filled; asyncio_loop [label="Log Writer|BotSession Reaper|State Monitoring Worker\n", fillcolor="#29AB87", style=filled]; } } subgraph cluster_grpc { label="GRPC Thread"; fillcolor="#42edae"; fontsize=14; subgraph cluster_grpcserver{ label="GRPC Server"; labeljust="c"; fillcolor="#37c1e8"; style=filled; grpc_server [label="unary_unary|unary_stream\n", fillcolor="#37c1cc", style=filled]; } grpccb [ label="Pluggable\nTermination Callback\n(per request type)", fillcolor="#37c1cc", style=filled, ] } subgraph cluster_grpcservicer { label="GRPC Servicer\n(Running within 
ThreadPool)\n`gRPC_Executor_n`"; labeljust="c"; fillcolor="#42edae"; style=filled; fontsize=14; grpc_servicer [label="def Execute:\l|def WaitExecute:\l|def ...:\l", fillcolor="#17e86a", style=filled]; } grpc_server -> grpc_servicer [ dir="forward", label="2. ThreadPool.submit()", ltail=cluster_grpcserver, lhead=cluster_grpcservicer ] grpc_servicer -> grpc_server [ dir="forward", label="3. Prepares response", lhead=cluster_grpcserver, ltail=cluster_grpcservicer ] grpc_server -> grpccb [ dir="forward", label="5. Calls Termination Callback\n(optional)", lhead=cluster_grpcserver, ] } subgraph cluster_metrics { label="Metrics Writer Process"; labeljust="c"; fillcolor="#ffda8e"; style=filled; thread [ label="Main Thread", fillcolor="#42edae", fontsize=14, style=filled, ] } clients -> grpc_server [ dir="forward", label="1.\ngrpc:Execute\lgrpc:WaitExecute\lgrpc:...\l", lhead=cluster_grpcserver, ltail = cluster_clients, ]; grpc_server -> clients[ dir="forward", label="4. Sends Response", ltail=cluster_grpcserver, lhead = cluster_clients, ]; # Invisible edges to improve the layout bgd_services -> data [style=invis]; asyncio_loop -> data_store [style=invis]; data -> jobwatcher [style=invis]; data -> pruner [style=invis]; data -> deferredwrites [style=invis]; }$

Concurrency Model

As the diagram above shows, a bgd server process uses several approaches to concurrency, and the purpose of each isn’t necessarily clear at first glance.

This section describes the concurrency model in more detail.

First, the two processes. The main bgd server process forks shortly after startup, with the parent being the actual gRPC server process, and the child being a separate process to handle publishing metrics.

Metrics Writer Process

This subprocess consumes LogRecord and MonitoringRecord messages from a multiprocessing.Queue. These messages are then processed and published to the configured monitoring output, e.g. a StatsD server.

Messages are sent to the queue in the main BuildGrid process, by the methods in buildgrid.server.metrics_utils.

This metrics writer lives in a subprocess because the required rate of metrics throughput when BuildGrid is under load wasn’t achievable using coroutines or threads in the same process as the gRPC handler threads.
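The producer/consumer split described above can be sketched with the standard library alone. Everything here is hypothetical: `drain_records`, `metrics_writer`, the `SENTINEL` shutdown marker, and the record shape are illustrative names, and "publishing" is reduced to printing rather than sending to a StatsD server.

```python
import multiprocessing

SENTINEL = None  # hypothetical marker telling the writer to shut down


def drain_records(record_queue, publish) -> int:
    """Consume records until the sentinel arrives; return how many were published."""
    count = 0
    while True:
        record = record_queue.get()  # blocks until a record is available
        if record is SENTINEL:
            return count
        publish(record)
        count += 1


def metrics_writer(record_queue) -> None:
    # Entry point of the subprocess; a real writer would publish to the
    # configured monitoring output instead of printing.
    drain_records(record_queue, print)


if __name__ == "__main__":
    record_queue = multiprocessing.Queue()
    writer = multiprocessing.Process(target=metrics_writer, args=(record_queue,))
    writer.start()
    # The main process only enqueues; the subprocess does the publishing work.
    record_queue.put({"name": "jobs-queued", "value": 3})
    record_queue.put(SENTINEL)
    writer.join()
```

Keeping the consumer in its own process means its CPU time is not shared with the request handler threads of the parent.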

BuildGrid Process

This is the main bgd server process. It uses both asyncio and threading for concurrency, and runs a gRPC server to handle incoming requests to the configured BuildGrid services.

Coroutines

There are potentially three long-running coroutines in a BuildGrid server. Some other short-lived coroutines also run as implementation details of the Janus queue used for logging and for submitting metrics asynchronously to the Metrics Writer Process.

Log Writer Coroutine

This coroutine handles formatting and writing log messages to stdout.

In BuildGrid, the logging library gets configured to write logs into a Janus queue. A Janus queue is a simple wrapper around a queue.Queue with both synchronous and asynchronous get/put methods.

The Log Writer coroutine asynchronously reads messages from this Janus queue, and writes them to stdout. This approach is intended to ensure that writing log messages doesn’t become a bottleneck in a gRPC request handler.
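The pattern can be approximated with the standard library. This sketch does not use Janus: as a stand-in, it pairs a plain `queue.Queue` with `run_in_executor` so the coroutine can await the blocking `get()`. The `log_writer` and `make_logger` helpers, the `STOP` marker, and the log format are all assumptions made for illustration.

```python
import asyncio
import logging
import logging.handlers
import queue
import sys

STOP = object()  # hypothetical shutdown marker


async def log_writer(log_queue: queue.Queue, stream=sys.stdout) -> int:
    """Format and write queued log records; return the number written."""
    loop = asyncio.get_running_loop()
    formatter = logging.Formatter("%(levelname)s %(name)s: %(message)s")
    written = 0
    while True:
        # The blocking get() runs in a worker thread so the event loop stays free.
        record = await loop.run_in_executor(None, log_queue.get)
        if record is STOP:
            return written
        stream.write(formatter.format(record) + "\n")
        written += 1


def make_logger(log_queue: queue.Queue) -> logging.Logger:
    # Handler threads only enqueue records; they never touch stdout themselves.
    logger = logging.getLogger("sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers = [logging.handlers.QueueHandler(log_queue)]
    return logger


log_queue: queue.Queue = queue.Queue()
logger = make_logger(log_queue)
logger.info("handled request %s", "Execute")
log_queue.put(STOP)
count = asyncio.run(log_writer(log_queue))
```

The call to `logger.info` returns as soon as the record is queued, which is the property the request handlers rely on.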

State Monitoring Worker Coroutine

The State Monitoring Worker periodically inspects any Execution or Bots services and related Schedulers, and generates metrics about their current state. These metrics are then sent to the multiprocessing.Queue used by the Metrics Writer.

If the server doesn’t have any Execution or Bots services, and doesn’t have any Schedulers configured either, then this coroutine will currently still exist but won’t actually do anything.

BotSession Reaper Coroutine

Note

This is only present when the server contains a Bots service.

The BotSession Reaper is part of the Bots service, and keeps track of when specific workers were last seen by BuildGrid. If a worker fails to reconnect by the agreed deadline, then this coroutine handles killing the relevant BotSession and requeueing the job that was being executed by the missing worker (if any).
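As a rough sketch of the deadline bookkeeping, consider the following. `SessionTracker`, `worker_seen`, `reap_expired`, and `reaper` are hypothetical names chosen for this example; the actual Bots service keeps more state and talks to the Scheduler when requeueing.

```python
import asyncio
import time
from typing import Dict, List, Optional


class SessionTracker:
    """Hypothetical record of worker deadlines and the jobs assigned to them."""

    def __init__(self) -> None:
        self.deadlines: Dict[str, float] = {}
        self.assigned_jobs: Dict[str, str] = {}
        self.requeued: List[str] = []

    def worker_seen(self, bot: str, timeout: float, job: Optional[str] = None) -> None:
        # Each contact from a worker pushes its deadline into the future.
        self.deadlines[bot] = time.monotonic() + timeout
        if job is not None:
            self.assigned_jobs[bot] = job

    def reap_expired(self) -> List[str]:
        now = time.monotonic()
        expired = [bot for bot, deadline in self.deadlines.items() if deadline <= now]
        for bot in expired:
            del self.deadlines[bot]  # drop the session
            job = self.assigned_jobs.pop(bot, None)
            if job is not None:
                self.requeued.append(job)  # hand the job back for rescheduling
        return expired


async def reaper(tracker: SessionTracker, interval: float, rounds: int) -> None:
    # A long-running reaper would loop forever; `rounds` keeps the sketch finite.
    for _ in range(rounds):
        await asyncio.sleep(interval)
        tracker.reap_expired()


tracker = SessionTracker()
tracker.worker_seen("bot-a", timeout=0.01, job="job-1")
tracker.worker_seen("bot-b", timeout=60.0)
asyncio.run(reaper(tracker, interval=0.05, rounds=1))
print(tracker.requeued)
```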

Threads

There are a significant number of threads in use in a bgd server process. The majority of these live inside the gRPC thread pool used to handle gRPC requests concurrently, but there are a few threads that we create for both background tasks and optimizing access to resources.

Main Thread

This is the main execution thread. It contains the asyncio event loop which runs the coroutines described above. It also starts the gRPC server, and handles stopping the gRPC server cleanly.

gRPC Thread

This is a daemon thread created by the gRPC server when Server.start is called. This thread handles the actual running of the gRPC server, such as receiving incoming requests and routing them to handler methods, and running any callbacks when connections are closed.

gRPC Executor Threads

These are the threads in a ThreadPoolExecutor used by the gRPC server to handle incoming requests. The number of threads is configurable using the thread-pool-size configuration key.

When the gRPC server gets a new request, it locates the correct handler method in the servicers that are registered with the server, and then runs that handler in a thread from this pool.
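The lookup-then-submit step can be sketched without gRPC itself. The `dispatch` function and `ExecutionServicer` class below are hypothetical stand-ins for what the gRPC library does internally, and the pool size of 4 is an arbitrary choice for the example.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class ExecutionServicer:
    """Hypothetical servicer with one handler method."""

    def Execute(self, request: str) -> str:
        return f"{request} handled on {threading.current_thread().name}"


def dispatch(pool: ThreadPoolExecutor, servicers: dict, service: str,
             method: str, request: str) -> Future:
    # Locate the handler on the registered servicer, then run it in the pool.
    handler = getattr(servicers[service], method)
    return pool.submit(handler, request)


pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="gRPC_Executor")
servicers = {"Execution": ExecutionServicer()}
future = dispatch(pool, servicers, "Execution", "Execute", "request-1")
result = future.result()
pool.shutdown()
print(result)
```

Since each in-flight request occupies one pool thread until its handler returns, the pool size bounds how many requests are served concurrently.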

Job Watcher Thread

Note

This is only present when the server contains a Scheduler.

This thread is part of the data store implementations used by the Scheduler.

BuildGrid needs to keep track of all the jobs it is currently serving update streams for (via either Execute or WaitExecution requests), and detect changes in the state of those jobs so that the update messages can be sent.

Rather than having each handler thread monitor the database (or in-memory job state) independently, the handlers register their job with the Job Watcher thread, and wait for notification of updates.

The Job Watcher thread watches for updates in the data store (e.g. by using PostgreSQL’s LISTEN if available in the SQL implementation), and when it finds an update to a job that is being watched, it notifies the relevant handler threads of the change.
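The register-and-wait arrangement might look roughly like this. The `JobWatcher` class and `watch` loop are hypothetical, and a `queue.Queue` of job names stands in for change notifications from the data store.

```python
import queue
import threading
from collections import defaultdict


class JobWatcher:
    """Hypothetical registry of handlers waiting for a job to change."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events = defaultdict(list)  # job name -> events of waiting handlers

    def register(self, job_name: str) -> threading.Event:
        event = threading.Event()
        with self._lock:
            self._events[job_name].append(event)
        return event

    def notify(self, job_name: str) -> int:
        with self._lock:
            events = self._events.pop(job_name, [])
        for event in events:
            event.set()  # wake the handler thread waiting on this job
        return len(events)


def watch(watcher: JobWatcher, changes: queue.Queue) -> None:
    # Stand-in for watching the data store: block until a change arrives.
    while True:
        job_name = changes.get()
        if job_name is None:  # hypothetical shutdown marker
            return
        watcher.notify(job_name)


changes: queue.Queue = queue.Queue()
watcher = JobWatcher()
thread = threading.Thread(target=watch, args=(watcher, changes), daemon=True)
thread.start()

event = watcher.register("job-1")  # done by a request handler
changes.put("job-1")               # the data store reports an update
notified = event.wait(timeout=5)   # the handler wakes up
changes.put(None)
thread.join()
```

Only the watcher thread observes the data store, so the number of watchers on the database does not grow with the number of open update streams.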

Scheduler Pruner Thread

Note

This is only present when the server contains an SQL Scheduler with pruning enabled in the configuration.

This thread is part of the SQL Scheduler data store implementation.

If pruning is enabled, then this thread is started to manage pruning the jobs table in the database, which would otherwise grow very large without some external cleanup mechanism.
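A periodic pruning thread can be sketched as follows. The table layout, the `finished_at` column, and the rule of deleting finished jobs older than a maximum age are assumptions made for this example, and SQLite stands in for the real database.

```python
import sqlite3
import threading
import time


def prune_jobs(conn: sqlite3.Connection, max_age_seconds: float) -> int:
    """Delete finished jobs older than max_age_seconds; return rows removed."""
    cutoff = time.time() - max_age_seconds
    with conn:  # commits on success
        cursor = conn.execute(
            "DELETE FROM jobs WHERE finished_at IS NOT NULL AND finished_at < ?",
            (cutoff,),
        )
    return cursor.rowcount


def pruner_loop(conn, stop: threading.Event, interval: float, max_age: float) -> None:
    while True:
        prune_jobs(conn, max_age)
        if stop.wait(interval):  # sleep, but wake immediately on shutdown
            return


# check_same_thread=False lets the pruner thread use this connection; the
# main thread does not touch it while the pruner is running.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE jobs (name TEXT PRIMARY KEY, finished_at REAL)")
now = time.time()
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [("old", now - 3600), ("recent", now), ("running", None)],
)
conn.commit()

stop = threading.Event()
pruner = threading.Thread(target=pruner_loop, args=(conn, stop, 0.01, 60.0), daemon=True)
pruner.start()
stop.set()
pruner.join()
remaining = [row[0] for row in conn.execute("SELECT name FROM jobs ORDER BY name")]
print(remaining)
```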

Deferred CAS Writer Threads

Note

These are only present when the server contains a !with-cache storage configured with deferred writes enabled.

These threads belong to another ThreadPoolExecutor, this time used by the implementation of the !with-cache storage backend to defer the write of a blob to the configured fallback storage. This allows writes to return early, after just the write to the cache layer, which is likely faster than the fallback.

The number of threads in this pool is configurable using the fallback-writer-threads configuration key in the !with-cache config dictionary.
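The deferred-write idea can be illustrated with two dictionaries standing in for the cache and fallback storages. `CachedStorage` and its methods are hypothetical names for this sketch, not the !with-cache implementation.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class CachedStorage:
    """Hypothetical cache-plus-fallback storage with deferred fallback writes."""

    def __init__(self, cache: dict, fallback: dict, writer_threads: int) -> None:
        self.cache = cache
        self.fallback = fallback
        self._pool = ThreadPoolExecutor(max_workers=writer_threads)

    def write(self, digest: str, blob: bytes) -> Future:
        # The cache write happens synchronously, so the caller can return early.
        self.cache[digest] = blob
        # The slower fallback write is handed to the writer pool.
        return self._pool.submit(self.fallback.__setitem__, digest, blob)

    def close(self) -> None:
        # Wait for any pending fallback writes before shutting down.
        self._pool.shutdown(wait=True)


storage = CachedStorage(cache={}, fallback={}, writer_threads=2)
pending = storage.write("abc/3", b"abc")
in_cache_immediately = "abc/3" in storage.cache
storage.close()
print(in_cache_immediately, storage.fallback)
```

The trade-off in a design like this is that a blob acknowledged to the client may exist only in the cache layer for a short window before the fallback write completes.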