Konrad Botor requested to merge kbotor/ci-rewrite-for-parallel-replay into develop Jan 23, 2024

CI rewrite for parallel replay

Merge request prerequisites

Requires common-ci-configuration!37 to be merged first.

Runner tag prerequisites

- Every replay-capable runner must be tagged with the tag defined by DATA_REPLAY_TAG (currently data-cache-storage).
- All replay-capable runners running on the same server must be tagged with a tag unique to that server (e.g. hive-builder-5 for the current runners).
- Every replay-capable runner must have at most 10 tags (an arbitrary limit, explained later on).

This way every cache pool is represented by a unique combination of tags.
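The three rules above can be checked mechanically. Below is a minimal Python sketch, not part of the actual CI setup. The `Runner` record and `check_prerequisites` are hypothetical. It also assumes that "a tag unique to that server" means a tag shared by every replay-capable runner on that server and carried by no replay-capable runner on any other server.

```python
from dataclasses import dataclass

DATA_REPLAY_TAG = "data-cache-storage"  # current value of DATA_REPLAY_TAG
MAX_TAGS = 10  # limit from the third rule


@dataclass(frozen=True)
class Runner:
    """Hypothetical description of one replay-capable runner."""
    name: str
    server: str
    tags: frozenset


def check_prerequisites(runners: list) -> list:
    """Return one message per violated rule; an empty list means all rules hold."""
    problems = []
    for r in runners:
        # Rule 1: every replay-capable runner carries the DATA_REPLAY_TAG tag.
        if DATA_REPLAY_TAG not in r.tags:
            problems.append(f"{r.name}: missing {DATA_REPLAY_TAG}")
        # Rule 3: no more than MAX_TAGS tags per runner.
        if len(r.tags) > MAX_TAGS:
            problems.append(f"{r.name}: {len(r.tags)} tags (max {MAX_TAGS})")
    # Rule 2: each server needs a tag shared by all of its runners
    # and absent from every runner on other servers.
    for server in sorted({r.server for r in runners}):
        local = [r.tags for r in runners if r.server == server]
        foreign = [r.tags for r in runners if r.server != server]
        shared = set(local[0]).intersection(*local[1:])
        unique = {t for t in shared if all(t not in f for f in foreign)}
        if not unique:
            problems.append(f"{server}: no tag unique to this server")
    return problems
```

With two runners on one server that carry only data-cache-storage, the check passes, because no other server competes for that tag. Adding a runner on a second server with the same single tag makes both servers fail rule 2, which is exactly the situation the following note warns about.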

Note: Currently, the two replay-capable runners do not have the server-specific tag, but since they run on the same server, the solution works anyway. However, the tag must be added before any runners on other servers are configured to run replay jobs.

How it works

1. Job determine-runner-tag, tagged with $DATA_REPLAY_TAG, starts on one of the replay-capable runners. GitLab's scheduling determines which one.
2. Job determine-runner-tag reads all the tags of the runner it is running on from the $CI_RUNNER_TAGS variable and saves each tag in a dotenv file as a separate variable prefixed with RUNNER_TAG_.
3. Trigger job main-pipeline-trigger reads the dotenv file and passes the first 10 RUNNER_TAG_ variables to the pipeline it triggers, after replacing the RUNNER_TAG_ prefix with DYNAMIC_RUNNER_TAG_.
4. All jobs in the main pipeline that are tagged with $DYNAMIC_RUNNER_TAG_0 to $DYNAMIC_RUNNER_TAG_9 pick up the variables passed by the trigger job and run on a runner with those tags. Since the tags uniquely identify a specific server/cache pool, all these jobs are guaranteed to run on the same server and thus have access to the same cache.
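The data flow of steps 2 to 4 can be simulated in a few lines of Python. This is a sketch, not the actual job scripts, and it rests on several assumptions. The function names are hypothetical. It accepts both a JSON-array and a comma-separated form of $CI_RUNNER_TAGS, because the exact format is treated as an assumption here. It also assumes that unused DYNAMIC_RUNNER_TAG_ slots contribute no tag. The only GitLab behavior it relies on is the documented tag-matching rule: a runner picks up a job only if it has every tag the job lists.

```python
import json

MAX_DYNAMIC_TAGS = 10  # number of tags the trigger job forwards


def parse_runner_tags(raw: str) -> list[str]:
    """Split $CI_RUNNER_TAGS into tag names.

    Assumption: accept both a JSON array and a comma-separated list,
    because the exact format may differ between GitLab versions.
    """
    raw = raw.strip()
    items = json.loads(raw) if raw.startswith("[") else raw.split(",")
    return [str(item).strip() for item in items if str(item).strip()]


def write_dotenv(raw_tags: str) -> str:
    """Step 2: determine-runner-tag stores each tag as RUNNER_TAG_<i>."""
    tags = parse_runner_tags(raw_tags)
    return "".join(f"RUNNER_TAG_{i}={tag}\n" for i, tag in enumerate(tags))


def read_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines (no quoting or escaping handled)."""
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)


def trigger_variables(dotenv: dict[str, str]) -> dict[str, str]:
    """Step 3: forward the first 10 RUNNER_TAG_<i> as DYNAMIC_RUNNER_TAG_<i>."""
    return {
        f"DYNAMIC_RUNNER_TAG_{i}": dotenv[f"RUNNER_TAG_{i}"]
        for i in range(MAX_DYNAMIC_TAGS)
        if f"RUNNER_TAG_{i}" in dotenv
    }


def can_pick_up(job_tags: set[str], runner_tags: set[str]) -> bool:
    """Step 4: a runner picks up a job only if it has all of the job's tags."""
    return job_tags <= runner_tags


if __name__ == "__main__":
    variables = trigger_variables(read_dotenv(write_dotenv("data-cache-storage,hive-builder-5")))
    print(variables)
    job_tags = set(variables.values())
    print(can_pick_up(job_tags, {"data-cache-storage", "hive-builder-5"}))  # True
    print(can_pick_up(job_tags, {"data-cache-storage", "hive-builder-6"}))  # False
```

Here hive-builder-6 stands for a hypothetical runner on a second server. Because the forwarded set includes the server-specific tag, only runners on the original server qualify.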

Note: Unfortunately, there seems to be no way to pass an arbitrary number of variables from the dotenv file to the child pipeline, so I decided to pass a maximum of 10 tags. This can easily be changed, but I do not foresee a need for more than 10 tags per replay-capable runner any time soon.
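The limit matters because of GitLab's tag-matching rule: a runner picks up a job only if it carries every tag the job lists. Consider a runner with more than 10 tags whose server-specific tag is not among the first 10 forwarded. The child pipeline's jobs would then carry only generic tags and could land on another server. The Python sketch below illustrates this. The tag names extra-0 to extra-8 and hive-builder-6 are hypothetical, and the order in which tags appear in $CI_RUNNER_TAGS is treated as arbitrary.

```python
MAX_DYNAMIC_TAGS = 10  # tags forwarded by main-pipeline-trigger


def forwarded_tags(runner_tags: list[str]) -> set[str]:
    """Tags that reach the child pipeline: only the first MAX_DYNAMIC_TAGS."""
    return set(runner_tags[:MAX_DYNAMIC_TAGS])


def eligible_servers(job_tags: set[str], servers: dict[str, set[str]]) -> list[str]:
    """Servers whose runners have every job tag (GitLab's tag-matching rule)."""
    return sorted(name for name, tags in servers.items() if job_tags <= tags)


generic = ["data-cache-storage"] + [f"extra-{i}" for i in range(9)]  # 10 shared tags

# 11 tags, with the server-specific tag reported last, so it is cut off.
too_many = generic + ["hive-builder-5"]
servers = {
    "hive-builder-5": set(too_many),
    "hive-builder-6": set(generic) | {"hive-builder-6"},
}
print(eligible_servers(forwarded_tags(too_many), servers))
# ['hive-builder-5', 'hive-builder-6']: jobs may split across cache pools

# Within the limit, the server-specific tag always survives.
compliant = ["data-cache-storage", "hive-builder-5"]
servers_ok = {
    "hive-builder-5": set(compliant),
    "hive-builder-6": {"data-cache-storage", "hive-builder-6"},
}
print(eligible_servers(forwarded_tags(compliant), servers_ok))
# ['hive-builder-5']
```

This is why the 10-tag limit appears as a runner prerequisite rather than only as an implementation detail of the trigger job.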

Note 2: The test results from the child pipeline are imported into the parent pipeline in a way loosely based on this MR: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/97588. Apparently, even GitLab's own developers need this feature, yet it does not exist. Any new job that generates a JUnit report must be added to the list of jobs to import test results from, defined in job dynamic-pipeline-test-results-collector.
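A collector job could locate the child pipeline's JUnit-producing jobs through the GitLab REST API. The parent pipeline's trigger jobs (bridges) expose the downstream pipeline, and that pipeline's job list provides the job IDs whose artifact archives can be downloaded. The following Python sketch covers only the selection logic on already-decoded API responses. It is not how dynamic-pipeline-test-results-collector is implemented, it leaves out HTTP requests and authentication, and REPORT_JOBS with its job names is hypothetical.

```python
from typing import Optional

# Hypothetical list of JUnit-producing jobs; the real list lives in
# the dynamic-pipeline-test-results-collector job definition.
REPORT_JOBS = ["unit-tests", "replay-tests"]


def child_pipeline_id(bridges: list[dict], trigger_job: str) -> Optional[int]:
    """Find the downstream pipeline started by trigger_job.

    `bridges` is assumed to be the decoded JSON of
    GET /projects/:id/pipelines/:pipeline_id/bridges.
    """
    for bridge in bridges:
        downstream = bridge.get("downstream_pipeline")
        if bridge.get("name") == trigger_job and downstream:
            return downstream["id"]
    return None


def artifact_archive_urls(api: str, project_id: int, jobs: list[dict],
                          report_jobs: list[str]) -> tuple[list[str], list[str]]:
    """Return (artifact archive URLs of listed jobs, listed names not found).

    `jobs` is assumed to be the decoded JSON of
    GET /projects/:id/pipelines/:child_pipeline_id/jobs.
    """
    by_name = {job["name"]: job for job in jobs}
    urls = [f"{api}/projects/{project_id}/jobs/{by_name[name]['id']}/artifacts"
            for name in report_jobs if name in by_name]
    missing = [name for name in report_jobs if name not in by_name]
    return urls, missing
```

A listed job that does not exist in the child pipeline shows up in `missing`, which makes a stale or misspelled entry easy to spot. A new JUnit-producing job that was never added to the list, however, is simply not collected.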

Warning: The "CI rewrite for parallel replay" merge requests in all projects must be merged before changing runner tags or adding new runners. Otherwise problems may arise, because the original configuration allows replay jobs to run on any runner tagged with data-cache-storage.

Edited Feb 28, 2024 by Konrad Botor