# benchmark-results-collector

This repository contains a script that collects information from benchmark logs.

The pipeline is triggered by other repositories that generate data artifacts.


Available modes:

| Mode | Project  | `SOURCE` variable value | Supported logs                                     |
|------|----------|-------------------------|----------------------------------------------------|
| 1    | hivemind | hivemind                | request_process_times.log <br> hivemind-server.log |
| 2    | hivemind | hivemind                | hivemind-sync.log                                  |
| 3    | hived    | hived                   | replay_benchmark.json                              |

All `main.py` launch parameters and `benchmark_description` table columns:

| No. | Param                              | Type                             | Required                 | Description                                                                        | Table column name                 | Example                                               |
|-----|------------------------------------|----------------------------------|--------------------------|------------------------------------------------------------------------------------|-----------------------------------|-------------------------------------------------------|
| 1.  | -m <br> --mode                     | int (choice)                     | ✔️                       | 1 - SERVER_LOG / REQUEST_PROCESS_TIMES <br> 2 - SYNC_LOG <br> 3 - REPLAY_BENCHMARK | `N/A`                             | -m 1                                                  |
| 2.  | -j <br> --job-id                   | int                              | ✔️                       | Benchmark id <br> ex. CI_JOB_ID that produced artifacts.                           | id                                | -j 1234                                               |
| 3.  | -f <br> --file                     | string                           | ✔️                       | Source .log / .json file path.                                                     | `N/A`                             | -f hive-server.log                                    |
| 4.  | -db <br> --database-url            | string                           | ✔️                       | RFC 1738 encoded URL to the database.                                              | `N/A`                             | -db postgresql://postgres:pass@localhost:5432/results |
| 5.  | --desc                             | string                           | ❌ <br> empty str default | Benchmark description.                                                             | description                       | --desc "Hivemind CI spawn"                            |
| 6.  | --exec-env-desc                    | string                           | ❌ <br> empty str default | Execution environment description. ex. branch=$CI_COMMIT_REF_SLUG                  | execution_environment_description | --exec-env-desc branch=master                         |
| 7.  | --server-name                      | string                           | ❌ <br> empty str default | ex. $CI_RUNNER_DESCRIPTION                                                         | server_name                       | --server-name localhost                               |
| 8.  | --app-version                      | string                           | ❌ <br> empty str default | ex. $(git describe --tags)                                                         | app_version                       | --app-version 0.25.4                                  |
| 9.  | --testsuite-version                | string                           | ❌ <br> empty str default | ex. commit_short_sha=$CI_COMMIT_SHORT_SHA                                          | testsuite_version                 | --testsuite-version commit_short_sha=8b8eabe1         |
| 10. | `N/A` <br> collected automatically | timestamp <br> without time zone | `N/A`                    | Benchmark results collector run timestamp.                                         | timestamp                         | `N/A`                                                 |
| 11. | `N/A` <br> collected automatically | string                           | `N/A`                    | Hostname of the machine where the Python interpreter is currently executing.       | runner                            | `N/A`                                                 |
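
As an illustration of the table above, the launch parameters could be declared with `argparse` roughly as follows. This is a minimal sketch written from the table alone, not the actual contents of `main.py`; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the parameter table; not the real main.py."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Required parameters (rows 1-4 of the table).
    parser.add_argument("-m", "--mode", type=int, choices=(1, 2, 3), required=True)
    parser.add_argument("-j", "--job-id", type=int, required=True)
    parser.add_argument("-f", "--file", required=True)
    parser.add_argument("-db", "--database-url", required=True)
    # Optional descriptive parameters default to an empty string (rows 5-9).
    for flag in ("--desc", "--exec-env-desc", "--server-name",
                 "--app-version", "--testsuite-version"):
        parser.add_argument(flag, default="")
    return parser


# Parse the example values from the table instead of the real command line.
args = build_parser().parse_args([
    "-m", "1",
    "-j", "1234",
    "-f", "hive-server.log",
    "-db", "postgresql://postgres:pass@localhost:5432/results",
    "--desc", "Hivemind CI spawn",
])
print(args.mode, args.job_id, args.desc)
```

Rows 10 and 11 have no flag because, according to the table, those values are collected automatically at run time.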

## How to launch manually

To run the script manually, provide all required parameters.
> :warning: In CI the benchmark id is taken from `$CI_JOB_ID`. <br>
> The id passed with `-j` is a unique key, so a manually chosen value must never be repeated by a CI job (it is best
> to use a value up to 200k, because `$CI_JOB_ID` is currently at 240k).

Steps:

- `git clone git@gitlab.syncad.com:hive/benchmark-results-collector.git`
- `cd benchmark-results-collector`
- `pip install -e .`
- `python3 benchmark_results_collector/main.py --help`
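
Once the steps above are done, a full invocation with all required parameters might be assembled as in the sketch below. Every value is a placeholder copied from the examples in the parameter table; replace the log path, job id and database URL with real ones.

```python
import shlex

# Placeholder values taken from the "Example" column of the parameter table.
command = [
    "python3", "benchmark_results_collector/main.py",
    "--mode", "1",
    "--job-id", "1234",
    "--file", "hive-server.log",
    "--database-url", "postgresql://postgres:pass@localhost:5432/results",
    "--desc", "Hivemind CI spawn",
    "--exec-env-desc", "branch=master",
]

# Print a shell-quoted command line that can be pasted into a terminal.
# To execute it from Python instead, pass the list to subprocess.run(command, check=True).
print(shlex.join(command))
```

Building the command as a list keeps arguments that contain spaces, such as the description, as single values.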

## How to trigger manually by providing the URL to the zipped artifacts

The collector can also be run semi-automatically, without cloning or building the entire project.
All you need is a URL under which the data artifacts and the settings saved in `variables.env` are available.

Example usage:

1. Enter the descriptive variables, in the format shown below, into a file named `variables.env`.
   > :warning: Pay attention to `JOB_ID`: it must be unique and must not already appear in the database.

   ```bash
   SOURCE=hivemind
   JOB_ID=111
   DESC=test
   EXEC_ENV_DESC=branch=test
   SERVER_NAME=.env test server
   APP_VERSION=1.0
   TESTSUITE_VERSION=commit_short_sha=test
   ```

2. Pack the previously created file together with the artifacts and share the archive. You need a direct download link to it.
3. Run the pipeline in this repository as shown in the image below:
   ![Run pipeline with URL variable](./screenshots/launch-with-given-url.png)
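
Before packing the archive, the `variables.env` file can be sanity-checked locally. The sketch below is a hypothetical helper, not part of this repository: it parses `KEY=VALUE` lines and checks that `JOB_ID` is an integer. The set of keys treated as mandatory is an assumption, and uniqueness of `JOB_ID` can only be verified against the database itself.

```python
# Assumption: these keys are treated as mandatory for the purpose of this sketch.
REQUIRED_KEYS = {"SOURCE", "JOB_ID"}

SAMPLE = """\
SOURCE=hivemind
JOB_ID=111
DESC=test
EXEC_ENV_DESC=branch=test
SERVER_NAME=.env test server
APP_VERSION=1.0
TESTSUITE_VERSION=commit_short_sha=test
"""


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, splitting only on the first '=' of each line."""
    variables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        variables[key.strip()] = value.strip()
    return variables


def validate(variables: dict) -> None:
    """Raise ValueError if a required key is missing or JOB_ID is not an integer."""
    missing = REQUIRED_KEYS - variables.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not variables["JOB_ID"].isdigit():
        raise ValueError("JOB_ID must be an integer")


parsed = parse_env(SAMPLE)
validate(parsed)
print(parsed["EXEC_ENV_DESC"])
```

Splitting on the first `=` only matters for values such as `EXEC_ENV_DESC=branch=test`, which themselves contain an equals sign.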

## How to trigger in CI/CD of upstream repository

To run the `benchmark results collector` automatically during the CI of another repository, you need to add a
trigger job to the upstream repository.

A trigger job should look like this:

```yaml
Trigger benchmark-results-collector:
  stage: benchmark
  needs: [ "1M replay and push" ]     # name of the job that produces the artifacts
  variables:
    ARTIFACTS_URL: https://gitlab.syncad.com/api/v4/projects/$CI_PROJECT_ID/jobs/$ARTIFACTS_JOB_ID/artifacts
    PRIVATE_TOKEN: $READ_API_PRIVATE_TOKEN
    #description:
    SOURCE: HIVED_CI
    JOB_ID: $ARTIFACTS_JOB_ID
    DESC: "Test CI spawn"
    EXEC_ENV_DESC: branch=$CI_COMMIT_REF_SLUG
    SERVER_NAME: $SERVER_NAME
    APP_VERSION: $APP_VERSION
    TESTSUITE_VERSION: commit_short_sha=$CI_COMMIT_SHORT_SHA
  trigger:
    project: hive/benchmark-results-collector
    branch: master
    strategy: depend
```

Note that some of the [predefined CI variables](https://docs.gitlab.com/ee/ci/variables/predefined_variables.html) cannot be
used in a trigger job, and a trigger job cannot have a `script` section.

To pass such values to the trigger job, use a `dotenv` artifact.

The snippet below, placed in the job that produces the artifacts, shows how to pass variables between jobs:

```yaml
  after_script:
    - echo "ARTIFACTS_JOB_ID=$CI_JOB_ID" >> variables.env
    - echo "APP_VERSION=$(git describe --tags)" >> variables.env
    - echo "SERVER_NAME=$CI_RUNNER_DESCRIPTION" >> variables.env
  artifacts:
    reports:
      dotenv: variables.env
```