fix: add retry logic to PostgreSQL check in extract-test-cache.sh

Summary

  • Add retry loop to the PostgreSQL check before cache extraction
  • Default 15s timeout (configurable via POSTGRES_CHECK_TIMEOUT)
  • Prevents race condition when HAF restarts PostgreSQL during initialization

Problem

HAF service containers restart PostgreSQL during initialization (after applying CI config or installing extensions). The single-shot pg_isready check could fail if it ran during this ~3 second restart window, causing the script to:

  1. Conclude that PostgreSQL was not running
  2. Delete the cache data directory
  3. Re-extract the cache, which ended up corrupted because PostgreSQL was still using the data directory

This caused intermittent failures in hivemind's e2e_benchmark_on_postgrest job.

Solution

Replace the single pg_isready check with a retry loop that waits up to 15 seconds before deciding PostgreSQL isn't running. This gives HAF time to complete its restart cycle.
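
For reviewers, the retry logic can be sketched roughly as below. This is a minimal illustration rather than the exact code in extract-test-cache.sh: the helper name wait_for_check, the one-second polling interval, and the host/port placeholders in the example call are assumptions. Only POSTGRES_CHECK_TIMEOUT and its 15-second default come from this change.

```shell
#!/bin/bash
# Hypothetical helper (name is illustrative): run the given check command
# once per second until it succeeds or the timeout in seconds has elapsed.
# Returns 0 as soon as the check passes, 1 if it never passes in time.
wait_for_check() {
    local timeout="$1"
    shift
    local elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example call (host and port are placeholders, not taken from the script):
#   if wait_for_check "${POSTGRES_CHECK_TIMEOUT:-15}" pg_isready -h "$host" -p "$port"; then
#       echo "PostgreSQL is running; leaving the data directory untouched"
#   fi
```

Passing the check command as arguments keeps the loop independent of pg_isready, so it can be exercised with a stub command without a running database.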

Test plan

  • Verify script syntax with shellcheck
  • Test with hivemind pipeline that previously failed
