Add NFS cache support for CI pipeline

Summary

This MR adds NFS cache support for cross-builder CI execution and pulls in all cache-manager bug fixes from HAF MR !722.

Stability Verification

3 consecutive passing pipelines achieved.

Key Changes

Cache Manager Refactoring

  • Centralized NFS operations: All NFS interactions now go through cache-manager.sh from HAF submodule
  • Removed manual fallback logic: Eliminated ~70-130 lines of duplicate tar extraction/permission code per repo
  • Generic path support: HAF's copy_datadir.sh now supports all cache types via regex pattern matching
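
To illustrate the idea of recognizing any cache type from a path with a single pattern, here is a minimal sketch. The path layout `<root>/<cache_type>/<cache_key>`, the function name `parse_cache_path`, and the regex itself are assumptions for illustration; the actual pattern used in copy_datadir.sh is not shown in this description and may differ.

```shell
# Hypothetical helper, not the actual copy_datadir.sh code.
# Assumed layout: <root>/<cache_type>/<cache_key>
# Prints "<cache_type> <cache_key>" and returns 0 on a match, returns 1 otherwise.
parse_cache_path() {
  # sed prints the rewritten line only when the pattern matches;
  # grep . turns "no output" into a non-zero exit status.
  printf '%s\n' "$1" |
    sed -nE 's#^.*/([A-Za-z0-9_]+)/([A-Za-z0-9_.-]+)/?$#\1 \2#p' |
    grep .
}
```

Because the cache type is captured rather than enumerated, a new type such as haf_pipeline would need no extra branch in this sketch.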

Critical Bug Fixes (from HAF develop)

  1. Tablespace symlink conversion disabled: PostgreSQL requires absolute paths, and converting them to relative paths broke setup scripts
  2. haf_pipeline cache type support: Added to permission relaxation and tar exclusion conditions
  3. NFS directory permissions: chmod 777 after mkdir to prevent permission-denied lock failures (UID mapping issues)
  4. Local cache permissions: chmod 777 for builder /cache/ directories (same umask issue)
  5. hivemind_sync permission fix: Proper permission handling for hivemind sync operations
  6. copy_datadir.sh tar fallback fix: Tar extraction fallback for service containers
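
Fixes 3 and 4 follow the same pattern: create the directory, then set its mode explicitly. A minimal sketch, assuming a hypothetical helper name `ensure_shared_dir` (the real logic lives in cache-manager.sh and may be structured differently):

```shell
# Hypothetical helper, not the actual cache-manager.sh code.
# Creates a cache directory and opens its permissions so that jobs running
# under different UIDs can create lock files inside it.
ensure_shared_dir() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  # mkdir applies the caller's umask, so the mode is set explicitly afterwards.
  # Only the leaf directory is changed; parents keep whatever mode they got.
  chmod 777 "$dir"
}
```

For example, `ensure_shared_dir /cache/haf_pipeline` would leave the leaf directory world-writable even when the job runs with a restrictive umask (the path here is illustrative).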

Architecture

  • Service containers: Use HAF's copy_datadir.sh with NFS tar fallback (local → wait/retry → NFS)
  • CI jobs: Use cache-manager.sh get/put for all cache operations
  • Wait-retry logic: 30s default wait before NFS fallback allows local cache sharing between jobs on same builder
  • Tar format: All NFS cache data is stored as tar archives for atomic operations and faster NFS writes
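
The lookup order and the tar-based storage described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in cache-manager.sh or copy_datadir.sh: the function names `fetch_cache` and `store_cache`, their arguments, the one-second polling interval, and the write-to-temp-then-rename step are all hypothetical.

```shell
# Hypothetical sketch of "local -> wait/retry -> NFS" retrieval.
# Usage: fetch_cache <local_dir> <nfs_tar> <dest_dir> [wait_seconds]
fetch_cache() {
  local local_dir="$1" nfs_tar="$2" dest="$3" wait_s="${4:-30}"
  local waited=0

  # 1. Poll for a local cache; another job on the same builder may still be
  #    producing it, so wait up to wait_s seconds before giving up.
  while :; do
    if [ -d "$local_dir" ]; then
      mkdir -p "$dest" && cp -a "$local_dir/." "$dest/"
      return
    fi
    [ "$waited" -ge "$wait_s" ] && break
    sleep 1
    waited=$((waited + 1))
  done

  # 2. Fall back to the tar archive on NFS.
  if [ -f "$nfs_tar" ]; then
    mkdir -p "$dest" && tar -xf "$nfs_tar" -C "$dest"
    return
  fi

  echo "cache miss: neither $local_dir nor $nfs_tar is available" >&2
  return 1
}

# Hypothetical sketch of publishing a cache to NFS as a single tar file.
# The archive is written under a temporary name and renamed into place, so
# readers see either the previous archive or the complete new one.
# Usage: store_cache <src_dir> <nfs_tar>
store_cache() {
  local src="$1" nfs_tar="$2"
  local tmp="${nfs_tar}.tmp.$$"
  mkdir -p "$(dirname "$nfs_tar")" || return 1
  if tar -cf "$tmp" -C "$src" .; then
    mv -f "$tmp" "$nfs_tar"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Writing one archive instead of many small files keeps the number of NFS operations low, which is the likely reason for the "faster NFS writes" noted above.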

Technical Details

HAF Submodule

  • Now pointing to HAF develop (9611e8909) with all cache-manager fixes
  • Contains all cache-manager work: initial implementation + permission/tablespace fixes
  • HAF_COMMIT variable synchronized with submodule and include ref

CI Configuration

  • Validation job ensures HAF_COMMIT, include ref, and submodule are all in sync
  • Service containers properly extract NFS tar archives for sync data
  • Test jobs use NFS path for DATA_SOURCE when local cache unavailable
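
The sync check performed by the validation job can be pictured as a three-way comparison. The sketch below is an assumption about its shape, not the job's actual script: the function name `check_haf_refs` is hypothetical, and it assumes all three values are given in the same form (for example, full commit SHAs).

```shell
# Hypothetical sketch of the three-way sync check.
# Usage: check_haf_refs <HAF_COMMIT value> <include ref> <submodule commit>
# Returns 0 when all three are identical; otherwise reports them and returns 1.
check_haf_refs() {
  local var_commit="$1" include_ref="$2" submodule_commit="$3"
  if [ "$var_commit" = "$include_ref" ] && [ "$var_commit" = "$submodule_commit" ]; then
    echo "HAF refs in sync: $var_commit"
    return 0
  fi
  {
    echo "HAF refs out of sync:"
    echo "  HAF_COMMIT:  $var_commit"
    echo "  include ref: $include_ref"
    echo "  submodule:   $submodule_commit"
  } >&2
  return 1
}
```

In a job, the submodule commit recorded in the superproject could be read with `git rev-parse HEAD:haf`, assuming the submodule is checked out at the path `haf`; how the include ref is extracted from the CI configuration is left open here.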

Testing

All test patterns verified:

  • Sync job with NFS cache storage
  • Test jobs with NFS cache retrieval
  • Cross-builder execution (jobs run on different builders than sync)
  • Service container tar extraction

Edited by Dan Notestein
