Add NFS cache support for CI pipeline
Summary
This MR adds NFS cache support for cross-builder CI execution and pulls in all cache-manager bug fixes from HAF MR !722.
Stability Verification
3 consecutive passing pipelines achieved.
Key Changes
Cache Manager Refactoring
- Centralized NFS operations: All NFS interactions now go through cache-manager.sh from HAF submodule
- Removed manual fallback logic: Eliminated ~70-130 lines of duplicate tar extraction/permission code per repo
- Generic path support: HAF's copy_datadir.sh now supports all cache types via regex pattern matching
Critical Bug Fixes (from HAF develop)
- Tablespace symlink conversion disabled: PostgreSQL requires absolute paths, and converting them to relative paths broke the setup scripts
- haf_pipeline cache type support: Added to permission relaxation and tar exclusion conditions
- NFS directory permissions: chmod 777 after mkdir to prevent permission-denied lock failures (UID mapping issues)
- Local cache permissions: chmod 777 for builder /cache/ directories (same umask issue)
- hivemind_sync permission fix: Proper permission handling for hivemind sync operations
- copy_datadir.sh tar fallback fix: Tar extraction fallback for service containers
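
The permission fixes above rely on the fact that mkdir honors the process umask while an explicit chmod does not. A minimal sketch of that pattern follows; the helper name ensure_shared_dir is hypothetical and is not taken from cache-manager.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating "chmod 777 after mkdir".
# This is a sketch, not the actual cache-manager.sh code.
ensure_shared_dir() {
  local dir=$1
  # mkdir -p applies the current umask, so with umask 022 the
  # directory would end up 755 and other UIDs could not create
  # lock files inside it.
  mkdir -p "$dir"
  # chmod sets the mode explicitly, regardless of umask.
  # Only the leaf directory is changed; parents keep their modes.
  chmod 777 "$dir"
}
```

Opening a shared directory to all users is a trade-off: it avoids UID-mapping failures between builders, at the cost of any access control inside the cache.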
Architecture
- Service containers: Use HAF's copy_datadir.sh with NFS tar fallback (local → wait/retry → NFS)
- CI jobs: Use cache-manager.sh get/put for all cache operations
- Wait-retry logic: 30s default wait before NFS fallback allows local cache sharing between jobs on same builder
- Tar format: All NFS cache stored as tar archives for atomic operations and faster NFS writes
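
The lookup order and tar-based storage described above can be sketched roughly as follows. The function names (fetch_cache, put_cache), their arguments, and the 5-second poll interval are assumptions made for illustration; the real logic lives in cache-manager.sh and copy_datadir.sh and may differ:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of local -> wait/retry -> NFS, not the real scripts.

# fetch_cache LOCAL_DIR NFS_TAR DEST [WAIT_SECS] [POLL_SECS]
# Prints which source was used: "local", "nfs", or "miss".
fetch_cache() {
  local local_dir=$1 nfs_tar=$2 dest=$3
  local wait_secs=${4:-30} poll_secs=${5:-5}
  local waited=0

  mkdir -p "$dest"
  while :; do
    # Naive check: a real implementation would also need to detect
    # a local cache that is still being written.
    if [ -d "$local_dir" ]; then
      cp -a "$local_dir/." "$dest/"
      echo "local"
      return 0
    fi
    # Stop waiting once the budget is used up.
    if [ "$waited" -ge "$wait_secs" ]; then
      break
    fi
    sleep "$poll_secs"
    waited=$((waited + poll_secs))
  done

  # Fall back to the tar archive on NFS.
  if [ -f "$nfs_tar" ]; then
    tar -xf "$nfs_tar" -C "$dest"
    echo "nfs"
    return 0
  fi
  echo "miss"
  return 1
}

# put_cache SRC_DIR NFS_TAR
# Writes the archive under a temporary name, then renames it, so
# readers see either the old archive or the complete new one.
put_cache() {
  local src_dir=$1 nfs_tar=$2 tmp
  mkdir -p "$(dirname "$nfs_tar")"
  tmp=$(mktemp "${nfs_tar}.tmp.XXXXXX")
  tar -cf "$tmp" -C "$src_dir" .
  chmod a+r "$tmp"   # mktemp creates the file with mode 600
  mv -f "$tmp" "$nfs_tar"
}
```

Writing one archive and renaming it within the same directory is one common way to get an all-or-nothing publish; whether cache-manager.sh does exactly this is not stated in this MR.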
Technical Details
HAF Submodule
- Now pointing to HAF develop (9611e8909) with all cache-manager fixes
- Contains all cache-manager work: initial implementation + permission/tablespace fixes
- HAF_COMMIT variable synchronized with submodule and include ref
CI Configuration
- Validation job ensures HAF_COMMIT, include ref, and submodule are all in sync
- Service containers properly extract NFS tar archives for sync data
- Test jobs use NFS path for DATA_SOURCE when local cache unavailable
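
As an illustration of the sync check mentioned above, the comparison at its core could look like the sketch below. The function name check_haf_sync is hypothetical, and how the validation job actually obtains the three values is not described in this MR; the commented usage assumes the submodule is checked out at a path named haf:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the sync validation, not the actual CI job.

# check_haf_sync HAF_COMMIT INCLUDE_REF SUBMODULE_SHA
# Succeeds only when all three values are identical.
check_haf_sync() {
  local haf_commit=$1 include_ref=$2 submodule_sha=$3
  if [ "$haf_commit" = "$include_ref" ] && [ "$haf_commit" = "$submodule_sha" ]; then
    echo "in sync: $haf_commit"
    return 0
  fi
  echo "out of sync: HAF_COMMIT=$haf_commit include_ref=$include_ref submodule=$submodule_sha" >&2
  return 1
}

# Example usage (paths and variables are assumptions):
#   submodule_sha=$(git -C haf rev-parse HEAD)
#   check_haf_sync "$HAF_COMMIT" "$INCLUDE_REF" "$submodule_sha"
```

The sketch compares the values exactly, so a short hash would not match a full one; both sides would need to be normalized to the same length first.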
Testing
All test patterns verified:
- Sync job with NFS cache storage
- Test jobs with NFS cache retrieval
- Cross-builder execution (jobs run on different builders than sync)
- Service container tar extraction
Edited by Dan Notestein