Fix cache-manager cleanup race condition on the LRU index

Two fixes for the LRU index corruption issue:

1. Add locking around the LRU index modification in cmd_cleanup
   - cmd_cleanup modified the LRU index without holding the global lock
   - This raced with _update_lru and with concurrent cleanups
   - It now takes the lock using the same flock pattern as _update_lru
2. Add a minimum-age guard (5 minutes)
   - Prevents cleanup from deleting entries created less than 5 minutes ago
   - Fixes the case where the async _maybe_cleanup deleted the cache that
     had just been put, because it was the only entry in the index
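
A minimal Python sketch of the locked read-modify-write described in fix 1. It assumes the index is a JSON file mapping cache keys to last-use timestamps and that the global lock is a separate lock file. The helper names (locked_index, update_lru, remove_entries) and the file layout are hypothetical, not cache-manager's actual code. It is Unix-only because it uses fcntl.flock.

```python
import contextlib
import fcntl
import json
import os
import tempfile
import time


@contextlib.contextmanager
def locked_index(index_path, lock_path):
    """Hold an exclusive flock for the whole read-modify-write of the index.

    Hypothetical helper: the JSON layout and lock-file path are assumptions.
    """
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until no other holder
        try:
            try:
                with open(index_path) as f:
                    index = json.load(f)
            except FileNotFoundError:
                index = {}
            yield index  # caller mutates the dict while the lock is held
            # Write to a temp file in the same directory, then atomically
            # rename it over the index so readers never see a partial file.
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(index_path) or ".")
            with os.fdopen(fd, "w") as f:
                json.dump(index, f)
            os.replace(tmp, index_path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def update_lru(index_path, lock_path, key):
    """Record a put/access (the role _update_lru plays)."""
    with locked_index(index_path, lock_path) as index:
        index[key] = time.time()


def remove_entries(index_path, lock_path, keys):
    """Drop evicted keys under the same lock (the fix 1 path in cleanup)."""
    with locked_index(index_path, lock_path) as index:
        for key in keys:
            index.pop(key, None)
```

Because the put path and cleanup both go through the same exclusive lock, one writer can no longer overwrite the index with a stale copy that drops entries another writer just added.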

Root cause: concurrent modifications corrupted or emptied the LRU index,
leaving only 1 entry. When cleanup ran, it deleted the newest cache,
treating it as the oldest because it was the only tracked entry, while
600GB+ of untracked tar files remained on disk.
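
A sketch of how the 5-minute guard from fix 2 changes victim selection. It uses the same hypothetical index layout (key to last-use timestamp) plus an assumed key-to-size map; select_victims and its parameters are illustrative names, not cache-manager's API. With the guard, the scenario above (an index holding only the cache that was just put) selects nothing instead of deleting it.

```python
import time

MIN_AGE_SECONDS = 5 * 60  # the 5-minute floor from fix 2


def select_victims(index, bytes_to_free, sizes, now=None, min_age=MIN_AGE_SECONDS):
    """Pick least-recently-used keys to delete until bytes_to_free is reached.

    Hypothetical sketch: index maps key -> last-use timestamp (seconds),
    sizes maps key -> size in bytes. Entries younger than min_age are kept.
    """
    now = time.time() if now is None else now
    victims, freed = [], 0
    # Oldest first: sort by last-use timestamp ascending.
    for key, last_used in sorted(index.items(), key=lambda kv: kv[1]):
        if freed >= bytes_to_free:
            break
        if now - last_used < min_age:
            # Sorted ascending, so every remaining entry is at least as new;
            # this is where a just-put cache would otherwise be deleted.
            break
        victims.append(key)
        freed += sizes.get(key, 0)
    return victims
```

The guard only protects fresh entries; tar files that are missing from the index remain invisible to this selection.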