Commit 664ad131 authored by Dan Notestein

Fix cache-manager cleanup race condition with LRU index

Two fixes for the LRU index corruption issue:

1. Add locking to LRU index modification in cmd_cleanup
   - The cleanup was modifying the LRU index without holding the global lock
   - This caused race conditions with _update_lru and concurrent cleanups
   - Now uses the same flock pattern as _update_lru

2. Add minimum age protection (5 minutes)
   - Prevents cleanup from deleting entries that were just created
   - Fixes the issue where async _maybe_cleanup would delete the cache
     that was just put because it was the only entry in the index
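
The locking fix in item 1 can be sketched roughly as follows. This is a
minimal illustration in Python, not the cache-manager code itself: the
index format (one "timestamp key" pair per line), the lock file path, and
the names global_lock and remove_from_index are all assumptions made for
the example. Only the idea of taking the same exclusive flock before any
read-modify-write of the index comes from the commit message.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def global_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def _key_of(line):
    """Return the key from a hypothetical 'timestamp key' index line, or None."""
    parts = line.split(maxsplit=1)
    return parts[1] if len(parts) == 2 else None


def remove_from_index(index_path, lock_path, keys_to_remove):
    """Drop entries from the index while holding the lock; return the kept lines."""
    doomed = set(keys_to_remove)
    with global_lock(lock_path):
        try:
            with open(index_path) as f:
                lines = [ln.rstrip("\n") for ln in f if ln.strip()]
        except FileNotFoundError:
            lines = []
        kept = [ln for ln in lines if _key_of(ln) not in doomed]
        # Write to a temp file and rename it into place, so a reader never
        # sees a half-written index.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(index_path) or ".")
        with os.fdopen(fd, "w") as tmp:
            tmp.writelines(ln + "\n" for ln in kept)
        os.replace(tmp_path, index_path)
    return kept
```

The point of the sketch is that the read, the filter, and the write all
happen inside one lock scope. If the updater and the cleanup each took the
lock only around their own write, one could still overwrite the other's
changes with a stale copy of the index.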

Root cause: concurrent modifications corrupted or emptied the LRU
index, leaving only one entry. When cleanup ran, it deleted the
newest cache, treating it as the oldest because it was the only
tracked entry, while 600GB+ of untracked tar files remained.
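
To illustrate how the minimum age protection defuses this failure mode,
here is a small Python sketch of eviction selection. The function name
select_evictions, the (timestamp, key, size) tuple layout, and the use of
the index timestamp as the entry's age are assumptions for the example.
Only the 5-minute threshold is taken from the commit message.

```python
import time

MIN_AGE_SECONDS = 5 * 60  # the 5-minute protection window from the commit message


def select_evictions(entries, bytes_to_free, now=None, min_age=MIN_AGE_SECONDS):
    """Pick least recently used entries to delete, never touching young ones.

    entries: iterable of (timestamp, key, size_bytes) tuples (hypothetical layout).
    """
    now = time.time() if now is None else now
    selected, freed = [], 0
    for timestamp, key, size in sorted(entries):  # oldest first
        if freed >= bytes_to_free:
            break
        if now - timestamp < min_age:
            # Sorted oldest first, so every remaining entry is at least as new.
            break
        selected.append(key)
        freed += size
    return selected
```

With a corrupted index that holds only the cache that was just stored,
this selection returns an empty list instead of deleting that cache. It
does not reclaim the untracked files, which would need a separate repair
of the index.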
parent 806f23bf