Feature #78
Dynamic client-side file deletion
| Status: | Closed | Start: | 07/12/2010 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assigned to: | jbk | % Done: | 100% |
| Category: | Core | | |
| Target version: | - | Estimated time: | 8.00 hours |
Description
The BOINC clients seem to re-download the sticky files and only delete them when directly asked to do so.
Add support in the scheduler to delete files that are known not to have been in use for a while.
This will replace the current hardcoded mechanism.
Related issues
| related to Renderfarming.net - BURP - Bug #80: Per-part input files do not make any sense | Closed | 08/07/2010 |
Associated revisions
Added downstream patch from Olivier Romand from Renderfarm.fi related to deleting files that are no longer used on the clients, related to #78
Change to use the queue_active_files table in the scheduler, related to #78
Use the correct field name when marking files active, related to #78
Added the ActiveFileQueueHandler which tracks live files and removes stale ones from the list of active files so that they can be deleted on the clients, fixes #78
SQL fixes related to #78
History
Updated by jbk about 1 year ago
- File delete_file.patch.tar added
Patch received from Olivier Romand on Renderfarm.fi
Updated by jbk about 1 year ago
Tested on BURP-main, seems to work great - I'll just quote myself for reference here:
A good start!
Actually the point about using a separate table for keeping track of active files is to keep files on the clients that are used frequently – rather than just those files which are in use right now. Obviously the currently active files are a subset of these files.
With the Sunflower release [snip] this will become pretty important since libraries will be able to be used across many sessions – even with some time between those sessions where the files are not actively used.
The way it will work:
1) When a session is accepted, all its input files are added to the list of active files (if not already there) with a unix timestamp set to NOW. This table is the (currently non-existent) queue_active_files and has a file id and a timestamp. It may need additional info to provide a quick-lookup index for the scheduler.
2) A (handler) daemon checks every hour for files in that table which are older than NOW-MAX_AGE and removes them. These are files which have not been referenced by any session in MAX_AGE seconds and can be safely removed from the client storage. The age should be stored in the Configuration class as “storage.maxAge”.
3) The scheduler, upon a connection from a client, checks the table and requests the client to remove any files which are no longer in queue_active_files.
This should ensure that popular libraries stay on the clients while infrequently used libraries and input files get purged within MAX_AGE (which would typically be 2 months or so).
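For reference, the three steps above can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database, not the actual BURP handler or scheduler code. The column names (file_id, last_used), the function names and the 60-day value for MAX_AGE are assumptions made for the example. It also assumes that a file which is already in the table gets its timestamp refreshed when a new session uses it.

```python
import sqlite3
import time

# Assumed value for "storage.maxAge": roughly 2 months, in seconds.
MAX_AGE = 60 * 24 * 3600


def open_db():
    """Create an in-memory database with a hypothetical queue_active_files layout."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE queue_active_files ("
        " file_id INTEGER PRIMARY KEY,"
        " last_used INTEGER NOT NULL)"
    )
    return db


def touch_files(db, file_ids, now=None):
    """Step 1: a newly accepted session marks its input files as active."""
    now = int(time.time()) if now is None else now
    # INSERT OR REPLACE adds missing files and refreshes the timestamp of known ones
    # (the refresh is an assumption of this sketch).
    db.executemany(
        "INSERT OR REPLACE INTO queue_active_files (file_id, last_used) VALUES (?, ?)",
        [(fid, now) for fid in file_ids],
    )
    db.commit()


def purge_stale(db, max_age=MAX_AGE, now=None):
    """Step 2: the hourly handler drops entries older than now - max_age."""
    now = int(time.time()) if now is None else now
    cur = db.execute(
        "DELETE FROM queue_active_files WHERE last_used < ?", (now - max_age,)
    )
    db.commit()
    return cur.rowcount  # number of entries removed


def files_to_delete(db, client_file_ids):
    """Step 3: the scheduler works out which of a client's files are no longer active."""
    active = {row[0] for row in db.execute("SELECT file_id FROM queue_active_files")}
    return sorted(set(client_file_ids) - active)
```

In this sketch a session calls touch_files() when it is accepted, the handler calls purge_stale() once an hour, and the scheduler calls files_to_delete() with the file ids a connecting client reports, then asks the client to delete whatever comes back.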
(For now it would also make sense to remove the primary input file immediately once sessions complete, since they are not reused).
We tested the nasty case where a file would be marked as deleted even though some hosts are still computing: the file is requested to be deleted but won’t be until the end of the workunit.
Very nice, this is the kind of borderline test that is very valuable to know about!
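The borderline case tested above (a delete request arriving while a workunit still uses the file) can be pictured with a small toy model. This is not the BOINC client's real code or data structures. The class, its method names and the file name in the usage note are hypothetical, and the sketch only mirrors the observed behaviour that the deletion is postponed until the workunit ends.

```python
class ClientStorage:
    """Toy model of a client's file store (hypothetical, for illustration only)."""

    def __init__(self):
        self.files = set()           # files currently on disk
        self.in_use = {}             # file name -> number of running workunits using it
        self.pending_delete = set()  # files whose deletion was requested while in use

    def start_workunit(self, names):
        """A workunit starts and holds on to its input files."""
        for name in names:
            self.files.add(name)
            self.in_use[name] = self.in_use.get(name, 0) + 1

    def request_delete(self, name):
        """The server asks for deletion: remove now if idle, otherwise defer."""
        if self.in_use.get(name, 0) > 0:
            self.pending_delete.add(name)
        else:
            self.files.discard(name)

    def finish_workunit(self, names):
        """A workunit ends; deferred deletions are carried out once a file is idle."""
        for name in names:
            self.in_use[name] -= 1
            if self.in_use[name] == 0 and name in self.pending_delete:
                self.pending_delete.discard(name)
                self.files.discard(name)
```

With this model, calling start_workunit(["lib.blend"]) followed by request_delete("lib.blend") leaves the file on disk, and it only disappears after finish_workunit(["lib.blend"]).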
Updated by jbk about 1 year ago
- Assigned to set to jbk
- % Done changed from 0 to 30
Updated by jbk 12 months ago
- % Done changed from 30 to 80
Now only requires new sessions to poke the files they use, and to remove the files they KNOW that no one will use after they have completed.
Updated by jbk 11 months ago
This is related to #80 because the frame input file information is currently being made available during the preprocessing step but does not yet have per-frame files attached.