...
- hard to predict which scheduled task will be executed next (yes, we have the "Scheduled Tasks" grid, but due to the amount of information displayed there it takes time to figure that out)
- [system log, scheduled tasks] Logging ability for Scheduled Tasks - any output produced by the currently running scheduled task (e.g. what data is being processed, what errors have happened) is not logged anywhere
- all scheduled tasks are processed one after another, and if one of them crashes, then none of the remaining ones will be processed either
- scheduled task execution can't be parallelized, because each task gathers the data to process by itself and therefore would process the same data if executed in parallel
Solution
- Create "ScheduledTaskData" database table with the following structure:
  - Id
  - TaskClass - scheduled task class
  - TaskData - the set of data to be processed by the scheduled task class mentioned
  - PlannedStartedOn - when the scheduled task is planned to be executed
  - StartedOn - when execution of the scheduled task was started (start time)
  - FinishedOn - when the scheduled task finished executing (even in case of error)
  - CreatedOn - when the task record was created
  - RegularOutput - any output made during task handling
  - ErrorOutput - any errors that happened during task handling
  - Status - {1 - Not Started, 2 - Finished Successfully, 3 - Finished with Errors}
  - ProcessId - the PID of the process that is handling (or was handling in the past) this task
- create an AbstractScheduledTask base class with the following methods:
  - generate() - will collect data to be processed (e.g. DB record IDs or file paths on disk) and create a record in the ScheduledTaskData table
  - process($task_data) - will process the given task data
- sub-class AbstractScheduledTask for each current scheduled task
- scheduled task handling would look like this:
  - generate task data for all scheduled tasks
  - start processing the tasks, if any, in parallel to handle the incoming task pool (preferably in separate processes, not forks)
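
The proposal above can be sketched roughly as follows. This is only an illustration under several assumptions, not the actual implementation: it is written in Python with SQLite for brevity, timestamps are stored as ISO-8601 UTC strings, TaskData is stored as JSON, and the `collect()` hook, the `run_pending()` function and the `now()` helper are hypothetical names that do not come from the proposal. Parallel execution in separate processes is left out; the sketch only shows that each task run is recorded on its own and that a crash in one task does not stop the others.

```python
import json
import os
import sqlite3
import traceback
from abc import ABC, abstractmethod
from datetime import datetime, timezone

# Status values as listed in the proposal.
NOT_STARTED, FINISHED_OK, FINISHED_WITH_ERRORS = 1, 2, 3

# Column types are assumptions; only the column names come from the proposal.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ScheduledTaskData (
    Id               INTEGER PRIMARY KEY AUTOINCREMENT,
    TaskClass        TEXT NOT NULL,
    TaskData         TEXT NOT NULL,
    PlannedStartedOn TEXT,
    StartedOn        TEXT,
    FinishedOn       TEXT,
    CreatedOn        TEXT NOT NULL,
    RegularOutput    TEXT NOT NULL DEFAULT '',
    ErrorOutput      TEXT NOT NULL DEFAULT '',
    Status           INTEGER NOT NULL DEFAULT 1,
    ProcessId        INTEGER
)
"""


def now():
    """Current UTC time as an ISO-8601 string (hypothetical helper)."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


class AbstractScheduledTask(ABC):
    def __init__(self, db):
        self.db = db

    @abstractmethod
    def collect(self):
        """Hypothetical hook: return the items to process (IDs, file paths)."""

    def generate(self):
        """Collect data and store it as a new ScheduledTaskData record."""
        items = self.collect()
        if not items:
            return None  # nothing to process, so no record is created
        cur = self.db.execute(
            "INSERT INTO ScheduledTaskData"
            " (TaskClass, TaskData, PlannedStartedOn, CreatedOn)"
            " VALUES (?, ?, ?, ?)",
            (type(self).__name__, json.dumps(items), now(), now()),
        )
        return cur.lastrowid

    @abstractmethod
    def process(self, task_data):
        """Process the given task data; may return regular output as text."""


def run_pending(db, tasks):
    """Generate task data for all tasks, then handle every pending record."""
    by_class = {type(task).__name__: task for task in tasks}
    for task in tasks:
        task.generate()
    pending = db.execute(
        "SELECT Id, TaskClass, TaskData FROM ScheduledTaskData"
        " WHERE Status = ? ORDER BY PlannedStartedOn, Id",
        (NOT_STARTED,),
    ).fetchall()
    for task_id, task_class, task_data in pending:
        db.execute(
            "UPDATE ScheduledTaskData SET StartedOn = ?, ProcessId = ?"
            " WHERE Id = ?",
            (now(), os.getpid(), task_id),
        )
        output, errors, status = "", "", FINISHED_OK
        try:
            output = by_class[task_class].process(json.loads(task_data)) or ""
        except Exception:
            # A failure is recorded on this record only; the loop continues.
            errors, status = traceback.format_exc(), FINISHED_WITH_ERRORS
        db.execute(
            "UPDATE ScheduledTaskData SET FinishedOn = ?, RegularOutput = ?,"
            " ErrorOutput = ?, Status = ? WHERE Id = ?",
            (now(), output, errors, status, task_id),
        )
        db.commit()  # persist the outcome before moving to the next task
```

Since every record carries its own Status and ProcessId, the same loop could later be split across several worker processes, provided each worker claims a record before processing it; that claiming step is not shown here.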
...
- a single scheduled task now has all its execution attempts recorded and we need to be able to see them all, especially the ones that failed
- when there is no data to process, we won't be updating the "Last Run On" field of the scheduled task to indicate that it was at least executed but did nothing
- since we'll generate several task data records per scheduled task upfront (when we can), the "Next Run On" column becomes irrelevant as well
- past scheduled task runs without errors (e.g. more than 1 month old) can be deleted automatically as well
- if an error happens during scheduled task handling, then the main scheduled task grid should clearly indicate that, and maybe we need to send an e-mail to the admin as well
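
The automatic cleanup and the error indication mentioned in this list could be backed by queries like the ones below. This is a minimal sketch with assumptions: Python with SQLite is used for illustration, FinishedOn is assumed to hold ISO-8601 UTC strings (so string comparison matches time order), the function names are hypothetical, and the 30-day default simply mirrors the "more than 1 month old" example.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Status values as listed in the ScheduledTaskData description.
FINISHED_OK = 2
FINISHED_WITH_ERRORS = 3


def purge_old_successful_runs(db, now, max_age_days=30):
    """Delete successful runs that finished more than max_age_days ago.

    Failed runs are kept on purpose so they stay visible for review.
    Returns the number of deleted records.
    """
    cutoff = (now - timedelta(days=max_age_days)).isoformat(timespec="seconds")
    cur = db.execute(
        "DELETE FROM ScheduledTaskData WHERE Status = ? AND FinishedOn < ?",
        (FINISHED_OK, cutoff),
    )
    db.commit()
    return cur.rowcount


def failed_run_counts(db):
    """Return {TaskClass: number of failed runs}, e.g. to flag grid rows."""
    return dict(
        db.execute(
            "SELECT TaskClass, COUNT(*) FROM ScheduledTaskData"
            " WHERE Status = ? GROUP BY TaskClass",
            (FINISHED_WITH_ERRORS,),
        )
    )
```

A grid could mark every scheduled task whose class appears in the result of `failed_run_counts()`, and the same result could drive an optional e-mail to the admin.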