...

  • it is hard to predict which scheduled task will be executed next (yes, we have the "Scheduled Tasks" grid, but because of the amount of information displayed there it takes time to figure that out)
  • [system log, scheduled tasks] Logging ability for Scheduled Tasks - any output produced by the currently running scheduled task (e.g. what data is being processed, what errors have happened) is not logged anywhere
  • all scheduled tasks are processed one after another, so if one of them crashes, none of the remaining ones are processed
  • scheduled task execution can't be parallelized, because each task gathers the data to process by itself, so parallel runs of the same task would process the same data

Solution

  • create a "ScheduledTaskData" database table with the following structure:
    • Id
    • TaskClass - scheduled task class
    • TaskData - the set of data to be processed by the scheduled task class mentioned above
    • PlannedStartedOn - when the scheduled task is planned to be executed
    • StartedOn - when the scheduled task execution was started (start time)
    • FinishedOn - when the scheduled task finished executing (even in case of an error)
    • CreatedOn - when the task record was created
    • RegularOutput - any output produced during task handling
    • ErrorOutput - any errors that happened during task handling
    • Status - {1 - Not Started, 2 - Finished Successfully, 3 - Finished with Errors}
    • ProcessId - the PID of the process that is handling (or was handling in the past) this task
  • create an AbstractScheduledTask base class with the following methods:
    • generate() - will collect the data to be processed (e.g. DB record IDs or file paths on disk) and create a record in the ScheduledTaskData table
    • process($task_data) - will process the given task data
  • sub-class AbstractScheduledTask for each existing scheduled task
  • scheduled task handling would look like this:
    1. generate task data for all scheduled tasks
    2. if there are any tasks, start processing them in parallel (preferably in separate processes, not forks) to handle the incoming task pool
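
The proposal above can be sketched roughly as follows. This is only an illustration in Python with SQLite, not the actual implementation: the column types, the JSON encoding of TaskData, the collect() helper, the run_pending() function and the ResizeImages task are all assumptions made for the example. For brevity it handles records one after another in a single process; running several workers in parallel would additionally need a way to claim a record atomically, which is not shown here.

```python
import json
import os
import sqlite3
from abc import ABC, abstractmethod
from datetime import datetime, timezone

NOT_STARTED, FINISHED_OK, FINISHED_WITH_ERRORS = 1, 2, 3

# Column names follow the proposal; the column types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ScheduledTaskData (
    Id               INTEGER PRIMARY KEY AUTOINCREMENT,
    TaskClass        TEXT NOT NULL,
    TaskData         TEXT NOT NULL,   -- JSON-encoded in this sketch
    PlannedStartedOn TEXT NOT NULL,
    StartedOn        TEXT,
    FinishedOn       TEXT,
    CreatedOn        TEXT NOT NULL,
    RegularOutput    TEXT NOT NULL DEFAULT '',
    ErrorOutput      TEXT NOT NULL DEFAULT '',
    Status           INTEGER NOT NULL DEFAULT 1,
    ProcessId        INTEGER
)
"""


def now():
    return datetime.now(timezone.utc).isoformat()


class AbstractScheduledTask(ABC):
    def __init__(self, db):
        self.db = db

    @abstractmethod
    def collect(self):
        """Hypothetical helper: return a list of data sets to process."""

    @abstractmethod
    def process(self, task_data):
        """Process one data set; the return value is stored as RegularOutput."""

    def generate(self):
        # One ScheduledTaskData record per collected data set.
        for data_set in self.collect():
            self.db.execute(
                "INSERT INTO ScheduledTaskData"
                " (TaskClass, TaskData, PlannedStartedOn, CreatedOn)"
                " VALUES (?, ?, ?, ?)",
                (type(self).__name__, json.dumps(data_set), now(), now()),
            )
        self.db.commit()


def run_pending(db, registry):
    """Handle every 'Not Started' record; a failure affects only its own record."""
    rows = db.execute(
        "SELECT Id, TaskClass, TaskData FROM ScheduledTaskData"
        " WHERE Status = ? ORDER BY PlannedStartedOn, Id",
        (NOT_STARTED,),
    ).fetchall()
    for task_id, task_class, task_data in rows:
        db.execute(
            "UPDATE ScheduledTaskData SET StartedOn = ?, ProcessId = ? WHERE Id = ?",
            (now(), os.getpid(), task_id),
        )
        output, error, status = "", "", FINISHED_OK
        try:
            output = registry[task_class](db).process(json.loads(task_data)) or ""
        except Exception as exc:
            error, status = repr(exc), FINISHED_WITH_ERRORS
        db.execute(
            "UPDATE ScheduledTaskData SET FinishedOn = ?, RegularOutput = ?,"
            " ErrorOutput = ?, Status = ? WHERE Id = ?",
            (now(), output, error, status, task_id),
        )
        db.commit()


class ResizeImages(AbstractScheduledTask):  # hypothetical example task
    def collect(self):
        return [["a.png"], ["broken.png"], ["c.png"]]

    def process(self, task_data):
        if "broken.png" in task_data:
            raise ValueError("cannot read broken.png")
        return "resized " + ", ".join(task_data)
```

With this shape, calling ResizeImages(db).generate() followed by run_pending(db, {"ResizeImages": ResizeImages}) leaves one record per data set, and the record that raised an error is marked as "Finished with Errors" without stopping the records after it.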

...

  • a single scheduled task now has all its execution attempts recorded, and we need to be able to see them all, especially the ones that failed
  • when there is no data to process, we will no longer update the "Last Run On" field of the scheduled task to indicate that it was at least executed but did nothing
  • since we'll generate several task data records per scheduled task upfront whenever we can, the "Next Run On" column becomes irrelevant as well
  • past scheduled task runs without errors (e.g. more than 1 month old) can be deleted automatically as well
  • if an error happened while handling a scheduled task, the main scheduled task grid should clearly indicate that, and maybe we need to send an e-mail to the admin as well
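
The last two points could be handled with two small queries against the ScheduledTaskData table. The sketch below is in Python with SQLite and rests on assumptions: timestamps are stored as fixed-width UTC text, "1 month" is approximated as 30 days, and the function names purge_old_successes() and failed_run_counts() are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

FINISHED_OK, FINISHED_WITH_ERRORS = 2, 3

# Assumed storage format: fixed-width UTC text, so string order equals time order.
STAMP = "%Y-%m-%d %H:%M:%S"


def purge_old_successes(db, max_age_days=30, now=None):
    """Delete successful runs older than max_age_days; failed runs are kept."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=max_age_days)).strftime(STAMP)
    cursor = db.execute(
        "DELETE FROM ScheduledTaskData WHERE Status = ? AND FinishedOn < ?",
        (FINISHED_OK, cutoff),
    )
    db.commit()
    return cursor.rowcount  # number of deleted records


def failed_run_counts(db):
    """Return {TaskClass: number of failed runs}, e.g. to flag rows in the grid."""
    rows = db.execute(
        "SELECT TaskClass, COUNT(*) FROM ScheduledTaskData"
        " WHERE Status = ? GROUP BY TaskClass",
        (FINISHED_WITH_ERRORS,),
    )
    return dict(rows.fetchall())
```

A task class that appears in the result of failed_run_counts() is one the grid could highlight, and the same result could drive the optional e-mail to the admin.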

Related Tasks