Diagnostic features for Semaphores [5.2.2-B1]

In-Portal uses a temporary table for data editing. This lets the user work on a snapshot of the data taken at the moment the editing page was opened. When the user clicks the "Save" button, the data from the temporary table replaces the data in the live table.

When several users work in the same In-Portal section in parallel, they may hit the "Save" button at the same time. To avoid data integrity issues, a semaphore system was created that prevents two or more users from saving data in the same section at the same time.

If UserB tries to save data while UserA is doing the same, then:

  • when UserA's save takes less than 30 seconds, UserB waits for it to finish and then tries to save the data again
  • when UserA's save takes more than 30 seconds (probably because an SQL error happened), UserB sees the following message:
Copying operation in Temporary tables has failed. Please contact website administrator.
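
The waiting behaviour described above can be sketched as follows. This is a minimal single-process Python model, not In-Portal's actual implementation: the `acquire_semaphore` and `release_semaphore` helpers, the in-memory dictionary standing in for the "Semaphores" table, and the one-second polling interval are assumptions made for illustration. Only the 30-second limit and the error message come from the description above.

```python
import time

MESSAGE = ("Copying operation in Temporary tables has failed. "
           "Please contact website administrator.")


class SemaphoreTimeout(Exception):
    """Raised when another user's save holds the semaphore for too long."""


def acquire_semaphore(semaphores, prefix, max_wait=30, poll_interval=1,
                      clock=time.monotonic, sleep=time.sleep):
    """Wait up to max_wait seconds for the semaphore of `prefix` to be free.

    `semaphores` is a plain dict (hypothetical stand-in for the
    "Semaphores" table); a real implementation would need an atomic
    check-and-insert in the database instead.
    """
    deadline = clock() + max_wait
    while prefix in semaphores:
        if clock() >= deadline:
            # The other save did not finish in time: give up with the message.
            raise SemaphoreTimeout(MESSAGE)
        sleep(poll_interval)
    semaphores[prefix] = clock()  # remember when the semaphore was taken


def release_semaphore(semaphores, prefix):
    """Remove the semaphore once the save has finished."""
    semaphores.pop(prefix, None)
```

In this model UserA calls `acquire_semaphore` before copying data and `release_semaphore` afterwards; UserB's call either returns once UserA is done or raises `SemaphoreTimeout` after 30 seconds. The `clock` and `sleep` parameters exist only so the timing can be replaced in tests.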

The system does prevent data corruption, but unfortunately it logs very little diagnostic information that could later be used to prevent the SQL error from happening in the first place.

Proposing to:

  1. each time the above message is displayed, add a "Fatal Error" type record to the "System Log", so that we know that somebody is experiencing a problem
  2. store with each semaphore not only the "unit prefix" and "timestamp", but also:
    1. a complete backtrace
    2. the IDs of the records (from the temp table) that were being copied
  3. for each stale semaphore:
    1. drop it
    2. add a record in System Log about it

Solution

  1. when somebody hits a semaphore and sees the error message, log it into the "System Log" with the "Error" Log Level - 0.5h
  2. add the following columns to the "Semaphores" table: - 0.5h
    • MainIDs - db type: text; comma-separated list of IDs
    • UserId - db type: int
    • IpAddress - db type: varchar(15)
    • Hostname - db type: varchar(255)
    • LogRequestURI - db type: varchar(255)
    • Backtrace - db type: longtext
  3. add the following setting to the "System Settings" sub-section of the "Configuration > Website > Advanced" section: - 0.5h
    • name: "SemaphoreLifetime"
    • title: "Semaphore Inactivity Timeout (seconds)"
    • default: 300 (5 minutes)
  4. create an "adm:OnDeleteStaleSemaphores" event that will: - 1.5h
    • get stale records from the "Semaphores" table, i.e. those where "Semaphores.Timestamp + {SemaphoreLifetime} < UNIX_TIMESTAMP()"
    • if none found, then exit
    • for each found record:
      • create a record in the "SystemLog" table (using the "$this->Application->log" method) where:
        • the "setLogField" method will be used to replace the "LogHostname", "LogRequestSource", "LogRequestURI", "LogUserId", "IpAddress" and "LogSessionKey" field values with the ones stored in the semaphore
        • the "addTrace" method will be used to add the trace from the semaphore
        • the "setUserData" method will be used to store the value of the "MainIDs" field, the semaphore creation time and the semaphore removal time
        • "Log Level" would be "error"
        • "Message" would be "Stale semaphore discovered"
        • the goal is to make that record look as if it was created at the moment the semaphore was created
      • delete semaphore record
  5. turn the "adm:OnDeleteStaleSemaphores" event into a scheduled task that will run every 5 minutes - 0.5h
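
The stale semaphore cleanup from steps 2 to 4 could look roughly like the sketch below. It is written in Python against an in-memory SQLite database purely for illustration and is not In-Portal code: the `SemaphoreId` and `MainPrefix` column names, the simplified `SystemLog` layout and the `delete_stale_semaphores` function are assumptions, and a real implementation would go through the "$this->Application->log" method instead of inserting log rows directly. A semaphore is treated as stale once its Timestamp plus the configured lifetime lies in the past.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE Semaphores (
    SemaphoreId INTEGER PRIMARY KEY,  -- assumed key column
    MainPrefix TEXT,                  -- "unit prefix" (assumed column name)
    Timestamp INTEGER,                -- creation time, Unix seconds
    MainIDs TEXT,                     -- comma-separated list of IDs
    UserId INTEGER,
    IpAddress VARCHAR(15),
    Hostname VARCHAR(255),
    LogRequestURI VARCHAR(255),
    Backtrace TEXT
);
CREATE TABLE SystemLog (              -- simplified, hypothetical stand-in
    LogLevel TEXT, LogMessage TEXT, LogUserId INTEGER,
    LogHostname TEXT, LogRequestURI TEXT, IpAddress TEXT,
    LogBacktrace TEXT, LogUserData TEXT
);
"""


def delete_stale_semaphores(conn, lifetime=300, now=None):
    """Log and delete semaphores older than `lifetime` seconds.

    Returns the number of semaphores removed.
    """
    now = int(time.time()) if now is None else now
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # access columns by name
    stale = cur.execute(
        "SELECT * FROM Semaphores WHERE Timestamp + ? < ?", (lifetime, now)
    ).fetchall()
    for row in stale:
        # Keep the copied IDs plus creation and removal time with the log record.
        user_data = json.dumps({
            "MainIDs": row["MainIDs"],
            "created": row["Timestamp"],
            "removed": now,
        })
        conn.execute(
            "INSERT INTO SystemLog VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            ("error", "Stale semaphore discovered", row["UserId"],
             row["Hostname"], row["LogRequestURI"], row["IpAddress"],
             row["Backtrace"], user_data),
        )
        conn.execute("DELETE FROM Semaphores WHERE SemaphoreId = ?",
                     (row["SemaphoreId"],))
    conn.commit()
    return len(stale)


# Usage: one semaphore is older than 300 seconds, the other is still fresh.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Semaphores (MainPrefix, Timestamp, MainIDs, UserId) "
             "VALUES ('c', 1000, '12,15', 7)")
conn.execute("INSERT INTO Semaphores (MainPrefix, Timestamp, MainIDs, UserId) "
             "VALUES ('u', 1290, '3', 8)")
removed = delete_stale_semaphores(conn, lifetime=300, now=1400)
```

Passing `now` explicitly keeps the example deterministic; the scheduled task from step 5 would simply call the function with the current time and the value of the "SemaphoreLifetime" setting.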

Quote: 3.5h * 1.4 = 4.9h ≈ 5h

Related Discussions

Related Tasks

INP-1569