Page MenuHomeIn-Portal Phabricator

INP-1835 - Auto-retry database queries, that stumbled upon a lock/deadlock
Changes PlannedPublic

Authored by alex on Dec 2 2022, 10:19 AM.

Details

Reviewers
erik
Test Plan

Part 1 - existing install new settings testing

  1. on the existing In-Portal installation:
    1. go to .../core/install.php URL
    2. confirm, that the "Installation Maintenance" step is shown (if perform the recommended steps shown on the screen)
    3. enter the login/password of the "root" user
    4. select the "Update Database Configuration" option
    5. press the "Continue" button
    6. confirm, that these new settings are now shown on the configuration screen and they're not empty:
      • Database Error Backoff Max Retry Attempts
      • Database Error Backoff Logic Base Time (in milliseconds)
      • Database Lock Retry Debugging Enabled
  2. specify empty values for "Database Error Backoff*" settings
    1. press the "Continue" button
    2. confirm, that the validation error was shown
  3. change values for all of the settings to non-empty values
    1. press the "Continue" button
    2. press ENTER in the Web Browser Location field
    3. confirm, that the "Installation Maintenance" step is shown
    4. enter the login/password of the "root" user
    5. select the "Update Database Configuration" option
    6. press the "Continue" button
    7. confirm, that previously specified settings retained their new values

Part 2 - new install new settings testing

  1. try installing a clean In-Portal
  2. confirm, that these new settings are now shown on the "Database Configuration" step and they're not empty:
    • Database Error Backoff Max Retry Attempts
    • Database Error Backoff Logic Base Time (in milliseconds)
    • Database Lock Retry Debugging Enabled
  3. specify empty values for "Database Error Backoff*" settings
    1. press the "Continue" button
    2. confirm, that the validation error was shown
  4. change values for all of the settings to non-empty values
    1. press the "Continue" button
    2. press ENTER in the Web Browser Location field
    3. confirm, that the "System Requirements Check" step is shown
    4. follow the installation instructions until you reach the "Database Configuration" step
    5. confirm, that previously specified settings retained their new values

Part 3 - preparations

  1. in the Admin Console:
    • go to the ConfigurationWebsiteAdvanced section
    • specify e-mail in the Send User-defined "System Log" messages to system setting
    • select Immediate option in the Email Delivery system setting
    • save changes
  2. in the phpMyAdmin (or other database client):
    • login with a user with a SUPER privilege (e.g. root)
    • execute SET GLOBAL innodb_lock_wait_timeout = 10 database query
    • change LanguageLabels engine type to InnoDB
  3. in the PhpStorm:
    • create the /dl_test.php file as a copy from the /index.php file
    • in the "/dl_test.php" file replace $application->Run(); line with this code:
    • in the /core/units/logs/system_logs/system_logs_config.php file change declaration of the LogTimestamp field to include 'time_format' => 'H:i:s',
$user = $application->GetVar('user');

if ( $user === false ) {
    throw new LogicException('Please specify the "user" request variable.');
}

$cookie_flag_name = 'user_' . $user . '_window_opened';

if ( !array_key_exists($cookie_flag_name, $_COOKIE) ) {
    echo '<br/><br/>Ready for reload.';
    setcookie($cookie_flag_name, 1);
    exit;
}

// Lock.
if ( $user === 'a' ) {
    $application->Conn->Query('START TRANSACTION'); // Step 1.
    $application->Conn->Query('SELECT * FROM ' . TABLE_PREFIX . 'LanguageLabels WHERE PhraseId = 1 LOCK IN SHARE MODE'); // Step 2.
    sleep(2 * 60); // delay to keep transaction open and ensure lock presence in "user=b" process.
}
elseif ( $user === 'b' ) {
    sleep(5); // delay to allow "step 1" and "step 2" from "user=a" process to execute.
    $application->Conn->Query('START TRANSACTION'); // Step 3.

    /*
     * Lock starts at below line, because we're attempting to delete
     * a record, that was selected within a transaction in another
     * process.
     *
     * InnoDB will wait 10 seconds in hopes, that lock can be acquired
     * and then throws an error upon which we issue a delay and retry a
     * query.
     *
     * Each retry is logged into a System Log. Wait time with default
     * configuration:
     * 1. initial query lock waiting: 10 sec
     * 2. 1st retry delay: (100*2^1)ms = 200ms
     * 3. waiting for lock of 1st retry: 10 sec
     * 4. 2nd retry delay: (100*2^2)ms = 400ms
     * 5. waiting for lock of 2nd retry: 10 sec
     * 6. 3rd retry delay: (100*2^3)ms = 800ms
     * 7. waiting for lock of 3rd retry: 10 sec
     *
     * Total wait time: 41 sec 400 ms + 5 seconds initial delay for user=b.
     */
    $application->Conn->Query('DELETE FROM ' . TABLE_PREFIX . 'LanguageLabels WHERE PhraseId = 1'); // Step 4.
}

// Deadlock.
if ( $user === 'c' ) {
    // the "user c" start the process
    $application->Conn->Query('START TRANSACTION'); // Step 1.
    sleep(5);
    $application->Conn->Query('UPDATE ' . TABLE_PREFIX . 'LanguageLabels SET l1_Translation = "Active C" WHERE PhraseId = 1'); // Step 3.

    /*
     * Lock happens here first.
     */
    $application->Conn->Query('UPDATE ' . TABLE_PREFIX . 'LanguageLabels SET l1_Translation = "Add C" WHERE PhraseId = 2'); // Step 5.
}
elseif ( $user === 'd' ) {
    $application->Conn->Query('START TRANSACTION'); // Step 2.
    $application->Conn->Query('UPDATE ' . TABLE_PREFIX . 'LanguageLabels SET l1_Translation = "Add D" WHERE PhraseId = 2'); // Step 4.

    /*
     * Deadlock happens here then.
     */
    $application->Conn->Query('UPDATE ' . TABLE_PREFIX . 'LanguageLabels SET l1_Translation = "Active D" WHERE PhraseId = 1'); // Step 6.
}

Part 4 - regular lock retries

  1. in the PhpStorm:
    1. enable the "DBEnableLockRetryDebugging" setting in the /system/config.php file
  2. in the Web Browser
    1. delete the user_a_window_opened and user_b_window_opened cookies
    2. open the /dl_test.php?user=a in 1st tab
    3. confirm, that page showed Ready for reload.
    4. open the /dl_test.php?user=b in 2nd tab
    5. confirm, that page showed Ready for reload.
    6. select both tabs and via context menu (right clicking) reload both of them simultaneously
    7. wait 2.5 minutes maximum for both tab to finish loading
  3. confirm, that:
    • in Web Browser:
      • 1st tab:
        • finished loading in approximately 2 minutes
        • has nothing shown except debugger toolbar
      • 2nd tab:
        • finished loading in approximately 1 minute
        • shows debugger with an Lock wait timeout exceeded; try restarting transaction (1205) SQL error about DELETE FROM LanguageLabels WHERE PhraseId = 1 database query
    • in the Logs & ReportsSystem Log section of the Admin Console:
      • when executing script with DBEnableLockRetryDebugging setting enabled:
        • 4 records were added with Lock wait timeout exceeded; try restarting transaction message
        • each record was added with approximately 10 seconds delay
        • 1st, 2nd and 3rd record represent errored query retry attempts and they show delay and retry attempt number in the User Data field of the system log record
        • 4th record shows just an error without retry information
      • when executing script with DBEnableLockRetryDebugging setting disabled:
        • 1 record was added with Lock wait timeout exceeded; try restarting transaction message and no retry information
    • in e-mail client:
      1. an e-mail was received about each retry record added to the System Log
  4. disable the "DBEnableLockRetryDebugging" setting in the /system/config.php
  5. repeat the test, but this time no records about lock retries should be made

Part 5 - deadlock retries (incomplete)

  1. enable debug mode in the Web Browser
  2. enable the "DBEnableLockRetryDebugging" setting in the /system/config.php
  3. place breakpoint inside each IF statement of the dl_test.php file
  4. open the /dl_test.php?user=c in 1st tab
  5. open the /dl_test.php?user=d in 2nd tab
  6. execute code line by line from the breakpoint as shown in comments (e.g. "Step 1" in 1st debug tab of PhpStorm, then switch to 2nd debug tab of PhpStorm and execute "Step 2" and so on)
  7. confirm, that:
    • a regular lock happened
    • after all scripts are finished detailed lock retry info is shown in the system log
  8. disable the "DBEnableLockRetryDebugging" setting in the /system/config.php
  9. repeat the test, but this time no records about lock retries should be made

Part 6 - cleaning up

  1. execute SET GLOBAL innodb_lock_wait_timeout = 50 database query by MySQL user with a SUPER privilege (e.g. root)

Diff Detail

Repository
rINP In-Portal
Branch
/in-portal/branches/5.2.x
Lint
Lint ErrorsExcuse: Not fixing old CS issues in this release.
SeverityLocationCodeMessage
Errorcore/install.php:537PHPCS.E.CodingStandard.Arrays.Array.SpaceAfterKeywordCodingStandard.Arrays.Array.SpaceAfterKeyword
Errorcore/install.php:537PHPCS.E.Generic.PHP.LowerCaseKeyword.FoundGeneric.PHP.LowerCaseKeyword.Found
Errorcore/install.php:543PHPCS.E.CodingStandard.WhiteSpace.ControlStructureSpacing.NoLineBeforeOpenCodingStandard.WhiteSpace.ControlStructureSpacing.NoLineBeforeOpen
Errorcore/install.php:543PHPCS.E.CodingStandard.WhiteSpace.ControlStructureSpacing.SpaceBeforeCloseBraceCodingStandard.WhiteSpace.ControlStructureSpacing.SpaceBeforeCloseBrace
Errorcore/install.php:543PHPCS.E.CodingStandard.WhiteSpace.ControlStructureSpacing.SpacingAfterOpenBraceCodingStandard.WhiteSpace.ControlStructureSpacing.SpacingAfterOpenBrace
Errorcore/install.php:544PHPCS.E.CodingStandard.WhiteSpace.ControlStructureSpacing.SpaceBeforeCloseBraceCodingStandard.WhiteSpace.ControlStructureSpacing.SpaceBeforeCloseBrace
Errorcore/install.php:544PHPCS.E.CodingStandard.WhiteSpace.ControlStructureSpacing.SpacingAfterOpenBraceCodingStandard.WhiteSpace.ControlStructureSpacing.SpacingAfterOpenBrace
Errorcore/install.php:1764PHPCS.E.Squiz.Scope.MethodScope.MissingSquiz.Scope.MethodScope.Missing
Errorcore/kernel/db/db_connection.php:371PHPCS.E.CodingStandard.Commenting.DocComment.TagValueIndentCodingStandard.Commenting.DocComment.TagValueIndent
Errorcore/kernel/db/db_connection.php:371PHPCS.E.CodingStandard.Commenting.FunctionComment.MissingParamCommentCodingStandard.Commenting.FunctionComment.MissingParamComment
Errorcore/kernel/db/db_connection.php:372PHPCS.E.CodingStandard.Commenting.DocComment.TagValueIndentCodingStandard.Commenting.DocComment.TagValueIndent
Errorcore/kernel/db/db_connection.php:372PHPCS.E.CodingStandard.Commenting.FunctionComment.MissingParamCommentCodingStandard.Commenting.FunctionComment.MissingParamComment
Errorcore/kernel/db/db_connection.php:372PHPCS.E.CodingStandard.Commenting.FunctionComment.SpacingAfterParamTypeCodingStandard.Commenting.FunctionComment.SpacingAfterParamType
Errorcore/kernel/db/db_connection.php:373PHPCS.E.CodingStandard.Commenting.DocComment.NonParamGroupCodingStandard.Commenting.DocComment.NonParamGroup
Errorcore/kernel/db/db_connection.php:373PHPCS.E.CodingStandard.Commenting.FunctionComment.InvalidReturnCodingStandard.Commenting.FunctionComment.InvalidReturn
Errorcore/kernel/db/db_connection.php:375PHPCS.E.Squiz.Commenting.FunctionCommentThrowTag.MissingSquiz.Commenting.FunctionCommentThrowTag.Missing
Errorcore/kernel/db/db_connection.php:417PHPCS.E.Generic.CodeAnalysis.EmptyStatement.DetectedCatchGeneric.CodeAnalysis.EmptyStatement.DetectedCatch
Errorcore/kernel/db/db_connection.php:482PHPCS.E.CodingStandard.WhiteSpace.ControlStructureSpacing.SpaceBeforeCloseBraceCodingStandard.WhiteSpace.ControlStructureSpacing.SpaceBeforeCloseBrace
Errorcore/kernel/db/db_connection.php:482PHPCS.E.CodingStandard.WhiteSpace.ControlStructureSpacing.SpacingAfterOpenBraceCodingStandard.WhiteSpace.ControlStructureSpacing.SpacingAfterOpenBrace
Errorcore/kernel/db/db_connection.php:1296PHPCS.E.Squiz.WhiteSpace.FunctionSpacing.AfterLastSquiz.WhiteSpace.FunctionSpacing.AfterLast
Errorcore/kernel/db/db_load_balancer.php:147PHPCS.E.CodingStandard.Arrays.Array.SpaceAfterKeywordCodingStandard.Arrays.Array.SpaceAfterKeyword
Errorcore/kernel/db/db_load_balancer.php:147PHPCS.E.Generic.PHP.LowerCaseKeyword.FoundGeneric.PHP.LowerCaseKeyword.Found
Errorcore/kernel/utility/system_config.php:86PHPCS.E.CodingStandard.Strings.ConcatenationSpacing.NoSpaceAfterCodingStandard.Strings.ConcatenationSpacing.NoSpaceAfter
Errorcore/kernel/utility/system_config.php:86PHPCS.E.CodingStandard.Strings.ConcatenationSpacing.NoSpaceAfterCodingStandard.Strings.ConcatenationSpacing.NoSpaceAfter
Errorcore/kernel/utility/system_config.php:86PHPCS.E.CodingStandard.Strings.ConcatenationSpacing.NoSpaceBeforeCodingStandard.Strings.ConcatenationSpacing.NoSpaceBefore
Unit
No Unit Test Coverage
Build Status
Buildable 1168
Build 1168: arc lint + arc unit

Event Timeline

alex created this revision.Dec 2 2022, 10:19 AM
alex requested review of this revision.Dec 2 2022, 10:19 AM
alex edited the test plan for this revision. (Show Details)Dec 2 2022, 10:24 AM
alex added a project: Restricted Project.
alex edited the test plan for this revision. (Show Details)Dec 2 2022, 10:26 AM
alex updated this revision to Diff 1126.Dec 2 2022, 10:34 AM

Added an explanatory comments.

alex edited the test plan for this revision. (Show Details)Dec 2 2022, 11:35 AM
alex updated this revision to Diff 1127.Dec 2 2022, 12:20 PM

Send e-mail upon retry.

alex updated this revision to Diff 1128.Dec 2 2022, 12:22 PM

Send e-mail upon retry (for real).

erik added a comment.Dec 2 2022, 12:42 PM

@alex trouble in part3 1 G - cant't get "detailed lock retry info is shown in the system log".

alex added a comment.Dec 2 2022, 2:30 PM
In D439#8859, @erik wrote:

@alex trouble in part3 1 G - cant't get "detailed lock retry info is shown in the system log".

@erik ,

  1. Can you please be more specific:
    1. no records are added to the System Log
    2. records, that are added don't contain retry information in the User Data field

?

  1. have you enabled retry info collection in the /system/config.php as written in the test plan?
  2. is the System Log enabled /system/config.php?
erik added a comment.Dec 5 2022, 5:10 AM

On processing line with "Step 4." comment, i have lock much more then 50 seconds. Practically, it never ends. In case, when lock continue, in other debug window go to step 5 - lock ends, but no system log record about this lock created.

1A - no records are added to the System Log

  1. Yes, $_CONFIG['Database']['DBEnableLockRetryDebugging'] = '1';
  2. Yes, system log is emabled - $_CONFIG['Misc']['SystemLogMaxLevel'] = '5';
alex added a comment.Jan 25 2023, 5:00 AM
In D439#8859, @erik wrote:

@alex trouble in part3 1 G - cant't get "detailed lock retry info is shown in the system log".

Debugging on Erik's machine revealed very strange xDebug behavior upon lock error detection: it shows the correct 1205 error initially, but on the if ( in_array($this->errorCode, $recoverable_errors) ) { line the error code suddenly becomes 0 and no logging/auto-retry ever happens.

Alex needs to rewrite test case to use sleep(...) constructs to emulate delays instead of using xDebug.

In D439#8861, @erik wrote:

On processing line with "Step 4." comment, i have lock much more then 50 seconds. Practically, it never ends. In case, when lock continue, in other debug window go to step 5 - lock ends, but no system log record about this lock created.

1A - no records are added to the System Log

  1. Yes, $_CONFIG['Database']['DBEnableLockRetryDebugging'] = '1';
  2. Yes, system log is emabled - $_CONFIG['Misc']['SystemLogMaxLevel'] = '5';

I'll investigate that, but most like the 50 (50 seconds) delay value of the innodb_lock_wait_timeout MySQL variable combined with 3 retries creates the impression of the script freeze.

alex edited the test plan for this revision. (Show Details)Jan 25 2023, 5:02 AM
alex edited the test plan for this revision. (Show Details)
alex edited the test plan for this revision. (Show Details)Jan 27 2023, 1:36 AM
alex edited the test plan for this revision. (Show Details)Jan 27 2023, 4:08 AM
alex updated this revision to Diff 1129.Jan 27 2023, 4:14 AM

Bugfix: logging retry attempts reset retry counter resulting in indefinite retry attempts.

alex edited the test plan for this revision. (Show Details)Jan 27 2023, 4:28 AM
alex edited the test plan for this revision. (Show Details)Jan 29 2023, 9:01 AM
erik requested changes to this revision.Feb 3 2023, 1:46 AM

Tried test part 4. No messages in system log given. Maybe, because of START TRANSACTION SQL command. Transaction not committed, so no records created in DB.

This revision now requires changes to proceed.Feb 3 2023, 1:46 AM
alex planned changes to this revision.Feb 3 2023, 2:18 AM

Need to:

  • finalize the plan for dead lock testing
  • may add test plan for reconnect testing