diff --git a/FAQ.md b/FAQ.md
index 05b92ac818f6a..caf43da05f9ad 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -1052,6 +1052,6 @@ When you execute the `LOAD DATA LOCAL` statement but the MySQL client does not a
The solution is to use the `--local-infile=1` option when you start the MySQL client. For example, use a command like `mysql --local-infile=1 -u root -h 127.0.0.1 -P 4000`. The default value of `local-infile` differs among MySQL client versions, so you need to configure it in some MySQL clients but do not need to configure it in others.

-#### ERROR 9001 (HY000): PD server timeoutstart timestamp may fall behind safepoint
+#### ERROR 9001 (HY000): PD server timeout start timestamp may fall behind safe point

This error occurs when TiDB fails to access PD. A worker in the TiDB background continuously queries the safe point from PD, and this error occurs if it fails to get a result within 100 seconds. Generally, it is because of a PD failure or a network failure between TiDB and PD. For the details of common errors, see [Error Number and Fault Diagnosis](sql/error.md).
diff --git a/README.md b/README.md
index 3ad35c4f2079b..8701406e7571f 100644
--- a/README.md
+++ b/README.md
@@ -71,6 +71,7 @@
- [Error Codes and Troubleshooting](sql/error.md)
- [Compatibility with MySQL](sql/mysql-compatibility.md)
- [TiDB Memory Control](sql/tidb-memory-control.md)
+ - [Slow Query Log](sql/slow-query.md)
+ Advanced Usage
- [Read Data From History Versions](op-guide/history-read.md)
- [Garbage Collection (GC)](op-guide/gc.md)
diff --git a/op-guide/tidb-config-file.md b/op-guide/tidb-config-file.md
index eadb42a3821ab..2a1eaaa5340fb 100644
--- a/op-guide/tidb-config-file.md
+++ b/op-guide/tidb-config-file.md
@@ -50,6 +50,12 @@ Configuration about log.
- Default: false
- If you set the value to true, the log does not output the timestamp

+### `slow-query-file`
+
+- The file name of the slow query log
+- Default: ""
+- After you set it, the slow query log is output separately to this file
+
### `slow-threshold`

- The threshold of consumed time above which a statement is output to the slow log
diff --git a/sql/slow-query.md b/sql/slow-query.md
new file mode 100644
index 0000000000000..0a7378cced0f8
--- /dev/null
+++ b/sql/slow-query.md
@@ -0,0 +1,97 @@
+---
+title: Slow Query Log
+summary: Use the slow query log to identify problematic SQL statements.
+category: user guide
+---
+
+# Slow Query Log
+
+The slow query log is a record of SQL statements that take a long time to execute.
+
+A problematic SQL statement can increase the pressure on the entire cluster and lead to longer response times. To solve this problem, you can use the slow query log to identify the problematic statements and thus improve performance.
+
+### Obtain the log
+
+By running `grep` on the keyword `SLOW_QUERY` in the TiDB log file, you can obtain the logs of statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold).
+
+You can edit `slow-threshold` in the configuration file; its default value is 300ms. If you configure [slow-query-file](../op-guide/tidb-config-file.md#slow-query-file), all the slow query logs are written to that file.
+
+### Usage example
+
+```
+2018/08/20 19:52:08.632 adapter.go:363: [warning] [SLOW_QUERY] cost_time:18.647928814s
+process_time:1m6.768s wait_time:12m11.212s backoff_time:600ms request_count:2058
+total_keys:1869712 processed_keys:1869710 succ:true con:3 user:root@127.0.0.1
+txn_start_ts:402329674704224261 database:test table_ids:[31],index_ids:[1],
+sql:select count(c) from sbtest1 use index (k_1)
+```
+
+### Fields description
+
+This section describes the fields in the slow query log, based on the usage example above.
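The retrieval step described under "Obtain the log" can be sketched as a shell session. This is only a sketch: the `tidb.log` path and the sample entries are hypothetical stand-ins for a real deployment's log.

```shell
# Sketch, assuming the TiDB log sits at ./tidb.log; the first entry
# mimics the slow-query format shown in the usage example above.
cat > tidb.log <<'EOF'
2018/08/20 19:52:08.632 adapter.go:363: [warning] [SLOW_QUERY] cost_time:18.647928814s process_time:1m6.768s con:3 sql:select count(c) from sbtest1
2018/08/20 19:52:09.001 server.go:120: [info] new connection accepted
EOF

# Keep only the entries that exceeded slow-threshold.
grep 'SLOW_QUERY' tidb.log
```

Only the first line is printed; ordinary log entries without the `SLOW_QUERY` keyword are filtered out.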
+
+#### `cost_time`
+
+The execution time of this statement. Only the statements whose execution time exceeds [slow-threshold](../op-guide/tidb-config-file.md#slow-threshold) output this log.
+
+#### `process_time`
+
+The total processing time of this statement in TiKV. Because data is sent to TiKV concurrently for execution, this value might exceed `cost_time`.
+
+#### `wait_time`
+
+The total waiting time of this statement in TiKV. Because the Coprocessor of TiKV runs a limited number of threads, requests might queue up when all threads of the Coprocessor are working. When a request in the queue takes a long time to process, the waiting time of the subsequent requests increases.
+
+#### `backoff_time`
+
+The waiting time before a retry when this statement encounters errors that require a retry. Common errors of this kind include: lock conflicts, Region splits, and `TiKV server is busy`.
+
+#### `request_count`
+
+The number of Coprocessor requests that this statement sends.
+
+#### `total_keys`
+
+The number of keys that the Coprocessor has scanned.
+
+#### `processed_keys`
+
+The number of keys that the Coprocessor has processed. Compared with `total_keys`, `processed_keys` does not include the old versions of MVCC or the MVCC `delete` marks. A great difference between `processed_keys` and `total_keys` indicates that many old versions exist.
+
+#### `succ`
+
+Whether the execution of the request succeeds or not.
+
+#### `con`
+
+Connection ID (session ID). For example, you can use the keyword `con:3` to `grep` the logs whose session ID is 3.
+
+#### `user`
+
+The name of the user who executes this statement.
+
+#### `txn_start_ts`
+
+The start timestamp of the transaction, that is, the ID of the transaction. You can use this value to `grep` the transaction-related logs.
+
+#### `database`
+
+The current database.
+
+#### `table_ids`
+
+The IDs of the tables involved in the statement.
+
+#### `index_ids`
+
+The IDs of the indexes involved in the statement.
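The `con` and `txn_start_ts` fields described above can serve as `grep` keys for correlating log entries. A minimal sketch, assuming a hypothetical `tidb.log` whose entries follow the usage-example format:

```shell
# Sketch: filter log entries by session ID and by transaction ID.
# tidb.log and its contents are hypothetical sample data.
cat > tidb.log <<'EOF'
[SLOW_QUERY] cost_time:18.6s con:3 user:root@127.0.0.1 txn_start_ts:402329674704224261 sql:select count(c) from sbtest1
[SLOW_QUERY] cost_time:0.4s con:7 user:root@127.0.0.1 txn_start_ts:402329674704224300 sql:select 1
EOF

grep 'con:3 ' tidb.log                            # all entries for session 3
grep 'txn_start_ts:402329674704224261' tidb.log   # all entries for one transaction
```

The trailing space in `'con:3 '` keeps the pattern from also matching session IDs such as `con:30`.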
+
+#### `sql`
+
+The SQL statement.
+
+### Identify problematic SQL statements
+
+Not all `SLOW_QUERY` statements are problematic. Only those whose `process_time` is very large increase the pressure on the entire cluster.
+
+The statements whose `wait_time` is very large and whose `process_time` is very small are usually not problematic. A large `wait_time` means the statement is blocked by the real problematic statements and has to wait in the execution queue, which leads to a much longer response time.
diff --git a/tools/tikv-control.md b/tools/tikv-control.md
index 144ea870682d8..c183beb76a857 100644
--- a/tools/tikv-control.md
+++ b/tools/tikv-control.md
@@ -243,7 +243,7 @@ success!
Use the `recover-mvcc` command in circumstances where TiKV cannot run normally because of MVCC data corruption. It cross-checks 3 CFs ("default", "write", "lock") to recover from various kinds of inconsistency.

-Use the `--regions` option to specify involved Regions by `region_id`. Use the `--pd`option to specify PD endpoints.
+Use the `--regions` option to specify involved Regions by `region_id`. Use the `--pd` option to specify PD endpoints.

```bash
$ tikv-ctl --db /path/to/tikv/db recover-mvcc --regions 1001,1002 --pd 127.0.0.1:2379
```
diff --git a/trouble-shooting.md b/trouble-shooting.md
index 3e29392e6f763..f9986f4bbc2ca 100644
--- a/trouble-shooting.md
+++ b/trouble-shooting.md
@@ -90,7 +90,8 @@ Before starting the process, make sure the result of `ulimit -n` is large enough

## Database access times out and the system load is too high

-Provide the following information:
+First, check the [slow query log](./sql/slow-query.md) to see whether the problem is caused by an inappropriate SQL statement.
+If you cannot solve the problem this way, provide the following information:

+ The deployment topology
- How many `tidb-server`/`pd-server`/`tikv-server` instances are deployed?
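To pick out candidates whose `process_time` is large, one coarse approach is to match minute-scale durations in the `process_time` field. This is only a heuristic sketch under assumed data: the `tidb.log` file and its two entries are hypothetical, and the regular expression relies on Go's duration format (`1m6.768s`) shown in the usage example.

```shell
# Sketch: surface slow-query entries whose TiKV processing time reached
# minute scale. Entries with a tiny process_time but a large wait_time
# are filtered out, matching the guidance above.
cat > tidb.log <<'EOF'
[SLOW_QUERY] cost_time:18.6s process_time:1m6.768s wait_time:12m11.212s sql:select count(c) from sbtest1
[SLOW_QUERY] cost_time:9.2s process_time:0.4s wait_time:9m1.0s sql:select d from t
EOF

grep 'SLOW_QUERY' tidb.log | grep -E 'process_time:[0-9]+m'
```

Only the first entry survives the filter; the second, a statement merely queued behind others, is dropped.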