MySQL Basics

MySQL Execution Process#

(Figure: the two-layer architecture of MySQL: Server layer and storage engine layer.)
As we can see, the architecture of MySQL is divided into two layers: Server Layer and Storage Engine Layer.

  • The Server Layer is responsible for establishing connections, analyzing, and executing SQL. Most of MySQL's core functional modules are implemented here, including connectors, query caches, parsers, preprocessors, optimizers, executors, etc. Additionally, all built-in functions (such as date, time, mathematical, and encryption functions) and all cross-storage engine functionalities (such as stored procedures, triggers, views, etc.) are implemented in the Server Layer.
  • The Storage Engine Layer is responsible for data storage and retrieval. It supports multiple storage engines such as InnoDB, MyISAM, Memory, etc., and different storage engines share a single Server Layer. The most commonly used storage engine now is InnoDB, which became the default storage engine for MySQL starting from version 5.5. The index data structure we often refer to is implemented by the Storage Engine Layer, and different storage engines support different types of indexes. For example, InnoDB supports B+ tree index types, which are used by default, meaning that the primary key index and secondary index created in the data table use B+ tree indexes by default.

Step 1: Connector#

The work of the connector:

  • Establish a connection with the client through a TCP three-way handshake;
  • Verify the client's username and password; if either the username or password is incorrect, an error will be reported;
  • If both the username and password are correct, it will read the user's permissions, and subsequent permission logic judgments will be based on the permissions read at that time;

How to check how many clients are connected to the MySQL service?

Execute show processlist
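
For illustration, here is a hypothetical output (the ids, users, and hosts are made up). Connections whose Command column shows Sleep are idle; the Time column shows how many seconds each connection has been in its current state.

show processlist;

+----+------+-----------------+------+---------+------+-------+------------------+
| Id | User | Host            | db   | Command | Time | State | Info             |
+----+------+-----------------+------+---------+------+-------+------------------+
|  6 | root | localhost:34700 | NULL | Query   |    0 | init  | show processlist |
|  7 | root | localhost:34702 | NULL | Sleep   |  372 | NULL  | NULL             |
+----+------+-----------------+------+---------+------+-------+------------------+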

Do idle connections occupy resources indefinitely?

Controlled by the wait_timeout parameter, whose default value is 8 hours (28800 seconds). If an idle connection exceeds this time, the connector will automatically disconnect it.
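
You can check and adjust the value with standard system variable statements; a minimal sketch (the 3600 value is just an example):

show variables like 'wait_timeout';  -- 28800 by default
set global wait_timeout = 3600;      -- affects connections created after this point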

Is there a limit on the number of MySQL connections?

Controlled by the max_connections parameter.
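
Likewise, a quick check (the default is 151 in MySQL 5.7/8.0; the 500 below is a hypothetical new limit):

show variables like 'max_connections';
set global max_connections = 500;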

How to solve the memory issue caused by long connections?

  • Regularly disconnect long connections.
  • The client actively resets the connection.
    MySQL 5.7 introduced the mysql_reset_connection() API. After the client finishes a large operation, it can call mysql_reset_connection() in its code to reset the connection and release the memory. This does not require reconnecting or re-verifying permissions; it simply restores the connection to the state it was in just after creation.
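
For the first approach, a DBA can locate long-idle connections and disconnect them manually. A minimal sketch, assuming sufficient privileges (the threshold of 3600 seconds and the connection id 7 are hypothetical):

select id, user, host, time
from information_schema.processlist
where command = 'Sleep' and time > 3600;

kill connection 7;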

Step 2: Query Cache#

When the client sends an SQL statement to the MySQL server, MySQL examines the first field of the statement to determine what type of statement it is.
If the SQL is a query statement (select statement), MySQL will first check the query cache (Query Cache) for cached data to see if this command has been executed before. This query cache is stored in memory in a key-value format, where the key is the SQL query statement and the value is the result of the SQL statement query.
If the queried statement hits the query cache, it will directly return the value to the client. If the queried statement does not hit the query cache, it will continue to execute, and once executed, the query result will be stored in the query cache.
However, for tables that are frequently updated, the hit rate of the query cache is very low because as long as a table has an update operation, the query cache for that table will be cleared. Therefore, MySQL version 8.0 directly removed the query cache.
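
In MySQL 5.7 and earlier you can observe the query cache directly; a minimal sketch (the cache must have been enabled at server startup for hits to accumulate):

show variables like 'query_cache_type';
show status like 'Qcache_hits';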

Step 3: Parse SQL#

Before officially executing the SQL query statement, MySQL will first parse the SQL statement, a task handled by the "parser."

Parser#

The parser will perform the following two tasks.
The first task is lexical analysis. MySQL scans the input string and identifies the tokens in it, such as keywords, table names, and field names.
The second task is syntax analysis. Based on the tokens produced by lexical analysis, the syntax parser checks whether the input SQL statement conforms to MySQL's grammar rules and, if it does, builds an SQL syntax tree, making it easier for subsequent modules to obtain the SQL type, table names, field names, where conditions, and so on.
If the SQL statement we input has incorrect syntax, an error will be reported at the parser stage.
However, note that the existence of a table or field is not checked in the parser.
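
For example, misspelling a keyword fails right here with error 1064, even though the table referenced may well exist (the table name t_order is just an example):

selct * from t_order;
-- ERROR 1064 (42000): You have an error in your SQL syntax; check the manual ... near 'selct * from t_order' at line 1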

Step 4: Execute SQL#

Preprocessor#

Let's first discuss what happens during the preprocessing stage.

  • Check whether the tables or fields in the SQL query statement exist;
  • Expand the * symbol in select * to all columns on the table;
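
For example (the table name is hypothetical), a statement that parses fine still fails at this stage if the table does not exist:

select * from t_not_exist;
-- ERROR 1146 (42S02): Table 'test.t_not_exist' doesn't exist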

Optimizer#

After the preprocessing stage, an execution plan must be formulated for the SQL query statement, a task handled by the "optimizer."
The optimizer is mainly responsible for determining the execution plan of the SQL query statement. For example, when there are multiple indexes in a table, the optimizer will decide which index to use based on the cost of the query.
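
You can see which plan the optimizer chose with EXPLAIN; a minimal sketch (assuming t_order has a secondary index on user_id in addition to the primary key):

explain select * from t_order where user_id = 1;
-- the possible_keys column lists candidate indexes; the key column shows the one actually chosen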

Executor#

After the optimizer has determined the execution plan, MySQL will actually start executing the statement, a task completed by the "executor." During the execution process, the executor will interact with the storage engine, and the interaction is done on a record-by-record basis.

Primary Key Index Query#
select * from product where id = 1;

This query statement uses the primary key index for its query condition and is an equality query. Since the primary key id is unique, there will not be records with the same id, so the optimizer decides to use the access type as const for the query, meaning it will query a record using the primary key index. The execution process between the executor and the storage engine is as follows:

  • The executor's first query will call the function pointed to by the read_first_record function pointer. Since the optimizer chose the access type as const, this function pointer is directed to the InnoDB engine's index query interface, passing the condition id = 1 to the storage engine to locate the first record that meets the condition.
  • The storage engine locates the first record with id = 1 through the B+ tree structure of the primary key index. If the record does not exist, it will report an error to the executor indicating that the record cannot be found, and the query ends. If the record exists, it will return the record to the executor;
  • After the executor reads the record from the storage engine, it will check whether the record meets the query condition. If it does, it will send it to the client; if not, it will skip that record.
  • The executor's query process is a while loop, so it will query again. However, since this is not the first query, it will call the function pointed to by the read_record function pointer. Since the optimizer chose the access type as const, this function pointer points to a function that always returns -1, so when this function is called, the executor exits the loop, ending the query.
Full Table Scan#
select * from product where name = 'iphone';

This query statement does not use an index for its query condition, so the optimizer decides to use the access type as ALL for the query, meaning it will perform a full table scan. The execution process between the executor and the storage engine is as follows:

  • The executor's first query will call the function pointed to by the read_first_record function pointer. Since the optimizer chose the access type as all, this function pointer is directed to the InnoDB engine's full scan interface, allowing the storage engine to read the first record in the table;
  • The executor will check whether the name of the record it just read is iphone. If not, it skips the record; if it is, it sends the record to the client (yes, every time the Server layer reads a record from the storage engine that meets the condition, it sends it to the client; the client displays all records at once only because it waits until the query completes before rendering the result).
  • The executor's query process is a while loop, so it will query again, calling the function pointed to by the read_record function pointer. Since the optimizer chose the access type as all, the read_record function pointer still points to the InnoDB engine's full scan interface, so it continues to ask the storage engine layer to read the next record after the previous one. After the storage engine retrieves the next record, it returns it to the executor (Server layer), and the executor continues to check the condition, skipping the record if it does not meet the query condition, otherwise sending it to the client;
  • This process repeats until the storage engine has read all records in the table and returns the completion information to the executor (Server layer);
  • The executor receives the completion information reported by the storage engine and exits the loop, stopping the query.
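
The two access types described above show up directly in the execution plan; a minimal sketch using the same product table:

explain select * from product where id = 1;
-- type: const, key: PRIMARY  (one lookup via the primary key index)
explain select * from product where name = 'iphone';
-- type: ALL, key: NULL       (full table scan)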
Index Condition Pushdown#

Index condition pushdown can reduce the number of back-table operations (looking up the full row via the primary key index) during secondary index queries, improving query efficiency by offloading some of the filtering from the Server layer to the storage engine layer.

select * from t_user where age > 20 and reward = 100000;

When a composite index encounters a range query (> or <), it will stop matching, meaning that the age field can utilize the composite index, but the reward field cannot.
So, when not using index condition pushdown (versions before MySQL 5.6), the execution process between the executor and the storage engine is as follows:

  • The Server layer first calls the storage engine's interface to locate the first secondary index record that meets the query condition, which means locating the first record where age > 20;
  • After the storage engine quickly locates this record based on the secondary index's B+ tree, it retrieves the primary key value and then performs a back table operation to return the complete record to the Server layer;
  • The Server layer checks whether the reward of that record equals 100000. If so, it sends it to the client; otherwise, it skips that record;
  • Next, it continues to ask the storage engine for the next record. After locating the record in the secondary index, the storage engine retrieves the primary key value and performs a back table operation to return the complete record to the Server layer;
  • This process continues until the storage engine has read all records in the table.

As we can see, without index condition pushdown, every time a secondary index record is queried, a back table operation must be performed, and the record is returned to the Server, which then checks whether the reward equals 100000.
With index condition pushdown, the task of checking whether the reward equals 100000 is handed over to the storage engine layer, as follows:

  • The Server layer first calls the storage engine's interface to locate the first secondary index record that meets the query condition, which means locating the first record where age > 20;
  • After locating a secondary index record, the storage engine does not perform a back-table operation immediately; it first checks whether the condition on the columns contained in that index (here, reward = 100000) holds. If the condition does not hold, it skips that secondary index record directly. If it holds, it performs the back-table operation and returns the complete record to the Server layer.
  • The Server layer checks whether other query conditions (there are no other conditions in this query) hold. If they do, it sends the record to the client; otherwise, it skips that record and requests the next record from the storage engine.
  • This process continues until the storage engine has read all records in the table.

As we can see, with index condition pushdown, although the reward column cannot utilize the composite index, it is included in the composite index (age, reward), so the storage engine filters out records that meet reward = 100000 before performing the back table operation to retrieve the entire record. Compared to not using index condition pushdown, this saves a lot of back table operations.
When you see "Using index condition" in the Extra column of the execution plan, it indicates that index condition pushdown is being used.
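
A minimal sketch, assuming t_user has a composite index on (age, reward) and a MySQL version of 5.6 or later (index condition pushdown is on by default there):

create index idx_age_reward on t_user (age, reward);
explain select * from t_user where age > 20 and reward = 100000;
-- Extra: Using index condition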

How are MySQL records stored?#

Which file does MySQL data reside in?#

MySQL data is stored on disk in files; the discussion here is in the context of the InnoDB storage engine. Take a database containing a table t_order as an example: its database directory contains three files, which are:

  • db.opt, used to store the default character set and character collation rules for the current database.
  • t_order.frm, the table structure of t_order is stored in this file. A .frm file is generated for every table created in MySQL, which is used to store the metadata information of each table, mainly including the table structure definition.
  • t_order.ibd, the table data of t_order is stored in this file. Table data can either reside in a shared tablespace file (filename: ibdata1) or in an exclusive tablespace file (filename: table_name.ibd). This behavior is controlled by the innodb_file_per_table parameter. If the innodb_file_per_table parameter is set to 1, the stored data, indexes, and other information will be stored separately in an exclusive tablespace. Starting from MySQL version 5.6.6, its default value is 1, so from this version onward, the data of each table in MySQL is stored in a separate .ibd file.

Now we know that the data of a database table is stored in the file named "table_name.ibd," which is also called an exclusive tablespace file.
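
Both facts are easy to confirm from SQL:

show variables like 'datadir';                -- the directory holding each database's files
show variables like 'innodb_file_per_table';  -- ON means each table gets its own .ibd file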

What is the structure of the tablespace file?#

A tablespace consists of segments, extents, pages, and rows. The logical storage structure of the InnoDB storage engine is roughly as follows:
(Figure: the logical storage structure of InnoDB: tablespace, segment, extent, page, row.)

Row#

Records in a database table are stored by row, and each row record has a different storage structure based on its row format. This will be detailed later.

Page#

Records are stored by row, but database reads are not done by "row." Otherwise, a single read (i.e., a single I/O operation) would only handle one row of data, which would be very inefficient.
Therefore, InnoDB reads and writes data in units of "pages": when a record needs to be read, the entire page containing it is loaded into memory, rather than fetching that single row from disk.
The default size of each page is 16KB, which guarantees a maximum of 16KB of contiguous storage space.
Pages are the smallest unit of disk management for the InnoDB storage engine, meaning that every read and write operation in the database is done in units of 16KB: at least 16KB of content is read from disk into memory, and at least 16KB of content is flushed from memory to disk at a time.
There are many types of pages, with common ones being data pages, undo log pages, overflow pages, etc. Row records in a data table are managed using "data pages."
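
The page size is exposed through a read-only system variable (it can only be set when the instance is initialized):

show variables like 'innodb_page_size';  -- 16384 by default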

Extent#

We know that the InnoDB storage engine organizes data using B+ trees.
The leaf nodes of the B+ tree are connected by a doubly linked list. If storage space is allocated by page, the physical positions of two adjacent pages in the linked list may not be contiguous and could be very far apart, leading to a lot of random I/O during disk queries, which is very slow.
This issue can be easily solved by ensuring that the physical positions of adjacent pages in the linked list are also adjacent, allowing for sequential I/O, which significantly improves performance during range queries (scanning leaf nodes).
So how is this specifically solved?
When the data volume in the table is large, space for a certain index is allocated not by pages but by extents. Each extent is 1MB in size, meaning that 64 contiguous pages (for 16KB pages) will be allocated as one extent, ensuring that adjacent pages in the linked list are also physically adjacent, allowing for sequential I/O.

Segment#

A tablespace is composed of various segments, which consist of multiple extents. Segments are generally divided into data segments, index segments, and rollback segments, etc.

  • Index Segment: A collection of extents that store the non-leaf nodes of the B+ tree;
  • Data Segment: A collection of extents that store the leaf nodes of the B+ tree;
  • Rollback Segment: A collection of extents that store rollback data. We previously discussed that MVCC utilizes rollback segments to implement multi-version query data.

What are the row formats in InnoDB?#

InnoDB provides four row formats: Redundant, Compact, Dynamic, and Compressed row formats.
The Redundant row format is rarely used now, so we will focus on the Compact row format, as Dynamic and Compressed row formats are very similar to Compact.
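
The row format can be chosen per table and checked afterwards; a minimal sketch (the table definition is hypothetical; note that recent MySQL versions default to Dynamic via innodb_default_row_format):

create table t_order (id int primary key, name varchar(20)) row_format = compact;
select table_name, row_format from information_schema.tables where table_name = 't_order';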

What does the COMPACT row format look like?#

(Figure: the layout of the COMPACT row format.)
A complete record is divided into two parts: "extra information of the record" and "actual data of the record."

Extra Information of the Record#

The extra information of the record contains three parts: variable-length field length list, NULL value list, and record header information.

Variable-Length Field Length List

The difference between varchar(n) and char(n) is well understood: char is fixed-length, while varchar is variable-length. The actual length (size) of the stored data for variable-length fields is not fixed.
Therefore, when storing data, the size occupied by the data must also be stored in the "variable-length field length list," allowing the corresponding length of data to be read based on this "variable-length field length list" when retrieving data. Other variable-length fields such as TEXT, BLOB, etc., are implemented in the same way.

Why is the information in the "variable-length field length list" stored in reverse order?

Because the next_record pointer in the "record header information" points to the position between the next record's "record header information" and its "actual data." The benefit is that reading to the left from that position gives the record header information, while reading to the right gives the actual data. Storing the length list in reverse order additionally keeps the length information of the front-most fields close to the actual data it describes, so both can fall within the same CPU cache line, improving the CPU cache hit rate.
For the same reason, the NULL value list information is also stored in reverse order.

Does every database table's row format have a "variable-length field length list"?

When a data table has no variable-length fields, such as all int type fields, the row format in the table will not have a "variable-length field length list."

What is the maximum value of n in varchar(n)?

MySQL stipulates that, except for large object types like TEXT and BLOB, the total byte length occupied by all columns (excluding hidden columns and record header information) cannot exceed 65535 bytes.
The n in varchar(n) represents the maximum number of characters that can be stored, not the byte size. To calculate the maximum byte size that varchar(n) can store, one must also consider the character set of the database table, as the character set indicates how many bytes each character occupies. For example, in the ASCII character set, 1 character occupies 1 byte, so varchar(100) means it can store a maximum of 100 bytes of data.
If there are multiple fields, it must be ensured that the total length of all fields + the bytes occupied by the variable-length field length list + the bytes occupied by the NULL value list <= 65535.
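
A worked example under the ascii character set: a single nullable varchar column needs 2 bytes for the variable-length field length list (since the data can exceed 255 bytes) and 1 byte for the NULL value list, so the largest n that fits is 65535 - 2 - 1 = 65532:

create table t1 (name varchar(65533)) engine = innodb charset = ascii;
-- ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. ...
create table t2 (name varchar(65532)) engine = innodb charset = ascii;
-- succeeds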

NULL Value List

If there are columns that allow NULL values, each column corresponds to a binary bit (bit), and the binary bits are arranged in reverse order according to the column order.

  • When the binary bit value is 1, it indicates that the column's value is NULL.
  • When the binary bit value is 0, it indicates that the column's value is not NULL.

Additionally, the NULL value list must be represented using an integer number of bytes (1 byte = 8 bits). If the number of binary bits used is not sufficient for an integer number of bytes, the high bits of the byte are filled with 0.
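
A small worked example (the table and values are hypothetical): suppose a table has three nullable columns c1, c2, c3, in that order, and one record stores (1, NULL, NULL).

create table t_null (c1 int, c2 int, c3 int) engine = innodb;
insert into t_null values (1, NULL, NULL);
-- NULL value list for this record (one byte, bits in reverse column order):
--   c3 = 1, c2 = 1, c1 = 0  ->  0b00000110
-- the five high bits are padding zeros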

Does every database table's row format have a "NULL value list"?

When all fields in a data table are defined as NOT NULL, the row format in the table will not have a NULL value list.
Therefore, when designing a database table, it is generally recommended to set fields as NOT NULL, which can save at least 1 byte of space (the NULL value list occupies at least 1 byte of space).

Is the "NULL value list" fixed at 1 byte of space? If so, how is it represented when a record has 9 fields, all of which are NULL?

The space for the "NULL value list" is not fixed at 1 byte.
When a record has 9 fields that are all NULL, a "NULL value list" of 2 bytes will be created, and so on.

Record Header Information

The record header information contains a lot of content, and I won't list them all here, but I will mention a few important ones:

  • delete_mask: Indicates whether this data has been deleted. From this, we can see that when we execute delete to remove records, the records are not actually deleted; instead, the delete_mask of this record is marked as 1.
  • next_record: The position of the next record. From this, we can see that records are organized through a linked list. As mentioned earlier, it points to the position between the next record's "record header information" and "actual data," making it convenient to read left for record header information and right for actual data.
  • record_type: Indicates the type of the current record, where 0 represents a normal record, 1 represents a non-leaf node record of the B+ tree, 2 represents the minimum record, and 3 represents the maximum record.
Actual Data of the Record#

The actual data part of the record includes not only the fields we defined but also three hidden fields: row_id, trx_id, and roll_pointer. Let's look at what these three fields are.
(Figure: the three hidden fields in a record's actual data: row_id, trx_id, roll_pointer.)

  • row_id: If we specified a primary key or unique constraint column when creating the table, there will be no hidden row_id field. If neither a primary key nor a unique constraint is specified, InnoDB will add a hidden row_id field to the record. The row_id is not mandatory and occupies 6 bytes.
  • trx_id: Transaction ID, indicating which transaction generated this data. The trx_id is mandatory and occupies 6 bytes.
  • roll_pointer: The pointer to the previous version of this record. The roll_pointer is mandatory and occupies 7 bytes.

The trx_id and roll_pointer are related to the MVCC mechanism.

How does MySQL handle row overflow?#

The basic unit of interaction between disk and memory in MySQL is the page, which is 16KB (16384 bytes) by default. However, a varchar(n) column can store up to 65532 bytes, and large object types like TEXT and BLOB may hold even more data, so a single page may not be able to hold an entire record. This situation is called row overflow, and the excess data is stored in a separate "overflow page."
Normally, InnoDB stores data in "data pages"; when row overflow occurs, the overflowing part of the data is stored in "overflow pages" instead.
In the Compact row format, when row overflow occurs, only part of the column's data is kept in the record's actual data section, while the remaining data is placed in the "overflow page." The actual data section stores a 20-byte pointer to the address of the overflow page, so the remaining data can be found. This is roughly illustrated as follows:
(Figure: Compact row overflow: part of the column's data stays in the record, with a 20-byte pointer to the overflow page.)
The above shows how the Compact row format handles row overflow.
The Compressed and Dynamic row formats are very similar to Compact, with the main difference being how they handle overflow data.
These two formats adopt a complete row overflow method, where the actual data section does not store part of the data for that column but only stores a 20-byte pointer to the overflow page. The actual data is stored entirely in the overflow page, looking like this:
(Figure: Dynamic/Compressed full row overflow: the record stores only a 20-byte pointer to the overflow page.)
