.avif)
.avif)
Partitioning in TiDB
As data volumes expand, managing massive tables becomes a significant hurdle in distributed databases. Without efficient organization, queries slow down, and scaling becomes resource-intensive.
TiDB, a distributed MySQL-compatible HTAP database, addresses this through Table Partitioning. This feature is tightly integrated with TiKV, TiDB's distributed storage layer, allowing specific ranges of data to be processed independently.
In this post, we explore how partitions work internally and detail the various partition types currently supported by TiDB.
How Partitioning Works Internally in TiDB
Before reviewing specific types, it is important to understand the mechanics under the hood.
- Key Mapping: Partitions are essentially logical ranges of keys. TiDB maps these partition ranges to specific TiKV regions (the basic unit of storage).
- Optimizer Logic: When a SQL query runs, the optimizer analyzes the conditions to identify relevant partitions.
- Partition Pruning: Irrelevant partitions are ignored entirely. The optimizer scans only the partitions that match the query criteria. This process, known as Partition Pruning, dramatically reduces I/O and latency.
Partitioning Types :
TiDB currently supports Range, Range COLUMNS, List, List COLUMNS, Hash, and Key partitioning.
Note: If an unsupported partitioning type from MySQL is used, TiDB may return an error or ignore the definition depending on the version compatibility.
1. Range Partition
In Range partitioning, each partition contains rows where the partitioning expression value falls within a specific range. These ranges are defined using the VALUES LESS THAN expression.
CREATE TABLE employees (
id INT NOT NULL,
name VARCHAR(30),
dept VARCHAR(30),
job_code INT,
dept_id INT NOT NULL
)
PARTITION BY RANGE (dept_id) (
PARTITION p0 VALUES LESS THAN (5),
PARTITION p1 VALUES LESS THAN (10),
PARTITION p2 VALUES LESS THAN (15),
PARTITION p3 VALUES LESS THAN (20)
);From the above, all corresponding data to employees whose dept_id is 1 to 4 is stored in p0, 5 to 9 is stored in p1, and so on. Range partitioning requires the partitions to be ordered from lowest to highest.
Handling Out-of-Bounds Data:
Partitions must be ordered from lowest to highest. If you attempt to insert a dept_id of 25, the insert will fail because no range covers it. To handle this, use MAXVALUE.
CREATE TABLE employees (
id INT NOT NULL,
name VARCHAR(30),
dept VARCHAR(30),
job_code INT,
dept_id INT NOT NULL
)
PARTITION BY RANGE (dept_id) (
PARTITION p0 VALUES LESS THAN (5),
PARTITION p1 VALUES LESS THAN (10),
PARTITION p2 VALUES LESS THAN (15),
PARTITION p3 VALUES LESS THAN (20),
PARTITION p4 VALUES LESS THAN MAXVALUE
);MAXVALUE represents an integer value that is larger than all other integer values. Now, all records whose dept_idis equal to or larger than 20 are stored in the p4 partition.
Use case:
Use RANGE partitioning when data naturally grows in ordered numeric ranges, such as department IDs, invoice numbers, or incremental business IDs.
Example:
Query employees from a specific department range:
SELECT * FROM employees WHERE dept_id BETWEEN 5 AND 9;Result: TiDB scans only partition p1 to fetch the data.
2. Range COLUMNS partitioning
Range COLUMNS partitioning is a variation of Range partitioning. We can use one or more columns as partition keys. This supports the following data types (INT , CHAR , VARCHAR , DATE & DATETIME). Apart from these data types , everything else will not be supported.
CREATE TABLE users (
id BIGINT,
country VARCHAR(50),
joined_date DATE
)
PARTITION BY RANGE COLUMNS (signup_date) (
PARTITION p1 VALUES LESS THAN ('2023-01-01'),
PARTITION p2 VALUES LESS THAN ('2024-01-01')
);Use case:
Using RANGE COLUMNS partitioning for time-based data like user signups, orders, or logs where queries frequently filter by date.
Example:
Fetch the data from users table who joined in 2023
SELECT * FROM users WHERE joined_date < '2024-01-01';From the example , Only partition p1 is scanned due to partition pruning.
3.RANGE INTERVAL Partitioning
Introduced in TiDB v6.3.0, this extension allows you to easily generate evenly spaced partitions without manually writing every VALUES LESS THAN clause.
Example:
CREATE TABLE employee (
id int unsigned NOT NULL,
name varchar(30),
dept varchar(30),
job_code int,
store_id int NOT NULL
) PARTITION BY RANGE (id)
INTERVAL (20) FIRST PARTITION LESS THAN (20) LAST PARTITION LESS THAN (100) MAXVALUE PARTITION TiDB automatically creates the table structure as follows:
tidb:4000> show create table employee \G
--------------
show create table employee
--------------
*************************** 1. row ***************************
Table: employee
Create Table: CREATE TABLE `employee` (
`id` int unsigned NOT NULL,
`name` varchar(30) DEFAULT NULL,
`dept` varchar(30) DEFAULT NULL,
`job_code` int DEFAULT NULL,
`store_id` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (`id`)
(PARTITION `P_LT_20` VALUES LESS THAN (20),
PARTITION `P_LT_40` VALUES LESS THAN (40),
PARTITION `P_LT_60` VALUES LESS THAN (60),
PARTITION `P_LT_80` VALUES LESS THAN (80),
PARTITION `P_LT_100` VALUES LESS THAN (100),
PARTITION `P_MAXVALUE` VALUES LESS THAN (MAXVALUE))
1 row in set (0.00 sec)Use case:
Using RANGE INTERVAL , TiDB can auto-generate evenly spaced partitions when IDs or timestamps grow continuously
Example:
Adding a record in employee table with growing ID’s
INSERT INTO employee VALUES (45, 'Prem', 'IT', 101, 3); In the above example , data is automatically inserted in P_LT_60 Partition
4. List partitioning
List partitioning assigns rows to partitions based on a specific set of values defined using PARTITION ... VALUES IN (...). Unlike Range, these values do not need to be contiguous.
CREATE TABLE users(
id BIGINT,
region VARCHAR(20)
)
PARTITION BY LIST (region) (
PARTITION p_asia VALUES IN ('IN', 'CN', 'JP'),
PARTITION p_eu VALUES IN ('DE', 'FR', 'UK')
);Use case :
LIST partitioning is used when data is classified into a fixed set of recognized categories, such as regions, countries, or business units.
Example:
Fetch the data from users table only from ‘IN’ region.
SELECT * FROM users WHERE region = 'IN';Result: Only partition p_asia is scanned.
5. List COLUMNS partitioning
List partitioning has a variation called List COLUMNS partitioning. Multiple columns can be used as partition keys. The columns in the text, DATE, and DATETIME data types can be used as partition columns in addition to the integer data type.
Example:
CREATE TABLE emp_details (
id INT NOT NULL,
name VARCHAR(30),
dept VARCHAR(30),
hired DATE NOT NULL DEFAULT '2018-01-01',
store_id INT,
city VARCHAR(15)
)
PARTITION BY LIST COLUMNS(city) (
PARTITION pRegion_1 VALUES IN('Chennai', 'Mumbai', 'Hyderabad'),
PARTITION pRegion_2 VALUES IN('Kolkata', 'Goa', 'Bengaluru')
);6. HASH partitioning
Hash partitioning divides a large table into smaller partitions by applying a hash function to a specified column (like an ID), distributing data evenly across partitions to balance write loads, prevent hotspots, and improve performance for high-write scenarios without clear data rangesUnlike range partitioning, in hash partitioning you just need to specify the number of partitions, whereas in range you have to specify the column values for each partition.
In order to create a table with a HASH partition, you just add PARTITION BY HASH(expr) at the end of create table statement.
Example:
CREATE TABLE employee (
id INT NOT NULL,
name VARCHAR(30),
dept VARCHAR(30),
hired DATE NOT NULL DEFAULT '2018-01-01',
store_id INT
)
PARTITION BY HASH(store_id)
PARTITIONS 4;Note: if the PARTITIONS num is not specified, the default number of partitions is 1.
Use case :
HASH Partitions can be used for high-write workloads where data access is random and even distribution is more important than range filtering
Example:
Insert concurrent employee records:
INSERT INTO employee VALUES (101, 'Alex', 'HR', '2024-01-10', 12);In the above example, TiDB hashes store_id and distributes rows evenly across partitions.
7. Key partitioning
Key Partitioning a method to distribute data across multiple partitions using a hash function on one or more specified columns
A PARTITION BY KEY (columnList) must be added to the CREATE TABLE statement in order to build a key-partitioned table. A column list containing one or more column names is called columnList. Each column in the list may be any data type, with the exception of BLOB, JSON, and GEOMETRY (note that GEOMETRY is not supported by TiDB).
Additionally, you may need to append the definition of the partition names or PARTITIONS num, where num is a positive number that indicates how many partitions a table is divided into.
Example:
CREATE TABLE employee (
id INT NOT NULL,
name VARCHAR(30),
dept VARCHAR(30),
hired DATE NOT NULL DEFAULT '2018-01-01',
store_id INT
)
PARTITION BY KEY(store_id)
PARTITIONS 4;Note : if the PARTITIONS num is not specified, the default number of partitions is 1.
Use case :
Use KEY partitioning when you want TiDB to automatically hash complex or composite keys for balanced data distribution.
Example:
Fetch the data from employee table based on store
SELECT * FROM employee WHERE store_id = 8; In the above example, TiDB hashes the key and directly accesses the relevant partition.
Global Keys with Partitioned Tables in TiDB
A Global Key (usually a Global Index or Primary Key) is a critical concept in TiDB. It spans all partitions of a table, ensuring uniqueness across the entire dataset, not just within a single partition.
Why Global Keys Are Important
When a table is partitioned:
- Data is physically split across partitions
- Without a global index, duplicate values can exist across different partitions
- Queries using non-partition keys may need to scan multiple partitions
Global keys solve this by:
- Enforcing global uniqueness
- Enabling fast point lookups
- Simplifying application logic
Example:
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY,
order_date DATE,
amount DECIMAL(10,2)
)
PARTITION BY RANGE COLUMNS(order_date) (
PARTITION p2024_01 VALUES LESS THAN ('2024-02-01'),
PARTITION p2024_02 VALUES LESS THAN ('2024-03-01')
);From the above example:
- order_id is globally unique across all partitions
- TiDB maintains a global primary key index
- Point queries are efficient
Use case :
Example:
Fetch the data from orders table based on order_id
SELECT * FROM orders WHERE order_id = 987654321;In this scenario, TiDB uses the global primary key index to locate the row immediately, bypassing the need to check all date-based partitions.
Partitioning Management:
Add partition:
ALTER TABLE employee ADD PARTITION (PARTITION `p0` VALUES LESS THAN (200));Drop Partition
ALTER TABLE employee DROP PARTITION p0; Truncate Partition
ALTER TABLE employee TRUNCATE PARTITION p0; Reorg Partition
Reorganize partition is used to merge, split and restructuring of existing partitions to adapt to changing data distribution online.
Split Partition
ALTER TABLE employee REORGANIZE PARTITION `p0to20` INTO
(PARTITION p10 VALUES LESS THAN (10),
PARTITION p15 VALUES LESS THAN (15),
PARTITION p20 VALUES LESS THAN (20),
PARTITION pMax VALUES LESS THAN (MAXVALUE));Merge Partition
ALTER TABLE orders
REORGANIZE PARTITION p2023_01, p2023_02, p2023_03 INTO (
PARTITION p2023_Q1 VALUES LESS THAN ('2023-04-01'));Partitioning in TiDB is a powerful tool for optimizing large-scale distributed tables. By selecting the right partition type—whether Range for history data, List for categories, or Hash for load balancing—you can significantly reduce query latency and operational overhead. With features like Global Keys and online partition reorganization, TiDB ensures that scaling storage does not compromise performance or data integrity.
Struggling with large table performance or complex partitioning strategies? The Mydbops team provides expert TiDB consulting and Remote DBA services to ensure your distributed database scales efficiently.

.avif)

.avif)

.avif)
.avif)