Choosing the right data type for a column is important for designing a good MySQL table. In general, a simple and small type is the first choice since operations over a simple and small type cost fewer CPU resources and a small type also requires less space to store or cache. However, the chosen type must have a range big enough to cover all possible values generated from the application by now or in the near future.
This note is about basic data types in MySQL, namely number, string, datetime, I'll discuss its range, store requirement, precision and few best practices. Let's begin!
The numerical types in MySQL fall into one of four categories: integer, floating-point, decimal and bit.
Data Type | storage(bytes) | signed range | unsigned range |
---|---|---|---|
TINYINT | 1 | -128, 127 | 0, 255 |
SMALLINT | 2 | -32768, 32767 | 0, 65535 |
MEDIMINT | 3 | -8388608, 8388607 | 0, 16777215 |
INT | 4 | -2147483648, 2147483647 | 0, 4294967295 |
BIGINT | 8 | -263, 263-1 | 0, 264 -1 |
FLOAT | 4 | -3.402823466E+38, -1.175494351E-38 | |
DOUBLE | 8 | -1.7976931348623157E+308, -2.2250738585072014E-308 | |
DECIMAL(M, D) | varies | varies, 0 < M <=65, 0 < D <=30, M is total number of digits and D is the number of digits after the decimal point |
Table 1 A simple summary for basic numerical types
Floating-point type like FLOAT
and DOUBLE
store approximate numeric data
values, while DECIMAL
store exact numeric data values. Here is an
illustrative example:
mysql> CREATE TABLE mytable(
-> id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
-> point FLOAT,
-> salary DECIMAL(10,4)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql> insert into mytable (point, salary) VALUE (0.1, 0.1);
Query OK, 1 row affected (0.01 sec)
mysql> select point, point = 0.1, point*10, salary, salary = 0.1, salary*10 from mytable where id = 1;
+-------+-------------+--------------------+--------+--------------+-----------+
| point | point = 0.1 | point*10 | salary | salary = 0.1 | salary*10 |
+-------+-------------+--------------------+--------+--------------+-----------+
| 0.1 | 0 | 1.0000000149011612 | 0.1000 | 1 | 1.0000 |
+-------+-------------+--------------------+--------+--------------+-----------+
1 row in set (0.00 sec)
Example 1 Differences between FLOAT
and DECIMAL
The reason for the excepted results(point != 0.1 and point*10=1.0000000149011612)
is that the point column is a FLOAT
type which is based on IEEE
floating-point standard[3] and not designed to represent exact numeric
values, when 0.1 is inserted, it's stored as a value very close to
0.1(maybe 0.100000001) but never the exact 0.1.
Even though the floating-point errors is really small but not negligible in a
few situations for instance finance. Under that circumstances, DECIMAL
is the
right type to use but not the only type. In practice, people tends to use
BIGINT
to store the financial data in a small unit like cent or micro-cent.
The following definition of id column causes some misunderstanding.
CREATE TABLE (
id INT(8) NOT NULL AUTO_INCREMENT,
name CHAR(8),
salary DECIMAL(8),
);
Example 2 peculiar syntax of INT(M)
CHAR(8)
specifies a string and its maximum character number is 8
DECIMAL(8)
specifies an exact number and its total number of digits is 8
INT(8)
specifies an integer and its display width is 8
In other words, M
in INT(M)
has nothing to do with its range or storage,
INT(M)
is always a 4-byte integer.
Type | Storage(bytes) | Maximum Length |
---|---|---|
CHAR(M) | M*w | 0<=M<=255 characters |
VARCHAR(M) | L+1 or L+2 | 0<=M<=65535 bytes |
TINYTEXT | L+1 | L < 28 bytes |
TEXT | L+2 | L < 216 bytes |
MEDIUMTEXT | L+4 | L < 224 bytes |
LONGTEXT | L+4 | L < 232 bytes |
Table 2 A simple Summary of string types
There are two big categories of string types: CHAR
and VARCHAR
for relatively
short string and TEXT
for relatively long string. It's easy to choose an
appropriate string type based on the length requirement. But the maximum
length and storage requirement is sophisticated and depend on several
factors[1]. I'll explain it in details.
CHAR
is fixed-length string, all you store is string data. By contrast,
VARCHAR
is variable-length, the actual length is stored before the string
data.
For example, if we store the string sql
and hello
into a CHAR(8)
and
VARCHAR(8)
, their storage layout is as following:
0 1 2 3 4 5 6 7
CHAR(8) |h|e|l|l|o| | | | <-- 8 bytes with 3 right-padded spaces
CHAR(8) |s|q|l| | | | | | <-- 8 bytes with 5 right-padded spaces
VARCHAR(8) |5|h|e|l|l|0| <-- 1+5=6 bytes
VARCHAR(8) |3|s|q|l| <-- 1+3=4 bytes
^
|
string length
Example 3 Layout of sql and hello in CHAR(8) and VARCHAR(8)
Apart from this, trailing spaces are removed from CHAR columns upon retrieval.
The maximum length of a CHAR(M)
column is M character. The storage required
depends on M and the character set you use, to put it simple, the number of
characters(length) and the number of bytes for a single character. The length
can be any value 0 to 255, the number of bytes for a single character varies
over different character sets and even different within the same character
set(for instance: utf-8).
The M
in VARCHAR(M)
is not about character numbers but specifies the
maximum bytes of the string. The maximum of M
is subject to the maximum row
size (65,535 bytes, which is shared among all columns) and the character set
used.
The TEXT
type is used to store large strings, the four TEXT
types are:
TINYTEXT
, TEXT
, MEDIUMTEXT
, LONGTEXT
. These four types only differ in
the maximum length. The maximum length of TINYTEXT
is 255=28-1 and only
requires 1 byte to store the length, the maximum length of TEXT
is
65535=216-1 and requires 2 bytes to store the length, and so on.
As you can see, LONGTEXT
can store a string even exceeds aforementioned
maximum row size 65535 due to the "external" storage area for a large string
content, TEXT
columns only contribute 9 to 12 bytes toward the row size limit.
Due to the possible large length of TEXT
, there are some constrains for them.
For instance, you must specify an index prefix length for indexes on TEXT
;
Only the first 1024(in default settings) bytes of the the column are used when
sorting.
Due to the large length issue of TEXT
, you also should take care when dealing
with it. Because retrieve a row with large TEXT
from disk and cache it is
"expensive". Avoid it when possible, don't store large string in a database.
Instead, write them to a separate object data store and use the table to track
the location. Or you can put the LONGTEXT
column to a different table other
than the actively used table.
The maximum characters a LONGTEXT
column can hold is subject to lots of
factors: character set, the configured maximum packet size in the client/server
protocol, available memory, constraints from different part of the whole
system....
+------+ +---------+ +----------+ +--------+ +--------+
| | | | | MySQL | | MySQL | | |
| DISK | <--> | Storage | <--> | internal | <--> | client | <-----> | Client |
| | | Engine | | API | | API | | |
+------+ +---------+ +----------+ +--------+ +--------+
Figure 1 Over-simplified MySQL architecture
Type | Storage(bytes) 5.6.4 version | Range |
---|---|---|
YEAR | 1 | 1901 to 2155 |
DATE | 3 | 1000-01-01 to 9999-12-31 |
TIME | 3+fractional seconds storage | -838:59:59 to 838:59:59 |
DATETIME | 5+fractional seconds storage | 1000-01-01 00:00:00 to 9999-12-31 23:59:59 |
TIMESTAMP | 4+fractional seconds storage | 1970-01-01 00:00:01 to 2038-01-19 03:14:07 UTC |
Fractional Seconds Precision | Storage(bytes) |
---|---|
0 | 0 |
1,2 | 1 |
3,4 | 2 |
5,6 | 3 |
Table 3 A simple summary for Data and time
A DATETIME
type is good, but it doesn't contain any timezone information
which is very likely to cause errors in an international circumstances. So you
may choose the TIMESTAMP
type since it's UTC based but the maximum supported
time is roughly 2038. So in practice, you can just store a 64bit timestamp in a
BIGINT
type.
Other types that's not mentioned above:
BIT
BINARY
, BINARY
, BLOB
Some Key takeaways from this notes:
FLOAT
represents approximate value and DECIMAL
represents exact value
int(M)
specify its display width, not storage bytes
CHAR(M)
is fixed-size M characters while storage for VARCHAR
is variable
VARCHAR
is subject to the row size limit 65535 bytes which is
shared among all columns
TEXT
is expensive, avoid it when possible
1. MySQL 8.0 Reference Manual: Data Type Storage Requirements 2. MySQL 8.0 Reference Manual: Date Types 3. Wikipeida. IEEE 754
Written by Songziyu @China Nov. 2023