Textual Data Types in MySQL: Understanding CHAR, VARCHAR and TEXT
A look at the range, storage and use cases.
Prerequisites
Here is a quick list of terminologies you need to know to get the most out of this article:
Byte - A unit of memory size
Character set - A complete set of characters/symbols and their number codes that can be recognised by a computer system. The default MySQL server character sets are
utf8mb4
andutf8mb4_0900_ai_ci
Non-Binary data - This is the source data in the form of alphabets, digits or special characters which are provided by the user.
Binary data - This is the data in form of 0s and 1s that is understood by computers. When a character that makes up a string is stored (non-binary), it is its representative number code that is actually stored as a binary number.
Introduction
A data type is a classification that specifies which type of value a database field/column has and what type of mathematical, relational and logical operations can be applied to it without causing an error. For example, a field that has its values specified as any of the textual data types cannot allow for mathematical operations to be applied to it.
Textual data types are also known as string data types. They define a linear sequence of characters, words, numbers or other data.
Textual data types in MySQL include:
CHAR
VARCHAR
TEXT
BINARY
VARBINARY
BLOB
ENUM
SET
But for this article, you are going to familiarize yourself with the first three. CHAR
, VARCHAR
and TEXT
data types are probably the most common string data types in MySQL that you are going to use for most of your database design projects.
Why choose the right data type?
It is important to understand and know all about these data types because to store data in a database management system like MySQL, you need to specify a field and a data type that defines the values in that field. The field stores the value entered and the data type validates that the value is as expected. This ensures that the data is collected in the preferred format and that the value of each field is interpreted by the computer as expected.
Choosing the right data type also allows for efficient storage. This is because database fields are allocated appropriate data types that cater to their character length. Allocating the wrong data type can cause a lot of unnecessary memory to be preserved which is wasteful.
Textual Data Types
The CHAR
type
Values in CHAR
columns are Fixed-Length Character Strings. The length can be specified as a value from 0 to 255. CHAR
types are declared with a length in parentheses that indicates the fixed number of characters you want to store. For example, a database field/column "first_name" that is declared as CHAR(25)
is fixed to 25 characters only.
When CHAR
values are stored, they are right-padded with spaces to the specified length. When CHAR
values are retrieved, trailing spaces are removed.
You should use CHAR
when you expect the data values in a column to be strictly the same length. For example, data values in a phone number column and registration number column can always be expected to be the same length so it makes sense for you to declare them as CHAR
.
The VARCHAR
type
Values in VARCHAR
columns are variable-length strings. The length can be specified as a value from 0 to 65,535. VARCHAR
types are declared with a length in parentheses that indicates the maximum number of characters you want to store. For example, a database field/column "first_name" that is declared as VARCHAR(25)
can hold up to 25 characters.
VARCHAR
values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
VARCHAR
values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved.
The storage requirements of VARCHAR as a variable length type depend on these factors:
The actual length of the column value
The column's maximum possible length
The character set used for the column, because some character sets contain multibyte characters
The TEXT
type
Similar to VARCHAR
, values in TEXT
columns are also variable-length strings. TEXT
values are treated as non-binary strings (character strings). They have a character set other than binary. The four TEXT
types are:
TINYTEXT
TEXT
MEDIUMTEXT
LONGTEXT
These TEXT
types differ only in the maximum length of the values they can hold.
TEXT
values can be extremely long therefore it may be desirable to store binary data such as media files in TEXT
columns.
TINYTEXT
type
The TINYTEXT
data type is the smallest of the TEXT
family and is built to efficiently store short information strings. It shares the same character length as VARCHAR
. TINYTEXT
type can store up to 255 bytes (expressed as 2^8 -1) or 255 characters and requires a 1 byte overhead.
TINYTEXT
type can be used to store things like short summaries, URL links, and other shorter objects.
TEXT
type
The standard TEXT
data type is sufficiently capable of handling typical long-form text content. TEXT
type top out at 64 KB (expressed as 2^16 -1) or 65,535 characters and requires a 2 byte overhead.
It is sufficiently large enough to hold text for something like an article, but would not be sufficient for holding the text of an entire book.
MEDIUMTEXT
type
The MEDIUMTEXT
type is useful for storing larger text strings like white papers, books, and code backup. These data objects can be as large as 16 MB (expressed as 2^24 -1) or 16,777,215 characters and require 3 bytes of overhead storage.
LONGTEXT
type
The LONGTEXT
type is for use in extreme text string storage use cases. It is a viable option when the MEDIUMTEXT
type is not big enough. Computer programs and applications often reach text lengths in the LONGTEXT
range. These data objects can be as large as 4 GB (expressed as 2^32 -1) and store up to 4,294,967,295 characters with 4 bytes of overhead storage,
Conclusion
The textual data types are similar but differ in the way they are stored and retrieved. They also differ in maximum length and in whether trailing spaces are retained.
Although this post covers only three fundamental string data types in detail, you can learn more about other string data types as well as other general data types like Numeric, JSON and Date and Time from the MySQL official documentation.
If you learnt a thing or two from reading this, please leave a comment and share this post with others.
Happy further learning ๐.