Textual Data Types in MySQL: Understanding CHAR, VARCHAR and TEXT

Textual Data Types in MySQL: Understanding CHAR, VARCHAR and TEXT

A look at the range, storage and use cases.

ยท

6 min read

Prerequisites

Here is a quick list of terminologies you need to know to get the most out of this article:

  • Byte - A unit of memory size

  • Character set - A complete set of characters/symbols and their number codes that can be recognised by a computer system. The default MySQL server character sets are utf8mb4 and utf8mb4_0900_ai_ci

  • Non-Binary data - This is the source data in the form of alphabets, digits or special characters which are provided by the user.

  • Binary data - This is the data in form of 0s and 1s that is understood by computers. When a character that makes up a string is stored (non-binary), it is its representative number code that is actually stored as a binary number.

Introduction

A data type is a classification that specifies which type of value a database field/column has and what type of mathematical, relational and logical operations can be applied to it without causing an error. For example, a field that has its values specified as any of the textual data types cannot allow for mathematical operations to be applied to it.

Textual data types are also known as string data types. They define a linear sequence of characters, words, numbers or other data.

Textual data types in MySQL include:

  • CHAR

  • VARCHAR

  • TEXT

  • BINARY

  • VARBINARY

  • BLOB

  • ENUM

  • SET

But for this article, you are going to familiarize yourself with the first three. CHAR, VARCHAR and TEXT data types are probably the most common string data types in MySQL that you are going to use for most of your database design projects.

Why choose the right data type?

It is important to understand and know all about these data types because to store data in a database management system like MySQL, you need to specify a field and a data type that defines the values in that field. The field stores the value entered and the data type validates that the value is as expected. This ensures that the data is collected in the preferred format and that the value of each field is interpreted by the computer as expected.

Choosing the right data type also allows for efficient storage. This is because database fields are allocated appropriate data types that cater to their character length. Allocating the wrong data type can cause a lot of unnecessary memory to be preserved which is wasteful.

Textual Data Types

The CHAR type

Values in CHAR columns are Fixed-Length Character Strings. The length can be specified as a value from 0 to 255. CHAR types are declared with a length in parentheses that indicates the fixed number of characters you want to store. For example, a database field/column "first_name" that is declared as CHAR(25) is fixed to 25 characters only.

When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.

You should use CHAR when you expect the data values in a column to be strictly the same length. For example, data values in a phone number column and registration number column can always be expected to be the same length so it makes sense for you to declare them as CHAR.

The VARCHAR type

Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 65,535. VARCHAR types are declared with a length in parentheses that indicates the maximum number of characters you want to store. For example, a database field/column "first_name" that is declared as VARCHAR(25) can hold up to 25 characters.

VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.

VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved.

The storage requirements of VARCHAR as a variable length type depend on these factors:

  • The actual length of the column value

  • The column's maximum possible length

  • The character set used for the column, because some character sets contain multibyte characters

The TEXT type

Similar to VARCHAR, values in TEXT columns are also variable-length strings. TEXT values are treated as non-binary strings (character strings). They have a character set other than binary. The four TEXT types are:

  • TINYTEXT

  • TEXT

  • MEDIUMTEXT

  • LONGTEXT

These TEXT types differ only in the maximum length of the values they can hold.

TEXT values can be extremely long therefore it may be desirable to store binary data such as media files in TEXT columns.

TINYTEXT type

The TINYTEXT data type is the smallest of the TEXT family and is built to efficiently store short information strings. It shares the same character length as VARCHAR. TINYTEXT type can store up to 255 bytes (expressed as 2^8 -1) or 255 characters and requires a 1 byte overhead.

TINYTEXT type can be used to store things like short summaries, URL links, and other shorter objects.

TEXT type

The standard TEXT data type is sufficiently capable of handling typical long-form text content. TEXT type top out at 64 KB (expressed as 2^16 -1) or 65,535 characters and requires a 2 byte overhead.

It is sufficiently large enough to hold text for something like an article, but would not be sufficient for holding the text of an entire book.

MEDIUMTEXT type

The MEDIUMTEXT type is useful for storing larger text strings like white papers, books, and code backup. These data objects can be as large as 16 MB (expressed as 2^24 -1) or 16,777,215 characters and require 3 bytes of overhead storage.

LONGTEXT type

The LONGTEXT type is for use in extreme text string storage use cases. It is a viable option when the MEDIUMTEXT type is not big enough. Computer programs and applications often reach text lengths in the LONGTEXT range. These data objects can be as large as 4 GB (expressed as 2^32 -1) and store up to 4,294,967,295 characters with 4 bytes of overhead storage,

Conclusion

The textual data types are similar but differ in the way they are stored and retrieved. They also differ in maximum length and in whether trailing spaces are retained.

Although this post covers only three fundamental string data types in detail, you can learn more about other string data types as well as other general data types like Numeric, JSON and Date and Time from the MySQL official documentation.

If you learnt a thing or two from reading this, please leave a comment and share this post with others.

Happy further learning ๐Ÿ˜€.

ย