Base64 is a binary-to-text encoding scheme that represents binary data (like images, files, or even plain text) as a string of ASCII characters. It's commonly used when binary data needs to be stored in or transferred over media that are designed to handle text. Because the output uses only a small set of printable characters, the data survives transport through such channels without being modified.
Common use cases include embedding image data directly into HTML or CSS files, sending binary data in XML or JSON payloads, and encoding email attachments.
The encoding process works as follows:
- Convert to Binary: The original data (e.g., a text string) is first converted into a sequence of bytes (usually based on its UTF-8 representation). Each byte consists of 8 bits.
- Group into 24 bits: These bytes are then grouped together. The standard approach takes 3 bytes (3 * 8 = 24 bits) at a time.
- Split into 6-bit chunks: The 24-bit group is divided into four 6-bit chunks (4 * 6 = 24 bits).
- Map to Base64 Characters: Each 6-bit chunk represents a number between 0 and 63. This number is used as an index to look up a character in the Base64 alphabet table (which includes A-Z, a-z, 0-9, +, and /). These four characters form the encoded output for the original 3 bytes.
- Padding: If the original data's length isn't a multiple of 3 bytes, the final group is short and padding is added. If two input bytes are left, they form 16 bits. These are split into three chunks, the last of which holds only 4 bits and is filled out to 6 bits with two zero bits. The three chunks are mapped to Base64 characters, and one '=' padding character is appended. If only one input byte is left (8 bits), it's split into two chunks, the second of which holds only 2 bits and is filled out with four zero bits. These are mapped to two Base64 characters, and two '=' padding characters are appended. The padding ensures the final encoded string length is always a multiple of 4.
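The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production encoder; the names `ALPHABET` and `b64_encode` are made up for this example, and in practice the standard library's `base64.b64encode` does the same job.

```python
# Standard Base64 alphabet: index 0-63 maps to one character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    """Encode bytes to Base64 by following the steps described above."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]        # up to 3 bytes (24 bits)
        pad = 3 - len(group)         # 0, 1, or 2 missing bytes
        # Zero-fill a short final group so it can be treated as 24 bits.
        n = int.from_bytes(group.ljust(3, b"\x00"), "big")
        # Split the 24 bits into four 6-bit indices and look each one up.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that came only from the zero fill become '=' padding.
        if pad:
            chars[-pad:] = "=" * pad
        out.append("".join(chars))
    return "".join(out)


# Text is converted to bytes first (UTF-8 here), then encoded.
print(b64_encode("Man".encode("utf-8")))  # TWFu
print(b64_encode("Ma".encode("utf-8")))   # TWE=
print(b64_encode("M".encode("utf-8")))    # TQ==
```

The three sample inputs cover the three cases: a full 3-byte group, two leftover bytes (one '='), and one leftover byte (two '=').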
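Decoding reverses this process: it takes 4 Base64 characters at a time, maps them back to their 6-bit values, combines them into 24 bits, and then splits them back into the original 3 bytes. Padding characters indicate how many original bytes were encoded in the final group: one '=' means the group holds two bytes, and two '=' means it holds one byte. The zero bits that were added to fill out the last chunk are discarded.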
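A matching decoder can be sketched the same way. Again this is only an illustration under simplifying assumptions: `ALPHABET`, `INDEX`, and `b64_decode` are hypothetical names, the input is assumed to be well-formed standard Base64 with padding, and no check is made that '=' appears only at the very end. The standard library's `base64.b64decode` is the practical choice.

```python
# Standard Base64 alphabet and its reverse lookup (character -> 6-bit value).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def b64_decode(text: str) -> bytes:
    """Decode padded Base64 text back to bytes (assumes well-formed input)."""
    if len(text) % 4 != 0:
        raise ValueError("padded Base64 length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        pad = quad.count("=")        # 0, 1, or 2 padding characters
        # Rebuild the 24-bit group; '=' contributes six zero bits.
        n = 0
        for ch in quad:
            n = (n << 6) | (0 if ch == "=" else INDEX[ch])
        # Split into 3 bytes and drop the ones that were never in the input.
        out += n.to_bytes(3, "big")[:3 - pad]
    return bytes(out)


print(b64_decode("TWFu"))  # b'Man'
print(b64_decode("TWE="))  # b'Ma'
print(b64_decode("TQ=="))  # b'M'
```

A character outside the alphabet raises `KeyError` in this sketch; a real decoder would report malformed input more carefully.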