Bits, Bytes & Character

66; 105; 116; 115; 44; 32; 66; 121; 116; 101; 115; 32; 38; 32; 67; 104; 97; 114; 97; 99; 116; 101; 114

Can you guess what this is? It is how your computer stores the title of this post. Computers do not store actual text, only numbers. So someone created a mapping between numbers and letters and called it the ASCII table. That was back in 1963.
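If you want to check for yourself, here is a minimal Python sketch that turns the numbers from the top of this post back into text. The variable names are mine, and Python is just my choice for illustration.

```python
# Byte values copied from the top of this post.
codes = [66, 105, 116, 115, 44, 32, 66, 121, 116, 101, 115, 32, 38, 32,
         67, 104, 97, 114, 97, 99, 116, 101, 114]

# Interpret each number as one ASCII character.
header = bytes(codes).decode("ascii")
print(header)  # Bits, Bytes & Character

# And back again: text to numbers.
assert list(header.encode("ascii")) == codes
```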

The ASCII table maps the numbers 0–127 to letters, digits, and other characters, so each character fits into one byte. A byte can store the numbers 0–255, and later 8-bit extensions of ASCII used the upper half of that range for additional characters.

Now we need to travel forward in time, to the year 1991 to be exact. In the meantime, mankind had found out that there is more than plain English, and a new numbers-to-characters mapping was invented: Unicode. In its current form, it uses the numbers from 0 to 1 114 111, and computers need up to 4 bytes to store a single character.

And the encoding is defined in a clever way, so that not every character requires 4 bytes. The size varies between 1 and 4 bytes.
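A small Python sketch makes the variable width visible. I am assuming here that the encoding in question is UTF-8, which is the common 1-to-4-byte scheme. The sample characters are my own picks.

```python
# One sample character per width; written as escapes so the code points are explicit.
samples = [
    "A",           # U+0041, Latin letter
    "\u00e9",      # U+00E9, e with acute accent
    "\ud55c",      # U+D55C, a Hangul syllable
    "\U0001F600",  # U+1F600, an emoji
]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch!r}: {len(encoded)} byte(s)")

# Number of UTF-8 bytes per sample character.
widths = [len(ch.encode("utf-8")) for ch in samples]
print(widths)  # [1, 2, 3, 4]
```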

All this historical baggage is now causing us problems! In our perfect shiny low-code world!

The Database

Databases, too, store only numbers. When we define a text field as VARCHAR(255), you might think this field can store 255 characters. Well, not exactly! Where the database measures length in bytes, this field can hold 255 bytes. Once you start storing emojis or Korean characters, which require 3–4 bytes each, you could end up with only 63 actual characters. And a field of type TEXT with 65 535 bytes shrinks down to only 16 383 actual characters.
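To see the shrinking in action, here is a rough Python sketch. It assumes the column limit is measured in bytes and the text is stored as UTF-8. The function `chars_that_fit` is a hypothetical helper of mine, not a database function.

```python
def chars_that_fit(text: str, byte_limit: int, encoding: str = "utf-8") -> int:
    """Count how many leading characters of text fit into byte_limit bytes."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode(encoding))
        if used + size > byte_limit:
            break  # the next character would overflow the field
        used += size
        count += 1
    return count


print(chars_that_fit("a" * 300, 255))           # 255 (1 byte each)
print(chars_that_fit("\ud55c" * 300, 255))      # 85  (3 bytes each)
print(chars_that_fit("\U0001F600" * 300, 255))  # 63  (4 bytes each)
```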

Appian

When defining a record in Appian, we can define a text field as either normal text or long text. Normal text can hold 255 characters, and long text 4000. That seems reasonable, as long as you do not enter multibyte Unicode characters.

And, dear Appian engineering team, why on earth do you count characters in bytes???

This gets even more interesting when you start to create user interfaces. Text fields have the parameter characterLimit. It is meant to limit the number of characters a user can enter. But, despite its name, it counts bytes.

4 Korean characters require 12 bytes. Yes, I write this post in the year 2024, not in 1963.

And now …

To make our data models robust against multibyte characters, we need to strictly quadruple the number of characters we want to store to get the field size in bytes. In the database, this is simple. Inside Appian, where we only have the choice between 255 bytes (63 characters) and 4000 bytes (1000 characters), not so much.
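The sizing rule can be written down as a few lines of Python. This is only a sketch of the arithmetic: the labels "text" and "long text" and both function names are hypothetical, and the limits are simply the 255 and 4000 bytes mentioned in this post.

```python
from typing import Optional

MAX_BYTES_PER_CHAR = 4  # worst case for a single character in UTF-8

# Hypothetical labels for the two field sizes discussed in this post.
FIELD_LIMITS_BYTES = {"text": 255, "long text": 4000}


def bytes_needed(max_chars: int) -> int:
    """Worst-case field size in bytes for max_chars characters."""
    return max_chars * MAX_BYTES_PER_CHAR


def smallest_field(max_chars: int) -> Optional[str]:
    """Return the smallest field type that safely holds max_chars, or None."""
    needed = bytes_needed(max_chars)
    for name, limit in FIELD_LIMITS_BYTES.items():  # ordered small to large
        if needed <= limit:
            return name
    return None


print(bytes_needed(50))      # 200
print(smallest_field(63))    # text
print(smallest_field(64))    # long text
print(smallest_field(1001))  # None
```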

This limitation applies only to synced records. When you disable sync, Appian talks directly to the database, and this 255/4000 characters restriction does not exist.

When developing user interfaces, we need to add a separate validation to text fields using the len() function. It counts actual characters, not bytes, which is how our users, and probably any human, think when entering text.
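The idea behind such a validation, sketched in Python rather than as an Appian expression: count characters, not bytes. `validate_length` and its message are hypothetical. One caveat I am sure of: Python's `len()` counts code points, so an emoji built from several code points (a flag, for example) counts as more than one.

```python
from typing import Optional


def validate_length(text: str, char_limit: int) -> Optional[str]:
    """Return an error message if text exceeds char_limit characters, else None."""
    count = len(text)  # counts code points, not bytes
    if count > char_limit:
        return f"Please enter at most {char_limit} characters (currently {count})."
    return None


korean = "안녕하세"
print(len(korean))                  # 4 characters
print(len(korean.encode("utf-8")))  # 12 bytes

print(validate_length(korean, 10))    # None: 4 characters are within the limit
print(validate_length("a" * 11, 10))  # error message: 11 characters are too many
```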

I was so happy when Appian introduced that characterLimit parameter. But if it only counts bytes …

Summary

Low-code platforms significantly simplify software development, but sometimes they can bring back challenges reminiscent of the early days.

Dear Appian Team,

I trust you are aware of these issues and have already planned certain product improvements to address them. Please escalate this matter to your leadership and prioritize it accordingly.

Yours,

Stefan

Now, back to work! Appian Rocks!
