Apple has been having a buggy few months. Now we’ve got a new, serious bug in the text-rendering functionality in iPhones. The bug is triggered by a single Telugu character, which can send an iPhone into an unbreakable boot loop just by arriving in a notification. Let’s dig into why a single character can cause such major problems with iOS.
Note: A fix for the Telugu bug is available in the most recent version of iOS (11.2.6). If the Telugu character has locked up your app or device, restore your iPhone via iTunes and update to the most recent version of iOS. If your iPhone is stuck in a boot loop, you may need to put it in the Device Firmware Update (DFU) state to get iTunes to recognize it. When finished, restore your device from your most recent backup, which you hopefully created.
What Is Telugu?
Telugu is a language spoken and written in parts of India, specifically the states of Andhra Pradesh and Telangana and the town of Yanam. Like many languages written in complex scripts, such as Arabic and the Brahmic family that Telugu belongs to, Telugu uses some special features of the Unicode character set to display its characters on a computer screen.
While most Latin letters are represented by a single Unicode code point that fits in one byte for ASCII compatibility (for example, the letter A exists at the Unicode code point U+0041, which UTF-8 encodes as the single byte 01000001), languages written in non-Latin scripts often combine more than one Unicode code point to represent what a reader sees as one character.
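To make the difference concrete, here is a minimal Python sketch that prints the code point and UTF-8 bytes for the Latin letter A and for a single Telugu letter. The helper name `describe` is our own invention for illustration, not part of any library.

```python
# Show how a code point maps to bytes in UTF-8.
# `describe` is a hypothetical helper written for this example.
def describe(ch: str) -> dict:
    encoded = ch.encode("utf-8")
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_bytes": len(encoded),
        "utf8_bits": " ".join(f"{b:08b}" for b in encoded),
    }

latin_a = describe("A")          # fits in a single ASCII-compatible byte
telugu_ja = describe("\u0C1C")   # TELUGU LETTER JA needs several bytes

print(latin_a)
print(telugu_ja)
```

The Latin letter comes back as one byte with the bit pattern 01000001, while the Telugu letter takes three bytes before any combining even starts.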
This is especially true for languages like Telugu, which combine their letters into clusters. Unlike English’s stylistic ligatures, the connection between each Telugu letter is linguistically important. To accommodate this, Unicode includes a complex system of attaching characters, each represented by its own code point, to one another.
Considering the sheer number of Unicode code points, this can create near-infinite variety. Several code points combine to render one legible character, so Unicode doesn’t need a dedicated code point for literally every possible Telugu cluster. Instead, Telugu consonants, vowels and diacritics (such as the “virama”) are combined to create clusters that are displayed like a single character. The same applies to other scripts with orthographic rules for joining letters, like Arabic.
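As a rough illustration of that combining, the Python sketch below builds one Telugu conjunct from three separate code points (ka, virama, ssa) and looks up their official names with the standard `unicodedata` module. The choice of this particular conjunct is ours; any consonant, virama, consonant sequence would show the same thing.

```python
import unicodedata

# KA + VIRAMA + SSA: three code points that a text shaper
# normally draws as one conjunct on screen.
cluster = "\u0C15\u0C4D\u0C37"

names = [unicodedata.name(ch) for ch in cluster]
for ch, name in zip(cluster, names):
    print(f"U+{ord(ch):04X}  {name}")

# The string still contains three code points, even though
# a reader sees a single shape.
code_point_count = len(cluster)

# Viramas carry canonical combining class 9 in the Unicode database.
virama_class = unicodedata.combining("\u0C4D")
```

The string length stays at three even though the screen shows one shape, which is exactly the gap the rendering engine has to bridge.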
What Causes the Crash?
The problem seems to be related to the Zero Width Non-Joiner (ZWNJ) at code point U+200C. The ZWNJ requests that two adjacent characters render without their typical ligature. In English, a ZWNJ placed between two f’s keeps them from being printed as the ligature ﬀ, rendering each f separately. But when combined with a specific set of four Telugu code points (all of which should combine into a single cluster), for some reason iOS can’t display the result properly.
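The ZWNJ is invisible, so it is easiest to see in code. This small Python sketch inserts one between two f’s and inspects it with the standard `unicodedata` module. Whether the ligature actually disappears on screen depends on the font and renderer, which Python can’t show; the sketch only demonstrates that the character is really there.

```python
import unicodedata

ZWNJ = "\u200C"

plain = "ff"                    # a font may draw this as one ligature
separated = "f" + ZWNJ + "f"    # asks the renderer to keep the f's apart

zwnj_name = unicodedata.name(ZWNJ)
zwnj_category = unicodedata.category(ZWNJ)   # "Cf" marks a format control

print(zwnj_name, zwnj_category)
print(len(plain), len(separated))

# Removing the ZWNJ gives back the original text.
stripped = separated.replace(ZWNJ, "")
```

Both strings look the same when printed, but the second one is a code point longer.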
Some have speculated that Apple’s San Francisco font can’t display the character, while others have said that the specific rendering process Apple uses is to blame. Whatever the exact cause, the attempt to render the character causes a dramatic crash of whatever is rendering it, from Messages and WhatsApp to Springboard. The Unicode code points that make up the character (“gya” meaning “knowledge”) are below:
U+0C1C: ja (జ)
U+0C4D: a virama, or diacritic mark (◌్)
U+0C1E: nya (ఞ)
U+200C: zero width non-joiner
U+0C3E: aa (◌ా)
But we can’t even blame the Zero Width Non-Joiner (ZWNJ) alone. Invisible joining controls are everywhere: its close sibling, the Zero Width Joiner (U+200D), holds together the innocuous family emojis without any issue. The crash seems to need a specific combination of some specific code points and the ZWNJ. Adding insult to injury, it seems like the ZWNJ either has no particular effect on the rendering of this Telugu cluster or shouldn’t even be there in the first place.
Other Brahmic Script Problems
Telugu isn’t the only language with this issue, however. Bengali and Devanagari, Brahmic scripts that use Unicode in a similar way, have the same problem. Manish Goregaokar wrote a fascinating and detailed blog post that breaks the exact crash case down even further:
<consonant1, virama, consonant2, ZWNJ, vowel> in Devanagari, Bengali, and Telugu, where:
consonant2 is suffix-joining
consonant1 is not a reph-forming letter
vowel does not have two glyph components
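The pattern above can be approximated in a few lines of Python. This is a deliberately conservative sketch, not a reimplementation of the analysis in the blog post: it flags any letter, virama, letter, ZWNJ, dependent-mark sequence in the three Unicode blocks and ignores the three finer conditions, so it will over-report. The block ranges, the helper names, and the use of general categories to stand in for "consonant" and "vowel" are all our assumptions.

```python
import unicodedata

ZWNJ = "\u200C"

# Unicode block ranges for the three scripts named above.
BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Telugu": (0x0C00, 0x0C7F),
}

def in_affected_block(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in BLOCKS.values())

def is_base_letter(ch):
    # Assumption: category "Lo" stands in for "consonant".
    # It also admits independent vowels, so this over-matches.
    return in_affected_block(ch) and unicodedata.category(ch) == "Lo"

def is_virama(ch):
    # Viramas have canonical combining class 9.
    return in_affected_block(ch) and unicodedata.combining(ch) == 9

def is_dependent_mark(ch):
    # Assumption: a mark with combining class 0 stands in for
    # "vowel sign". Signs like anusvara also match here.
    return (in_affected_block(ch)
            and unicodedata.category(ch) in ("Mn", "Mc")
            and unicodedata.combining(ch) == 0)

def find_suspect_sequences(text):
    """Return start indexes of <letter, virama, letter, ZWNJ, mark>."""
    hits = []
    for i in range(len(text) - 4):
        c1, virama, c2, joiner, vowel = text[i:i + 5]
        if (is_base_letter(c1) and is_virama(virama)
                and is_base_letter(c2) and joiner == ZWNJ
                and is_dependent_mark(vowel)):
            hits.append(i)
    return hits

# The Telugu sequence discussed earlier, written as escapes.
sample = "ab" + "\u0C1C\u0C4D\u0C1E\u200C\u0C3E"
print(find_suspect_sequences(sample))
```

A filter like this could only ever be a stopgap for something like a message preview; the real fix belongs in the rendering code.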
Conclusion: Why Wasn’t this Caught by Apple?
To understand how this bug got through, you have to put yourself in Apple’s shoes. Sure, this character combination isn’t some super obscure word in the Telugu language. But the iPhone includes support for dozens of languages. There are literally billions of potential combinations in Unicode. With that much variety, exhaustively testing every combination for rendering bugs before a release would make regular software updates basically impossible.
However, the error should not have caused this much damage. Phones shouldn’t get bricked based on the contents of a text message. While hindsight is surely 20/20, it seems like rendering the character as a question mark box (�) would have been better than crashing Springboard.
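The idea of degrading gracefully can be sketched in Python. Everything here is hypothetical: `fragile_render` is a stand-in that simulates a renderer choking on a ZWNJ, and a real crash inside native text-shaping code would not surface as a tidy Python exception. The sketch only shows the shape of the fallback.

```python
REPLACEMENT = "\uFFFD"   # the Unicode replacement character
ZWNJ = "\u200C"

def fragile_render(text):
    # Hypothetical renderer that fails on input it can't shape.
    if ZWNJ in text:
        raise ValueError("cannot shape this cluster")
    return text

def render_or_placeholder(text, render):
    """Try to render; fall back to a placeholder instead of crashing."""
    try:
        return render(text)
    except Exception:
        return REPLACEMENT

ok = render_or_placeholder("hello", fragile_render)
fallback = render_or_placeholder("f" + ZWNJ + "f", fragile_render)
print(ok, fallback)
```

The point of the design is that the failure stays contained to one piece of text rather than taking down whatever process is displaying it.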