Unicode Emoji Committee 2021 : “Race is Not a Skin Tone. Gender is Not a Haircut”

Yet this idea seems to underly the simplistic 3-dimensional model used to create the new compound emojis shown below that were added to the Unicode standard in 2021:

5 skin tones × 2 haircuts × 2 mouth expressions (kiss/neutral) × 5 skin tones × 2 haircuts
= 10 × 2 × 10 = 200 depictions

Maybe the richness of human love just doesn’t lend itself to be represented like this?

👨🏻‍❤️‍👨🏻 👨🏻‍❤️‍👨🏼 👨🏻‍❤️‍👨🏽👨🏻‍❤️‍👨🏾 👨🏻‍❤️‍👨🏿 👨🏼‍❤️‍👨🏻👨🏼‍❤️‍👨🏼 👨🏼‍❤️‍👨🏽 👨🏼‍❤️‍👨🏾👨🏼‍❤️‍👨🏿 👨🏽‍❤️‍👨🏻 👨🏽‍❤️‍👨🏼👨🏽‍❤️‍👨🏽 👨🏽‍❤️‍👨🏾 👨🏽‍❤️‍👨🏿👨🏾‍❤️‍👨🏻 👨🏾‍❤️‍👨🏼 👨🏾‍❤️‍👨🏽👨🏾‍❤️‍👨🏾 👨🏾‍❤️‍👨🏿 👨🏿‍❤️‍👨🏻👨🏿‍❤️‍👨🏼 👨🏿‍❤️‍👨🏽 👨🏿‍❤️‍👨🏾👨🏿‍❤️‍👨🏿 👩🏻‍❤️‍👨🏻 👩🏻‍❤️‍👨🏼👩🏻‍❤️‍👨🏽 👩🏻‍❤️‍👨🏾 👩🏻‍❤️‍👨🏿👩🏻‍❤️‍👩🏻 👩🏻‍❤️‍👩🏼 👩🏻‍❤️‍👩🏽👩🏻‍❤️‍👩🏾 👩🏻‍❤️‍👩🏿 👩🏼‍❤️‍👨🏻👩🏼‍❤️‍👨🏼 👩🏼‍❤️‍👨🏽 👩🏼‍❤️‍👨🏾👩🏼‍❤️‍👨🏿 👩🏼‍❤️‍👩🏻 👩🏼‍❤️‍👩🏼👩🏼‍❤️‍👩🏽 👩🏼‍❤️‍👩🏾 👩🏼‍❤️‍👩🏿👩🏽‍❤️‍👨🏻 👩🏽‍❤️‍👨🏼 👩🏽‍❤️‍👨🏽👩🏽‍❤️‍👨🏾 👩🏽‍❤️‍👨🏿 👩🏽‍❤️‍👩🏻👩🏽‍❤️‍👩🏼 👩🏽‍❤️‍👩🏽 👩🏽‍❤️‍👩🏾👩🏽‍❤️‍👩🏿 👩🏾‍❤️‍👨🏻 👩🏾‍❤️‍👨🏼👩🏾‍❤️‍👨🏽 👩🏾‍❤️‍👨🏾 👩🏾‍❤️‍👨🏿👩🏾‍❤️‍👩🏻 👩🏾‍❤️‍👩🏼 👩🏾‍❤️‍👩🏽👩🏾‍❤️‍👩🏾 👩🏾‍❤️‍👩🏿 👩🏿‍❤️‍👨🏻👩🏿‍❤️‍👨🏼 👩🏿‍❤️‍👨🏽 👩🏿‍❤️‍👨🏾👩🏿‍❤️‍👨🏿 👩🏿‍❤️‍👩🏻 👩🏿‍❤️‍👩🏼👩🏿‍❤️‍👩🏽 👩🏿‍❤️‍👩🏾 👩🏿‍❤️‍👩🏿🧑🏻‍❤️‍🧑🏼 🧑🏻‍❤️‍🧑🏽 🧑🏻‍❤️‍🧑🏾🧑🏻‍❤️‍🧑🏿 🧑🏼‍❤️‍🧑🏻 🧑🏼‍❤️‍🧑🏽🧑🏼‍❤️‍🧑🏾 🧑🏼‍❤️‍🧑🏿 🧑🏽‍❤️‍🧑🏻🧑🏽‍❤️‍🧑🏼 🧑🏽‍❤️‍🧑🏾 🧑🏽‍❤️‍🧑🏿🧑🏾‍❤️‍🧑🏻 🧑🏾‍❤️‍🧑🏼 🧑🏾‍❤️‍🧑🏽🧑🏾‍❤️‍🧑🏿 🧑🏿‍❤️‍🧑🏻 🧑🏿‍❤️‍🧑🏼🧑🏿‍❤️‍🧑🏽 🧑🏿‍❤️‍🧑🏾 👨🏻‍❤️‍💋‍👨🏻👨🏻‍❤️‍💋‍👨🏼 👨🏻‍❤️‍💋‍👨🏽 👨🏻‍❤️‍💋‍👨🏾👨🏻‍❤️‍💋‍👨🏿 👨🏼‍❤️‍💋‍👨🏻 👨🏼‍❤️‍💋‍👨🏼👨🏼‍❤️‍💋‍👨🏽 👨🏼‍❤️‍💋‍👨🏾 👨🏼‍❤️‍💋‍👨🏿👨🏽‍❤️‍💋‍👨🏻 👨🏽‍❤️‍💋‍👨🏼 👨🏽‍❤️‍💋‍👨🏽👨🏽‍❤️‍💋‍👨🏾 👨🏽‍❤️‍💋‍👨🏿 👨🏾‍❤️‍💋‍👨🏻👨🏾‍❤️‍💋‍👨🏼 👨🏾‍❤️‍💋‍👨🏽 👨🏾‍❤️‍💋‍👨🏾👨🏾‍❤️‍💋‍👨🏿 👨🏿‍❤️‍💋‍👨🏻 👨🏿‍❤️‍💋‍👨🏼👨🏿‍❤️‍💋‍👨🏽 👨🏿‍❤️‍💋‍👨🏾 👨🏿‍❤️‍💋‍👨🏿👩🏻‍❤️‍💋‍👨🏻 👩🏻‍❤️‍💋‍👨🏼 👩🏻‍❤️‍💋‍👨🏽👩🏻‍❤️‍💋‍👨🏾 👩🏻‍❤️‍💋‍👨🏿 👩🏻‍❤️‍💋‍👩🏻👩🏻‍❤️‍💋‍👩🏼 👩🏻‍❤️‍💋‍👩🏽 👩🏻‍❤️‍💋‍👩🏾👩🏻‍❤️‍💋‍👩🏿 👩🏼‍❤️‍💋‍👨🏻 👩🏼‍❤️‍💋‍👨🏼👩🏼‍❤️‍💋‍👨🏽 👩🏼‍❤️‍💋‍👨🏾 👩🏼‍❤️‍💋‍👨🏿👩🏼‍❤️‍💋‍👩🏻 👩🏼‍❤️‍💋‍👩🏼 👩🏼‍❤️‍💋‍👩🏽👩🏼‍❤️‍💋‍👩🏾 👩🏼‍❤️‍💋‍👩🏿 👩🏽‍❤️‍💋‍👨🏻👩🏽‍❤️‍💋‍👨🏼 👩🏽‍❤️‍💋‍👨🏽 👩🏽‍❤️‍💋‍👨🏾👩🏽‍❤️‍💋‍👨🏿 👩🏽‍❤️‍💋‍👩🏻 👩🏽‍❤️‍💋‍👩🏼👩🏽‍❤️‍💋‍👩🏽 👩🏽‍❤️‍💋‍👩🏾 👩🏽‍❤️‍💋‍👩🏿👩🏾‍❤️‍💋‍👨🏻 👩🏾‍❤️‍💋‍👨🏼 👩🏾‍❤️‍💋‍👨🏽👩🏾‍❤️‍💋‍👨🏾 👩🏾‍❤️‍💋‍👨🏿 👩🏾‍❤️‍💋‍👩🏻👩🏾‍❤️‍💋‍👩🏼 👩🏾‍❤️‍💋‍👩🏽 👩🏾‍❤️‍💋‍👩🏾👩🏾‍❤️‍💋‍👩🏿 👩🏿‍❤️‍💋‍👨🏻 👩🏿‍❤️‍💋‍👨🏼👩🏿‍❤️‍💋‍👨🏽 👩🏿‍❤️‍💋‍👨🏾 👩🏿‍❤️‍💋‍👨🏿👩🏿‍❤️‍💋‍👩🏻 👩🏿‍❤️‍💋‍👩🏼 👩🏿‍❤️‍💋‍👩🏽👩🏿‍❤️‍💋‍👩🏾 👩🏿‍❤️‍💋‍👩🏿 🧑🏻‍❤️‍💋‍🧑🏼🧑🏻‍❤️‍💋‍🧑🏽 🧑🏻‍❤️‍💋‍🧑🏾 🧑🏻‍❤️‍💋‍🧑🏿🧑🏼‍❤️‍💋‍🧑🏻 🧑🏼‍❤️‍💋‍🧑🏽 🧑🏼‍❤️‍💋‍🧑🏾🧑🏼‍❤️‍💋‍🧑🏿 🧑🏽‍❤️‍💋‍🧑🏻 🧑🏽‍❤️‍💋‍🧑🏼🧑🏽‍❤️‍💋‍🧑🏾 🧑🏽‍❤️‍💋‍🧑🏿 🧑🏾‍❤️‍💋‍🧑🏻🧑🏾‍❤️‍💋‍🧑🏼 🧑🏾‍❤️‍💋‍🧑🏽 🧑🏾‍❤️‍💋‍🧑🏿🧑🏿‍❤️‍💋‍🧑🏻 🧑🏿‍❤️‍💋‍🧑🏼 🧑🏿‍❤️‍💋‍🧑🏽🧑🏿‍❤️‍💋‍🧑🏾

Decorate your thoughts my dear : Unicode 8 supports 1051 Emojis

Have you noticed how email subjects, chat messages and other traditional “plain text” content more and more often contains smileys, ideograms, pictograms and other funny symbols?

For example, you can probably see the faces of mouse, bull, cat and monkey right here: 🐭 🐮 🐱 🐵

These expressive tiny pictures are also known as Emojis and can be seamlessly mixed into the letters, numbers, punctuation marks and few traditional “special characters” that most of us know from the computer (or even typewriter?) keyboard.

This is possible because on todays digital devices all of us, whether we know it or not, use an international standard called Unicode that supports all letters and symbols from all official languages and writing systems on our planet (and even Klingon, by the way). And as an extensible standard, Unicode keeps growing and recent versions have increasingly incorporated non-alphanumeric symbols.

For example, there is a whole Unicode block of smileys and cat faces that was added in Unicode version 8 to incorporate a set of symbols that Japanese mobile carriers had already added to the Shift JIS character encoding: 😀 😁 😂 😃 😄 😅 😆 😇 😈 😉 😊 😋 😌 😍 😎 😏 😐 😑 😒 😓 😔 😕 😖 😗 😘 😙 😚 😛 😜 😝 😞 😟 😠 😡 😢 😣 😥 😦 😧 😨 😩 😪 😫 😭 😮 😯 😰 😱 😲 😳 😴 😵 😶 😷 😸 😹 😺 😻 😼 😽 😾 😿 🙀

With the now ubiquituous support of Unicode by computer operating systems, internet services and mobile devices, we are no longer limited to the 90° tilted “ASCII emoticons” using character sequences like :^). With Unicode we can use emojis as single symbols of their own right.

This leads to the question how to enter emojis using your keyboard, be it a physical computer keyboard or through the on-screen touch keyboards on your modern mobile devices. It is possible, with varying degrees of convenience on Apple, Android, Windows or Mac computers and many other devices.

Unfortunately, UI level emoji support on GNU/Linux systems is practically non-existent which effectively limits Linux users to copy/paste from “cheatsheets” or the official Unicode 8 charts with their 1051 emoji and symbol codepoints.

One tip for Debian based systems: Make sure to have the Droid and Symbola fonts installed:

sudo apt-get install ttf-ancient-fonts fonts-droid

Then you can write a chess game for the text console using these Unicode glyphs from the “Miscellaneous Symbols” Unicode block: ♔ ♕ ♖ ♗ ♘ ♙ ♚ ♛ ♜ ♝ ♞ ♟

Or roll the dice: ⚀ ⚁ ⚂ ⚃ ⚄ ⚅

PS: All characters in this blog post are ok to recycle ☺ : ♲ ♳ ♴ ♵ ♶ ♷ ♸ ♹ ♺ ♻ ♼ ♽

As always, please feel free to respond in the comments section below.

HTML / JSP / Servlets / JavaMail / Oracle / CSV / Excel: UTF-8 to Unicode them all

Recently I wrote a web app that

Lets user enter a greeting message with subject and body
Sends an HTML email (“ecard”) to recipients
Stores info about sent messages in Oracle
Reports on recently sent messages on an admin page (HTML table)
Provides the report as downloadable CSV files (often opened in M$ Excel)
Provides an RSS feed about recently sent messages

One goal was to allow any Unicode characters for subject and body text and make sure that web form, servlets, JSP pages, emails, database records and CSV files all support that (no garbled characters anywhere, no data loss through charset conversions).

So here is what I did:

JSP and HTML pages

At the top of the JSP pages:

<!DOCTYPE html>

<%@ page contentType="text/html;charset=UTF-8" %>

In every HTML and JSP page, within the <head> section:

    <meta charset="UTF-8"/>

Servlet filter

In WEB-INF/web.xml:

    <filter>
        <filter-name>UTF8Filter</filter-name>
        <filter-class>net.doepner.servlet.Utf8Filter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>UTF8Filter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

In net/doepner/servlet/Utf8Filter.java:

package net.doepner.servlet;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import java.io.IOException;

/**
 * Makes sure that we use UTF-8 for all requests and response
 */
public class Utf8Filter implements Filter {

    @Override
    public void init(FilterConfig fc) throws ServletException {
        // nothing to do
    }

    @Override
    public final void doFilter(ServletRequest request,
                               ServletResponse response,
                               FilterChain chain)
            throws IOException, ServletException {
        request.setCharacterEncoding("UTF-8");
        response.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // nothing to do
    }
}

Sending Email

In the code that sends the email (using javax.mail API):

        final MimeBodyPart htmlPart = new MimeBodyPart();
        htmlPart.setContent(template.getHtml(msg), "text/html;charset=utf-8");

        final Multipart multiPart = new MimeMultipart("alternative");
        multiPart.addBodyPart(htmlPart);

        final MimeMessage email =
                new MimeMessage(Session.getDefaultInstance(properties));

        // setting the sender and recipient is omitted here for brevity

        email.setSubject(msg.getSubject(), "UTF-8");
        email.setContent(multiPart);

        Transport.send(email);

Oracle database

For Unicode support in Oracle, make sure that

Use NLS_CHARACTERSET = AL32UTF8 and regular VARCHAR2 columns
Or use NVARCHAR2 column types.

I used approach A. I haven’t actually tried approach B myself.

Here is a useful query to see current charset settings:

SELECT * FROM nls_database_parameters nls 
         WHERE nls.parameter LIKE '%CHAR%SET%';

CSV generation

See my earlier blog post about CSV generation in a Servlet using my CsvWriter utility class.

The important bits are:

private static final char BYTE_ORDER_MARK = (char) 0xfeff;

Put that byte sequence (the so-called “BOM“) at the very beginning of the response content. Some applications (like M$ Excel) will otherwise not detect the UTF-8 encoding correctly.

Do this on the writer object from the getWriter() method on the servlet response:

// The BOM is required so that Excel will recognize UTF-8
// characters properly, i.e. all non-ASCII letters, etc.
writer.print(BYTE_ORDER_MARK);

RSS feed

I generate the RSS feed with an JSP page. Just make sure you have this on the top of the page:

<?xml version="1.0" encoding="UTF-8"?>
<%@ page contentType="text/xml;charset=UTF-8" %>

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31