HTML / JSP / Servlets / JavaMail / Oracle / CSV / Excel: UTF-8 to Unicode them all

Recently I wrote a web app that

  • Lets the user enter a greeting message with a subject and body
  • Sends an HTML email (“ecard”) to recipients
  • Stores info about sent messages in Oracle
  • Reports on recently sent messages on an admin page (HTML table)
  • Provides the report as downloadable CSV files (often opened in M$ Excel)
  • Provides an RSS feed about recently sent messages

One goal was to allow any Unicode characters in the subject and body text, and to make sure that the web form, servlets, JSP pages, emails, database records and CSV files all support them (no garbled characters anywhere, no data loss through charset conversions).

So here is what I did:

JSP and HTML pages

At the top of the JSP pages:

<!DOCTYPE html>

<%@ page contentType="text/html;charset=UTF-8" %>

In every HTML and JSP page, within the <head> section:

    <meta charset="UTF-8"/>

Servlet filter

In WEB-INF/web.xml:

    <filter>
        <filter-name>UTF8Filter</filter-name>
        <filter-class>net.doepner.servlet.Utf8Filter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>UTF8Filter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

In net/doepner/servlet/Utf8Filter.java:

package net.doepner.servlet;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import java.io.IOException;

/**
 * Makes sure that we use UTF-8 for all requests and responses
 */
public class Utf8Filter implements Filter {

    @Override
    public void init(FilterConfig fc) throws ServletException {
        // nothing to do
    }

    @Override
    public final void doFilter(ServletRequest request,
                               ServletResponse response,
                               FilterChain chain)
            throws IOException, ServletException {
        request.setCharacterEncoding("UTF-8");
        response.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // nothing to do
    }
}
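
To see why the request encoding matters, here is a minimal standalone sketch. It is not part of the web app, and the class name EncodingMismatchDemo is made up for illustration. It decodes the same UTF-8 bytes once with ISO-8859-1, which is the fallback servlet containers have traditionally used when a request does not declare a charset, and once with UTF-8:

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** Decodes UTF-8 bytes with the wrong charset (ISO-8859-1). */
    static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Decodes UTF-8 bytes with the matching charset. */
    static String decodeAsUtf8(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // German "Gruesse", spelled with u-umlaut and sharp s
        final String original = "Gr\u00fc\u00dfe";
        // What a browser sends for a form submitted from a UTF-8 page
        final byte[] sent = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeAsLatin1(sent).equals(original)); // false
        System.out.println(decodeAsUtf8(sent).equals(original));   // true
    }
}
```

The wrongly decoded string has seven characters instead of five, because each two-byte UTF-8 sequence turns into two separate characters.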

Sending Email

In the code that sends the email (using the javax.mail API):

        final MimeBodyPart htmlPart = new MimeBodyPart();
        htmlPart.setContent(template.getHtml(msg), "text/html;charset=utf-8");

        final Multipart multiPart = new MimeMultipart("alternative");
        multiPart.addBodyPart(htmlPart);

        final MimeMessage email =
                new MimeMessage(Session.getDefaultInstance(properties));

        // setting the sender and recipient is omitted here for brevity

        email.setSubject(msg.getSubject(), "UTF-8");
        email.setContent(multiPart);

        Transport.send(email);
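
The charset argument to setSubject matters because mail headers are limited to ASCII, so non-ASCII subject text has to travel as an RFC 2047 "encoded word". JavaMail handles this internally. The following standalone sketch only illustrates roughly what such a header value looks like, using Base64 ("B") encoding. The class and method names are made up, and JavaMail may well pick "Q" encoding or split long values into several encoded words instead:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodedWordDemo {

    /** Builds a single RFC 2047 "B" encoded word (no line folding). */
    static String encodeWord(String text) {
        final byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
    }

    public static void main(String[] args) {
        // German "Gruesse", spelled with u-umlaut and sharp s
        System.out.println("Subject: " + encodeWord("Gr\u00fc\u00dfe"));
    }
}
```

The result contains only ASCII characters, and a mail client decodes it back to the original text using the charset named in the encoded word.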

Oracle database

For Unicode support in Oracle, do one of the following:

  1. Use NLS_CHARACTERSET = AL32UTF8 and regular VARCHAR2 columns
  2. Use NVARCHAR2 column types

I used approach 1. I haven’t actually tried approach 2 myself.

Here is a useful query to see current charset settings:

SELECT * FROM nls_database_parameters nls 
         WHERE nls.parameter LIKE '%CHAR%SET%';
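
One thing to watch when relying on AL32UTF8 with regular VARCHAR2 columns: unless the column is declared with CHAR semantics, its length limit counts bytes, and a UTF-8 character can take up to four of them. The following standalone sketch (class and method names are made up for illustration, and the 6-byte limit is just an example value) shows how a check before inserting could look:

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {

    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the text fits a column limited to maxBytes bytes. */
    static boolean fits(String text, int maxBytes) {
        return utf8Length(text) <= maxBytes;
    }

    public static void main(String[] args) {
        // Plain ASCII: 5 chars, 5 bytes
        System.out.println(utf8Length("Hello"));
        // u-umlaut and sharp s take 2 bytes each: 5 chars, 7 bytes
        System.out.println(utf8Length("Gr\u00fc\u00dfe"));
        // Three CJK characters at 3 bytes each: 3 chars, 9 bytes
        System.out.println(utf8Length("\u65e5\u672c\u8a9e"));
        // A 5-character string that does not fit into 6 bytes
        System.out.println(fits("Gr\u00fc\u00dfe", 6)); // false
    }
}
```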

CSV generation

See my earlier blog post about CSV generation in a Servlet using my CsvWriter utility class.

The important bits are:

private static final char BYTE_ORDER_MARK = (char) 0xfeff;

Write that character (the so-called “BOM”) at the very beginning of the response content. In a UTF-8 response it is sent as the three bytes EF BB BF. Some applications (like M$ Excel) will otherwise not detect the UTF-8 encoding correctly.

Print it to the writer returned by the getWriter() method of the servlet response:

// The BOM is required so that Excel will recognize UTF-8
// characters properly, i.e. all non-ASCII letters, etc.
writer.print(BYTE_ORDER_MARK);
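
To check what actually goes over the wire, here is a small standalone sketch. It uses a ByteArrayOutputStream as a stand-in for the servlet response (an assumption for illustration, as is the class name BomDemo) and prints the bytes that a UTF-8 writer produces for the BOM character:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class BomDemo {

    private static final char BYTE_ORDER_MARK = (char) 0xfeff;

    /** Writes only the BOM through a UTF-8 writer and returns the bytes. */
    static byte[] bomBytes() {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        final PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8));
        writer.print(BYTE_ORDER_MARK);
        writer.flush();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Prints the bytes in hex: EF BB BF
        for (byte b : bomBytes()) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```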

RSS feed

I generate the RSS feed with a JSP page. Just make sure you have this at the very top of the page, with nothing (not even a blank line) before the XML declaration:

<?xml version="1.0" encoding="UTF-8"?>
<%@ page contentType="text/xml;charset=UTF-8" %>

Easily generate CSV in Java (e.g. from Servlet)

To generate CSV from Java consider this simple interface:

package net.doepner;

import java.io.IOException;

/**
 * Generates CSV (comma separated values) for rows of Java objects
 */
public interface ICsvWriter {

    /**
     * Adds a row of objects to the CSV document
     *
     * @param values The objects in the row (count must match the number of the
     *               headers)
     */
    void row(Object... values);

    /**
     * Writes CSV based on the String representations of the objects
     *
     * @param appendable The writer to append to
     * @throws IOException If underlying IO fails
     */
    void appendTo(Appendable appendable) throws IOException;
}

I implemented the interface with this CsvWriter class:

package net.doepner;

import java.io.IOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedList;
import java.util.regex.Pattern;

/**
 * Convenient generation of CSV
 */
public class CsvWriter implements ICsvWriter {

    private static final CharSequence CSV_ROW_END = "\r\n";
    private static final char LINE_BREAK_WITHIN_CELL = '\n';

    private static final Pattern QUOTES = Pattern.compile("\"");
    private static final String ESCAPED_QUOTE = "\"\"";

    private final Object[] headers;
    private final Collection<Object[]> rows = new LinkedList<Object[]>();

    /**
     * @param headers The objects representing the column headers
     */
    public CsvWriter(Object... headers) {
        this.headers = Arrays.copyOf(headers, headers.length);
        if (cols() == 0) {
            throw new IllegalArgumentException("No columns");
        }
    }

    @Override
    public final void row(Object... values) {
        if (values.length != cols()) {
            throw new IllegalArgumentException("Specify " + cols() + " values");
        }
        rows.add(Arrays.copyOf(values, values.length));
    }

    @Override
    public final void appendTo(Appendable appendable) throws IOException {
        appendRow(appendable, headers);

        for (Object[] row : rows) {
            appendRow(appendable, row);
        }
    }

    private static void appendRow(Appendable appendable, Object[] row)
            throws IOException {
        boolean first = true;
        for (Object value : row) {
            if (first) {
                first = false;
            } else {
                appendable.append(",");
            }
            appendable.append('"');
            appendable.append(toCsvString(value));
            appendable.append('"');
        }
        appendable.append(CSV_ROW_END);
    }

    private static CharSequence toCsvString(Object object) {
        if (object instanceof Iterable) {
            final StringBuilder sb = new StringBuilder();
            boolean first = true;
            for (Object o : (Iterable<?>) object) {
                if (first) {
                    first = false;
                } else {
                    sb.append(LINE_BREAK_WITHIN_CELL);
                }
                sb.append(toCsvString(o));
            }
            return sb.toString();
        } else {
            if (object == null) {
                return "";
            } else {
                final String s = object.toString();
                return QUOTES.matcher(s).replaceAll(ESCAPED_QUOTE);
            }
        }
    }

    private int cols() {
        return headers.length;
    }
}
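
The quoting rule the class applies is simple: every cell is wrapped in double quotes, and any double quote inside the value is doubled. Commas and line breaks need no further treatment once the cell is quoted. Here is that rule in isolation, as a standalone sketch (the class CsvQuotingDemo and its quote method are made up for illustration and are not part of CsvWriter):

```java
public class CsvQuotingDemo {

    /** Wraps a value in quotes and doubles any quotes inside it. */
    static String quote(Object value) {
        final String s = value == null ? "" : value.toString();
        return '"' + s.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        // prints: "plain"
        System.out.println(quote("plain"));
        // prints: "He said ""hi"", then left"
        System.out.println(quote("He said \"hi\", then left"));
        // prints: ""
        System.out.println(quote(null));
    }
}
```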

Here is an example of how it can be used in a servlet, with full support for Unicode characters using UTF-8, in a way that even Excel understands:

    private static final char BYTE_ORDER_MARK = (char) 0xfeff;
 
    private static void generateCsv(ServletResponse resp, 
                                    Iterable<IMessage> messages)
            throws ServletException {

        resp.setContentType("text/csv");
        resp.setCharacterEncoding("UTF-8");

        final ICsvWriter csv = new CsvWriter(
                "Date", "Sender", "Recipients", "Subject");

        for (IMessage msg : messages) {
            csv.row(msg.getDateTime(), msg.getSender(), 
                    msg.getRecipients(), msg.getSubject());
        }

        final PrintWriter writer = getResponseWriter(resp);
        try {
            // The BOM is required so that Excel will recognize UTF-8
            // characters properly, i.e. all non-ASCII letters, etc.
            writer.print(BYTE_ORDER_MARK);
            writer.flush();

            csv.appendTo(writer);
            writer.flush();

        } catch (IOException e) {
            throw new ServletException(e);
        }
    }

    private static PrintWriter getResponseWriter(ServletResponse resp)
            throws ServletException {
        try {
            return resp.getWriter();
        } catch (IOException e) {
            throw new ServletException(e);
        }
    }

Tomcat: Require authentication but no roles

Sometimes you might want to protect some parts of your Java web application from anonymous access, but not impose any authorization constraints: Every authenticated user should be automatically authorized, no matter what roles they have or don’t have.

Tomcat supports this by setting allRolesMode="authOnly" on the Realm definition, usually in META-INF/context.xml, in combination with <security-constraint> entries in WEB-INF/web.xml that declare an <auth-constraint> with <role-name>*</role-name>.
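
A minimal sketch of the two pieces could look like this. The realm class (a UserDatabaseRealm backed by the "UserDatabase" resource), the BASIC login method and the /admin/* URL pattern are placeholders I picked for illustration, so adapt them to whatever realm and paths your application uses.

In META-INF/context.xml:

```xml
<Context>
    <!-- Assumed realm; the relevant part is allRolesMode -->
    <Realm className="org.apache.catalina.realm.UserDatabaseRealm"
           resourceName="UserDatabase"
           allRolesMode="authOnly"/>
</Context>
```

In WEB-INF/web.xml, inside the <web-app> element:

```xml
<security-constraint>
    <web-resource-collection>
        <web-resource-name>Protected area</web-resource-name>
        <!-- Assumed path; use the part of the app you want to protect -->
        <url-pattern>/admin/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
        <role-name>*</role-name>
    </auth-constraint>
</security-constraint>
<login-config>
    <auth-method>BASIC</auth-method>
</login-config>
```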

Simple accordion using CSS3 :target selector (no JavaScript)

I made a simple “accordion” style UI that consists of several collapsed sections with only the “current” one expanded, using the CSS3 :target selector. I wanted the basic functionality of the jQuery UI accordion, but in a minimalistic way, without any JavaScript.

The :target selector is supported in all modern browsers, including IE9 and IE10, but not in IE8 and earlier.

In a real application with a variable number of accordion sections you would probably generate the HTML for the sections with some sort of iteration. You need to make sure that each section id is a valid unique id on the page and that the href of the link in the section header points to the section id, prefixed by ‘#’.

The minimal sample code is in jsfiddle 1, a version with some more visual style is in jsfiddle 2. For illustration, I am also posting the sample code here (see below).

I used a <div> for the whole accordion, a <fieldset> for each section, a <legend> for the section header and nested <div>…</div> for the section content.

Please note that you can use other HTML tags, as long as you use the CSS classes used in the stylesheet and the same kind of nesting.

HTML code:

<div class="accordion">
  <fieldset class="section" id="id1">
    <legend class="section-header">
      <a href="#id1">Section 1</a>
    </legend>
    <div class="section-content">
      Section 1 content
    </div>
  </fieldset>
  <fieldset class="section" id="id2">
    <legend class="section-header">
      <a href="#id2">Section 2</a>
    </legend>
    <div class="section-content">
      Section 2 content
    </div>
  </fieldset>
</div>
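
For a variable number of sections, the same markup can be produced in a loop. Here is a rough standalone Java sketch that derives a unique id per section from a counter. The class and method names are made up for illustration, and the titles are inserted without HTML escaping, which real code would need to add:

```java
import java.util.Arrays;
import java.util.List;

public class AccordionHtmlDemo {

    /** Renders one section; the id doubles as the link target. */
    static String section(String id, String title, String content) {
        return "  <fieldset class=\"section\" id=\"" + id + "\">\n"
             + "    <legend class=\"section-header\">\n"
             + "      <a href=\"#" + id + "\">" + title + "</a>\n"
             + "    </legend>\n"
             + "    <div class=\"section-content\">\n"
             + "      " + content + "\n"
             + "    </div>\n"
             + "  </fieldset>\n";
    }

    /** Renders the whole accordion with ids "id1", "id2", and so on. */
    static String accordion(List<String> titles) {
        final StringBuilder html =
                new StringBuilder("<div class=\"accordion\">\n");
        int index = 1;
        for (String title : titles) {
            // NOTE: no HTML escaping here, titles are assumed to be safe
            html.append(section("id" + index, title, title + " content"));
            index++;
        }
        return html.append("</div>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(accordion(Arrays.asList("Section 1", "Section 2")));
    }
}
```

In a JSP page the same loop would more likely be written with a tag such as JSTL's forEach, but the idea is the same: one counter drives both the id attribute and the href.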

CSS stylesheet code:

.accordion .section {
    margin: 1em;
}
.accordion .section-header a {
    text-decoration: none;
    color: gray;
}
.accordion .section:hover .section-header a {
    color: red;
}
.accordion .section .section-content {
    display: none;
}
.accordion .section:target .section-content {
    display: block;
}
.accordion .section:target {
    border: 2px solid navy;
}
.accordion .section:target .section-header a {
    color: navy;
    cursor: default;
}
.accordion .section-header a:before {
    content: "\25BA\0000a0"; /* unicode triangle pointing right */
}
.accordion .section:target .section-header a:before {
    content: "\25BC\0000a0"; /* unicode triangle pointing down */
}