Transparently improve Java 7 mime-type recognition with Apache Tika

Java 7 comes with the method java.nio.file.Files#probeContentType(path) to determine the content type of a file at the given path. It returns a mime type identifier. The implementation actually looks at the file content and inspects so-called “magic” byte sequences, which is more reliable than just trusting filename extensions.

However, the default implementation included in Java 7 seems to be platform dependent and not very complete. For example, for me it did not even recognize an mp3 file as audio/mpeg. Fortunately, the Open Source library Apache Tika provides more comprehensive mime type detection and seems to be platform independent.

As shown below, you can register a simple Tika-based FileTypeDetector implementation with the Java Service Provider Interface (SPI) to transparently enhance the behaviour of java.nio.file.Files#probeContentType(path). As soon as the resulting jar is on your classpath, the SPI mechanism will pick up our implementation class and Files.probeContentType(..) will automatically use it behind the scenes.
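To make the calling side concrete, here is a minimal sketch of how client code would use Files.probeContentType. The class name ProbeDemo and the helper describe are made up for illustration. The code does not reference Tika at all, which is the point: the caller stays the same whether or not an additional detector is on the classpath. What it prints depends on your platform and on the detectors installed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the original post
public class ProbeDemo {

    /** Returns the detected mime type, or "unknown" if no detector recognized the file. */
    static String describe(Path path) throws IOException {
        // probeContentType returns null when the type cannot be determined
        final String type = Files.probeContentType(path);
        return type != null ? type : "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Create a small throw-away text file so the example is self-contained
        final Path sample = Files.createTempFile("probe-demo", ".txt");
        try {
            Files.write(sample, "hello".getBytes(StandardCharsets.UTF_8));
            System.out.println(sample + " -> " + describe(sample));
        } finally {
            Files.delete(sample);
        }
    }
}
```

Running this once without and once with the detector jar on the classpath is a quick way to compare the results for your own files.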

Maven dependency

        <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.4</version>
        </dependency>

FileTypeDetector.java

package net.doepner.file;

import java.io.IOException;
import java.nio.file.Path;

import org.apache.tika.Tika;

/**
 * Detects the mime type of files (ideally based on markers in the file content)
 */
public class FileTypeDetector extends java.nio.file.spi.FileTypeDetector {

    private final Tika tika = new Tika();

    @Override
    public String probeContentType(Path path) throws IOException {
        return tika.detect(path.toFile());
    }
}

Service Provider registration

To register the implementation with the Java Service Provider Interface (SPI), you need to have a plaintext file /META-INF/services/java.nio.file.spi.FileTypeDetector in the same jar that contains the class net.doepner.file.FileTypeDetector. The text file contains just one line with the fully qualified name of the implementing class:

net.doepner.file.FileTypeDetector

With Maven, you simply create the file src/main/resources/META-INF/services/java.nio.file.spi.FileTypeDetector containing the line shown above.

See the ServiceLoader documentation for details about Java SPI.
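If you want to check that the registration is actually picked up, the following sketch lists the FileTypeDetector providers that ServiceLoader can find. The class name ListDetectors is hypothetical. It assumes that lookup through the system class loader is representative of what Files.probeContentType sees, which I believe matches the JDK implementation but have not verified for every version. Platform default detectors are not necessarily registered through SPI, so an empty list without the jar is plausible.

```java
import java.nio.file.spi.FileTypeDetector;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical helper class, not part of the original post
public class ListDetectors {

    /** Collects the class names of all FileTypeDetector providers registered via SPI. */
    static List<String> installedDetectorNames() {
        final List<String> names = new ArrayList<>();
        final ServiceLoader<FileTypeDetector> loader = ServiceLoader.load(
            FileTypeDetector.class, ClassLoader.getSystemClassLoader());
        for (FileTypeDetector detector : loader) {
            names.add(detector.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        final List<String> names = installedDetectorNames();
        if (names.isEmpty()) {
            System.out.println("No FileTypeDetector providers registered via SPI");
        }
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```

With the jar described above on the classpath, net.doepner.file.FileTypeDetector should appear in the output. If it does not, the services file is probably missing from the jar or misnamed.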

Reflexionen der Moderne im dramatischen Werk Ernst Tollers

About 12 years ago, in July 2001, I submitted my thesis “Zwischen Weltverbesserung und Isolation – Reflexionen der Moderne im dramatischen Werk Ernst Tollers” to complete my university degree in Mathematics and German Linguistics and Literature.

I wrote the document using LaTeX and GNU Emacs on a GNU/Linux system. It is available in PDF format.

The LaTeX source files of the thesis are also available. The structure is very straightforward and uses predefined macro definitions that can be generally useful for writing essays, books and academic papers in the Liberal Arts.

For those who are fed up with their word processor messing with their layouts and prefer to just write plain text: Take a look at how simple my introduction chapter is, for example.

If you are interested, feel free to reuse my LaTeX macros in STYLE.tex.

The LaTeX code is compatible with TeX Live, version 2012. On Debian stable (wheezy), installation is as simple as:

sudo apt-get install texlive texlive-latex-extra texlive-lang-german evince
wget https://github.com/odoepner/toller-moderne/archive/master.zip
unzip master.zip
cd toller-moderne-master/src/main/tex
pdflatex MAIN.tex
evince MAIN.pdf

Subversion 1.8 released

The new Subversion 1.8 features look quite good for a centralized Version Control System (VCS).

But note that the Subversion 1.8 working copy format is backwards-incompatible. Some tools, like recent TortoiseSVN versions, will use the 1.8 format by default, which will cause compatibility problems for IntelliJ and any other tools that do not yet support it.

So for now, it is probably better to stick with 1.7 and wait until all your tools fully support 1.8. For IntelliJ you might want to watch [IDEA-94942] for status updates.

Personally, I am more interested in Git anyway because it offers all the flexibility of a decentralized VCS. I am reading the free “Pro Git” ebook on my Kobo eReader (epub format).

Play MP3 or OGG using javax.sound.sampled, mp3spi, vorbisspi

I tried to come up with the simplest possible way of writing a Java class that can play mp3 and ogg files, using standard Java Sound APIs, with purely Open Source libraries from the public Maven Central repositories.

The LGPL-licensed mp3spi and vorbisspi libraries from javazoom.net satisfy these requirements and worked for me right away. As service provider implementations (SPI), they transparently add support for the mp3 and ogg audio formats to javax.sound.sampled, simply by being in the classpath.

For my AudioFilePlayer class below I basically took the example code from javazoom and simplified it as much as possible. Please note that it requires Java 7 as it uses try-with-resources.

Maven dependencies

  <!-- 
    We have to explicitly instruct Maven to use tritonus-share 0.3.7-2 
    and NOT 0.3.7-1, otherwise vorbisspi won't work.
   -->
<dependency>
  <groupId>com.googlecode.soundlibs</groupId>
  <artifactId>tritonus-share</artifactId>
  <version>0.3.7-2</version>
</dependency>
<dependency>
  <groupId>com.googlecode.soundlibs</groupId>
  <artifactId>mp3spi</artifactId>
  <version>1.9.5-1</version>
</dependency>
<dependency>
  <groupId>com.googlecode.soundlibs</groupId>
  <artifactId>vorbisspi</artifactId>
  <version>1.0.3-1</version>
</dependency>

AudioFilePlayer.java

package net.doepner.audio;

import java.io.File;
import java.io.IOException;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine.Info;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.UnsupportedAudioFileException;

import static javax.sound.sampled.AudioSystem.getAudioInputStream;
import static javax.sound.sampled.AudioFormat.Encoding.PCM_SIGNED;

public class AudioFilePlayer {

    public static void main(String[] args) {
        final AudioFilePlayer player = new AudioFilePlayer();
        player.play("something.mp3");
        player.play("something.ogg");
    }

    public void play(String filePath) {
        final File file = new File(filePath);

        try (final AudioInputStream in = getAudioInputStream(file)) {
            
            final AudioFormat outFormat = getOutFormat(in.getFormat());
            final Info info = new Info(SourceDataLine.class, outFormat);

            try (final SourceDataLine line =
                     (SourceDataLine) AudioSystem.getLine(info)) {

                if (line != null) {
                    line.open(outFormat);
                    line.start();
                    stream(getAudioInputStream(outFormat, in), line);
                    line.drain();
                    line.stop();
                }
            }

        } catch (UnsupportedAudioFileException 
               | LineUnavailableException 
               | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    private AudioFormat getOutFormat(AudioFormat inFormat) {
        final int ch = inFormat.getChannels();
        final float rate = inFormat.getSampleRate();
        return new AudioFormat(PCM_SIGNED, rate, 16, ch, ch * 2, rate, false);
    }

    private void stream(AudioInputStream in, SourceDataLine line) 
        throws IOException {
        final byte[] buffer = new byte[65536];
        for (int n = 0; n != -1; n = in.read(buffer, 0, buffer.length)) {
            line.write(buffer, 0, n);
        }
    }
}
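The numbers passed to the AudioFormat constructor in getOutFormat follow from simple arithmetic: 16-bit samples take 2 bytes each, so one frame (one sample per channel) takes channels * 2 bytes, and for PCM the frame rate equals the sample rate. The sketch below spells that out. The class name PcmFormatDemo and both helper methods are made up for illustration, and the CD-quality values (44100 Hz, stereo) are just an example input.

```java
import javax.sound.sampled.AudioFormat;

import static javax.sound.sampled.AudioFormat.Encoding.PCM_SIGNED;

// Hypothetical demo class, not part of the original post
public class PcmFormatDemo {

    /** Builds a 16-bit signed little-endian PCM format, like getOutFormat does. */
    static AudioFormat pcm16(float sampleRate, int channels) {
        final int bytesPerSample = 2;                     // 16 bits = 2 bytes
        final int frameSize = channels * bytesPerSample;  // one sample per channel
        // For PCM, one frame is played per sample period: frame rate = sample rate
        return new AudioFormat(PCM_SIGNED, sampleRate, 16, channels,
                               frameSize, sampleRate, false);
    }

    /** Number of bytes the line consumes per second of audio. */
    static long bytesPerSecond(AudioFormat format) {
        return (long) (format.getFrameSize() * format.getFrameRate());
    }

    public static void main(String[] args) {
        final AudioFormat cd = pcm16(44100f, 2);
        System.out.println("frame size: " + cd.getFrameSize());    // 4 bytes
        System.out.println("bytes/second: " + bytesPerSecond(cd)); // 176400
    }
}
```

At that data rate, the 65536-byte buffer used in stream holds roughly a third of a second of decoded stereo audio.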

Manage “recommended” dependencies with apt-get, debfoster and custom script

I use apt-get to install Debian packages. By default apt-get will also install all packages that your desired package depends on or that it recommends.

I find that recommended dependencies are often not actually necessary. I use debfoster to carefully review and selectively remove them, to keep my system light and clean. My approach requires this line in /etc/debfoster.conf:

UseRecommends = no

With this setting, debfoster will ignore recommended dependencies and allow you to decide individually if you want to keep them.

Disclaimer

This approach only makes sense if you know exactly what you are doing. Sometimes the removal of “recommended” dependencies can actually break functionality. If in doubt, tell debfoster to keep (Y) the respective packages or skip (s) the decision.

The prune (p) option offered by debfoster is the most drastic removal type and should be used with extreme caution.

Reinstall recommended dependencies

The apt-rdepends command can help you find out which packages recommend a given package (replace PACKAGE with the name of the package you are interested in):

sudo apt-get install apt-rdepends
apt-rdepends -pr --state-show Installed --state-follow Installed --show Recommends PACKAGE

If you ever remove too much, you can reinstall all dependencies (including the recommended ones) of a package using the following script. Save it, for example, as /usr/local/bin/install-dependencies.sh and use chmod ugo+x to make it executable.

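#!/bin/sh

# Expect exactly one argument: the package name
if [ $# -ne 1 ]; then
  echo "Usage: $(basename "$0") package"
  exit 1
fi

package="$1"
header="Package $package depends on:"

# Ask debfoster for the dependencies, this time including recommended ones
df_output=$(debfoster -o UseRecommends=true -d "$package")
# Strip everything up to and including the header line
pkg_list=${df_output#*"$header"}

if [ "$pkg_list" != "$df_output" ]; then
  # Intentionally unquoted: the list must be split into separate package names
  sudo apt-get install $pkg_list
else
  # Header not found: show what debfoster printed instead
  echo "$df_output"
fi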

Reference info