Sanselan: a Pure-Java Image Library.
By Charles M. Chen, charlesmchen@gmail.com.
Introduction
This pure-Java library reads and writes a variety of image formats, including fast parsing of image info (size, color space, ICC profile, etc.) and metadata.
Being pure Java, it is slow but perfectly portable. It is easier to use than ImageIO/JAI/Toolkit (Sun/Java's image support) and supports more formats (and supports them more correctly). It also provides easy access to metadata.
Although not yet at version 1.0, Sanselan is working and is used by a number of projects in production.
It is Open Source; free as in freedom and free as in beer.
Sanselan has been invited by the Apache Software Foundation into their Apache Incubator project. The proposal has been accepted, and Sanselan is in the course of migrating. I'm honored to migrate this project to that wonderful organization. I invite you to get involved!
Definitions
In this document, Image Info refers to things like image size, bits per pixel, color space, transparency, etc. Image Metadata refers to structured metadata (e.g. EXIF) embedded in an image format (e.g. JFIF), for example, geocoding, time taken, encoder info, etc. Image Data refers to the raw data that is interpreted to decode pixel info.
Downloads
Version 0.91 released February 5th, 2008.
Once again, I hope this is the last non-apache release. =)
Binary: sanselan.jar
Source: sanselan-src.zip
Source distribution includes binary, source, javadoc/api and this document.
I've removed the dependency on sharedlib.
I've also renamed almost all of the package names (again). Sorry about this; a simple global search and replace should be easy to do.
Sample Usage
 
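import java.awt.Dimension;
import java.awt.color.ICC_Profile;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Hashtable;
import java.util.Map;

import org.cmc.sanselan.*;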

BufferedImage someImage = null;
byte someBytes[] = null;
File someFile = null;
InputStream someInputStream = null;
OutputStream someOutputStream = null;

// The Sanselan class provides a simple interface to the library. 

// how to read an image: 
byte imageBytes[] = someBytes;
BufferedImage image_1 = Sanselan.getBufferedImage(imageBytes);

// methods of Sanselan usually accept Files, byte arrays, or InputStreams as arguments. 
BufferedImage image_2 = Sanselan.getBufferedImage(imageBytes);
File file = someFile;
BufferedImage image_3 = Sanselan.getBufferedImage(file);
InputStream is = someInputStream;
BufferedImage image_4 = Sanselan.getBufferedImage(is);

// Write an image. 
BufferedImage image = someImage;
File dst = someFile;
ImageFormat format = ImageFormat.IMAGE_FORMAT_PNG;
Map optional_params = new Hashtable();
Sanselan.write(image, dst, format, optional_params);

OutputStream os = someOutputStream;
Sanselan.write(image, os, format, optional_params);

// get the image's embedded ICC Profile, if it has one. 
byte icc_profile_bytes[] = Sanselan.getICCProfileBytes(imageBytes);

ICC_Profile icc_profile = Sanselan.getICCProfile(imageBytes);

// get the image's width and height. 
Dimension d = Sanselan.getImageSize(imageBytes);

// get all of the image's info (e.g. bits per pixel, size, transparency, etc.) 
ImageInfo image_info = Sanselan.getImageInfo(imageBytes);

if (image_info.getColorType() == ImageInfo.COLOR_TYPE_GRAYSCALE)
	System.out.println("Grayscale image.");
if (image_info.getHeight() > 1000)
	System.out.println("Large image.");

// try to guess the image's format. 
ImageFormat image_format = Sanselan.guessFormat(imageBytes);
boolean is_png = image_format.equals(ImageFormat.IMAGE_FORMAT_PNG);

// get all metadata stored in EXIF format (e.g. from JPEG or TIFF). 
// org.w3c.dom.Node node = Sanselan.getMetadataObsolete(imageBytes); 
IImageMetadata metadata = Sanselan.getMetadata(imageBytes);

// print a dump of information about an image to stdout. 
Sanselan.dumpImageFile(imageBytes);

// get a summary of format errors. 
FormatCompliance format_compliance = Sanselan
		.getFormatCompliance(imageBytes);
Details
Format Support
Format | Read | Write | Notes | References
PNG | yes | yes | Supported through version 1.2/ISO/IEC standard (15948:2003). Controlling the exact format when writing is incomplete. | Spec; Wikipedia
GIF | yes | yes | Both versions 87a & 89a. Reading of animated GIFs is supported to the extent that you can read all of the images contained in a GIF, but timing/loop info is ignored. Controlling the exact format when writing is incomplete. | Spec; Wikipedia
TIFF | yes | yes | Supported through version 6.0. TIFF is an open-ended container format, so it's not possible to support every possible variation. JPEG-compressed TIFFs are not supported. Supports Bi-Level, Palette/Indexed, RGB, CMYK, YCbCr, CIELab and LOGLUV images. Supports LZW, CCITT/Huffman and Packbits/RLE compression. Notably missing other forms of compression, though, including CCITT 4 and 6 bilevel and JPEG. Supports Tiled images. | Adobe Spec; Wikipedia; AWare Systems TIFF Tag Reference
JPEG/JFIF | no | no | Can read image info, metadata & extract ICC Profiles. Both JFIF and DCF/EXIF. | JFIF spec, @ JPEG Group; Wikipedia
JPEG/JFIF EXIF Metadata | yes | soon | Working on ability to update EXIF metadata in JPEG/JFIF files WITHOUT modifying image data. | Exif Specs, etc.; Wikipedia; AWare Systems TIFF Tag Reference (JPEG EXIF metadata is stored in TIFF directories); Phil Harvey's exiftool and metadata reference; Phil Harvey on writing EXIF
BMP | yes | yes | Mostly complete. May not read some cursors, icons and OS/2 bitmaps. Controlling the exact format when writing is incomplete. | No spec, see: Wikipedia
ICO | yes | no | Not complete. Can do simple reads. | No spec, see: Wikipedia
PNM/PGM/PBM/PPM Portable Pixmap | yes | yes | Complete. | No spec, see: Wikipedia
PSD/Photoshop | yes | no | Basic support. Can only read the first Layer. No support for extra channels. Supports all modes except Multichannel. Can read some image metadata. | Unofficial spec; spec; Wikipedia
Simple ICC Profile parsing is also offered.
Justification
Why another image library? There are so many already.
This library is Pure Java. Consequently it's slow, but perfectly portable.
It is designed to be very easy to use. See the Sample Usage section.
This library supports some variations and encodings missed by all or most other libraries.
Most other libraries offer little or incomplete support for ICC Profiles. Sanselan can extract & (simply) parse embedded ICC Profiles. Moreover, Sanselan applies the ICC profile by default, converting read images to sRGB. This means images are color-corrected by default.
Sanselan also lets you read in image info (e.g. width, height or colorspace) and metadata without "reading" the entire image. It presents image info and metadata in a format-neutral manner. It also gives easy, structured access to format-specific info.
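To illustrate why image info can be read without decoding any pixel data, here is a minimal sketch, independent of Sanselan, that pulls the width and height out of a PNG header. The names PngHeaderPeek and readSize are hypothetical and not part of the Sanselan API. The byte layout assumed here (an 8-byte signature followed by an IHDR chunk holding big-endian width and height) comes from the PNG specification.

```java
/** Hypothetical helper for illustration; not part of Sanselan. */
final class PngHeaderPeek {
    // The fixed 8-byte signature that starts every PNG file.
    private static final int[] SIGNATURE = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    /** Returns {width, height}, looking only at the first 24 bytes. */
    static int[] readSize(byte[] data) {
        if (data == null || data.length < 24) {
            throw new IllegalArgumentException("Too short to be a PNG");
        }
        for (int i = 0; i < SIGNATURE.length; i++) {
            if ((data[i] & 0xFF) != SIGNATURE[i]) {
                throw new IllegalArgumentException("Missing PNG signature");
            }
        }
        // Bytes 8..11 hold the chunk length; bytes 12..15 the chunk type.
        if (data[12] != 'I' || data[13] != 'H' || data[14] != 'D' || data[15] != 'R') {
            throw new IllegalArgumentException("First chunk is not IHDR");
        }
        // IHDR data starts at byte 16: width, then height, both big-endian.
        return new int[] {readInt(data, 16), readInt(data, 20)};
    }

    private static int readInt(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24)
                | ((b[off + 1] & 0xFF) << 16)
                | ((b[off + 2] & 0xFF) << 8)
                | (b[off + 3] & 0xFF);
    }
}
```

Only 24 bytes are inspected, however large the file is, which is why this kind of query can be much cheaper than a full decode.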
This library was written with an eye to correctness & code clarity rather than efficiency. Hopefully it is easy to use, easy to extend and can be used to explore images + image formats, rather than just read images for display.
This library is Free Software/Open Source. It is available under the Apache Software License.
Ultimately, other libraries didn't quite fill my requirements, though there are many good ones out there. What's really called for is a free, portable, feature complete library that ISN'T pure Java - ie. one that uses JNI, at least for JPEG, anyhow. The obvious solution would be a JNI wrapper around libtiff, libjpeg, libpng, libgif/libungif, etc. imageloader uses this approach, but is unfinished.
Sanselan also includes a number of useful functions, such as guessing an image's format by examining its "magic numbers" (header info).
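As a rough illustration of the magic-number idea, the sketch below checks the first few bytes of a file against well-known format signatures. It is not Sanselan's implementation. The class name MagicNumberSketch and the string results are assumptions made for this example, and a real detector would check more bytes and more formats.

```java
/** Hypothetical illustration of magic-number sniffing; not Sanselan's code. */
final class MagicNumberSketch {
    static String guessFormat(byte[] data) {
        if (startsWith(data, 0x89, 'P', 'N', 'G')) return "PNG";
        if (startsWith(data, 'G', 'I', 'F', '8')) return "GIF";   // GIF87a or GIF89a
        if (startsWith(data, 0xFF, 0xD8)) return "JPEG";          // SOI marker
        if (startsWith(data, 'B', 'M')) return "BMP";
        if (startsWith(data, 'I', 'I', 42, 0)) return "TIFF";     // little-endian TIFF
        if (startsWith(data, 'M', 'M', 0, 42)) return "TIFF";     // big-endian TIFF
        if (startsWith(data, '8', 'B', 'P', 'S')) return "PSD";
        return "UNKNOWN";
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `MagicNumberSketch.guessFormat(new byte[] {'G', 'I', 'F', '8', '9', 'a'})` returns "GIF".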
Sanselan aims to be transparent. There are no hidden buffers to dispose, no blocking calls, no native memory to free.
The ColorConversions class offers methods to convert between the following color spaces: CIE-L*CH°, CIE-L*ab, CIE-L*uv, CMY, CMYK, HSL, HSV, Hunter-Lab, RGB, XYZ and Yxy (algorithms courtesy of EasyRGB).
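To give a flavor of what one such conversion involves, here is a minimal RGB-to-HSL sketch using the standard textbook formulas. It does not reproduce the ColorConversions class or its method signatures. The name HslSketch and the choice to return all three components in the range 0 to 1 are assumptions made for this example.

```java
/** Hypothetical illustration; not the ColorConversions API. */
final class HslSketch {
    /** Converts 8-bit RGB to {hue, saturation, lightness}, each in [0, 1]. */
    static double[] rgbToHsl(int red, int green, int blue) {
        double r = red / 255.0;
        double g = green / 255.0;
        double b = blue / 255.0;
        double max = Math.max(r, Math.max(g, b));
        double min = Math.min(r, Math.min(g, b));
        double l = (max + min) / 2.0;
        if (max == min) {
            // Achromatic: hue is undefined, reported here as 0.
            return new double[] {0.0, 0.0, l};
        }
        double d = max - min;
        double s = l > 0.5 ? d / (2.0 - max - min) : d / (max + min);
        double h;
        if (max == r) {
            h = (g - b) / d + (g < b ? 6.0 : 0.0);
        } else if (max == g) {
            h = (b - r) / d + 2.0;
        } else {
            h = (r - g) / d + 4.0;
        }
        return new double[] {h / 6.0, s, l};
    }
}
```

Pure red (255, 0, 0) maps to hue 0, saturation 1 and lightness 0.5; any gray maps to saturation 0.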
FAQ
Test image suite
Project Status
Version 0.91 released February 5th, 2008.
  • Fixed a couple of bugs around adding Exif data to files without an existing Exif segment. Thanks to Arnaud Mondit for finding this problem and providing a great bug report.
Version 0.90 released January 31st, 2008.
  • Added some convenience functions for reading and writing GPS data. These are demonstrated in the metadata sample usage classes.
Version 0.89 released January 22nd, 2008. This release was mislabeled as 0.88.
  • Added EXIF insert/update/remove functionality. See WriteExifMetadataExample.java for examples.
  • Rewrote JPEG and TIFF parsing.
  • Greatly elaborated the unit test suite and test image suite. In the process found and resolved many bugs.
  • Once again, I hope this is the last non-apache release. =)
Version 0.88 released November 17th, 2007.
  • Restored original package structure. (org.apache.sanselan.* -> org.cmc.sanselan.*)
  • Refactored "byte sources," improving performance reading image data from InputStreams.
  • More code cleanup, mostly removing debugging code and applying naming conventions.
  • Fixed two bugs around PNGs: alpha channels weren't being written properly, and the alpha channel was not being preserved when reading grayscale PNGs.
  • Improved javadocs.
Version 0.87 released October 6th, 2007.
  • Fixed a number of bugs.
  • Began adding javadocs, starting with the facade classes: Sanselan, and every class returned by its methods.
  • This is probably the last pre-apache release.
Version 0.86 released September 17th, 2007.
  • Fixed bug with writing grayscale pngs.
  • Fixed bug with gamma correction when reading pngs.
  • Added image read param that allows control over BufferedImage creation.
  • Removed an erroneous javadoc.
  • Minor cleanup.
Version 0.85 released September 5th, 2007.
  • Cleaned up Tiff image parser and writer.
  • Added compression parameter to tiff image writer.
  • Added an example that illustrates image writing, optional parameters, etc.
Version 0.84 released September 3rd, 2007.
  • Fixed Tiff/Exif bug wherein rational number fields with a zero divisor prevented the metadata from being read, due to a "divide by zero" error.
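
The failure mode above is easy to reproduce: a TIFF/EXIF rational field stores a numerator and a divisor as two integers, and integer division by zero throws ArithmeticException in Java. The sketch below shows one tolerant way to model such a value. It is an assumption-laden illustration, not Sanselan's actual class; the name RationalSketch and its methods are hypothetical.

```java
/** Hypothetical illustration of a zero-tolerant rational; not Sanselan's class. */
final class RationalSketch {
    final int numerator;
    final int divisor;

    RationalSketch(int numerator, int divisor) {
        this.numerator = numerator;
        this.divisor = divisor;
    }

    /** False when the file stored a zero divisor. */
    boolean isValid() {
        return divisor != 0;
    }

    /**
     * Floating-point division never throws: a zero divisor yields
     * Infinity (or NaN for 0/0) instead of an ArithmeticException.
     */
    double doubleValue() {
        return (double) numerator / (double) divisor;
    }

    @Override
    public String toString() {
        if (!isValid()) {
            return "Invalid rational (" + numerator + "/" + divisor + ")";
        }
        return numerator + "/" + divisor + " (" + doubleValue() + ")";
    }
}
```

With this shape, one malformed field can be reported as invalid while the rest of the metadata is still read.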
Version 0.83 released August 30th, 2007.
  • Fixed Tiff/Exif bug wherein Private IFD Tags were not being properly read.
  • Added better metadata sample code.
Version 0.82 released August 30th, 2007.
  • Complete refactor of the image metadata methods. See the new MetadataExample class for a simple example.
  • Converted all of the Sanselan class's methods to static.
  • Cleaned up some old code.
Version 0.81 released August 17th, 2007.
  • Made a couple of methods of ImageInfo public (getColorType() and getColorTypeDescription()).
Version 0.80 released July 25th, 2007.
  • I've begun an overhaul of the codebase in anticipation of becoming an Apache Incubator project.
  • I've changed the package names (again) to be org.apache.sanselan.*.
  • I've removed the dependency on sharedlib.
  • I've removed a great deal of old cruft.
  • I've begun to apply a consistent naming convention to variables (lowerCamelCase) and constant names (ALL_CAPS).
Version 0.79 released June 21st, 2007.
  • I've fixed that pernicious bug in LZW compression. I've switched the default TIFF compression scheme back to LZW.
  • TIFF uses an unusual variation of LZW. For details, see this article.
  • In this case, the bug was: trailing EndOfInformation codes are sometimes omitted. That is, if an EndOfInformation code is the last code of a block, it may not appear.
Version 0.78 released June 20th, 2007.
  • LZW compression is buggy; this only affects writing TIFF. Until this can be corrected, I've switched the default TIFF compression scheme to PackBits, which compresses less well.
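
For context, PackBits is a simple byte-oriented run-length scheme, which is why it tends to compress less well than LZW on most image data. The decoder sketch below follows the commonly documented rules: a header byte n from 0 to 127 means "copy the next n + 1 bytes", n from -127 to -1 means "repeat the next byte 1 - n times", and -128 is a no-op. It is not Sanselan's implementation, and the class name PackBitsSketch is hypothetical.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical illustration of PackBits decoding; not Sanselan's code. */
final class PackBitsSketch {
    static byte[] decode(byte[] packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < packed.length) {
            int n = packed[i++]; // header byte, interpreted as signed
            if (n >= 0) {
                // Literal run: copy the next n + 1 bytes unchanged.
                int count = n + 1;
                if (i + count > packed.length) {
                    throw new IllegalArgumentException("Truncated literal run");
                }
                out.write(packed, i, count);
                i += count;
            } else if (n != -128) {
                // Repeat run: emit the next byte 1 - n times.
                if (i >= packed.length) {
                    throw new IllegalArgumentException("Truncated repeat run");
                }
                byte value = packed[i++];
                for (int k = 0; k < 1 - n; k++) {
                    out.write(value);
                }
            }
            // n == -128 is a no-op and is skipped.
        }
        return out.toByteArray();
    }
}
```

Data with few repeated bytes gains nothing from this scheme, since every literal run costs an extra header byte.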
Version 0.77 released June 16th, 2007.
  • I've open sourced the last dependency of this project, sharedlib.
  • I've also renamed almost all of the package names. Sorry about this; a simple global search and replace should be easy to do.
Version 0.76 released September 16th, 2006.
Version 0.75 released September 5th, 2006.
First released September 22nd, 2004.

To do list & known bugs:
  • Refactor interface. Rename Sanselan methods and public class names. Formats should subclass the image metadata class to include format-specific info, e.g. GIF's transparency index.
  • More control over writing.
  • Share test image suite with comments.
  • Improve Javadoc & write more FAQs / examples.
  • Allow user to disable autoconversion to sRGB/Grayscale.
  • Add support for more than 8 bits per channel.
  • Reading all images from .gif files isn't working. See: getAllBufferedImages().
  • Add request/hint params to ImageFactory, per Endre's suggestion.
  • Add DNG metadata/image info read. Perhaps some RAW formats as well.
  • Publish image library (possibly) and links to other libraries.
  • Develop suite of unit tests that only depend on images in the public domain.
License
This text is available under the ASF (Apache) License.
This is a non-viral Open Source license.
Requirements
References
Sample Images
This list is hardly comprehensive, but it is a good starting point; I'll elaborate when I get the chance.
Keywords and Keyphrases
In the interest of improving my Google karma, allow me to mention:
java image readers, java image file formats, java image parsing or image parsers, java image library, java library, imaging library, image api, java image metadata, java exif metadata, gps metadata, geocoding, timestamp, .JPEG, .JPG, .TIFF, .TIF, .BMP, Windows Bitmap, PNG, .PSD, Photoshop, Gif, etc.