Publishing Technical Documents with ePub

Prerelease Version

HTML Content

Entire books and large portions of the Web have been devoted to the subject of creating HTML content. There are ePub super ninjas who know how to do amazing things with the format. Obviously, a single chapter is not going to even scratch the surface. Instead of trying to cover the whole topic, this chapter will be limited to what your HTML needs to do to be valid for ePub, and what you need to consider when building your CSS.

Valid HTML Content

Both Version 2.x and 3.x have fairly strict notions about what the HTML in your content should look like. For Version 2.x, you can only use strict XHTML 1.1, CSS 2.0, and a limited set of other media types (a full list is here). For Version 3.x, you can use well-formed, HTML5-based XHTML, CSS 2.1, and a larger (but still limited) set of other media types (full list here). For both versions, your HTML files should also have the extension .xhtml in order to be valid.
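
As a rough illustration of the extension rule, the following Python sketch lists files inside an ePub archive that end in .html or .htm instead of .xhtml. The function name and the OEBPS folder used in the demo are hypothetical, chosen only for this example, and the check is no substitute for a real validator.

```python
import io
import zipfile


def non_xhtml_content_files(epub_file):
    """Return archive members whose names end in .html or .htm."""
    with zipfile.ZipFile(epub_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith((".html", ".htm"))
        ]


# Demo: build a tiny in-memory archive (hypothetical file names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("OEBPS/chapter1.xhtml", "<html/>")
    archive.writestr("OEBPS/chapter2.html", "<html/>")
buffer.seek(0)

print(non_xhtml_content_files(buffer))  # ['OEBPS/chapter2.html']
```

An ePub is a zip archive, so the same function should also accept a path to an .epub file on disk.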

“Well formed” means that all elements need to be wrapped in a single root element (which should be <html>), that every tag should either be self-closing or have a corresponding close tag, and that elements should not be entangled with one another. For example, every <p> needs a closing </p>, and that tag should be closed before you add another opening <p> element or, heaven forbid, open a structural element like <div>. Empty elements such as the line break must be self-closed, as in <br />. A bare <br>, which a web browser will happily accept, is not well formed in XHTML.
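
One quick way to see what an XML parser makes of these rules is to feed it a few fragments. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this example. It only tests well-formedness, not whether the markup is valid XHTML, and a real content file that uses a DOCTYPE with named entities such as &nbsp; may need a more capable parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


samples = {
    # Every element is closed and properly nested.
    "good": "<html><body><p>One<br /></p><p>Two</p></body></html>",
    # The first <p> is never closed.
    "unclosed_p": "<html><body><p>One<p>Two</p></body></html>",
    # <br> without the self-closing slash has no matching close tag.
    "bare_br": "<html><body><p>One<br></p></body></html>",
    # <em> and <p> are entangled.
    "entangled": "<html><body><p><em>One</p></em></body></html>",
}

for name, markup in samples.items():
    print(name, is_well_formed(markup))
```

Only the first sample parses; the other three are rejected with a parse error, even though a web browser would display all of them.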

The requirement for strict XHTML in 2.x means you may need to validate the content and may want to use a tool like HTML Tidy to clean up your HTML. You should also be very careful about the DTD, header information, and other allowed tags if you want to avoid potential problems. Version 3.x uses HTML 5, which has a larger set of allowable tags and is much less strict about its header requirements. The content for Version 3.x still needs to be well formed and as clean as you can get it.

This is much stricter than developing for a web browser. However, the assumption is that you are developing for the lowest common denominator, which is a device that is used exclusively as an eReader and that, depending on when it was last updated, may be years behind the latest standard. Tablet-based Readers may take in your ePub just fine with a few small errors in the content, but that is because they use a web browser engine, which has its own tools for dealing with the markup problems inherited from the larger web.

So, What do You Mean by “Valid”?

This brings up the question: how do you know if an ePub is valid? There are a number of tools out there to check an ePub and make sure it is valid, but among free, open source tools, the gold standard is a command line tool called ePubCheck. You can either go to the release page on the website and download a zip file containing the compiled .jar version of ePubCheck, or download the entire source and build it yourself.

If you’ve built something from source before, the process is straightforward, though you also need to get Apache Maven before you can build ePubCheck. If you haven’t built software before, then what you need to do is download all of the source files, follow the instructions exactly as listed, and hope there’s no hitch. However, you may need to go back and do more downloading if you find out you’re missing a dependency, build the files you’ve just downloaded, and then try to build ePubCheck again. And pray there aren’t any more errors or dependencies. Welcome to Free Software! That being said, building from source is the best way to make sure that the resulting binary works on your system and contains only the code you expect.

Once you have obtained the .jar file, you go to the command line and type:

java -jar /path/to/epubcheck.jar /path/to/your.epub

The tool will go through your files, determine whether you are using Version 2.x or 3.x, and validate the file accordingly. Some of the error messages may be cryptic, but they generally include a line and column number, and after some practice (and searching on the web) they nearly always make sense. You may also want to add validation to any scripts you have for building the zip files, since the first thing you will usually do after building an ePub is check it with ePubCheck.
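
If your build script happens to be written in Python, the validation step could look something like the sketch below. It simply runs the same java -jar command shown above. The function names are hypothetical, and the script assumes that ePubCheck exits with a nonzero status when it reports errors, which you should confirm against the version you have installed.

```python
import subprocess


def epubcheck_command(jar_path, epub_path):
    """Build the same command line you would type by hand."""
    return ["java", "-jar", jar_path, epub_path]


def validate_epub(jar_path, epub_path, runner=subprocess.run):
    """Run ePubCheck and return True if it exits with status 0.

    Assumption: a nonzero exit status means ePubCheck found problems.
    The runner parameter exists only so the call can be swapped out
    when testing without Java installed.
    """
    result = runner(epubcheck_command(jar_path, epub_path))
    return result.returncode == 0


# Example use at the end of a build script (paths are placeholders):
#
#     if not validate_epub("/path/to/epubcheck.jar", "/path/to/your.epub"):
#         raise SystemExit("ePubCheck reported problems")
```

Stopping the build when validation fails keeps a broken ePub from being uploaded or handed to anyone by accident.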

Handling CSS

Handling CSS is one of the more thorny issues for ePub. Some Readers do a very good job of using your CSS. Those usually have a browser based core. Others completely ignore any CSS you’ve given them and render the HTML how they see fit. A few are somewhere in the middle of this continuum.

The most important tip for handling this is to separate your CSS and your HTML as much as possible. This means using ids and classes in your HTML to add structure (and hopefully semantics) to your content. The second tip is to design your CSS with fallbacks that are still visually acceptable. Nearly all of the Readers will honor your spacing and sizing as best they can, but some will completely ignore the cool font file you’ve included in favor of whatever defaults the user has set up. If you don’t give the software acceptable alternatives to your core fonts, you may have wasted hours preparing a stylesheet that looks great in Georgia, only to have everything end up being rendered in Times New Roman and suddenly looking not-so-good.
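
To catch missing fallbacks before a Reader does, you could scan your stylesheet for font-family declarations that do not end in a generic family such as serif or sans-serif. The Python sketch below is a deliberately rough check built on a regular expression, not a real CSS parser. The function name is made up for this example, and it will also flag the font-family line inside an @font-face block (which names a font rather than using one) as well as keyword values such as inherit, so treat its output as a list of things to look at rather than a list of errors.

```python
import re

# Generic families that a Reader can always fall back to.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}

# Capture everything between "font-family:" and the next ";" or "}".
FONT_FAMILY_RE = re.compile(r"font-family\s*:\s*([^;}]+)", re.IGNORECASE)


def missing_generic_fallback(css):
    """Return font-family values whose last entry is not a generic family."""
    problems = []
    for match in FONT_FAMILY_RE.finditer(css):
        value = match.group(1).strip()
        families = [f.strip().strip("'\"").lower() for f in value.split(",")]
        if families[-1] not in GENERIC_FAMILIES:
            problems.append(value)
    return problems


css = """
body { font-family: "Georgia", "Times New Roman", serif; }
h1 { font-family: "My Cool Font"; }
"""

print(missing_generic_fallback(css))  # ['"My Cool Font"']
```

Here the body rule passes because it ends in serif, while the h1 rule is flagged because a Reader that ignores the embedded font has nothing to fall back on except its own default.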

Also, you need to use relative font sizes (like ems), rather than pixels, as much as is practical. This will allow your design to keep the same relative proportions if the user decides to change the font size or switch to a narrower or wider font than you originally designed for.
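
If you are converting an existing pixel-based design, the arithmetic is just a division by the base font size. The Python sketch below assumes a 16 pixel base, which is a common browser default but not something every Reader guarantees. Both function names are made up for this example.

```python
def px_to_em(px, base_px=16.0):
    """Convert a pixel size to ems relative to an assumed base size."""
    if base_px <= 0:
        raise ValueError("base_px must be positive")
    return px / base_px


def em_rule(selector, px, base_px=16.0):
    """Format a CSS font-size rule using ems instead of pixels."""
    return f"{selector} {{ font-size: {px_to_em(px, base_px):g}em; }}"


print(em_rule("h1", 32))  # h1 { font-size: 2em; }
print(em_rule("h2", 24))  # h2 { font-size: 1.5em; }
print(em_rule("p", 16))   # p { font-size: 1em; }
```

Because the results are ratios, a header set to 2em stays twice the size of the body text no matter what base size the user picks.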

Unless you are using fixed layout (FXL), you should also look into designing CSS that uses @media queries to lay out the content differently depending on the screen size. Do a web search for “Responsive Web Design” for tips. Ignore 90% of the buzzwords and other nonsense and just find simple tutorials. A List Apart is a good resource for much of this, though the site covers a large number of other topics and may not have details on issues specific to ePub.

Finally, look into some of the many sites with tips on “Web Typography”. Much of what is helpful for creating a readable website can also be directly applied to laying out the CSS in an ePub. Font choices, line height choices, and relative sizes between headers and body text are all things that should be considered when designing your ebooks. Butterick’s Practical Typography is a good starting place for people who are new to this topic, though not everything there should be taken as gospel.

You can, of course, leave it up to the Readers to sort out, but being able to control these aspects can allow you to create a book that is both better to read and stands out from the crowd. It’s somewhat of an advanced topic, but worth learning on your own at some point.

A Note on Sass

If you have the time to learn how to use them, there are a number of preprocessor tools that can make it easier to generate CSS that is reliable and easy to reuse. The best tool, in my opinion, is called Sass and it can relieve much of the drudgery and allow you to focus on getting the design right.


