What—Exactly—Is an ePub?
The extremely abbreviated version: an ePub is a zipped web archive bundled together with XML files that describe the content of the ePub document in excruciating detail.
To meet the ePub standard, there are a number of additional restrictions. The HTML in the web archive has to be well-formed XML (unlike most web pages). You can only include the specific file types listed by the standard. Every file referred to by the web pages also has to be listed in the XML manifest document. You must also include a navigation document that tells ePub reader apps the order in which you want the web pages presented.
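To make the "well-formed XML" restriction concrete, here is a minimal sketch in Python, assuming the standard library's XML parser as a stand-in for a real ePub validator. The function name `is_well_formed` and both sample strings are made up for illustration, and passing this check is only one of the restrictions above, not proof of a valid ePub.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical web-page HTML: the <br> and <p> elements are never closed.
loose_html = "<html><body><p>Hello<br>world</body></html>"

# The same content written as well-formed XHTML.
strict_xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Hello<br/>world</p></body></html>"
)

loose_ok = is_well_formed(loose_html)
strict_ok = is_well_formed(strict_xhtml)
```

Most browsers will happily render the first string, which is why markup that works on the web can still fail inside an ePub.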
The best way to get a quick handle on this is to take an example ePub file (Project Gutenberg has quite a few). Move the file to an empty directory, and then unzip it. A number of files will appear with names like container.xml, as well as a directory with a group of .xhtml files that you can open in any web browser.
Once you get a handle on all of these files, you’ll have everything you need to start creating your own ePubs. In this chapter, we give an overview of how these files work together to generate an ePub document.
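If you would rather peek inside an ePub without unpacking it, the listing can be sketched with Python's `zipfile` module. This is only an illustration: `list_epub_contents` is a hypothetical helper, and the tiny in-memory archive stands in for a real downloaded book so that the example runs on its own.

```python
import io
import zipfile


def list_epub_contents(epub_file):
    """Return the names of the files stored inside an ePub (a zip archive).

    epub_file may be a path on disk or an open binary file object.
    """
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


# Stand-in for a real ePub: a tiny archive built in memory with placeholder content.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/container.xml", "<container/>")
    archive.writestr("OPS/chapter1.xhtml", "<html/>")
sample.seek(0)

names = list_epub_contents(sample)
```

With a real book you would pass the path to the .epub file instead of the in-memory object.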
The directory structure of your new ePub should look something like this:
ePub Root
|
|-- mimetype
|-- META-INF
|   |-- container.xml
|   |-- com.apple.ibooks.display-options.xml
|
|-- OPS (or other name)
|   |-- package.opf
|   |-- toc.xhtml
|   |-- toc.ncx (optional for ePub 3.x)
|   ... < Other Web Files Here >
There is some flexibility in the naming and structure of these files, but a few of them must be named exactly as shown, and many of the other names are simply carried over by convention.
The mimetype file is required. It must be named mimetype without an extension, must consist of application/epub+zip in plain text with no return at the end, and needs to be zipped in a particular way (see Zipping Your ePub). This file is primarily for the benefit of standalone eReader devices, but it needs to be handled correctly if you want a valid ePub document.
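Because many text editors quietly add a line ending when you save, it can be safer to write this file from a script. The following is a minimal sketch in Python; the helper names `write_mimetype` and `mimetype_is_valid` are made up for this example, and the temporary directory stands in for your ePub root.

```python
import tempfile
from pathlib import Path

# The exact bytes the file must contain: no trailing newline, no extra whitespace.
MIMETYPE = b"application/epub+zip"


def write_mimetype(epub_root: Path) -> Path:
    """Create the mimetype file inside epub_root and return its path."""
    target = epub_root / "mimetype"
    # Writing bytes avoids newline translation and any added line ending.
    target.write_bytes(MIMETYPE)
    return target


def mimetype_is_valid(epub_root: Path) -> bool:
    """Check that the file holds exactly the expected bytes and nothing else."""
    return (epub_root / "mimetype").read_bytes() == MIMETYPE


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_mimetype(root)
    ok = mimetype_is_valid(root)

    # Simulate an editor appending a newline: the check now fails.
    (root / "mimetype").write_bytes(MIMETYPE + b"\n")
    ok_with_newline = mimetype_is_valid(root)
```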
META-INF and container.xml
This directory structure is also required and is described in detail in the specification, which says that the META-INF directory can contain a number of optional XML files in addition to container.xml. However, most of those files are rarely used, and some of them contain the same data used in parts of the required package.opf file, which is described later.
The container.xml file is a fairly simple file that points to the location of the root package file. For example:
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
  <rootfiles>
    <rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml" />
  </rootfiles>
</container>
The naming scheme is fairly flexible: you can name the required .opf file whatever you like and place it in whatever directory scheme works for you, as long as it is consistent with what you put in the container.xml file. This consistency is required so that ePub Reader apps (from here on, “the Reader”) can find the files needed to build the ePub. The only other restriction is that you must keep the META-INF directory clean of the .opf file or any other content files.
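To see how a Reader might use this file, here is a rough sketch in Python that pulls the package path out of container.xml. It is an illustration of the lookup, not how any particular Reader is implemented; `find_package_path` is a hypothetical helper, and the sample string mirrors the container.xml example from this section.

```python
import xml.etree.ElementTree as ET

# Namespace used by container.xml, mapped to a short prefix for lookups.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_package_path(container_xml: str) -> str:
    """Return the full-path of the first rootfile listed in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.attrib["full-path"]


sample = """<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
  <rootfiles>
    <rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

package_path = find_package_path(sample)

# The .opf file must not live inside META-INF.
outside_meta_inf = not package_path.startswith("META-INF/")
```

The path is relative to the ePub root, so whatever directory name you choose only has to match what is written in the full-path attribute.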
The com.apple.ibooks.display-options.xml file is a small XML file that is not part of the specification, but it is used to set various options for iBooks when working with Version 2.x of the spec. This file may not be needed if you are using Version 3.x of ePub, as the same options are read from the .opf file in that case. The documentation for iBooks content is the iBooks Asset Guide.
The most common option you set is the one that tells iBooks to prefer the fonts specified in the CSS rather than the user's default fonts. This is very important if you’ve included custom font files or put a lot of work into the CSS, but less important otherwise.
<?xml version="1.0" encoding="UTF-8"?>
<display_options>
  <platform name="*">
    <option name="specified-fonts">true</option>
  </platform>
</display_options>
The OPS directory name is more of a convention than a requirement, as you could just as well place everything in a directory named “EPUB” or “Awesome-Files”. You can even put all the files in the root directory if you want, though I wouldn’t recommend it. The important thing is that whatever scheme you use is consistent with what is listed in the container.xml file.
The OPF file (often named “package.opf”) is an XML file that contains the metadata for the ePub document, a manifest of all the files used by the ePub, and a basic navigation structure for the Reader to use. This is the most detailed file in the specification and requires an entire chapter to break down. It is also the source of the largest differences between ePub 2.x and 3.x.
toc.xhtml and toc.ncx files
These are navigation files that are either XML or XHTML and are described in detail in their own chapter. These are required by the Reader so that it can create a Table of Contents for the user.
Other Web Files
These files can be broken down into subdirectories or other groupings based on whatever makes sense to you. The only requirement is that whatever grouping you use is consistent with what is listed in the OPF, navigation, and HTML content files. Otherwise, the Reader software will get confused and be unable to find required files.
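One way to catch this kind of inconsistency early is to compare the OPF manifest against the files on disk. The sketch below, in Python, assumes the usual manifest layout of item elements with href attributes in the OPF namespace. The helper `missing_manifest_files` and the stripped-down sample package are invented for illustration, and the sample is not a complete, valid OPF file.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import unquote

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def missing_manifest_files(opf_path: Path) -> list:
    """Return the manifest hrefs that do not exist relative to the OPF file."""
    root = ET.parse(opf_path).getroot()
    missing = []
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        href = item.attrib["href"]
        # hrefs are relative to the OPF file and may be percent-encoded.
        if not (opf_path.parent / unquote(href)).is_file():
            missing.append(href)
    return missing


# Stripped-down package with only a manifest (hypothetical file names).
sample_opf = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="css/book.css" media-type="text/css"/>
  </manifest>
</package>"""

with tempfile.TemporaryDirectory() as tmp:
    ops = Path(tmp) / "OPS"
    ops.mkdir()
    (ops / "package.opf").write_text(sample_opf, encoding="utf-8")
    (ops / "chapter1.xhtml").write_text("<html/>", encoding="utf-8")
    # css/book.css is deliberately left out, so it is reported as missing.
    missing = missing_manifest_files(ops / "package.opf")
```

This only checks one direction (manifest entries that have no file). Files on disk that are missing from the manifest would need a second pass.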
You’ve seen the inside of an ePub, which is basically a zipped HTML archive with XML that describes the contents for the benefit of an ePub Reader. In the next chapter, we will cover the Open Packaging Format (OPF), the most detailed part of the ePub specification.
Zipping Your ePub
Unfortunately, zipping an ePub document is not as straightforward as just building a zip file and renaming the extension to “.epub”. The specification requires that the mimetype file be the first thing in the zip binary and also that it remain uncompressed, unlike the rest of the document. This also makes the zip procedure one of the first things you are likely to script from the command line.
The command itself is:
zip -q0Xj "$epub_filename" "$path_to_mimetype"
with the various options basically amounting to “quiet operation”, “store don’t compress”, “eXclude any extra file data”, and “don’t record directory names”.
From there you’ll have a new zip file that you can then add the rest of the ePub file to directly either from the command line or from another scripting API.
To add files to the zip using the command line, use something like this:
zip -rg "$epub_filename" "$path_to_meta_inf" -x \*.DS_Store
with the various options being “recursively add”, “grow (append to) existing file” and also “exclude .DS_Store” which is a hidden file on Mac OS X. You may want to add other hidden files to the exclude list, so the syntax is still good to know.
This last zip command should be repeated for /EPUB/ or any other directories that contain required files, and your ePub should be complete.
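If you prefer a scripting API to the command line, the same two-step procedure can be sketched with Python's `zipfile` module. This is one possible approach under stated assumptions, not the only way to do it: `build_epub` is a hypothetical helper, and the temporary directory with placeholder files stands in for a real ePub root.

```python
import tempfile
import zipfile
from pathlib import Path


def build_epub(source_dir: Path, epub_path: Path) -> None:
    """Zip source_dir into epub_path with mimetype first and uncompressed."""
    mimetype = source_dir / "mimetype"
    with zipfile.ZipFile(epub_path, "w") as archive:
        # Step 1: mimetype goes in first, stored rather than compressed.
        archive.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        # Step 2: add everything else, compressed, skipping hidden Mac files.
        for path in sorted(source_dir.rglob("*")):
            if path.is_dir() or path == mimetype or path.name == ".DS_Store":
                continue
            archive.write(
                path,
                path.relative_to(source_dir).as_posix(),
                compress_type=zipfile.ZIP_DEFLATED,
            )


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "book"
    (src / "META-INF").mkdir(parents=True)
    (src / "mimetype").write_bytes(b"application/epub+zip")
    # Placeholder content only; a real container.xml points at the .opf file.
    (src / "META-INF" / "container.xml").write_text("<container/>", encoding="utf-8")
    (src / "META-INF" / ".DS_Store").write_bytes(b"")

    out = Path(tmp) / "book.epub"
    build_epub(src, out)

    with zipfile.ZipFile(out) as archive:
        entries = archive.infolist()
        names = [entry.filename for entry in entries]
        first_is_stored = entries[0].compress_type == zipfile.ZIP_STORED
```

Reading the archive back, as the last few lines do, is a cheap way to confirm that mimetype really is the first entry and that it was stored without compression.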