Hello, EPUB! 👋
EPUB (Electronic Publication) is a format for ebooks. It’s the most widely used open format for digital books. EPUB files can be read on almost all devices. Where a device doesn’t open them directly, converters can transform them into a format that it can use.
A different format exists, which is unfortunately more popular: the Kindle format, used by Amazon. In fact, there are many Kindle formats: AZW (KF7), AZW1, AZW3 (KF8), AZW4, AZK (summed up as “AZK is another binary format created by Amazon for reasons that are currently not clear”), AZW8, and many more. Kindle is a proprietary mess, and I’m not interested in ranting or writing about it.
There are many tools that can create EPUB files, such as Word, Pages, or InDesign, but this article shows that an EPUB is just code, which can be read and written by hand too!
What’s in an EPUB?
EPUB can be thought of as the web platform (HTML, CSS, images, sometimes JS), glued together with XML, and archived as a ZIP file with an .epub extension instead of .zip.
A good way to learn more about the web is View Source. You can do that with EPUBs too: rename the file from book.epub to book.zip and unzip that with your favorite tool (or run unzip path/to/book.epub -d path/to/directory on Unix/macOS), then open the result in your text editor of choice to see how it’s made.
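The same peek inside can be scripted. The following is a minimal Python sketch, assuming only the standard zipfile module; list_epub_entries and read_epub_entry are hypothetical helper names, not part of any EPUB tooling.

```python
import zipfile


def list_epub_entries(epub_path):
    """Return the names of all entries in the EPUB, in archive order."""
    # An EPUB is a regular ZIP archive, so zipfile opens it without renaming.
    with zipfile.ZipFile(epub_path) as archive:
        return archive.namelist()


def read_epub_entry(epub_path, entry_name):
    """Return one entry of the EPUB decoded as UTF-8 text."""
    with zipfile.ZipFile(epub_path) as archive:
        return archive.read(entry_name).decode("utf-8")
```

Calling list_epub_entries("path/to/book.epub") shows the file layout, and read_epub_entry lets you read a single XML or XHTML file without unpacking the whole book.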
To create an EPUB, a couple of things are needed. What we’re going to create here is a directory with the following files:
book/
├── META-INF/
│   └── container.xml
├── content.opf
├── index.xhtml
└── mimetype
First, a file called mimetype, without an extension, must exist in the root of the folder, with the following value copied over exactly:
application/epub+zip
Note: there must not be a newline at the end of that file.
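Many text editors quietly append a newline when saving, so it can be safer to write this file from a script. Here is a small Python sketch of that idea. The value application/epub+zip is the media type that the OCF specification prescribes for EPUB; write_mimetype and has_valid_mimetype are hypothetical helper names.

```python
from pathlib import Path

# The media type that the OCF specification prescribes for EPUB.
MIMETYPE = b"application/epub+zip"


def write_mimetype(book_dir):
    """Write the mimetype file into book_dir and return its path."""
    path = Path(book_dir) / "mimetype"
    # write_bytes stores exactly these bytes: no trailing newline is added.
    path.write_bytes(MIMETYPE)
    return path


def has_valid_mimetype(book_dir):
    """True when the mimetype file holds exactly the expected bytes."""
    path = Path(book_dir) / "mimetype"
    return path.is_file() and path.read_bytes() == MIMETYPE
```

Running has_valid_mimetype("book") after editing the file by hand is a quick way to catch a stray newline.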
The existence and contents of this file signal that a ZIP archive, which could be huge and slow to unzip, represents an EPUB ebook.
There isn’t much else to the file. But if you’re interested, it’s specified in EPUB OCF 3.2 § 4.3.
The second requirement is that a container.xml file exists, in a META-INF directory. The contents of container.xml look like this:
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
There is a tiny bit more to this file if you’re going to do weird things, but in 99.9% of cases this is exactly what you need. If you’re interested, it’s specified in the container.xml section of EPUB OCF 3.2. The role of this file is to point to the next XML file, through the path defined with the full-path attribute on the <rootfile> element. It could point to a different place (often: OEBPS/content.opf). You’re free to structure your books as you please, but for this example I’m keeping the file structure as flat as possible.
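To make the pointer role concrete, this is roughly how a reading system could follow it. The sketch below uses Python’s standard xml.etree.ElementTree; find_package_path is a hypothetical name, and real reading systems do more validation than this.

```python
import xml.etree.ElementTree as ET

# Namespace of container.xml, bound to the (arbitrary) prefix "c" for lookups.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_package_path(container_xml):
    """Return the full-path of the first <rootfile> in container.xml text."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.attrib["full-path"]


EXAMPLE = """<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

print(find_package_path(EXAMPLE))  # prints: content.opf
```

With full-path set to OEBPS/content.opf instead, the same function returns that path, which is all the flexibility this file gives you.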
The last XML file that is needed is content.opf. It can be placed anywhere in the book, but must be referenced correctly from container.xml. The bare minimum, with some extra metadata, looks like so:
<package version="3.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Book Title</dc:title>
    <dc:creator>Author Name</dc:creator>
    <dc:date>2020-02-27T00:00:00Z</dc:date>
    <dc:rights>Copyright © 2020 Author Name</dc:rights>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2020-02-27T00:00:00Z</meta>
    <dc:identifier id="bookid">tag:example.com,2020:book-title:1</dc:identifier>
  </metadata>
  <manifest>
    <item id="index" href="index.xhtml" media-type="application/xhtml+xml" properties="nav"/>
  </manifest>
  <spine>
    <itemref idref="index"/>
  </spine>
</package>
The file consists of a single <package> element, which includes three sections, each in a separate container element:
<metadata>, which is sort of like the <head> in HTML
<manifest>, which lists all the files that make up the book, with their media types
<spine>, which defines which items in <manifest> make up the content of the book, and in what order they are placed
The last part, <spine>, is important to clarify: as books are typically long form, and rendering hundreds of pages at once takes a while, ebooks are split up into separate files, which EPUB readers then concatenate.
For this small example, a single content file is fine, but typically books are split up per chapter, or maybe even per section.
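How <manifest> and <spine> work together is easier to see with more than one file. The following Python sketch resolves the reading order from a package document; the two chapter files and the reading_order function are made up for illustration and are not part of the example book.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml):
    """Return the hrefs of the content files in spine order."""
    package = ET.fromstring(opf_xml)
    # Map every manifest id to the file it stands for.
    hrefs = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists ids only; its document order is the reading order.
    return [
        hrefs[itemref.attrib["idref"]]
        for itemref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]


# Hypothetical two-chapter book; the manifest order is deliberately reversed.
EXAMPLE = """<package version="3.0" xmlns="http://www.idpf.org/2007/opf">
  <manifest>
    <item id="ch2" href="chapter-2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1" href="chapter-1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""

print(reading_order(EXAMPLE))  # prints: ['chapter-1.xhtml', 'chapter-2.xhtml']
```

The order of entries in the manifest does not matter here; only the spine decides what the reader sees first.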
The properties attribute on <item> elements adds some extra data to that entry. In this case, the value nav declares that the file contains the table of contents.
Another interesting part is that the <package> element has a unique-identifier attribute, which points to another element by its id attribute (which should be a <dc:identifier>). Any URI can be placed in that element. It could be an ISBN (such as isbn:978-1-234567-89-1), but there is no reason to get one for ebooks (and it often costs money). You can also use a UUID (such as urn:uuid:B9B412F2-CAAD-4A44-B91F-A375068478A0). I like using the tag URI scheme.
The value of <dc:identifier> is a bit like the semver major: it shouldn’t change for minor changes.
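If you don’t want to come up with an identifier by hand, both options can be generated. This is a small Python sketch, assuming the standard uuid module; uuid_identifier and tag_identifier are hypothetical helper names, and the tag layout follows the tag URI scheme (RFC 4151) in its simplest form.

```python
import uuid


def uuid_identifier():
    """Return a fresh random identifier in urn:uuid:... form."""
    return uuid.uuid4().urn


def tag_identifier(authority, date, specific):
    """Build a tag URI of the form tag:<authority>,<date>:<specific>."""
    return f"tag:{authority},{date}:{specific}"


print(tag_identifier("example.com", "2020", "book-title:1"))
# prints: tag:example.com,2020:book-title:1
```

Generate the identifier once and keep it in content.opf; creating a new one on every build would defeat its purpose.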
More info on the content.opf file is available in EPUB Packages 3.2 § 3.4.
Last, we need content! We’ve set up the ancillary files needed to make that happen. We can now add a file, which we’ll call index.xhtml, and add the following to it:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <meta charset="utf-8"/>
    <title></title>
  </head>
  <body>
    <h1 id="hello-world">Hello, World!</h1>
    <nav epub:type="toc">
      <h2>Contents</h2>
      <ol>
        <li><a href="#hello-world">Hello, World!</a></li>
      </ol>
    </nav>
  </body>
</html>
It’s important to note that the syntax here is XHTML, not HTML. The two syntaxes are slightly different. With HTML, the browser does more work to assume you had the best intentions. With XHTML, you will be yelled at if you don’t put a closing slash on <img/>, amongst other things.
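You can get yelled at early by running your content through any XML parser before packaging. Below is a minimal Python sketch using the standard xml.etree.ElementTree; is_well_formed is a hypothetical helper and cover.png a made-up file name. It only checks XML well-formedness, which is one part of what an EPUB validator looks at.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """True when markup parses as XML, which XHTML content must do."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed('<p><img src="cover.png" alt=""/></p>'))  # prints: True
print(is_well_formed('<p><img src="cover.png" alt=""></p>'))   # prints: False
```

The second snippet is fine as HTML, but the unclosed <img> makes the XML parser give up.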
Otherwise, this looks very similar to the HTML needed for a website: a <head>, a <body>, the same semantic elements.
What may stand out, though, is that it’s pretty big for a Hello World! That’s mostly because of the table of contents. Almost all apps or ereaders support a quick way to get to that landmark, and it’s something readers expect. So it’s a required feature in EPUB files.
The <head> doesn’t matter as much in EPUB as when building a website. You still use it to link to CSS, but the metadata that is typically in <head> is now pulled out into content.opf.
A namespace is defined on the <html> element with xmlns:epub, linking the prefix epub to the namespace http://www.idpf.org/2007/ops. That prefix is then used to set epub:type="toc" on the <nav> element. There are other things you can do with the epub:type attribute, such as footnotes, but that’s for another time.
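To see why the namespace matters, here is roughly how a program could find the table of contents in such a file. This is a Python sketch using the standard xml.etree.ElementTree; toc_entries is a hypothetical name, and actual ereaders will handle nested lists and more edge cases.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"


def toc_entries(xhtml):
    """Return (label, href) pairs from the <nav epub:type="toc"> element."""
    root = ET.fromstring(xhtml)
    for nav in root.iter(f"{{{XHTML}}}nav"):
        # Namespaced attributes are keyed as {namespace}name in ElementTree.
        types = (nav.get(f"{{{EPUB}}}type") or "").split()
        if "toc" in types:
            return [
                ("".join(link.itertext()).strip(), link.get("href"))
                for link in nav.iter(f"{{{XHTML}}}a")
            ]
    return []


EXAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <h1 id="hello-world">Hello, World!</h1>
    <nav epub:type="toc">
      <ol>
        <li><a href="#hello-world">Hello, World!</a></li>
      </ol>
    </nav>
  </body>
</html>"""

print(toc_entries(EXAMPLE))  # prints: [('Hello, World!', '#hello-world')]
```

The lookup goes through the namespace URI, not the epub prefix, so the prefix itself is only a convention.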
Finally, those files need to be combined into a ZIP archive. Unfortunately, using your favorite ZIP archive tool won’t work, as EPUBs have some peculiar requirements. From the directory where you have your book files, do the following in a terminal:
zip -0X book.epub mimetype; zip -0DXr book.epub . -x **/.* *.epub
Note: this works on macOS, and I unfortunately don’t have experience with how to do it on other operating systems. Do let me know if you do!
This creates a file called book.epub in your book directory, with mimetype as the first entry, then adds everything else (except for hidden files and EPUBs), all without compressing the archive.
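On systems without the zip command, roughly the same archive can be built with Python’s standard zipfile module. Treat this as a sketch under assumptions and not as a tested replacement: build_epub is a hypothetical helper, and its skip rules mirror the command above (hidden files and .epub files are left out, nothing is compressed).

```python
import zipfile
from pathlib import Path


def build_epub(book_dir, epub_path):
    """Pack book_dir into epub_path with mimetype first and nothing compressed."""
    book_dir = Path(book_dir)
    epub_path = Path(epub_path)
    with zipfile.ZipFile(epub_path, "w", compression=zipfile.ZIP_STORED) as archive:
        # The mimetype file has to be the first entry in the archive.
        archive.write(book_dir / "mimetype", "mimetype")
        for path in sorted(book_dir.rglob("*")):
            if not path.is_file():
                continue
            relative = path.relative_to(book_dir)
            # Skip mimetype (already added) and any EPUB, including the output.
            if relative.as_posix() == "mimetype" or path.suffix == ".epub":
                continue
            # Skip hidden files and anything inside hidden directories.
            if any(part.startswith(".") for part in relative.parts):
                continue
            archive.write(path, relative.as_posix())
    return epub_path
```

Called as build_epub("book", "book/book.epub"), it should produce the same layout as the two zip commands. Opening the result in an ereader app is still the real test.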
If you prefer GUIs over bash one-liners, which is very understandable, some GUI tools are listed here.
This gives us an EPUB file, book.epub:
  book/
  ├── META-INF/
  │   └── container.xml
+ ├── book.epub
  ├── content.opf
  ├── index.xhtml
  └── mimetype
Which you can load up in, for example, Books.app. Or Adobe Digital Editions. Or some other ereader app that you prefer!
I hope this walkthrough of how to create a Hello, World! for EPUB shows that, while a bit much and weird, it is doable to create EPUB files yourself, by hand!
There’s a lot more to them though. I think I’ll write more about EPUBs in the future. When I do, I’ll link that up here!