To get a clean ePub file from a PDF, you’ll need a few basic tools:

1. Adobe Acrobat to view the PDF and to convert it to HTML

2. Sigil, a WYSIWYG ePub editor that also lets you edit the underlying HTML

3. Calibre to view/manage your ebook files

4. IDPF’s ePub validator

5. Kindle Previewer to convert your clean ePub to Kindle’s mobi format

The most basic steps to convert a PDF to ePub are:

From Adobe Acrobat, export your book as an HTML web page.

In Sigil, open the HTML file you exported from Acrobat. What you’ll see is a preview of how the document will look in an ebook reader, and at this point it is usually very messy. Switch to split view, where one part of the screen shows the rendered text and the other shows the HTML code. At the very top of the HTML code is the CSS that controls text formatting.

One of my favorite ways to clean up a document in Sigil (and something I just discovered) is to delete almost all of that CSS. At the beginning of your first HTML file you will see two blocks of CSS, each enclosed in its own pair of ‘<style>’ tags. The longest and most complicated block is the one you can delete; the shorter one needs to stay. Once the unnecessary CSS is gone, the text will all look pretty much the same, with the possible exception of some headings. That formatting came over from Acrobat and isn’t needed: part of creating an ePub is stripping out unneeded formatting and getting down to fairly raw text. You can still add indents, italics, and bold within Sigil, as well as cover files and other images.
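If you would rather not hunt for the long CSS block by hand, the same cleanup can be roughed out in a few lines of Python. This is only a sketch: the function name `drop_longest_style_block` and the sample markup are made up for illustration, and it assumes (as described above) that the longest ‘<style>’ block is the one to remove. Check the result in Sigil before trusting it.

```python
import re

# Matches one <style ...>...</style> block, including its contents.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>", re.IGNORECASE | re.DOTALL)


def drop_longest_style_block(html):
    """Remove the longest <style> block and leave any others untouched."""
    blocks = list(STYLE_BLOCK.finditer(html))
    if len(blocks) < 2:
        # With fewer than two blocks there is no shorter one to keep, so do nothing.
        return html
    longest = max(blocks, key=lambda m: len(m.group(0)))
    return html[:longest.start()] + html[longest.end():]


# Hypothetical stand-in for an Acrobat export: one long block, one short block.
sample = (
    "<html><head>"
    "<style>p.s1 { font: 9pt Times; color: #231f20; } p.s2 { font: 11pt Arial; }</style>"
    "<style>body { margin: 0; }</style>"
    "</head><body><p class='s1'>Chapter One</p></body></html>"
)

cleaned = drop_longest_style_block(sample)
print(cleaned)
```

The function deliberately does nothing when it finds fewer than two blocks, so it will not wipe out the only stylesheet in a file.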

Sigil lets you add chapter breaks, edit the table of contents, and update the metadata. If you added bookmarks and links in Acrobat, the table of contents at the beginning of the document will already be linked to the heading locations throughout the book. This is handy but not strictly necessary, because anything you tag as a heading in your document becomes part of the table of contents. Each chapter break creates a new file within Sigil, and the software builds the navigation for you. Whenever you change something in the HTML editor, Sigil refreshes the display when you switch to book view. Be sure to save often. The first time you save the HTML file, Sigil will prompt you to save it as .epub.
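Since headings drive the table of contents, it can help to list them before you generate it. The sketch below uses Python’s standard `html.parser` module to print an indented outline of the h1 to h6 tags in a chunk of HTML. The class name `HeadingCollector` and the sample markup are invented for this example, and the outline is only my approximation of what ends up in the table of contents, not a reproduction of Sigil’s own logic.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every h1-h6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


# Hypothetical markup standing in for one file of the book.
sample = (
    "<h1>Part One</h1><p>Text</p>"
    "<h2>Chapter <em>1</em></h2><hr/>"
    "<h2>Chapter 2</h2>"
)

collector = HeadingCollector()
collector.feed(sample)
collector.close()

for level, text in collector.headings:
    print("  " * (level - 1) + text)
```

A heading that shows up here but should not be in the table of contents is usually a sign that a paragraph was tagged as a heading by mistake.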

Once you’ve created an ePub file, you’ll need to run it through IDPF’s ePub validator. This can be one of the most frustrating parts of the process: you have more than likely spent hours tinkering with code, and now the validator says there’s something wrong with the file. The most obvious errors are easy to avoid. Be sure every image has an alt attribute, both images you insert and images that already appear in the text. Links are another source of errors, specifically links to other locations in the ebook itself: each one needs a specific end point, and if that end point isn’t identified you will get an error message. Check your metadata as well. Some of it, like the title, is mandatory, and the validator will report an error if it’s missing.
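Two of these problems, images without an alt attribute and internal links with no end point, are easy to catch before you ever reach the validator. Here is a rough sketch using Python’s standard `html.parser`. The names `LinkAndImageAudit` and `audit` are my own, and the check only covers links that point within the same file (an `href` starting with `#`); links that jump to another file in the book are not checked, and this is not a substitute for the real validator.

```python
from html.parser import HTMLParser


class LinkAndImageAudit(HTMLParser):
    """Record link targets, same-file links, and images lacking alt."""

    def __init__(self):
        super().__init__()
        self.ids = set()                # every id (or <a name>) seen
        self.fragment_links = []        # targets of href="#..." links
        self.images_missing_alt = []    # src of each <img> with no alt

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "id" in attrs:
            self.ids.add(attrs["id"])
        if tag == "a":
            if attrs.get("name"):
                self.ids.add(attrs["name"])
            href = attrs.get("href") or ""
            if href.startswith("#") and len(href) > 1:
                self.fragment_links.append(href[1:])
        if tag == "img" and "alt" not in attrs:
            self.images_missing_alt.append(attrs.get("src", "(no src)"))


def audit(html):
    """Return (images missing alt, same-file links with no end point)."""
    parser = LinkAndImageAudit()
    parser.feed(html)
    parser.close()
    broken = [t for t in parser.fragment_links if t not in parser.ids]
    return parser.images_missing_alt, broken


# Hypothetical markup with one image missing alt and one dangling link.
sample = (
    '<h1 id="ch1">Chapter 1</h1>'
    '<img src="cover.jpg">'
    '<img src="map.png" alt="Map of the island">'
    '<a href="#ch1">Back to chapter 1</a>'
    '<a href="#ch9">Jump to chapter 9</a>'
)

missing_alt, broken_links = audit(sample)
print("Images missing alt:", missing_alt)
print("Links with no end point:", broken_links)
```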

When you have a nice, clean ePub, you’re ready to convert it to mobi. Kindle Previewer uses KindleGen to create a mobi file from the ePub. One of the most common errors here comes from forgetting to designate the cover file. To fix it, right-click on the cover image in Sigil, then go to “Add Semantics/Cover Image.” This is also a good time to run Sigil’s built-in validator (the green checkmark) to see whether the IDPF validator missed anything, like an unused image.
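If you want to confirm the title and cover are in place before converting, you can inspect the book’s OPF file (the package file inside the ePub that holds the metadata and the list of files). The sketch below assumes an EPUB 2 style OPF in which the cover is marked with a `<meta name="cover" content="..."/>` entry pointing at a manifest item id. That is my understanding of what the cover designation looks like in such a file, so treat it as an assumption rather than a description of exactly what Sigil or KindleGen does. The function name `check_opf` and the sample OPF are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def check_opf(opf_xml):
    """Return a list of problems found in the OPF text (empty if none)."""
    root = ET.fromstring(opf_xml)
    problems = []

    title = root.find("opf:metadata/dc:title", NS)
    if title is None or not (title.text or "").strip():
        problems.append("missing title")

    # Assumed EPUB 2 convention: <meta name="cover" content="manifest-id"/>
    cover_meta = root.find("opf:metadata/opf:meta[@name='cover']", NS)
    manifest_ids = {
        item.get("id") for item in root.findall("opf:manifest/opf:item", NS)
    }
    if cover_meta is None:
        problems.append("no cover image designated")
    elif cover_meta.get("content") not in manifest_ids:
        problems.append("cover points at an item that is not in the manifest")

    return problems


# Hypothetical OPF with a title but no cover designation.
sample_opf = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>My Book</dc:title>
  </metadata>
  <manifest>
    <item id="cover.jpg" href="Images/cover.jpg" media-type="image/jpeg"/>
  </manifest>
</package>"""

# The same OPF after a cover has been designated.
fixed_opf = sample_opf.replace(
    "</metadata>", '<meta name="cover" content="cover.jpg"/></metadata>'
)

print(check_opf(sample_opf))
print(check_opf(fixed_opf))
```

To try this on a real book, you would first need to pull the OPF file out of the ePub, which is an ordinary zip archive.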
