Use Pandoc to Easily Convert Text Files to PDF or eBook Format

In the last article, we learned how Markdown can quickly help you produce clean HTML code to be used in a website or blog. But what if you also want to produce an e-book using the same content as you have on the web? While the Markdown tool set is targeted at creating web content, there is another tool that allows you to take Markdown and turn it into OpenOffice/LibreOffice documents, PDFs, or even e-books suitable for a Kindle or other e-reader: Pandoc.

Installing the pandoc package on an Ubuntu system is dead simple with the following command:

sudo apt-get install pandoc

Once installed, you can immediately use Pandoc in place of Markdown to create HTML with the following command:

pandoc -r markdown -w html -o *yourfilename*.html *yourfilename*.md

The syntax and flags are as follows:

  • “-r” – read format
  • “-w” – write format
  • “-o” – filename of the output

The above command reads a Markdown file and writes it out in HTML format under the same base filename.
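If you get tired of typing the filename twice, the output name can be derived from the input name. Here is a minimal sketch that wraps the command above in a shell function. The function name `md_to_html` and the `PANDOC` variable are my own illustrative choices, not anything Pandoc provides.

```shell
#!/bin/sh
# Sketch: convert one Markdown file to HTML, reusing the input's base name.
# PANDOC is a hypothetical override so a different binary can be substituted.
PANDOC="${PANDOC:-pandoc}"

md_to_html() {
    src="$1"
    # Strip a trailing ".md" and append ".html" (notes.md becomes notes.html)
    out="${src%.md}.html"
    "$PANDOC" -r markdown -w html -o "$out" "$src"
}
```

Calling `md_to_html notes.md` should then write `notes.html` next to the source file.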

The above example outputs the file in HTML format, but you can use Pandoc to generate other formats as well.

If you need to exchange your document with people using a more generic office suite, such as OpenOffice/LibreOffice or Microsoft Office, you can convert it to ODT format using Pandoc. If you think you’ll do this often, it’s useful to set up a template beforehand. First, create a simple document (such as a header and a line or two of text) in Markdown and convert it to ODT with the following command:

pandoc -r markdown -w odt -o pandoctemplate.odt *yourfilename*.md

Then open the “pandoctemplate.odt” file in OpenOffice/LibreOffice and change the fonts, spacing, margins, etc. to your liking. Be sure to use Styles to configure this; some details on the use of styles are available here. Once the document is set up the way you want, you can use it as a template for creating ODT files from Markdown in the future by adding it to the above command:

pandoc -r markdown -w odt --reference-odt=pandoctemplate.odt -o *yourfilename*.odt *yourfilename*.md

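Now when you convert a Markdown file to ODT, it will automatically be formatted with the styles you created earlier. Pandoc also supports conversion to the newer (version 2007 and later) Microsoft Word format, with a template supplied through the flag “--reference-docx=templatefile.docx”. Note that Pandoc 2.0 and later replaced both of these flags with a single “--reference-doc” option, so use that if your version rejects the ones shown here.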
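Once the template exists, applying it to a whole folder of drafts is a short loop. The following is only a sketch: the function name `md_dir_to_odt` and the `PANDOC` and `TEMPLATE` variables are assumptions I made up for illustration, and the flag is spelled as in the command above.

```shell
#!/bin/sh
# Sketch: convert every Markdown file in a directory to ODT with one template.
# PANDOC and TEMPLATE are hypothetical overrides with sensible defaults.
PANDOC="${PANDOC:-pandoc}"
TEMPLATE="${TEMPLATE:-pandoctemplate.odt}"

md_dir_to_odt() {
    dir="${1:-.}"
    for src in "$dir"/*.md; do
        # Skip the unexpanded pattern when the directory has no .md files
        [ -e "$src" ] || continue
        # Pandoc 2.0+ spells this flag --reference-doc instead
        "$PANDOC" -r markdown -w odt --reference-odt="$TEMPLATE" \
            -o "${src%.md}.odt" "$src"
    done
}
```

Running `md_dir_to_odt drafts` would produce one `.odt` file beside each `.md` file in `drafts`, all styled from the same template.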

[Screenshots: the template in Markdown, the template in ODT, and a document converted with the template]

When I need to generate PDF files from Markdown, I’ll most often convert them to ODT and use either LibreOffice’s Export to PDF function or, if it’s a large group of files, the “unoconv” command line utility. If you’re a LaTeX user, and have a number of packages installed (this section of the Pandoc documentation describes what’s required), you can output PDFs with the following command:

pandoc -r markdown -o *yourfilename*.pdf *yourfilename*.md

Note the absence of the “-w” flag in this case: Pandoc works out that you want a PDF from the “.pdf” extension of the output file.
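The two routes to PDF described above can be folded into one helper that picks whichever is available. This is a rough sketch under a few assumptions: the function name `md_to_pdf` and the `PANDOC`, `UNOCONV` and `LATEX` variables are hypothetical, I treat the presence of `pdflatex` as a stand-in for a working LaTeX setup, and I use `unoconv -f pdf` as the conversion call.

```shell
#!/bin/sh
# Sketch: produce a PDF through LaTeX if it is installed, else through ODT.
# PANDOC, UNOCONV and LATEX are hypothetical overrides.
PANDOC="${PANDOC:-pandoc}"
UNOCONV="${UNOCONV:-unoconv}"
LATEX="${LATEX:-pdflatex}"

md_to_pdf() {
    src="$1"
    base="${src%.md}"
    if command -v "$LATEX" >/dev/null 2>&1; then
        # LaTeX route: the .pdf extension tells Pandoc what to write
        "$PANDOC" -r markdown -o "$base.pdf" "$src"
    else
        # ODT route: write an ODT first, then hand it to unoconv
        "$PANDOC" -r markdown -w odt -o "$base.odt" "$src" &&
            "$UNOCONV" -f pdf "$base.odt"
    fi
}
```

Finding `pdflatex` on the path does not guarantee that every LaTeX package Pandoc needs is installed, so treat the check as a hint rather than a guarantee.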

To publish e-books suitable for most electronic readers (ePub is a format handled by almost all readers), you may want to have some items specific to that format prepared in advance. These include:

  • A stylesheet, written in CSS, that describes how the ePub will look
  • Metadata, such as the creator, description, rights to the work, and language
  • A cover image

If you don’t have these, however, Pandoc will use some reasonable defaults. The following command will convert your Markdown document to an ePub:

pandoc -r markdown -w epub --epub-metadata=*metadatafile*.xml --epub-cover-image=*coverimage*.jpg --epub-stylesheet=*stylesheet*.css -o *yourfilename*.epub *yourfilename*.md
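Since Pandoc falls back to defaults for anything you leave out, a wrapper can pass only the extras that actually exist on disk. Here is a minimal sketch; the function name `md_to_epub`, its argument order, and the `PANDOC` variable are assumptions of mine for illustration.

```shell
#!/bin/sh
# Sketch: build an ePub, adding metadata, cover and stylesheet only if present.
# PANDOC is a hypothetical override so a different binary can be substituted.
PANDOC="${PANDOC:-pandoc}"

# Usage: md_to_epub source.md [metadata.xml] [cover.jpg] [stylesheet.css]
md_to_epub() {
    src="$1"
    meta="${2:-}"
    cover="${3:-}"
    css="${4:-}"
    # Rebuild the positional parameters as the Pandoc argument list
    set -- -r markdown -w epub
    if [ -f "$meta" ]; then
        set -- "$@" --epub-metadata="$meta"
    fi
    if [ -f "$cover" ]; then
        set -- "$@" --epub-cover-image="$cover"
    fi
    if [ -f "$css" ]; then
        # Pandoc 2.0+ takes the stylesheet through --css instead
        set -- "$@" --epub-stylesheet="$css"
    fi
    "$PANDOC" "$@" -o "${src%.md}.epub" "$src"
}
```

With this, `md_to_epub book.md meta.xml` uses your metadata file but leaves the cover and stylesheet to Pandoc's defaults.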

Here are some additional tips and tricks I use in the course of using Markdown for my writing tasks:

  • Since Markdown is plain text, if you use Dropbox to keep files in sync between devices, you can use the built-in text editor to create or update your Markdown documents on the Web. There are also editors available for Linux (I happen to like ReText a lot) and Android (I’ve been switching between Writer, Epistle, and the code editor DroidEdit lately).
  • Also because it’s plain text, version control systems (such as Subversion) do an excellent job of tracking versions and showing the differences between them.
  • Once you’ve converted a couple of documents, and know which flags you need for all the formats you want, you can create a simple shell script that will output them all at once.
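As an example of the last tip, here is a minimal sketch of such a script for three of the formats covered above. The function name `convert_all` and the `PANDOC` variable are hypothetical, and the loop relies on the fact that for HTML, ODT and ePub the Pandoc format name happens to match the file extension.

```shell
#!/bin/sh
# Sketch: write HTML, ODT and ePub versions of one Markdown file in one go.
# PANDOC is a hypothetical override so a different binary can be substituted.
PANDOC="${PANDOC:-pandoc}"

convert_all() {
    src="$1"
    base="${src%.md}"
    for fmt in html odt epub; do
        # Format name doubles as the extension; stop at the first failure
        "$PANDOC" -r markdown -w "$fmt" -o "$base.$fmt" "$src" || return 1
    done
}
```

Running `convert_all draft.md` should leave `draft.html`, `draft.odt` and `draft.epub` beside the original. Template and cover flags from the earlier commands can be added per format once you know which ones you need.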

I’ve found Markdown to be an excellent way to draft content, in a “distraction-free” environment (most plain text editors are), that supports output to multiple formats, yet doesn’t require any dedicated applications.

Image credit: Typewriter closeup shot, concept of Chapter one by Big Stock Photo.