Converting LaTeX to HTML

This is an excerpt of the book “Better Books with LaTeX.” The book comes with a LaTeX template you can use to easily create your own books.

Now that you know how to fill the template, let us take a look at how to activate certain content only for the HTML or only for the PDF output. Afterward, we will examine the technical details of converting LaTeX to HTML, and how to add that capability to an existing project that does not use the template. If you plan to only use the template, feel free to skip section 7.2.

pdfLaTeX   pdfLaTeX is a basic LaTeX typesetting engine that translates LaTeX documents directly into PDFs or HTML files (with the help of tex4ht).

XeLaTeX   XeLaTeX is a LaTeX typesetting engine with extended font support and native UTF-8 encoding support (for special characters). It is slower than the more basic pdfLaTeX.

tex4ht   tex4ht is a tool to translate LaTeX code into an HTML document.

To convert LaTeX to HTML, we need a special compiler, tex4ht. Unfortunately, tex4ht does not work with the default compiler we have set up for the Overleaf project: it only works with pdfLaTeX, not with XeLaTeX or LuaLaTeX. So, in Overleaf, we have to click on the settings icon and select pdfLaTeX as the LaTeX engine. Let it compile, then clear the cache, and have it compile again. If you did not use the template, you might run into some compatibility problems between XeLaTeX and pdfLaTeX. If your project is already compatible or if you are already working with pdfLaTeX, you can skip the chapter after the following section 7.1.

  Switching from XeLaTeX to pdfLaTeX

If you run into problems after switching a XeLaTeX project to pdfLaTeX in the project settings, the LaTeX code needs to be adapted. As we do not want to make the original XeLaTeX code unusable, we add conditional statements. For this, load the ifxetex package in the preamble with \usepackage{ifxetex}.


Then, surround XeLaTeX-specific code (or any code that produces an error under pdfLaTeX) with an “\ifxetex … \else … \fi” construction. This compatibility layer lets you generate PDF files with XeLaTeX and also produce HTML documents with pdfLaTeX when you switch the compiler setting.
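As a minimal sketch of such a construction, the preamble below picks font and encoding packages depending on the engine. The package choices (fontspec, inputenc, fontenc) are assumptions about a typical project, not necessarily what the template uses:

```latex
% Preamble sketch: choose encoding/font packages per engine.
\usepackage{ifxetex}
\ifxetex
  \usepackage{fontspec}        % XeLaTeX: system fonts, native UTF-8
\else
  \usepackage[utf8]{inputenc}  % pdfLaTeX: declare the input encoding
  \usepackage[T1]{fontenc}     % pdfLaTeX: 8-bit font encoding
\fi
```

The same pattern works anywhere in the document body, not only in the preamble.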

For example, when generating an HTML file, you cannot include PDF files or vector graphics. Instead, you have to rely on JPG and PNG image files. Another application is reducing the size of an existing image file for an e-book. The code might look like this:

\adjustbox{max width=.95\columnwidth, max height=.4\textheight}{ 
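The listing above is cut off in this excerpt. As a rough sketch of the idea, assuming the graphicx, adjustbox, and ifxetex packages are loaded and using the hypothetical file names figure.pdf and figure.png, a complete conditional could look like this:

```latex
% Preamble (assumed): \usepackage{graphicx,adjustbox,ifxetex}
\ifxetex
  % Print/PDF build: vector graphics can be included directly.
  \includegraphics[width=.95\columnwidth]{figure.pdf}% hypothetical file
\else
  % HTML build: use a raster image, capped in width and height.
  \adjustbox{max width=.95\columnwidth, max height=.4\textheight}{%
    \includegraphics{figure.png}}% hypothetical file
\fi
```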

Yet another example is using different texts for the PDF (designed for print) and the HTML output (designed for an e-book release). The conditional clause allows you to show medium-specific text, dates, or formatting:

\ifxetex
2016, First Edition 
\textsc{ISBN} 978-3-945586-21-1 
Printed on acid\hyp{}free, unbleached paper. 
\else
Ebook created \today 
\textit{PS: If you want to rate this book, please always add a short text comment. Did you like it? What can be improved? Who would you recommend it to? Without a text comment, your star rating will not be counted on the Amazon website!} 
\fi

A further example is footnotes. As e-books do not have pages in the traditional sense, your footnotes would end up in a separate part of the book at the end with a small reference. Given that we do not want the reader to jump back and forth, one approach is to simply include the footnote in parentheses if the output is not set to XeLaTeX (print):

A popular assumption is that the same words convey the same meanings. This is generally only correct if both conversation partners belong to a common \emph{language network}, i.e., that they define their terms either among themselves or through close acquaintance\ifxetex.\footnote{Interesting\else~(interesting\fi~to note here is the theory that every person in the world is connected to every other person by approximately seven intermediate connections\ifxetex~\citep[cf.][]{Travers69anexperimental}.}\else, \cite[cf.][]{Travers69anexperimental}).\fi{}
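Interleaving the conditionals with the sentence, as above, gives full control over punctuation but is hard to read. A simpler alternative is to hide the switch behind a small macro. This is only a sketch, not the book's own approach, and the macro name \medianote is hypothetical:

```latex
\usepackage{ifxetex}
\ifxetex
  % Print build: a regular footnote.
  \newcommand{\medianote}[1]{\footnote{#1}}
\else
  % HTML/e-book build: inline the note in parentheses.
  \newcommand{\medianote}[1]{ (#1)}
\fi

% Usage in the running text:
% ... through close acquaintance.\medianote{Interesting to note here is ...}
```

The trade-off is that the note always follows the sentence's punctuation, so the e-book version reads "acquaintance. (Interesting …)" rather than folding the remark into the sentence.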

The last example is handling references. In a printed book, you can add a quotation page at the end to list the sources of individual quotes. This is possible because you can quickly jump to the end of the book and back using a page number, while in an e-book, you have no fixed page numbers and have to rely on links.


  Tex4ht Configuration

If you plan to use the template, feel free to skip this section.

On the Overleaf platform, no separate installation for tex4ht is needed. All you need to do is include it in your workflow. In Overleaf, this is done by adding a file named “latexmkrc” to the main directory of your project (which overrides the default Overleaf one) and adding a configuration file.

Build chain using different tools to produce different output formats.

latexmk   latexmk is the build tool Overleaf uses to automatically build your LaTeX project. The configuration file latexmkrc can be used to override build settings or add a hook to another compiler (like tex4ht to generate HTML output in addition to the PDF).

First, let us create the latexmkrc file in the main directory of your project and insert this code (depending on your project, if you are not using the template, you might need additional settings from

$pdflatex = "htlatex %S \"htlatex.cfg,MyFonts,NoFonts\" \"\" \"\" -shell-escape > output.txt; pdflatex -synctex=1 %O %S";

This creates a hook in the compilation chain: latexmk runs the command stored in $pdflatex whenever it needs to compile the document. All the hook does is call htlatex before calling pdflatex, giving you an HTML output in addition to the PDF output. It also redirects the output of the htlatex run to a new file called output.txt, which you can use for debugging.

Once everything compiles, the HTML and debug files will not show up within Overleaf. Instead, you have to download the output files (use the drop-down menu at the bottom left in the project window, “DOWNLOAD AS ZIP” and “Input and Output Files”). There, you should check if there is an HTML file in the main directory. That is your converted LaTeX document! You can now easily copy and paste the whole document or parts of it into, for example, a WordPress post and publish it online.

If there is no HTML file, double-check for any errors within Overleaf and check the output.txt. If you cannot make sense of it, just let us know. We can help!

Converting that HTML file into a real e-book format like MOBI or EPUB takes some extra effort, as we need to adjust the settings, take care of the table of contents, add a cover, and optimize our images. We will go over this in Chapter 8.

  HTML Output Formatting

Unfortunately, tex4ht cannot do a 1:1 conversion simply because printed books are based on pages while HTML documents and e-books are continuous texts. Also, formatting, spacing, and images are handled differently, so we need to configure this separately. In the listing above, you can see a reference to htlatex.cfg—that is where the tex4ht configuration resides:

\Configure{DOCTYPE}{\HCode{<!DOCTYPE html>\Hnewline}} 
\Configure{@HEAD}{\HCode{<!-- for beautifying --><link rel="stylesheet" type="text/css" href="site.css" />\Hnewline}} 
% Translate \textbf, \textit and \texttt directives into <strong>, <em> and <code> 
\Configure{textsc}{\ifvmode\ShowPar\fi\HCode{<span class="sc">}}{\HCode{</span>}} 
% Translate verbatim and lstlisting blocks into <pre> elements 
\ConfigureEnv{minipage}{\ifvmode\IgnorePar\fi\HCode{<div class="minipage">}}{\ifvmode\IgnorePar\fi\HCode{</div>\Hnewline}}{}{}% 
% Do not set ‘indent‘/‘noindent‘ classes on paragraphs 

What the file does is configure the mapping between LaTeX and HTML. If you are familiar with HTML, you will see that the htlatex.cfg file lets you control the contents of the output HTML file. It starts by setting up the HTML header and then configures how individual LaTeX commands (\emph, \textbf, \textit, …) should be translated into HTML. For example, text formatted in italics (\textit) is translated into HTML by using the emphasis HTML tag (<em>). The \HCode command directly inserts HTML code into the output file and can also be used in the normal LaTeX files. For example, you can use \HCode{<hr style="clear: both" />} to directly add a horizontal line to the HTML output file and thus the e-book.
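The listing above shows only part of the configuration. As a hedged sketch of how such a file is typically laid out, a tex4ht configuration wraps its \Configure calls between \Preamble and \EndPreamble. The three mappings below follow the pattern of the \textsc line and the comment in the listing; that your tex4ht version exposes exactly these hooks is an assumption worth checking against its documentation:

```latex
% htlatex.cfg (sketch; not the template's full configuration)
\Preamble{xhtml}
% Map LaTeX text commands to semantic HTML tags, following the
% same pattern as the \textsc line in the listing above.
\Configure{textbf}{\ifvmode\ShowPar\fi\HCode{<strong>}}{\HCode{</strong>}}
\Configure{textit}{\ifvmode\ShowPar\fi\HCode{<em>}}{\HCode{</em>}}
\Configure{texttt}{\ifvmode\ShowPar\fi\HCode{<code>}}{\HCode{</code>}}
\begin{document}
\EndPreamble
```

Each \Configure call takes the HTML to emit before and after the command's content, so adding a mapping for another command means supplying one opening and one closing \HCode.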

CSS   CSS files determine the final design and appearance of a website (or e-book).

The htlatex.cfg file also references the site.css file. You can adjust it according to your needs; in my experience, the following settings work nicely for Kindle e-books:

1. You might want to adapt the sizes of the chapter title and section title fonts:

.chapterHead { font-size: 1.5em; margin-top: 0.83em; margin-bottom: 0.83em; font-weight: bold; text-align: left; } 
.sectionHead { font-size: 1.17em; margin-top: 1em; margin-bottom: 1em; font-weight: bold; } 
.subsectionHead { margin-top: 1.33em; margin-bottom: 1.33em; font-weight: bold; } 
.subsubsectionHead { font-size: 0.83em; margin-top: 1.67em; margin-bottom: 1.67em; font-weight: bold; }

2. In Kindle e-books, new paragraphs have indents on the first line. If you do not like that, this is the workaround:

p { margin-top: 1em; margin-bottom: 1em; text-indent: 0.01em; }

3. One way to highlight a quotation:

.quotation { margin: 0.25em 0; padding: 0.35em 40px; line-height: 1.45; position: relative; color: #383838; } 
.quotation:before { display: block; padding-left: 10px; content: "\201C"; font-size: 80px; position: absolute; left: -15px; top: -20px; color: #7a7a7a; } 
.quotation cite { color: #999999; font-size: 14px; display: block; margin-top: 5px; } 
.quotation cite:before { content: "\2014 \2009"; } 
div.quotation { width: auto; }

4. Adding small capitals (\textsc):

.sc { font-variant: small-caps; }

5. Have description list elements printed in bold:

dt.description { font-weight: bold; }