Index Creation in LaTeX

This is an excerpt from Better Books with LaTeX the Agile Way. You can get a copy here.

As opposed to their electronic counterparts, printed books do not have a search functionality to find specific words in the text. Instead, they have an index at the end as a service for the reader to quickly find certain parts of the book that he or she wants to read. If you are planning to publish only an e-book version of your book, you could skip this section—e-books do not have indexes because they do not have fixed page numbers: they are formatted differently on different devices. Keep in mind, though, that you might want to create a paperback edition at a later date and that your knowledge about your book is still fresh now. Even without a paperback edition, working on an index will help you to find possible keywords you could use for marketing, and it is a useful method of reviewing your book’s content paragraph by paragraph.

Unfortunately, there is no easy way to generate an index automatically—at least not in the quality a human can. Why is that?

Beyond merely listing all the concepts in a book, a good index is like an intelligent filter. The person creating the index has to think about what a reader might search for and list that word, even though it might never literally appear in the book. Likewise, if a concept consists of multiple words, it might be good to include both variations, for example, “language → mathematics” and “mathematics → language” to refer to the language of mathematics and mathematics as a language.

But how can LaTeX help in this regard? In traditional bookmaking, creating the index of a book is a separate process after the actual book is finished. You can imagine it basically as having the printed book in front of you, then going through page by page, noting which concepts appear on a particular page. This approach gets problematic if you want to make changes to the book that affect the page numbers: you would have to redo the entire index each time.

Indexing in Word

In Word, you can select the word or words you would like to use as an index entry and click on Mark Entry (on the References tab, in the Index group). A dialog shows up (see Figure 16.1) where you can configure the index entry (search for “Create and update an index” to find Microsoft Office help on this topic). Once done, Word switches into a hybrid mode that shows things like line breaks or index entries which are usually hidden (you can activate/deactivate the mode yourself by clicking on the ¶ button). If you want to edit an existing index entry, you have to edit the generated code. For example, the code for a subentry “mathematics” of the index entry “language” looks like this in Word:

 
 
Mathematics{ XE "language:mathematics" }

Figure 16.1: Marking an entry to add it to the index in Microsoft Word.

Indexing in LaTeX

In LaTeX, you add an index entry by inserting the \index{…} command directly into the text. For example, if you have the sentence “The yellow lab was voted America’s favorite dog again this year,” you could add two entries: “The yellow lab\index{yellow lab} was voted America’s favorite dog\index{favorite dog} again this year.” The page number that shows up in the index will then correspond to the place where you have inserted the index command. If that sentence is printed on page 7 of the book, the index will show “yellow lab, 7” and “favorite dog, 7,” assuming there is no mid-sentence page break.
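The excerpt does not show the surrounding setup, so the following is only a minimal sketch of a complete document around the example sentence. It assumes the standard makeidx package and the makeindex program; the file name book.tex is hypothetical.

```latex
% book.tex (hypothetical file name)
% Typical build sequence, assuming pdflatex and makeindex are installed:
%   pdflatex book
%   makeindex book
%   pdflatex book
\documentclass{book}
\usepackage{makeidx} % provides \printindex
\makeindex           % write the raw index entries to book.idx

\begin{document}

The yellow lab\index{yellow lab} was voted America's favorite
dog\index{favorite dog} again this year.

\printindex % typeset book.ind, which makeindex generates from book.idx

\end{document}
```

Because the index is generated from the \index commands on every build, the page numbers follow along when the text changes.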

The big question is: which words should you index? Let us look at an overall indexing strategy.

My approach in the first phase is to index all the terms that need to be indexed no matter what. Those are:

  • Names of persons. Whenever you mention (or quote) a person, add the index command after his or her name. The format for indexing someone’s name is \index{last name, first name middle name}, for example, \index{Darwin, Charles Robert}.
  • Media titles. Likewise, whenever mentioning a work of art (book, movie, software, etc.), add the index command after its title. If you have the title in your bibliography, \citetitle{bookid} does the job for you and adds the item automatically to the index. The “bookid” stands for the id you have given in the bibliography file. The exception to using \citetitle would be titles that start with an article (a, the), which is usually moved to the end (e.g., Last Unicorn, The instead of The Last Unicorn). In those cases (or when you do not reference a book from your bibliography), use \index{title of the work@\textit{title of the work}}. The “@” character separates the sort key on its left from the text printed in the index on its right, which lets the indexing engine sort the entry correctly despite the italic font formatting.
  • Concept definitions. When introducing a concept and providing its definition (especially in glossary items), you want the reader to look at a particular passage before any other. When indexing, you can accomplish this by marking those entries in bold by adding “|textbf”: \index{word|textbf}. For example:  
    Science\index{science|textbf} is the formalized process of gaining new knowledge.  
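The three kinds of phase-one entries can be sketched in one small document. This is an illustration under the assumption that the makeidx package and makeindex are used; the sentences about Darwin and The Last Unicorn are made up for the example.

```latex
\documentclass{book}
\usepackage{makeidx}
\makeindex

\begin{document}

% Person: listed under the last name.
Charles Darwin\index{Darwin, Charles Robert} published his theory in 1859.

% Title: the text before @ is the sort key, the text after @ is printed.
\textit{The Last Unicorn}\index{Last Unicorn, The@\textit{Last Unicorn, The}}
is a fantasy novel.

% Definition: |textbf prints this page number in bold.
Science\index{science|textbf} is the formalized process of gaining
new knowledge.

\printindex

\end{document}
```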

Note that you must not index entries within captions of figures. This will cause problems during compilation. Instead, index the place where the figure is referenced in the text.
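As a sketch of this advice, the index command can sit next to the figure reference in the running text while the caption stays free of it. The label fig:greek and the boxed placeholder are hypothetical stand-ins so that the example compiles without an image file.

```latex
\documentclass{book}
\usepackage{makeidx}
\makeindex

\begin{document}

% Index the sentence that refers to the figure, not the caption.
The Greek alphabet\index{Greek alphabet} is shown in
Figure~\ref{fig:greek}.

\begin{figure}
  \centering
  \fbox{placeholder for the actual image} % stand-in for the image
  \caption{The Greek alphabet.} % no \index command in here
  \label{fig:greek}
\end{figure}

\printindex

\end{document}
```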

Once those basics are implemented, you move into the second phase. Here, you go from paragraph to paragraph and ask yourself each time what concepts are discussed. Add the first occurrence of each concept to the index. This paragraph-by-paragraph approach is the best compromise between accuracy and usability. You save time by adding a particular concept to the index only once per paragraph. The reader sees if a paragraph on a particular page spills over to the next page and will read on. The index page number basically says “start reading here until the paragraph is finished.”

If the same word is indexed in multiple paragraphs, LaTeX will combine them into a single index entry. That is, a single number if all occurrences are on the same page, or a list (or range) of pages where the concept is discussed (e.g., “5-7, 9”, “273-279, 401”, etc.).
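To see the combining in action, here is a small sketch that indexes the same term on several pages. It assumes makeidx and makeindex with default settings, where three or more consecutive pages are abbreviated as a range.

```latex
\documentclass{book}
\usepackage{makeidx}
\makeindex

\begin{document}

Evolution\index{evolution} is introduced here.   % page 1
\newpage
Evolution\index{evolution} is discussed further. % page 2
\newpage
Evolution\index{evolution} is discussed again.   % page 3
\newpage
This page does not index the term.               % page 4
\newpage
Evolution\index{evolution} is revisited.         % page 5

\printindex

\end{document}
```

With these defaults, the four commands should collapse into one entry along the lines of “evolution, 1–3, 5.”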

To avoid confusing the reader with too many index entries, it is important not to index a word simply because it is mentioned; the paragraph should explain something about the word or concept in question. Imagine the reader looking up the word in the index, going to the page, and then wanting to read what the word in question is about. For example, take the sentence “A republic is different from a democracy as it sets the constitution as its highest arbiter.” Of course, you would index “republic” in this sentence. But if you also indexed “democracy,” a reader would gain little value from it. From this sentence, the reader does not get any explanation of what democracy is about. If the sentence was instead “A republic is different from a democracy as it puts the constitution, not the people, as its highest arbiter,” the situation would be different. It describes (an aspect of) democracy. Of course, it is not a definition of democracy, so you would not mark it as bold in the index.

Alternatively, you can always use more general concepts in an index. Here, you could use “systems of government” instead of either republic or democracy, especially when you are just comparing different systems of government in that paragraph. Even if you never use the expression “systems of government” in your book, a reader will be satisfied reading the paragraph as it compares systems of government. As the following phases will only remove or combine index entries, it is safe to “overindex” in this second phase. You can even add multiple index entries for individual words. Using the example from above, you could add both “republic” and “systems of government.”

Once you are done with indexing individual words, in phase three, you can take a break and have your editor (or a friend) take the role of a reviewer. Alternatively, take an extended break and revisit your book one month later with a fresh mind. For the review, go one by one through the index, go to the page specified, and ask yourself if the passage really explains the concept. If not, remove the entry from the index.

Phase four then deals with cleaning up this “overindexed” index. Look at the index and see if you can find entries that can be combined. For example, you might find you have the following entries in your index: “Greek alphabet,” “Latin alphabet,” and “Phoenician alphabet.” Here you have to ask yourself if your readers might also search for “alphabet” or already have a specific language in mind. In my book Philosophy for Heroes: Knowledge, I chose the former and combined the index entries into the category “alphabet” with “Greek,” “Latin,” and “Phoenician” as subcategories. This categorization can be done in LaTeX with the following construct:

 
\index{main category!subcategory}

In our example, use \index{alphabet!Greek}, \index{alphabet!Latin}, and \index{alphabet!Phoenician}, respectively. You can also go one level deeper, although that should be the exception. For example, you could categorize “natural numbers in mathematics” as \index{mathematics!number!natural}. In both cases, LaTeX will automatically group the entries that share a main category and arrange them together. With this in mind, I recommend reading the indexes of a few books in your library to learn how the authors combined their concepts. If you want to know more about the theory of knowledge and categorization of language in philosophy, check out Philosophy for Heroes: Knowledge, which explains it in detail.
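The categories and subcategories described here can be tried out with a short document. This is a sketch assuming makeidx and makeindex; the example sentences are made up.

```latex
\documentclass{book}
\usepackage{makeidx}
\makeindex

\begin{document}

The Greek\index{alphabet!Greek} and Latin\index{alphabet!Latin} alphabets
both descend from the Phoenician\index{alphabet!Phoenician} alphabet.

% Two levels deep; use sparingly.
The natural numbers\index{mathematics!number!natural} are used for counting.

\printindex

% The printed index is then structured like this:
%   alphabet
%     Greek, 1
%     Latin, 1
%     Phoenician, 1
%   mathematics
%     number
%       natural, 1

\end{document}
```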

Next, in phase five, you might want to tell the reader when two different words in the index refer to the same concept. Maybe there is a popular expression for something and (in your field of work) a correct expression for it. In the index, you can point one expression to the other. One application is citing a person who goes by different names, maybe a real name and an artist name. Readers might look for either version of the name. For example, the mathematician Leonardo Bonacci is also known as Fibonacci. You might list both names and tell the reader that you are referring to the same person. If you used “Leonardo Bonacci” in your text, you could add an index entry with the following format: \index{one version of the word|see{other version of the word}}. In our case this would be “Leonardo Bonacci was a famous mathematician.\index{Bonacci, Leonardo}\index{Fibonacci|see{Bonacci, Leonardo}}” Another example and a bit of an inside joke would be recursion: “If a statement relates to itself, it is called recursive.\index{recursion}\index{recursion|see{recursion}}”
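A cross-reference of this kind can be sketched as follows. The example assumes the makeidx package, which defines the \see command that the |see{…} suffix relies on.

```latex
\documentclass{book}
\usepackage{makeidx} % also defines \see, which |see relies on
\makeindex

\begin{document}

Leonardo Bonacci was a famous
mathematician.\index{Bonacci, Leonardo}\index{Fibonacci|see{Bonacci, Leonardo}}

\printindex

% The printed index then contains:
%   Bonacci, Leonardo, 1
%   Fibonacci, see Bonacci, Leonardo

\end{document}
```

Note that the “see” entry prints no page number of its own; it only redirects the reader to the other entry.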

Finally, read through the entire index again to identify duplicate entries, e.g., the same concept listed twice because of a misspelling. These can happen because the LaTeX commands are hidden in any PDF output you might be using for proofreading. That is it!

Here again are the steps of the index creation as a list:

  • Index all the basic terms (titles, people and place names, definitions).
  • Go through the entire text and index all the terms that are explained in a particular passage.
  • Check all index entries by going backward from the index to the text.
  • Combine index entries into groups.
  • Add references from one index entry to another (e.g., for people with several names).
  • Check for spelling errors.
