
<refentry xmlns="http://docbook.org/ns/docbook"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          xmlns:xi="http://www.w3.org/2001/XInclude"
          xmlns:src="http://nwalsh.com/xmlns/litprog/fragment"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          version="5.0" xml:id="make.index.markup">
<refmeta>
<refentrytitle>make.index.markup</refentrytitle>
<refmiscinfo class="other" otherclass="datatype">boolean</refmiscinfo>
</refmeta>
<refnamediv>
<refname>make.index.markup</refname>
<refpurpose>Generate XML index markup in the index?</refpurpose>
</refnamediv>

<refsynopsisdiv>
<src:fragment xml:id="make.index.markup.frag">
<xsl:param name="make.index.markup" select="0"/>
</src:fragment>
</refsynopsisdiv>

<refsection><info><title>Description</title></info>

<para>This parameter enables a very neat trick for getting properly
merged, collated back-of-the-book indexes. G. Ken Holman suggested
this trick at Extreme Markup Languages 2002 and I'm indebted to him
for it.</para>

<para>Jeni Tennison's excellent code in
<filename>autoidx.xsl</filename> does a great job of merging and
sorting <tag>indexterm</tag>s in the document and building a
back-of-the-book index. However, there's one thing that it cannot
reasonably be expected to do: merge page numbers into ranges. (I would
not have thought that it could collate and suppress duplicate page
numbers, but in fact it appears to manage that task somehow.)</para>

<para>Ken's trick is to produce a document in which the index at the
back of the book is <quote>displayed</quote> as literal XML markup. Because
the FO processor formats that index along with the rest of the book, all of
the page numbers in it have been resolved. In other words, when this parameter
is non-zero, instead of having an index at the back of the book that looks like
this:</para>

<blockquote>
<formalpara><info><title>A</title></info>
<para>ap1, 1, 2, 3</para>
</formalpara>
</blockquote>

<para>you get one that looks like this:</para>

<blockquote>
<programlisting>&lt;indexdiv&gt;A&lt;/indexdiv&gt;
&lt;indexentry&gt;
&lt;primaryie&gt;ap1&lt;/primaryie&gt;,
&lt;phrase role="pageno"&gt;1&lt;/phrase&gt;,
&lt;phrase role="pageno"&gt;2&lt;/phrase&gt;,
&lt;phrase role="pageno"&gt;3&lt;/phrase&gt;
&lt;/indexentry&gt;</programlisting>
</blockquote>

<para>After building a PDF file with this sort of odd-looking index, you can
extract the text from the PDF file and the result is a proper index expressed in
XML.</para>
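
<para>As a minimal sketch of that first pass, a customization layer along the
following lines would switch the parameter on for the FO build. The
<tag class="attribute">href</tag> shown is a hypothetical path, not one
taken from this distribution; point it at wherever the stock FO stylesheet
lives in your installation.</para>

<blockquote>
<programlisting>&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0"&gt;

  &lt;!-- Hypothetical location of the stock FO stylesheet; adjust as needed --&gt;
  &lt;xsl:import href="path/to/docbook-xsl-ns/fo/docbook.xsl"/&gt;

  &lt;!-- A non-zero value makes the generated index show its own XML markup --&gt;
  &lt;xsl:param name="make.index.markup" select="1"/&gt;

&lt;/xsl:stylesheet&gt;</programlisting>
</blockquote>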

<para>Now you have data that's amenable to processing, and a simple Perl script
(such as <filename>fo/pdf2index</filename>) can
merge consecutive page numbers into ranges and generate a proper index.</para>
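
<para>For illustration only, if the three consecutive pages in the example
above were collapsed into a single range, the merged entry might look
something like the following. The exact markup that the script emits is not
shown here, so treat the element layout and the range notation as
assumptions.</para>

<blockquote>
<programlisting>&lt;indexdiv&gt;&lt;title&gt;A&lt;/title&gt;
&lt;indexentry&gt;
&lt;!-- pages 1, 2, 3 from the extracted index, merged into one range --&gt;
&lt;primaryie&gt;ap1, 1-3&lt;/primaryie&gt;
&lt;/indexentry&gt;
&lt;/indexdiv&gt;</programlisting>
</blockquote>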

<para>Finally, reformat your original document using this literal index instead of
an automatically generated one, and <quote>bingo</quote>!</para>

</refsection>
</refentry>
