Archive for the ‘XML’ Category

Guest post: A successful DITA implementation

Monday, January 16th, 2012

Guest post by Julia Malkin. This originally appeared on the dita-users mailing list as a response to a query from a list member about whether it was worthwhile for them to implement DITA.

Julia Malkin is a Principal Technical Writer at Endeca. Since 2006, she has pioneered Endeca’s DITA adoption as the DITA project lead and a member of the Information Architecture team. Julia has been following standards and best practices in the DITA arena as well as related authoring, content management, and documentation production tools since 2005, and applied this knowledge when overseeing the implementation of a new XML authoring and production environment for the Endeca Documentation team.

We implemented DITA at my company between 2007 and 2009, going from Frame 7 to XMetaL and SVN. The conversion was manual and took that long because we did it gradually, starting with the guides that were the highest priority and/or were new, and then moving all other guides to DITA based on their priority. I was the project lead and, initially, my manager, our editor, and I were the team that was both interested in DITA and knew enough to get started. From what you describe, your doc set will definitely benefit from being more modular. It is incredibly useful not having to write content twice or to maintain duplicate content. However, whether getting these benefits is worth the effort of implementing DITA depends on the following factors:

  • The willingness of the team to learn and change their ways. Your team’s success largely depends on whether everyone on the team is willing to commit their time and learn. In addition, there are many tasks in the project that could be shared along the way, and it helps tremendously if your team is willing to pick any of these tasks and drive them to completion.
  • The willingness and ability of one or two members of the team to serve as project managers. This includes creating wiki pages with information that must be shared along the way, tracking the project continuously through some project management software (zen, scrumworks, jira), and creating and conducting learning demos/presentations for the team.
  • The willingness and ability of one or two team members to serve as information architects. These people would create the new organizational model, decide which units of information you are going to have, and how you will organize them in content sets (maps). This role also includes development of metadata, creation of XML templates for your topics and maps, naming conventions, creation of the SVN directory structure, deciding which output types you are going to need and implementing them (PDF and some sort of online output that suits your needs, either Eclipse Help or other), and troubleshooting and customizing the production of outputs (online and PDF).
  • Finally, there is conversion of the legacy content. At each of these stages, there are consultants and software you could use to help leverage existing accumulated knowledge, simplify and/or speed up the process. It helps, of course, if you have management support, not just at the doc team level but at the larger departmental level.

Looking back, I am happy that we are now a DITA/XML group and that we made the leap. I am also grateful for the learning and professional growth that occurred along the way. Personally, I would also switch from Frame to a native XML editor, because I believe the transition to typed XML-based content is significant enough that it warrants having a truly XML-based authoring tool. It also helps to use an XML-based editor when you must stop relying on the unstructured writing environment of FrameMaker. However, I recognize that switching editors is debatable, and that many writing groups appreciate the numerous benefits of FM and value them more than other features, such as native XML support.

XMLmind improves DITA converter

Thursday, December 8th, 2011

Pixware keep improving their open-source DITA Converter tool. The latest release adds Webhelp output – you can see a sample here.

XMLmind DITA Converter (ditac for short) allows you to convert the most complex DITA 1.0, 1.1 or 1.2 documents to production-quality XHTML 1.0, XHTML 1.1, HTML 4.01, Java™ Help, HTML Help, Eclipse Help, EPUB, PDF, PostScript®, RTF (can be opened in Word 2000+), WordprocessingML (can be opened in Word 2003+), Office Open XML (.docx, can be opened in Word 2007+), OpenOffice (.odt, can be opened in OpenOffice.org 2+).

File naming conventions for XML and DITA

Tuesday, November 8th, 2011

Last week, on the frameusers mailing list, there was a discussion of file naming conventions and DITA. File naming conventions aren’t something many writers think much about, but when you’re dealing with DITA they become important, as certain choices in naming files can cause the DITA Open Toolkit to throw a hissy fit. List member Roger Shuttleworth posted a long message with many useful tips, and with his permission, I’m posting it here. Note that some of his advice can be applied to any kind of files, not just those in a DITA-based project.

First, join the dita-users group on Yahoo and search in the archives there. You’ll find lots of good advice from people who know a heck of a lot more about XML than I do. But I can tell you what I have gleaned from various user groups:

  1. Use only lower-case lettering for both files and folders. Why? XML is case-sensitive, so you don’t want authors to stumble over which case to use for href attribute values, for example. It’s the KISS principle.
  2. Remember also that, since XML is all about interchangeability, and your content may end up in a Linux or Unix environment, you should avoid all special characters except for underscores in file and folder names. That includes periods and spaces: ban them in folder and file names.
  3. As far as possible, make sure the file name matches the title of the topic. So if your topic is “Inserting the Widget” then your file name would be “inserting_the_widget.xml”. I realize this may cause problems later if the title changes, but that doesn’t happen often and you can usually make the change to the file name without too much hassle (changing it in ditamaps, etc.).
  4. As you can see above, use underscores in file names instead of spaces. CamelCase is not a good idea, for reasons given above.
  5. We add a suffix to the file name to indicate the topic type: inserting_the_widget_c.xml is a concept; _t for a task; _r for a reference; _tp for a topic (rarely used). That makes it easier to locate the file in a concepts folder, for example.
  6. If you use XML and DITA, you will find yourself working with large numbers of topic files. This may sound intimidating, but with good folder structure it is not a problem.
  7. For ditamaps, make full use of the nesting capabilities (that is, a map referencing a map) so that your maps don’t get too large. It is easier to handle a map that has five sub-maps than to handle one map containing hundreds of topics.
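The naming rules above are mechanical enough to script. What follows is only a rough sketch in Python, not something from Roger's message: the function names are made up for illustration, and it assumes that the single period before the .xml extension is exempt from the "no periods" rule.

```python
import re

# Suffixes follow the convention described in the list above.
TYPE_SUFFIXES = {"concept": "c", "task": "t", "reference": "r", "topic": "tp"}


def topic_file_name(title, topic_type):
    """Build a lower-case, underscore-separated file name from a topic title."""
    # Every run of characters other than a-z and 0-9 becomes one underscore,
    # so spaces, periods, and other special characters all disappear.
    stem = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"{stem}_{TYPE_SUFFIXES[topic_type]}.xml"


def is_valid_name(name):
    """Check a file name: lower-case letters, digits, underscores, then .xml."""
    # Assumption: the extension's period is the only period allowed.
    return re.fullmatch(r"[a-z0-9_]+\.xml", name) is not None


print(topic_file_name("Inserting the Widget", "task"))  # inserting_the_widget_t.xml
print(is_valid_name("InsertingTheWidget.xml"))          # False
```

A check like is_valid_name could be run over a whole folder before a build to catch names that are likely to cause trouble later.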

Don’t assume that these rules are irrelevant if you’re not planning on using XML right now. There’s great benefit in using structured authoring and DITA even without going so far as to use XML, but you probably will want to take that step in the future. Don’t paint yourself into a corner.

As regards folder structure, opinions vary. Most people recommend keeping ditamaps in a maps folder, and topics in three or four sibling folders (concepts, topics, reference, tasks). Some people just have a topics folder and keep all topic types together; that may work if your numbers are relatively small. It is good practice to keep relative paths simple (for example, for href attributes). So organize your folders in a shallow structure; something like:

           product_name
              maps
              concepts
              images
              reference
              tasks
              topics

Or:

           product_name
              images
              maps
                  concepts
                  reference
                  tasks
                  topics

That way, your relative paths from map to topic, and from topic to image, are simple. What you don’t want are long ../../../../ strings in the hrefs. Since you will be using FM, your maps folder will eventually contain other things apart from maps.
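To see what the hrefs look like under each layout, here is a small Python sketch. The file names (user_guide.ditamap, widget.png) are invented for the example, and the .ditamap extension is an assumption about how the maps are named.

```python
import posixpath


def href(from_file, to_file):
    """Relative path to put in an href inside from_file that points at to_file."""
    return posixpath.relpath(to_file, start=posixpath.dirname(from_file))


# First layout: maps, tasks, and images are sibling folders.
print(href("product_name/maps/user_guide.ditamap",
           "product_name/tasks/inserting_the_widget_t.xml"))
# ../tasks/inserting_the_widget_t.xml

# Second layout: topic folders sit inside maps, images stays at the top.
print(href("product_name/maps/tasks/inserting_the_widget_t.xml",
           "product_name/images/widget.png"))
# ../../images/widget.png
```

Each extra level of folder nesting adds another ../ to some of the hrefs, which is the argument for keeping the structure shallow.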

In any case, it is better to keep all the topics context-agnostic – that is, not to do anything that restricts where a topic can be reused. With that in mind, it’s probably easier to organize your folders according to product rather than deliverable. It depends on how many products you are supporting, and how much overlap there is between them. But essentially, for a given product, you would draw from the same pool of topics for all deliverables – PDF user guide, Help, etc. Just use different maps for each, or if using FM, use attribute filtering (not conditional text).

In our case we have a single product that consists of several sections. We divide our topics up according to section, each with concepts, tasks, and reference sub-folders. We also use the @product attributes that exist on several DITA elements so that, if necessary, we can search the file system according to product.
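A search on @product can be done with any tool that understands XML. As one possible sketch (not the method Roger's team uses), the Python function below walks a folder and lists the files in which any element carries a given product value. The function name is hypothetical, and it assumes topics are stored as .xml files.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def topics_for_product(root_dir, product):
    """Return the .xml files under root_dir where some element has the given @product value."""
    matches = []
    for path in sorted(Path(root_dir).rglob("*.xml")):
        try:
            tree = ET.parse(path)
        except ET.ParseError:
            continue  # skip files this parser cannot read
        for element in tree.iter():
            # @product can hold several space-separated values.
            if product in element.get("product", "").split():
                matches.append(path)
                break
    return matches
```

Calling topics_for_product("product_name", "widget") would return the paths of every topic tagged for a product called widget, where both names are placeholders.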

Another recommendation: If you are using XML, use a good text editor that will allow you to do Search/Replace in multiple files and folders. Notepad++ and jEdit are two that come to mind. They will save you a ton of time. You may also want to become familiar with regular expressions 8^) (not really, but useful if you do!).
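If you would rather script a multi-file Search/Replace than do it in an editor, something along these lines would work. It is a minimal Python sketch with a made-up function name; it assumes the files are UTF-8, and it overwrites files in place, so try it on a copy or a version-controlled working folder first.

```python
import re
from pathlib import Path


def replace_in_files(root_dir, pattern, replacement, glob="*.xml"):
    """Apply a regular-expression replacement to every matching file; return the files changed."""
    regex = re.compile(pattern)
    changed = []
    for path in sorted(Path(root_dir).rglob(glob)):
        # newline="" keeps the file's existing line endings untouched.
        with open(path, encoding="utf-8", newline="") as f:
            text = f.read()
        new_text, count = regex.subn(replacement, text)
        if count:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(new_text)
            changed.append(path)
    return changed
```

For example, after renaming a topic file you could update every reference to it with replace_in_files("product_name", r'href="([^"]*)inserting_the_widget_t\.xml"', r'href="\1installing_the_widget_t.xml"'), passing a different glob if your maps use another extension.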

One last thing: Take a look at DITA-FMx from leximation.com. It simplifies things a lot. If you’re making the switch in the near future, don’t buy DITA-FMx until the new version comes out in a few months.

In our experience, you can manage many hundreds of files without needing an expensive content management system. Get used to working with XML first before even thinking about a CMS.

Prince software for XML or HTML to PDF

Tuesday, November 1st, 2011

Getting good quality PDFs out of XML or HTML content can be a real struggle. XSLT and XSL-FO are not for the fainthearted. A substantial number of the messages on the DITA users mailing list are about PDF problems with the DITA Open Toolkit.

On the mailing list, I came across a mention of Prince software for converting XML or HTML to PDF. According to their web site:

Prince is an ideal printing component for server-based software like web applications that need to print reports or invoices. Using Prince, it is quick and easy to create PDF files that can be printed, archived or downloaded over the web.

Prince can also be used by authors and publishers to typeset and print documents written in HTML, XHTML, or one of the many XML-based document formats. Prince is capable of formatting academic papers, journals, magazines, and books.

Commercial licenses are moderately expensive, but Prince is free for personal use, although it does put a small logo on the first page of the PDF. You can view some samples on their web site – the Wikipedia articles look pretty good.

XSLT and XSL-FO books for trial download

Friday, October 14th, 2011

Ken Holman is making four of his books available for “try and buy” download. The two of most likely interest to tech writers, especially those working with DITA, are Practical Transformation Using XSLT and XPath and Practical Formatting Using XSL-FO. You can download the complete books, but are asked to pay for them if you use them or want to keep them.

Working around XSL-FO page-count limitations in DITA

Thursday, January 13th, 2011

If you use DITA and produce large PDF documents (>2000 pages), you’ve likely run into out-of-memory errors with the XSL-FO processor. Scriptorium has published a white paper that explains how you can work around these limitations.

Formatting Object (FO) processors (FOP, in particular) often fail with memory errors when processing very large documents for PDF output. Typically in XSL-FO, the body of a document is contained in a single fo:page-sequence element. When FO documents are converted to PDF output, the FO processor holds an entire fo:page-sequence in memory to perform pagination adjustments over the span of the sequence. Very large page counts can result in memory overflows or Java heap space errors. Reducing page count in a document is not usually an option.

You can set up processing to divide the document into multiple fo:page-sequence elements to avoid memory problems. The granularity of the content of each fo:page-sequence does not matter. Each fo:page-sequence could contain a chapter, a sub-chapter, a section, and so on. For a DITA implementation, each DITA topic could be placed in its own page sequence. The key is to place elements in a flat sequence of fo:page-sequences. Flattening the hierarchy is challenging. The examples in this document will explain the approach.
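To make the "flat sequence" idea concrete, here is a small Python sketch that builds an FO tree with one fo:page-sequence per topic, however deeply the topics are nested. It is not the white paper's method, which works inside the XSLT that generates the FO; the dictionary format for topics, the function names, and the page dimensions are all assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Qualified name for an element in the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def flatten(topic):
    """Yield a topic, then all of its nested topics, in document order."""
    yield topic
    for child in topic.get("children", []):
        yield from flatten(child)


def build_fo(topics):
    """Build an fo:root whose page sequences are siblings, one per topic."""
    root = ET.Element(fo("root"))
    layout = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(layout, fo("simple-page-master"),
                         {"master-name": "body",
                          "page-height": "11in", "page-width": "8.5in"})
    ET.SubElement(page, fo("region-body"))
    for top_level in topics:
        for topic in flatten(top_level):
            # Every topic gets its own page sequence directly under fo:root.
            seq = ET.SubElement(root, fo("page-sequence"),
                                {"master-reference": "body"})
            flow = ET.SubElement(seq, fo("flow"),
                                 {"flow-name": "xsl-region-body"})
            title = ET.SubElement(flow, fo("block"), {"font-weight": "bold"})
            title.text = topic["title"]
            body = ET.SubElement(flow, fo("block"))
            body.text = topic["body"]
    return root


book = [{"title": "Chapter 1", "body": "Intro",
         "children": [{"title": "Section 1.1", "body": "Detail"}]},
        {"title": "Chapter 2", "body": "More"}]
print(len(build_fo(book).findall(fo("page-sequence"))))  # 3
```

One trade-off to keep in mind is that each fo:page-sequence starts on a new page, so the finer the granularity, the more page breaks the output has.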
