Tuesday, February 23, 2010
Mallard - yet another doc markup system
Labels: technical communication, XML
Wednesday, February 03, 2010
Reimagining book publishing with XML
It’s time for traditional publishers to follow suit with a content-centered, XML-first publishing approach. Getting there is not the difficult or disruptive process that many publishing executives have assumed. For instance, innovative new authoring tools enable content to be created in XML using interfaces indistinguishable from Microsoft Word. (XML is an open content standard that drastically reduces the effort required of publishing houses to create eBooks, and every other type of content. XML is designed to help publishers break the dependency of content on proprietary formats and specific devices. XML content can be easily repurposed, reused, shared, sorted, aggregated with other content, and automatically processed, published, and delivered, often on demand.)
Monday, January 04, 2010
XMetal/oXygen editor comparison
Sunday, September 27, 2009
XMLmind XML Editor 4.5 has more DITA support
DITA support is now bundled in XMLmind XML Editor. This support has been greatly enhanced. It is now as comprehensive as DocBook support in XMLmind XML Editor. Most of the enhancements come from XMLmind DITA Converter.
XMLmind DITA Converter (ditac for short) allows you to convert the most complex DITA 1.1 documents to production-quality XHTML 1.0, XHTML 1.1, HTML 4.01, Java™ Help, HTML Help, PDF, PostScript®, RTF (can be opened in Word 2000+), WordprocessingML (can be opened in Word 2003+), Office Open XML (.docx, can be opened in Word 2007+), and OpenOffice (.odt, can be opened in OpenOffice.org 2+).
XMLmind DITA Converter is free, open source software licensed under the very liberal terms of the Mozilla Public License version 1.1.
Monday, June 01, 2009
Syntext Sern Free XML Editor
Friday, May 01, 2009
DITA - It's just XML
DITA is a sophisticated application architecture with lots of very useful features. People coming to DITA or promoting it, especially in the TechDoc world, tend to focus on the most sophisticated features because they're focusing on business problems for which those features are intended, such as managing large bodies of small re-used information modules across information for many products (for example, mobile phone manuals). That's cool stuff, but it's also pretty complex. It's no surprise that people see in-depth discussions of DITA maps and re-use strategies and localization best practice and say "hold the phone, I just want to get my traditional documents into XML I can understand--I don't need all this fancy stuff."
I'm here to say: you're probably right, you don't need all that whizbang stuff (today), but don't be so quick to reject DITA as a potential solution base.
If you ignore all of the features of DITA that get the technology guys like me excited, you start to see that DITA has two important aspects that tend to get overlooked:
1. At its core DITA is very simple and can be easily applied to simple XML applications that just need to represent things like books and magazine articles.
2. DITA's unique extensibility architecture makes it a much better business value than any comparable XML alternative.
Labels: DITA, technical communication, XML
Thursday, March 26, 2009
Some useful XSL links
- A FOP (Formatting Object Processor) forum - you'll likely be using FOP for printed output.
- Archives of the XSL mailing list on lists.com.
Labels: XML
Tuesday, October 07, 2008
Open XML Format SDK 2.0
The Open XML SDK provides a set of .Net APIs that allows developers to create and manipulate documents in the Open XML Formats in both client and server environments without the need of the Office clients. The SDK should make it easier for you to build solutions on top of the Open XML Format by allowing you to perform complex operations, such as creating Open XML packages or adding/deleting tables, with just a few lines of code.
Labels: Microsoft Word, XML
Thursday, September 04, 2008
XSL-FO tutorial
RenderX, the makers of the XEP formatter used in the DITA Open Toolkit, have a good tutorial on XSL-FO that'll certainly help with this rather arcane subject.
This document gives a quick, learn-by-example introduction to XSL Formatting Objects. I don't discuss subtle details of implementation, but rather provide a series of examples of how to perform routine tasks with XEP — an XSL formatter developed by RenderX, Inc. It is not a manual of XSL FO (XSLFO) in general, and some examples given here may not work in other XSL FO (XSLFO) formatters, or give different results.
This tutorial was conceived as a means to facilitate reading of XSL 1.0 Recommendation of October 15, 2001. The normative text is available from W3C site: http://www.w3.org/TR/2001/REC-xsl-20011015/. You should obtain a copy of XSL 1.0 Recommendation, and refer to it for a complete description of objects and properties mentioned here.
Labels: DITA, technical communication, XML
Saturday, April 19, 2008
DocBook or DITA?
Labels: DITA, technical communication, XML
Monday, February 04, 2008
Online DocBook XML publisher
This is the demo version of Mr. XML Publisher for DocBook. It is an implementation-specific version of Mr. XML Publisher, which is itself a more generalized web app for publishing XML. Mr. XML Publisher is as much "consulting-ware" as it is commercial software. It can run any tool chain your server is licensed to run.
* Use any XSL transformer.
* Format XML with uploaded XSL.
* Pull XML or XSL from a database, create files from the data, and include those files in the formatting.
* Run scripts and executables of any type.
I don't have any DocBook files handy, so I wasn't able to test it. Perhaps they'll consider building a DITA version one of these days.
Labels: XML
Thursday, January 31, 2008
Content reuse with Open XML and XSLT
With just a few lines of XSLT and a few templates we have already written a stylesheet that extracts the basic paragraphs and most important styles from a WordprocessingML document and turns them into HTML that can be viewed in the browser view ...
Similarly, it is quite easy to extend the stylesheet to extract meta information, other styles, or image information from the WordprocessingML document and reuse the content for any modern application scenario, from web publishing via HTML, RSS, or social media formats to mobile web applications and beyond.
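The article does this with XSLT. As a rough illustration of the same idea, here is a minimal sketch in Python instead, using only the standard library. The sample markup stands in for the word/document.xml part of a .docx package, and the style-to-tag mapping and function names are my own assumptions, not anything from the article.

```python
import xml.etree.ElementTree as ET
from html import escape

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical sample standing in for word/document.xml from a .docx package.
SAMPLE = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:t>Quarterly report</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t xml:space="preserve">Sales were </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>up</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""

# Assumed mapping from paragraph style names to HTML tags.
STYLE_TO_TAG = {"Heading1": "h1", "Heading2": "h2"}


def run_to_html(run):
    """Convert one w:r run to HTML, wrapping bold runs in <b>."""
    text = escape("".join(t.text or "" for t in run.findall("w:t", NS)))
    # Simplification: any w:b element counts as bold (w:val="false" is ignored).
    bold = run.find("w:rPr/w:b", NS) is not None
    return f"<b>{text}</b>" if bold else text


def paragraph_to_html(p):
    """Convert one w:p paragraph to an HTML block element."""
    style = p.find("w:pPr/w:pStyle", NS)
    style_name = style.get(f"{{{W}}}val") if style is not None else None
    tag = STYLE_TO_TAG.get(style_name, "p")
    body = "".join(run_to_html(r) for r in p.findall("w:r", NS))
    return f"<{tag}>{body}</{tag}>"


def document_to_html(xml_text):
    """Convert every top-level paragraph in the document body."""
    root = ET.fromstring(xml_text)
    return "\n".join(paragraph_to_html(p) for p in root.findall("w:body/w:p", NS))


html_output = document_to_html(SAMPLE)
print(html_output)
```

A real document would need more cases (tables, hyperlinks, images), which is where a template-based language like XSLT starts to pay off.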
Labels: Office 2007, XML
Wednesday, January 30, 2008
DITA2InDesign project started
There's nothing much there at the moment, just a little bit of XSLT code that demonstrates the general approach I'm taking for generating XML that can then be imported more or less directly into InDesign CS3. (It's just in the Subversion code repository at the moment--I haven't gotten as far as building a separate distribution package).
This is intended to be a community project and I am actively soliciting participation and contribution from anyone and everyone. While I am authorized to contribute to the development, it will definitely be a "spare time" project for me, at least for now.
I will be adding documentation and some Web pages to the project over the coming weeks as I can.
My intent with this project is to develop a Toolkit plugin and supporting InDesign scripts and templates that enable publishing DITA-based content to InDesign with up to 100% automation. I say "up to 100%" because with InDesign there is usually an implicit expectation that you may need or want to tweak things by hand. But there should be a class of non-trivial page layouts that can be laid out 100% automatically given a reasonable level of scripting effort.
Monday, December 17, 2007
Format XML with CSS
I settled on an application called Prince that specializes in converting XML to PDF. While proprietary, it is relatively inexpensive, runs from the command line on Linux and Mac OS X and as a GUI app on Windows, and has many advanced features not available elsewhere. It uses standard CSS to control formatting instead of something like XSL templates or LaTeX markup. In addition to pure XML, Prince can create PDFs from [X]HTML. It supports common image formats such as JPEG, PNG, TIFF, and GIF and a subset of Scalable Vector Graphics (SVG). By default, Prince uses the free Microsoft True Type fonts, available for Linux on SourceForge.
There's also a review of it on the O'Reilly site. There was some discussion about it on the DITA Yahoo group recently, as a possible alternative to the XSL-FO processing pipeline that the DITA Open Toolkit uses, but so far I don't think anyone has got that working.
Wednesday, December 12, 2007
DITA at XML 2007
Thursday, September 27, 2007
Embedding XML in docs
Labels: technical communication, XML
Tuesday, September 18, 2007
Docbook to DITA via ODF
The ODF transforms are pretty interesting. They would make it possible to edit DITA or DocBook documents in OpenOffice--an open source suite of tools that is available to everyone. That's a far cry from the kind of money you have to spend to get a really good editor these days. (Those editors will still be needed for handling content references, at the very least. But it will be interesting to see what can be done using OpenOffice.)
But it's the DITA/DocBook transforms that are of most interest for interchange with legacy systems and tools. (There is also the question of how they handle DITA content references and DocBook entity references. But that's one of the tricky details that a concept paper like this can skim over...)
Unfortunately, it's those tricky details that keep people like me from adopting DITA right now.
Tuesday, September 04, 2007
Office Open XML misses ISO certification
Update: This is somewhat in conflict with what I'm reading on the Office XML blog. It looks like the process isn't finished yet.
Friday, August 17, 2007
CSS to XSL-FO
Labels: XML
Friday, July 20, 2007
InDesign CS3 and XML authoring
Labels: FrameMaker, XML
Thursday, July 12, 2007
WordprocessingML document model
The WordprocessingML format represents a stream of content (the data), and the formatting associated with it. Word does not work on this data in a hierarchical manner, nor does it infer a hierarchy when working with it. As such, there is no hierarchy stored in the file format. The way that you impose any type of hierarchy or semantics is through the use of structured document tags (SDTs) like content controls, custom XML, etc. That hierarchy will then be reflected in the document content and in the file format.
If you intend to use WordprocessingML as a pure data interchange format, and you want the data to be hierarchical in nature, then you will want to use the SDTs in your document for this hierarchy. We actually do this today in our workflows in Microsoft, such as our spec library where we leverage the SDTs to structure the specs for easy interrogation of the spec collection.
Other approaches folks have used to get semantics out of the document would be through the use of styles. Remember though that the Styles are flat since they are just a property of the paragraph or run of text.
The vital thing to understand is formatting itself should not be viewed as structure. The "view" of the data is not PART of the data. The "view" is separate. The fact that you have Heading 2 after heading 1 does not imply a structural relationship between the 2 headings – merely that they LOOK different. In a world that espouses the separation of data and view, this is a great model. There is no attempt to try to invent some hierarchical representation based on the view of the data.
This is well worth a good, close read if you plan on using any of the XML features of Word.
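To make the point concrete, here is a small Python sketch over a hand-written fragment (the sample content and the "requirement" tag value are invented for illustration). The two heading paragraphs are plain siblings that differ only in a style property, while the structured document tag is an actual container in the markup.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical fragment of a document body: two styled paragraphs and one SDT.
SAMPLE = f"""<w:body xmlns:w="{W}">
  <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Scope</w:t></w:r></w:p>
  <w:p><w:pPr><w:pStyle w:val="Heading2"/></w:pPr><w:r><w:t>Limits</w:t></w:r></w:p>
  <w:sdt>
    <w:sdtPr><w:tag w:val="requirement"/></w:sdtPr>
    <w:sdtContent>
      <w:p><w:r><w:t>Must start in 2 seconds.</w:t></w:r></w:p>
    </w:sdtContent>
  </w:sdt>
</w:body>"""


def text_of(elem):
    """Concatenate all w:t text found anywhere below elem."""
    return "".join(t.text or "" for t in elem.iter(f"{{{W}}}t"))


body = ET.fromstring(SAMPLE)

# Styles are flat: Heading1 and Heading2 are siblings, not parent and child.
flat = []
for p in body.findall("w:p", NS):
    style = p.find("w:pPr/w:pStyle", NS)
    style_name = style.get(f"{{{W}}}val") if style is not None else None
    flat.append((style_name, text_of(p)))

# SDTs do nest content, so semantics can be read straight from the tree.
tagged = {}
for sdt in body.findall("w:sdt", NS):
    tag = sdt.find("w:sdtPr/w:tag", NS)
    content = sdt.find("w:sdtContent", NS)
    tagged[tag.get(f"{{{W}}}val")] = text_of(content)

print(flat)
print(tagged)
```

Any "section" structure you want from the headings has to be inferred by your own code; nothing in the file says that "Limits" belongs to "Scope".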
Labels: Microsoft Word, XML
Friday, June 29, 2007
Word to XML and DITA
An easy way to begin, one that content contributors may be comfortable with, is consistent use of the same Word templates for the same document type. Of course, they would then still be free to deviate from the template, which will cause problems down the road.
So there is a new class of tools that look exactly like Microsoft Word, but which can force your authors to create perfectly structured documents. By perfectly structured, we mean that when exported to XML, the document can be validated against a DTD (document type definition) or XML Schema Document (XSD).
Microsoft has provided an API that allows developers to customize Word. They can selectively disable Word's menus to allow only those options that are valid at a given point in the document (context-sensitive controls).
Along with this article, look at the Word to DITA Editors page on DitaUsers.org, which has links to four tools.
Labels: DITA, Microsoft Word, XML
Wednesday, June 13, 2007
Extreme Markup Conference 2007
Extreme is the leading international conference on markup theory and practice. If you have interesting markup applications, difficult markup problems, or intriguing solutions to problems related to the design and use of markup, markup languages, or markup tools; if you want to know what the leading theorists of markup are thinking; if you are the house markup expert and want to spend time with your kind, then you should plan on attending Extreme Markup Languages® 2007.
I looked at the program and it's certainly way past my limited understanding of XML, XSLT, and the like, but it does appeal to the inner geek in me. And Montreal is such a wonderful city, at least in the summer.
Labels: XML
Sunday, June 10, 2007
Altova XML Spy supports Open XML
- Create an XSLT 2.0 transformation to publish data in a Word or Excel document on the Web or your corporate intranet.
- Manually edit some Word XML data and save it back to an Office 2007 format to test the outcome of changes that will be made in an application being developed
- Use XQuery to extract and aggregate financial data from an Excel document and provide it in an XML form suitable for mapping to EDI messages or Web services functions
Labels: Office 2007, XML
Wednesday, June 06, 2007
More on Open XML development tools
Labels: Office 2007, XML
Saturday, June 02, 2007
Open XML Java library
This scenario takes any Open XML document as input, along with one stylesheet to apply, and produces a restyled document compliant with your organizational formatting.
Remove comments, annotations, document properties, personal information, presentation notes, tracked changes, ... from outbound documents.
Given that the DITA Open Toolkit is based on Java, I wonder if it would be possible to get the two libraries to work together. In any case, it should allow for more sophisticated ways of handling MS Office documents.
Labels: Office 2007, XML
Wednesday, May 02, 2007
Jeni's XML pages
Labels: XML
Friday, April 06, 2007
DITA specialization tutorial
It is of course written as a set of DITA topics, which is interesting in and of itself because a tutorial is a type of document for which the DITA concept/task/reference and highly fragmented presentation paradigms are not necessarily a good match. For example, I discovered that the only way to get prev/next links from one topic to the next within a logical narrative sequence of topics is to set their parent container in the organizing map to "sequence". However, this has the effect of numbering each topic in the sequence, which makes sense for the topics that represent a logical sequence of steps within the tutorial, but not for the purely conceptual overview of what DITA specialization is. (This is what the DITA Open Toolkit does today--whether this behavior is required by the DITA spec is a more subtle question.)
So it raises some issues, like do we need a tutorial-specific set of specializations and corresponding rendering customizations to get the effects I want as a tutorial author, or does the DITA spec need to be refined to reflect these sorts of more subtle rhetorical distinctions? Are my topics that describe a sequence of steps to be performed really task or concept topics (I've coded them as concepts because even in DITA 1.1, the task topic type is too restrictive in the way it represents sequences of steps)?
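For readers who haven't seen the attribute in question, here is a minimal Python sketch that builds a map with a container set to collection-type="sequence". The file names are made up, and placing the overview outside the sequence is just one arrangement to show the markup, not the tutorial author's solution (it would also drop the overview from the prev/next chain).

```python
import xml.etree.ElementTree as ET

# Hypothetical topic file names, for illustration only.
STEP_TOPICS = ["step-declare.dita", "step-constrain.dita", "step-test.dita"]

map_el = ET.Element("map")

# Conceptual overview kept outside the sequence container.
ET.SubElement(map_el, "topicref", href="overview.dita")

# Children of a collection-type="sequence" container are treated as an ordered series.
steps = ET.SubElement(
    map_el, "topicref", {"href": "steps.dita", "collection-type": "sequence"}
)
for name in STEP_TOPICS:
    ET.SubElement(steps, "topicref", href=name)

map_xml = ET.tostring(map_el, encoding="unicode")
print(map_xml)
```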
Thursday, April 05, 2007
Vex XML editor
Vex is an editor for XML documents. The "visual" part comes from the fact that Vex hides the raw XML tags from the user, providing instead a wordprocessor-like interface. Because of this, Vex is best suited for "document-style" XML documents such as XHTML and DocBook rather than "data-style" XML documents.
Vex is based on the Eclipse platform and is Java-based. I haven't tried it yet, but it looks like it might be worth trying with DITA, assuming it can support the DITA DTDs or schemas.
Saturday, March 24, 2007
Requirements for DITA editor
If you are involved with evaluating editors and other DITA tools, try to have a realistic approach to what you are looking for. Recognize that many of these task-assistive features are only just now appearing on higher-end full XML editors. But you don't have to hold all editors up to these standards. For example, an emerging class of DITA editors are components that operate through Web browsers, meaning they must trade off full-featured generality for highly focused function in a small footprint.
As I offered back in 2005, ultimately you must make cost/benefit judgments on the features that mean most to your intended scenarios, business rules that need to be supported, and the willingness of your team to learn some new ways of doing things.
Friday, March 23, 2007
Free online DITA editor
I used it to create a simple task file and it did create valid XML, though it didn't add an XML declaration or namespace information. So you'd probably have to tweak the results in a full XML editor. Of course, I could have missed quite a lot in my brief foray into the editor. I did find that right-clicking on an element opens a properties dialog that allows you to add attributes.
Thursday, March 22, 2007
CMS requirements for DITA
So I thought I would try to outline what I think are the key non-obvious DITA content management features that any CMS claiming DITA support should provide. I will not state what should be obvious requirements related to the creation and management of links, the ability to search on content and metadata, and so on.
Labels: content management, DITA, XML
Tuesday, March 20, 2007
Quadralay demos DITA adapter
This will give you a lot more control over the output of your DITA projects than you can get through the DITA Open Toolkit. You'll still need another option for PDF output, as in the first release, the adapter won't output to PDF. It does look like a good choice for organizations that are still moving to DITA, and need to integrate structured and unstructured content.
Sunday, March 18, 2007
XML trials and tribulations
Labels: technical communication, XML
Tuesday, March 13, 2007
Using OpenXML to draft bills in Florida
Basically, as a bill goes through the legislative process, amendments are added. So every day, someone needs to go through those amendments that were adopted the previous day and re-generate the bill with those new amendments. They've customized Word 2007 and, with the OpenXML formats, made it super easy for the people generating the new draft of the bill to bring all the amendments in.
They leverage the OpenXML formats and SQL server as a way of storing the various amendments. They then built some custom UI into Word 2007 to expose the amendments to the guys regenerating the bill so that they could easily insert them.
From a business perspective, the new XML formats in Office 2007 offer the most benefit from upgrading. While the new interface does offer some productivity benefits once users learn it, the ability to work with chunks of XML content in a document provides opportunities to manage content in new and exciting ways. Companies who are sticking with older versions of Office are missing a real opportunity to establish new and more efficient workflows.
Labels: Office 2007, XML
Thursday, January 25, 2007
Adobe goes to Mars
One of their projects is called Mars, and it's an XML implementation of the PDF format. From their web site: "The Mars file format incorporates additional industry standards such as SVG, PNG, JPG, JPG2000, OpenType, Xpath and XML into ZIP-based document container. The Mars plug-ins enable recognition of the Mars file format by Adobe Acrobat 8 and Adobe Reader 8 software."
You can download a plug-in for Acrobat 8 that lets you save PDF documents in the Mars format.
I'm speculating here, but I wonder if we might see Mars integrated into the next release of FrameMaker. It would certainly be a viable alternative to MIF, which is getting pretty long in the tooth, and tie into Adobe's increased efforts to market FrameMaker as an XML publishing tool.
Labels: FrameMaker, technical communication, XML
Sunday, January 14, 2007
XML 2006 Proceedings online
Another ancient subject that seems to be popping up again is the idea of modular document creation. This is one of those concepts that comes through about once a decade, seduces all the writing managers with the prospect of greater efficiency, takes over entire writing departments for a couple of years, and then falls out of favor as people finally realize that document reuse is not a solvable problem in document delivery but rather an intractable problem in document writing — which is, how to retain any sense of logical connection between pieces of information while writing as if your target audience consisted entirely of people afflicted with ADD.
I could go on at length about this, but instead I'll simply leave you with the observation that my personal love affair with modular documentation occurred in 1978 and that I haven't seen a thing since then that would change the conclusions I reached about it almost thirty years ago. This is not to say that I'm trying to discourage the technical writing community whence I came from their enthusiasm for the modular authoring technology du jour, since engagement in such efforts is virtually guaranteed to buy tech writers a few years in which they can act like software engineers and present themselves as engaged in cutting-edge informational technology development rather than plain old technical writing. That strategy has worked great for some of us.
Labels: technical communication, XML
Thursday, January 11, 2007
Using XML in Office 2007 documents
* When the user types into the controls, the corresponding data in the data store is updated in real time (so the custom XML is always live and up to date). This means that finding out the "data" of the document is as simple as pulling out the appropriate XML data store part.
* When the data is updated inside or outside of Word, the corresponding controls are updated – so the contract that you see can be changed simply by editing the custom XML that lives with the document. That custom XML has no Word-specific information in it, and is therefore extremely easy to read and/or write.
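As a rough sketch of what "pulling out the data store part" could look like, the Python below reads a custom XML part out of a ZIP package using only the standard library. The package built here is a stand-in, not a complete valid .docx, and the part path customXml/item1.xml and the contract fields are assumptions made for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed location of the custom XML part inside the package.
CUSTOM_PART = "customXml/item1.xml"


def build_sample_package():
    """Build an in-memory ZIP that mimics the layout of a .docx package.

    This is NOT a complete, valid Word document; it only has the parts
    this example needs.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", "<placeholder/>")
        z.writestr(
            CUSTOM_PART,
            "<contract><party>Contoso</party><amount>1200</amount></contract>",
        )
    return buf.getvalue()


def read_custom_xml(package_bytes, part=CUSTOM_PART):
    """Return the child elements of the custom XML root as a dict."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as z:
        root = ET.fromstring(z.read(part))
    return {child.tag: child.text for child in root}


data = read_custom_xml(build_sample_package())
print(data)
```

Because the custom XML carries no Word-specific markup, nothing here needs to know anything about WordprocessingML.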
Brian Jones also has a post on the same subject, which links to some other articles that go into more detail on what you can do with the new capabilities.
Labels: Microsoft Word, Office 2007, XML
Monday, December 11, 2006
XMetal 5.0 is out
Thursday, November 16, 2006
How to use SpreadsheetML in Excel
Also, you'll notice that unlike a typical table format (like HTML, CALS, etc.) the XML above is representing a spreadsheet. It's a subtle difference when working with simple examples like this, but becomes more obvious as you move into more complex spreadsheets. One noticeable difference right away though is that we don't write any elements down for the empty cells B2:C4. If there isn't any data in a cell, then you just don't write anything. This is a bit of a different model from table formats that are more presentation based.
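The post's XML isn't reproduced in this excerpt, so here is a hand-written stand-in with the same shape: column A and row 1 are filled, and B2:C4 simply has no elements. The Python sketch reads it into a dictionary keyed by cell reference, which is a natural fit for a sparse format. The values are invented.

```python
import xml.etree.ElementTree as ET

S = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"s": S}

# Hypothetical sheet data: no elements are written for the empty range B2:C4.
SAMPLE = f"""<sheetData xmlns="{S}">
  <row r="1"><c r="A1"><v>1</v></c><c r="B1"><v>2</v></c><c r="C1"><v>3</v></c></row>
  <row r="2"><c r="A2"><v>4</v></c></row>
  <row r="3"><c r="A3"><v>5</v></c></row>
  <row r="4"><c r="A4"><v>6</v></c></row>
</sheetData>"""


def read_cells(xml_text):
    """Map each cell reference (such as "A1") to its numeric value."""
    root = ET.fromstring(xml_text)
    cells = {}
    for c in root.findall("s:row/s:c", NS):
        v = c.find("s:v", NS)
        if v is not None:
            cells[c.get("r")] = float(v.text)
    return cells


cells = read_cells(SAMPLE)

# Empty cells are simply absent, so a lookup returns None instead of a blank cell.
print(sorted(cells))
print(cells.get("B2"))
```

Contrast this with an HTML or CALS table, where every position in the grid normally gets an element whether or not it holds data.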