Scott Granneman

Get the content out of a Microsoft Word document without Word

Word2x

http://word2x.sourceforge.net/ ~ "Word2x is a GPLed program for converting word documents to text without any Microsoft software to help you ... The currently supported output formats are plain text, LaTeX and HTML. The program converts word to a central format and output modules write the desired target format."

Class: MS files converter

http://phpclasses.upperdesign.com/browse.html/package/388 ~ PHP. "Class to convert any document, that can be read by MS Word, to another format supported by Word ... Converts extensions from/to: doc, dot, txt, rtf, htm, html, asc, wri, wps ... Only for Windows as it uses COM."

wvWare

http://wvware.sourceforge.net/ ~ "wv is a library which allows access to Microsoft Word files. It can load and parse Word 2000, 97, 95 and 6 file formats. ... wv compiles and works under most operating systems. Although most development is carried out with Linux, wv should work on BSD, Solaris, OS/2, AIX, OSF1, and even (with varying levels of success) AmigaOS VMS. The GnuWin32 project maintains a port for Windows ... wv allows other programs access to Word documents for the purpose of converting them to other formats."
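
As a rough illustration of using wv from another program, the sketch below shells out to the wvText command-line tool that ships with wv. It assumes wv is installed, that wvText is on the PATH, and that it takes an input .doc path followed by an output text path. The function names build_wvtext_command and doc_to_text are hypothetical helpers made up for this example, not part of wv.

```python
import shutil
import subprocess
from pathlib import Path


def build_wvtext_command(doc_path, txt_path):
    """Return the argument list for a wvText run (assumed form: wvText in.doc out.txt)."""
    return ["wvText", str(doc_path), str(txt_path)]


def doc_to_text(doc_path, txt_path):
    """Convert a Word file to plain text with wvText.

    Returns True if wvText ran and produced the output file,
    False if wvText is missing or the conversion failed.
    """
    if shutil.which("wvText") is None:
        return False  # wv is not installed, or wvText is not on the PATH
    result = subprocess.run(build_wvtext_command(doc_path, txt_path), check=False)
    return result.returncode == 0 and Path(txt_path).exists()
```

Calling doc_to_text("report.doc", "report.txt") would then leave the extracted text in report.txt when wv is available, and simply return False when it is not.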

Jakarta POI - Java API To Access Microsoft Format Files

http://jakarta.apache.org/poi/ ~ "The POI project consists of APIs for manipulating various file formats based upon Microsoft's OLE 2 Compound Document format using pure Java. In short, you can read and write MS Excel files using Java. Soon, you'll be able to read and write Word files using Java. POI is your Java Excel solution as well as your Java Word solution. However, we have a complete API for porting other OLE 2 Compound Document formats and welcome others to participate."
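
Before handing a file to any of these tools, it can help to confirm that it really is an OLE 2 Compound Document, the container format mentioned in the POI description. The sketch below checks the 8-byte signature found at the start of such files. It is written in Python rather than Java, it is independent of POI, and the names OLE2_MAGIC, looks_like_ole2 and file_is_ole2 are made up for this example.

```python
# Signature bytes at the start of every OLE 2 Compound Document file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the given bytes begin with the OLE 2 signature."""
    return data[:8] == OLE2_MAGIC


def file_is_ole2(path) -> bool:
    """Read the first 8 bytes of a file and test them against the signature."""
    with open(path, "rb") as fh:
        return looks_like_ole2(fh.read(8))
```

A file that fails this check (for example, RTF or plain text saved with a .doc extension) is not in the binary format these libraries parse, so it is worth ruling that out first.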