Use antiword to extract text from .doc files

Status
Not open for further replies.

Osiris

Golden Master
Messages
36,817
Location
Kentucky
Source Use antiword to extract text from .doc files

I know what you're thinking: “Why not just use OpenOffice to get the text you need?” There's a good reason. If you've ever used one word processor to get raw text out of another's files, you know that formatting is often left behind. End-of-line characters and the like can remain, which makes cutting and pasting text from one source to another a problem (especially when going from a .doc file to an HTML end point). This has caused me plenty of issues when I have written articles off-line to be pasted into, say, ghacks. I have seen formatting strings left behind, only to have to go back and delete them.
When extracting text with a tool like antiword, you won't have this problem. And even though antiword is a command-line-only tool, it isn't complicated to install or use. With this tool you can either send the extracted text straight to standard output (the terminal window) or write it to a text file. Both methods are simple, and both are effective.
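As a rough sketch of those two methods, the small wrapper below prints the extracted text to the terminal when given only a .doc file, and writes it to a text file when given a second argument. The function name doc_to_text and the file names are made up for illustration; the only antiword behavior it relies on is that `antiword FILE.doc` writes plain text to standard output.

```shell
#!/bin/sh
# Hypothetical helper (the name doc_to_text is not part of antiword).
# Usage: doc_to_text INPUT.doc [OUTPUT.txt]
# On Debian/Ubuntu the package is typically installed with:
#   sudo apt-get install antiword
doc_to_text() {
    if [ "$#" -lt 1 ]; then
        echo "usage: doc_to_text INPUT.doc [OUTPUT.txt]" >&2
        return 2
    fi
    if ! command -v antiword >/dev/null 2>&1; then
        echo "antiword does not appear to be installed" >&2
        return 1
    fi
    if [ -z "${2:-}" ]; then
        # Method 1: no output file given, so print to the terminal.
        antiword "$1"
    else
        # Method 2: redirect standard output into a text file.
        antiword "$1" > "$2"
    fi
}
```

For example, `doc_to_text article.doc` would show the text in the terminal, while `doc_to_text article.doc article.txt` would save it for pasting elsewhere. Without the wrapper, the same thing is just `antiword article.doc` and `antiword article.doc > article.txt`.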
 