Converting Word files to HTML files
Problem: Microsoft Word 97 and 2000 let you save Word .doc files as HTML. However, the result is very bulky, nearly impossible to read, and not very compatible with non-Microsoft products.
Explanation:
HTML files created by Microsoft Word carry a lot of extra information so that the file can be re-imported into Word and saved as a .doc again without losing certain features. The standard Save As dialog in Word is not primarily designed to create documents for upload to a web server where anyone can browse them. There are more appropriate tools for that.
Microsoft supplies a DLL for Word 2000 that lets you use File / Export To / Compact HTML to save a .doc as a compact .html file optimized for web publishing and not meant for re-importing into other Office products. In addition, there is an excellent freeware tool called tidy that verifies and corrects HTML files. Download it and run it on the exported HTML file:
tidy --word-2000 yes -m inoutfile.htm
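If you have a whole folder of exported files, the same tidy call can be scripted instead of typed once per file. The Python sketch below is only an illustration. It assumes tidy is installed and on the PATH, and that the files end in .htm. The function names build_tidy_command and clean_word_html are made up for this example and are not part of tidy or Word.

```python
import subprocess
from pathlib import Path


def build_tidy_command(html_path):
    """Return the argument list for the tidy call shown above."""
    # --word-2000 yes : strip the extra markup that Word 2000 adds
    # -m              : modify the input file in place
    return ["tidy", "--word-2000", "yes", "-m", str(html_path)]


def clean_word_html(folder):
    """Run tidy on every .htm file in folder (hypothetical helper).

    Returns the names of the files for which tidy reported errors.
    """
    failed = []
    for html_path in sorted(Path(folder).glob("*.htm")):
        result = subprocess.run(build_tidy_command(html_path))
        # tidy exits with 0 (clean), 1 (warnings only) or 2 (errors)
        if result.returncode >= 2:
            failed.append(html_path.name)
    return failed
```

Calling clean_word_html("export") would process every .htm file in a folder named export (a placeholder name). Because -m overwrites the files, keep a copy of the originals until you have checked the result.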