AscToHTM

Documentation for the AscToHTM conversion utility

This documentation can be downloaded as part of the documentation set in .zip format (370k)



1 Introduction

AscToHTM is an ASCII to HTML conversion tool. It has, of course, been used to generate the HTML version of this document from the text file a2hdoco.txt (see an example conversion for more details).

The HTML version of this document is presented "as is". That is, no post-production of the HTML has occurred. This should give you a flavour of what AscToHTM is capable of.

Any RTF version of this document will have been made by AscToRTF, the sister product that shares the same text analysis engine.

AscToHTM is made available for download via the Internet from the download page.


1.1 AscToHTM's design objectives

1.1.1 Intelligent analysis.

AscToHTM is designed to analyse a document to determine its structure and layout. This analysis allows AscToHTM to decide how best to mark up the HTML so as to accurately represent the author's original meaning as far as possible.

This analysis helps AscToHTM to reduce errors by allowing it to spot anomalies in the document source. This is important in minimising the amount of any post-production work required to fix errors.


1.1.2 Human-readable HTML

AscToHTM tries to create HTML that can be easily read and modified in an editor. This is useful if corrections are necessary, or further development is required.

For example, AscToHTM:

  1. produces short (usually <80 character) output lines

  2. attempts to indent the HTML to match the output indentation.

  3. adds comments to the HTML to indicate include files etc.

  4. uses <BLOCKQUOTE> tags for indentation, rather than placing the whole file in <TABLE>...</TABLE> tags.

  5. produces "clean" HTML without large numbers of unnecessary tags.
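As an illustration of points 1, 2 and 4 above, here is a minimal Python sketch of emitting a paragraph as short, indented lines nested in <BLOCKQUOTE> tags. It is not AscToHTM's own code: the function name paragraph_to_html, its parameters and the two-space indent step are all assumptions made for the example.

```python
import html
import textwrap

MAX_WIDTH = 79  # keep every output line under 80 characters


def paragraph_to_html(text, indent_level=0, indent_step=2):
    """Render one paragraph as wrapped, indented HTML (hypothetical helper).

    indent_level is the number of <BLOCKQUOTE> levels to nest the paragraph in.
    """
    lines = []
    # Open one <BLOCKQUOTE> per level, indenting the markup itself to match.
    for depth in range(indent_level):
        lines.append(" " * (depth * indent_step) + "<BLOCKQUOTE>")

    pad = " " * (indent_level * indent_step)
    # Collapse runs of whitespace and escape characters such as < and &.
    body = "<P>" + html.escape(" ".join(text.split())) + "</P>"
    # break_long_words=False stops a tag or entity being split in two.
    lines.extend(
        textwrap.wrap(
            body,
            width=MAX_WIDTH,
            initial_indent=pad,
            subsequent_indent=pad,
            break_long_words=False,
            break_on_hyphens=False,
        )
    )

    # Close the <BLOCKQUOTE> tags in reverse order.
    for depth in reversed(range(indent_level)):
        lines.append(" " * (depth * indent_step) + "</BLOCKQUOTE>")
    return "\n".join(lines)


print(paragraph_to_html("An indented paragraph of source text. " * 4, indent_level=2))
```

Wrapping only at whitespace keeps the markup valid, at the cost of an occasional over-long line when a single word (for example a URL) exceeds the width.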

Note that later moves towards more standards-compliant and browser-compatible HTML tend to work against producing human-readable code. For example, most browsers have rendering problems when newline characters are placed in certain key locations, whereas adding newline characters can make the HTML easier to read.


1.1.3 Simple user input

Inevitably users have to supply additional information, both to tell AscToHTM where its analysis has gone wrong and to add details such as a document title. AscToHTM offers a large number of options (also known as "policies") that the user can modify.

Broadly speaking, these policies fall into two camps

AscToHTM can save your policies to a file, so that next time you run it you can load this information back from the "policy" file. This also allows you to create different sets of policies (e.g. to use different colour schemes).

Policies are described fully in the Policy manual.

You can further refine the conversion by placing special lines and tags into your source file. These are known as pre-processor commands (see Using the preprocessor) and in-line tags (see In-line tags).

The preprocessor tags are described fully in the Tag manual.

To help users formulate and modify their document's policy, AscToHTM can be made to create an output policy file (see 4.2.2.9). Users can then simply edit this file and feed it back into the conversion process.

A summary of the recognised policy lines is given in the Policy manual.


1.1.4 Standards compliance.

Earlier versions of AscToHTM (before version 3.2) made no real attempt to be standards compliant. Now standards compliance is a stated goal of the program. Sadly I can't guarantee standards compliance because the HTML generation is so complex that errors can and do occur, but it is a goal, and usually documents will validate with few problems.

Compliance has proved to be vital to get cross-browser compatibility, and to stand a chance of successfully applying CSS to created pages.

Original versions of AscToHTM were (loosely) targeted at producing HTML 3.2 code.

Currently the software is targeted at "HTML 4.0 Transitional", which allows CSS, but also permits <FONT> tags (although these are deprecated). This is a compromise standard that is best placed to be well viewed by V3 and V4 browsers.

Future versions of the program may attempt to generate stricter HTML 4.0 code, while still offering production of the earlier HTML standards.

The policy "HTML version to be targeted" offers some ability to choose the style of HTML generated.


1.2 Expected uses of AscToHTM

Plain text is still a very popular data format. It is easy to generate and easy to read. However, text files placed on the web don't look as nice as normal web pages. AscToHTM allows you to quickly add the HTML markup required to turn a plain text page into a nice-looking HTML page. Because the conversion is automated, it saves you time and helps you avoid typos in HTML tags that could stop the page displaying correctly in some web browsers.

Large amounts of unconverted text exist. As people plan to put this information on the Web, conversion to HTML will become necessary.

This can be a tedious and time-consuming task. AscToHTM will do much of the work for you.

AscToHTM is priced to be worth an hour or two of your time. This means that the "pay back" time is negligible (we only mention this in case you have bean-counters to convince :). If you don't think AscToHTM will save you hours, then by all means don't buy it.

The HTML created by AscToHTM may not be as pretty or as clever as that generated by a full blown HTML editor (read as "bloated").

But...

It'll be easier to write, edit and spell-check, and it may have a hyperlinked contents list generated.

AscToHTM can be used to automatically convert text documents that you receive. For this we usually suggest you run in command line mode.

Many people have legacy systems that generate printed reports that may be saved to file. AscToHTM can help extend the lifetime of such systems by turning their output to HTML. It may be you'll need some help in getting the best results from the program in such cases, since many reports consist of complex tables.

Printer spool files are not, strictly speaking, plain text, but often (especially in older software systems) these files are plain text with a few printer controls added. Some users have had great success converting such files using AscToHTM, and to support this we have added a limited ability to recognise and strip out Unix control characters, VT escape sequences and PCL printer codes. If you have a requirement in this area, contact the author at jaf@jafsoft.com to discuss whether the software can be made to meet your needs.
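To give a feel for what stripping such codes involves, here is a minimal Python sketch. It is an illustration only and not the method AscToHTM itself uses: the function name strip_printer_controls is hypothetical, and the patterns are deliberately simplified (in particular the PCL pattern covers only the common parameterised form, and backspace overstriking is handled in a single pass).

```python
import re

# One pattern for the escape sequences this sketch knows about.
ESCAPE_SEQUENCE = re.compile(
    r"\x1b(?:"
    r"\[[0-?]*[ -/]*[@-~]"        # VT/ANSI CSI sequence, e.g. ESC [ 1 m
    r"|[!-/][0-9a-z.+\-]*[@-^]"   # simplified PCL parameterised sequence, e.g. ESC & l 1 O
    r"|[0-~]"                     # two-character escape, e.g. ESC E
    r")"
)
# A character followed by backspace: overstriking used for bold/underline.
OVERSTRIKE = re.compile(r".\x08")
# Remaining control characters, keeping tab (\x09) and newline (\x0a).
OTHER_CONTROLS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def strip_printer_controls(text):
    """Return text with escape sequences and control characters removed."""
    text = ESCAPE_SEQUENCE.sub("", text)
    # Drop the overstruck character and keep the one printed on top of it.
    text = OVERSTRIKE.sub("", text)
    # Treat CR LF as a newline and a form feed (page break) as a newline.
    text = text.replace("\r\n", "\n").replace("\f", "\n")
    return OTHER_CONTROLS.sub("", text)


print(strip_printer_controls("\x1bE\x1b&l1O\x1b[1mMonthly report\x1b[0m\r\n"))
```

Real spool files vary a great deal, so a pass like this is best treated as a first clean-up step before conversion.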


1.3 Other uses of AscToHTM

Please note, AscToHTM DOES NOT convert Word's .doc or .rtf file formats.

AscToHTM was never intended to handle Word documents. We fully expect HTML export and import filters to appear (they have in Word '97), and we would advise anyone whose master document is in Word to search out these filters and give them a try.

That said... a lot of people seem unhappy with what's already available, and AscToHTM does a reasonable job if you save the file as text with line breaks, though obviously tables and figures will get lost (in the case of tables, because Word throws them away).

The main problem is that Word produces lousy looking text. This is one area where AscToHTM does a little better than "garbage in, garbage out".

(This is a bit cheeky, but does actually work.)

Use AscToHTM to convert text to HTML, then import this into your word processing package. Since the text analysis engine in AscToHTM outperforms that in Word in many respects (URL, table and heading detection, to name but three), you can often get better results than by importing the text directly.

That's because AscToHTM's analysis engine is smarter. That's not just our view (see http://www.jafsoft.com/asctohtm/reviews.html).

NOTE:
The same text analysis engine is used in the text-to-RTF program AscToRTF, which is more suited to this purpose.

Use AscToHTM to convert text to HTML, then print the file from within Netscape or whatever. The result is a much nicer looking document with fonts'n'stuff.

AscToHTM has a "link dictionary" feature that can be used to add hyperlinks to any word or phrase (see the Policy manual).

This can greatly enhance an otherwise dull set of text pages.
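The idea behind a link dictionary can be sketched in a few lines of Python. This is only an illustration of the general technique, not AscToHTM's implementation or its policy syntax (for that, see the Policy manual): the function name apply_link_dictionary, the dictionary layout and the example.com URLs are all assumptions.

```python
import html
import re


def apply_link_dictionary(text, links):
    """Wrap each dictionary phrase found in text in an <A HREF=...> tag.

    links maps a word or phrase to the URL it should point at (hypothetical
    layout). Matching is case-sensitive and on whole words only.
    """
    if not links:
        return html.escape(text)
    # Try longer phrases first so "text file" wins over "text".
    phrases = sorted(links, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
    )
    out = []
    pos = 0
    # A single left-to-right pass never re-scans markup already emitted.
    for match in pattern.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        url = html.escape(links[match.group(0)], quote=True)
        out.append('<A HREF="%s">%s</A>' % (url, html.escape(match.group(0))))
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


links = {
    "text": "http://example.com/text",
    "text file": "http://example.com/file",
}
print(apply_link_dictionary("A text file & more text", links))
```

Building one combined pattern and scanning once avoids the classic pitfall of repeated search-and-replace, where a later phrase matches inside a link that an earlier phrase has already inserted.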




Valid HTML 4.0! Converted from a single text file by AscToHTM
© 1997-2001 John A Fotheringham