Documentation for the AscToHTM conversion utility
This documentation can be downloaded as part of the documentation set in .zip format (370k)
This chapter has been largely superseded by the Policy manual.
Document policy files are ordinary text files that list the "policies" that AscToHTM should implement when converting your document. The file can have added comment lines (starting with a "!" or "#" character) and headings for clarity.
A summary of the recognised policy lines is given in the Policy manual.
In most cases recognised policy lines are identical to those listed in the generated policy file (see 4.1). This is usually a good place to start when making your own policy.
Only those lines that are recognised policies are acted upon.
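The file format described above can be sketched in a few lines of Python. This is an illustration only, not AscToHTM's own parser: the function name `parse_policy_text`, the sample values and the set of recognised key phrases are assumptions made for the example, and exact (case-sensitive) matching of the key phrase is also assumed.

```python
def parse_policy_text(text, recognised):
    """Return {key phrase: value} for the recognised policy lines in text."""
    policies = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines ("!" or "#") carry no policy.
        if not line or line[0] in "!#":
            continue
        # Lines without a ":" (for example headings) cannot be policies.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        # Only recognised key phrases are acted upon; the rest are ignored.
        if key in recognised:
            policies[key] = value.strip()
    return policies


# Hypothetical sample: the key phrases appear in this chapter,
# but the values and the unrecognised line are invented.
SAMPLE = """\
! Policies for my document
Analysis policies
-----------------
Page width : 80
TAB size : 8
# The next line is not a recognised policy, so it is skipped
Favourite colour : blue
"""

RECOGNISED = {"Page width", "TAB size"}
result = parse_policy_text(SAMPLE, RECOGNISED)
print(result)  # {'Page width': '80', 'TAB size': '8'}
```

The heading, the ruler line and both comment lines are skipped because they are either comments or contain no ":" separator.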
To use a policy file, simply list it on the command line after the name of the file being converted (see 4.2.2.3).
Document policies have two main uses :
- To correct any failure of analysis that AscToHTM makes. Hopefully this won't be needed too much as the core analysis engine improves.
Examples include page width, whether or not underlined section headings are expected etc.
- To tell the program how to produce better HTML end product in ways that couldn't possibly be inferred from the original text.
Examples include adding colour and titles to the page, as well as requesting that a large document be split into several pages and a contents list created.
The document sections in this chapter that described the policies in detail have been moved to a standalone document called the "Policy manual". That document describes the scope, effect, location and default values for all policies recognised by the program.
This documentation has itself been converted using AscToHTM. The files used were
- a2hdoco.txt. This is the text version of the documentation. The text version is kept as the master copy and updated as required. It's then converted to HTML.
- ia2hdoco.pol. This is the policy file used to create the HTML version of this document. Only those policies that differ from the defaults have been added.
This policy file "includes" the link dictionary a2hlinks.dat.
- a2hlinks.dat. This is the link dictionary used for this document and is used to add hyperlinks to the main text file.
- html_fragments.inc. This file contains the definitions of the HTML fragments used in this conversion.
These files are included in the distribution kit as an example set of documentation.
You can, of course, use AscToHTM to convert this doco into whatever format, colour etc that you wish.
These policies are used to control and correct the analysis of files during conversion. Full descriptions of these policies can be found in the Policy manual.
The following analysis policies give you an overview of what the program looks for, and let you enable or disable each check.
"Look for indentation"
"Look for hanging paragraphs"
"Look for white space"
"Look for short lines"
"Look for horizontal rulers"
"Minimum ruler length"
"Look for bullets"
"Search for definitions"
"Look for quoted text"
"Look for MAIL and USENET headers"
"Look for preformatted text"
"Attempt TABLE generation"
"Look for diagrams"
The following analysis policies help control general layout parameters:-
"Page width"
"TAB size"
"Short line length"
"Min chapter size"
"Expect blank lines between paras"
"Hanging paragraph position(s)"
"Search for Definitions"
"New Paragraph Offset"
"Definition Char"
AscToHTM has the following bullet point policies that will normally be correctly calculated on the analysis pass :-
"Expect alphabetic bullets"
"Expect numbered bullets"
"Expect roman numeral bullets"
"Recognise '-' as a bullet"
"Recognise 'o' as a bullet"
"Bullet char"
AscToHTM tries hard not to get confused by a "1", "a" or "I" that happens to fall at the start of a line by chance. These could be mistaken for bullet points.
There is only one analysis contents policy:-
This is described together with all the output contents list policies in Contents generation policies
For more information on content list generation see 5.6.2.
AscToHTM has the following file structure policies that will normally need to be set manually :-
"Expect code samples"
"Input file contains DOS characters"
"Input file contains MIME encoding"
"Input file contains PCL codes"
"Input file contains Japanese characters"
"Input file has change bars"
"Input file has page markers"
"Page marker size (in lines)"
"Text Justification"
"Input file is double spaced"
AscToHTM has the following section heading policies that will normally be correctly calculated on the analysis pass :-
"Expect Numbered Headings"
"Expect Underlined Headings"
"Expect Capitalised Headings"
"Expect Embedded Headings"
"Heading key phrases"
"Check indentation for consistency"
"Expect Second Word Headings"
"First Section Number"
"Smallest possible section number"
"Largest possible section number"
"Preserve underlining of headings"
Section headers are far and away the most complex things the analysis pass has to detect, and the most likely area for errors to occur.
AscToHTM will also document to a policy file the headings it finds. This is still to be finalised, but currently has the format
We have 4 recognised headings
Heading level 0 = "" N at indent 0
Heading level 1 = "" N.N at indent 0
Contents level 0 = "" N at indent 0
Contents level 1 = "" N.N at indent 2
AscToHTM will read in such lines from a policy text file, but does not yet fully support editing these via the Windows interface.
The syntax is explained below, but this will probably change in future releases. You can edit these lines in your policy file, and through the policy options in Windows.
The lines are currently structured as follows
Line component    Value
xxxx              Either "Heading" or "Contents" according
                  to the part of the policy being described
Level n           Level number, starting at 0 for chapters,
                  1 for level 1 headings etc.
"Some_word"       Any text that may be expected to occur before
                  the heading number. E.g. "Chapter" or "Section"
                  or "[". The case is unimportant.
N.Nx              The style of the heading number. This will
                  ultimately (in later versions) be read
                  as a series of number/separator pairs.
                  The proposed format is
                    "N" = number
                    "i" / "I" = lower/upper case roman numeral
                  with an 'x' at the end signalling that trailing
                  letters may be expected (e.g. 5.6a, 5.6b)
at indent n       The indentation that this heading is expected
                  at. This is important in helping to eliminate
                  false candidates.
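To make the line structure concrete, here is a rough Python sketch that reads one heading line and turns its number style into a pattern. It is not the program's own code: the names `parse_heading_line` and `style_to_regex` are hypothetical, the regular expressions are one possible reading of the format described above, and the format itself is expected to change in future releases.

```python
import re

# One possible pattern for lines such as:  Heading level 1 = "" N.N at indent 0
HEADING_LINE = re.compile(
    r'^(?P<kind>Heading|Contents)\s+level\s+(?P<level>\d+)\s*=\s*'
    r'"(?P<prefix>[^"]*)"\s+(?P<style>\S+)\s+at\s+indent\s+(?P<indent>\d+)$',
    re.IGNORECASE,
)


def parse_heading_line(line):
    """Split a heading policy line into its components, or return None."""
    m = HEADING_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "kind": m.group("kind").capitalize(),
        "level": int(m.group("level")),
        "prefix": m.group("prefix"),      # e.g. "Chapter"; may be empty
        "style": m.group("style"),        # e.g. "N.N" or "N.Nx"
        "indent": int(m.group("indent")),
    }


def style_to_regex(style):
    """Translate a number style such as "N.Nx" into a compiled pattern."""
    trailing = style.endswith("x")        # 'x' means trailing letters allowed
    body = style[:-1] if trailing else style
    parts = []
    for ch in body:
        if ch == "N":
            parts.append(r"\d+")          # a number
        elif ch == "i":
            parts.append(r"[ivxlcdm]+")   # lower case roman numeral
        elif ch == "I":
            parts.append(r"[IVXLCDM]+")   # upper case roman numeral
        else:
            parts.append(re.escape(ch))   # a separator such as "."
    if trailing:
        parts.append(r"[a-z]?")
    return re.compile("^" + "".join(parts) + "$")


parsed = parse_heading_line('Heading level 1 = "" N.N at indent 0')
pattern = style_to_regex("N.Nx")
print(parsed["style"], bool(pattern.match("5.6a")))
```

With the style "N.Nx" the pattern accepts both "5.6" and "5.6a", while the plain style "N.N" would reject the trailing letter.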
AscToHTM has the following preformatted text policy that will normally be correctly calculated on the analysis pass :-
"Minimum automatic <PRE> size"
New in version 4
AscToHTM uses the following policies to control the detection and analysis of tables :-
"Expect sparse tables"
"Ignore table header during analysis"
"Column merging factor"
"Minimum TABLE column separation"
"Default TABLE layout"
"Tables could be blank line separated"
These policies are used to control the output and generation of files during conversion. Full descriptions of these policies can be found in the Policy manual.
AscToHTM has the following HTML policies that will only ever take effect if supplied in a user policy file :-
"Use first heading as title"
"Use first line as title"
"Document title"
"Document description"
"Document keywords"
"Background Image"
"HTML header file"
"HTML footer file"
"HTML Script file"
"Omit <HEAD> and <BODY> from output"
"Document Base URL"
"Comment generation code"
"HTML fragments file"
These "policies" allow you to start "adding value" to the HTML generated. That is, they allow you to specify things that cannot be inferred from the original text.
You can also add HTML to your files by using the HTML preprocessor command (see 7.1.1)
New in version 4
AscToHTM has the following HTML policies that influence the use of CSS in the HTML generated :-
Not visible in the user interface is :-
AscToHTM has the following HTML policies that influence the detection and generation of contents lists :-
"Add contents list"
"Maximum level to show in contents"
"Use any existing contents list"
"Generate external contents file"
"External contents list filename"
See also the discussion in 5.6.2
New in version 4
AscToHTM has a large number of HTML policies that can control the colouring of the files. These policies are spread across a number of areas of functionality.
General
"Active Link Colour"
"Background Colour"
"Text Colour"
"Unvisited Link Colour"
"Visited Link Colour"
Frames
"Header frame background colour"
"Header frame text colour"
"Contents frame background colour"
"Contents frame text colour"
"Footer frame background colour"
"Footer frame text colour"
Tables
"Colour data rows"
"Default TABLE border colour"
"Default TABLE colour"
"Default TABLE even row colour"
"Default TABLE odd row colour"
AscToHTM has the following policies that can be used to influence whether or not AscToHTM will attempt to generate a Directory page for the files being converted. This is really only appropriate when converting more than one file at once (see 4.3.3).
The Directory Page will consist of entries for each file being converted (in order of conversion), and can have hyperlinks to the files, and to recognised headings in the files. This makes it suitable for use as a master index to a set of files converted in a single directory.
"Make Directory"
"Indent headings in Directory"
"Show file titles in Directory"
"Directory filename"
"Directory title"
"Directory description"
"Directory keywords"
"Directory return hyperlink text"
"Directory header file"
"Directory footer file"
"Directory script file"
AscToHTM has the following HTML policies that affect the file generation process :-
"Input directory"
"Output directory"
"Use .HTM extension"
"Output file extension"
"Preserve file structure using <PRE>"
"Preserve line structure"
"Treat each line as a paragraph"
"Generate diagnostics files"
"Output policy file"
"Output policy filename"
"DOS filename root"
"Use DOS filenames"
"Split level"
"Min HTML File size"
"Add navigation bar"
"Minimise HTML file size"
These policies specify how your document is divided into one or more HTML files, and how those files are to be named and linked together with hyperlinks.
AscToHTM supports the implementation of fonts via either Cascading style sheets (CSS) or via the <FONT> tag.
Related policies are :-
"Use CSS to implement fonts"
"Default font"
New in version 4
From version 4 onwards AscToHTM will support the output of HTML as a set of HTML FRAMES. A large number of policies support this process.
General
"Output frame name"
"Add Frame border"
"Open frame links in new window"
"New frame link window name"
"Add NOFRAMES links"
"NOFRAMES link URL"
Header and Footer frame policies
"Use main header in header frame"
"Header Frame depth"
"Use main footer in footer frame"
"Footer Frame depth"
Contents frame
"Add contents frame if possible"
"Contents Frame width"
"Number of levels in contents frame"
Main Frame
"Split level"
"Min HTML File size"
"First frame page number"
Frame colours
"Header frame background colour"
"Header frame text colour"
"Contents frame background colour"
"Contents frame text colour"
"Footer frame background colour"
"Footer frame text colour"
AscToHTM has the following hyperlink policies set as defaults :-
"Create hyperlinks"
"Create mailto links"
"Allow email beginning with numbers"
"Check domain name syntax"
"Create gopher links"
"Create FTP links"
"Only allow explicit FTP links"
"Create NEWS links"
"Only use known groups"
"Recognised USENET groups"
"Open link in new browser window"
"new browser window name"
Hyperlinks can also be added by using a link dictionary (see 4.3.2.2 and 4.4.2).
Link definitions appear in a policy file as follows :-
[Link Dictionary]
-----------------
Link definition : "a2hdoco.txt" = "Source text" + "/~jaf/A2HDOCO
That is, the text to be matched, the text to be used in its place as the highlighted text, and the URL this link is to point to (in this case a relative URL).
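The three parts of a link definition can be illustrated with a short Python sketch. This is not how AscToHTM itself is implemented: the helper names `parse_link_definition` and `apply_link` are hypothetical, the URL "docs/example.html" is invented for the example, and the plain substring replacement is a simplifying assumption.

```python
import re
from html import escape

# Pattern for:  Link definition : "match text" = "shown text" + "url"
LINK_DEFINITION = re.compile(
    r'^Link definition\s*:\s*"(?P<match>[^"]*)"\s*=\s*'
    r'"(?P<shown>[^"]*)"\s*\+\s*"(?P<url>[^"]*)"\s*$'
)


def parse_link_definition(line):
    """Return (text to match, text to show, URL), or None if malformed."""
    m = LINK_DEFINITION.match(line.strip())
    if m is None:
        return None
    return m.group("match"), m.group("shown"), m.group("url")


def apply_link(text, definition):
    """Replace each occurrence of the matched text with a hyperlink."""
    match_text, shown, url = definition
    anchor = '<A HREF="{}">{}</A>'.format(escape(url, quote=True), escape(shown))
    # Simplification: the surrounding text is left as-is, not HTML-escaped.
    return text.replace(match_text, anchor)


# Hypothetical definition with an invented relative URL.
line = 'Link definition : "a2hdoco.txt" = "Source text" + "docs/example.html"'
definition = parse_link_definition(line)
linked = apply_link("See a2hdoco.txt for details.", definition)
print(linked)  # See <A HREF="docs/example.html">Source text</A> for details.
```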
See the discussions in 4.3.2.2 and 4.4.2.
AscToHTM has the following policies that can be used to influence the preprocessor (see Using the preprocessor), and hence the HTML output :-
"Use Preprocessor"
"Include document section(s)"
"Allow definitions inside PRE"
AscToHTM has the following "styling" policies that can be used to influence the HTML output :-
"Allow automatic centring"
"Automatic centring tolerance"
"Ignore multiple blank lines"
"Highlight definition text"
"Use <DL> markup for defn. paras"
"Largest allowed <Hn> tag"
"Smallest allowed <Hn> tag"
"Headings colour"
"Preserve underlining of headings"
"Use <EM> and <STRONG> markup"
"Preserve New Paragraph Offset"
Also, not available in the user interface is :-
AscToHTM has the following policies that can be used to influence whether or not AscToHTM will attempt to detect and generate HTML tables, and the attributes of any tables generated.
Tables may be tailored individually by adding pre-processor commands to your source text (see 7.1.4)
"Default TABLE cell spacing"
"Default TABLE cell padding"
"Default TABLE border size"
"Default TABLE width"
"Default TABLE colour"
"Default TABLE border colour"
"Colour data rows"
"Default TABLE even row colour"
"Default TABLE odd row colour"
"Default TABLE alignment"
"Default TABLE cell alignment"
"Convert TABLE X-refs to links"
The following policies can only be changed through a policy file, but are probably best avoided in favour of their equivalent preprocessor tags.
"Default TABLE header rows"
"Default TABLE header cols"
"Column boundaries have zero width"
AscToHTM supports the following policies which currently can only be added by editing the .policy file
Contents List
"Add mail headers to contents list"
CSS
"Create embedded style sheet"
File generation
"Break up long HTML lines"
"HTML version to be targeted"
"Lines to ignore at end of file"
"Lines to ignore at start of file"
Fonts
"Suppress all font markup"
Headings
"Expect Second Word Headings"
"First Section Number"
"Number of words to include in filename"
HTML Generation
"HTML version to be targeted"
Style
"First line indentation (in blocks)"
Tables
"Default TABLE caption"
"Default TABLE header rows"
"Default TABLE header cols"
"Column boundaries have zero width"
New in version 4
These policies are used to control the behaviour of the program during the conversion process. Most program settings are not available as policies, but those that are available are listed here. Full descriptions of these policies can be found in the Policy manual.
The following policies can be used to tailor the number and type of messages displayed during conversion.
"Suppress INFO messages"
"Suppress TAG ERROR messages"
"Suppress URL messages"
"Suppress WARNING messages"
"Suppress program ERROR messages"
This section has been copied into the Policy manual section on placing policies in a file
AscToHTM allows you to save policies to file so that you can later reload them. This allows you to easily define different ways of doing conversions, either for different types of files, or to produce different types of output.
The policy files have a .pol extension by default, and are simple text files, with one policy on each line. You can, if you wish, edit these policies in a text editor. This is sometimes easier than using all the dialogs in the Windows version.
When editing policies, it is important not to change the key phrase (the bit before the ":" character), as this needs to be matched exactly by AscToHTM.
For best results, it is advisable to put in your policy file only those policies you want to fix. This leaves AscToHTM to calculate document-by-document policies that suit the files being converted.
- Note:
- Avoid using a "full" policy file for your conversions. Such files prevent the program from adjusting to each source file, often leading to unwanted results.
The normal way to create a policy file is by setting options and then saving them using the "save policy file" dialog. This will offer you the choice of creating a partial policy file or a full policy file (see 6.5.2.1 and 6.5.2.2).
Alternatively, you can set the "Output policy file" policy which will generate a full policy file resulting from the analysis of the converted document.
Once a file is generated you can either edit it in a text editor, deleting policies that are of little interest to you and editing those that are, or reload it into the program, change the policies and save it again.
Partial policy files are files which have values for some, not all, policies.
These are recommended, because they leave AscToHTM free to adjust all the other policies not set in the file, allowing it to adapt to the details of the document being converted.
For example, you should only set the indentation policy if you know what indents you are using, or if you want to override those calculated by AscToHTM. Normally it is best to omit this policy, and allow AscToHTM to work it out itself.
When you save a policy file from inside AscToHTM, a partial policy file will contain
- all policies loaded from the current policy file (if any)
- all policies changed in AscToHTM during the current session (if any)
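The two sources listed above can be pictured as a simple merge. The Python below is a sketch only: the function names and the sample values are hypothetical, and the rule that a change made in the current session overrides a value loaded from file is an assumption, not something this chapter states.

```python
def partial_policy(loaded, changed):
    """Combine loaded policies with those changed in the current session."""
    merged = dict(loaded)    # policies loaded from the current policy file
    merged.update(changed)   # assumption: session changes take precedence
    return merged


def format_policy_lines(policies):
    """Render one "key phrase : value" line per policy."""
    return ["{} : {}".format(key, value) for key, value in policies.items()]


# Hypothetical values for two key phrases that appear in this chapter.
loaded = {"Page width": "80"}
changed = {"Page width": "132", "TAB size": "4"}
lines = format_policy_lines(partial_policy(loaded, changed))
print(lines)  # ['Page width : 132', 'TAB size : 4']
```

Any policy that appears in neither source is simply absent from the result, which is what leaves the program free to calculate it for each document.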
A "full" policy file contains a value for almost every possible policy. Such files are usually only useful for documentation and analysis reasons, and should almost never be expected to be reloaded as input into a conversion, as this would totally fix the conversion details.
Whenever the "Output policy file" policy is set the generated "full" policy file is usually called
<filename>.pol
where <filename> is the name of the file being created. When this happens any existing file of that name will be overwritten.
For this reason we strongly advise you adopt a naming convention of the form
in_<filename>.pol or i<filename>.pol
or place your input policies in a different directory and ensure they are backed up.
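The naming convention can be checked mechanically. The sketch below is illustrative: the function names are hypothetical, it assumes the generated file takes the source file's base name (for example a2hdoco.txt giving a2hdoco.pol), and it compares names without regard to case on the assumption of a Windows-style file system.

```python
from pathlib import PurePath


def generated_policy_name(source):
    """Assumed name of the generated "full" policy file: <filename>.pol."""
    return PurePath(source).stem + ".pol"


def input_policy_name(source, prefix="in_"):
    """Name an input policy file using the in_<filename>.pol convention."""
    return prefix + PurePath(source).stem + ".pol"


def would_be_overwritten(policy_file, source):
    """True if policy_file has the same name as the generated policy file."""
    # Assumption: names are compared case-insensitively.
    return PurePath(policy_file).name.lower() == generated_policy_name(source).lower()


print(input_policy_name("a2hdoco.txt"))              # in_a2hdoco.pol
print(input_policy_name("a2hdoco.txt", prefix="i"))  # ia2hdoco.pol
print(would_be_overwritten("a2hdoco.pol", "a2hdoco.txt"))   # True
print(would_be_overwritten("ia2hdoco.pol", "a2hdoco.txt"))  # False
```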
Converted from a single text file by AscToHTM © 1997-2001 John A Fotheringham