Documentation for the AscToRTF conversion utility : Using policy files

The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html

Using policy files

Document policies have two main uses: to correct any failure of analysis that AscToRTF makes, and to tell the program how to produce better RTF in ways that couldn't be inferred from the original text.

Examples of the former include specifying a nominal page width and stating whether or not underlined section headings are expected.

Examples of the latter include adding colour and titles to the page, as well as requesting that a large document is split into several pages.

Contents of this section

What are Policy files?
Analysis policies

'What to look for' policies
General analysis policies
Bullet policies
Contents policies
File Structure policies
Headings policies
Table analysis policies

Output policies

File generating policies
Document details
Formatting Policies
RTF settings
Make Windows Help file policies
Hyperlinks policies
Preprocessor policies
Font policies
Link Dictionary Edit Dialog
Other policies

What are Policy files?

AscToRTF has a large number of options available to influence the analysis of your text files, and the output to RTF. These options are called "policies" as they govern how the source file should be interpreted and converted.

Policies may be saved in text files, known as policy files. These files have a ".pol" extension by default. The policy files are usually updated by changing the policies and saving the changes in a new file. Because they are text files you can also edit them directly in a text editor. The files contain one policy per line, with each line in the form

PolicyText : <policy value>

The use of policy files allows a given set of options to be saved and reused for other conversions, or later conversions of the same file. See Using policy files for more information.
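As a rough illustration of the one-policy-per-line format, the following Python sketch reads policy lines into (name, value) pairs. It is not part of AscToRTF. The function name and the sample lines are assumptions made for this example, and the exact policy text that AscToRTF writes may differ.

```python
def parse_policy_lines(lines):
    """Split 'PolicyText : value' lines into (name, value) pairs.

    A list of pairs is used rather than a dict because some policies,
    such as "Bullet Char", can appear on several lines.
    """
    policies = []
    for raw in lines:
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank lines and anything not in policy form
        name, _, value = line.partition(":")  # split on the first colon only
        policies.append((name.strip(), value.strip()))
    return policies


# Hypothetical sample lines, for illustration only.
sample = ["Page width : 80", "Bullet Char : *", "Bullet Char : +"]
print(parse_policy_lines(sample))
```

Splitting on the first colon only means that a value which is itself a colon is still read correctly.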

Analysis policies

Analysis policies are usually calculated by AscToRTF by making a first pass through your document. The resulting policies are then used during the second, conversion pass to categorise all input lines so that they may be correctly converted to RTF.

You should only need to change these policies should the analysis fail.

'What to look for' policies

General Analysis

Bullets

File generation

Headings Policies

Pre-formatted text

Table analysis

'What to look for' policies

These policies act as "broad stroke" policies enabling or disabling areas of functionality within the software by telling it what to look for and to try to detect.

For example you can tell the program whether or not to bother looking for patterns of indentation, bullets, or numbered lists. In many cases if you enable a policy you can further fine tune the conversion details on other policy sheets.

Look for indentation

Look for paragraphs

Look for short lines

Look for horizontal rules

Look for bullets and numbered lists

Look for definitions

Look for quoted lines

Look for emphasis

Look for underlined text

Look for mail and USENET headers

Look for character encoding

Look for regions of preformatted text

Look for diagrams

Look for indentation

AscToRTF can attempt to detect the indentation pattern of your document and replicate it in the output file. If you choose to disable this policy, all your text will be output with no indentations at all.

If the program is wrongly indenting your files, you can try adjusting the pattern of indentation on the General Analysis tabbed policy sheet.

Look for white space

By default AscToRTF will attempt to look for paragraphs in your source. Usually this is signaled by a blank line between paragraphs, a leading indent on the first line of each paragraph, or (in extreme cases) a short line at the end of a paragraph.

If you don't want AscToRTF to detect paragraphs, disable this policy.

If AscToRTF is wrongly detecting paragraphs, try adjusting the paragraph analysis policies on the General Analysis tabbed policy sheet.

Look for short lines

By default AscToRTF will attempt to detect short lines and preserve their structure by adding a line break. Disabling this will cause short lines to be merged into the surrounding paragraph's text.

If AscToRTF is wrongly handling your short lines, you can adjust the short line cutoff point or the page width (which is used in short line detection) in the Sizes section of the General Analysis tabbed policy sheet.

Look for horizontal rules

By default AscToRTF will treat a series of hyphens, minus signs or equals signs on the same line as a horizontal rule. (On occasion it might instead be regarded as underlining a heading on the previous line.)

You can disable this if you wish, or you can specify how many "line" characters it takes to make a horizontal rule.
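The idea of a minimum number of "line" characters can be sketched in Python as follows. This is only an illustration and not AscToRTF's actual test. The function name, the default of 4 characters and the requirement that the line consists of one repeated character are all assumptions.

```python
def is_horizontal_rule(line, min_chars=4, rule_chars="-="):
    """Return True if the line looks like a rule made of one repeated character.

    min_chars stands in for the "how many line characters" setting;
    its default here is an assumed value, not the program's.
    """
    text = line.strip()
    return (
        len(text) >= min_chars
        and len(set(text)) == 1      # only one distinct character used
        and text[0] in rule_chars    # and it is a recognised rule character
    )
```

Raising min_chars makes short runs such as "--" less likely to be taken for a rule.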

Look for bullets and numbered lists

By default AscToRTF will try to detect bullet points and numbered lists. This can sometimes go wrong if you have lines that look to the program like bullet points.

You can disable this behaviour should you wish. Alternatively you can fine tune the detection of bullets on the bullet analysis tabbed policy sheet.

Look for definitions

By default AscToRTF will try to detect definitions and notes, usually in the form of a single word and a hanging paragraph.

This can often go wrong, so you can use this policy to disable this feature.

Look for quoted lines

By default AscToRTF will try to identify "quoted" lines. Quoted lines are lines that have had a single character (often ">" or "!") inserted at the start. This is common practice when quoting email in a reply. AscToRTF places such text in italics.

You can disable this behaviour should you wish.

Look for emphasis

New in version 2.0

AscToRTF will try to look for text that has been marked up with underscores and asterisks to signify bold and italic text. For example

This is *bold* and this is _italic_

becomes

This is bold and this is italic

Look for underlined text

New in version 2.0

AscToRTF will try to detect where a line of text has been "underlined" by following it with a row of dashes, hyphens, equals signs etc. of the same length. This text will then be regarded as a candidate for being an underlined heading or - if those are not allowed - underlined text.

If you have tables and reports, you may want to switch this policy off since the line at the end of a table may appear to under- or over-line the last line of text in the table.
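The "same length row" test described above can be sketched in Python like this. It is a simplified illustration under stated assumptions: the function name is invented, the underline must match the text length exactly, and the set of underline characters is a guess.

```python
def find_underlined_lines(lines, underline_chars="-=~_"):
    """Return the indexes of lines that are followed by a same-length underline row."""
    found = []
    for i in range(len(lines) - 1):
        text = lines[i].strip()
        under = lines[i + 1].strip()
        if not text or not under:
            continue  # blank lines cannot be underlined or act as underlines
        is_rule = len(set(under)) == 1 and under[0] in underline_chars
        if is_rule and len(under) == len(text):
            found.append(i)
    return found
```

A check like this also shows why tables cause trouble: a closing rule that happens to match the length of the last table row would be reported as underlining it.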

Look for mail and USENET headers

AscToRTF will try to look for email and USENET headers. Where these are recognised they can be simplified so that only the To, From and Subject lines are shown in the output.

You can disable this behaviour should you wish.

Look for character encoding

Specifies whether or not the software should attempt to detect alternative character sets, such as those used for languages such as Greek, Turkish, Chinese etc.

The software does this by doing a statistical analysis on the characters used in the source file. This process isn't perfect, and when it fails you will need to manually set the correct character set using the Character encoding policy.

If you find the program is wrongly detecting the character encoding, disable this policy and/or manually set it using the Character encoding policy.

Note: Not all character sets are supported by RTF.

Look for preformatted text

By default AscToRTF will try to identify regions of preformatted text. Once identified AscToRTF will try to decide if it's a diagram, table or some other form of preformatted text. If it thinks it's a table it will attempt to place the text in an appropriate table structure.

You can disable the search for preformatted text, or if you allow preformatted text, disable table generation. (This may be appropriate if you have a large number of ASCII diagrams in your text).

The search for preformatted text can be refined via the Pre-formatted text and Table analysis tabbed policy sheets.

The output of tables can be fine-tuned via the output policy Formatting tabbed policy sheet.

Look for diagrams

Specifies whether or not regions of preformatted text that are detected should be considered as candidate diagrams. Text that contains numbers of characters such as "|", "-", ">" and "<" may be considered to be an ASCII diagram.

If you find the program is wrongly treating tables as diagrams then disable this policy.

General analysis policies

These policies aid AscToRTF's analysis by describing in detail the contents of the document being converted.

Sizes

Page Width

TAB Size

Short line length

Min Chapter Size

Paragraphs

Blank lines between paragraphs

New paragraph offset

Definitions

Search for definitions in source text

Definition paragraph indent levels

recognize hyphen characters

recognize colon characters

Other definition characters

Layout

Indentation levels

Page Width

This indicates the width (in characters) of your nominal output page. This width is calculated from the observed line lengths in the original document.

This width is used in short line calculation, and determining whether a given line contains a definition term or not (definition character near the start of the line).

In documents that contain line feeds this should be automatically detected.

In other documents you may need to set this manually.

TAB size

This indicates the size (in characters) of your tabs. AscToRTF converts all tabs to spaces before analysis. By default a tab size of 8 characters is assumed.

The tab size can influence the analysis of paragraph indentations and other layout. Provided they are used consistently there shouldn't be a problem. However where tabs and spaces are used in combination, mistakes can arise.

This is particularly true in tables of data. AscToRTF does not expect tab-separated table cells, instead converting the tabs to spaces and analysing the results.

If your source document was created in an editor with a different tab size, change this value if you start to experience strange layout conversion problems.
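The effect of the tab size on indentation can be seen with Python's built-in str.expandtabs, which expands tabs to the next tab stop. This is only a sketch of the general technique. The helper name is invented, and AscToRTF's own tab handling may differ in detail.

```python
def indent_after_tabs(line, tab_size=8):
    """Return the indentation (in characters) of a line once tabs become spaces."""
    expanded = line.expandtabs(tab_size)
    return len(expanded) - len(expanded.lstrip(" "))


# The same source line yields different indentation for different tab sizes.
print(indent_after_tabs("\tText", tab_size=8))
print(indent_after_tabs("\tText", tab_size=4))
# Two spaces followed by a tab still only reach the next tab stop.
print(indent_after_tabs("  \tText", tab_size=8))
```

When tabs and spaces are mixed, a wrong tab size can place lines that were aligned in the original editor at different positions.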

Short Line Length

This policy is used to determine what is a "short line". Short lines are treated specially by AscToRTF by adding a paragraph marker on the end. They can also be used to detect ends of paragraphs in those documents that don't have blank lines between paragraphs.

Normally AscToRTF will determine whether or not a line is short by comparing it to the page width, given the current context.

The default value is 0 characters (indicating a comparison to Page Width should be used). Set this to any value you like. A value of 80 is likely to make every line in your original document have a paragraph marker on the end.
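The interaction between this policy and Page Width might be pictured as below. This is a hedged sketch only. The documentation does not say how the comparison to the page width is made, so the assumed_ratio parameter and its value of 0.6 are invented for illustration, as is the function name.

```python
def is_short_line(line, page_width, short_line_length=0, assumed_ratio=0.6):
    """Decide whether a line counts as "short".

    A short_line_length of 0 means "compare against the page width".
    The ratio used for that comparison is an assumption of this sketch.
    """
    if short_line_length > 0:
        cutoff = short_line_length
    else:
        cutoff = page_width * assumed_ratio
    return len(line.rstrip()) < cutoff
```

With an explicit cutoff of 80 on an 80-column document, nearly every line falls below the cutoff, which matches the warning above about every line gaining a paragraph marker.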

Min Chapter Size

This policy tells AscToRTF what the smallest chapter size may be. This is used when trying to determine if a numbered line is a chapter heading. AscToRTF tries to avoid treating numbered lists as a series of small chapters using this policy.

The default value is 8 lines. Change this only if you suspect small chapters are being ignored, or large list items are being treated as chapter headings.

Blank Lines between paragraphs

AscToRTF can detect whether or not it should expect blank lines between paragraphs. Documents without blank lines between paragraphs will be harder to convert, and errors are more likely. Unfortunately text documents exported from Word for Windows often have this property.

Where there are no blank lines, AscToRTF relies on spotting the last line of a paragraph (usually shorter), and (in some documents) the presence of a hanging indent at the start of each new paragraph.

This should be automatically detected.

New Paragraph Offset

Some documents start the first line of a new paragraph with an offset of a number of characters. This is especially true in text files saved from Word for Windows documents.

AscToRTF can sometimes confuse such paragraphs as being two different levels of indentation. Use this policy to eliminate such confusion.

This should be automatically detected.

Search for definitions

This policy can be used to disable the search for definitions. Sometimes the search leads to unexpected results, with text that is not part of a definition being treated as such. In such cases you can adjust the definition policies, but if this still fails, use this policy to disable the search completely.

Hanging indent position(s)

This policy identifies the indentations used for the follow-on text in definition paragraphs. These indentation levels need not be the same as the indentation levels used for normal text, though of course often they are.

This should be detected automatically, but if your document has only a few examples it's possible AscToRTF will ignore them. In such cases you may need to set this policy manually.

Note, this policy appears on-screen as "Definition paragraph indent levels"

Recognize hyphen characters

This policy specifies whether or not hyphen (-) characters are used in one-line definitions.

If the hyphen character only occurs in definitions, then set the nearby always flag, otherwise AscToRTF will have to guess whether a particular character is part of a definition or not. This is sometimes a source of conversion errors.

If this policy is selected, it will result in a suitable "Definition Char" line being added to the policy file.

This should be detected automatically.

Recognize colon characters

This policy specifies whether or not colon (:) characters are used in one-line definitions.

If the colon character only occurs in definitions, then set the nearby always flag, otherwise AscToRTF will have to guess whether a particular character is part of a definition or not. This is sometimes a source of conversion errors.

If this policy is selected, it will result in a suitable "Definition Char" line being added to the policy file.

This should be detected automatically.

Other definition Characters

This policy specifies which other characters are used in one-line definitions.

This may be detected automatically, but more likely you'll need to specify it yourself.

Each character selected as a potential delimiter will result in a "Definition Char" line being added to the policy file.

Indent position(s)

AscToRTF recognises multiple levels of indentation. This policy shows the character levels at which indentation has been detected.

AscToRTF converts all tab characters into multiple spaces on input. These indentation positions are the positions that result after that conversion. Depending on your tab settings these might not be exactly the positions you would expect.

Normally these levels are correctly detected automatically, but should you wish to set them manually you may need to experiment slightly to see how AscToRTF has handled your tabs.
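When experimenting with manual settings, it can help to see which indentation positions a file produces for a given tab size. The Python sketch below counts the leading-space positions after tab expansion. It is an illustration only and not AscToRTF's algorithm. The function name and the min_lines threshold are assumptions.

```python
from collections import Counter


def indent_positions(lines, tab_size=8, min_lines=2):
    """Return the sorted indentation positions used by at least min_lines lines."""
    counts = Counter()
    for line in lines:
        expanded = line.expandtabs(tab_size)
        if expanded.strip():  # ignore blank lines
            counts[len(expanded) - len(expanded.lstrip(" "))] += 1
    return sorted(pos for pos, n in counts.items() if n >= min_lines)


sample = ["Top", "Top again", "\tIndented", "        Also indented", "   odd one"]
print(indent_positions(sample, tab_size=8))
print(indent_positions(sample, tab_size=4))
```

With a tab size of 8 the tabbed line and the space-indented line agree on position 8. With a tab size of 4 they no longer agree, which is the kind of mismatch described above.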

Bullet policies

AscToRTF should be able to detect the use of bullets on a reasonably sized document. These policies describe the type of bullets expected.

Automatically detect bullets and numbered lists

Expected Bullet types

numbered bullets

alphabetic bullets

roman numeral bullets

Bullet characters

recognize hyphen character as a bullet point

recognize an "o" character as a bullet point

Other bullet point characters

Look for bullets

This policy states whether or not the program should attempt to automatically detect bullets and numbered lists. This should normally be left on unless your document has no such features, but the program (wrongly) thinks it has.

This policy appears on the Bullets dialog as "Automatically detect bullets and numbered lists", but is identical to the "Look for bullets" policy on the 'What to look for' policies tabbed property sheet.

Expect Numbered bullets

This policy states whether or not numbered bullet points are expected. The numbered bullets can be followed by any punctuation, thus 1., 2) and (3) will all be recognised, but RTF will not necessarily support this in the markup produced.

This should be automatically detected.

Expect alphabetic bullets

This policy states whether or not alphabetic bullet points are expected. The alphabetic bullets can be followed by any punctuation, thus a., b) and (c) will all be recognised, but RTF will not necessarily support this in the markup produced.

Both upper and lower case bullets are recognised (and supported in the markup).

This should be automatically detected.

Expect roman numeral bullets

This policy states whether or not roman numeral bullet points are expected. The roman numeral bullets can be followed by any punctuation, thus i., ii) and (iii) will all be recognised, but RTF will not necessarily support this in the markup produced.

Both upper and lower case bullets are recognised (and supported in the markup), although the range of roman numeral values supported is limited.

This should be automatically detected.
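The three bullet types described above can be told apart with a small Python sketch like the following. It is a simplified illustration and not the program's detection logic. The names are invented, and it only handles markers followed by "." or ")" (optionally preceded by "("), whereas AscToRTF accepts other punctuation too.

```python
import re

# Patterns for the marker itself, once surrounding punctuation is removed.
_MARKER_PATTERNS = {
    "numbered": r"\d+",
    "alphabetic": r"[A-Za-z]",
    "roman": r"[ivxlcdm]+|[IVXLCDM]+",
}


def bullet_types(line):
    """Return the set of bullet types that the start of the line could be."""
    match = re.match(r"\s*\(?(\w+)[.)]\s+\S", line)
    if not match:
        return set()
    marker = match.group(1)
    return {
        name
        for name, pattern in _MARKER_PATTERNS.items()
        if re.fullmatch(pattern, marker)
    }
```

A marker such as "i." matches both the alphabetic and the roman numeral patterns, which is one reason the expected bullet types are set as policies rather than decided line by line.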

recognize '-' as a bullet

This policy states whether or not bullet points starting with the hyphen character '-' are expected.

This policy appears on-screen as "Recognize hyphen character as a bullet point"

This should be automatically detected.

recognize 'o' as a bullet

This policy states whether or not bullet points starting with the lower case 'o' are expected.

This policy appears on-screen as "Recognize 'o' character as a bullet point"

This should be automatically detected.

Other bullet point characters

This policy lists any other characters that are to be recognised as bullet characters.

Each bullet character entered will appear in the policy file as its own "Bullet Char" line.

This should be automatically detected, but may sometimes need to be manually entered.

Contents policies

This dialog shows both analysis and output policies connected with contents list detection and generation.

Analysis

Expect contents list

Expect contents list

This policy specifies whether or not the document already contains a contents list. If it does, AscToRTF will attempt to convert the existing list into a series of hyperlinks.

This should be detected automatically, but occasionally you will need to set this policy manually.

See the discussion on contents list generation in the Documentation available

File Structure policies

These policies aid AscToRTF's analysis by describing some of the file structure that would affect the analysis.

Expect only a simple layout

Expected File contents

Expect "C"-code samples

Contains DOS characters

Contains PCL printer codes

Contains non-European (e.g. Japanese) characters

Contains mime-encoded quotable characters

File has change bars

File has Page markers

Page marker size (in lines)

Text Attributes

Text justification

File is double spaced

Text to ignore

Number of lines to ignore at start of document

Number of lines to ignore at end of document

Keep it simple

AscToRTF puts a lot of effort into detecting overall structure such as headings etc.

In documents that don't have any such structure, AscToRTF is liable to convert any line with a number at the start into a heading.

To prevent this, you can mark the document as simple, that is with no global structure. In a simple document AscToRTF will attempt far less analysis.

This policy appears on-screen as "Expect only a simple layout".

AscToRTF attempts to automatically identify simple documents, but you may still need to set this policy manually.

Expect Code samples

AscToRTF can mark up C-like code fragments in a fixed width font to preserve the layout and readability of the quoted code.

This may be automatically detected, but occasionally needs to be manually corrected.

Input file contains DOS characters

AscToRTF can convert files that use the DOS (OEM) character set. By default the file is assumed to be in the ANSI character set, but some files may have originated under DOS.

This may be automatically detected, but usually needs to be manually set.

Input file contains PCL codes

New in version 2.0

Indicates that the input file contains PCL printer codes. When set, the program will make whatever sensible use it can of these codes, otherwise they will be removed.

Please note that the PCL printer codes offer a rich command language that may be used to drive graphical printers. As such the emulation possibilities in a text converter are limited, and it is quite likely that files that make heavy use of such codes will fail dramatically to convert.

That said, those codes that are not recognised will be eliminated from the output.

Input file contains Japanese characters

*** not implemented yet ***

Files using non-ASCII character sets (Japanese, Korean etc) will be incorrectly converted. This may be fixed (as far as possible) in later versions.

Appears on-screen as "Contains non-European (e.g. Japanese) characters"

Input file contains MIME encoding

AscToRTF can convert mime-encoded quotable characters. These will usually appear in files that were originally part of an email message. Such files use the "=" character to escape special characters. So for example "=20" should be interpreted as a space.

This appears on-screen as "Contains mime-encoded quotable characters"

This may be automatically detected in files where the "=" is used to break up long lines, but more usually you will need to manually set this.
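The "=" escaping described above is the quoted-printable encoding, which Python's standard quopri module can decode. The sketch below shows the general decoding and is not AscToRTF's own implementation. The helper name and the choice of Latin-1 for the decoded bytes are assumptions.

```python
import quopri


def decode_quoted_printable(text):
    """Decode quoted-printable escapes such as "=20" in a piece of text."""
    raw = quopri.decodestring(text.encode("ascii"))
    # Assumption: the decoded bytes are Latin-1; real files may use another charset.
    return raw.decode("latin-1")


print(decode_quoted_printable("Hello=20World"))
```

A trailing "=" at the end of a line is a soft line break, which is how long lines are split in such files and why its presence can help automatic detection.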

Input file has change bars

AscToRTF can strip out change bars in documents that contain them. Change bars are usually a vertical bar '|' placed in the leftmost or rightmost column.

Currently this is not automatically detected, and so will need to be manually switched on.

Input file has page markers

AscToRTF has a limited ability to remove page markers. These are normally a few lines following a form feed (FF) character, containing page numbers etc. This will commonly occur with files generated from older software packages.

Page marker size (in lines)

The number of lines after each form feed (FF) that should be ignored. These lines will not be copied to the output.
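The combined effect of the two page marker policies can be sketched in Python as below. This is an illustration under stated assumptions and not the program's code. The function name is invented, and the sketch assumes the form feed sits on a line of its own, which is dropped along with the marker lines.

```python
def strip_page_markers(lines, marker_size):
    """Drop each form feed line and the marker_size lines that follow it."""
    kept = []
    to_skip = 0
    for line in lines:
        if "\f" in line:
            to_skip = marker_size  # start skipping the page marker lines
            continue
        if to_skip > 0:
            to_skip -= 1
            continue
        kept.append(line)
    return kept
```

Setting the size too large discards real text after each page break, so it is worth checking the marker length in the source file first.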

Text Justification

AscToRTF recognises documents that are left justified (default), right justified, centred or both left and right justified (confusingly known as "justified").

The program cannot currently mark up the text in a matching style, but this policy is important in the analysis. For example "justified" documents are padded with extra white space which could be interpreted as pre-formatted text were the document not recognised as being justified.

Normally this policy is correctly detected automatically.

Input file is double spaced

AscToRTF will normally treat a blank line as a break between paragraphs. Some files have extra CR/LF characters (usually if they've come from a different computer, or from a printer package). In such cases AscToRTF will see every second line as blank, and this will affect the analysis, usually by turning each line of data into a separate paragraph.

If you have such a file, use this policy to mark the file as double spaced to get better results.

Lines to ignore at start of file

New in version 2.0

This specifies how many lines from the input file should be ignored at the start of the file. These lines will be discarded from the output.

This can be useful when converting a file copied from a news feed or similar source that adds a small data header to the file.

Lines to ignore at end of file

New in version 2.0

This specifies how many lines from the input file should be ignored at the end of the file. Up to 40 lines may be ignored in this way. These lines will be discarded from the output.

This can be useful when converting a file copied from a news feed or similar source that adds a small data footer to the file.
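Taken together, the two "lines to ignore" policies amount to trimming a fixed number of lines from each end of the input. A minimal Python sketch, with an invented function name and parameter names, might look like this:

```python
def trim_lines(lines, ignore_at_start=0, ignore_at_end=0):
    """Return the lines with the given number dropped from the start and end."""
    end = len(lines) - ignore_at_end
    # max() keeps the slice empty, rather than negative, when the
    # two counts together cover the whole file.
    return lines[ignore_at_start:max(end, ignore_at_start)]
```

Computing the end index explicitly avoids the common slicing pitfall where a count of zero at the end would otherwise produce an empty result.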

Headings policies

These policies determine the headings structure that the document is expected to have. Normally these are calculated correctly by AscToRTF, but due to the complexity of heading detection, you may sometimes need to correct the analysis.

At the top of the dialog you can specify what type of headings you expect to see. Any combination is allowed, although usually documents use just one type of heading.

Expect Numbered headings

Expect Underlined headings

Expect Capitalised headings

Expect Embedded headings

Heading Key phrases

Use first line as heading

Center first heading

Check indentations of headings are consistent

If numbered headings are expected, it may be possible to expect headings at multiple levels, and to also expect a contents list. Each level of heading will have its own set of policies which are shown on this dialog. The policies are shown in text form, but are edited via the heading details dialog.

Note: This area of functionality is continually under review.

See also the discussion in detecting headings and section titles.

Expect numbered headings

This policy specifies whether or not numbered headings are expected in the document.

Numbered headings may be found at multiple levels, and their details may be edited via the heading details dialog.

This should be calculated correctly by AscToRTF, but it is prone to error, getting confused by numbered bullets and the like. In such cases you may need to set this policy manually.

Expect underlined headings

This policy specifies whether or not underlined headings are expected. Note, where the headings themselves are numbered, the underlining will be taken into account, and you should set the expect numbered headings policy instead.

AscToRTF uses the character in the underlining to determine the heading level, thus text underlined with equals signs is given prominence over text with single underline characters such as minus signs, tildes or underscores.

Expect capitalised headings

This policy specifies whether or not CAPITALISED headings are expected. Note, where the headings themselves are numbered, this policy need not be set, and you should set the expect numbered headings policy instead.

Expect Embedded headings

New in version 2.0

This policy specifies whether or not "embedded" headings are expected, i.e. the heading is "embedded" in the first paragraph. Such headings are expected to be a complete sentence or phrase in UPPER CASE at the start of a paragraph.

At present such headings are not auto-detected, so you need to switch this policy on manually.

Heading Key phrases

New in version 2.0

If specified, then any line that begins with one of the key phrases will be regarded as a heading. The syntax is

<details>, <details>...

where each set of details is

<details> = <phrases>, [<heading_level>]

and

<phrases> = <phrase_1> [|<phrase_2>]

That is, each set of <details> can optionally specify a <heading_level>. If omitted this will default to 1,2,3 for the first, second, third set of details etc. Note, this is a logical heading level, and will be apparent in the contents list.

Each set of <details> must supply a set of <phrases>, and each set of phrases must have at least one phrase, with extra phrases added if wanted, separated by vertical bars.

So for example

Part, Chapter, Section

would treat lines beginning with the words "Part", "Chapter" and "Section" as level 1,2, and 3 headings.

The key phrases are case-sensitive in order to reduce the likelihood of false matches with lines that just happen to have these phrases at the start of the line. So

PART|Part, Chapter, Section

would allow either "PART" or "Part" to be matched.

"PART|Part,1" , "Chapter,2" , "Section,2"

Would make lines beginning with "Part" level-1 headings, while both "Chapter" and "Section" would become level 2. This would be the same as

"PART|Part,1" , "Chapter|Section,2"

Note, spaces may form part of a match phrase, but because of their use in the tag syntax, commas and vertical bars may not.

If false matches occur, (e.g. the word "Part" appears in the body of the text) edit the source text so that the offending word is no longer at the start of the line.
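To make the key phrase syntax concrete, here is a Python sketch that reads a key phrases value into a phrase-to-level mapping and then matches lines against it. It is an interpretation of the examples above and not AscToRTF's parser. The function names are invented, and the sketch assumes that explicit heading levels are only given in the quoted form.

```python
import re


def parse_key_phrases(policy_value):
    """Return a {phrase: heading_level} mapping for a key phrases value."""
    # Quoted form: each "..." is one set of details. Otherwise split on commas.
    quoted = re.findall(r'"([^"]*)"', policy_value)
    details_list = quoted if quoted else policy_value.split(",")

    levels = {}
    for position, details in enumerate(details_list, start=1):
        phrases, sep, level = details.rpartition(",")
        if sep and level.strip().isdigit():
            heading_level = int(level)
        else:
            # No explicit level: default to the position of this set of details.
            phrases, heading_level = details, position
        for phrase in phrases.split("|"):
            levels[phrase.strip()] = heading_level
    return levels


def heading_level_for(line, levels):
    """Case-sensitive match of the start of a line against the key phrases."""
    for phrase, level in levels.items():
        if line.startswith(phrase):
            return level
    return None
```

Because the match is case-sensitive, a line starting with "part" is not treated as a heading unless that spelling is listed as an alternative phrase.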

Use first line as heading

New in version 2.0

When this option is selected, the first line in the document will be treated as a heading. This can be a useful option to select when the first line of your document is a document title line, but doesn't conform to the headings style used in the rest of the document.

Center first heading

New in version 2.0

When this option is selected, the first heading in the document is centred. This may be an appropriate choice when the first heading is in fact to be treated as a document title.

Check indentation for consistency

The program performs a number of consistency checks when detecting headings. Amongst these is a check that all headings of the same type occur at the same indentation. This check can help distinguish between numbered headings and numbered lists.

However, if you have numbered headings that are at different indentations - e.g. because they are centred on the page - then this check will cause them to be rejected as headings. In such cases you can manually disable this check.

This policy appears on-screen as "Check indentations of headings are consistent"

The heading details dialog

This dialog is reached through one of the edit buttons on the main Headings Policies dialog. This allows you to edit details of a particular type or level of heading.

Position of section number on the line

Indentation of heading lines

Heading prefix words

Section number formatting

Heading numbering scheme

Heading separator characters

Heading trailing letters

Bracketing

Heading bracket characters

Indentation of heading lines

AscToRTF uses checks on indentation levels to reject lines with numbers on that could be confused with headers.

This is the indentation level (in characters) at which headings of this type are expected to be found.

Heading prefix words

Some documents put words like "chapter", "subject" and "section" in front of the section number. These are known as prefix words.

Heading numbering scheme

This is the numbering scheme expected for headings at this level. At present AscToRTF can't cope with mixed types like "II-2.b".

This may be addressed in later versions.

Heading separator characters

This shows the separator expected between parts of the heading number.

*** Not currently supported ***

Heading trailing letters

This shows whether we expect trailing letters after the section number, as in "1.1b".

*** Not currently supported ***

Heading bracket characters

This shows what bracket characters (if any) we expect before and after the section number as in "[2.2]" or "3.2.1)".

*** Not currently supported ***

Pre-formatted text policies

These policies specify how AscToRTF detects pre-formatted text.

Detecting pre-formatted regions

Minimum size of automatic <PRE> section

See the section on pre-formatted text for more details.

Minimum size of automatic <PRE> section

This policy specifies the minimum number of consecutive pre-formatted lines that must be detected before the text is placed in fixed width font.

AscToRTF detects heavily formatted lines, and then looks at their neighbours to see if they too could be part of a pre-formatted text.

Once a group of lines is identified, it will only be marked up as pre-formatted if the minimum is exceeded.

The default value is 0. Set this value larger if AscToRTF is marking text as pre-formatted when it shouldn't do.
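The minimum size rule can be pictured as filtering runs of consecutive pre-formatted lines by length. The Python sketch below assumes each line has already been flagged as pre-formatted or not. The function name and the list-of-flags input are inventions for this example, and the real analysis is more involved.

```python
from itertools import groupby


def preformatted_runs(flags, minimum=0):
    """Return (first, last) line indexes of runs longer than the minimum.

    flags is a list of booleans, one per line, where True means the line
    was judged to be pre-formatted.
    """
    runs = []
    index = 0
    for is_pre, group in groupby(flags):
        length = len(list(group))
        if is_pre and length > minimum:  # the minimum must be exceeded
            runs.append((index, index + length - 1))
        index += length
    return runs
```

With the default of 0 every run is kept, while a larger minimum discards isolated lines that only look heavily formatted.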

Note: The <PRE> is a reference to the shared ancestry of this software with the text to HTML converter from which it evolved.

Table analysis policies

These policies specify how AscToRTF detects possible tables and analyses the data in them into columns and rows.

Attempt TABLE generation

Detection

Extend preformatted regions

Analysing rows

Could table have blank lines between rows

Analysing columns

Table Layout

Is the table expected to have sparse columns

Ignore table header when analysing columns

Merge together "poor" columns

Minimum number of spaces between table columns

See the section on pre-formatted text for more details.

Attempt TABLE generation

This policy specifies whether or not you want RTF table generation attempted for regions of apparently pre-formatted text. AscToRTF will attempt to analyse such regions, preferring to fit them into a RTF table. However, if this is not possible, or if AscToRTF decides the pre-formatted region is something else (like a diagram or a piece of code) then a RTF table will not be generated.

Disabling this policy tells AscToRTF not to attempt this analysis, usually leading to pre-formatted text being placed in simple fixed width font markup instead.

Table extending factor

When the program encounters a strongly formatted line, it examines the adjacent lines to see if they too could form part of the same preformatted region.

This policy specifies the extent to which strongly preformatted lines should be used to "extend" a preformatted region to include adjacent lines. If set to 10, then all adjacent lines up to the next page break or section heading will be treated as part of the same region. When set to 1 only those lines that are clearly heavily formatted themselves will be included.

This policy appears on-screen as "Extend preformatted regions"

Could be blank line separated

New in version 2.0

This option specifies whether or not tables are expected to have blank lines between rows. If they are, the software will be more likely to merge the text for adjacent source lines into a single row in the output table.

Default Table Layout

New in version 2.0

This option allows you to specify the default table layout for all tables in the document. The layout specifies the number of columns and their end positions.

This is the default layout and will normally be applied to all tables in the document. If a document has multiple tables you are better off either using the preprocessor to mark up the source text and supplying TABLE_LAYOUT commands, or supplying a "Layout" component in a Table Definition File.

The format of the Table Layout policy is the same as that described in the discussion of the TABLE_LAYOUT pre-processor command.

See also TDF line: Layout

Expect sparse tables

This policy is used to tell AscToRTF that you expect your tables to be quite sparse in places. This can affect AscToRTF's analysis, as the algorithms are liable to merge "empty" columns with their less empty neighbours.

Enabling this policy will usually result in your tables having more, emptier, columns.

Ignore table header during analysis

This policy specifies that the table header should be ignored when analysing the column structure of the table.

In some tables (usually "reports") the header can be quite complex, with titles spanning multiple columns, whereas the body of the table is much more structured.

In such cases including the table header in the analysis can lead to errors, so enabling this policy can simplify the analysis giving better chances of success.

This policy appears on-screen as "Ignore table header when analysing columns"

Column merging factor

Once the program has detected the column layout of a table, it reviews how well the data can be fitted into these columns. If too many cells in a column are empty, or if too many cells "span" multiple columns, then the columns are deemed to be "poor", and may be merged together to form fewer, wider columns.

This factor determines the extent to which columns should be merged. A value of 10 means columns should be merged together whenever there is any doubt. Use this if you are getting too many columns. A value of 1 means columns should never be merged. Use this if you are getting too few columns.

This policy appears on-screen as "Merge together "poor" columns".

Note, this policy can't guarantee you will get the correct column structure, but it does give you a chance to influence the logic.

Minimum TABLE column separation

This policy specifies the minimum number of spaces that should be interpreted as a gap between columns in a potential table. The default value is 1, but this value can sometimes lead to too many columns, especially in small tables. Larger values may lead to columns being merged together.

This policy appears on-screen as "Minimum number of spaces between table columns"
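
The effect of this value can be illustrated with a small Python sketch. It is not the program's real column analysis, which looks at the whole table; the function name split_columns and the sample line are assumptions, and the sketch simply splits one line wherever a run of at least the given number of spaces occurs.

```python
import re

def split_columns(line, min_gap=1):
    """Split one line into cells wherever at least `min_gap` spaces occur."""
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    # A gap is a run of `min_gap` or more spaces.
    pattern = r" {%d,}" % min_gap
    return re.split(pattern, line.strip())
```

For the line "John Smith   42   London" a value of 1 yields four cells, because the single space inside the name counts as a gap, whereas a value of 2 keeps "John Smith" together and yields three cells.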

Output policies

These policies are used to control the output to RTF. Generally these policies allow you to decide how the resulting RTF should look in a manner that cannot be inferred from the original document.

File generation
Document details
Formatting
RTF settings
Make Windows Help File
Hyperlinks
Preprocessor
Fonts
Link Dictionary

File generating policies

Line and file structures

Preserve file structure using <PRE>

Preserve line structure

Treat each line as a paragraph

Diagnostics Files

Generate log files

Generate sample policy file

Preserve file structure using <PRE>

This policy can be used to place the whole file inside <PRE>...</PRE> markup. This will use a mono spaced font that preserves the line structure and the relative spacing of characters.

When this is enabled almost all of the program's other conversions will be disabled. You should only really use this if your document has a lot of formatting that the program is failing to understand.

This policy needs to be set manually where wanted.

Preserve Line structure

This policy specifies that the line structure of the original document should be preserved, rather than just the paragraph structure.

If enabled the lines in the output document will match those of the original document, and the text will not automatically be adjusted if you widen your window. On large monitors this will give the text an "A4" look and feel.

This policy needs to be set manually where wanted.

Treat each line as a paragraph

Some files do not break large paragraphs into smaller lines, but instead place the whole paragraph on a single line. This is especially true if the source file was created by a text editor that relied on word wrap (such as Notepad or Word).

These files often have no blank lines between paragraphs, which makes detecting where paragraphs begin and end more difficult.

In such files this policy can be enabled so that each "line of text" in the source file will be treated as a separate paragraph.

This policy cannot be automatically detected, and so needs to be set manually where wanted.

Generate diagnostics files

This policy allows you to specify the generation of some diagnostics files. AscToRTF will generate 3 files with the following extensions:

.lis1	A line-by-line summary of how AscToRTF analysed the source file during the analysis pass
.lis	A line-by-line summary of how AscToRTF analysed the source file during the output pass
.stats	A statistics file

The .lis file will give the best description of how the source file has been converted. The differences between the .lis1 and .lis files can be slight, and are down to the fact that on the output pass more rigorous attention is applied to the policies.

Any error messages generated during the conversion are inserted into the .lis file at the offending line. This will help you determine how relevant they are.

This policy appears on-screen as "Generate log files"

Generate sample policy file

This policy allows you to generate a policy file containing all the policies used during the output pass. This will help you understand how AscToRTF has interpreted your document, and may help in determining where the analysis may have gone wrong and need correcting.

Note, this file will contain all the policies used, and as such is probably not suited for use as an input policy.

This option is equivalent to the policies "Output policy file" and "Output policy filename".

See the discussion in the Documentation available
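
Because policy files are plain text with one policy per line in the form "PolicyText : <policy value>", a generated sample file can be inspected with a short script. The Python sketch below is a minimal illustration. The function name parse_policy_lines is hypothetical, and it assumes that policy names never contain a colon and that lines without a colon can be ignored.

```python
def parse_policy_lines(lines):
    """Turn 'PolicyText : value' lines into a dictionary of policy name -> value."""
    policies = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and anything that is not in the 'name : value' form.
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values may themselves contain colons.
        name, _, value = line.partition(":")
        policies[name.strip()] = value.strip()
    return policies
```

Comparing the dictionary built from a sample policy file with the one built from your own input policy file is one way to see which policies the program calculated for itself.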

Document details

These document details are placed in the "information" section of the generated RTF document. Depending on your RTF client these details may or may not be visible. For example, in Word for Windows you can view these details under the File -> Properties menu.

On some systems search programs can use this information to help locate your documents. The details that can be set include

Title. You have three options

use first line as title

use first heading as title

set a default title

Subject

Author

Manager

Company Name

Category

Keywords

Comments

Use first line as title

New in version 2.0

When this option is selected, the first line in the document will be treated as the document title, and will be copied across to the document properties part of the output file. This can be a useful option to select when the first line of your document is a document title line.

If you also want the first line to appear in the output as a heading, select the use first line as heading option

Use first heading as title

New in version 2.0

When this option is selected, the first heading detected in the document will also be used as the document's title, and copied across to the properties section of the output document. Note, this relies on the program correctly detecting headings, and in particular the first heading. If the first heading is also the first line, you may want to instead just consider using the use first line as heading and/or use first heading as title policies.

Title

This is the document title to be copied across into the properties section of the output document. The default value is

Converted from [[filename]]

where [[filename]] gets replaced by the original filename (see Pre-processor command: filename).
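
The substitution can be pictured with the following Python sketch. It is an illustration only: the function name expand_title is hypothetical, and it assumes that the bare file name, without any directory part, is what gets substituted.

```python
import os

def expand_title(template, source_path):
    """Replace the [[filename]] placeholder with the name of the source file."""
    # Assumption: only the file name is used, not the full path.
    filename = os.path.basename(source_path)
    return template.replace("[[filename]]", filename)
```

With the default title and a source file called notes.txt this gives "Converted from notes.txt".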

Formatting Policies

New in version 2.0

These options offer settings that allow you to control some of the formatting applied to the RTF output. Options exist for each of the following areas :-

Automatic centring

These options control the handling of any detected centred text.

Enable automatic centring

Automatic centring tolerance

Paragraphs

These options control the formatting of paragraphs.

Preserve first line indentation

First line indentation (in blocks)

Headings

These options control the formatting of headings.

preserve underlining of headings

Bullets

These options control the formatting of bullets and lists.

Use original bullet text

Characters to use for bullets

Tables

These options control the formatting of tables.

table cell alignment

table alignment

number of header rows

Miscellaneous formatting options

ignore multiple blank lines

Allow automatic centring

New in version 2.0

When enabled the software will attempt centred text detection.

This policy appears on-screen as "Enable automatic centring"

Automatic centring tolerance

New in version 2.0

When centred text detection is enabled, this specifies how much off-centre text can be and still be considered as centred text. Text is compared to the page width, taking into account any left hand indentation.

If you make this value larger, more text will be considered to be centred and will be centred in the output, although only blocks of text that are wholly centred (all lines fall within the specified tolerance) will be regarded as centred text in the output.
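
One way to picture the test is the Python sketch below. It is not the program's actual logic: the function names are hypothetical, and it assumes that the page width, the indentation and the tolerance are all measured in character positions and that a line counts as centred when its left and right gaps differ by no more than the tolerance.

```python
def is_centred_line(line, page_width, tolerance, indent=0):
    """Return True if the text sits within `tolerance` characters of centre."""
    text = line.rstrip()
    stripped = text.lstrip()
    if not stripped:
        return False  # blank lines are never treated as centred
    # Gap on the left, after discounting any left hand indentation.
    left_gap = (len(text) - len(stripped)) - indent
    # Gap between the end of the text and the page width.
    right_gap = page_width - len(text)
    return abs(left_gap - right_gap) <= tolerance

def is_centred_block(lines, page_width, tolerance, indent=0):
    """A block is centred only if every one of its lines is."""
    return bool(lines) and all(
        is_centred_line(line, page_width, tolerance, indent) for line in lines
    )
```

A larger tolerance lets more individual lines pass, but a single line outside the tolerance is still enough to stop the whole block being centred.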

Preserve new paragraph offset

New in version 2.0

When enabled, any first-line indentation detected for paragraphs will be preserved. Often paragraphs indent the first line by a few spaces. Where the software detects this you have the choice as to whether an indentation should be preserved in the output.

This policy appears on-screen as "Preserve first line indentation"

First line indentation (in blocks)

New in version 2.0

When set to a non-zero value, paragraphs will be set so that the first line in each paragraph is indented relative to the remainder of the paragraph. The indentation is set as a number of tab stop positions.

Preserve underlining of headings

New in version 2.0

When enabled any detected underlined headings will be underlined in the output. Headings may be underlined in the source text to make it clear what they are. When detected by the program these are output using heading styles which tend to make the text bigger and bolder. That being the case you may want to lose the underlining of such headings by switching them off using this policy.

Use original bullet text

New in version 2.0

By default the software will replace bullets in the original text by bullet point characters in the output document. However this isn't always ideal, especially if the text is to be re-exported or emailed to various computer systems. When this policy is enabled the bullet text is left unchanged.

Note, you can choose the bullet text that is used via the policy characters to use for bullets

Characters to use for bullets

New in version 2.0

By default the software will replace bullets in the original text by bullet point characters in the output document. However this isn't always ideal, especially if the text is to be re-exported or emailed to various computer systems. You can use the policy use original bullet text to preserve the originals, or you can use this policy to choose your own, alternative, text markers.

The value should be a string whose first character will be used for level 1 bullets, second for level 2 etc, etc. All levels deeper than the last character will use the last character supplied. So for example the value

+o-

Would use '+' for level 1 bullets, 'o' for level 2 bullets and '-' for level 3 and beyond.
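
The rule for choosing a character can be written as a short Python sketch. The function name bullet_for_level is hypothetical; the behaviour follows the description above, with levels counted from 1.

```python
def bullet_for_level(bullet_chars, level):
    """Pick the bullet character for a nesting level (1 = outermost)."""
    if not bullet_chars or level < 1:
        raise ValueError("need at least one character and a level of 1 or more")
    # Levels deeper than the string is long reuse the last character.
    index = min(level, len(bullet_chars)) - 1
    return bullet_chars[index]
```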

Default TABLE cell alignment

New in version 2.0

By default the software will attempt to automatically calculate the alignment of data inside each cell of a table. This will look at the placement of the data, and the type of data (e.g. numerical data is right justified).

This policy can be used to overrule that process and force a particular alignment. When set it will apply to all cells in all detected tables.

To exert more control over particular columns in particular tables you should consider using a Table definition file

Default TABLE alignment

New in version 2.0

By default the software will attempt to automatically calculate the alignment of a table within a document, and in most cases will simply left align the table, possibly with a margin where one is detected.

This policy can be used to overrule that process and set the alignment for all tables in the document (e.g. to centre all tables).

To exert more control over particular columns in particular tables you should consider using a Table definition file

Default TABLE header rows

This policy specifies how many lines should be regarded as the header of a table. AscToRTF can attempt to detect this, and it may not be the case that all tables in the same file have the same header size.

This policy appears on-screen as "Number of header rows"

Ignore multiple blank lines

New in version 2.0

When enabled multiple blank lines in the input will not be converted to multiple blank lines in the output. This can be desirable when converting a document that has been "paged" and so had extra blank lines added to space out the sections, and this spacing makes no sense and is unwanted in the RTF.

RTF settings

These options offer settings that would normally be set by your RTF viewer.

Document language
Paper size
Paper Orientation
Margin sizes
Mirror margins

Character Encoding

Depending on the application you use to view the RTF file created these settings may affect the spell checking, grammar checking, paging and printing of the generated document.

Language (for proofing)

This specifies the language the document is written in. Depending on the application you use to view the RTF files, this setting may be used by spell checkers and the like when checking the document.

This policy appears on-screen as "Document language".

Paper size

This specifies the paper size. Depending on the application you use to view the RTF files, this may be used when printing the document, and the value selected will affect the paging of the document.

Use Landscape mode

This specifies the paper orientation (portrait or landscape mode). Depending on the application you use to view the RTF files, this may be used when printing the document, and the value selected will affect the paging of the document.

This policy is offered on-screen as the "Paper orientation" option.

Margin sizes

This specifies the margin sizes. Depending on the application you use to view the RTF files, these may be used when printing the document, and the value selected will affect the paging of the document.

The selected margin sizes are saved in any policy file under the following names

Top margin (in cm)

Bottom margin (in cm)

margin (in cm)

margin (in cm)

Mirror Margins

This specifies that margins should be mirrored, i.e. that odd and even pages should reverse margin sizes so that they can be placed facing each other when printed and bound together.

Character Encoding

This allows the character encoding to be set. Although designed to convert documents that use the ASCII character set, the software has some ability to convert Japanese and Cyrillic files amongst others. For such files to display correctly, the character encoding has to be set up correctly.

Note, RTF doesn't support all possible languages, and the Arabic conversion may be a little suspect :-)

Make Windows Help file policies

New in version 2.0

AscToRTF can now be used to create RTF files suitable for use as source files when making a WinHelp help file. You can also get AscToRTF to create a Help Compiler project file (.hpj) for you, and tell it some of the things to put in that file. The policies available are

Generate WinHelp project file

WinHelp Resource File

Help file citation

Help file copyright notice

Help title background colour

Help body background colour

For more information see Creating WinHelp files

Generate WinHelp project file

New in version 2.0

When selected, AscToRTF will create a new Help Compiler project file for you. This file will have the same name as your source file but with a .hpj extension. If this option is selected it will overwrite any existing project file, so take care in using it.

You will need a copy of the Help Compiler Workshop (HCW.EXE) from Microsoft in order to load and execute the created project file.

WinHelp Resource File

New in version 2.0

If you are attempting to create a WinHelp file for one of your own software applications you will need to supply the name of the .hm resource file from your project that defines the topic IDs that your software will want to be defined in the help file. This .hm file will be added to your project file and is a crucial link between IDs used in your software, and topics defined in your help file.

See also the discussion in Resource file (.hm)

Help file citation

New in version 2.0

This is the "citation" text added to your help file. This text is displayed anytime someone prints or pastes topics from your WinHelp file.

Help file copyright notice

New in version 2.0

This is the copyright notice attached to your help file.

Help title background colour

New in version 2.0

This is the background colour used for the title text of each topic. This is the non-scrolling section at the top of each topic page.

NOTE: The colours you specify will only be visible to users running a default Windows colour scheme. If users change this at all, then these colours may have no effect.

Help body background colour

New in version 2.0

This is the background colour used for the main body of each topic. This is the scrolling section that makes up the majority of the topic window.

NOTE: The colours you specify will only be visible to users running a default Windows colour scheme. If users change this at all, then these colours may have no effect.

Hyperlinks policies

Add hyperlinks

http:// and www references

Convert Email references

Allow email addresses that begin with a number

Convert Gopher references

Convert Telnet references

Convert FTP references

Convert "weak" ftp references

Convert USENET newsgroup references

Convert only recognised USENET newsgroups

Additional hierarchies to recognize

Check domain name syntax

Hyperlinks to other section numbers

Convert cross-references to other sections

See also the comments in the adding hyperlinks section.

Create hyperlinks

This specifies that all valid "http" and www references that are found should be turned into active hyperlinks.

Such hyperlinks may sometimes get confused by surrounding punctuation characters.

This policy is shown on-screen as "http:// and www references"

Create mailto links

This specifies that all valid email addresses that are found should be turned into active "mailto" hyperlinks.

AscToRTF has no way of checking email addresses, so "made up" addresses will also get converted, although the domain name will be validated.

An extra option allows email addresses beginning with a number to be accepted. Often USENET message IDs have an email format, but start with a number, so by default these are not converted to email hyperlinks.

This policy is shown on-screen as "Convert email references"

Allow email beginning with numbers

This specifies whether or not email addresses that begin with numbers are allowed.

The program has no way of validating email addresses. Often documents - especially Usenet posts and the like - contain message IDs that look like email addresses but aren't. These usually begin with a number, and so by default the program will ignore "addresses" in this form.

On the other hand some ISPs (e.g. older Compuserve accounts) allow email addresses that start with numbers. You should toggle this policy according to which is more appropriate for your documents.

This policy appears on-screen as "Allow email addresses that begin with a number"
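
The effect of the toggle can be sketched in Python as follows. The regular expression is a deliberately loose stand-in and not the pattern the program uses; the function name and the sample addresses are likewise assumptions.

```python
import re

# A loose pattern for "something@host.domain"; an assumption for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_email_candidates(text, allow_leading_digit=False):
    """Return the email-like strings in `text` that would become mailto links."""
    found = []
    for match in EMAIL_RE.finditer(text):
        address = match.group(0)
        # By default, skip anything starting with a digit (likely a message ID).
        if address[0].isdigit() and not allow_leading_digit:
            continue
        found.append(address)
    return found
```

With the toggle off, a message ID such as 12345.abc@news.example.org is skipped, but so is a genuine numeric address, which is why the choice depends on your documents.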

Create FTP links

This specifies that all FTP addresses that are found should be turned into active hyperlinks.

These will usually start with "ftp://" or be a domain name starting "ftp.".

However quite often FTP sites have domain names that don't start with "ftp." but do end in a recognised domain type such as ".com". An extra option allows the program to convert such "weak" or implicit FTP references into FTP links. See Only allow explicit FTP links.

This policy appears on-screen as "Convert FTP references"

Only allow explicit FTP links

This specifies that all "internet" addresses which don't start with "www." or "ftp." should be regarded as FTP sites.

Often FTP sites have domain names that don't start with "ftp." but do end in a recognised domain type such as ".com". For example rtfm.mit.edu is a well known archive.

This policy appears on-screen as "Convert "weak" FTP references"

Create Gopher links

New in version 2.0

This specifies that all gopher addresses that are found should be turned into active hyperlinks.

These will usually start with "gopher://".

This policy appears on-screen as "Convert Gopher references"

Create Telnet links

New in version 2.0

This specifies that all telnet addresses that are found should be turned into active hyperlinks.

These will usually start with "telnet://".

This policy appears on-screen as "Convert Telnet references"

Check domain name syntax

New in version 2.0

This specifies whether or not potential URLs should have their "domain name" checked against the known domain name structures (i.e. ends in .com, .org, .co.uk etc). Having this switched on reduces the likelihood of invalid URLs being turned into clickable links that don't go anywhere. Note, the software doesn't check the domain exists, only that the domain name obeys the known rules.

You might want to switch this off if your document contains URLs that don't use standard domain names (e.g. they are inside an Intranet).
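
A syntax check of this kind can be sketched in Python as below. The suffix list contains only the three examples mentioned above and is an assumption, as is the function name; the program's real list of domain structures is not documented here.

```python
# Only the endings quoted in the text; the real list is assumed to be longer.
KNOWN_SUFFIXES = (".com", ".org", ".co.uk")

def has_known_domain_suffix(host, suffixes=KNOWN_SUFFIXES):
    """Check the shape of a host name, not whether the host actually exists."""
    host = host.lower().rstrip(".")
    # The host must end in a known suffix and have something in front of it.
    return any(host.endswith(s) and len(host) > len(s) for s in suffixes)
```

An intranet host name with no standard ending fails this test, which is the case where switching the policy off helps.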

Create NEWS links

This specifies that AscToRTF should attempt to identify Usenet newsgroup names and turn them into active "news" hyperlinks.

AscToRTF has no way of checking newsgroup names, so by default it will only convert names in recognised hierarchies such as alt., comp., rec. etc.

This policy appears on-screen as "Convert USENET newsgroup references"

Only use known groups

This specifies that when detecting Usenet newsgroup names, AscToRTF should only convert names in recognised hierarchies such as alt., comp., rec. etc. You can get the program to recognize additional hierarchies.

This policy is shown on-screen as "Convert only recognised USENET newsgroups"

Recognised USENET groups

This specifies that when detecting Usenet newsgroup names, AscToRTF should allow "newsgroups" in these hierarchies in addition to the standard hierarchies such as alt., comp., rec. etc.

This policy is shown on-screen as "Additional hierarchies to recognize"
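
How the three newsgroup policies combine can be sketched in Python. The function name and parameters are hypothetical, the built-in list holds only the three hierarchies named above, and the way the program stores additional hierarchies is not specified here.

```python
# Only the hierarchies quoted in the text; the real list is assumed to be longer.
STANDARD_HIERARCHIES = ("alt", "comp", "rec")

def is_recognised_newsgroup(name, extra_hierarchies=(), only_known=True):
    """Decide whether a dotted name would be converted into a news link."""
    parts = name.lower().split(".")
    # A newsgroup name needs at least two non-empty dotted parts.
    if len(parts) < 2 or not all(parts):
        return False
    if not only_known:
        return True
    allowed = set(STANDARD_HIERARCHIES)
    allowed.update(h.lower().rstrip(".") for h in extra_hierarchies)
    return parts[0] in allowed
```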

Convert cross-references to other sections

This specifies whether or not AscToRTF should turn references to section numbers in the main text to hyperlinks to those sections.

This is only possible for numbered sections.

If selected, you should specify the level at which such cross-references should start. A value of "1" will attempt to convert all numbers N, N.N... to hyperlinks. A value of two will attempt to convert N.N, N.N.N... etc.

This policy is quite prone to error (e.g. Windows 3.1 often becomes a hyperlink to section 3.1). Consequently lower values are more error prone. A value of "2" is set by default.

Later versions may address this problem.

This option is saved in the policy file as the "Convert TABLE X-refs to links" and "Cross-refs at level" policies.
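
The effect of the level can be illustrated with a Python sketch. It is not the program's detection logic: the function name find_section_refs is hypothetical, and the sketch simply treats any number with at least the given count of dotted parts as a candidate cross-reference.

```python
import re

def find_section_refs(text, level=2):
    """Return numbers in `text` with at least `level` dotted parts (N, N.N, ...)."""
    if level < 1:
        raise ValueError("level must be 1 or more")
    # level 1 matches N, N.N, ...; level 2 matches N.N, N.N.N, ... and so on.
    pattern = re.compile(r"\b\d+(?:\.\d+){%d,}\b" % (level - 1))
    return [match.group(0) for match in pattern.finditer(text)]
```

At level 2 the phrase "Windows 3.1" still produces the candidate 3.1, which shows why such false matches occur, while at level 1 every bare number in the text becomes a candidate as well.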

Preprocessor policies

These options allow various aspects of the pre-processor to be controlled

Use Preprocessor

Include document section(s)

Use Preprocessor

New in version 2.0

When enabled the pre-processor is activated. You would only ever want to de-activate it to see what difference not processing any pre-processor commands would make.

Include document section(s)

New in version 2.0

This is a comma-separated list of the SECTIONs you want included in your document. This only applies if you've made use of the SECTION command to mark up parts of your document to be conditionally output during the conversion.
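
Reading such a list can be sketched in Python as follows. The function names and the section names used to try it out are hypothetical, and the sketch assumes that names are compared exactly, including case.

```python
def parse_section_list(value):
    """Split a comma-separated policy value into a list of section names."""
    return [name.strip() for name in value.split(",") if name.strip()]

def is_section_included(section, value):
    """Return True if `section` appears in the comma-separated list."""
    # Assumption: an exact, case-sensitive match is required.
    return section in parse_section_list(value)
```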

Font policies

Fonts

Normal text	Default font
Headings	Heading Font
Text in tables	Table font
Table of contents	TOC Font
Fixed-pitch text	Fixed font

Default font

This specifies the default font to be used. It may be edited via a normal Windows Font selection dialog.

Heading font

New in version 2.0

This specifies the default font to be used for headings. The actual headings will be based on this font family, but will be made larger and/or italic according to the level of heading applied to a given heading.

It may be edited via the Font selection dialog.

Table font

New in version 2.0

This specifies the default font to be used inside tables. This will default to the Default Font, but you may want to set it smaller in order to fit wide tables on the page.

It may be edited via the Font selection dialog.

Table of contents Font

New in version 2.0

This specifies the default font used in any generated Table of Contents. The font family specified will be used, but the different levels of heading in the list will be given different sizes and italics, just as in a default Word document.

It may be edited via the Font selection dialog.

This policy is shown on-screen as "TOC font"

Fixed font

New in version 2.0

This specifies the default font to be used for ASCII art and diagrams and other portions of text where the spacing is to be preserved. For this a mono-spaced font such as Courier is usually used. The Font size is also usually set a bit smaller at 8pt. This is to ensure that an 80-character "line" in the original document will fit on a page in the output document.

It may be edited via the Font selection dialog.

The Font Selection Dialog

Each of the font values may be chosen using the font selection dialog. The selected font is shown as a comma-separated list containing :-

The font name
The font characteristics ("Regular", "bold", "italic" or "bold italic")
The font size (in pts)
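
A value in this form can be taken apart with a short Python sketch. The function name parse_font_value is hypothetical, and the sketch assumes that the size is stored as a plain number with no "pt" suffix.

```python
def parse_font_value(value):
    """Split 'name, characteristics, size' into its three parts."""
    # Split from the right so a comma inside the font name would be tolerated.
    name, style, size = [part.strip() for part in value.rsplit(",", 2)]
    return {"name": name, "style": style, "size_pt": float(size)}
```

For example, a fixed font value of "Courier, Regular, 8" would be read as the Courier font, regular weight, at 8 points.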

Link Dictionary Edit Dialog

Other policies

Other policies can be set as follows:

On the Style Definition File selection dialog

Scope for font tags

Scope for font tags

New in version 2.0

When using an external Style Definition File together with FO tags to control the fonts in your document, this policy controls the scope of the font tag introduced by each new FO tag. The options are

Scope to the end of the file. If this is selected, the associated font will apply until the end of the document, or until another FO tag is encountered.

Scope to the next paragraph/table/heading. If this is selected, the associated font will apply until a major new typographical feature is encountered, or a new FO tag is detected.

Scope to the end of the input line. If this is selected, the FO tag will only apply on the rest of the text on the same input line, or until a new FO tag is encountered.
