Documentation for the AscToRTF conversion utility
The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html
Document policies have two main uses: to correct any failure of analysis that AscToRTF makes, and to tell the program how to produce better RTF in ways that couldn't possibly be inferred from the original text.
Examples of the former include specifying a nominal page width, and stating whether or not underlined section headings are expected.
Examples of the latter include adding colour and titles to the page, and requesting that a large document be split into several pages.
Contents of this section
What are Policy files?
Analysis policies
'What to look for' policies
Output policies
General analysis policies
Bullet policies
Contents policies
File Structure policies
Headings policies
Table analysis policies
File generating policies
Document details
Formatting Policies
RTF settings
Make Windows Help file policies
Hyperlinks policies
Preprocessor policies
Font policies
Link Dictionary Edit Dialog
Other policies
AscToRTF has a large number of options available to influence the analysis of your text files, and the output to RTF. These options are called "policies" as they govern how the source file should be interpreted and converted.
Policies may be saved in text files, known as policy files. These files have a ".pol" extension by default. Policy files are usually updated by changing the policies in the program and saving the changes to a new file. Because they are text files, you can also edit them directly in a text editor. Each line of a policy file holds one policy, in the form
PolicyText : <policy value>
The use of policy files allows a given set of options to be saved and reused for other conversions, or for later conversions of the same file. See Using policy files for more information.
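Because the format is one "PolicyText : <policy value>" pair per line, a policy file is easy to read from a script. The following is a minimal Python sketch under that assumption only; `parse_policy_text` is a hypothetical helper (not part of AscToRTF), and "Page width" is an illustrative policy name rather than a confirmed key. Values are collected into lists because policies such as "Bullet Char" and "Definition Char" may appear on several lines.

```python
from collections import defaultdict


def parse_policy_text(text):
    """Map each policy name to the list of values given for it (hypothetical helper)."""
    policies = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # assumption: blank or non-policy lines are simply skipped
        # Split on the first colon only, so a value may itself be ":".
        name, value = line.split(":", 1)
        policies[name.strip()].append(value.strip())
    return dict(policies)


sample = """
Page width : 72
Bullet Char : *
Bullet Char : +
Definition Char : :
"""
print(parse_policy_text(sample))
```

Splitting on the first colon keeps a line such as "Definition Char : :" intact, with ":" as its value.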
Analysis policies are usually calculated by AscToRTF by making a first pass through your document. The resulting policies are then used during the second, conversion pass to categorise all input lines so that they may be correctly converted to RTF.
You should only need to change these policies if the analysis fails.
These policies act as "broad stroke" policies enabling or disabling areas of functionality within the software by telling it what to look for and to try to detect.
For example you can tell the program whether or not to bother looking for patterns of indentation, bullets, or numbered lists. In many cases if you enable a policy you can further fine tune the conversion details on other policy sheets.
- Look for indentation
- Look for paragraphs
- Look for short lines
- Look for horizontal rules
- Look for bullets and numbered lists
- Look for definitions
- Look for quoted lines
- Look for emphasis
- Look for underlined text
- Look for mail and USENET headers
- Look for character encoding
- Look for regions of preformatted text
- Look for diagrams
AscToRTF can attempt to detect the indentation pattern of your document and replicate it in the output file. If you choose to disable this policy, all your text will be output with no indentation at all.
If the program is wrongly indenting your files, you can try adjusting the pattern of indentation on the General Analysis tabbed policy sheet.
By default AscToRTF will attempt to look for paragraphs in your source. Usually this is signaled by a blank line between paragraphs, a leading indent on the first line of each paragraph, or (in extreme cases) a short line at the end of a paragraph.
If you don't want AscToRTF to detect paragraphs, disable this policy.
If AscToRTF is wrongly detecting paragraphs, try adjusting the paragraph analysis policies on the General Analysis tabbed policy sheet.
By default AscToRTF will attempt to detect short lines and preserve their structure by adding a line break. Disabling this will cause short lines to be merged into the surrounding paragraph's text.
If AscToRTF is wrongly handling your short lines, you can adjust the short line cutoff point or the page width (which is used in short line detection) in the Sizes section of the General Analysis tabbed policy sheet.
By default AscToRTF will treat a line consisting of a series of hyphens (minus signs) or equals signs as a horizontal rule. (On occasion such a line may instead be regarded as underlining a heading on the previous line.)
You can disable this if you wish, or you can specify how many "line" characters it takes to make a horizontal rule.
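The rule described above can be approximated as follows. This is a simplified Python sketch, not AscToRTF's actual test: `is_horizontal_rule` and its default `min_chars=3` are hypothetical, and a real converter would also check whether the line underlines the text above it.

```python
# Characters that may form a rule: hyphen/minus and equals, as described above.
RULE_CHARS = {"-", "="}


def is_horizontal_rule(line, min_chars=3):
    """True if the line is a run of at least min_chars identical rule characters."""
    stripped = line.strip()
    return (
        len(stripped) >= min_chars
        and len(set(stripped)) == 1
        and stripped[0] in RULE_CHARS
    )
```

Raising `min_chars` plays the role of the "how many line characters" setting: short runs such as "--" stop being treated as rules.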
By default AscToRTF will try to detect bullet points and numbered lists. This can sometimes go wrong if you have lines that look to the program like bullet points.
You can disable this behaviour should you wish. Alternatively you can fine tune the detection of bullets on the bullet analysis tabbed policy sheet.
By default AscToRTF will try to detect definitions and notes, usually in the form of a single word and a hanging paragraph.
This can often go wrong, so you can use this policy to disable this feature.
By default AscToRTF will try to identify "quoted" lines. Quoted lines are lines that have had a single character (often ">" or "!") inserted at the start. This is common practice when quoting email in a reply. AscToRTF places such text in italics.
You can disable this behaviour should you wish.
New in version 2.0
AscToRTF will try to look for text that has been marked up with underscores and asterisks to signify bold and italic text. For example,
This is *bold* and this is _italic_
becomes
This is bold and this is italic
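As an illustration of this kind of conversion, the Python sketch below turns *asterisk* runs into RTF bold groups and _underscore_ runs into italic groups. It assumes asterisks mean bold and underscores mean italic, and that markers must not touch other word characters (so names like snake_case_name are left alone). `emphasis_to_rtf` is a hypothetical helper, not AscToRTF's implementation.

```python
import re

# Markers must not be glued to word characters on the outside.
BOLD = re.compile(r"(?<!\w)\*(\S[^*]*?)\*(?!\w)")
ITALIC = re.compile(r"(?<!\w)_(\S[^_]*?)_(?!\w)")


def escape_rtf(text):
    # Backslash and braces are special in RTF, so escape them first.
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def emphasis_to_rtf(line):
    line = escape_rtf(line)
    # {\b ...} and {\i ...} are RTF groups that switch bold/italic on locally.
    line = BOLD.sub(lambda m: "{\\b " + m.group(1) + "}", line)
    line = ITALIC.sub(lambda m: "{\\i " + m.group(1) + "}", line)
    return line


print(emphasis_to_rtf("This is *bold* and this is _italic_"))
# This is {\b bold} and this is {\i italic}
```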
New in version 2.0
AscToRTF will try to detect where a line of text has been "underlined" by following it with a row of dashes, hyphens, equals signs etc. of the same length. This text will then be regarded as a candidate for an underlined heading or, if those are not allowed, underlined text.
If you have tables and reports, you may want to switch this policy off since the line at the end of a table may appear to under- or over-line the last line of text in the table.
AscToRTF will try to look for email and USENET headers. Where these are recognised they can be simplified so that only the To, From and Subject lines are shown in the output.
You can disable this behaviour should you wish.
Specifies whether or not the software should attempt to detect alternative character sets, such as those used for Greek, Turkish, Chinese etc.
The software does this by doing a statistical analysis on the characters used in the source file. This process isn't perfect, and when it fails you will need to manually set the correct character set using the Character encoding policy.
If you find the program is wrongly detecting the character encoding, disable this policy and/or manually set it using the Character encoding policy.
Note: Not all character sets are supported by RTF.
By default AscToRTF will try to identify regions of preformatted text. Once identified AscToRTF will try to decide if it's a diagram, table or some other form of preformatted text. If it thinks it's a table it will attempt to place the text in an appropriate table structure.
You can disable the search for preformatted text, or if you allow preformatted text, disable table generation. (This may be appropriate if you have a large number of ASCII diagrams in your text).
The search for preformatted text can be refined via the Pre-formatted text and Table analysis tabbed policy sheets.
The output of tables can be fine-tuned via the Formatting tabbed policy sheet of the output policies.
Specifies whether or not regions of preformatted text that are detected should be considered as candidate diagrams. Text that contains large numbers of characters such as "|", "-", ">" and "<" may be considered to be an ASCII diagram.
If you find the program is wrongly treating tables as diagrams then disable this policy.
These policies aid AscToRTF's analysis by describing in detail the contents of the document being converted.
Sizes
Paragraphs
Definitions
Layout
This indicates the width (in characters) of your nominal output page. This width is calculated from the observed line lengths in the original document.
This width is used in short line calculation, and determining whether a given line contains a definition term or not (definition character near the start of the line).
In documents that contain line feeds this should be automatically detected.
In other documents you may need to set this manually.
This indicates the size (in characters) of your tabs. AscToRTF converts all tabs to spaces before analysis. By default a tab size of 8 characters is assumed.
The tab size can influence the analysis of paragraph indentations and other layout. Provided they are used consistently there shouldn't be a problem. However where tabs and spaces are used in combination, mistakes can arise.
This is particularly true in tables of data. AscToRTF does not expect tab-separated table cells, instead converting the tabs to spaces and analysing the results.
If your source document was created with an editor that uses a different tab size, you should change this value if you start to experience strange layout conversion problems.
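The effect of the tab size on indentation can be seen with Python's built-in `str.expandtabs`. The sketch below (with a hypothetical `indent_column` helper) shows two lines that mix tabs and spaces: they land on the same column with 8-character tabs but on different columns with 4-character tabs, which is exactly the kind of difference that can change how indentation levels are analysed.

```python
def indent_column(line, tab_size=8):
    """Column at which the text starts once tabs are expanded to spaces."""
    expanded = line.expandtabs(tab_size)
    return len(expanded) - len(expanded.lstrip(" "))


a = "\tText"       # one tab
b = "    \tText"   # four spaces, then a tab
print(indent_column(a, 8), indent_column(b, 8))  # 8 8
print(indent_column(a, 4), indent_column(b, 4))  # 4 8
```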
This policy is used to determine what is a "short line". AscToRTF treats short lines specially by adding a paragraph marker at the end. They can also be used to detect ends of paragraphs in those documents that don't have blank lines between paragraphs.
Normally AscToRTF will determine whether or not a line is short by comparing it to the page width, given the current context.
The default value is 0 characters (indicating a comparison to Page Width should be used). Set this to any value you like. A value of 80 is likely to make every line in your original document have a paragraph marker on the end.
This policy tells AscToRTF what the smallest chapter size may be. This is used when trying to determine if a numbered line is a chapter heading. AscToRTF tries to avoid treating numbered lists as a series of small chapters using this policy.
The default value is 8 lines. Change this only if you suspect small chapters are being ignored, or large list items are being treated as chapter headings.
AscToRTF can detect whether or not it should expect blank lines between paragraphs. Documents without blank lines between paragraphs will be harder to convert, and errors are more likely. Unfortunately text documents exported from Word for Windows often have this property.
Where there are no blank lines, AscToRTF relies on spotting the last line of a paragraph (usually shorter), and (in some documents) the presence of a hanging indent at the start of each new paragraph.
This should be automatically detected.
Some documents start the first line of a new paragraph with an offset of a number of characters. This is especially true in text files saved from Word for Windows documents.
AscToRTF can sometimes mistake such paragraphs for two different levels of indentation. Use this policy to eliminate such confusion.
This should be automatically detected.
This policy can be used to disable the search for definitions. Sometimes this leads to unexpected results with text that is not part of a definition being treated as such. In such cases you can adjust the definition policies, but if this still fails, use this to disable the search completely.
See also one-line definitions and definition paragraphs.
This policy identifies the indentations used for the follow-on text in definition paragraphs. These indentation levels need not be the same as the indentation levels used for normal text, though of course often they are.
This should be detected automatically, but if your document has only a few examples it's possible AscToRTF will ignore them. In such cases you may need to set this policy manually.
Note, this policy appears on-screen as "Definition paragraph indent levels"
This policy specifies whether or not hyphen (-) characters are used in one-line definitions.
If the hyphen character only occurs in definitions, then set the adjacent "always" flag, otherwise AscToRTF will have to guess whether a particular character is part of a definition or not. This is sometimes a source of conversion errors.
If this policy is selected, it will result in a suitable "Definition Char" line being added to the policy file.
This should be detected automatically.
This policy specifies whether or not colon (:) characters are used in one-line definitions.
If the colon character only occurs in definitions, then set the adjacent "always" flag, otherwise AscToRTF will have to guess whether a particular character is part of a definition or not. This is sometimes a source of conversion errors.
If this policy is selected, it will result in a suitable "Definition Char" line being added to the policy file.
This should be detected automatically.
This policy specifies which other characters are used in one-line definitions.
This may be detected automatically, but more likely you'll need to specify it yourself.
Each character selected as a potential delimiter will result in a "Definition Char" line being added to the policy file.
AscToRTF recognises multiple levels of indentation. This policy shows the character levels at which indentation has been detected.
AscToRTF converts all tab characters into multiple spaces in input. These indentation positions are the positions that result after that conversion. Depending on your tab settings these might not be exactly the positions you would expect.
Normally these levels are correctly detected automatically, but should you wish to set them manually you may need to experiment slightly to see how AscToRTF has handled your tabs.
AscToRTF should be able to detect the use of bullets on a reasonably sized document. These policies describe the type of bullets expected.
Expected Bullet types
Bullet characters
This policy states whether or not the program should attempt to automatically detect bullets and numbered lists. This should normally be left on unless your document has no such features, but the program (wrongly) thinks it has.
This policy appears on the Bullets dialog as "Automatically detect bullets and numbered lists", but is identical to the "Look for bullets" policy on the 'What to look for' policies tabbed property sheet.
This policy states whether or not numbered bullet points are expected. The numbered bullets can be followed by any punctuation, thus 1., 2) and (3) will all be recognised, but RTF will not necessarily support this in the markup produced.
This should be automatically detected.
This policy states whether or not alphabetic bullet points are expected. The alphabetic bullets can be followed by any punctuation, thus a., b) and (c) will all be recognised, but RTF will not necessarily support this in the markup produced.
Both upper and lower case bullets are recognised (and supported in the markup).
This should be automatically detected.
This policy states whether or not roman numeral bullet points are expected. The roman numeral bullets can be followed by any punctuation, thus i., ii) and (iii) will all be recognised, but RTF will not necessarily support this in the markup produced.
Both upper and lower case bullets are recognised (and supported in the markup), although the range of roman numeral values supported is limited.
This should be automatically detected.
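The three label styles above can be told apart with a couple of regular expressions. The following Python sketch is a hypothetical illustration, not AscToRTF's detector: it accepts an optional "(" before the label and "." or ")" after it, limits roman numerals to values up to 39, and has to make an arbitrary choice for ambiguous labels such as "i" (here treated as roman).

```python
import re

# One label, optionally preceded by "(" and followed by "." or ")".
BULLET_RE = re.compile(r"^\s*\(?([0-9]+|[ivx]+|[IVX]+|[a-zA-Z])[.)]\s+")
# Roman numerals up to 39 only, as a stand-in for the "limited range".
ROMAN_RE = re.compile(r"^x{0,3}(ix|iv|v?i{0,3})$", re.IGNORECASE)


def classify_bullet(line):
    """Return "numbered", "roman", "alphabetic" or None (hypothetical helper)."""
    match = BULLET_RE.match(line)
    if not match:
        return None
    label = match.group(1)
    if label.isdigit():
        return "numbered"
    if ROMAN_RE.match(label):
        return "roman"  # ambiguous: "i", "v" and "x" are also single letters
    if len(label) == 1:
        return "alphabetic"
    return None
```

Requiring whitespace after the punctuation keeps lines such as "1.5 litres" from being mistaken for numbered bullets.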
This policy states whether or not bullet points starting with the hyphen character '-' are expected.
This policy appears on-screen as "Recognize hyphen character as a bullet point"
This should be automatically detected.
This policy states whether or not bullet points starting with the lower case 'o' are expected.
This policy appears on-screen as "Recognize 'o' character as a bullet point"
This should be automatically detected.
This policy lists any other characters that are to be recognised as bullet characters.
Each bullet character entered will appear in the policy file as its own "Bullet Char" line.
This should be automatically detected, but may sometimes need to be manually entered.
This dialog shows both analysis and output policies connected with contents list detection and generation.
Analysis
This policy specifies whether or not the document already contains a contents list. If it does, AscToRTF will attempt to convert the existing list into a series of hyperlinks.
This should be detected automatically, but occasionally you will need to set this policy manually.
See the discussion on contents list generation in the Documentation available
These policies aid AscToRTF's analysis by describing some of the file structure that would affect the analysis.
Expected File contents
Text Attributes
Text to ignore
AscToRTF puts a lot of effort into detecting overall structure such as headings etc.
In documents that don't have any such structure, AscToRTF is liable to convert any line with a number at the start into a heading.
To prevent this, you can mark the document as simple, that is with no global structure. In a simple document AscToRTF will attempt far less analysis.
This policy appears on-screen as "Expect only a simple layout".
AscToRTF attempts to automatically identify simple documents, but you may still need to set this policy manually.
AscToRTF can mark up C-like code fragments in <PRE>...</PRE> tags to preserve the layout and readability of the quoted code.
This may be automatically detected, but occasionally needs to be manually corrected.
AscToRTF can convert files that use the DOS (OEM) character set. By default the file is assumed to be in the ANSI character set, but some files may have originated under DOS.
This may be automatically detected, but usually needs to be manually set.
New in version 2.0
Indicates that the input file contains PCL printer codes. When set, the program will make whatever sensible use it can of these codes, otherwise they will be removed.
Please note that the PCL printer codes offer a rich command language that may be used to drive graphical printers. As such the emulation possibilities in a text converter are limited, and it is quite likely that files that make heavy use of such codes will fail dramatically to convert.
That said, those codes that are not recognised will be eliminated from the output.
*** not implemented yet ***
Files using non-ASCII character sets (Japanese, Korean etc.) will be incorrectly converted. This may be fixed (as far as possible) in later versions.
Appears on-screen as "Contains non-European (e.g. Japanese) characters"
AscToRTF can convert mime-encoded quotable characters. These will usually appear in files that were originally part of an email message. Such files use the "=" character to escape special characters. So for example "=20" should be interpreted as a space.
This appears on-screen as "Contains mime-encoded quotable characters"
This may be automatically detected in files where the "=" is used to break up long lines, but more usually you will need to manually set this.
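The "=" escapes described here follow the MIME quoted-printable encoding, which Python's standard `quopri` module can decode. In the sketch below, `decode_quotable` is a hypothetical wrapper, and decoding the resulting bytes as cp1252 (Windows "ANSI") is an assumption; the right character set depends on the original message. Note that a trailing "=" at the end of a line is a soft line break and is removed, rejoining the long line.

```python
import quopri


def decode_quotable(raw: bytes, charset: str = "cp1252") -> str:
    """Decode quoted-printable bytes, then interpret them in the given charset."""
    return quopri.decodestring(raw).decode(charset)


print(decode_quotable(b"Hello=20world"))            # Hello world
print(decode_quotable(b"a long line=\ncontinued"))  # a long linecontinued
```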
AscToRTF can strip out change bars in documents that contain them. Change bars are usually a vertical bar '|' placed in the leftmost or rightmost column.
Currently this is not automatically detected, and so will need to be manually switched on.
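A simple way to strip change bars is sketched below in Python. It is a hypothetical illustration assuming the bar is "|" in the very first column or at the end of the line; a leading bar is replaced by a space so the indentation of the remaining text is unchanged. Because tables drawn with "|" borders would also lose their outer bars, this is a policy to enable only for documents that really contain change bars.

```python
def strip_change_bars(lines, bar="|"):
    """Remove change bars from the leftmost or rightmost column (hypothetical helper)."""
    out = []
    for line in lines:
        if line.startswith(bar):
            line = " " + line[1:]  # keep column positions intact
        stripped = line.rstrip()
        if stripped.endswith(bar):
            line = stripped[:-1].rstrip()
        out.append(line)
    return out
```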
AscToRTF has a limited ability to remove page markers. These are normally a few lines following a form feed (FF) character, containing page numbers etc. This will commonly occur with files generated from older software packages.
The number of lines after each form feed (FF) that should be ignored. These lines will not be copied to the output.
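The following Python sketch shows the idea behind this setting. It is hypothetical and assumes that each form feed starts a new page, and that the first N lines of each new page (counting whatever follows the FF on its own line) are the page marker to discard.

```python
def strip_page_markers(text, lines_after_ff=2):
    """Drop the first lines_after_ff lines after every form feed (hypothetical helper)."""
    pages = text.split("\f")
    kept = [pages[0]]  # text before the first form feed is left alone
    for page in pages[1:]:
        lines = page.splitlines(keepends=True)
        kept.append("".join(lines[lines_after_ff:]))
    return "".join(kept)


print(repr(strip_page_markers("a\nb\n\fPage 2\n\nc\n", 2)))  # 'a\nb\nc\n'
```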
AscToRTF recognises documents that are left justified (default), right justified, centred or both left and right justified (confusingly known as "justified").
The program cannot currently mark up the text in a matching style, but this policy is important in the analysis. For example, "justified" documents are padded with extra white space which could be interpreted as pre-formatted text if the document were not recognised as being justified.
Normally this policy is correctly detected automatically.
AscToRTF will normally treat a blank line as a break between paragraphs. Some files have extra CR/LF characters (usually if they've come from a different computer, or from a printer package). In such cases AscToRTF will see every second line as blank, and this will affect the analysis, usually by turning each line of data into a separate paragraph.
If you have such a file, use this policy to mark the file as double spaced to get better results.
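One simple way to undo double spacing is sketched below in Python. It assumes the extra blank line always follows each real line, so in a double-spaced file every odd-numbered line (counting from 0) is blank; `undouble` is a hypothetical helper. Genuine paragraph breaks survive because they show up as blank lines at even positions.

```python
def undouble(lines):
    """Drop the interleaved blank lines of a double-spaced file (hypothetical helper)."""
    odd = lines[1::2]
    if odd and all(not line.strip() for line in odd):
        return lines[0::2]
    return lines  # not double spaced: leave unchanged
```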
New in version 2.0
This specifies how many lines from the input file should be ignored at the start of the file. These lines will be discarded from the output.
This can be useful when converting files copied from a news feed or similar source that adds a small data header to the file.
New in version 2.0
This specifies how many lines from the input file should be ignored at the end of the file. Up to 40 lines may be ignored in this way. These lines will be discarded from the output.
This can be useful when converting files copied from a news feed or similar source that adds a small data footer to the file.
These policies determine the headings structure that the document is expected to have. Normally these are calculated correctly by AscToRTF, but due to the complexity of heading detection, you may sometimes need to correct the analysis.
At the top of the dialog you can specify what type of headings you expect to see. Any combination is allowed, although usually documents use just one type of heading.
If numbered headings are expected, it may be possible to expect headings at multiple levels, and to also expect a contents list. Each level of heading will have its own set of policies which are shown on this dialog. The policies are shown in text form, but are edited via the heading details dialog.
Note: This area of functionality is continually under review.
See also the discussion in detecting headings and section titles.
This policy specifies whether or not numbered headings are expected in the document.
Numbered headings may be found at multiple levels, and their details may be edited via the heading details dialog.
This should be calculated correctly by AscToRTF, but it is prone to error, getting confused by numbered bullets and the like. In such cases you may need to set this policy manually.
This policy specifies whether or not underlined headings are expected. Note, where the headings themselves are numbered, the underlining will be taken into account, and you should set the expect numbered headings policy instead.
AscToRTF uses the character in the underlining to determine the heading level, thus text underlined with equals signs is given prominence over text with single underline characters such as minus signs, tildes or underscores.
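The ranking of underline characters can be illustrated with a small lookup table. The Python sketch below is hypothetical: it assumes "=" gives level 1 and "-", "~" and "_" give level 2, and that the underline must be exactly as long as the text above it, as described under "Look for underlined text". The real program may allow some tolerance in length.

```python
# Hypothetical mapping: "=" outranks single-line characters "-", "~" and "_".
UNDERLINE_LEVELS = {"=": 1, "-": 2, "~": 2, "_": 2}


def underline_level(text, underline):
    """Return a heading level if underline underlines text, else None."""
    title, rule = text.strip(), underline.strip()
    if not title or not rule or len(set(rule)) != 1:
        return None
    if rule[0] not in UNDERLINE_LEVELS:
        return None
    # Assumption: the underline must be exactly as long as the text.
    if len(rule) != len(title):
        return None
    return UNDERLINE_LEVELS[rule[0]]
```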
This policy specifies whether or not CAPITALISED headings are expected. Note, where the headings themselves are numbered, this policy need not be set, and you should set the expect numbered headings policy instead.
New in version 2.0
This policy specifies whether or not "embedded" headings are expected, i.e. the heading is "embedded" in the first paragraph. Such headings are expected to be a complete sentence or phrase in UPPER CASE at the start of a paragraph.
At present such headings are not auto-detected, so you need to switch this policy on manually.
New in version 2.0
If specified, then any line that begins with one of the key phrases will be regarded as a heading. The syntax is
<details>, <details>...
where each set of details is
<details> = <phrases>, [<heading_level>]
and
<phrases> = <phrase_1> [|<phrase_2>]
That is, each set of <details> can optionally specify a <heading_level>. If omitted this will default to 1,2,3 for the first, second, third set of details etc. Note, this is a logical heading level, and will be apparent in the contents list.
Each set of <details> must supply a set of <phrases>, and each set of phrases must have at least one phrase, with extra phrases added if wanted, separated by vertical bars.
So for example
Part, Chapter, Section
would treat lines beginning with the words "Part", "Chapter" and "Section" as level 1,2, and 3 headings.
The key phrases are case-sensitive in order to reduce the likelihood of false matches with lines that just happen to have these phrases at the start of the line. So
PART|Part, Chapter, Section
would allow either "PART" or "Part" to be matched.
"PART|Part,1" , "Chapter,2" , "Section,2"
would make lines beginning with "Part" level-1 headings, while both "Chapter" and "Section" would become level 2. This would be the same as
"PART|Part,1" , "Chapter|Section,2"
Note, spaces may form part of a match phrase, but because of their use in the syntax, commas and vertical bars may not.
If false matches occur, (e.g. the word "Part" appears in the body of the text) edit the source text so that the offending word is no longer at the start of the line.
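To make the syntax concrete, here is a Python sketch that parses both forms shown above and applies the case-sensitive "starts with" test. It is an illustration under stated assumptions, not AscToRTF's parser: `parse_heading_phrases` and `heading_level` are hypothetical, quoted details are split into phrases and an optional level at their last comma, and leading and trailing spaces around each phrase are trimmed (spaces inside a phrase are kept).

```python
import re


def parse_heading_phrases(spec):
    """Return a list of (phrases, level) pairs from a key-phrase specification."""
    quoted = re.findall(r'"([^"]*)"', spec)
    items = quoted if quoted else spec.split(",")
    rules = []
    for position, item in enumerate(items, start=1):
        if quoted and "," in item:
            phrases, level = item.rsplit(",", 1)
            level = int(level)
        else:
            phrases, level = item, position  # default level: 1, 2, 3...
        rules.append(([p.strip() for p in phrases.split("|")], level))
    return rules


def heading_level(line, rules):
    """Case-sensitive match of a line's start against the key phrases."""
    for phrases, level in rules:
        if any(line.startswith(p) for p in phrases):
            return level
    return None


rules = parse_heading_phrases('"PART|Part,1" , "Chapter|Section,2"')
print(heading_level("Chapter 3: Results", rules))  # 2
```

A plain prefix test like this would also match a line starting with "Particle", which is one way the false matches mentioned above can arise.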
New in version 2.0
When this option is selected, the first line in the document will be treated as a heading. This can be a useful option to select when the first line of your document is a document title line, but doesn't conform to the headings style used in the rest of the document.
See also use first line as title
New in version 2.0
When this option is selected, the first heading in the document is centred. This may be an appropriate choice when the first heading is in fact to be treated as a document title.
See also use first line as heading
The program performs a number of consistency checks when detecting headings. Amongst these is a check that all headings of the same type occur at the same indentation. This check can help distinguish between numbered headings and numbered lists.
However, if you have numbered headings at different indentations (e.g. because they are centred on the page), then this check will cause them to be rejected as headings. In such cases you can manually disable this check.
This policy appears on-screen as "Check indentations of headings are consistent"
This dialog is reached through one of the edit buttons on the main Headings Policies dialog. This allows you to edit details of a particular type or level of heading.
Position of section number on the line
Section number formatting
Bracketing
AscToRTF uses checks on indentation levels to reject numbered lines that could be confused with headings.
This is the indentation level (in characters) at which headings of this type are expected to be found.
Some documents put words like "chapter", "subject" and "section" in front of the section number. These are known as prefix words.
This is the numbering scheme expected for headings at this level. At present AscToRTF can't cope with mixed types like "II-2.b".
This may be addressed in later versions.
This shows the separator expected between parts of the heading number.
*** Not currently supported ***
This shows whether we expect trailing letters after the section number, as in "1.1b".
*** Not currently supported ***
This shows what bracket characters (if any) we expect before and after the section number as in "[2.2]" or "3.2.1)".
*** Not currently supported ***
These policies specify how AscToRTF detects pre-formatted text.
Detecting pre-formatted regions
See the section on pre-formatted text for more details.
This policy specifies the minimum number of consecutive pre-formatted lines that must be detected before the text is placed in fixed width font.
AscToRTF detects heavily formatted lines, and then looks at their neighbours to see if they too could be part of a pre-formatted text.
Once a group of lines is identified, it will only be marked up as pre-formatted if the minimum is exceeded.
The default value is 0. Set this value larger if AscToRTF is marking text as pre-formatted when it shouldn't do.
These policies specify how AscToRTF detects possible tables and analyses the data in them into columns and rows.
Detection
Analysing rows
Analysing columns
See the section on pre-formatted text for more details.
This policy specifies whether or not you want RTF table generation attempted for regions of apparently pre-formatted text. AscToRTF will attempt to analyse such regions, preferring to fit them into an RTF table. However, if this is not possible, or if AscToRTF decides the pre-formatted region is something else (like a diagram or a piece of code), then an RTF table will not be generated.
Disabling this policy tells AscToRTF not to attempt this analysis, usually leading to pre-formatted text being placed in simple fixed width font markup instead.
When the program encounters a strongly formatted line, it examines the adjacent lines to see if they too could form part of the same preformatted region.
This policy specifies the extent to which strongly preformatted lines should be used to "extend" a preformatted region to include adjacent lines. If set to 10, then all adjacent lines up to the next page break or section heading will be treated as part of the same region. When set to 1, only those lines that are clearly heavily formatted themselves will be included.
This policy appears on-screen as "Extend preformatted regions"
New in version 2.0
This option specifies whether or not tables are expected to have blank lines between rows. If they are, the software will be more likely to merge the text for adjacent source lines into a single row in the output table.
New in version 2.0
This option allows you to specify the default table layout for all tables in the document. The layout specifies the number of columns and their end positions.
This is the default layout and will normally be applied to all tables in the document. If a document has multiple tables you are better off either using the preprocessor to mark up the source text and supplying TABLE_LAYOUT commands, or supplying a "Layout" component in a Table Definition File.
The format of the Table Layout policy is the same as that described in the discussion of the TABLE_LAYOUT pre-processor command.
See also TDF line: Layout
This policy is used to tell AscToRTF that you expect your tables to be quite sparse in places. This can affect AscToRTF's analysis, as the algorithms are liable to merge "empty" columns with their less empty neighbours.
Enabling this policy will usually result in your tables having more, emptier, columns.
See also the Pre-processor command: TABLE_MAY_BE_SPARSE.
This policy specifies that the table header should be ignored when analysing the column structure of the table.
In some tables (usually "reports") the header can be quite complex, with titles spanning multiple columns, whereas the body of the table is much more structured.
In such cases including the table header in the analysis can lead to errors, so enabling this policy can simplify the analysis giving better chances of success.
This policy appears on-screen as "Ignore table header when analysing columns"
Once the program has detected the column layout of a table, it reviews how well the data can be fitted into these columns. If too many cells in a column are empty, or if too many cells "span" multiple columns, then the columns are deemed to be "poor", and may be merged together to form fewer, wider columns.
This factor determines the extent to which columns should be merged. A value of 10 means columns should be merged together whenever there is any doubt. Use this if you are getting too many columns. A value of 1 means columns should never be merged. Use this if you are getting too few columns.
This policy appears on-screen as "Merge together "poor" columns".
Note, this policy can't guarantee you will get the correct column structure, but it does give you a chance to influence the logic.
This policy specifies the minimum number of spaces that should be interpreted as a gap between columns in a potential table. The default value is 1, but this value can sometimes lead to too many columns, especially in small tables. Larger values may lead to columns being merged together.
This policy appears on-screen as "Minimum number of spaces between table columns"
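The effect of this minimum can be seen in a simplified column finder. The Python sketch below is a hypothetical illustration, not AscToRTF's algorithm (which also merges "poor" columns and handles spanning cells): it treats a character position as blank when it is a space in every line of the table, counts a run of blank positions as a column gap only if it is at least `min_gap` wide, and returns the (start, end) spans of the remaining columns.

```python
def column_spans(lines, min_gap=1):
    """Return (start, end) spans of columns separated by gaps of >= min_gap spaces."""
    width = max((len(line) for line in lines), default=0)
    padded = [line.ljust(width) for line in lines]
    blank = [all(row[i] == " " for row in padded) for i in range(width)]

    # Mark positions that belong to a wide enough run of blank positions.
    separator = [False] * width
    i = 0
    while i < width:
        if not blank[i]:
            i += 1
            continue
        j = i
        while j < width and blank[j]:
            j += 1
        if j - i >= min_gap:
            separator[i:j] = [True] * (j - i)
        i = j

    # Everything between separators is a column.
    spans, i = [], 0
    while i < width:
        if separator[i]:
            i += 1
            continue
        j = i
        while j < width and not separator[j]:
            j += 1
        spans.append((i, j))
        i = j
    return spans


table = ["ab  cd ef", "gh  ij kl"]
print(column_spans(table, min_gap=1))  # [(0, 2), (4, 6), (7, 9)]
print(column_spans(table, min_gap=2))  # [(0, 2), (4, 9)]
```

Raising the minimum from 1 to 2 merges the single-space-separated "cd" and "ef" columns, matching the note that larger values may lead to columns being merged together.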
These policies are used to control the output to RTF. Generally these policies allow you to decide how the resulting RTF should look in a manner that cannot be inferred from the original document.
Line and file structures
Diagnostics Files
This policy can be used to place the whole file inside <PRE>...</PRE> markup. This will use a monospaced font that preserves the line structure and the relative spacing of characters.
When this is enabled almost all of the program's other conversions will be disabled. You should only really use this if your document has a lot of formatting that the program is failing to understand.
This policy needs to be set manually where wanted.
This policy specifies that the line structure of the original document should be preserved, rather than just the paragraph structure.
If enabled the lines in the output document will match those of the original document, and the text will not automatically be adjusted if you widen your window. On large monitors this will give the text an "A4" look and feel.
This policy needs to be set manually where wanted.
Some files do not break large paragraphs into smaller lines, but instead place the whole paragraph on a single line. This is especially true if the source file was created by a text editor that relied on word wrap (such as Notepad or Word).
These files often have no blank lines between paragraphs, which makes detecting where paragraphs begin and end more difficult.
In such files this policy can be enabled so that each "line of text" in the source file will be treated as a separate paragraph.
This policy cannot be automatically detected, and so needs to be set manually where wanted.
This policy allows you to specify the generation of some diagnostics files. AscToRTF will generate 3 files with the following extensions:
.lis1 | A line-by-line summary of how AscToRTF analysed the source file during the analysis pass |
.lis | A line-by-line summary of how AscToRTF analysed the source file during the output pass |
.stats | A statistics file |
The .lis file gives the best description of how the source file has been converted. The differences between the .lis1 and .lis files can be slight, and are due to the fact that more rigorous attention is applied to the policies on the output pass.
Any error messages generated during the conversion are inserted into the .lis file at the offending line. This will help you determine how relevant they are.
This policy appears on-screen as "Generate log files"
This policy allows you to generate a policy file containing all the policies used during the output pass. This will help you understand how AscToRTF has interpreted your document, and may help in determining where the analysis has gone wrong and needs correcting.
Note, this file will contain all the policies used, and as such is probably not suited for use as an input policy file.
This option is equivalent to the policies "Output policy file" and "Output policy filename".
See the discussion in the Documentation available
These document details are placed in the "information" section of the generated RTF document. Depending on your RTF client these details may or may not be visible; for example, in Word for Windows you can view them under the File -> Properties menu.
On some systems search programs can use this information to help locate your documents. The details that can be set include:
- Title. You have three options
- use first line as title
- use first heading as title
- set a default title
- Subject
- Author
- Manager
- Company Name
- Category
- Keywords
- Comments
New in version 2.0
When this option is selected, the first line in the document will be treated as the document title, and will be copied across to the document properties part of the output file. This can be a useful option to select when the first line of your document is a document title line.
If you also want the first line to appear in the output as a heading, select the use first line as heading option.
New in version 2.0
When this option is selected, the first heading detected in the document will also be used as the document's title, and copied across to the properties section of the output document. Note, this relies on the program correctly detecting headings, and in particular the first heading. If the first heading is also the first line, you may instead want to consider using the use first line as heading and/or use first heading as title policies.
This is the document title to be copied across into the properties section of the output document. The default value is
Converted from [[filename]]
where [[filename]] gets replaced by the original filename (see Pre-processor command: filename).
See also:
New in version 2.0
These options offer settings that allow you to control some of the formatting applied to the RTF output. Options exist for each of the following areas :-
Automatic centring
These options control the handling of any detected centred text.
Paragraphs
These options control the formatting of paragraphs.
Headings
These options control the formatting of headings.
Bullets
These options control the formatting of bullets and lists.
Tables
These options control the formatting of tables.
Miscellaneous formatting options
New in version 2.0
When enabled the software will attempt centred text detection.
This policy appears on-screen as "Enable automatic centring"
New in version 2.0
When centred text detection is enabled, this specifies how far off-centre text can be and still be considered centred. Text is compared to the page width, taking into account any left-hand indentation.
If you make this value larger, more text will be considered to be centred and will be centred in the output, although only blocks of text that are wholly centred (all lines fall within the specified tolerance) will be regarded as centred text in the output.
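The tolerance test can be pictured as comparing the gap on each side of a line. This Python sketch assumes the page width, indentation and tolerance are all measured in characters, which the policy description does not state; `is_centred` and `block_is_centred` are hypothetical names, not part of AscToRTF:

```python
def is_centred(line: str, page_width: int, left_indent: int = 0,
               tolerance: int = 2) -> bool:
    """Return True if `line` sits within `tolerance` characters of the centre.

    The usable width runs from `left_indent` to `page_width`; the gap to the
    left of the text is compared with the gap to the right of it.
    """
    text = line.rstrip()
    stripped = text.lstrip()
    if not stripped:
        return False  # blank lines carry no centring information
    left_gap = len(text) - len(stripped) - left_indent
    right_gap = page_width - len(text)
    return abs(left_gap - right_gap) <= tolerance


def block_is_centred(lines: list[str], page_width: int, left_indent: int = 0,
                     tolerance: int = 2) -> bool:
    """A block counts as centred only if every line is within the tolerance."""
    return all(is_centred(l, page_width, left_indent, tolerance) for l in lines)
```

Raising `tolerance` lets more lines pass, but a single line outside the tolerance still stops its whole block being treated as centred.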
New in version 2.0
When enabled, any first-line indentation detected for paragraphs will be preserved. Often paragraphs indent the first line by a few spaces. Where the software detects this you have the choice as to whether an indentation should be preserved in the output.
This policy appears on-screen as "Preserve first line indentation"
See also First line indentation (in blocks)
New in version 2.0
When set to a non-zero value, paragraphs will be set so that the first line in each paragraph is indented relative to the remainder of the paragraph. The indentation is set as a number of tab stop positions.
See also Preserve new paragraph offset
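In RTF, a first-line indent is written with the `\fi` control word, measured in twips (1/1440 inch). Assuming each tab stop is the RTF default width of 720 twips (half an inch), the conversion from this policy's value might look like the Python sketch below; AscToRTF's actual tab width and markup may differ, and `first_line_indent` is a hypothetical name:

```python
# Assumption: one tab stop = RTF's default tab width (\deftab), 720 twips = 0.5 inch.
TWIPS_PER_TAB = 720


def first_line_indent(tab_stops: int, tab_width: int = TWIPS_PER_TAB) -> str:
    """Return an RTF \\fi control word indenting the first line by N tab stops."""
    return "\\fi%d" % (tab_stops * tab_width)


print(first_line_indent(2))  # \fi1440 (one inch)
```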
New in version 2.0
When enabled, any detected underlined headings will be underlined in the output. Headings may be underlined in the source text to make it clear what they are. When detected by the program these are output using heading styles, which tend to make the text bigger and bolder. That being the case, you may want to remove the underlining of such headings by disabling this policy.
New in version 2.0
By default the software will replace bullets in the original text by bullet point characters in the output document. However this isn't always ideal, especially if the text is to be re-exported or emailed to various computer systems. When this policy is enabled the bullet text is left unchanged.
Note, you can choose the bullet text that is used via the policy characters to use for bullets
New in version 2.0
By default the software will replace bullets in the original text by bullet point characters in the output document. However this isn't always ideal, especially if the text is to be re-exported or emailed to various computer systems. You can use the policy use original bullet text to preserve the originals, or you can use this policy to choose your own, alternative, text markers.
The value should be a string whose first character will be used for level 1 bullets, the second for level 2, and so on. All levels deeper than the last character will use the last character supplied. So, for example, the value
+o-
would use '+' for level 1 bullets, 'o' for level 2 bullets and '-' for level 3 and beyond.
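The lookup rule for this policy value can be expressed in a few lines. This is a Python sketch of the rule as described above, with a hypothetical function name, not code taken from AscToRTF:

```python
def bullet_for_level(chars: str, level: int) -> str:
    """Pick the bullet character for a 1-based nesting level.

    Level 1 uses chars[0], level 2 uses chars[1], and so on; any level
    deeper than the string reuses its last character.
    """
    if not chars:
        raise ValueError("bullet character string must not be empty")
    if level < 1:
        raise ValueError("levels start at 1")
    return chars[min(level, len(chars)) - 1]


for level in range(1, 6):
    print(level, bullet_for_level("+o-", level))
# 1 +
# 2 o
# 3 -
# 4 -
# 5 -
```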
New in version 2.0
By default the software will attempt to automatically calculate the alignment of data inside each cell of a table. This will look at the placement of the data, and the type of data (e.g. numerical data is right justified).
This policy can be used to overrule that process and force a particular alignment. When set it will apply to all cells in all detected tables.
To exert more control over particular columns in particular tables you should consider using a Table definition file
New in version 2.0
By default the software will attempt to automatically calculate the alignment of a table within a document, and in most cases will simply left align the table, possibly with a margin where one is detected.
This policy can be used to overrule that process and set the alignment for all tables in the document (e.g. to centre all tables).
To exert more control over particular columns in particular tables you should consider using a Table definition file
This policy specifies how many lines should be regarded as the header of a table. AscToRTF can attempt to detect this, although not all tables in the same file necessarily have the same header size.
This policy appears on-screen as "Number of header rows"
New in version 2.0
When enabled multiple blank lines in the input will not be converted to multiple blank lines in the output. This can be desirable when converting a document that has been "paged" and so had extra blank lines added to space out the sections, and this spacing makes no sense and is unwanted in the RTF.
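The effect of this policy is roughly that of the following Python sketch, which assumes that a line containing only whitespace counts as blank (the policy description does not say). `collapse_blank_lines` is a hypothetical name:

```python
def collapse_blank_lines(lines: list[str]) -> list[str]:
    """Replace each run of blank (or whitespace-only) lines with a single blank line."""
    result: list[str] = []
    previous_blank = False
    for line in lines:
        blank = not line.strip()
        if blank and previous_blank:
            continue  # skip the 2nd, 3rd, ... blank line in a run
        result.append(line)
        previous_blank = blank
    return result


print(collapse_blank_lines(["Section 1", "", "", " ", "Section 2"]))
# ['Section 1', '', 'Section 2']
```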
These options offer settings that would normally be set by your RTF viewer.
Document language
Paper size
Paper Orientation
Margin sizes
Mirror margins
Depending on the application you use to view the RTF file created, these settings may affect the spell checking, grammar checking, paging and printing of the generated document.
This specifies the language the document is written in. Depending on the application you use to view the RTF files, this setting may be used by spell checkers and the like when checking the document.
This policy appears on-screen as "Document language".
This specifies the paper size. Depending on the application you use to view the RTF files, this may be used when printing the document, and the value selected will affect the paging of the document.
This specifies the paper orientation (portrait or landscape mode). Depending on the application you use to view the RTF files, this may be used when printing the document, and the value selected will affect the paging of the document.
This policy is offered on-screen as the "Paper orientation" option.
This specifies the margin sizes. Depending on the application you use to view the RTF files, these may be used when printing the document, and the value selected will affect the paging of the document.
The selected margin sizes are saved in any policy file under the following names
- Top margin (in cm)
- Bottom margin (in cm)
- margin (in cm)
- margin (in cm)
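RTF itself stores page margins in twips (1/1440 inch) using the `\margl`, `\margr`, `\margt` and `\margb` control words, so a margin entered in centimetres has to be converted. A minimal Python sketch of that arithmetic, assuming rounding to the nearest whole twip (how AscToRTF rounds is not documented here):

```python
TWIPS_PER_INCH = 1440
CM_PER_INCH = 2.54


def cm_to_twips(cm: float) -> int:
    """Convert a margin in centimetres to whole twips (1/1440 inch)."""
    return round(cm * TWIPS_PER_INCH / CM_PER_INCH)


print("\\margt%d" % cm_to_twips(2.0))  # \margt1134
```

So, under this assumption, a 2 cm top margin becomes 1134 twips, and 2.54 cm (one inch) becomes exactly 1440.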
This specifies that margins should be mirrored, i.e. that odd and even pages should reverse margin sizes so that they can be placed facing each other when printed and bound together.
This allows the character encoding to be set. Although designed to convert documents that use the ASCII character set, the software has some ability to convert Japanese and Cyrillic files amongst others. For such files to display correctly, the character encoding has to be set up correctly.
Note, RTF doesn't support all possible languages, and the Arabic conversion may be a little suspect :-)
New in version 2.0
AscToRTF can now be used to create RTF files suitable for use as source files when making a WinHelp help file. You can also get AscToRTF to create a Help Compiler project file (.hpj) for you, and tell it some of the things to put in that file. The policies available are
For more information see Creating WinHelp files
New in version 2.0
When selected, AscToRTF will create a new Help Compiler project file for you. This file will have the same name as your source file but with a .hpj extension. If this option is selected it will overwrite any existing project file, so take care in using it.
You will need a copy of the Help Compiler Workshop (HCW.EXE) from Microsoft in order to load and execute the created project file.
New in version 2.0
If you are attempting to create a WinHelp file for one of your own software applications you will need to supply the name of the .hm resource file from your project that defines the topic IDs that your software will want to be defined in the help file. This .hm file will be added to your project file and is a crucial link between IDs used in your software, and topics defined in your help file.
See also the discussion in Resource file (.hm)
New in version 2.0
This is the "citation" text added to your help file. This text is displayed any time someone prints or pastes topics from your WinHelp file.
New in version 2.0
This is the copyright notice attached to your help file.
New in version 2.0
This is the background colour used for the title text of each topic. This is the non-scrolling section at the top of each topic page.
See also help body background colour
New in version 2.0
This is the background colour used for the main body of each topic. This is the scrolling section that makes up the majority of the topic window.
See also help title background colour
Add hyperlinks
Hyperlinks to other section numbers
See also the comments in the adding hyperlinks section.
This specifies that all valid "http" and www references that are found should be turned into active hyperlinks.
The detection of such hyperlinks can sometimes be confused by adjacent punctuation characters.
This policy is shown on-screen as "http:// and www references"
This specifies that all valid email addresses that are found should be turned into active "mailto" hyperlinks.
AscToRTF has no way of checking email addresses, so "made up" addresses will also get converted, although the domain name will be validated.
An extra option allows email addresses beginning with a number to be accepted. Usenet message IDs often have an email format but start with a number, so by default these are not converted to email hyperlinks.
This policy is shown on-screen as "Convert email references"
This specifies whether or not email addresses that begin with numbers are allowed.
The program has no way of validating email addresses. Documents, especially Usenet posts and the like, often contain message IDs that look like email addresses but aren't. These usually begin with a number, and so by default the program will ignore "addresses" in this form.
On the other hand some ISPs (e.g. older Compuserve accounts) allow email addresses that start with numbers. You should toggle this policy according to which is more appropriate for your documents.
This policy appears on-screen as "Allow email addresses that begin with a number"
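The trade-off described above can be seen in the following Python sketch. The regular expression is a deliberately simplified assumption, far looser than real address syntax, and the addresses and the function name `find_emails` are invented for illustration:

```python
import re

# Simplified, illustrative pattern only; real address syntax is far more complex.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str, allow_leading_digit: bool = False) -> list[str]:
    """Return email-like strings, optionally skipping ones that start with a digit."""
    found = []
    for match in EMAIL.finditer(text):
        address = match.group()
        if address[0].isdigit() and not allow_leading_digit:
            continue  # probably a Usenet message ID rather than a mailbox
        found.append(address)
    return found


text = ("Mail info@example.com; see <199801021234.AB123@news.example.org>; "
        "or 71234.567@compuserve.com")
print(find_emails(text))        # ['info@example.com']
print(find_emails(text, True))  # all three, including the message ID
```

Enabling the option picks up the numeric (CompuServe-style) address, but also the message ID, which is why the default is to skip them.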
This specifies that all FTP addresses that are found should be turned into active hyperlinks.
These will usually start with "ftp://" or be a domain name starting "ftp.".
However, quite often FTP sites have domain names that don't start with "ftp." but do end in a recognised domain type such as ".com". An extra option allows the program to convert such "weak" or implicit FTP references into FTP links. See Only allow explicit FTP links.
This policy appears on-screen as "Convert FTP references"
This specifies that all "internet" addresses which don't start with "www." or "ftp." should be regarded as FTP sites.
Often FTP sites have domain names that don't start with "ftp." but do end in a recognised domain type such as ".com". For example, rtfm.mit.edu is a well-known archive.
This policy appears on-screen as "Convert "weak" FTP references"
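The difference between explicit and "weak" FTP references can be sketched as follows. The list of recognised domain endings is a small illustrative assumption, the function name `ftp_link` is hypothetical, and the input is assumed to be a bare host name without a path:

```python
from typing import Optional

KNOWN_ENDINGS = (".com", ".org", ".net", ".edu")  # illustrative subset only


def ftp_link(address: str, allow_weak: bool = False) -> Optional[str]:
    """Return an ftp:// URL for `address`, or None if it should not be linked."""
    lower = address.lower()
    if lower.startswith("ftp://"):
        return address                  # explicit URL: link as-is
    if lower.startswith("ftp."):
        return "ftp://" + address       # explicit FTP host name
    if allow_weak and lower.endswith(KNOWN_ENDINGS) and not lower.startswith("www."):
        return "ftp://" + address       # "weak" reference, only when enabled
    return None


print(ftp_link("rtfm.mit.edu"))                   # None
print(ftp_link("rtfm.mit.edu", allow_weak=True))  # ftp://rtfm.mit.edu
```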
New in version 2.0
This specifies that all gopher addresses that are found should be turned into active hyperlinks.
These will usually start with "gopher://".
This policy appears on-screen as "Convert Gopher references"
New in version 2.0
This specifies that all telnet addresses that are found should be turned into active hyperlinks.
These will usually start with "telnet://".
This policy appears on-screen as "Convert Telnet references"
New in version 2.0
This specifies whether or not potential URLs should have their "domain name" checked against the known domain name structures (i.e. ends in .com, .org, .co.uk etc). Having this switched on reduces the likelihood of invalid URLs being turned into clickable links that don't go anywhere. Note, the software doesn't check that the domain exists, only that the domain name obeys the known rules.
You might want to switch this off if your document contains URLs that don't use standard domain names (e.g. they are inside an Intranet).
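A check of this kind might look like the Python sketch below. The accepted suffixes (a few generic domains plus any two-letter country code) are an assumption for illustration, not AscToRTF's actual rules; this sketch would, for instance, also reject newer generic domains such as ".info". `looks_like_valid_domain` is a hypothetical name:

```python
# Illustrative assumption: a handful of generic top-level domains.
GENERIC_TLDS = {"com", "org", "net", "edu", "gov", "mil", "int"}


def looks_like_valid_domain(host: str) -> bool:
    """Check a host name's structure only; this never tests that it exists."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2 or any(not label for label in labels):
        return False
    tld = labels[-1]
    if tld in GENERIC_TLDS:
        return True
    # Any two-letter alphabetic ending is accepted as a country code (.uk, .de ...).
    return len(tld) == 2 and tld.isalpha()


for host in ["www.jafsoft.com", "www.example.co.uk", "fileserver.local", "Windows.3"]:
    print(host, looks_like_valid_domain(host))
```

An intranet name such as "fileserver.local" fails the check, which is the case where switching this policy off helps.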
This specifies that AscToRTF should attempt to identify Usenet newsgroup names and turn them into active "news" hyperlinks.
AscToRTF has no way of checking newsgroup names, so by default it will only convert names in recognised hierarchies such as alt., comp., rec., etc.
This policy appears on-screen as "Convert USENET newsgroup references"
This specifies that when detecting Usenet newsgroup names, AscToRTF should only convert names in recognised hierarchies such as alt., comp., rec., etc. You can get the program to recognise additional hierarchies.
This policy is shown on-screen as "Convert only recognised USENET newsgroups"
This specifies that when detecting Usenet newsgroup names, AscToRTF should also allow "newsgroups" in the hierarchies listed here, in addition to the standard hierarchies such as alt., comp., rec., etc.
This policy is shown on-screen as "Additional hierarchies to recognize"
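Taken together, the three newsgroup policies above behave roughly like this Python sketch. The set of standard hierarchies is an illustrative assumption (the Usenet "Big 8" plus alt.), and `is_newsgroup` is a hypothetical name:

```python
from typing import Iterable

# Illustrative assumption: the Usenet "Big 8" hierarchies plus alt.
STANDARD_HIERARCHIES = {"alt", "comp", "humanities", "misc", "news",
                        "rec", "sci", "soc", "talk"}


def is_newsgroup(name: str, only_recognised: bool = True,
                 additional: Iterable[str] = ()) -> bool:
    """Decide whether a dotted name should become a news: hyperlink."""
    parts = name.lower().split(".")
    if len(parts) < 2 or not all(parts):
        return False
    if not only_recognised:
        return True  # any dotted name qualifies, so false positives are likely
    allowed = STANDARD_HIERARCHIES | {h.lower().rstrip(".") for h in additional}
    return parts[0] in allowed


print(is_newsgroup("comp.lang.python"))                     # True
print(is_newsgroup("uk.rec.walking"))                       # False
print(is_newsgroup("uk.rec.walking", additional=["uk."]))   # True
```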
This specifies whether or not AscToRTF should turn references to section numbers in the main text to hyperlinks to those sections.
This is only possible for numbered sections.
If selected, you should specify the level at which such cross-references should start. A value of "1" will attempt to convert all numbers N, N.N, ... to hyperlinks. A value of "2" will attempt to convert only N.N, N.N.N and so on.
This policy is quite prone to error (e.g. "Windows 3.1" often becomes a hyperlink to section 3.1), and lower values are more error-prone. A value of "2" is set by default.
Later versions may address this problem.
This option is saved in the policy file as the "Convert TABLE X-refs to links" and "Cross-refs at level" policies.
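The following Python sketch shows why lower levels produce more false links. It assumes, hypothetically, that a number is linked only when it matches an existing section number and has at least `level` components; the regular expression and the function name `section_refs` are illustrative, not AscToRTF's:

```python
import re

# Digits separated by single dots, not glued to other digits or dots.
SECTION_NUMBER = re.compile(r"(?<![\d.])\d+(?:\.\d+)*(?!\.?\d)")


def section_refs(text: str, existing: set[str], level: int = 2) -> list[str]:
    """Return section numbers in `text` that would become cross-reference links."""
    refs = []
    for match in SECTION_NUMBER.finditer(text):
        number = match.group()
        components = number.count(".") + 1
        if components >= level and number in existing:
            refs.append(number)
    return refs


text = "See 2.4.1 and Windows 3.1, or read 7 chapters."
sections = {"2.4.1", "3.1", "7"}
print(section_refs(text, sections, level=2))  # ['2.4.1', '3.1']
print(section_refs(text, sections, level=1))  # ['2.4.1', '3.1', '7']
```

With level 2, "Windows 3.1" is still linked because a section 3.1 exists, which is exactly the kind of error described above; level 1 would also link the bare "7".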
These options allow various aspects of the pre-processor to be controlled.
New in version 2.0
When enabled, the pre-processor is activated. You would only ever want to de-activate it to see what difference not processing any pre-processor commands would make.
New in version 2.0
This is a comma-separated list of the SECTIONs you want included in your document. This only applies if you've used the SECTION command to mark up parts of your document to be conditionally output during the conversion.
Fonts
Type of text | Font policy |
Normal text | Default font |
Headings | Heading Font |
Text in tables | Table font |
Table of contents | TOC Font |
Fixed-pitch text | Fixed font |
This specifies the default font to be used. It may be edited via a normal Windows Font selection dialog.
New in version 2.0
This specifies the default font to be used for headings. The actual headings will be based on this font family, but will be made larger and/or italic according to the level of each heading.
It may be edited via the Font selection dialog.
New in version 2.0
This specifies the default font to be used inside tables. This will default to the Default Font, but you may want to make it smaller in order to fit wide tables on the page.
It may be edited via the Font selection dialog.
New in version 2.0
This specifies the default font used in any generated Table of Contents. The font family specified will be used, but the different levels of heading in the list will be given different sizes and italics, just as in a default Word document.
It may be edited via the Font selection dialog.
This policy is shown on-screen as "TOC font"
New in version 2.0
This specifies the default font to be used for ASCII art and diagrams and other portions of text where the spacing is to be preserved. For this a monospaced font such as Courier is usually used, and the font size is usually set a little smaller, at 8pt. This is to ensure that an 80-character "line" in the original document will fit on a page in the output document.
It may be edited via the Font selection dialog.
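The reasoning behind the 8pt size can be checked with a little arithmetic. Every Courier glyph is 0.6 em wide, so an 80-character line at 8pt is 80 × 0.6 × 8 = 384pt, or about 5.3 inches. The Python sketch below assumes an A4 page (8.27 inches wide) with 1-inch margins, leaving about 6.27 inches of usable width; those page settings are assumptions for illustration:

```python
POINTS_PER_INCH = 72
COURIER_ADVANCE_EM = 0.6  # every Courier glyph is 600/1000 em wide

# Assumed page: A4 (8.27 in wide) with 1-inch left and right margins.
USABLE_WIDTH_IN = 8.27 - 2 * 1.0


def line_width_inches(chars: int, point_size: float) -> float:
    """Printed width of a line of `chars` Courier characters at `point_size`."""
    return chars * COURIER_ADVANCE_EM * point_size / POINTS_PER_INCH


for size in (8, 10):
    width = line_width_inches(80, size)
    print(f"{size}pt: {width:.2f} in, fits: {width <= USABLE_WIDTH_IN}")
# 8pt: 5.33 in, fits: True
# 10pt: 6.67 in, fits: False
```

At 10pt the same line needs about 6.7 inches, which would not fit within those assumed margins.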
Each of the font values may be chosen using the font selection dialog. The selected font is shown as a comma-separated list containing :-
See also Using link dictionary files
The Link Dictionary allows you to convert text in your files into hyperlinks. For fuller details see Using link dictionary files
This dialog allows you to edit the links in your link dictionary, although if you take care you can do this more easily by opening your dictionary file in a text editor such as NotePad.
To enter a new link
- Click on "Add a new link definition"
- Enter the definition details in the edit boxes, replacing the demonstration text
- Press the "Add link" button.
To update or remove a link
- Click on the desired link on the list on the left.
- Edit the details of the link in the edit boxes on the right
- Press the "Update link" or "Remove Link" buttons
Each link definition consists of three parts :-
Text to be matched
This is the text as it will appear in your source file. The text must be contained on a single line of the input file. Care should be taken to avoid using substrings of other matched text. For this reason it is a good idea to edit your source files to put brackets round the links you want [like this] and then only match the text including the brackets.
Replacement text
This is the text that will appear as the hyperlink text. Normally this is very similar to the matched text.
Hyperlink URL
This is the hyperlink's URL. It can be absolute or relative or even local to the current page. Just ensure it is correct for where your pages are going to end up.
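The advice about brackets can be illustrated with a small Python sketch that applies dictionary entries by plain text substitution and emits a standard RTF HYPERLINK field. This is a hypothetical illustration, not AscToRTF's implementation; the URL is a placeholder, and the sketch assumes the text and URL contain no characters (such as braces or backslashes) that would need RTF escaping:

```python
def rtf_hyperlink(text: str, url: str) -> str:
    """Build an RTF HYPERLINK field (assumes no RTF escaping is needed)."""
    return '{\\field{\\*\\fldinst HYPERLINK "%s"}{\\fldrslt %s}}' % (url, text)


def apply_link_dictionary(line: str, links: dict[str, tuple[str, str]]) -> str:
    """Replace each matched text with a hyperlink.

    `links` maps text to be matched -> (replacement text, hyperlink URL).
    """
    for matched, (replacement, url) in links.items():
        line = line.replace(matched, rtf_hyperlink(replacement, url))
    return line


# Hypothetical dictionary entry; the URL is a placeholder.
links = {"[AscToRTF]": ("AscToRTF", "http://www.example.com/asctortf")}
print(apply_link_dictionary("Read about [AscToRTF] here; AscToRTF converts text.", links))
```

Because the entry matches "[AscToRTF]" including its brackets, the later, unbracketed mention of AscToRTF is left alone.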
Other policies can be set as follows:
On the Style Definition File selection dialog
New in version 2.0
When using an external Style Definition File together with FO tags to control the fonts in your document, this policy controls the scope of the font tag introduced by each new FO tag. The options are
- Scope to the end of the file. If this is selected, the associated font will apply until the end of the document, or until another FO tag is encountered
- Scope to the next paragraph/table/heading. If this is selected, the associated font will apply until a major new typographical feature is encountered, or a new FO tag is detected.
- Scope to the end of the input line. If this is selected, the FO tag will only apply to the rest of the text on the same input line, or until a new FO tag is encountered.
See also
Converted from
a single text file by
AscToHTM © 1997-2004 John A Fotheringham |