Documentation for the AscToRTF conversion utility : Change History

The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html

Change History

Contents of this section

Version 2.0 (February 2004)

New features

Support for generating RTF as WinHelp source files
New ability to define table layouts and formatting rules
Ability to "tag" your own tables for greater accuracy
Input text manipulation and labelling using "Text commands"
Control over document styling by using a "Style Definition File"
Support for non-ASCII character types and character encodings
Support for comma-delimited and tab-delimited tables

New policies

New Font policies
New Heading policies
New Hyperlink policies
New file, page, paragraph and line structure policies
New Table policies
Other new policies

New programs

API version now available
New utility A2HDETAG

Other changes

New Preprocessor tags
Other new options
New document, the "Table Manual"
Changes to the Windows version
Changes to the command line version
Changes to document analysis
Changes to documentation

Version 1.5 (October 2002)
Version 1.00 (March 2000)

Version 2.0 (February 2004)

A major update since version 1, version 2 is essentially version 1.5, but fully documented. The changes listed here were mostly available in version 1.5. Some were obvious, others were not.

New features

Support for generating RTF as WinHelp source files

AscToRTF can now create RTF files suitable for conversion into WinHelp help files. Although this type of help file has been superseded by HTML Help, many people still prefer it.

The Help files are created from specially formatted RTF files using the free Help Compiler Workshop (HCW) utility, available from Microsoft.

To support this, the following new policies have been added:

Generate WinHelp project file

WinHelp Resource File

Help file citation

Help file copyright notice

Help title background colour

Help body background colour

See also the section Creating WinHelp files

New ability to define table layouts and formatting rules

To aid in processing tables, the program now allows you to identify various table structures by specifying various match conditions. Each time the software encounters a candidate table, it tests this against the match conditions to see if the "table" is of a known type.

For each table you can specify its structure, and various formatting rules to be used in its conversion. These structure and formatting definitions can be shared between multiple table types for your convenience.

All the table types, structures and formatting rules should be placed in an external text file, known as a Table Definition File (or TDF for short). A new policy allows you to identify which Table Definition File is to be used, and you can select this from the new Config File Location menu.

For full details see Using Table Definition Files (TDF).

NOTE: This feature was originally added to AscToHTM, and at present most of the formatting rules apply more to HTML generation, and so aren't available in the RTF generation of AscToRTF.

Ability to "tag" your own tables for greater accuracy

The program now supports Tagged Table commands. These commands allow you to completely markup a table, specifying the column details, the row details and the contents of each table cell.

This approach can be used by those who want complete control over how their tables are constructed, or who are generating text files from a source which knows the table layout and can explicitly state it.

By using the tagged approach, you avoid the prospect of the program making mistakes when analysing the layout of the table.

As an example of using tagged table commands, the following sequence in the source file

        $_$_BEGIN_USER_TABLE C,1 in
        $_$_COLUMN_DETAILS 1,,,L, 2 in
        $_$_COLUMN_DETAILS 2,,,C, 1 ins
        $_$_TABLE_BORDER 1

        $_$_NEW_ROW HEAD
        $_$_NEW_CELL
        Substance (units)
        $_$_NEW_CELL
        Year
        Sampled

        $_$_NEW_ROW DATA
        $_$_NEW_CELL
        Alpha emitters (pCi/L)
        $_$_NEW_CELL
        1999

        $_$_NEW_ROW DATA
        $_$_NEW_CELL
        Asbestos (MFL)
        $_$_NEW_CELL
        1993
        $_$_END_TABLE

becomes

Substance (units)	Year Sampled
Alpha emitters (pCi/L)	1999
Asbestos (MFL)	1993
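
As an illustration only, the tagged example above can be read back into rows and cells with a few lines of Python. This is a minimal sketch, not AscToRTF's own parser: the function name parse_tagged_table is hypothetical, it only recognises the commands shown in the example, and it skips COLUMN_DETAILS and TABLE_BORDER because their arguments are not described in this section.

```python
PREFIX = "$_$_"

SAMPLE = """\
$_$_BEGIN_USER_TABLE C,1 in
$_$_COLUMN_DETAILS 1,,,L, 2 in
$_$_COLUMN_DETAILS 2,,,C, 1 ins
$_$_TABLE_BORDER 1

$_$_NEW_ROW HEAD
$_$_NEW_CELL
Substance (units)
$_$_NEW_CELL
Year
Sampled

$_$_NEW_ROW DATA
$_$_NEW_CELL
Alpha emitters (pCi/L)
$_$_NEW_CELL
1999

$_$_NEW_ROW DATA
$_$_NEW_CELL
Asbestos (MFL)
$_$_NEW_CELL
1993
$_$_END_TABLE
"""


def parse_tagged_table(lines):
    """Return a list of (row_type, [cell_text, ...]) tuples (hypothetical helper)."""
    rows = []
    in_table = False
    for raw in lines:
        line = raw.strip()
        if line.startswith(PREFIX):
            command, _, args = line[len(PREFIX):].partition(" ")
            if command == "BEGIN_USER_TABLE":
                in_table = True
            elif command == "END_TABLE":
                in_table = False
            elif in_table and command == "NEW_ROW":
                rows.append((args.strip(), []))
            elif in_table and command == "NEW_CELL" and rows:
                rows[-1][1].append("")
            # COLUMN_DETAILS, TABLE_BORDER and unknown commands are skipped here.
        elif in_table and line and rows and rows[-1][1]:
            # Plain text belongs to the most recent cell; wrapped lines are joined.
            cells = rows[-1][1]
            cells[-1] = (cells[-1] + " " + line).strip()
    return rows


table = parse_tagged_table(SAMPLE.splitlines())
for row_type, cells in table:
    print(row_type, cells)
```

Because every row and cell is stated explicitly, no layout analysis is needed, which is the point of the tagged approach.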

Input text manipulation and labelling using "Text commands"

The program now allows you to apply "text commands" to the input text, before it is converted. There are several commands possible, which allow you to identify lines in the input text that should be ignored, and text in the input file that should be removed or replaced.

You can also use commands to tell the software how to interpret certain types of line, for example which lines are headings and which should be regarded as bullet points. The Text Commands to be used should be placed in an external Text Command File. A new policy allows you to identify which Text Command File is to be used, and you can select this from the new Config File Location menu.

For full details see Using Text Command Files

Control over document styling by using a "Style Definition File"

Several users of AscToRTF commented that they wanted to feed the output of various typesetting programs into AscToRTF as text input. These users stated that often they had detailed knowledge of how the text should be interpreted or the RTF should be formatted, but were either having to discard that information in the text passed to AscToRTF or were unable to provide AscToRTF with enough hints to ensure the conversion was always accurate.

To help such users, changes have been made in version 2.0 to allow better integration between the original document and the RTF created by AscToRTF.

From version 2.0 onwards AscToRTF has the ability to use an external Style definition file to define named font styles. These styles can then be invoked by adding the new Pre-processor command: FO tags to your document. These tags invoke a font change that applies to the following text. The new policy Scope for font tags determines the range within the document over which an FO tag will apply.

For full details see Using Style Definition Files (SDF)

Support for non-ASCII character types and character encodings

Non-latin and Unicode character sets

Some support has been added for non-latin character sets. The character set names are based on those used in HTML charsets, although RTF cannot support the same range that HTML does.

Support has been added for auto-detecting the character set used, but this is far from foolproof. If you are using non-latin character sets you may need to set the character set manually.

It is not possible at present to support multiple character sets in one document (unless you are using Unicode).

To support this feature the following policies have been added:

The Character encoding policy allows the character encoding of a document to be set. The software has limited ability to detect Japanese ("x-sjis") and Cyrillic ("koi-8") text, but in some cases the encoding will need to be set manually. Not all of the available options are supported in RTF at present.

The auto-detect of character sets can be switched off by using the Look for character encoding policy. You might want to do this if the software wrongly suspects your document is a non-latin character set.

Other special characters

Added support for parsing files with some Mime-encoded quotable strings in them. The new policy Input file contains MIME encoding can be found under Analysis->File structure. At present there is some (very limited) auto-detect for this feature.

Added support for documents with change bars. By default change bars are stripped out and the changed text is coloured red. This behaviour may be changed in later versions. Added the new policy Input file has change bars, which can be found under Analysis->File Structure.

Added support for converting DOS characters. The new policy Input file contains DOS characters can be found under Analysis->File Structure.

There is a limited auto-detect of DOS characters when diagrams are present.

Added Input file contains PCL codes policy. Again there is a limited ability to detect these codes. A few of the PCL codes are interpreted. Most are just discarded.

Improved handling of VT escape characters. These are either removed from the output or converted to "line" characters.

Support for comma-delimited and tab-delimited tables

Pre-processor commands have been added to allow you to mark up a section of comma-delimited or tab-delimited data you want turning into a table.

The new pre-processor directives are the COMMA_DELIMITED_TABLE command and the DELIMITED_TABLE command.

New /COMMA and /TABBED command line qualifiers that allow comma-delimited and tab-delimited files to be converted into tables.
New /TABLE command line qualifier that allows the input file to be treated as a single plain text table.

In addition to this, the software now has the ability to automatically detect tab-delimited data tables.
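
The idea of detecting tab-delimited data can be sketched as follows. This is an assumed heuristic, not the program's actual analysis: it simply checks that every non-blank line carries the same non-zero number of tabs. The function names and the min_rows parameter are hypothetical; the splitting uses Python's standard csv module.

```python
import csv


def looks_tab_delimited(lines, min_rows=2):
    """Assumed heuristic: all non-blank lines share the same non-zero tab count."""
    counts = [line.count("\t") for line in lines if line.strip()]
    return len(counts) >= min_rows and counts[0] > 0 and len(set(counts)) == 1


def split_rows(lines, delimiter="\t"):
    """Split delimited lines into lists of cells, honouring quoted fields."""
    non_blank = (line for line in lines if line.strip())
    return list(csv.reader(non_blank, delimiter=delimiter))


data = ["Substance\tYear", "Asbestos\t1993", "", "Alpha emitters\t1999"]
if looks_tab_delimited(data):
    for row in split_rows(data):
        print(row)
```

A real detector would need to tolerate ragged rows and stray tabs in ordinary text; the strict equality test here is only meant to show the principle.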

New policies

New Font policies

Added policies to allow different fonts to be applied to different types of text as follows

Normal text	Default font
Headings	Heading Font
Text in tables	Table font
Table of contents	Table of contents Font
Fixed-pitch text	Fixed font

New Heading policies

There are two new heading types that can be supported :-

Added support for embedded headings with the Expect embedded headings policy (see Embedded heading detection). These are "headings" that are embedded as the first sentence in a paragraph.
Added support for headings that start with particular words or phrases via the Heading key phrases policy (see Key phrase headings for more on this).

Also added :-

Added the policy Check indentation for consistency so that it could be disabled in documents where headings were centred (and thus all at different indentations)
Added support for headings that span up to 3 lines. Previously the limit was 2.

New Hyperlink policies

Added the policy Create Gopher links to toggle the conversion of gopher links into hyperlinks.
Added the policy Create Telnet links to toggle the conversion of telnet links into hyperlinks.
Added Check domain name syntax policy to toggle the checking of domain name syntax in detected URLs. You can now switch this off to allow intranet links to be accepted.
Changed hyperlink detection to only allow explicit FTP URLs and email addresses that don't start with numbers. These behaviours can be reversed using the new policies Only allow explicit FTP links and Allow email beginning with numbers, both of which are on the Output->Hyperlinks tab.
Added support for new top level domains (.info, .biz etc)
Added support for the "snews://" secure news server protocol
URLs of the form http://username@domain_name/... are now supported
Added support for "obfuscated" URLs such as

http://3640005069/
http://7934972365/
http://0330.0366.0021.0315/
http://%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d/

Although the display text is left unchanged, the hyperlink will point to a non-obfuscated URL (either the domain name, or an IP address). This is because obfuscated URLs such as these are often used by spammers, and the author has no intention of allowing his software to aid spammers in their goals.

If someone cares to give me a valid reason for using such URLs I may reconsider this behaviour.
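
To show what the four example URLs hide, here is a minimal Python sketch that decodes the host part. It is not the program's own code: the function names are hypothetical, and the handling of numbers larger than 32 bits (wrapping modulo 2^32) is an assumption inferred from the second example, which differs from the first by exactly 2^32.

```python
from urllib.parse import unquote


def _parse_number(text):
    """Parse a decimal or leading-zero octal number; return None otherwise."""
    if not (text.isascii() and text.isdigit()):
        return None
    if len(text) > 1 and text.startswith("0"):
        try:
            return int(text, 8)  # leading zero: treat as octal
        except ValueError:
            return None
    return int(text)


def deobfuscate_host(host):
    """Return a dotted IP address or plain domain name for an obfuscated host."""
    host = unquote(host)  # undo %xx escapes
    parts = host.split(".")
    numbers = [_parse_number(part) for part in parts]
    if len(parts) == 1 and numbers[0] is not None:
        # A single large number: assume it wraps to a 32-bit address.
        value = numbers[0] % 2**32
        return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
    if len(parts) == 4 and all(n is not None and n <= 255 for n in numbers):
        return ".".join(str(n) for n in numbers)
    return host


for example in (
    "3640005069",
    "7934972365",
    "0330.0366.0021.0315",
    "%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d",
):
    print(example, "->", deobfuscate_host(example))
```

Under these assumptions the first three examples all resolve to the same address, 216.246.17.205, and the fourth decodes to lockergnome.com.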

New file, page, paragraph and line structure policies

File structure

Added Lines to ignore at start of file and Lines to ignore at end of file policies to allow lines at the start and end of the source file to be discarded. This can be useful if your source text comes from a third party source that adds extra, unwanted lines.

Added auto-detect of double spaced files (files where every second line is blank). This will set the Input file is double spaced policy whenever double-spaced text is detected (unless the policy has already been set).
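
A check for "every second line is blank" might look like the following Python sketch. It is a deliberately strict illustration under stated assumptions, not the detection the program performs: the function names are hypothetical, and a file whose paragraph breaks add extra blank lines would need a more tolerant test.

```python
def is_double_spaced(lines):
    """Strict check: text and blank lines alternate throughout the file."""
    stripped = [line.strip() for line in lines]
    # Ignore blank lines at the very start and end of the file.
    while stripped and not stripped[0]:
        stripped.pop(0)
    while stripped and not stripped[-1]:
        stripped.pop()
    if len(stripped) < 3:
        return False
    text_lines = stripped[0::2]
    gap_lines = stripped[1::2]
    return all(text_lines) and not any(gap_lines)


def remove_double_spacing(lines):
    """Drop the alternating blank lines from a double-spaced file."""
    return [line for line in lines if line.strip()]


sample = ["First line", "", "Second line", "", "Third line", ""]
if is_double_spaced(sample):
    print(remove_double_spacing(sample))
```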

Page structure

Added PAGE command. This marks a page boundary. In the RTF this creates a page marker.

Added Mirror margins policy.

Page markers

Added Input file has page markers and Page marker size (in lines) policies. These allow you to identify that the file has page markers containing form feeds and that the first so many lines after the form feed should be discarded.
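
The effect of these two policies can be pictured with a short Python sketch. It rests on an assumption that is not confirmed here: that the page marker starts on the line holding the form feed and occupies the given number of lines in total. The function name is hypothetical.

```python
FORM_FEED = "\f"


def strip_page_markers(lines, marker_size):
    """Discard marker_size lines starting at each line that contains a form feed.

    Assumption: the form-feed line counts as the first line of the marker.
    """
    kept = []
    skip = 0
    for line in lines:
        if FORM_FEED in line:
            skip = marker_size
        if skip > 0:
            skip -= 1
            continue
        kept.append(line)
    return kept


pages = [
    "last line of page 1",
    "\fQuarterly Report",
    "Page 2",
    "first line of page 2",
]
print(strip_page_markers(pages, marker_size=2))
```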

Paragraph structure

Added Preserve new paragraph offset policy. In documents where a first line offset is detected at the start of each paragraph you can elect to have this preserved in the output.

Added First line indentation (in blocks) policy. This allows you to specify the size of a first line offset, regardless of whether or not one already exists in the file.

Line structure

Added options to allow more control over how the original document's file structure should be preserved

Added Treat each line as a paragraph policy. If this option is selected, every line in the source file is treated as a paragraph. This may be suitable if the file has been authored using an editor that wraps the lines (i.e. doesn't put in hard breaks) and which doesn't add blank lines between paragraphs.

Added Preserve line structure policy. If this option is selected a line break is added to every line, thereby preserving the line structure of the original.
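
The difference between the two policies can be sketched in terms of the standard RTF control words \par (end of paragraph) and \line (line break within a paragraph). The Python below is an illustration only, with hypothetical function names; it produces a body fragment rather than a complete RTF file, does not handle non-ASCII text, and is not a description of the RTF that AscToRTF itself emits.

```python
def rtf_escape(text):
    """Escape the three characters that are special in RTF text."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def lines_as_paragraphs(lines):
    """Every non-blank source line becomes its own paragraph."""
    return "\n".join(rtf_escape(line.strip()) + r"\par" for line in lines if line.strip())


def preserve_line_structure(lines):
    """Blank lines separate paragraphs; lines inside one are joined with a line break."""
    paragraphs, current = [], []
    for line in lines:
        if line.strip():
            current.append(rtf_escape(line.strip()))
        elif current:
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return "\n".join(r"\line ".join(p) + r"\par" for p in paragraphs)


source = ["Roses are red", "Violets are blue", "", "A second verse"]
print(lines_as_paragraphs(source))
print(preserve_line_structure(source))
```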

New Table policies

Added Default TABLE layout policy (also the pre-processor TABLE_LAYOUT command). This allows you to specify the number of columns in each table, and the attributes of each column, specifically the character position that marks the end of each column. Rather than use this policy, it is probably better to use the related pre-processor TABLE_LAYOUT command in the source text on a per-table basis.

Added Ignore table header during analysis policy (also the pre-processor TABLE_IGNORE_HEADER command). This specifies that table headers should be ignored when columns are being auto-detected. Some tables have complex headers that confuse the analysis, and this policy allows those headers to be skipped.

Added Table extending factor policy. This controls the degree to which pre-formatted lines should be expanded into adjacent text.

Added Column merging factor policy, which controls the degree to which columns that are not clearly separated should be "merged" together.

Added Could be blank line separated policy. This indicates that tables could be using blank lines to separate rows of data. It affects the analysis and the detection of the table's extent.

Added "Column boundaries have zero width" policy for tables that have no separator character between columns. This can be useful for some tables generated by software.

Added Look for diagrams policy. This can be used to stop complex tables being wrongly interpreted as "diagrams".

Added Default TABLE cell alignment and Default TABLE alignment policies to allow you to set the default alignments of data within table cells and of tables on the page.

Added Ignore table header during analysis policy. For tables with complex headers you can elect to ignore these lines in the calculation of the column structure of the table. This can lead to more accurate results.

Other new policies

Added Allow automatic centring and Automatic centring tolerance policies. These allow you to look for text that is centred and to specify a tolerance used in this detection.
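
One way to picture a centring test with a tolerance is the Python sketch below. It is an assumed heuristic for illustration: the function name, the page_width parameter and the choice of characters as the unit of tolerance are all hypothetical, and the program's real detection may differ.

```python
def is_centred(line, page_width, tolerance=2):
    """Assumed test: left and right padding differ by at most `tolerance` characters."""
    line = line.expandtabs().rstrip()
    if not line.strip() or len(line) > page_width:
        return False
    left = len(line) - len(line.lstrip())
    right = page_width - len(line)
    # A line with no left padding is treated as ordinary left-aligned text.
    return left > 0 and abs(left - right) <= tolerance


heading = " " * 7 + "Title"
print(is_centred(heading, page_width=20))
print(is_centred("Title", page_width=20))
```

With a page width of 20, the first line has 7 characters of padding on the left and 8 on the right, so it passes with the default tolerance, while the unindented line does not.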

Added Look for underlined text policy. This allows text detected as underlined (other than headings) to be underlined in the output.

New programs

API version now available

As with all JafSoft converters, AscToRTF is available under separate license as an Application Programming Interface (API). This API allows software developers to harness the powerful abilities of AscToRTF from within their own software products.

The API is written in C++, and is supplied as either a library or a DLL under Windows. As such it can easily be invoked from C, C++ and Visual Basic software and has also been successfully invoked from inside Java and C# programs.

New utility A2HDETAG

For users who register, there is a new, separate command line utility called A2HDETAG available so they can "de-tag" their source files of all AscToRTF pre-processor tags, leaving a plain text fit for publishing, e.g. on Usenet.

In conjunction with this, new BEGIN_ASCII ... END_ASCII pre-processor tags have been added. These identify text that will be copied to the output of A2HDETAG. The text is ignored in all other conversions, and is intended to allow alternative text to be placed in text and HTML versions of a document.

Other changes

New Preprocessor tags

Added several new pre-processor in-line tags :-

FILENAME	outputs name of file being converted
FRACTION	outputs a fraction
VERSION	outputs AscToRTF program name and version number
IGNORE	multi-line text to be ignored
IGNORE_THIS	in-line text to be ignored

Other new options

Added the "Suppress URL messages" option to the Diagnostic settings. When enabled all URLs, email addresses etc will be listed in the log file. Since this file can be saved to disk, this is one way of identifying all the candidate hyperlinks from your text file.
The new ALLOW and DISALLOW tags allow you to enable or disable the search for headings and lists within sections of the document. This helps to eliminate faulty analysis that confuses numbered lists with headings, or treats lines of text ALL IN CAPITALS as headings.

New document, the "Table Manual"

To help people better understand how AscToRTF detects and analyses tables, and to know what they can do to aid, improve and correct this process, a new manual, known as the "Table Manual" has been produced. You should look for this on the web site, or check if it has been included with your software installation.

Changes to the Windows version

The main screen now allows access to Policy file selection. Previously this was only available on the menu structure. The Menu structure has been left unchanged, meaning you now have two ways of choosing your policy files.

The main screen now allows you to search sub folders when using wildcards.

The main screen also allows you to specify the File conversion type. You can choose to treat the input file as a number of different table types (e.g. tab-delimited data).

You no longer get prompted to "save policy" just because you pressed OK on one of the policy sheets. Now this only happens when something has been changed.

The main menu now has a "check for updates" option. If you select this you'll be taken to the JafSoft website where you'll be told if any newer versions of the software have been released.

Program now remembers positions of windows from one invocation to the next.

The user interface is now available in Italian, French and Swedish.

Changes to the command line version

Command line now allows multiple filespecs, separated by spaces. The policy file must now be a .pol file, rather than being identified as the second argument.

Changes to document analysis

More changes on bullet characters, in particular to disallow 'O' (upper case) from becoming a bullet character through analysis. This really doesn't work in Portuguese documents :-) 'o' (lower case) may still be detected. If upper case 'O' is wanted this can still be manually switched on.

Horizontal lines are now implemented as line rules whose length attempts to approximate the original (e.g. 50% or whatever). Previously lines would become full width.

Bookmark names from filename are now lower case (to reduce possible mismatches)

Shareware version now expires after 30 days + 5 uses. This will allow people to use the software on 5 different days after the first 30 days, giving people more time to evaluate the software at their leisure.

Now strip out leading and trailing "---" from heading text to make them more presentable in RTF

Changed emphasis handling to allow hyphenated parts to be emphasised independently, e.g. pre-formatted or pre-formatted.

Fine-tuned the detection of whether or not a file has an in-situ contents list

The "LINKPOINT" pre-processor tag can now be used as a directive as well as an in-line tag. (see the Tag manual for details).

Increased maximum width allowed for input lines in tables to 200 (after encountering a sample at 165). Lines longer than this are still disregarded as candidate table lines.

Improved analysis for tables using bar ('|') column separators

Improved detection of ASCII art diagrams.

Improved handling of heavily indented blocks of text. Previously these were (poorly) rendered as tables. Now the tables more accurately preserve the large indentation (see Text block detection).

The software will now automatically detect where a table is in fact tab-delimited data. Where detected, it will then use that tab structure to calculate columns.

Changes to documentation

This document has been completely re-written. It is converted from a single text file into the HTML pages, an RTF file and the Windows Help file using the AscToHTM and AscToRTF programs. You can view the source file for this document as file "asctortf.txt".

The Tag manual describes the tagging systems available to JafSoft conversion utilities. Note that not all of the tags described there are relevant (or supported) in RTF generation. However, many are common between the converters, should you wish to convert the same text file into other formats.

A "Table manual" is under production to explain how to get the most from tables in your conversions. This is expected to appear some time after AscToRTF 2.0 is released.

Version 1.5 (October 2002)

Released as an "interim" release before version 2, version 1.5 contains a large number of changes, not all of which were fully documented (that will be a large part of the difference between the two versions).

The software was released at this time to give existing and new users a far better version to work with and evaluate.

Version 1.00 (March 2000)

The initial version is released after months and months (some might say years) of promising it.
