Documentation for the AscToRTF conversion utility
The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html
Contents of this section
Version 2.0 (February 2004)
  New features
    Support for generating RTF as WinHelp source files
    New ability to define table layouts and formatting rules
    Ability to "tag" your own tables for greater accuracy
    Input text manipulation and labelling using "Text commands"
    Control over document styling by using a "Style Definition File"
    Support for non-ASCII character types and character encodings
    Support for comma-delimited and tab-delimited tables
  New policies
    New Font policies
    New Heading policies
    New Hyperlink policies
    New file, page, paragraph and line structure policies
    New Table policies
    Other new policies
  New programs
    API version now available
    New utility A2HDETAG
  Other changes
    New Preprocessor tags
    Other new options
    New document, the "Table Manual"
    Changes to the Windows version
    Changes to the command line version
    Changes to document analysis
    Changes to documentation
Version 1.5 (October 2002)
Version 1.00 (March 2000)
A major update since version 1, version 2 is essentially version 1.5, but fully documented. The changes listed here were mostly available in version 1.5. Some were obvious, others were not.
AscToRTF can now create RTF files suitable for conversion into WinHelp help files. Although this type of help file has been superseded by HTML Help, many people still prefer it.
The Help files are created from specially formatted RTF files using the free Help Compiler Workshop (HCW) utility, available from Microsoft.
To support this the following new policies have been added
See also the section Creating WinHelp files
To aid in processing tables, the program now allows you to identify various table structures by specifying various match conditions. Each time the software encounters a candidate table, it tests this against the match conditions to see if the "table" is of a known type.
For each table you can specify its structure, and various formatting rules to be used in its conversion. These structure and formatting definitions can be shared between multiple table types for your convenience.
All the table types, structures and formatting rules should be placed in an external text file, known as a Table Definition File (or TDF for short). A new policy allows you to identify which Table Definition File is to be used, and you can select this from the new Config File Location menu.
For full details see Using Table Definition Files (TDF).
The program now supports Tagged Table commands. These commands allow you to completely markup a table, specifying the column details, the row details and the contents of each table cell.
This approach can be used by those who want complete control over how their tables are constructed, or who are generating text files from a source which knows the table layout and can explicitly state it.
By using the tagged approach, you avoid the prospect of the program making mistakes when analysing the layout of the table.
As an example of using tagged table commands, the following sequence in the source file
$_$_BEGIN_USER_TABLE C,1 in
$_$_COLUMN_DETAILS 1,,,L, 2 in
$_$_COLUMN_DETAILS 2,,,C, 1 ins
$_$_TABLE_BORDER 1
$_$_NEW_ROW HEAD
$_$_NEW_CELL Substance (units)
$_$_NEW_CELL Year Sampled
$_$_NEW_ROW DATA
$_$_NEW_CELL Alpha emitters (pCi/L)
$_$_NEW_CELL 1999
$_$_NEW_ROW DATA
$_$_NEW_CELL Asbestos (MFL)
$_$_NEW_CELL 1993
$_$_END_TABLE
becomes
Substance (units) | Year Sampled |
---|---|
Alpha emitters (pCi/L) | 1999 |
Asbestos (MFL) | 1993 |
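A source that already knows its table layout can emit the tagged commands programmatically. The following is a minimal Python sketch and not part of AscToRTF: the function name `tagged_table` is hypothetical, and the table and column arguments (`C,1 in` and so on) are copied verbatim from the example above rather than derived from the tag syntax.

```python
def tagged_table(header, rows, table_args, column_args, border=1):
    """Return the tagged-table command lines for one table.

    table_args and column_args are passed through unchanged; the values
    used below are taken from the example above (an assumption about
    where each argument string begins and ends).
    """
    lines = ["$_$_BEGIN_USER_TABLE " + table_args]
    lines += ["$_$_COLUMN_DETAILS " + args for args in column_args]
    lines.append("$_$_TABLE_BORDER %d" % border)
    # One HEAD row followed by one DATA row per record.
    for kind, cells in [("HEAD", header)] + [("DATA", row) for row in rows]:
        lines.append("$_$_NEW_ROW " + kind)
        lines += ["$_$_NEW_CELL " + str(cell) for cell in cells]
    lines.append("$_$_END_TABLE")
    return lines


commands = tagged_table(
    header=["Substance (units)", "Year Sampled"],
    rows=[["Alpha emitters (pCi/L)", 1999], ["Asbestos (MFL)", 1993]],
    table_args="C,1 in",
    column_args=["1,,,L, 2 in", "2,,,C, 1 ins"],
)
print("\n".join(commands))
```

Because every cell is written explicitly, the converter has no layout to guess at, which is the point of the tagged approach.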
The program now allows you to apply "text commands" to the input text, before it is converted. There are several commands possible, which allow you to identify lines in the input text that should be ignored, and text in the input file that should be removed or replaced.
You can also use commands to tell the software how to interpret certain types of line, for example to say which lines are headings and which should be regarded as bullet points. The Text Commands to be used should be placed in an external Text Command File. A new policy allows you to identify which Text Command File is to be used, and you can select this from the new Config File Location menu.
For full details see Using Text Command Files
Several users of AscToRTF commented that they wanted to feed the output of various typesetting programs into AscToRTF as text input. These users stated that often they had detailed knowledge of how the text should be interpreted or the RTF should be formatted, but were either having to discard that information in the text passed to AscToRTF or were unable to provide AscToRTF with enough hints to ensure the conversion was always accurate.
To help such users, changes have been made in version 2.0 to allow better integration between the original document and the RTF created by AscToRTF.
From version 2.0 onwards AscToRTF has the ability to use an external Style definition file to define named font styles. These styles can then be invoked by adding the new Pre-processor command: FO tags to your document. These tags invoke a font change that applies to the following text. The new policy Scope for font tags determines the range within the document over which an FO tag will apply.
For full details see Using Style Definition Files (SDF)
Non-latin and Unicode character sets
Some support has been added for non-latin character sets. The character set names are based on those used in HTML charsets, although RTF cannot support the same range that HTML does.
Support has been added for auto-detecting the character set used, but this is far from foolproof. If you are using non-latin character sets you may need to set the character set manually.
It is not possible at present to support multiple character sets in one document (unless you are using Unicode).
To support this feature the following policies have been added
- the character encoding policy to allow the character encoding of a document to be set. The software has limited ability to detect Japanese ("x-sjis") and Cyrillic ("koi-8") text, but in some cases this will need to be set. Not all options available are supported in RTF at present.
- The auto-detect of character sets can be switched off by using the Look for character encoding policy. You might want to do this if the software wrongly suspects your document is a non-latin character set.
Other special characters
- Added support for parsing files with some Mime-encoded quotable strings in them. The new policy Input file contains MIME encoding can be found under Analysis->File structure. At present there is some (very limited) auto-detect for this feature.
- Added support for documents with change bars. By default change bars are stripped out and the changed text is coloured red; this behaviour may be changed in later versions. Added the new policy Input file has change bars which can be found under Analysis->File Structure.
- Added support for converting DOS characters. The new policy Input file contains DOS characters can be found under Analysis->File Structure.
There is a limited auto-detect of DOS characters when diagrams are present.
- Added Input file contains PCL codes policy. Again there is a limited ability to detect these codes. A few of the PCL codes are interpreted. Most are just discarded.
- Improved handling of VT escape characters. These are either removed from the output or converted to "line" characters
Pre-processor commands have been added to allow you to mark up a section of comma-delimited or tab-delimited data you want turning into a table.
The new pre-processor directives are the COMMA_DELIMITED_TABLE command and the DELIMITED_TABLE command.
In addition to this, the software now has the ability to automatically detect tab-delimited data tables.
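How AscToRTF decides that a block is tab-delimited is not described here. As a rough illustration only, the Python sketch below uses an assumed heuristic (every non-blank line carries the same non-zero number of tabs); the function names are hypothetical.

```python
def looks_tab_delimited(lines, min_rows=2):
    """Guess whether a block of lines is tab-delimited data.

    Assumed heuristic: at least min_rows non-blank lines, each with the
    same non-zero number of tab characters.
    """
    counts = [line.count("\t") for line in lines if line.strip()]
    return len(counts) >= min_rows and counts[0] > 0 and len(set(counts)) == 1


def split_rows(lines):
    """Split tab-delimited lines into lists of cell strings."""
    return [line.rstrip("\n").split("\t") for line in lines if line.strip()]


data = ["Substance\tYear", "Alpha emitters\t1999", "Asbestos\t1993"]
prose = ["Just a sentence.", "Another\tone with a stray tab."]

print(looks_tab_delimited(data), looks_tab_delimited(prose))
print(split_rows(data))
```

A heuristic this simple will misjudge ragged data, which is one reason explicit mark-up of a delimited section remains useful.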
Added policies to allow different fonts to be applied to different types of text as follows
Normal text | Default font |
Headings | Heading Font |
Text in tables | Table font |
Table of contents | Table of contents Font |
Fixed-pitch text | Fixed font |
There are two new heading types that can be supported :-
Also added :-
http://3640005069/
http://7934972365/
http://0330.0366.0021.0315/
http://%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d/
Although the display text is left unchanged, the hyperlink will point to a non-obfuscated URL (either the domain name, or an IP address). This is because obfuscated URLs such as these are often used by spammers, and the author has no intention of allowing his software to aid spammers in their goals.
If someone cares to give me a valid reason for using such URLs I may reconsider this behaviour.
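The obfuscated forms listed above can be decoded mechanically. The Python sketch below is an illustration under stated assumptions, not AscToRTF's own code: `deobfuscate_host` is a hypothetical name, a part with a leading zero is read as octal, and a single number larger than 32 bits is assumed to wrap modulo 2^32 (as the second example suggests).

```python
import ipaddress
from urllib.parse import unquote, urlsplit


def _parse_part(part):
    """Parse one numeric host part; 0x means hex, a leading 0 means octal."""
    if not part.isalnum():
        raise ValueError(part)
    if part.lower().startswith("0x"):
        return int(part, 16)
    if len(part) > 1 and part.startswith("0"):
        return int(part, 8)
    return int(part, 10)


def deobfuscate_host(url):
    """Return the plain domain name or dotted-decimal IP hidden in url."""
    host = unquote(urlsplit(url).hostname or "")
    try:
        numbers = [_parse_part(p) for p in host.split(".")]
    except ValueError:
        return host  # not purely numeric, so an ordinary domain name
    if len(numbers) == 1:
        value = numbers[0] % 2**32  # assumed wrap-around for oversized values
    elif len(numbers) == 4 and all(n < 256 for n in numbers):
        value = int.from_bytes(bytes(numbers), "big")
    else:
        return host
    return str(ipaddress.IPv4Address(value))


for url in (
    "http://3640005069/",
    "http://7934972365/",
    "http://0330.0366.0021.0315/",
    "http://%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d/",
):
    print(url, "->", deobfuscate_host(url))
```

The three numeric examples all resolve to the same address, and the percent-encoded one decodes to a plain domain name.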
File structure
- Added Lines to ignore at start of file and Lines to ignore at end of file policies to allow lines at the start and end of the source file to be discarded. This can be useful if your source text is coming from a third party source that adds extra, unwanted, lines.
- Added auto-detect of double spaced files (files where every second line is blank). This will set the Input file is double spaced policy whenever double-spaced text is detected (unless the policy has already been set).
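The effect of these two file structure features can be sketched in a few lines of Python. This is an illustration only: the function names and the minimum-length threshold are assumptions, not AscToRTF's actual rules.

```python
def trim_lines(lines, skip_start=0, skip_end=0):
    """Drop skip_start lines from the top and skip_end lines from the bottom."""
    end = len(lines) - skip_end
    return lines[skip_start:max(end, skip_start)]


def is_double_spaced(lines):
    """True if every second line is blank and all the other lines are not."""
    if len(lines) < 4:
        return False  # too short to judge (assumed threshold)
    text, gaps = lines[0::2], lines[1::2]
    return all(l.strip() for l in text) and not any(l.strip() for l in gaps)


source = ["HEADER", "one", "", "two", "", "three", "", "FOOTER"]
body = trim_lines(source, skip_start=1, skip_end=1)
print(body)
print(is_double_spaced(body))
if is_double_spaced(body):
    body = body[0::2]  # keep only the text lines
print(body)
```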
Page structure
- Added PAGE command. This marks a page boundary. In the RTF this creates a page marker
- Added Mirror margins policy.
Page markers
- Added Input file has page markers and Page marker size (in lines) policies. These allow you to identify that the file has page markers containing form feeds and that the first so many lines after the form feed should be discarded.
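As a rough picture of what the two page marker policies do, the Python sketch below removes each form feed together with a fixed number of lines after it. The function name is hypothetical, and counting the line that immediately follows the form feed as the first marker line is an assumption.

```python
def strip_page_markers(text, marker_size):
    """Remove each form feed and the marker_size lines that follow it."""
    pages = text.split("\f")
    kept = [pages[0]]  # nothing precedes the first page, so keep it whole
    for page in pages[1:]:
        lines = page.split("\n")
        kept.append("\n".join(lines[marker_size:]))
    return "".join(kept)


sample = (
    "last line of page 1\n"
    "\fReport Title   Page 2\n"
    "\n"
    "first line of page 2\n"
)
cleaned = strip_page_markers(sample, marker_size=2)
print(cleaned)
```

Note that `str.split("\n")` is used deliberately: `str.splitlines()` would also break lines at the form feed itself.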
Paragraph structure
- Added Preserve new paragraph offset policy. In documents where a first line offset is detected at the start of each paragraph you can elect to have this preserved in the output.
- Added First line indentation (in blocks) policy. This allows you to specify the size of a first line offset, regardless of whether or not one already exists in the file.
Line structure
Added options to allow more control over how the original document's file structure should be preserved
- Added Treat each line as a paragraph policy. If this option is selected, every line in the source file is treated as a paragraph. This may be suitable if the file has been authored using an editor that wraps the lines (i.e. doesn't put in hard breaks) and which doesn't add blank lines between paragraphs.
- Added Preserve line structure policy. If this option is selected a line break is added to every line, thereby preserving the line structure of the original.
- Added Default TABLE layout policy (also the pre-processor TABLE_LAYOUT command). This allows you to specify the number of columns in each table, and the attributes of each column, specifically the character position that marks the end of each column. Rather than use this policy, it is probably better to use the related pre-processor TABLE_LAYOUT command in the source text on a per-table basis.
- Added Ignore table header during analysis policy (also the pre-processor TABLE_IGNORE_HEADER command). Specifies that table headers should be ignored when columns are being auto-detected. Some tables have complex headers that confuse the analysis. This policy can be used to help them be ignored.
- Added Table extending factor policy. This controls the degree to which pre-formatted lines should be expanded into adjacent text.
- Added Column merging factor policy, which controls the degree to which columns that don't appear to be very clear should be "merged" together.
- Added Could be blank line separated policy. Indicates that tables could be using blank lines to separate rows of data. This affects the analysis and detection of the table's extent.
- Added "Column boundaries have zero width" policy for tables that have no separator character between columns. Can be useful for some tables generated by software.
- Added Look for diagrams policy. Can be used to stop complex tables being wrongly interpreted as "diagrams".
- Added Default TABLE cell alignment and Default TABLE alignment policies to allow you to set the default alignments of data within table cells and of tables on the page.
- Added Ignore table header during analysis policy. For tables with complex headers you can elect to ignore these lines in the calculation of the column structure of the table. This can lead to more accurate results.
- Added Allow automatic centring and Automatic centring tolerance policies. These allow you to look for text that is centred and to specify a tolerance used in this detection.
- Added Look for underlined text policy. This allows text detected as underlined (other than headings) to be underlined in the output.
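The idea behind a centring tolerance can be shown with a short Python sketch. It is an assumption-laden illustration, not the program's actual test: the function name, the default page width used in the example, and the rule that a centred line must be indented are all hypothetical.

```python
def is_centred(line, page_width, tolerance=2):
    """True if the left and right margins differ by at most tolerance."""
    text = line.strip()
    if not text or len(text) >= page_width:
        return False
    left = len(line) - len(line.lstrip())
    right = page_width - left - len(text)
    # Unindented text is treated as left-aligned, never as centred.
    return left > 0 and abs(left - right) <= tolerance


title = " " * 17 + "Title"      # margins of 17 and 18 on a 40-column page
indented = "    Title"          # margins of 4 and 31
print(is_centred(title, 40), is_centred(indented, 40))
```

A larger tolerance accepts text that is only roughly centred, at the cost of occasionally centring an ordinary indented line.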
As with all JafSoft converters, AscToRTF is available under separate license as an Application Programming Interface (API). This API allows software developers to harness the powerful abilities of AscToRTF from within their own software products.
The API is written in C++, and is supplied as either a library or a DLL under Windows. As such it can easily be invoked from C, C++ and Visual Basic software and has also been successfully invoked from inside Java and C# programs.
For users who register, there is a new, separate command line utility called A2HDETAG available so they can "de-tag" their source files of all AscToRTF pre-processor tags, leaving a plain text fit for publishing, e.g. on Usenet.
In conjunction with this, new BEGIN_ASCII ... END_ASCII pre-processor tags have been added. These identify text that will be copied to the output of A2HDETAG. It is ignored in all other conversions, and is intended to allow alternative text to be placed in text and HTML versions of a document.
Added several new pre-processor in-line tags :-
FILENAME | outputs name of file being converted |
FRACTION | outputs a fraction |
VERSION | outputs AscToRTF program name and version number |
IGNORE | multi-line text to be ignored |
IGNORE_THIS | in-line text to be ignored |
To help people better understand how AscToRTF detects and analyses tables, and to know what they can do to aid, improve and correct this process, a new manual, known as the "Table Manual" has been produced. You should look for this on the web site, or check if it has been included with your software installation.
- The main screen now allows access to Policy file selection. Previously this was only available on the menu structure. The Menu structure has been left unchanged, meaning you now have two ways of choosing your policy files.
- The main screen now allows you to search sub folders when using wildcards.
- The main screen also allows you to specify the File conversion type. You can choose to treat the input file as a number of different table types (e.g. tab-delimited data).
- You no longer get prompted to "save policy" just because you pressed OK on one of the policy sheets. Now this only happens when something has been changed.
- The main menu now has a "check for updates" option. If you select this you'll be taken to the JafSoft website where you'll be told if any newer versions of the software have been released.
- Program now remembers positions of windows from one invocation to the next.
- The user interface is now available in Italian, French and Swedish.
- Command line now allows multiple filespecs, separated by spaces. The policy file must now have a .pol extension, rather than being identified as the second argument.
- More changes on bullet characters, in particular to disallow 'O' (upper case) from becoming a bullet character through analysis. This really doesn't work in Portuguese documents :-) 'o' (lower case) may still be detected. If upper case 'O' is wanted this can still be manually switched on.
- Horizontal lines are now implemented as line rules whose length attempts to approximate the original (e.g. 50% or whatever). Previously lines would become full width.
- Bookmark names from filename are now lower case (to reduce possible mismatches)
- Shareware version now expires after 30 days + 5 uses. This will allow people to use the software on 5 different days after the first 30 days, giving people more time to evaluate the software at their leisure.
- Now strip out leading and trailing "---" from heading text to make them more presentable in RTF
- Changed emphasis handling to allow hyphenated parts to be emphasised independently, e.g. *pre*-formatted or pre-*formatted*.
- Fine-tuned the detection of whether or not a file has an in-situ contents list
- The "LINKPOINT" pre-processor tag can now be used as a directive as well as an in-line tag. (see the Tag manual for details).
- Increased maximum width allowed for input lines in tables to 200 (after encountering a sample at 165). Lines longer than this are still disregarded as candidate table lines.
- Improved analysis for tables using bar ('|') column separators
- Improved detection of ASCII art diagrams.
- Improved handling of heavily indented blocks of text. Previously these were (poorly) rendered as tables. Now the tables more accurately preserve the large indentation (see Text block detection).
- The software will now automatically detect where a table is in fact tab-delimited data. Where detected, it will then use that tab structure to calculate columns.
- This document has been completely re-written. It is converted from a single text file into the HTML pages, an RTF file and the Windows Help file using the AscToHTM and AscToRTF programs. You can view the source file for this document as file "asctortf.txt".
- The Tag manual describes the tagging systems available to JafSoft conversion utilities. Note that not all of the tags described there are relevant (or supported) in RTF generation. However many are common between the converters, should you wish to convert the same text file into other formats
- A "Table manual" is under production to explain how to get the most from tables in your conversions. This is expected to appear some time after AscToRTF 2.0 is released.
Released as an "interim" release before version 2, version 1.5 contains a large number of changes, not all of which were fully documented (that will be a large part of the difference between the two versions).
The software was released at this time to give existing and new users a far better version to work with and evaluate.
The initial version is released after months and months (some might say years) of promising it.
Converted from a single text file by AscToHTM © 1997-2004 John A Fotheringham