Documentation for the AscToRTF conversion utility
The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html
The pre-processor allows authors to add special lines to the source document to customise the conversion. This is most useful when you intend to regenerate RTF regularly from a master text document.
The pre-processor is described more fully in a separate document, the Tag manual.
The pre-processor works by giving the software hints and instructions on how to process the text. During the analysis process the software reads the source files line-by-line. The pre-processor recognises special keywords in two ways: as directives, which occupy a line by themselves, and as in-line tags, which are embedded in the text.
In both cases the tag or directive cannot be split over multiple lines; that is, directives must be on a line by themselves, and in-line tags must be wholly contained on a single line.
Contents of this section
Pre-processor Directives
Pre-processor In-line tags
Document commands
- Pre-processor command: DESCRIPTION
- Pre-processor command: KEYWORDS
- Pre-processor command: TITLE
Section delimiters
- Pre-processor command: ALLOW and DISALLOW
- Pre-processor command: ASCII
- Pre-processor command: CONTENTS
- Pre-processor command: CODE
- Pre-processor command: COMMA_DELIMITED_TABLE
- Pre-processor command: DELIMITED_TABLE
- Pre-processor command: DIAGRAM
- Pre-processor command: IGNORE
- Pre-processor command: PRE
- Pre-processor command: TABLE
- Pre-processor command: SECTION
Tagged Table commands
- Tagged table command: BEGIN_USER_TABLE
- Tagged table command: COLUMN_DETAILS
- Tagged table command: NEW_ROW
- Tagged table command: NEW_CELL
- Tagged table: Cell contents
Table modifier commands
- Pre-processor command: TABLE_HEADER_ROWS
- Pre-processor command: TABLE_IGNORE_HEADER
- Pre-processor command: TABLE_LAYOUT
- Pre-processor command: TABLE_MAY_BE_SPARSE
- Pre-processor command: TABLE_MIN_COLUMN_SEPARATION
Other commands
- Pre-processor command: BR
- Pre-processor command: CHANGE_POLICY
- Pre-processor command: FILENAME
- Pre-processor command: FO
- Pre-processor command: FRACTION
- Pre-processor command: GOTO
- Pre-processor command: POPUP
- Pre-processor command: SUPER and SUB
- Pre-processor command: IGNORE_THIS
- Pre-processor command: INCLUDE
- Pre-processor command: PAGE
- Pre-processor command: VERSION
"Directives" consist of a single line in the source file beginning with the string "$_$_" followed by a recognised keyword and any additional "attributes" that the directive supports.
In-line tags, as the name implies, can occur anywhere in the source lines. They are enclosed between the special strings "[[" and "]]". Between these strings the tag consists of a keyword and then any attributes that tag supports.
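As a rough illustration of the two forms described above, the following Python sketch recognises a directive line and extracts the in-line tags from a line of text. It is not AscToRTF's own parser: the function names are hypothetical, and converting keywords to upper case is an assumption made for convenience.

```python
import re

DIRECTIVE_PREFIX = "$_$_"
# An in-line tag: "[[", a keyword, optional attributes, then "]]", all on one line.
INLINE_TAG = re.compile(r"\[\[(\w+)\s*(.*?)\]\]")


def parse_directive(line):
    """Return (keyword, attributes) for a directive line, or None otherwise."""
    if not line.startswith(DIRECTIVE_PREFIX):
        return None
    body = line[len(DIRECTIVE_PREFIX):].strip()
    keyword, _, attributes = body.partition(" ")
    # Assumption: keywords are compared without regard to case.
    return keyword.upper(), attributes.strip()


def find_inline_tags(line):
    """Return a list of (keyword, attributes) for every in-line tag in the line."""
    return [(m.group(1).upper(), m.group(2).strip())
            for m in INLINE_TAG.finditer(line)]
```

For example, `parse_directive("$_$_TITLE My page")` yields the keyword `TITLE` with the rest of the line as its attributes, while a line of ordinary text yields `None`.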
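Useful in-line tags include FILENAME, FO, FRACTION, GOTO, POPUP, SUPER and SUB, IGNORE_THIS and VERSION, all of which are described later in this section.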
These commands are used to control tags placed in the document information section of the created RTF page(s).
Not yet implemented in this release.
DESCRIPTION | Add a description in the document properties |
KEYWORDS | Add keywords to the document properties |
TITLE | Add a Title to the RTF pages |
You can specify a description of your page, to be added to the document information portion of your file, by adding a line of the form
$_$_DESCRIPTION <rest of line is used as a description>
This takes precedence over any description added via a policy file.
Not yet implemented
You can specify keywords that describe the contents of your page to be added to the document information section of your file by adding a line of the form
$_$_KEYWORDS <rest of line is used as a list of keywords>
This takes precedence over any keywords added via a policy file.
Not yet implemented
You can specify the TITLE to be added to your RTF page in the document information section, by adding a line of the form
$_$_TITLE <rest of line is used as a title>
This title takes precedence over any title added via a policy file.
Not yet implemented
These commands mark the start and end of various sections in your document
ALLOW and DISALLOW | Enable and disable certain types of detection. |
ASCII | Mark a section of text for utility A2HDETAG. |
CONTENTS | Mark a section as the contents list. |
CODE | Mark a section as a C-like code sample |
DIAGRAM | Mark a section as a diagram or ASCII Art |
IGNORE | Ignore a section of the document |
PRE | Mark a section as pre-formatted text. |
TABLE | Mark a section as a table. |
COMMA_DELIMITED_TABLE | Mark a section as a comma-delimited data table. |
DELIMITED_TABLE | Mark a section as a delimited data table (tab-delimited by default). |
SECTION | Mark the start of a user-specified section. |
New in version 2.0
AscToRTF will automatically try to detect various typographical features. You can turn this behaviour on and off in different sections by using the ALLOW and DISALLOW commands. This can be used, for example, to prevent a numbered list being wrongly detected as a numbered heading and vice versa.
The syntax for both commands is the same, namely
ALLOW/DISALLOW <comma-separated list of keywords>
Where the recognised keywords are as follows
- Headings: enables/disables the search for lines that could be treated as headings.
- Lists: enables/disables the search for lines that could be regarded as list items (either unordered bullets, or alphabetic or numeric list points).
- All: sets (enables) all of the above.
- Reset: resets (disables) all of the above.
In each case the tag will simply add or subtract from the current list of allowable features. To aid control, two special keywords "all" and "reset" are available for inclusion in the list. "Reset" will disable all options, thus
$_$_ALLOW reset, Headings
will have the effect of disabling everything (the "reset") and then adding "Headings" to the allowed list. In this respect "ALLOW all" and "DISALLOW reset" are identical commands.
Below is an example in which the DISALLOW tag is used to prevent numbered lines being regarded as lists or headings. The ALLOW tag at the end switches back to default behaviour, so if there are any lists or numbered headings elsewhere in the document they will still be detected.
$_$_DISALLOW headings
...
1. Whatever this line is, it isn't a heading
...
$_$_DISALLOW headings,lists
...
2. Whatever this line is, it isn't a heading or a list item
...
$_$_ALLOW reset
New in version 2.0
As of version 2.0, the separate utility A2HDETAG is available to create a plain ASCII file by removing all pre-processor tags from your source file. In this way a source file designed for conversion to RTF by AscToRTF can be "cleaned up" and posted as plain text elsewhere.
To support this, the BEGIN_ASCII and END_ASCII tags can be used to delimit a section of text that will only appear in the version created by A2HDETAG. This allows you to add comments to the "plain text" version, that won't appear in the RTF conversion. Use these commands as follows
$_$_BEGIN_ASCII
You are reading this text in a "cleaned up" version of the source file.
This text won't get copied across when this file is converted to either RTF or HTML
$_$_END_ASCII
You can mark up a section of your document as a contents list. To do this use matching BEGIN_CONTENTS and END_CONTENTS commands as follows:
$_$_BEGIN_CONTENTS
...
$_$_END_CONTENTS
AscToRTF will then attempt to treat the enclosed text as a contents list.
See comments on contents policies
You can mark up a section of your document as being a piece of sample C-like code. To do this use matching BEGIN_CODE and END_CODE commands as follows:
$_$_BEGIN_CODE
...
$_$_END_CODE
AscToRTF will then mark up the enclosed text in fixed width fonts.
See comments on pre-formatted text
New in version 2.0
These commands delimit a table of comma-delimited data
Syntax:
$_$_BEGIN_COMMA_DELIMITED_TABLE
...
(lines of comma-delimited data)
...
$_$_END_COMMA_DELIMITED_TABLE
The BEGIN_COMMA_DELIMITED_TABLE ... END_COMMA_DELIMITED_TABLE directives can be used to delimit a series of comma-delimited data values that should be interpreted as a table (e.g. data originally exported from a spreadsheet such as Excel)
See comments in Pre-processor command: TABLE
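If the data comes from a program rather than a spreadsheet export, the directives can be added as the file is written. The sketch below wraps rows of values in the BEGIN_COMMA_DELIMITED_TABLE and END_COMMA_DELIMITED_TABLE directives using Python's standard `csv` module. The function name is hypothetical, and it is an assumption that AscToRTF accepts the quoting that `csv` applies to values containing commas; values containing line breaks are not handled.

```python
import csv
import io


def comma_delimited_table(rows):
    """Return source lines for a table of comma-delimited data."""
    buffer = io.StringIO()
    # One output line per row; csv quotes any value that contains a comma.
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    data_lines = buffer.getvalue().splitlines()
    return (["$_$_BEGIN_COMMA_DELIMITED_TABLE"]
            + data_lines
            + ["$_$_END_COMMA_DELIMITED_TABLE"])
```

Each directive is emitted on a line by itself, as the pre-processor requires.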
These directives delimit a table of delimited data
Syntax:
$_$_BEGIN_DELIMITED_TABLE [<delimiter>]
...
(lines of delimited data)
...
$_$_END_DELIMITED_TABLE
where
<delimiter> The delimiter character to use. If omitted the default is tab-delimited. The delimiter can be any character except a comma. For comma-delimited tables use the COMMA_DELIMITED_TABLE command instead.
The BEGIN_DELIMITED_TABLE ... END_DELIMITED_TABLE directives can be used to delimit a series of delimited data values that should be interpreted as a table (e.g. data originally exported from a spreadsheet such as Excel)
See comments in Pre-processor command: TABLE
You can mark up a section of your document as being a diagram or a piece of ASCII art. To do this use matching BEGIN_DIAGRAM and END_DIAGRAM commands as follows:
$_$_BEGIN_DIAGRAM
...
$_$_END_DIAGRAM
AscToRTF will then mark up the enclosed text in fixed width fonts.
See comments on pre-formatted text
You can mark up a section in your document that you want ignored in the output. This can be used to store change history information or whatever you want.
Syntax:
$_$_BEGIN_IGNORE
...
(text to be ignored)
...
$_$_END_IGNORE
This markup can be used to delimit a section to be wholly ignored. Any markup and tags in the ignored section will have no effect.
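The effect can be pictured with a short Python sketch that drops everything between the two directives, including any directives inside the ignored section. This is only an illustration of the behaviour described above, not the program's implementation, and the function name is hypothetical.

```python
def strip_ignored(lines):
    """Return the lines that fall outside BEGIN_IGNORE ... END_IGNORE sections."""
    kept = []
    ignoring = False
    for line in lines:
        if line.startswith("$_$_BEGIN_IGNORE"):
            ignoring = True
        elif line.startswith("$_$_END_IGNORE"):
            ignoring = False
        elif not ignoring:
            kept.append(line)
    return kept
```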
You can mark up a section of your document as being pre-formatted text. To do this use matching BEGIN_PRE and END_PRE commands as follows:
$_$_BEGIN_PRE
...
$_$_END_PRE
AscToRTF will then mark up the enclosed text in fixed width fonts.
See comments on pre-formatted text
You can mark up a section of your document as being a text table. To do this use matching BEGIN_TABLE and END_TABLE commands as follows:
$_$_BEGIN_TABLE
...
$_$_END_TABLE
AscToRTF will then analyse the enclosed text to determine the table layout and will generate a proper RTF table.
General comments on marking up tables
AscToRTF has some ability to auto-detect tables (see comments on pre-formatted text), but this can be error-prone. Marking up tables removes a lot of the ambiguity and so can give better results.
For tables of delimited data (as opposed to plain text tables) you should use the DELIMITED_TABLE and COMMA_DELIMITED_TABLE commands.
Note in each case the presence of these directives overrides any value set in the Attempt table generation policy, as that only refers to the auto-detection of tables. Placing markup in the source forces the text to be treated as tables.
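Within each marked-up table other pre-processor commands, such as TABLE_HEADER_ROWS, TABLE_IGNORE_HEADER, TABLE_LAYOUT, TABLE_MAY_BE_SPARSE and TABLE_MIN_COLUMN_SEPARATION, may be used to customise the table.
For a full list see Table modifier commands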
You can mark up sections of your document as being named sections. By default text belongs to a section called "all".
To do so insert a SECTION command at the start of each section as follows:
$_$_SECTION <name>
...
All following text will be marked as belonging to the named section until another SECTION command is encountered. AscToRTF will only copy across those sections named in the allowable sections policy, and any text in "all" sections. In this way you can generate variants of your document for different audiences (e.g. Internet and Intranet).
If you want the rest of your document to be included in all conversions, insert an "all" SECTION command as follows:
$_$_SECTION all
...
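The selection rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program's own logic: the function name and the `allowed` argument (standing in for the allowable sections policy) are hypothetical, and section names are compared without regard to case.

```python
SECTION_DIRECTIVE = "$_$_SECTION"


def filter_sections(lines, allowed):
    """Keep text in the allowed sections, plus any text in "all" sections."""
    wanted = {name.lower() for name in allowed} | {"all"}
    current = "all"  # text before the first SECTION command belongs to "all"
    kept = []
    for line in lines:
        if line.startswith(SECTION_DIRECTIVE):
            current = line[len(SECTION_DIRECTIVE):].strip().lower()
            continue  # the directive itself is not copied to the output
        if current in wanted:
            kept.append(line)
    return kept
```

Calling it once with `["internet"]` and once with `["intranet"]` on the same source would give the two variants of the document.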
New in version 2.0
In addition to converting plain text files, and sets of delimited data into tables, the software also supports a method of explicitly tagging the input as to how it should be placed in a table.
This may seem extreme, as the point of the converters is to generate the desired markup and save work, but there are a couple of situations in which this approach can be useful.
- If you are converting the source to both RTF and HTML, the same input file can be used to generate the desired output format
- If the file you are converting is being generated by some software it may be easy to add the required tags, and by so doing get the required output with minimal changes and maximum accuracy.
Here's a sample of a user-tagged table (with blank lines added for clarity) :-
$_$_BEGIN_USER_TABLE C,1 in
$_$_COLUMN_DETAILS 1,,,L, 2 in
$_$_COLUMN_DETAILS 2,,,C, 1 ins
$_$_TABLE_BORDER 1

$_$_NEW_ROW HEAD
$_$_NEW_CELL
Substance (units)
$_$_NEW_CELL
Year Sampled

$_$_NEW_ROW DATA
$_$_NEW_CELL
Alpha emitters (pCi/L)
$_$_NEW_CELL
1999

$_$_NEW_ROW DATA
$_$_NEW_CELL
Asbestos (MFL)
$_$_NEW_CELL
1993

$_$_END_TABLE
Here's how this table appears when converted into the current format
Substance (units) | Year Sampled |
---|---|
Alpha emitters (pCi/L) | 1999 |
Asbestos (MFL) | 1993 |
To identify a section of a source file as a user table, it must be enclosed in the BEGIN_USER_TABLE ... END_TABLE commands as follows
$_$_BEGIN_USER_TABLE <arguments>
...
<other commands to layout the table>
...
$_$_END_TABLE
The command line can take arguments as follows
$_$_BEGIN_USER_TABLE <alignment>,<margin>
where
<alignment> The alignment of the table. This can be L(eft), R(ight) or C(enter).
<margin> The margin to be applied to the table. This consists of a number and a unit. Recognised units include points ("pts" or "pt"), inches ("ins" or "in") and centimetres ("cm"). In HTML generation these margins will be approximate only.
After the BEGIN_USER_TABLE line will appear a number of COLUMN_DETAILS lines. These are optional, but if present they give details of the characteristics of each column in the table as follows :-
$_$_COLUMN_DETAILS <col_no>,<align>,<width>
where
<col_no> This is the column number, starting at 1.
<align> This is the alignment of data in this column. If omitted this will be auto-detected, but you can choose to set it to L(eft), R(ight) or C(enter).
<width> The width of the column. If omitted the width will be calculated. As with the <margin> on the table the width can be specified in points, inches or centimetres. If a width is set too narrow, it may be ignored.
Each new row is identified by the presence of a NEW_ROW command on a line by itself. The format is
$_$_NEW_ROW <row_type>
where
<row_type> This is the row type. Options include
HEAD This is a header row
DATA This is a data row
LINE This is a line in the table
The type may be omitted, in which case the default is "DATA".
Except when the NEW_ROW is a "LINE", this command should be followed by a series of NEW_CELL commands and their matching cell data, normally one per column.
Except for "LINE" rows, each new cell in a row is identified by the presence of a NEW_CELL command on a line by itself. The contents of the cell follow on subsequent lines until either another NEW_CELL, NEW_ROW or END_TABLE command is encountered.
The format of the NEW_CELL command is
$_$_NEW_CELL
At present the NEW_CELL command doesn't take any arguments.
Anything following a NEW_CELL command up until the next NEW_CELL, NEW_ROW or END_TABLE commands will be added into the current cell. The line structure will be preserved, so that if you have three lines of text following a NEW_CELL command, this will appear as a cell in the table with three lines of data in it.
The alignment of the cell will normally be that of the column the cell is in. This will either have been calculated automatically for the column as a whole, or will be the value passed in via the matching COLUMN_DETAILS line, earlier in the table definition.
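Where the source file is generated by software, the tagged table commands can be written out mechanically. The Python sketch below builds a user-tagged table from a header row and a list of data rows, using only the BEGIN_USER_TABLE, NEW_ROW, NEW_CELL and END_TABLE commands described above. The function name and its default arguments are hypothetical, and the optional COLUMN_DETAILS lines are left out so that alignment and width are calculated automatically.

```python
def user_table(header, rows, alignment="C", margin="1 in"):
    """Return source lines for a user-tagged table."""
    lines = [f"$_$_BEGIN_USER_TABLE {alignment},{margin}"]
    typed_rows = [("HEAD", header)] + [("DATA", row) for row in rows]
    for row_type, cells in typed_rows:
        lines.append(f"$_$_NEW_ROW {row_type}")
        for cell in cells:
            # NEW_CELL sits on a line by itself; the cell contents follow it.
            lines.append("$_$_NEW_CELL")
            lines.extend(str(cell).splitlines())
    lines.append("$_$_END_TABLE")
    return lines
```

A cell value containing line breaks becomes several lines of cell data, matching the way line structure is preserved within a cell. Cell text that itself begins with "$_$_" would be read as a directive, so a real generator would need to guard against that.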
These commands can be used to tailor the appearance of a table. They're usually placed between the BEGIN_TABLE ... END_TABLE for the table they will affect, but they can also be placed at the top of the document to define defaults for all tables.
This specifies how many rows in the table should be regarded as the table header.
New in version 2.0
This directive specifies that a table header should be ignored during the column analysis.
Syntax:
$_$_TABLE_IGNORE_HEADER
This tag has no attributes.
If present, indicates that the first few lines of the table - assumed to be the header - should be ignored when calculating the table's column structure.
This should be enabled if the table has a particularly complex header that may confuse the program.
This command has the same effect as the policy Ignore table header when analysing columns, but can be applied on a table-by-table basis when enclosed between TABLE command markers.
New in version 2.0
This directive allows you to specify the column structure of a table.
Syntax:
$_$_TABLE_LAYOUT <number_of_cols>,"<col_1_spec>","<col_2_spec>",.....
where
<number_of_cols> Integer number of columns
<col_n_spec> Specification of the nth column. The specification must be contained in quotes. Currently the specification consists of just the end position of the column. More may be added in later versions.
An example would be
$_$_TABLE_LAYOUT 3,"6","21","32"
which describes a 3-column table with column boundaries at the 6th, 21st and 32nd character positions.
Normally this directive should be placed between the BEGIN_TABLE...END_TABLE directives for the table it applies to, thereby overriding the "intelligent" analysis the program would otherwise attempt for a plain text table.
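To make the meaning of the end positions concrete, here is a small Python sketch that reads a TABLE_LAYOUT line and cuts a line of text into cells at those positions. The function names are hypothetical, and treating each end position as a 1-based, inclusive character position is an assumption about how the program counts.

```python
def parse_table_layout(directive):
    """Return the column end positions from a TABLE_LAYOUT directive line."""
    prefix = "$_$_TABLE_LAYOUT"
    if not directive.startswith(prefix):
        raise ValueError("not a TABLE_LAYOUT directive")
    fields = [field.strip() for field in directive[len(prefix):].split(",")]
    count = int(fields[0])
    # Each column specification is quoted and currently holds just an end position.
    ends = [int(field.strip('"')) for field in fields[1:]]
    if len(ends) != count:
        raise ValueError("column count does not match the number of specifications")
    return ends


def split_columns(line, ends):
    """Cut a line of text into cells at the given end positions."""
    cells = []
    start = 0
    for end in ends:
        cells.append(line[start:end].strip())
        start = end
    return cells
```

With the example directive above, the first cell is taken from characters 1 to 6, the second from 7 to 21 and the third from 22 to 32.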
This specifies that the table may be sparse, i.e. largely empty in places. There is no data value required on this command.
See also expect sparse tables policy
This specifies the minimum number of spaces to be regarded as a column separator. The default value is 1, but occasionally this gives too many columns, especially in short tables. Increasing this value will reduce the number of columns calculated.
BR | Insert a line break |
CHANGE_POLICY | Dynamically vary policies through the input file |
FILENAME | Output the original filename |
FO | Change the prevailing font |
FRACTION | Output a fraction |
GOTO | Add a hyperlink to a section title |
IGNORE_THIS | Ignore some text in the source |
INCLUDE | Include an external file into the source |
PAGE | Create a page boundary at this location |
POPUP | Add a pop-up hyperlink to a section title |
SUPER and SUB | Add superscripts and subscripts |
VERSION | Output the program version used in this conversion |
This command tells the software to output a line break at this point. Usually the default is to let all lines flow together to form a paragraph. This command can be used, for example, in address lines to make sure each line is placed on a new line.
This option allows you to embed policy lines in the source document. This can be used to avoid the need for separate policy files, or to change the policy at different locations within the document (although the effects can sometimes be unpredictable).
The syntax is
$_$_CHANGE_POLICY <policy line as in policy file>
For example placing
$_$_CHANGE_POLICY Convert mailto links : yes
would make all subsequent email addresses be converted into working hyperlinks. By adding several lines of this type you can toggle this behaviour on and off, controlling which email links become hyperlinks and which do not.
This in-line tag substitutes the name of the file being converted
Syntax:
[[FILENAME]]
The tag will be replaced by the name of the file being converted. This facilitates the construction of sentences like
"This file was converted from [[FILENAME]] at [[TIMESTAMP]] "
which becomes
"This file was converted from asctortf.txt at 22-Feb-2004"
New in version 2.0
NOTE: The FO tag is only currently supported in RTF generation.
This in-line tag allows the font used in a document to be changed, either locally within some text, or from this point onwards.
The FO tag should be used in conjunction with a Style Definition File (SDF), which can be used to define the "font id"s that are used
Syntax:
FO [<font_id>],[<font_size>],[<font_weight>]
where
<font_id> Identifies the font to be used. This must match the name of a font in the SDF file. If no name is given then the prevailing font will be used.
<font_size> The font size in pts. Only needed if the default value in the font table is to be overridden. The size can be supplied as an absolute value or - if a plus or minus sign is present - as a relative size. So for example "4" means 4pt, whereas "+4" means 4pt larger than the surrounding text. A value of "-" will be taken as a reset to the prevailing default font size.
<font_weight> The font weight. Only needed if the default value in the font table is to be overridden. Possible values are
it (Italic)
bo (Bold)
bi (Bold Italic)
no (Normal)
- (Reset)
The "reset" will cause the weight to be reset to the prevailing default, i.e. no longer override the prevailing font.
Example:
"This text is [[fo ,+6,bo]]big and bold,[[fo ,-,-]] but this text is normal again"
becomes
"This text is big and bold, but this text is normal again"
(this may only work in the RTF version of this document)
See also Scope for font tags
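The rules for the font size value can be summarised in a few lines of Python. This is a sketch of the rules as stated above, not the program's code: the function and parameter names are hypothetical, and leaving the size unchanged when the value is omitted is an assumption.

```python
def resolve_font_size(spec, current_size, default_size):
    """Return the font size in points implied by an FO font size value."""
    spec = spec.strip()
    if spec == "":
        return current_size              # assumption: omitted means no change
    if spec == "-":
        return default_size              # "-" resets to the prevailing default
    if spec[0] in "+-":
        return current_size + int(spec)  # signed value: relative to surrounding text
    return int(spec)                     # unsigned value: absolute size in points
```

So with surrounding text at 10pt, a value of "+6" gives 16pt, a value of "4" gives 4pt, and a value of "-" returns to the default size.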
This in-line tag implements a fraction
Syntax:
[[FRACTION <expression>]]
where
<expression> This is the fraction expression which should contain a slash ("/") separating the numerator and denominator. Both values must be present.
So for example
The fractions [[FRACTION 5/16]] and 1[[FRACTION 1/2]].
becomes
The fractions 5/16 and 1 1/2 (each set as a fraction in the output).
New in version 2.0
This in-line tag adds a hyperlink to the named section heading.
Syntax:
[[GOTO <Heading_name>]]
where
<Heading_name> Name of a heading elsewhere in the file. The text used must match exactly for this tag to work (although the match is case-insensitive).
It creates a hyperlink to the named section heading. The heading must match the text exactly, and be in the same file. It must also have been recognised by AscToRTF as a heading.
If making RTF WinHelp source files, see also the POPUP command.
New in version 2.0
This in-line tag adds a hyperlink to the named section heading.
Syntax:
[[POPUP <Heading_name>]]
This behaves in an identical manner to the GOTO unless you are creating an RTF file for use as a Windows Help file, in which case the hyperlink becomes a pop-up link, instead of a full "go to" link.
These in-line tags implement superscripts and subscripts
Syntax:
[[SUPER <expression>]]
[[SUB <expression>]]
So for example
This[[SUPER superscript]] and that[[SUB subscript]]
becomes
Thissuperscript and thatsubscript (with "superscript" raised and "subscript" lowered in the output)
This is an in-line tag whose contents are ignored. It could be used for comments.
Syntax:
[[IGNORE_THIS <anything_you_like>]]
This tag is ignored. It is replaced by a single space in the output stream. It could be used to add a brief comment to your source that would not appear in the output.
See also the IGNORE command
You can include one source file in another by using the include command as follows:-
$_$_INCLUDE filename
Make sure the file is accessible from wherever AscToRTF is run, or in the same directory as the original source file. AscToRTF will read the file on each pass, treating its contents as part of the main file for both analysis and conversion purposes.
Note, the include file should be plain text, which will be converted as normal for the document. It may include other pre-processor commands including further INCLUDE commands up to a limit of 9 levels. Be careful not to set up include loops (i.e. a includes b, b includes c, c includes a, etc).
Include files like this can be a useful way of embedding standard disclaimers etc, and complement the use of headers and footers.
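The nesting limit and the danger of include loops can be illustrated with a short Python sketch. It is not how AscToRTF itself reads files: the function name is hypothetical, the `files` dictionary (file name to list of lines) stands in for reading from disk, and counting the 9 levels from the first included file is an assumption.

```python
MAX_INCLUDE_DEPTH = 9
INCLUDE_DIRECTIVE = "$_$_INCLUDE"


def expand_includes(name, files, stack=()):
    """Return the lines of a file with INCLUDE directives expanded in place."""
    if name in stack:
        raise ValueError("include loop: " + " -> ".join(stack + (name,)))
    if len(stack) > MAX_INCLUDE_DEPTH:
        raise ValueError("includes nested more than 9 levels deep")
    expanded = []
    for line in files[name]:
        if line.startswith(INCLUDE_DIRECTIVE):
            target = line[len(INCLUDE_DIRECTIVE):].strip()
            # The included file is treated as part of the including file.
            expanded.extend(expand_includes(target, files, stack + (name,)))
        else:
            expanded.append(line)
    return expanded
```

Tracking the chain of files currently being expanded is what lets the sketch report a loop instead of recursing until the depth limit is hit.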
New in version 2.0
The syntax is
$_$_PAGE
This signals a page boundary. In RTF generation a page break will be generated at this point. In HTML the concept of page boundaries isn't really supported, so a horizontal rule <HR> is put out instead.
This in-line tag adds a description of the program name/version used to convert the files (e.g. "AscToRTF 2.1")
Syntax:
[[VERSION]]
Outputs the name and version of the conversion program into the output file. For example "AscToHTM 4.2 beta".
Converted from a single text file by AscToHTM © 1997-2004 John A Fotheringham