Tag Manual for JafSoft text conversion utilities

The most recent version of this document can always be found online

This file was generated by AscToHTM 5.0 from this source file on 14-Dec-2004

1 Introduction

JafSoft Limited have produced the following programs

AscToHTM converts plain text files into HTML files

AscToRTF converts plain text files into RTF files

Detagger converts HTML files to text, and can selectively remove markup

These programs share the same text analysis engine, and should do a good job of understanding the structure of the text document and replicating it in the target format.

However, users will frequently want control over how the output looks, and occasionally the text analysis will go wrong. For that reason the software supports two mechanisms :-

A large number of configuration options that act as rules or "policies". These policies may be saved in an external file called a policy file, so that the same combination of policies can be re-loaded. The use of policies is described fully in the Policy manual.

The addition of special tags to the source file, which tell the program's "pre-processor" how to process the file.

This document describes the tagging system that the software supports.

CONTENTS LIST

1 Introduction

1.1 Overview of the pre-processor
1.2 Directives
1.3 In-line tags
1.4 Tag attribute lists
1.5 Error handling

1.5.1 Unrecognised and unimplemented tags
1.5.2 Parse errors

1.6 Tagging restrictions

2 Using the pre-processor

2.1 Marking up sections of text
2.2 Commands that influence the indexing of the document
2.3 Useful one-line pre-processor commands
2.4 Useful in-line tags
2.5 The TABLE commands
2.6 The CHANGE_POLICY command
2.7 Definition blocks and variables
2.8 HTML colours
2.9 English/American spellings

3 Using HTML fragments

3.1 Overview
3.2 How to define a HTML fragment
3.3 Fragment tags

3.3.1 Tags for navigation bar fragments
3.3.2 Tags for horizontal rule fragments
3.3.3 Tags for frame fragments
3.3.4 Tags for Heading fragments
3.3.5 Tags for Table of Contents fragments

3.4 Reserved HTML fragment names

3.4.1 Navigation bar fragments
3.4.2 The horizontal rule fragment
3.4.3 Table of contents fragments
3.4.4 HTML headers, footers and JavaScript fragments
3.4.5 HTML header and footers (inside FRAMES) fragments
3.4.6 Heading fragments

3.5 A sample HTML fragments file

4 Change history
5 Complete TAG list

ALLOW and DISALLOW
BASEHREF
BEGIN/END_ASCII
BEGIN/END_CODE
BEGIN/END_COMMA_DELIMITED_TABLE
BEGIN/END_CONTENTS
BEGIN/END_DELIMITED_TABLE
BEGIN/END_DIAGRAM
BEGIN/END_HTML
BEGIN/END_IGNORE
BEGIN/END_PRE
BEGIN/END_TABLE
BEGIN/END_USER_TABLE
BR (line break)
CHANGE_POLICY
COLUMN_DETAILS
CONTENTS_LIST
DATA
DEFINE/END_BLOCK and RESET_BLOCK
DEFINE_HTML_FRAGMENT and RESET_HTML_FRAGMENT
DEFINE_VARIABLE
DESCRIPTION
EMBED_BLOCK
ENTITY
FILENAME
FO (font) tag
FONT
FRACTION
GOTO
HTML
HTML_COMMENT
HTML_LINE
HYPERLINK
IGNORE_THIS
INCLUDE
INSERT_BLOCK
INSERT_FRAGMENT
KEYWORDS
LINERULE
LINKPOINT
META_TAG
NAVIGATION_BAR
NB "non-breaking spaces"
NEW_ROW
NEW_CELL
PAGE
POPUP
SAVE/RESTORE_CONTEXT
RULESET
SECTION
SOURCE_FILE
SPACES
SHORTCUT_ICON
STYLE_SHEET
SUPER and SUB
TABLE_ALIGN
TABLE_BGCOLOR
TABLE_BORDER
TABLE_BORDERCOLOR
TABLE_CAPTION
TABLE_CELLPADDING
TABLE_CELLSPACING
TABLE_CELL_ALIGN
TABLE_COLO(U)R_ROWS
TABLE_CONVERT_XREFS
TABLE_EVEN_ROW_COLO(U)R
TABLE_HEADER_COLS
TABLE_HEADER_ROWS
TABLE_IGNORE_HEADER
TABLE_LAYOUT
TABLE_MAY_BE_SPARSE
TABLE_MIN_COLUMN_SEPARATION
TABLE_ODD_ROW_COLO(U)R
TABLE_WIDTH
TEXT
TIMESTAMP
TITLE
TOC
VARIABLE
VERSION

1.1 Overview of the pre-processor

During the analysis process the software reads the source files line-by-line. The pre-processor recognises special keywords in two ways :-

Directives

"Directives" consist of a single line in the source file beginning with the string "$_$_" followed by a recognised keyword and any additional "attributes" that the directive supports.

In-line tags

In-line tags, as the name implies, can occur anywhere in the source lines. They are enclosed between the special strings "[[" and "]]". Between these strings the tag consists of a keyword and then any attributes that tag supports.

In both cases the tag or directive cannot be split over multiple lines: directives must be on a line by themselves, and in-line tags must be wholly contained on a single line.

Examples of a directive and in-line tags are shown below.

      $_$_LINERULE 75%

      The BR tag means this text will be broken [[BR]] into two lines.

becomes :-

The BR tag means this text will be broken
into two lines.

Some tags can be expressed in either directive or in-line form.
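
The visible effect of the in-line BR tag in the example above can be imitated in a few lines of Python. This is only a sketch of the behaviour shown, not the software's own code; the function name expand_br is hypothetical, and the "[[" and "]]" delimiters are the defaults described above.

```python
def expand_br(line, open_delim="[[", close_delim="]]"):
    """Break a source line wherever an in-line BR tag occurs."""
    tag = open_delim + "BR" + close_delim
    # Split on the tag and trim the spaces left on either side of it.
    parts = [part.strip() for part in line.split(tag)]
    return "\n".join(parts)


source = "The BR tag means this text will be broken [[BR]] into two lines."
print(expand_br(source))
```

A line containing no BR tag is returned unchanged apart from the trimming of leading and trailing spaces.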

1.2 Directives

Directives exist on a line by themselves in the source text. They have the form

$_$_<keyword> <attribute_list>

where the "$_$_<keyword>" must occur at the start of the line and the <keyword> must be recognised.

If the "$_$_<keyword>" is not at the start of the line, the directive is ignored and treated as end-user text. This device has been used to date to aid conversion of the software's own documentation to HTML and Windows Help files.

The format of the <attribute_list> depends on the particular tag, but a general description is given in 1.4.
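
The start-of-line rule can be sketched as follows. This is an illustration under stated assumptions rather than the pre-processor's actual logic: the function name parse_directive is hypothetical, and the sketch does not check that the keyword is one the software recognises.

```python
DIRECTIVE_PREFIX = "$_$_"


def parse_directive(line):
    """Return (keyword, attribute_text) for a directive line, else None."""
    if not line.startswith(DIRECTIVE_PREFIX):
        # Prefix absent or not at the start of the line: end-user text.
        return None
    body = line[len(DIRECTIVE_PREFIX):].rstrip("\n")
    keyword, _, attribute_text = body.partition(" ")
    return keyword, attribute_text.strip()


print(parse_directive("$_$_LINERULE 75%"))
print(parse_directive("Type $_$_LINERULE 75% to draw a rule."))
```

The second call returns None because the "$_$_" string is not at the start of the line, so the line would be passed through as ordinary text.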

1.3 In-line tags

In-line tags may occur anywhere within end-user text, but not on directive lines or inside other in-line tags. In-line tags have the form

[[<keyword> <attribute_list>]]

that is, the <keyword> and its <attribute_list> are between "[[" and "]]" delimiters. At present the start and end delimiters must lie on the same line of the source text.

The <keyword> must be recognised, even if the tag is not yet fully implemented.

Note, the delimiters "[[" and "]]" are themselves dynamically configurable.

The format of the <attribute_list> depends on the particular tag, but a general description is given in 1.4.
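
Because the delimiters are configurable, any code that looks for in-line tags needs to take them as parameters. The sketch below shows one way this could be done; find_inline_tags is a hypothetical name, the regular expression is an assumption of this example, and TAG is the placeholder keyword used elsewhere in this manual.

```python
import re


def find_inline_tags(line, open_delim="[[", close_delim="]]"):
    """List (keyword, attribute_text) for each in-line tag on one line."""
    pattern = re.compile(
        re.escape(open_delim) + r"(.*?)" + re.escape(close_delim)
    )
    tags = []
    for match in pattern.finditer(line):
        keyword, _, attribute_text = match.group(1).strip().partition(" ")
        tags.append((keyword, attribute_text.strip()))
    return tags


print(find_inline_tags("first [[TAG 1,2]] second [[BR]] third"))
print(find_inline_tags("first {{BR}} second", "{{", "}}"))
```

The search is confined to a single line, which matches the requirement that both delimiters lie on the same line.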

1.4 Tag attribute lists

The <attribute_list> should be a comma-delimited set of attribute values. The number and types of attributes expected will depend on the tag concerned.

Some tags allow attributes to be optionally omitted, and a default value used instead. If the attribute being omitted is not the last on the list, then a place-saving comma should be supplied.

Examples

TAG 1,2,3,4 // full list

TAG 1,,3,4 // argument 2 is missing

TAG 1,,3, // arguments 2 and 4 are missing

TAG 1,,3 // arguments 2 and 4 are missing

TAG 1 // arguments 2, 3 and 4 are missing

If a mandatory argument is missing (one for which no default value is permitted), a TAG_ERROR will be signalled.
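
The place-saving comma rule can be sketched as below. The names fill_defaults, TagError and REQUIRED are hypothetical and stand in for whatever the software does internally; the sketch splits on every comma, so it does not cope with quoted strings that themselves contain commas.

```python
class TagError(Exception):
    """Stand-in for the TAG_ERROR condition; the name is hypothetical."""


REQUIRED = object()  # marks an argument for which no default is permitted


def fill_defaults(attribute_text, defaults):
    """Return one value per expected argument, using defaults for gaps."""
    if attribute_text.strip():
        supplied = [value.strip() for value in attribute_text.split(",")]
    else:
        supplied = []
    result = []
    for position, default in enumerate(defaults):
        value = supplied[position] if position < len(supplied) else ""
        if value == "":
            if default is REQUIRED:
                raise TagError(f"argument {position + 1} is mandatory")
            value = default
        result.append(value)
    return result


defaults = ["a", "b", "c", "d"]
for text in ["1,2,3,4", "1,,3,4", "1,,3,", "1,,3", "1"]:
    print(text, "->", fill_defaults(text, defaults))
```

In this sketch "1,,3," and "1,,3" give the same result, mirroring the two equivalent forms in the examples above.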

Each attribute in the list will be of a particular type. Supported types are

numeric values (mostly integers)

strings (enclosed in quotes, see below)

coded lists... e.g. an alignment attribute can take the values 'L' for Left, 'R' for Right etc.

If a particular attribute value is incorrect for the expected type, then a TAG_ERROR will be signalled.
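
Type checking of a single attribute might look something like the following. Everything named here is an assumption made for illustration: convert_attribute, TagError and the ALIGNMENT_CODES table are hypothetical, and only the 'L' and 'R' codes mentioned above are included.

```python
class TagError(Exception):
    """Stand-in for the TAG_ERROR condition; the name is hypothetical."""


# Hypothetical coded list: only 'L' and 'R' are named in the text above.
ALIGNMENT_CODES = {"L": "Left", "R": "Right"}


def convert_attribute(value, kind):
    """Convert one raw attribute string to its expected type."""
    if kind == "numeric":
        try:
            return int(value)  # mostly integers
        except ValueError:
            try:
                return float(value)
            except ValueError:
                raise TagError(f"{value!r} is not a numeric value") from None
    if kind == "string":
        return value
    if kind == "alignment":
        try:
            return ALIGNMENT_CODES[value]
        except KeyError:
            raise TagError(f"{value!r} is not an alignment code") from None
    raise ValueError(f"unknown attribute type {kind!r}")


print(convert_attribute("4", "numeric"))
print(convert_attribute("L", "alignment"))
```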

String attributes should be placed in quotes if they contain commas themselves. It's probably good practice to place them in quotes in any case. Quotes within string attributes should be doubled up e.g.

"This string has the word ""quotes"" in quotes"

becomes

This string has the word "quotes" in quotes

Here are some examples

TAG 1,2 // arg 1 is '1', arg 2 is '2'

TAG ,2 // arg 1 is missing, arg 2 is '2'

TAG "one",2 // arg 1 is 'one', arg 2 is '2'

TAG "say one",2 // arg 1 is 'say one', arg 2 is '2'

TAG "a,b", 2 // arg 1 is 'a,b', arg 2 is '2'

TAG "say ""one""", 2 // arg 1 is 'say "one"', arg 2 is '2'

TAG // both arguments missing, defaults will be used

TAG , // both arguments missing, defaults will be used

TAG ,, // both arguments missing, defaults will be used
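
The quoting rules above (commas protected by quotes, embedded quotes doubled) happen to match the conventions of Python's standard csv module, so a rough equivalent can be sketched with it. The function name split_attributes is hypothetical, and this is not claimed to be how the software itself parses attribute lists.

```python
import csv


def split_attributes(attribute_text):
    """Split a comma-delimited attribute list, honouring doubled quotes."""
    text = attribute_text.strip()
    if not text:
        return []
    # doublequote (the default) turns "" inside a quoted string into ";
    # skipinitialspace drops the space that may follow each comma.
    reader = csv.reader([text], skipinitialspace=True)
    return next(reader)


for text in ['1,2', ',2', '"say one",2', '"a,b", 2', '"say ""one""", 2']:
    print(text, "->", split_attributes(text))
```

A missing argument comes back as an empty string, which a caller could then replace with the default value for that position.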

1.5 Error handling

1.5.1 Unrecognised and unimplemented tags

Unrecognised and unimplemented tags will be signalled via messages. These messages will be given different severities. The software allows messages to be filtered by severity and type, so the testing and production versions of the software can be made to report differently.

1.5.2 Parse errors

All failures to fully parse a tag will be reported. The actual error recovery (if any) will vary on a tag-by-tag basis. Usually the tag will simply be ignored, removed from the end-user text and signalled as being in error. Occasionally a certain default behaviour may be possible.

For example, a failure to fully parse the attribute list of a LINERULE tag would probably default to outputting a simple <HR> into the HTML, rather than completely ignoring the LINERULE request.
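
The fallback idea can be sketched as follows. This assumes, purely for illustration, that the LINERULE attribute is a single percentage width (as in the "75%" example earlier) and that it maps onto the WIDTH attribute of <HR>; neither assumption is confirmed by this manual, and render_linerule is a hypothetical name.

```python
import re


def render_linerule(attribute_text, messages):
    """Return HTML for a rule, falling back to a plain <HR> on bad input."""
    text = attribute_text.strip()
    if not text:
        return "<HR>"  # no attributes supplied: plain rule (assumed default)
    match = re.fullmatch(r"(\d{1,3})%", text)
    if match and 0 < int(match.group(1)) <= 100:
        return f'<HR WIDTH="{match.group(1)}%">'
    # Report the parse failure, but still honour the request for a rule.
    messages.append(f"TAG_ERROR: cannot parse LINERULE attributes {text!r}")
    return "<HR>"


messages = []
print(render_linerule("75%", messages))
print(render_linerule("wide", messages))
print(messages)
```

The point of the design is that the error is reported and the output still contains a rule, rather than the request being dropped entirely.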

1.6 Tagging restrictions

Most tags are either directives or in-line tags. Only a few may be used as either.

The tag must be wholly contained on a single line in the source file.

Directives must be on a line by themselves.

You can have multiple in-line tags on one line, but no single in-line tag may be spread over multiple lines.

Nested in-line tags are not allowed.
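
A source line can be checked against the in-line restrictions above with a simple scan. The sketch below is illustrative only; check_inline_tags and its message strings are hypothetical, and the default delimiters are assumed.

```python
def check_inline_tags(line, open_delim="[[", close_delim="]]"):
    """Return a list of restriction violations found on one source line."""
    problems = []
    depth = 0
    i = 0
    while i < len(line):
        if line.startswith(open_delim, i):
            if depth > 0:
                problems.append("nested in-line tag")
            depth += 1
            i += len(open_delim)
        elif depth > 0 and line.startswith(close_delim, i):
            depth -= 1
            i += len(close_delim)
        else:
            i += 1
    if depth > 0:
        # The closing delimiter would have to be on a later line.
        problems.append("in-line tag not closed on this line")
    return problems


print(check_inline_tags("one [[BR]] two [[BR]] three"))
print(check_inline_tags("one [[TAG [[BR]] ]] two"))
print(check_inline_tags("one [[BR"))
```

Several complete tags on one line produce no complaints, while a nested tag or a tag left open at the end of the line is reported.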
