Documentation for the AscToRTF conversion utility
The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html
New in version 2.0
As of version 2.0, AscToRTF allows the use of "Text Commands". These are
commands that allow you to modify the text before it is converted, or
to label certain lines as being of a particular type.
The commands should be placed in an external "Text Command File". This file can be selected using the Conversion Options -> Config File Locations menu option.
Contents of this section
  Text Commands available
    Text Command : ignore_line
    Text Command : remove_text
    Text Command : replace_text
    Text Command : treat_line
    Text Command : meta_tag_line
  Text Command line elements
    line_selection
    line_match
    match_type
    replace_type
    as_line_type
  An example Text Command File
The ignore_line command identifies lines that should be ignored in the input.
Syntax:
ignore_line <line_selection>
Any line matching the specified line_selection criteria will be omitted
from the output. This can be a useful way of removing page markers from an
input file, as these don't always convert well.
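Conceptually, ignore_line acts as a filter over the input lines. The Python sketch below illustrates only that idea and is not AscToRTF's implementation; `apply_ignore_line` and `is_page_marker` are hypothetical names, and the simple predicate stands in for a full line_selection (here roughly `containing exact_string "PAGE"`).

```python
from typing import Callable, Iterable, List


def apply_ignore_line(lines: Iterable[str], selected: Callable[[str], bool]) -> List[str]:
    """Return the input lines with every selected line dropped (hypothetical helper)."""
    return [line for line in lines if not selected(line)]


# Hypothetical predicate standing in for:  ignore_line containing exact_string "PAGE"
def is_page_marker(line: str) -> bool:
    return "PAGE" in line  # case-sensitive, as exact_string would be


text = [
    "Chapter 1",
    "Some body text.",
    "-- PAGE 2 --",
    "More body text.",
]
print(apply_ignore_line(text, is_page_marker))
# ['Chapter 1', 'Some body text.', 'More body text.']
```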
The remove_text command identifies text that should be removed from the input.
Syntax:
remove_text <match_type> "match string"
Any line containing text that matches the specified match_type for the supplied "match string" will have the matching text removed.
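As a rough illustration of remove_text, the sketch below deletes matching text from a single line. It models only the string and exact_string match types, and it assumes that every occurrence on the line is removed, which this documentation does not state explicitly; `remove_text` here is a hypothetical Python function, not part of the program.

```python
import re


def remove_text(line: str, match_type: str, match_string: str) -> str:
    """Delete every occurrence of match_string from line (hypothetical sketch).

    Only "string" (case-insensitive) and "exact_string" (case-sensitive) are
    modelled; removing *all* occurrences is an assumption.
    """
    if match_type not in ("string", "exact_string"):
        raise ValueError(f"match type not modelled in this sketch: {match_type}")
    flags = 0 if match_type == "exact_string" else re.IGNORECASE
    # re.escape makes the match string literal, so "." or "*" are not regex syntax
    return re.sub(re.escape(match_string), "", line, flags=flags)


print(repr(remove_text("head_1 Introduction", "exact_string", "head_1")))  # ' Introduction'
print(repr(remove_text("HEAD_1 Introduction", "exact_string", "head_1")))  # 'HEAD_1 Introduction'
print(repr(remove_text("HEAD_1 Introduction", "string", "head_1")))        # ' Introduction'
```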
The replace_text command identifies text that should be replaced in the input.
Syntax:
replace_text <match_type> "match string" by_string "new string"
or
replace_text <match_type> "match string" by_character "<char>"
Any line containing text that matches the specified match_type for the supplied
"match string" will have the matching text replaced.
If the replacement is specified as
by_string "new string"
then the text is replaced by the new string. If the replacement is specified as
by_character "<char>"
then the string is replaced by a string of equal length consisting of this single character repeated. This can be useful for example to replace change bar characters by spaces in a document where the change bars have confused the program, or to replace other characters inside a table that are confusing the detection of the table's true layout.
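The difference between the two replacement forms can be sketched in Python. This is a hypothetical helper, not AscToRTF code; it assumes the case-insensitive string match type and that every occurrence on the line is replaced. The by_character branch keeps the line length unchanged, which is what makes it suitable for blanking out change bars.

```python
import re


def replace_text(line: str, match_string: str, replace_type: str, replacement: str) -> str:
    """Replace every case-insensitive occurrence of match_string (hypothetical sketch).

    by_string:    the matched text becomes `replacement`.
    by_character: the matched text becomes `replacement` repeated to the same
                  length, so column positions in the line are preserved.
    """
    pattern = re.compile(re.escape(match_string), re.IGNORECASE)
    if replace_type == "by_string":
        # A function replacement keeps backslashes in the new string literal
        return pattern.sub(lambda m: replacement, line)
    if replace_type == "by_character":
        if len(replacement) != 1:
            raise ValueError("by_character expects a single character")
        return pattern.sub(lambda m: replacement * len(m.group(0)), line)
    raise ValueError(f"unknown replace type: {replace_type}")


print(repr(replace_text("| Revised paragraph", "|", "by_character", " ")))
# '  Revised paragraph'
print(replace_text("Visit the OLD site", "old site", "by_string", "new site"))
# Visit the new site
```

Passing a function to `pattern.sub` in the by_string branch means backslashes in the new string are taken literally rather than as regular-expression group references.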
The treat_line command allows you to specify how a line should be regarded during the analysis of the file.
Syntax:
treat_line <line_selection> <as_line_type>
With this command any line that matches the specified line_selection criteria will be regarded as the specified as_line_type.
For example the command
treat_line starting_with string "news" as_heading_1
specifies that any line starting with the string "news" should be treated as a level 1 heading. Because the string match type is case-insensitive, lines starting with "News" or "NEWS" also match.
The meta_tag_line command is intended solely for HTML conversion. It identifies lines that should be converted into HTML META tags.
Syntax:
meta_tag_line "tag name" <line_selection> [remove_match_text]
This command specifies that any line matching the line_selection criteria should be used to create a META tag called "tag name". The value of this META tag will be the line itself. If the remove_match_text argument is supplied, the match text itself will be removed from the value.
For example the command
meta_tag_line "author" starting_with string "author: " remove_match_text
will match the line
Author: Dr John A Fotheringham
(the string match type is case-insensitive), remove the "author: " text from it, and create a META tag as follows
<META NAME="author" CONTENT="Dr John A Fotheringham">
This can be useful when processing text files created by other systems that add "tagging" and catalogue information at the top.
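The meta_tag_line example above can be reproduced with a short Python sketch. It handles only the `starting_with string` form of line_selection; trimming surrounding white space from the value and HTML-escaping it are assumptions of this sketch, and `meta_tag_line` is a hypothetical function name rather than part of the program.

```python
from html import escape
from typing import Optional


def meta_tag_line(tag_name: str, line: str, match_string: str,
                  remove_match_text: bool = False) -> Optional[str]:
    """Hypothetical sketch of:
    meta_tag_line "tag name" starting_with string "match string" [remove_match_text]

    Returns the META tag for a matching line, or None if the line does not match.
    """
    stripped = line.lstrip()  # starting_with ignores leading white space
    # The "string" match type is case-insensitive
    if not stripped.lower().startswith(match_string.lower()):
        return None
    value = stripped[len(match_string):] if remove_match_text else stripped
    # Assumption: surrounding white space is trimmed and the value is HTML-escaped
    value = escape(value.strip(), quote=True)
    return f'<META NAME="{escape(tag_name, quote=True)}" CONTENT="{value}">'


print(meta_tag_line("author", "Author: Dr John A Fotheringham", "author: ",
                    remove_match_text=True))
# <META NAME="author" CONTENT="Dr John A Fotheringham">
```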
The line_selection element is actually a combination of a number of simpler elements as follows
Syntax:
<line_match> <match_type> "match string"
That is, the line_selection consists of a line_match, a match_type, and then the actual "match string" to be matched. All three elements must be present for the line_selection to be valid.
The following are all valid examples
starting_with string "Chapter"
starting_with exact_phrase "Author : "
containing phrase "click here"
containing string "http://"
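Because the match string is quoted, a line_selection can be split into its three elements with a shell-style tokenizer. The sketch below uses Python's `shlex.split`, which is an assumption: AscToRTF's own quoting rules (for example, how it treats a backslash inside the quotes) are not described here, and `parse_line_selection` is a hypothetical name. The wildcard match type is rejected because this document lists it as not yet supported.

```python
import shlex
from typing import NamedTuple

LINE_MATCHES = {"starting_with", "containing"}
# wildcard is documented as not yet supported, so it is deliberately left out
MATCH_TYPES = {"string", "exact_string", "phrase", "exact_phrase"}


class LineSelection(NamedTuple):
    line_match: str
    match_type: str
    match_string: str


def parse_line_selection(text: str) -> LineSelection:
    """Split '<line_match> <match_type> "match string"' into its three parts (sketch)."""
    # shlex.split honours the double quotes, keeping spaces inside the match string
    parts = shlex.split(text)
    if len(parts) != 3:
        raise ValueError(f"expected 3 elements, got {len(parts)}: {text!r}")
    line_match, match_type, match_string = parts
    if line_match not in LINE_MATCHES:
        raise ValueError(f"unknown line_match: {line_match}")
    if match_type not in MATCH_TYPES:
        raise ValueError(f"unsupported match_type: {match_type}")
    return LineSelection(line_match, match_type, match_string)


print(parse_line_selection('starting_with exact_phrase "Author : "'))
# LineSelection(line_match='starting_with', match_type='exact_phrase', match_string='Author : ')
```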
The line_match element specifies where on the input line the specified text should be located. The options are
line_match | Meaning |
---|---|
starting_with | Text should be at the start of the line (ignoring any leading white space) |
containing | Text can be anywhere on the input line |
Take care when using the containing option, as false matches are more likely to occur. For example, containing string "page" also matches a line mentioning a "homepage".
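The two line_match options can be sketched as a Python predicate. This hypothetical `line_matches` function is not AscToRTF code and assumes the case-insensitive string match type; the last call shows the kind of false match the warning above refers to.

```python
def line_matches(line: str, line_match: str, match_string: str) -> bool:
    """Hypothetical sketch of starting_with / containing with "string" matching."""
    hay, needle = line.lower(), match_string.lower()  # "string" is case-insensitive
    if line_match == "starting_with":
        # Leading white space is ignored before comparing the start of the line
        return hay.lstrip().startswith(needle)
    if line_match == "containing":
        return needle in hay
    raise ValueError(f"unknown line_match: {line_match}")


print(line_matches("   Chapter 3", "starting_with", "chapter"))   # True
print(line_matches("See chapter 3", "starting_with", "chapter"))  # False
print(line_matches("Visit our homepage", "containing", "page"))   # True (a false match)
```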
The match_type element specifies how any supplied match string should be matched. The options are
match_type | Description |
---|---|
string | This specifies that a string should be matched. This is, in fact, the most general of match types and is the one that would normally be used. This match type is case-insensitive. |
exact_string | Same as "string", but case-sensitive. |
phrase | A "phrase" is a string that is surrounded by white space and/or punctuation on either side (see below). This match type is case-insensitive. |
exact_phrase | Same as "phrase", but case-sensitive. |
wildcard | Not yet supported (*) |
The phrase match_type is a special case: the matched text must be surrounded by white space or punctuation on either side. So whereas the string "the" would match "then", the phrase "the" would not, because the "n" in "then" is neither white space nor punctuation.
The start and end of a line count as white space, and any leading or trailing punctuation is allowed. Phrase matching is therefore more precise than string matching, even for single words.
Consider the following example, concentrating on the letters "ten" in the word "tense"
This is a tense situation....
The following would apply
match_type | Matches? |
---|---|
string "ten" | Yes. "ten" matches the first three characters of "tense" |
exact_string "Ten" | No. The "t" in "tense" is lower case, so the case-sensitive match fails |
phrase "ten" | No. "ten" is not surrounded by white space or punctuation because it is followed by "se" |
exact_phrase "tense situation" | Yes. The case matches, and there is a space before and punctuation (the "...") afterwards. |
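The rules in the table can be checked with a small Python sketch of the four implemented match types. It assumes that "punctuation" means the ASCII characters in Python's `string.punctuation` and that each occurrence of the match string is tried in turn; `match_text` is a hypothetical name, not part of the program.

```python
import string

# Assumption: white space and ASCII punctuation count as phrase boundaries;
# the start and end of the line count as well.
_BOUNDARY = set(string.whitespace) | set(string.punctuation)


def _at_boundary(text: str, index: int) -> bool:
    """True if index is outside the line or points at white space/punctuation."""
    return index < 0 or index >= len(text) or text[index] in _BOUNDARY


def match_text(line: str, match_type: str, match_string: str) -> bool:
    """Hypothetical sketch of string / exact_string / phrase / exact_phrase."""
    if match_type not in ("string", "exact_string", "phrase", "exact_phrase"):
        raise ValueError(f"match type not modelled in this sketch: {match_type}")
    case_sensitive = match_type.startswith("exact_")
    is_phrase = match_type.endswith("phrase")
    hay = line if case_sensitive else line.lower()
    needle = match_string if case_sensitive else match_string.lower()
    start = hay.find(needle)
    while start != -1:
        end = start + len(needle)
        # A string match succeeds at once; a phrase must also sit between boundaries
        if not is_phrase or (_at_boundary(hay, start - 1) and _at_boundary(hay, end)):
            return True
        start = hay.find(needle, start + 1)  # try the next occurrence
    return False


line = "This is a tense situation...."
print(match_text(line, "string", "ten"))                    # True
print(match_text(line, "exact_string", "Ten"))              # False
print(match_text(line, "phrase", "ten"))                    # False
print(match_text(line, "exact_phrase", "tense situation"))  # True
```

The four calls reproduce the Yes/No column of the table.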
The replace_type element is used in the replace_text command to specify what type of text replacement should be executed. The element should be immediately followed by the replacement text in quotes.
There are two options:-
replace_type | Effect |
---|---|
by_string | The matched text should simply be replaced by the replacement text. |
by_character | The matched text should be replaced by an equal-length string composed solely of the single character in the replacement text. |
The by_character option allows a string to be "blanked out" by the character of your choice, without altering the line length or spacing. This can be useful, for example, to replace all DOS line-drawing characters with blanks in a table, so that the software can make a better stab at detecting the table layout.
The as_line_type element is used by the treat_line command to specify how the matching line should be treated. The as_line_type assigns to the matching line a type that the program would otherwise have to detect automatically. Telling the program how such lines should be treated can therefore help the analysis.
The options are
as_line_type | Effect |
---|---|
as_heading_<n> | Where <n> is "1", "2" ... "6". The matched line is treated as a heading of level <n> |
as_bullet | The matched line is treated as an unordered list item (bullet) |
as_alpha_bullet | The matched line is treated as an item on an alphabetic list |
as_capalpha_bullet | The matched line is treated as an item on an UPPER CASE alphabetic list |
as_roman_bullet | The matched line is treated as an item on a roman numeral list |
as_caproman_bullet | The matched line is treated as an item on an UPPER CASE roman numeral list |
as_quoted | The matched line is treated as "quoted text", such as lines in emails that start with a ">" |
as_new_page | The matched line is treated as the start of a new page |
as_number_bullet | The matched line is treated as an item on a numbered list |
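Since a mistyped keyword such as `as_header_1` is easy to miss, a small validator for as_line_type keywords can be useful when preparing a Text Command File by hand. The Python sketch below is hypothetical and simply encodes the list of options above; it is not part of AscToRTF.

```python
import re
from typing import Optional, Tuple

# Fixed as_line_type keywords from the list above; as_heading_<n> is handled separately
SIMPLE_TYPES = {
    "as_bullet", "as_alpha_bullet", "as_capalpha_bullet", "as_roman_bullet",
    "as_caproman_bullet", "as_number_bullet", "as_quoted", "as_new_page",
}
HEADING = re.compile(r"as_heading_([1-6])")  # heading levels 1 to 6 only


def parse_line_type(token: str) -> Tuple[str, Optional[int]]:
    """Return (type, heading level or None); raise ValueError for unknown keywords."""
    m = HEADING.fullmatch(token)
    if m:
        return "heading", int(m.group(1))
    if token in SIMPLE_TYPES:
        return token[len("as_"):], None
    raise ValueError(f"unrecognised as_line_type: {token!r}")


print(parse_line_type("as_heading_2"))  # ('heading', 2)
print(parse_line_type("as_quoted"))     # ('quoted', None)
```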
For example the command
treat_line starting_with string ":" as_quoted
can be used to ensure that lines starting with ":" are treated as "quoted text", such as one finds inside emails. See quoted line detection.
Below is an example Text Command file:
treat_line starting_with exact_string "new page" as_new_page
treat_line starting_with string "head_1" as_heading_1
treat_line starting_with string "head_2" as_heading_2
treat_line starting_with string "head_3" as_heading_3
remove_text exact_string "head_1"
remove_text exact_string "head_2"
remove_text exact_string "head_3"
ignore_line containing exact_string "PAGE"
In this example, lines starting with "new page" are treated as page breaks. Lines starting with "head_1" etc. are treated as headings, and the text "head_1" etc. is then removed. In this way you can label your heading lines without the labelling appearing in the output. Finally, any line containing the exact_string "PAGE" is discarded. Note that using exact_string ensures that the case is matched, so "PAGE" matches but "page" does not.
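Assuming each Text Command occupies a single line of the file, the file can be tokenized with Python's `shlex.split`, which keeps each quoted match string together as one token. The sketch below is hypothetical; it does not handle comments or continuation lines, since this documentation does not describe either, and it writes the heading commands with the as_heading_<n> keywords from the as_line_type options.

```python
import shlex
from typing import List

# Hypothetical file contents: the example commands, one per line
EXAMPLE = '''\
treat_line starting_with exact_string "new page" as_new_page
treat_line starting_with string "head_1" as_heading_1
treat_line starting_with string "head_2" as_heading_2
treat_line starting_with string "head_3" as_heading_3
remove_text exact_string "head_1"
remove_text exact_string "head_2"
remove_text exact_string "head_3"
ignore_line containing exact_string "PAGE"
'''


def parse_command_file(text: str) -> List[List[str]]:
    """Split a Text Command File into one token list per non-blank line (sketch)."""
    commands = []
    for raw in text.splitlines():
        if raw.strip():  # skip blank lines
            commands.append(shlex.split(raw))  # quoted strings stay single tokens
    return commands


cmds = parse_command_file(EXAMPLE)
print(len(cmds))  # 8
print(cmds[0])    # ['treat_line', 'starting_with', 'exact_string', 'new page', 'as_new_page']
```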
Converted from a single text file by AscToHTM © 1997-2004 John A Fotheringham