Documentation for the Detagger HTML to text converter and markup removal utility

The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html



Using a Text Fragments File

Detagger allows you to define your own text headers and footers when converting a file to text. You do this by defining "text fragments" in an external "Text Fragments File" as follows

        $_$_DEFINE_TEXT_FRAGMENT <fragment_name>
        ..
        ... fragment lines...
        ...
        $_$_END_BLOCK

Having placed your fragment definitions in an external text file, use the menu option

Conversion Options | Convert to text | Text headers

to specify where this file is located. This location is saved in your policy file, and may be lost if you load a new policy file.

Using this approach you can define

header and footer fragments
These are placed at the top and bottom of each file when Detagger is converting files to text. This allows you to add standard copyright and contact information, and if you use the TEXT_HEADER tags you can create headers that are tailored to the contents of each file.
separator fragments
These are placed between results when you choose to convert multiple files and concatenate the results into a single file.
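
To make the file layout concrete, the following Python sketch reads fragment definitions of the form shown above into a name-to-text mapping. It only illustrates the file format and is not Detagger's own parser: the function name parse_fragments is hypothetical, the sample footer text is made up, and ignoring any lines that fall outside a block is an assumption.

```python
DEFINE = "$_$_DEFINE_TEXT_FRAGMENT"
END = "$_$_END_BLOCK"


def parse_fragments(text):
    """Return {fragment_name: fragment_text} for each block found in text."""
    fragments = {}
    name = None   # name of the block being read, or None outside a block
    lines = []    # lines collected for the current block
    for raw in text.splitlines():
        stripped = raw.strip()
        if stripped.startswith(DEFINE):
            name = stripped[len(DEFINE):].strip()
            lines = []
        elif stripped == END and name is not None:
            fragments[name] = "\n".join(lines)
            name = None
        elif name is not None:
            lines.append(raw)
        # Assumption: anything outside a block is ignored.
    return fragments


# Made-up sample file contents, for illustration only.
sample = """\
$_$_DEFINE_TEXT_FRAGMENT TEXT_HEADER
[[TEXT_HEADER BOX_TOP]]
[[TEXT_HEADER TITLE]]
[[TEXT_HEADER BOX_BOTTOM]]
$_$_END_BLOCK

$_$_DEFINE_TEXT_FRAGMENT TEXT_FOOTER
Copyright (c) Example Ltd
$_$_END_BLOCK
"""

fragments = parse_fragments(sample)
print(sorted(fragments))
```

A fragment that is not defined in the file simply has no entry in the mapping, which matches the idea that an undefined header, footer or separator is omitted.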

Contents of this section

Header and footer fragments
Default header and footer
TEXT_HEADER Tags
Separator fragments
Fragment tags
The DATA fragment tag

Header and footer fragments

Detagger recognises two fragment names

TEXT_HEADER
the text to be placed at the top of each output file
TEXT_FOOTER
the text to be placed at the end of each output file

If either of these fragments is not defined in the Text Fragments File, or if you don't supply a Text Fragments File, then the header and/or footer will be omitted.

Note: This feature is not available in the evaluation version of Detagger. Instead, that version uses a default header and footer.

Default header and footer

In the evaluation version of Detagger the header and footer are defined as follows:-

        $_$_DEFINE_TEXT_FRAGMENT TEXT_HEADER
        [[TEXT_HEADER BOX_TOP]]
        [[TEXT_HEADER VERSION]]
        [[TEXT_HEADER TITLE]]
        [[TEXT_HEADER BOX_MIDDLE]]
        [[TEXT_HEADER OUT_FILENAME]]
        [[TEXT_HEADER OUT_FILESIZE]]
        [[TEXT_HEADER TIMESTAMP]]
        [[TEXT_HEADER BOX_BOTTOM]]
 
        $_$_END_BLOCK
 
 
        $_$_DEFINE_TEXT_FRAGMENT TEXT_FOOTER
 
        [[LINERULE]]
        Converted by an unregistered version of [[VERSION]]
        Visit http://www.jafsoft.com/detagger/
        (this message is omitted in registered version)
        [[LINERULE]]
        $_$_END_BLOCK

This gives example results as follows

        /----------------------------------------------------------------------\
        | < This header can be omitted in the registered version >             |
        | Converted by : Detagger 2.0 (unregistered)                           |
        |              : www.jafsoft.com/detagger/                             |
        | Title        : The JafSoft text conversion FAQ                       |
        |                                                                      |
        | File name    : a2hfaq.txt                                            |
        | File size    : 8,914 bytes (approx)                                  |
        | Create date  : 7-Aug-2002                                            |
        \----------------------------------------------------------------------/

        <main file contents>

        ========================================================================
        Converted by an unregistered version of Detagger 2.0
        Visit http://www.jafsoft.com/detagger/
        (this message is omitted in registered version)
        ========================================================================

TEXT_HEADER Tags

TEXT_HEADER tags are fragment tags that can be placed inside text fragments. Each tag is replaced by a suitable box line in the output. The box lines will adjust to the current page width.

TEXT_HEADER tags have the form

        [[TEXT_HEADER <type>]]

and each tag should be placed on a line by itself inside the fragment. For example, the fragment :-

        $_$_DEFINE_TEXT_FRAGMENT TEXT_HEADER
        [[TEXT_HEADER BOX_TOP]]
        [[TEXT_HEADER OUT_FILENAME]]
        [[TEXT_HEADER OUT_FILESIZE]]
        [[TEXT_HEADER BOX_BOTTOM]]
        $_$_END_BLOCK

Gives the output

        /----------------------------------------------------------------------\
        | File name    : a2hfaq.txt                                            |
        | File size    : 8,914 bytes (approx)                                  |
        \----------------------------------------------------------------------/
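
As a rough illustration of how box lines can adjust to the page width, the Python sketch below builds lines similar to the sample output above. It is an approximation written for this page, not Detagger's implementation: the function names, the 12-character label column and the default width of 72 are all assumptions.

```python
def box_top(width=72):
    """Top edge of the box, e.g. /----\\ at the given total width."""
    return "/" + "-" * (width - 2) + "\\"


def box_bottom(width=72):
    """Bottom edge of the box."""
    return "\\" + "-" * (width - 2) + "/"


def box_middle(width=72):
    """Blank line inside the box."""
    return "|" + " " * (width - 2) + "|"


def box_line(label, value, width=72):
    """One 'label : value' line, padded (or truncated) to fit the box."""
    text = f"{label:<12} : {value}"
    inner = width - 4          # room left after the "| " and " |" borders
    return "| " + text[:inner].ljust(inner) + " |"


header = [
    box_top(),
    box_line("File name", "a2hfaq.txt"),
    box_line("File size", "8,914 bytes (approx)"),
    box_bottom(),
]
print("\n".join(header))
```

Passing a different width to each function gives a narrower or wider box, with every line kept to the same total length.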

Possible TEXT_HEADER tag types include

AUTHOR        Adds a box line identifying the document author
              (taken from an author line, or from a META tag in
              the original)
BOX_BOTTOM    Adds a bottom line to the box
BOX_MIDDLE    Adds a middle (blank) line to the box
BOX_TOP       Adds a top line to the box
LAST_EMAIL    Adds an email line to the box for the last observed
              email hyperlink (e.g. taken from a signature)
LAST_URL      Adds a URL line to the box based on the last
              observed hyperlink
IN_FILENAME   Input filename
IN_FILESIZE   Input file size
IN_FILEDATE   Input file date
OUT_FILENAME  Output file name
OUT_FILESIZE  Output file size (in bytes). Only approximate,
              as it estimates the header size
TIMESTAMP     Adds a "date" line for the date of the conversion
TITLE         Adds a title line. Taken from the <TITLE> tag,
              or from the first heading
TOP_EMAIL     Adds an email line to the box based on the first
              email hyperlink in the source
TOP_URL       Adds a URL line to the box based on the first
              observed hyperlink
VERSION       Adds a line identifying that the file was
              converted by Detagger


Separator fragments

When converting multiple files at once and choosing to concatenate the results, Detagger can be made to add a separator between the results for each file.

The fragment names recognised are

TEXT_SEPARATOR
the text to be placed between each set of results in the output file when converting files to text
HTML_SEPARATOR
the HTML to be placed between each set of results in the output HTML file when selectively removing markup from the input files. Care should be taken to ensure the HTML in this fragment is compatible with that from the results files.

If either of these fragments is not defined in the Text Fragments File, or if you don't supply a Text Fragments File, then there will be no separators between results in the output file.

Note: This feature is not available in the evaluation version of Detagger. Instead, default results separators are used.
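
The effect of a separator fragment on concatenated output can be pictured with a short Python sketch. This is only a model of the behaviour described above for the registered version; the function name concatenate, the sample results and the sample separator text are all hypothetical.

```python
def concatenate(results, fragments, key="TEXT_SEPARATOR"):
    """Join per-file results, placing the separator fragment between them."""
    separator = fragments.get(key)
    if separator is None:
        # No fragment defined: results follow one another with no separator.
        return "".join(results)
    return separator.join(results)


# Made-up converted results and separator text, for illustration only.
results = ["first file\n", "second file\n", "third file\n"]
fragments = {"TEXT_SEPARATOR": "\n* * * * *\n\n"}

combined = concatenate(results, fragments)
print(combined)
```

The separator appears only between results, so three files produce two separators and nothing is added before the first result or after the last.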


Fragment tags

Within your fragment definitions you can supply any text you want, but literal text will be the same for each file converted. To vary the output from file to file you can use fragment tags, which have the form

        [[TAGNAME <details>]]

Where tags of this form are recognised, Detagger will replace the tag with a suitable value.

Of particular interest are the TEXT_HEADER tags. These tags produce a line of text suitable to be placed in a box at the top of the text file. The box width will be adjusted (where possible) to fit the chosen target page width.

For other fragment tags supported by JafSoft converters, please read the section on fragment tags in the Tag Manual, available online at http://www.jafsoft.com/doco/tag_manual_3.html#Section_3.3

Note: Not all the tags described in that document are suitable for use inside Detagger files.


The DATA fragment tag

The DATA fragment tag can be used to embed information about the file being converted into the output.

In Detagger the main use of the DATA fragment tag is in text fragments, or in replacement strings in the replace_text Text command (see "An example use of a Text Commands File"). The tag has the form

        [[DATA <data_type>]]

where,

<data_type> This is the type of data to be substituted in

Supported data types include

VERSION       Indicates the program version of Detagger used in
              the conversion
TITLE         Document title (taken from the HTML header)
IN_FILENAME   Input filename
OUT_FILENAME  Output filename
IN_FILESIZE   Input file size (in bytes)
OUT_FILESIZE  Output file size
IN_FILEDATE   Timestamp of input file
TIMESTAMP     Timestamp of conversion
COMMENT       Free text comment

Note: when used in the replace_text Text command, only those data types known when the input file is opened will work. For example, TITLE won't work in that context.
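
The substitution of DATA tags, and the effect of a value not yet being known, can be sketched in Python as follows. This is an illustration only, not Detagger's code: the function name expand_data_tags and the sample filename are made up, treating IN_FILENAME as known when the input file is opened is an assumption, and leaving an unresolved tag unchanged is also an assumption about what "won't work" looks like.

```python
import re

# Matches tags of the form [[DATA <data_type>]]
DATA_TAG = re.compile(r"\[\[DATA\s+(\w+)\]\]")


def expand_data_tags(text, known):
    """Replace each [[DATA type]] tag with its value from the known mapping."""
    def replace(match):
        data_type = match.group(1)
        # Assumption: a tag whose value is not yet known is left as-is.
        return known.get(data_type, match.group(0))
    return DATA_TAG.sub(replace, text)


# Assumed to be available when the input file is opened (sample value).
at_open = {"IN_FILENAME": "a2hfaq.html"}

line = "Source: [[DATA IN_FILENAME]] / Title: [[DATA TITLE]]"
expanded = expand_data_tags(line, at_open)
print(expanded)
```

Here the filename is substituted, while the TITLE tag cannot be resolved because the title is only discovered once the document has been read.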




Converted from a single text file by AscToHTM
© 1997-2005 John A Fotheringham