Documentation for the Detagger HTML-to-text converter and markup removal utility

The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html



Change History

Here is the change history of Detagger.

Version 2.4 (June 2005)
Changes made in 2.4
Policies added in 2.4
Version 2.3.2 (September 2004)
Changes in version 2.3.2
New policies in version 2.3.2
Bugs fixed in version 2.3.2
Version 2.3 (April 2004)
Changes in Version 2.3
New policy options in Version 2.3
Version 2.2 (May 2003)
Changes in Version 2.2
New features in Version 2.2
Version 2.1 (March 2003)
Changes in Version 2.1
New features in Version 2.1
Version 2.0 (December 2002)
Changes in Version 2.0
New features in Version 2.0
Version 1.0 (August 2002)

Version 2.4 (June 2005)

Version 2.4 contains a small number of minor bug fixes and policy changes.


Changes made in 2.4

Policies added in 2.4

Bugs fixed :-

Version 2.3.2 (September 2004)

Version 2.3.2 contains a number of bug fixes and minor enhancements over version 2.3. It also contains enhanced support for handling Unicode, especially UTF-16.


Changes in version 2.3.2

New policies in version 2.3.2

Here are the policies added in version 2.3.2 :-

Bugs fixed in version 2.3.2

Version 2.3 (April 2004)

Here are the policies added in version 2.3 :-


Changes in Version 2.3

For example, SEC filings use the HTML-like <TEXT> tag to mark up plain text. By using a Text Command to change this into a <PRE> tag, the HTML converter is tricked into leaving the format of the text in this section alone.

See Using a Text Commands File
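The substitution described above can be illustrated outside Detagger. The following is a minimal Python sketch of the idea only; it does not use Detagger's Text Commands syntax (which is described in the section referenced above), and the function name text_tags_to_pre is hypothetical.

```python
import re


def text_tags_to_pre(html: str) -> str:
    """Rewrite <TEXT>...</TEXT> delimiters as <PRE>...</PRE>.

    Hypothetical helper for illustration; not part of Detagger.
    Matching is case-insensitive and leaves tags such as <TEXTAREA> alone.
    """
    html = re.sub(r"<\s*TEXT\s*>", "<PRE>", html, flags=re.IGNORECASE)
    html = re.sub(r"<\s*/\s*TEXT\s*>", "</PRE>", html, flags=re.IGNORECASE)
    return html


if __name__ == "__main__":
    sample = "<TEXT>\nRevenue    100\nCosts       40\n</TEXT>"
    print(text_tags_to_pre(sample))
```

Because the contents are now inside a <PRE> section, a converter that honours preformatted text would keep the column alignment intact.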

Alternatively, you can read the tips in sequence by using the next/last buttons, and you can bring up the screen from an option on the Settings menu.

If you have suggestions for topics you would like tips on, please feel free to send them to info<at>jafsoft.com.

New policy options in Version 2.3

Several new policy options were added in version 2.3.

General

External configuration files

Conversion to text

Markup removal

Version 2.2 (May 2003)

Version 2.2 contains a small number of improvements and enhancements over version 2.1.


Changes in Version 2.2

On the Settings menu, a new option allows you to Remember settings on exit. If selected, the current file, output directory, policy file and conversion options are remembered and used as the starting values the next time you run the program.

New features in Version 2.2

A number of new options have been added to allow you to remove certain types of tags and attributes from inside tables only.

A new "Tables" option has been added under "markup manipulation" on the Conversion options menu. This takes you to the Detag Tables options tab which has the following options

Version 2.1 (March 2003)

Version 2.1 contains a small number of improvements and enhancements over version 2.0.


Changes in Version 2.1

New features in Version 2.1

Version 2.0 (December 2002)

Version 2.0 contains a number of changes suggested by users of Detagger, as well as a number of bug fixes and code enhancements.


Changes in Version 2.0

New features in Version 2.0

Several new features have been added to Detagger since version 1.0.

Markup removal

Tag removal options

Remove emphasis tags
Removes all the bold and italic markup from the HTML.

Remove style sheet
Removes all the <STYLE> sections from the HTML, together with any reference to an external CSS style sheet.

Remove HTML <IMG> tags
Removes all the <IMG> tags from an HTML document.
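As a rough illustration of what two of these tag removal options do, here is a minimal Python sketch. It is not Detagger's implementation; the function names, and the choice of <B>, <I>, <EM> and <STRONG> as the "emphasis" tags, are assumptions made for the example.

```python
import re

# Assumed set of emphasis tags; the \b stops <BR>, <BODY>, <IMG> etc. from matching.
_EMPHASIS = re.compile(r"</?(?:b|i|em|strong)\b[^>]*>", re.IGNORECASE)
_IMG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def remove_emphasis_tags(html: str) -> str:
    """Drop bold/italic tags but keep the text they enclose."""
    return _EMPHASIS.sub("", html)


def remove_img_tags(html: str) -> str:
    """Drop every <IMG> tag; images have no closing tag to worry about."""
    return _IMG.sub("", html)


if __name__ == "__main__":
    page = '<p>A <b>bold</b> claim<br><img src="chart.gif"> and <I>more</I></p>'
    print(remove_img_tags(remove_emphasis_tags(page)))
```

A regular expression is enough for a sketch like this, but it can be fooled by ">" characters inside attribute values, which a real parser would handle.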

HTML-to-Text conversion

Paragraph formatting

Output each paragraph on a single line
Each paragraph is output without hard line breaks (except at the end). This can be useful, depending on how and where the resulting text is to be used.
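The effect of this option can be sketched in Python as follows. This is an illustration only, assuming that paragraphs in the wrapped text are separated by blank lines; the function name unwrap_paragraphs is hypothetical.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join the lines of each paragraph so it occupies a single line.

    Assumes paragraphs are separated by one or more blank lines.
    Each output paragraph ends with exactly one line break.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = [
        " ".join(line.strip() for line in para.splitlines() if line.strip())
        for para in paragraphs
    ]
    return "\n".join(joined) + "\n"


if __name__ == "__main__":
    wrapped = "This paragraph was\nwrapped at a fixed width.\n\nSo was\nthis one.\n"
    print(unwrap_paragraphs(wrapped), end="")
```

Text in this form suits programs that do their own wrapping, such as word processors or email clients.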

Miscellaneous text formatting

May add Unicode marker to output file
When Unicode is detected in the source, the software will output the text as UTF-8 and optionally add a file marker that labels the file as "Unicode" in a way that most Unicode-aware applications will recognize.
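The documentation does not say exactly which marker is written. A common convention for UTF-8 files is the three-byte byte order mark (BOM), so the sketch below assumes that is what is meant; the function name encode_output is hypothetical.

```python
import codecs


def encode_output(text: str, add_marker: bool) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the UTF-8 BOM.

    Assumption: the "Unicode marker" is the byte order mark EF BB BF.
    """
    data = text.encode("utf-8")
    return codecs.BOM_UTF8 + data if add_marker else data


if __name__ == "__main__":
    print(encode_output("café", add_marker=True))
    print(encode_output("café", add_marker=False))
```

Some older tools treat the BOM as ordinary characters, which is presumably why the marker is optional.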

Hyperlinks handling

Display link URLs
An option to display hyperlink URLs immediately after the display text in the output.
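To show the kind of output this option produces, here is a minimal Python sketch built on the standard library's html.parser module. The square-bracket format for the URL is an assumption made for the example, not Detagger's documented output, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collect text content, appending each link's URL after its display text."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._hrefs = []  # stack of href values for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._hrefs.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if href:
                # Assumed marker format: URL in square brackets.
                self.parts.append(f" [{href}]")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text_with_urls(html: str) -> str:
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    print(html_to_text_with_urls('See <a href="http://www.jafsoft.com/">JafSoft</a> now.'))
```

Anchors without an href attribute (named anchors, for instance) produce no marker in this sketch.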

Replace <IMG> tags by a text marker
Option to place a marker in the text to show where an image has been removed.

Use the ALT attribute to replace <IMG> tags
Option to use the ALT attribute of an <IMG> tag in its text marker. This can help give some sense of what was being shown on the original page.
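The two image options can be pictured together with a short Python sketch. The marker text "[IMAGE]" and the names used here are assumptions for illustration; Detagger's actual marker may differ.

```python
from html.parser import HTMLParser


class ImageMarkerExtractor(HTMLParser):
    """Collect text content, replacing each <IMG> tag with a text marker."""

    def __init__(self, use_alt=True):
        super().__init__()
        self.use_alt = use_alt
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if self.use_alt and alt:
                # Assumed marker format that includes the ALT text.
                self.parts.append(f"[IMAGE: {alt}]")
            else:
                self.parts.append("[IMAGE]")

    def handle_data(self, data):
        self.parts.append(data)


def replace_images(html: str, use_alt: bool = True) -> str:
    parser = ImageMarkerExtractor(use_alt=use_alt)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    page = 'Logo: <img src="logo.gif" alt="JafSoft logo"> here'
    print(replace_images(page))
    print(replace_images(page, use_alt=False))
```

When an image has no ALT attribute, the sketch falls back to the plain marker.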

Tables conversion

Convert table to plain text
Convert table to comma-delimited data
Convert table to tab-delimited data
These options (which are mutually exclusive) determine how any <TABLE>s in the HTML should be output in the text. The options allow the table to be converted to plain text, or to delimited text better suited for loading into a spreadsheet such as Excel.
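The delimited-output options can be sketched in Python using the standard html.parser and csv modules. This is an illustration under simplifying assumptions (no nested tables, no ROWSPAN or COLSPAN handling), not a description of how Detagger itself converts tables; all names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of each table row into a list of lists."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_delimited(html: str, delimiter: str = ",") -> str:
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    table = ("<table><tr><th>Name</th><th>Qty</th></tr>"
             "<tr><td>Widget, large</td><td>3</td></tr></table>")
    print(table_to_delimited(table), end="")
    print(table_to_delimited(table, delimiter="\t"), end="")
```

Passing delimiter="\t" gives tab-delimited output. The csv module quotes any cell that contains the delimiter, which keeps the columns intact when the file is loaded into a spreadsheet.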

Version 1.0 (August 2002)

The initial release.




Converted from a single text file by AscToHTM
© 1997-2005 John A Fotheringham