Documentation for the AscToHTM conversion utility : Adding HTML features to the document

Documentation for the AscToHTM Text to HTML converter

Adding HTML features to the document

As well as detecting features present in the source text, the software allows you to add features that you would expect in the output file but that can't be inferred from the input.

These include the following.

Document title
A working contents list

Adding a Document Title

AscToHTM can calculate - or be told - the title of a document. This will be placed in the document properties section in the header of each HTML file produced.

The title is calculated in the order shown below. If one method returns a value, the subsequent ones are ignored.

If a TITLE command is placed in the source text, that value is used
If the Use first line as heading policy is set then the first heading (if any) encountered is used as the title.

Note:: Depending on your document structure, this is prone to give bland titles like "Introduction", "Overview" and "Summary".

If the Use first line as title policy is set then the first line in the file is used as the title.
If the Document title policy is set then this value is used.

Note:: If this is the value you want, ensure the other policies outlined above are disabled.

Finally, if none of the above results in a title, the text "Converted from <filename>" is used.
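The order of precedence described above can be sketched in Python as follows. This is only an illustration of the documented rules, not AscToHTM's actual code; the function name resolve_title and all of its parameter names are hypothetical.

```python
from typing import Optional


def resolve_title(filename: str,
                  title_command: Optional[str] = None,
                  first_heading: Optional[str] = None,
                  first_line: Optional[str] = None,
                  document_title: Optional[str] = None,
                  use_first_heading: bool = False,
                  use_first_line: bool = False) -> str:
    """Pick a title using the documented order; the first hit wins."""
    # 1. A TITLE command in the source text always takes precedence.
    if title_command:
        return title_command
    # 2. The first heading, if that policy is set and a heading was found.
    if use_first_heading and first_heading:
        return first_heading
    # 3. The first line of the file, if that policy is set.
    if use_first_line and first_line:
        return first_line
    # 4. The value of the Document title policy.
    if document_title:
        return document_title
    # 5. Fallback text when nothing else produced a title.
    return f"Converted from {filename}"


print(resolve_title("notes.txt"))  # Converted from notes.txt
```

The sketch also shows why the Document title policy value is only used when the two "use first ..." policies are disabled or yield nothing.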

Adding a Contents list

AscToHTM can detect the presence of a contents list in the original document, or it can insert a field code that will generate a contents list from the headings that it observes.

There are a number of policies that give you control over how and where a contents list is generated (see contents list policies).

Contents lists placement

By default the contents list will be placed at the top of the output file. You can cause contents lists to be placed wherever you want by using the CONTENTS_LIST pre-processor command.

Contents list detection
AscToHTM can detect contents lists in a number of ways:

By detecting "table of contents", "end contents" or something similar in the text.
By spotting that the numbering sequence appears twice. AscToHTM will assume the first set is the contents list.
By spotting pre-processor markup.

This is often a hit-and-miss procedure, and is liable to error.

Should the analysis fail, you can attempt to correct it via the Contents lists policies.
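To show why the repeated-numbering approach is hit-and-miss, here is a minimal Python sketch of that kind of heuristic. It is not AscToHTM's actual algorithm; the regular expression, the function name find_contents_span and the treatment of section numbers are all assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

# Assumed pattern: a section number such as "1", "2." or "3.1" at the
# start of a line, followed by some text.
NUMBERED = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+\S")


def find_contents_span(lines: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) indexes of a suspected contents list, or None.

    The span starts at the first sighting of the section number that is
    later repeated, and ends (exclusive) where that number is seen again.
    """
    first_seen = {}
    for index, line in enumerate(lines):
        match = NUMBERED.match(line)
        if not match:
            continue
        number = match.group(1)
        if number in first_seen:
            # The sequence has restarted, so treat the first run
            # as the contents list.
            return first_seen[number], index
        first_seen[number] = index
    return None
```

A line of ordinary text that happens to start with a number would fool this sketch, which is the sort of error the policies are there to correct.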

Adding Headers and Footers to your output

AscToHTM can be made to add HTML headers and footers to each page of HTML generated. Although there are policies that allow you to specify special header and footer files, in later versions of the software we recommend the use of HTML fragments.

Using a HTML Fragments file, you can define the reserved fragment names HTML_HEADER and HTML_FOOTER and these will be copied into the output. You can even embed HTML fragment tags into these definitions, to allow for some customisation. Other reserved names allow you to customize the headers and footers when converting files into a set of frames.

For example the definition

        $_$_DEFINE_HTML_FRAGMENT HTML_FOOTER
        <HR>
        <P>&copy; JafSoft 2004</P>
        <P>Converted by AscToHTM 5.0 from this  source file  on 14-Dec-2004</P>
        $_$_END_BLOCK

defines a fragment that adds a horizontal rule at the end of the page, followed by a copyright notice and a line of text showing the version number of the conversion program used, a link to the original text source file (assumed to be a local link) and the date the conversion was performed.

See Using HTML fragments for a fuller description.

Customising the HTML by using 'HTML fragments'

From version 4 onwards the program allows you to define "HTML fragments", that is fragments of HTML that can be used by the software to override the standard HTML that it produces. This allows you to customise the headers, footers, horizontal rules, contents list and more.

See HTML fragments

Splitting large files into a linked set of smaller files

By default AscToHTM creates a single HTML file. However it is possible to get the software to split large files into smaller files, all linked together. For this to be possible the program has to first detect headings in the file. Once this is done, you can choose at what level of heading you want to split the file into pages.

This feature is described more fully in the AscToHTM FAQ that is part of the HTML documentation.

See how file splitting works

How file splitting works

The program can only split into files at headings it recognises (see "Detecting Headings"). Before splitting the file you first need to check that the program is correctly determining where the headings are, and what type they are.

Headings can be numbered, capitalised or underlined. To tell if the program is correctly detecting the headings:

Look at the HTML to see if <H1>, <H2> etc. tags are being added to the correct text.
If the headings are wrong, check the Headings Policies are being set correctly.
Once the headings are being correctly detected, you can use the File Splitting policies to switch on file splitting and to tailor the conversion.

Note that the split level is set to 1 to split at "chapter" headings, 2 to split at "chapter and major section" headings etc.

Underlined headings tend to start at level 2, depending on the underline character (see "Underlined heading detection")
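The effect of the split level can be illustrated with a short Python sketch. It assumes the headings have already been detected and are available as (level, title) pairs; the function name split_into_pages is hypothetical, and the sketch ignores other policies (such as Min HTML File size) that influence the real conversion.

```python
from typing import List, Tuple


def split_into_pages(headings: List[Tuple[int, str]],
                     split_level: int) -> List[List[str]]:
    """Group heading titles into pages, starting a new page at every
    heading whose level is at or above the split level (1 = chapter)."""
    pages: List[List[str]] = []
    for level, title in headings:
        # A heading of level <= split_level starts a new page; the
        # "not pages" test covers text that begins below that level.
        if level <= split_level or not pages:
            pages.append([])
        pages[-1].append(title)
    return pages


headings = [(1, "Chapter 1"), (2, "1.1"), (3, "1.1.1"), (2, "1.2"), (1, "Chapter 2")]
print(len(split_into_pages(headings, 1)))  # 2
print(len(split_into_pages(headings, 2)))  # 4
```

With a split level of 1 the example yields one page per chapter; with a split level of 2 each major section also gets its own page.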

Hopefully this will give you some pointers, but if you still can't get it to work, please mail a copy of the source file (and any policy file you're using) to info<at>jafsoft.com and I'll see what I can advise.

Generating a set of Frames

You can use the Conversion Type to select the option of placing your document into a set of HTML frames. This will consist of a master document containing the necessary <FRAMESET> tags to define the frames, and then a number of supporting documents. The main conversion files will be created as before, and will be displayed in the main frame.

Frames Overview

The program has the ability to generate a set of frames from your source file. The program works to a model set of frames as shown below, but you have a great degree of control over how the frames are laid out, and what their contents are.

      +------------------------------------------------------------+
      |                       Header frame                         |
      |                        (optional)                          |
      +-------------+----------------------------------------------+
      |  NOFRAMES   |                                              |
      |    link     |                                              |
      |             |                                              |
      |             |                                              |
      |             |                                              |
      |  Contents   |               Main                           |
      |   Frame     |               Frame                          |
      | (optional)  |                                              |
      |             |                                              |
      |             |                                              |
      |             |                                              |
      |             |                                              |
      |             |                                              |
      +-------------+----------------------------------------------+
      |                       Footer frame                         |
      |                        (optional)                          |
      +------------------------------------------------------------+

The master <FRAMESET> document

Frames are implemented under HTML by having a document that describes the frame layout by using one or more nested <FRAMESET> tags. These tags group together <FRAME> tags that identify other HTML files that describe the contents of the individual frames or panes. The HTML page containing the <FRAMESET> doesn't normally contain any visible content. The source of this HTML page looks something like this :-

      <FRAMESET ROWS="110,*,90">
        <FRAME NAME="header" SRC="header.html">
        <FRAMESET COLS="260,*">
          <FRAME NAME="contents" SRC="contents.html">
          <FRAME NAME="main" SRC="main.html">
        </FRAMESET>
        <FRAME NAME="footer" SRC="footer.html">
        <NOFRAMES>
          <BODY>
          <p>This browser does not support FRAMES</p>
          <p>Visit <A TARGET="_top"
                      HREF="noframes_main.html">this link</A></p>
          </BODY>
        </NOFRAMES>
      </FRAMESET>

This example produces a layout similar to that shown in the diagram in the Frames Overview. There are four frames as follows :-

"header" at the top of the screen with content taken from the HTML page header.html

"footer" at the bottom of the screen with content taken from the HTML page footer.html

the two frames "contents" and "main" side by side in the middle of the screen, between the "header" and "footer" frames. The "contents" frame is on the left, the "main" frame on the right. The contents of these frames are held in the html files "contents.html" and "main.html".

The <NOFRAMES> tag describes the content to be displayed if the browser doesn't support frames. This is less common now, but is still important as many search engines don't understand frames, and will only index the pages linked to in the <NOFRAMES> tag.

In HTML the frame names and source file names can be whatever you like. AscToHTM uses the frame names "header", "footer", "contents" and "main", but will vary the source file names according to the name of your input file.

Depending on the details of your conversion, not all of the above frames are generated, in which case the <FRAMESET> tags will look slightly different.

You don't need to worry about any of this as AscToHTM will determine what layout is required and will generate the necessary HTML <FRAMESET> code.

By default if you convert a file called "myfile.txt" the files created are named as follows:-

myfile_frame.html - Master <FRAMESET> file

myfile_header_frame.html - "header" source file.

myfile_contents_frame.html - "contents" source file.

myfile_footer_frame.html - "footer" source file.

myfile.html - "main" source file.
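The default naming scheme listed above can be expressed as a small Python sketch. The function name frame_file_names and the dictionary keys are hypothetical; only the file name patterns come from the list above, and non-default output file names are not covered.

```python
from pathlib import Path
from typing import Dict


def frame_file_names(source: str) -> Dict[str, str]:
    """Derive the default frame file names from the input file name."""
    stem = Path(source).stem  # "myfile.txt" -> "myfile"
    return {
        "frameset": f"{stem}_frame.html",           # master <FRAMESET> file
        "header": f"{stem}_header_frame.html",      # "header" source file
        "contents": f"{stem}_contents_frame.html",  # "contents" source file
        "footer": f"{stem}_footer_frame.html",      # "footer" source file
        "main": f"{stem}.html",                     # "main" source file
    }


print(frame_file_names("myfile.txt")["frameset"])  # myfile_frame.html
```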

The "main" frame

The "main" frame will contain the conversion of your source file. If you elect to split a document into many pages, then this will show the start page (which will have links to any next/previous page).

The "contents" frame

If your document has recognised headings, then the program is able to generate a contents list (see 5.6.2). In such cases a "contents" frame is generated and the contents list is placed in a file called "myfile_contents_frame.html".

If no contents list can be generated, then no contents frame is created unless you supply a CONTENTS_FRAME HTML fragment to be used as the contents of the "contents" frame.

The contents frame is placed to the left of the main frame. It will include a hyperlink labelled "NOFRAMES" (see NOFRAMES tag and NOFRAMES link) and the generated contents list. Note that this hyperlink is different from the <NOFRAMES> tag described in The master <FRAMESET> document.

You can use policies (see Using policies to control the frame structure) to suppress the creation of a contents frame or to control the following:-

width of the frame

colours of background and text

number of levels shown in the generated contents list

whether a "NOFRAMES" link is shown, and what URL it links to

You can also customize the frame's appearance using the following HTML fragments (see Using HTML fragments to override frame contents)

CONTENTS_FRAME

START_TOC / END_TOC

The "header" and "footer" frames

The software cannot "detect" headers and footers in your source text, so you will only get a header or footer frame if you supply the HTML yourself. Header and footer frames can be useful as they provide you with the opportunity to supply titles, navigation links or copyright notices that are always visible.

Prior to version 4 the software already had the ability to add HTML headers and footers to each page generated using HTML supplied in separate files identified by policy values. From version 4 onwards HTML fragments may also be used.

NOTE:: We recommend that, where possible, you use HTML fragments to define any header and footer HTML

It's expected that you may want to convert the same source into both frames and non-frames forms, using the same policy file. Given this the program has the ability to "promote" the HTML headers and footers used in non-frames production into their own always-visible frames. Equally there may be times when this behaviour is not wanted.

The relationships between headers and footers used in non-frames conversion and those used in frames-based conversion are quite complex. In the following sections we describe how headers (footers) are calculated. The logic is described for headers, but applies equally well to footers if you make the necessary name changes.

Non-frames use of HTML headers

In non-frames conversion each page created will get a HTML header if either of the following is true:

The policy HTML header file is set
The HTML fragment HTML_HEADER is defined

If both are set, the HTML_HEADER fragment is used in preference.

The selected header is referred to as the "standard" header in the discussion in the next two sections.

Note:: For HTML footers the fragment HTML_FOOTER is used, and the policy HTML footer file is tested.

"main" frame header

In frames conversion the HTML header added to each page is determined by three things

Any "standard" HTML header defined for non-frames conversion (see Non-frames use of HTML headers)
the policy Use main header in header frame
whether or not a HTML fragment MAIN_FRAME_HEADER is defined

If the fragment MAIN_FRAME_HEADER is defined, then that is used.

If the fragment MAIN_FRAME_HEADER is not defined, and there is no "standard" header, then the main frame gets no HTML header.

If the fragment MAIN_FRAME_HEADER is not defined, and the policy is not set then the "standard" header is used as in non-frames conversion.

If the fragment MAIN_FRAME_HEADER is not defined, and the policy is set then the "standard" header is promoted into its own "header" frame, and the main frame gets no HTML header.

Note:: For HTML footers the fragment MAIN_FRAME_FOOTER is used, and the policy Use main footer in footer frame is tested.
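The rules above for the main frame's header can be summarised in a short Python sketch. It is an illustration of the documented logic only, not the program's code; the function and parameter names are hypothetical, and None is used to mean "not defined" or "no header".

```python
from typing import Optional


def main_frame_header(standard: Optional[str],
                      main_frame_header_fragment: Optional[str],
                      use_main_header_in_header_frame: bool) -> Optional[str]:
    """Return the HTML header added to pages in the main frame, or None."""
    # A MAIN_FRAME_HEADER fragment always wins.
    if main_frame_header_fragment is not None:
        return main_frame_header_fragment
    # With no "standard" header there is nothing to fall back on.
    if standard is None:
        return None
    # Policy set: the "standard" header is promoted into its own
    # "header" frame, so the main frame gets no header.
    if use_main_header_in_header_frame:
        return None
    # Policy not set: behave as in non-frames conversion.
    return standard
```

The same function describes footers if MAIN_FRAME_FOOTER and the Use main footer in footer frame policy are substituted.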

"header" frame

In frames conversion whether or not a "header" frame is created is determined by three things

Any "standard" HTML header defined for non-frames conversion (see Non-frames use of HTML headers)
the policy Use main header in header frame
whether or not a HTML fragment HEADER_FRAME is defined

If the fragment HEADER_FRAME is defined, then that is used as the contents of a "header" frame.

If the fragment HEADER_FRAME is not defined, and there is no "standard" header, then no "header" frame is created.

If the fragment HEADER_FRAME is not defined, and the policy is not set, then no "header" frame is created.

If the fragment HEADER_FRAME is not defined, and the policy is set, then the "standard" header is used as the contents of the "header" frame. In other words, the "standard" header is promoted from the "main" frame into its own "header" frame.

Note:: For HTML footers the fragment FOOTER_FRAME is used, and the policy Use main footer in footer frame is tested.
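The corresponding decision for the "header" frame can be sketched in Python in the same way. Again this only restates the rules above; the function and parameter names are hypothetical, and a return value of None means that no "header" frame is created.

```python
from typing import Optional


def header_frame_contents(standard: Optional[str],
                          header_frame_fragment: Optional[str],
                          use_main_header_in_header_frame: bool) -> Optional[str]:
    """Return the contents of the "header" frame, or None if no such
    frame should be created."""
    # A HEADER_FRAME fragment always supplies the frame's contents.
    if header_frame_fragment is not None:
        return header_frame_fragment
    # No fragment: a frame is only created when there is a "standard"
    # header AND the policy asks for it to be promoted.
    if standard is None or not use_main_header_in_header_frame:
        return None
    return standard
```

Taken together with the main frame rules, this shows that without any frame-specific fragments the "standard" header appears in exactly one place: in the main frame when the policy is not set, or in the "header" frame when it is.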

Using policies to control the frame structure

A large number of policies influence frames generation. These are described more fully in the Policy manual.

general

Place document in frames
Output frame name

Add Frame border

New frame link window name
Open frame links in new window

contents frame

Add contents frame if possible

Add NOFRAMES links
NOFRAMES link URL

Number of levels in contents frame

Contents Frame width
Contents frame background colour
Contents frame text colour

main frame

First frame page number

A number of file generation policies affect the main frame's appearance, including :-

Split level
Min HTML File size
Add navigation bar

header and footer frames

Use main header in header frame
Header Frame depth
Header frame background colour
Header frame text colour

Use main footer in footer frame
Footer Frame depth
Footer frame background colour
Footer frame text colour

Using HTML fragments to override frame contents

HTML fragments were introduced in version 4 as a means of allowing users to customize some of the HTML generated by the software. This feature is heavily used in frames generation.

The fragment names used in frames production include the following :-

HEADER_FRAME - If defined, this fragment is used as the contents of a header frame at the top of the screen.

FOOTER_FRAME - If defined, this fragment is used as the contents of a footer frame at the bottom of the screen.

CONTENTS_FRAME - If defined, this fragment is used as the contents of the "contents" frame on the left of the screen. If not defined the "contents" frame will contain a generated contents list.

MAIN_FRAME_FOOTER - If defined, this fragment is used as the HTML footer of each page that appears in the main frame, overriding any HTML_FOOTER or value defined via policy file.

MAIN_FRAME_HEADER - If defined, this fragment is used as the HTML header of each page that appears in the main frame, overriding any HTML_HEADER or value defined via policy file.

Other HTML fragments may have an effect. For example :-

START_TOC - A fragment to be output before any generated table of contents. If not defined the default behaviour is to output the title "Table of Contents".

END_TOC - A fragment to be output after any generated table of contents. If not defined the default behaviour is to simply put out a horizontal rule <HR>.

NOFRAMES tag and NOFRAMES link

There are several reasons why providing a non-frames alternative to your pages is a good idea. These include

Not all browsers support frames. This is rarer these days, but there are still people who use text-based or non-visual browsers that can get confused by frames.

Not all people like frames. This is understating it, as many people loathe frames. This is because frames pages are hard to bookmark and the navigation can confuse some people.

Many search engines won't access the HTML pages used inside frames. This means your pages will go un-indexed, making it hard for people to find them.

To help with these problems the software supplies a <NOFRAMES> tag in the main <FRAMESET> document, and a visible "NOFRAMES" hyperlink in the contents frame.

The "NOFRAMES" hyperlink

The program can place a hyperlink in the contents frame. This link is labelled "NOFRAMES" and will link to the first main page. This will allow users who don't like frames to view your pages in a non-frames window. You can control this link to a limited extent using policies.

The <NOFRAMES> tag

HTML provides a tag whose content is displayed by any browser that doesn't support the <FRAMESET> tag. The program will automatically generate a <NOFRAMES> tag that displays a message saying the page requires frames, and offering a link to the first main page. This will allow users with non-frames browsers, and search engines, to access your main pages.

Generating frames and non-frames versions

You should consider whether or not your pages are suitable for both frames and non-frames viewing. If they are, then you can use the first page displayed in the main frame as your NOFRAMES hyperlink target. This is, in fact, the default behaviour.

There are a number of reasons that you might want to maintain two sets of pages :-

You don't want to have the non-frames version split into as many small pages as the frames version (different Split level policy values)

You want to place different headers and footers on the two versions to allow for different methods of navigation.

If you do want two sets of files, simply convert the file twice, once with and once without frames generation selected. You can either move the files into different directories, or change the output filename for one of the sets. Other than these changes you should be able to use the same policy file.

If you create two sets of files, make sure you set the NOFRAMES link URL policy to point to the first non-frames HTML page.

Outputting HTML to the clipboard

You can use the Conversion Type to select the option of placing the generated HTML onto the Windows clipboard, ready for use in other Windows applications.

In this case the HTML generated will omit the <HTML>, <HEAD> and <BODY> tags as these are not suitable when pasting into an existing HTML document.

Using AscToHTM in this way can be a very powerful technique which allows you to merge converted HTML with more traditionally authored content.

This approach becomes even more powerful if you use a clipboard extender like ClipMate to remember and organise everything copied to the clipboard. You could convert a few files, and then use ClipMate to recall the pasted HTML at your leisure for insertion into your other HTML.

ClipMate is produced by ThornSoft and can be downloaded from their website at http://www.thornsoft.com/
