A fledgling FAQ for JafSoft text conversion utilities

The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html



3.0 Conversion Questions

3.1 General

3.1.1 How do I get rid of the "nag" lines?

Easy. You register the software (see "registration and updates"), or you remove them by hand. "Nag" lines only appear in unregistered trial copies of the software, and registering removes them.



3.1.2 My file has had its case changed and letters replaced at random by numbers. How do I fix that?

Easy. You register the software (see "registration and updates").

The case is only adjusted in unregistered trial copies of the software, either after the line limit is reached, or after the 30 day trial has expired. The case is adjusted so that you can still check that the conversion has produced the right type of HTML, but since the text is now all in the wrong case and has had letters substituted the HTML is of little use to you.

This is intended as an incentive to register.

That said, you will find pages on the web that have been converted in this manner.


3.1.3 Why do I sometimes get <DL> markup? How do I stop it?

The program is detecting a "definition". Definitions are usually keywords with a following colon ":" or hyphen "-", e.g. "text:"

You can see this more easily if you go to Output-Style and toggle the "highlight definition term" option... the definition term (to the left of the definition character) is then highlighted in bold.

If the definition spreads over 2 "lines", then a definition paragraph is created, giving the effect you see.

If you have created your file using an editor that doesn't output line breaks then only long paragraphs will appear to the program as 2 or more "lines". In such cases only the longer paragraphs will be detected as "definition paragraphs", the rest are detected as "definition lines", even though they're displayed in a browser as many lines. If you view the file in NotePad you'll see how the program sees it.

To stop this you have a number of options.

  1. Analysis policies -> What to look for -> Look for definitions

switch this off. This will stop all attempts to spot "definition" lines.

  2. Analysis policies -> Analysis -> recognize colon (:) characters

switch this off. This will stop anything with a colon (:) being recognized as a definition.

  3. Output policies -> Style -> Use <DL> markup for paragraphs

disable this. The definitions will still be recognized, but the <DL> markup won't be used.
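As a rough illustration of what the program is spotting (this is a simplified sketch, not AscToHTM's actual algorithm), a "definition line" check might look like this:

```python
import re

def looks_like_definition(line):
    """Simplified sketch: a short leading term followed by a colon or a
    free-standing hyphen suggests a definition, e.g. "text: the body"."""
    return re.match(r'\s*(\w[\w ]{0,30}?)\s*[:-]\s+\S', line) is not None
```

Anything matching this shape is a candidate; whether it becomes a definition line or a definition paragraph then depends on how many "lines" it spans.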


3.1.4 Why are some of my words being broken in two?

Sometimes AscToHTM will produce HTML with words broken - usually over two lines. This can happen if your text file has been edited using a program (like NotePad) that doesn't place line breaks in the output.

AscToHTM is line-orientated (see 2.1.2). Programs like NotePad place an entire paragraph on a single "line", or on lines of a fixed length (e.g. 1000 characters).

AscToHTM places an implicit space at the end of each line it reads. This is to ensure you don't get the word at the end of one line merged with that at the start of the next.

However, in files with fixed length "lines", large paragraphs will be broken arbitrarily, with the result that a space (and possibly a <BR>) will be inserted into the middle of a word.

You can avoid this by breaking your text into smaller paragraphs, passing your file through an editor that wraps differently prior to conversion, or selecting any "save with line breaks" option you have.
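To see why this happens, here is a small sketch (hypothetical code, but it mirrors the behaviour described above) of joining lines with an implicit space:

```python
def join_lines(lines):
    """Mimic the converter's implicit space at each end-of-line."""
    return " ".join(line.rstrip("\n") for line in lines)

# A line broken at a word boundary joins back correctly...
wrapped = ["the quick brown", "fox jumps"]
# ...but a fixed-length break in mid-word leaves a space inside the word.
fixed = ["the quick bro", "wn fox jumps"]
```

The first case reassembles cleanly; the second produces "bro wn" in the output.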


3.1.5 Why am I getting line breaks in the middle of my text?

The software will add a line break to "short" lines, or - sometimes - to lines with hyperlinks in them.

You can edit your text to prevent the line being short, or you can use policies to alter the calculation of short lines. Use the "Policy Manual" to read about the following policies

3.1.6 Why isn't the software preserving my line structure?

Do you mean line structure, or do you really mean paragraph structure?

The program looks for "short lines". Short lines can mark the last line in a paragraph, but more usually indicate an intentionally short line. The calculation of what is a short line and what isn't can be complex, as it depends on the length of the line compared to the estimated width of the page.
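As an illustration only (the real calculation is more involved and policy-driven), the heart of the check is a comparison against the estimated page width:

```python
def is_short_line(line, page_width, threshold=0.67):
    """Hypothetical sketch: a line well below the estimated page width is
    treated as intentionally short, and so gets a line break after it."""
    return len(line.rstrip()) < page_width * threshold
```

The `threshold` value here is invented for the example; the real program derives its cut-off from the document itself.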

You have a number of options :-

See also "how do I preserve one URL per line?"


3.1.7 Why am I getting lots of white space?

Usually because you had lots of white space in your original document. If that is the case, then you can set the policy

Ignore multiple blank lines : Yes


to reduce this effect.

Some people complain that there are blank lines between paragraphs, or between changes in indentation. Often this is the vertical spacing inserted by default in HTML. This can only be controlled in browsers which support HTML 4.0 and Cascading Style Sheets (CSS).

Occasionally certain combinations of features lead to an extra line of space.


3.1.8 What's the largest file anyone's ever converted with AscToHTM?

Well, at time of writing, I know of a 56,000 line file (3Mb) which was converted into a single (4Mb) HTML file. Of course, it was also converted into a suite of 300 smaller, linked, files weighing in at 5Mb of HTML.

This file represented 1,100 pages when printed out.

I do sometimes wonder if anyone ever reads files that big though.


3.1.9 Does the software support Hebrew letters / Japanese / right-to-left alignment?

Since version 4.1 the short answer is "probably".

Although the software has no ability to understand documents written this way, and was designed to cope with the ASCII character set, from version 4.0 onwards it is possible to manually set the "charset" used. This tells the HTML browser how to interpret the characters. Whether or not you see the page correctly then depends on the browsers and fonts installed on the viewer's machine.

In version 4.1 some auto-detection of character sets has been added. This can usually detect which character encoding is being used. You can switch this behaviour off should you wish, and you can also set the correct charset by hand.

See the policies "Character encoding" and "Auto-detect character encoding".


3.1.10 Why does the program hang after a conversion?

Under Windows the software usually tries to display the results file in your browser or viewer of choice. To prevent multiple instances of the browser being launched, DDE is used. DDE is a Windows mechanism that allows requests to be passed from one program to another; in this case the software is asking the browser to display the HTML just created.

Some users have reported problems with DDE - especially under Windows Millennium. When this occurs any program - including AscToHTM - will hang whenever it attempts to use DDE... you notice it first with AscToHTM because it uses DDE all the time. When this happens you will need to use the Task Manager to kill the program.

You can solve this problem by using Settings -> Viewers for results to disable the use of DDE.

From version 4 onwards the software will detect when this has happened, and will disable its use of DDE next time it is run. You can re-enable this (e.g. after a reboot has cleared the problem) under the Settings->Viewers menu option.

Note, this is a workaround and not a solution. When DDE stops working on your system other programs will still have problems, e.g. when you click on a hyperlink inside your email client.

Sadly I don't know a solution for the DDE problem. Sometimes rebooting helps - initially at least - sometimes stopping a few applications helps. Sometimes it doesn't. :-(


3.2 What the software can't do

3.2.1 Why doesn't it convert Word/Wordperfect/RTF/my favourite wp documents?

Because it wasn't designed to. No, really.

The software is designed to convert ASCII text into HTML - that is, plain, unformatted documents. Word and other wp packages use binary formats that contain formatting codes embedded in the text (or in some cases the text is embedded in the codes :-).

Even RTF, which is a text file, is so full of formatting information that it cannot be treated as normal text (look at it in Notepad and you'll soon see what I mean).

Why the omission? Well, like I said, that was never the intention of this program. I always took the view that, in time, the authors of those wp packages would introduce "export as HTML" options that would preserve all the formatting, and in general this is what has happened. To my mind writing such a program is "easy".

My software tackles the much more difficult task of inferring structure where none is explicitly marked. In other words trying to "read" a plain text file and to determine the structure intended by the author.

See also "rtf-to-html converter etc?".


3.2.2 How can I use DDE with Netscape 6.0?

You can't. Unlike Netscape versions up to and including 4.7, Netscape 6.0 doesn't support DDE in its initial release under Windows.


3.2.3 Can I use AscToHTM to build a web site with a shopping cart?

By itself, no.

AscToHTM can only really produce relatively "static", mostly-text web pages. To add any dynamic contents and graphics you'd effectively need to add the relevant HTML yourself, so the answer is essentially "no".

Adding a shopping cart is actually fairly tricky. You either have to install the software yourself, or sign up with an ISP that will do this for you. Most such systems require a database (of items being sold). Having not dealt much with such systems myself I can't really advise on a web authoring tool (which is what AscToHTM is) that would integrate seamlessly with a shopping cart system.

My advice would be to identify an ISP that offers shopping cart functionality and see what methods they offer for web authoring.

I wish you luck.


3.2.4 How do I interrupt a conversion?

At present you can't. The Windows version won't respond to input while a conversion is in progress, meaning that the windows will not refresh. Normally this isn't a problem, but in large conversions this can be a little disconcerting.

Fixing this is on the "to do" list.


3.3 Tables

3.3.1 How does the program detect and analyse tables?

Here's an overview of how the software works. This will give you a flavour of the complexity of the issues that need to be addressed.

The software first looks for pre-formatted regions of text. It does this by

  1. Spotting lines that are clearly formatted, looking for large white space and any table-like characters like '|' and '+'. It may also look for code-like lines and diagram-like lines according to the policies set.

  2. Each time a heavily formatted line is encountered an attempt is made to extend the preformatted region by "rolling it out" to adjacent, not so clearly formatted lines.

  3. This "roll out" process is stopped whenever it encounters a line that is clearly not part of the formatted region. This might be a section heading or a set of multiple blank lines (the default is 2).

Once a preformatted region is identified, analysis is performed to see whether this is a table, diagram, code sample or something else. This decision depends on

  4. The mix of "graphics" characters as opposed to "text" characters

  5. The presence of "code-like" indicators like curly brackets, semi-colons and "++" and other special character sequences. Note, the software doesn't understand code syntax, it just recognises commonly used character combinations.

  6. How well the data can be fitted into columns of a table (below)

If nothing fits then this text is output "as normal", except that the line structure is preserved to hopefully retain the original meaning.

If the software decides a table is possible, it

  7. Characterizes the contents of each character position. So for example a character position that contains mostly blank characters on each line is a good candidate for a column boundary.

  8. Infers from the character positions the likely column boundaries

Once a tentative set of column boundaries has been identified, the following steps are repeated

  9. Place all text into cells using the current column boundaries

  10. Measure how "good a fit" the text is to the columns, looking for values that span column boundaries, or columns that are mostly "empty"

  11. Eliminate any apparently "spurious" columns. For example "empty" columns may get merged with their neighbours.

Finally, having settled on a column structure the software

  12. Tries to identify the table header, preferably by detecting a horizontal line near the top of the table.

  13. Tries to work out column alignments etc. If the cell contents are numeric the cell will be right aligned, otherwise the placement of the text compared to the detected boundaries will be observed.

  14. Identifies how many lines go into each row. If blank lines or horizontal rules are present, these may be taken as row boundaries.

  15. Places all text into cells, using the configuration found.

Naturally any one of these steps can go wrong, leading to less than perfect results.
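The character-position analysis (characterizing each position, then inferring likely column boundaries) can be sketched like this. This is an illustration of the idea only, not the program's actual code:

```python
def column_gap_positions(lines, min_blank_fraction=0.9):
    """Keep character positions that are blank on (nearly) every line;
    runs of such positions are good candidates for column boundaries."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    return [pos for pos in range(width)
            if sum(1 for line in padded if line[pos] == " ")
               >= min_blank_fraction * len(padded)]
```

Runs of adjacent "blank" positions returned here would then be merged into single candidate column gaps.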

The program has mechanisms (via policies and preprocessor commands) to

  1. Influence the attempt to look for tables
  2. Influence the attempt to extend tables (steps (1)-(3))
  3. Influence the decision as to what a preformatted region is (steps (4)-(6))
  4. Influence the column analysis (steps (7)-(11))
  5. Influence the header size and column alignment (steps (12)-(15))

Read the table sections in the "Tag Manual" and "Policy Manual" for more details.


3.3.2 Why am I getting tables? How do I stop it?

The software will attempt to detect regions of "pre-formatted" text. Once detected it will attempt to place such regions in tables, or if that fails sometimes in <PRE>...</PRE> markup.

Lines with lots of horizontal white space or "table characters" (such as "|", "-", "+") are all candidates for being pre-formatted, especially where several of these lines occur.

This often causes people's .sigs from email to be placed in a table-like structure.

You can alter whether or not a series of lines is detected as preformatted with the policies

Look for preformatted text : No
Minimum automatic <PRE> size : 4

The first disables the search for pre-formatted text completely. The second policy states that only groups of 4 or more lines may be regarded as preformatted. That would prevent most 3-line .sigs being treated that way.

If you have pre-formatted text, but don't want it placed in tables (either because it's not tabular, or because the software doesn't get the table analysis quite right), you can prevent pre-formatted regions being placed in tables via the policy

Attempt TABLE generation : No



3.3.3 Why am I not getting tables?

First read "how does the program detect and analyse tables?" for an overview of how tables are detected.

If you're not getting tables this is either because they are not being detected, or because, having been detected, they are deemed to be not "table-like". Look at the HTML code to see if there are any comments around your table indicating how it's been processed.

If the table is not being detected this could be because

If all this fails, edit the source to add preprocessor commands around the table as follows

        $_$_BEGIN_TABLE
        ...
        ...(your table lines)
        ...
        $_$_END_TABLE


3.3.4 Why do my tables have the wrong column structure?

First read "how does the program detect and analyse tables?" for an introduction to how tables columns are analysed.

The short answer is "the analysis went wrong". Why it went wrong is almost impossible to answer in a general way. Some things to consider

Often the table extent is correct, but the analysis of the table has gone wrong.

If all this fails you can explicitly tell the software what the table layout is by using either the TABLE_LAYOUT preprocessor command, or the "Default TABLE layout" policy. Only use the policy if all tables in the same source file have the same layout.


3.3.5 Where did all my table lines go?

The software removed them because it thought they would look wrong as characters. The lines are usually replaced by a non-zero BORDER value and/or some <HR> tags placed in cells.


3.3.6 How can I get the program to recognize my table header?

One tip. If you insert a line of dashes after the header like so...

     Basic Dimensions
   Hole No.    X        Y
   -------------------------
     1       3.2500   5.0150
     2       1.2500   3.1250
     etc.....

The program should recognize this as a heading, and modify the HTML accordingly (placing it in bold).

Alternatively you can tell the program (via the policy options or preprocessor commands) that the file has 2 lines of headers.
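The dashed-line trick works because a row made entirely of "-" characters near the top of a table is easy to spot. A rough sketch of the idea (hypothetical code, not the actual implementation):

```python
def header_rows(table_lines):
    """Return the rows above a horizontal rule found near the top of the
    table; an empty list means no header was detected."""
    for i, line in enumerate(table_lines[:4]):  # only look near the top
        stripped = line.strip()
        if stripped and set(stripped) == {"-"}:
            return table_lines[:i]
    return []
```

Everything above the rule is then marked up as header cells (in bold).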


3.3.7 Why am I getting strange COLSPAN values in my headers?

(see the example table in 3.3.6)

The spanning of "Basic Dimensions" over the other lines can be hit and miss. Basically if you have a space where the column gap is expected the text will be split into cells, if you don't then the text will be placed in a cell with a COLSPAN value that spans several cells.

For example

            | space aligns with column "gap"
            v
       Basic Dimensions
   Hole No.    X        Y
   -------------------------
     1       3.2500   5.0150
     2       1.2500   3.1250
     etc.....

In this case you'd get "Basic" in column 1 and "Dimensions" spanning columns 2 and 3. If you edit this slightly as follows then the "Basic Dimensions" will span all 3 columns

          | space no longer aligns with column "gap"
          v
     Basic Dimensions
   Hole No.    X        Y
   -------------------------
     1       3.2500   5.0150
     2       1.2500   3.1250
     etc.....

It's a bit of a black art.
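The split-versus-span decision can be pictured as follows. This sketch (hypothetical and simplified) splits header text at a column gap only when a space sits exactly at that position; otherwise the text spans into the next column, which is where the COLSPAN values come from:

```python
def header_cells(text, gap_positions):
    """Split header text at each column-gap position that holds a space;
    text crossing a non-space gap spans into the next column (COLSPAN)."""
    cells, start = [], 0
    for gap in gap_positions:
        if gap < len(text) and text[gap] == " ":
            cells.append(text[start:gap].strip())
            start = gap
    cells.append(text[start:].strip())
    return [cell for cell in cells if cell]
```

Shifting the header text by a couple of characters, as in the two examples above, changes which gaps hold a space and hence how the cells split.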

Sometimes when the table is wrong, it's a good idea to set the BORDER size to 0 (again via the policy options) to make things look not so bad. It's a fudge, but a useful one to know.


3.4 Headings

3.4.1 How does the program recognize headings?

The program can attempt to recognize five types of headings:

Numbered headings. These are lines that begin with section numbers. To reduce errors, numbers must be broadly in sequence and headings at the same level should have the same indentation. Words like "Chapter" may appear before the number, but may confuse the analysis when present.

Capitalised headings. These are lines that are ALL IN UPPERCASE.

Underlined headings. These are lines which are followed by a line consisting solely of "underline" characters such as underscore, minus, equals etc. The length of the "underline" line must closely match the length of the line it is underlining.

Embedded headings. These are headings embedded as the first sentence of the first paragraph in the section. The heading will be a single all-UPPERCASE sentence. Unlike the other headings, the program will place these as bold text, rather than using heading markup. You will need to manually enable the search for such headings; it is not enabled by default.

Key phrase headings. These are lines in the source file that begin with user-specified words (e.g. "Chapter", "Appendix" etc.) The list of words and phrases to be spotted is case-sensitive and will need to be set via the "Heading key phrases" policy.

The program is biased towards finding numbered headings, but will allow for a combination. It's quite possible for the analysis to get confused, especially when

To tell if the program is correctly detecting the headings

  1. Look at the HTML to see if <H1>, <H2> etc. tags are being added to the correct text.

  2. If the headings are wrong, check the analysis policies are being set correctly by looking at the values shown under
    Conversion Options -> Analysis policies -> headings

after the conversion.

Depending on what is going wrong do one or more of the following :-

  1. Adjust the headings policy (e.g. to disable capitalised headings)

  2. Edit the source to replace centred headings by headings at a fixed indentation.

  3. Edit the source so that numbered lists are at a different indentation to numbered sections.

  4. If your numbering system is too exotic, edit your source so that all the headings are "underlined" and get the program to recognize underlined, rather than numbered headings.

  5. If possible consider the use of the "Heading key phrase" policy instead.
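The in-sequence and right-indentation checks described above might be sketched as follows (a hypothetical simplification; the real checks handle multi-level numbers like n.n.n):

```python
def accept_as_heading(number, indent, last_number, expected_indent):
    """A candidate heading number must be (nearly) in sequence and at
    (nearly) the indentation established by earlier headings."""
    in_sequence = last_number < number <= last_number + 2  # allow small gaps
    right_place = abs(indent - expected_indent) <= 1
    return in_sequence and right_place
```

Both tolerances here are invented for the example, but they show why a stray list number at the right indentation can be accepted, and why a real heading at the wrong indentation can be rejected.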

3.4.2 Why are my headings coming out as hyperlinks?

This is a failure of analysis. The program looks for a possible contents list at the top of the file before the main document (sometimes in the first section).

If your file has no contents list, but the program wrongly expects one, then as it encounters the headings it will mark these up as contents lines.

To prevent this, set the analysis policy

Expect contents list : No

to "no". Or add a preprocessor line to the top of your file as follows

      $_$_CHANGE_POLICY Expect contents list : No

3.4.3 Why are the numbers of my headings coming out as hyperlinks?

Either a failure of analysis, or an error in your document. The software checks that headings "obey policy" and are in sequence. If you get your numbering sequence wrong, or if you place the heading line at a radically different indentation to all the others, then the software will reject this as a heading line, in which case the number may well be turned into a hyperlink.

If it's an error in your document, fix the error.

For example, a common problem is numbered lists inside sections. If the list numbers occur at the same level of indentation as the level 1 section headings, then eventually a number in the list will be accepted as the next "in sequence" header. For example in a section numbered 3.11, any list containing the number 4 will have the "4" treated as the start of the next chapter. If section "3.12" is next, the change in section number from 4 down to 3.12 will be rejected as "too small", and so all sections will be ignored until section 4.1 is reached.

The solution here is to edit the source and indent the numbered list so that it cannot be confused with the true headers. Alternatively change it to an alphabetic, roman numeral or bulleted list.

Another possible cause is that the software hasn't recognized this level of heading as being statistically significant (e.g. if you only have 2 level 4 headings (n.n.n.n) in a large document). In this case you'll need to correct the headings policy, which is a sadly messy affair.


3.4.4 Why are various bullets being turned into headings, and the headings ignored?

The software can have problems distinguishing between

1 This is chapter one

and

  1. This is list item number one.

To try and get it right it checks the sequence number, and the indentation of the line. However problems can still occur if a list item with the right number appears at the correct indentation in a section.

If possible, try to place chapter headings and list items at different indentations.

In extreme cases, the list items will confuse the software into thinking they are the headings. In such a case you'd need to change the policy file to say what the headings are, with lines of the form

We have 2 recognized headings

Heading level 0 = "" N at indent 0
Heading level 1 = "" N.N at indent 0

(this may change in later versions).


3.4.5 Why are lines beginning with numbers being treated as headings?

The software can detect numbered headings. Any lines that begin with numbers are checked to see if they are the next heading. This check includes checking the number is (nearly) in sequence, and that the line is (nearly) at the right indentation.

If the line meets these criteria, it is likely to become the next heading, often causing the real heading to be ignored, and sometimes completely upsetting the numbering sequence.

You can fix this by editing the source so that the "number" either occurs at the end of the previous line, or has a different indentation to that expected for headings.


3.4.6 Why are underlined headings not recognized?

The software prefers numbered headings to underlined or capitalised headings. If you have both, you may need to switch the underlined headings on via the policy

Expect underlined headings : Yes


3.4.7 Why are only some of my underlined headings not recognized?

If the program is looking for underlined headings (see "Why are underlined headings not recognized?") then the only reason for this is that the "underlining" is of a radically different length to the line being underlined. Problems can also occur for long lines that get broken.

Edit your source to

3.4.8 How do I control the header level of underlined headings?

The level of heading associated with an underlined heading depends on the underline character as follows:-

        '****'                  level 1
        '====','////'           level 2
        '----','____','~~~~'    level 3
        '....'                  level 4

The actual markup that each heading gets may depend on your policies. In particular level 3 and level 4 headings may be given the same size markup to prevent the level 4 heading becoming smaller than the text it is heading. However the logical difference will be maintained, e.g. in a generated contents list, or when choosing the level of heading at which to split large files into many HTML pages.
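Expressed as a lookup table (a sketch of the mapping above, not the program's internals):

```python
UNDERLINE_LEVEL = {"*": 1,
                   "=": 2, "/": 2,
                   "-": 3, "_": 3, "~": 3,
                   ".": 4}

def underlined_heading_level(underline):
    """Return the heading level for a line of underline characters,
    or None if it isn't a recognized underline."""
    chars = set(underline.strip())
    return UNDERLINE_LEVEL.get(chars.pop()) if len(chars) == 1 else None
```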


3.4.9 Why are only the first few headings working?

A couple of possible reasons :-

One of the reasons for "failure" is that - for consistency - headings must be in sequence and at the same indentation. This is an attempt to prevent errors in documents that have numbers at the start of a line by chance being treated as the wrong headings.

If some headings aren't close enough to the calculated indent then they won't be recognised as headings. If a few headings are discarded then later headings that are at the correct indentation are discarded as being "out of sequence".

If you're authoring from scratch then the easiest solution is to edit all the headings to have the same indent. Alternatively disable the policy "Check indentation for consistency".


3.5 Hyperlinks

3.5.1 Why doesn't it correctly parse my hyperlinks?

The software attempts to recognize all URLs, but the problem is that - especially near the end of the URL - punctuation characters can occur. The software then has difficulty distinguishing a comma separated list of URLs from a URL with a series of commas in it (as beloved at C|Net).

This algorithm is being improved over time, but there's not much more you can do than manually fix it, and report the problem to the author who will pull out a bit more hair in exasperation :)
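For what it's worth, the usual heuristic (sketched here, and genuinely ambiguous in cases like a URL that really does end in a comma) is to strip trailing punctuation that more likely belongs to the sentence than to the URL:

```python
def trim_url(candidate):
    """Strip punctuation from the end of a detected URL; punctuation
    inside the URL is left alone."""
    return candidate.rstrip(".,;:!?)")
```

Note the ambiguity: this sketch cannot tell a sentence-ending comma from a comma that is genuinely part of the URL, which is exactly the problem described above.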


3.5.2 Why doesn't it recognize my favourite newsgroup?

To avoid errors the program will only recognize newsgroups in the "big 7" hierarchies. Otherwise filenames like "command.com" might become unwanted references to fictional newsgroups.

This means that uk.telecom won't be recognized, although if you place "news:" in front of it like this news:uk.telecom then it is recognized.

If you want to make "uk." recognized as a valid news hierarchy, then set the policy

recognized USENET groups : uk

Then any word beginning "uk." may become a newsgroup link.
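The idea can be sketched as follows (hypothetical code; the real check is policy-driven):

```python
# The "big 7" USENET hierarchies recognized by default.
BIG7 = ("comp.", "misc.", "news.", "rec.", "sci.", "soc.", "talk.")

def looks_like_newsgroup(word, extra_hierarchies=()):
    """Only accept dotted names in a known hierarchy, so that filenames
    like "command.com" aren't mistaken for newsgroups."""
    prefixes = BIG7 + tuple(h.rstrip(".") + "." for h in extra_hierarchies)
    return any(word.startswith(p) and len(word) > len(p) for p in prefixes)
```

Here `extra_hierarchies` plays the role of the "recognized USENET groups" policy.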


3.5.3 Why are only some of my section references becoming hyperlinks?

The program will only convert numbers that match known numbered sections into hyperlinks. If the number is a genuine section heading, then the chances are that this level of heading has not been detected. This has happened in large documents which contained only 2 level 5 headings. In such documents you may need to manually add the extra level to your policy file.

Another limit is that the program won't convert level 1 heading references, because the error rate is usually too high. For example if I say "1, 2, 3" it's unlikely I want this to become hyperlinks to chapters 1, 2 and 3.


3.5.4 Why are some numbers becoming hyperlinks?

In a numbered document numbers of the form n.n may well become hyperlinks to that section of the document. This can cause "Windows 3.1" to become a hyperlink to section 3.1 if such a section exists in your document.

You can either insert some character (such as "V" to make "V3.1"), place the number inside a protective pre-processor TEXT tag as follows

[[TEXT 3.1]]

or disable this feature entirely via the policy

Cross-refs at level : 3


(which means only "level 3" headings such as n.n.n will be turned into links), or

Cross-refs at level : (none)


which should disable the behaviour.


3.5.5 Why are some long hyperlinks not working?

The software will sometimes break long lines to make the HTML more readable. If this happens in the middle of a hyperlink, the browser reads the end of line as a space in the URL.

You can fix this by editing the output text so that the HREF="<url>" part of the file is all on the same line.

This "feature" may be fixed in later versions of AscToHTM.


3.5.6 How do I preserve one URL per line?

Some files contain lists of URLs, with one URL per line. By default the software will not preserve this structure, because long lines are usually concatenated into a single paragraph.

You can change this behaviour using the option on the Output policies -> Hyperlinks policy sheet.

See also "why isn't the software preserving my line structure?"


3.6 Policy files

3.6.1 How many policies are there? Where can I read more about individual policies?

First time I looked it was nearly 200; recently the number has been approaching 250. They kind of sneak up on you, I guess. The "Policy Manual" gives a pretty comprehensive description of what each one does and where it can be found. Last time I checked that file was 5000 lines of text before conversion to HTML.

People complain that there are too many policies, but then they say "couldn't you add an option to ...", and so it goes. Organizing these policies in a logical manner is a fairly difficult problem, and if anyone has any bright ideas I'm listening. In recent versions I added overview policies to make things easier to locate or to switch off en masse.


3.6.2 My policy file used to work, but now it doesn't. Why?

Make sure you're using an "incremental" policy file, rather than a full one. You can do this by viewing the .pol file in a text editor. An "incremental" policy file will only contain lines for the policies you've changed. A full policy file will contain all possible policies.

If you load a "full" policy file you prevent the program intelligently adjusting to the particular file being converted. If this happens either edit out the lines you don't want from your policy file, or reset the policies to their defaults and create a new policy file from scratch.

NOTE:
There used to be a bug whereby sometimes a policy would inadvertently get saved as a "full" file. That should be fixed now.

3.6.3 xxxx Policy is not taking effect. What shall I do?

(see 1.7)


3.7 Bullets and lists

3.7.1 Why is the indentation wrong on follow-on paragraphs?

The program can't distinguish between indented paragraphs and paragraphs that are intended as follow-on paragraphs from some bullet point or list item.

This means that whilst the first paragraph (the one with the bullet point) is indented as a result of being placed inside appropriate list markup, the second and subsequent paragraphs are just treated as indented text.

The bullet point will be indented as one level deeper than the text position of the bullet. The follow-on paragraph will be indented according to its own indentation position compared to the prevailing documentation pattern. Ideally this will be one level deeper than the text position of the bullet.

Occasionally the two result in different indentations. The solutions are either to

  1. Review your indent position(s) policy with a view to adjusting the values to give the right amount of indentation to the follow-on paragraphs. Sometimes adding an extra level to match the indentation of the follow-on paragraph is all that's necessary.

  2. Edit your source text slightly, adjusting the indent of either the list items or follow-on paragraphs until the two match.

3.7.2 Why is the numbering wrong on some of my list items?

HTML doesn't allow list numbering to be marked up explicitly. Instead you can only use a START attribute in the <OL> tag to set the first number, which is then incremented each time a <LI> tag is seen.

Some browsers don't implement the START attribute, and so they always restart numbering at 1.

There's not much I can do about this problem.
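To illustrate how the START attribute works, here's a small Python sketch that emits the markup for a list whose numbering resumes part-way through. This is an illustration only, not the program's own code:

```python
def numbered_list(items, start=1):
    """Emit an ordered list; START sets the first number, and browsers
    that honour it increment from there for each <LI>."""
    lines = ['<OL START=%d>' % start]
    lines += ['<LI>%s' % item for item in items]
    lines.append('</OL>')
    return '\n'.join(lines)

# A list whose numbering resumes at 4 after an interruption:
print(numbered_list(["fourth item", "fifth item"], start=4))
```

A browser that ignores START will display this list starting at 1, which is exactly the mis-numbering described above.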

I've also seen a bug in Opera V3.5 where any tag (such as <EM>) placed between the <OL> and the <LI> causes the numbering to increment. That shouldn't be a problem here, as that's illegal HTML markup - and we try very hard not to generate any of that!


3.7.3 Some of my text has gone missing. What happened?

There's a bug (in Opera) where a <FONT> tag between the <OL> and <LI> tags causes all of that text not to be displayed.

That shouldn't be a problem here, as that's illegal HTML markup - and we try very hard not to generate any of that!

If there's any other problem of this sort please email info<at>support.com with details.


3.8 Contents List generation

3.8.1 How do I add a contents list to my file?

There are a number of ways. The simplest is to enable the option under

Conversion Options -> Output Policies -> Contents List

A hyperlinked contents list will be generated from the headings that the program detects. This list will be placed at the top of the first file.
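As an illustration of what such a contents list looks like, here's a minimal Python sketch that builds hyperlinked entries from detected headings. The anchor-naming scheme and the flat <UL> layout are assumptions for illustration, not the program's actual output:

```python
def contents_list(headings):
    """Build a hyperlinked contents list from (level, text) headings,
    linking each entry to a generated anchor name."""
    out = ['<UL>']
    for level, text in headings:
        anchor = text.lower().replace(' ', '_')
        indent = '  ' * level
        out.append('%s<LI><A HREF="#%s">%s</A>' % (indent, anchor, text))
    out.append('</UL>')
    return '\n'.join(out)

print(contents_list([(1, "Introduction"), (2, "Getting started")]))
```

Each entry only works if a matching anchor is placed at the heading itself, which is why heading recognition matters (see the next question).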

3.8.2 Why doesn't my contents list show all my headings?

First read "how does the program recognize headings?".

If you're generating a contents list from the observed headings, then any missing headings are either because

  1. The program didn't recognize the headings
  2. The policy Maximum level to show in contents has been set to a value that excludes the desired heading.

If you're converting an in-situ contents list, then only the first reason is likely to apply, in which case you need to ensure the program recognizes your headings.


3.8.3 Some of my contents hyperlinks don't work!

There used to be a problem whereby the software would add hyperlinks to sections that didn't exist, or would point to the wrong file when a large file was being split into many smaller files.

Both problems should now be fixed, so if you encounter this problem, contact info<at>support.com.


3.9 Emphasis

3.9.1 Why didn't my emphasis markup work?

Emphasis markup can be achieved by placing asterisks (*) or underscores (_) in pairs around words or phrases. The matching pair can be over a few lines, but cannot span a blank line. Asterisks and underscores can be nested.

Asterisks generate bold markup, underscores generate italic markup, and combining these generates bold, italic markup.

If you wrap a phrase in underscores, and replace all the spaces by underscores like_this, then the result will be underlined like this and not in italics.

The algorithm copes reasonably well with normal punctuation, but if you use some unanticipated punctuation, it may not be recognized!&%@!

You can have a phrase that spans a couple of lines that contains another phrase of a different type in the middle of it, but you can't have two phrases of the same type nested that way. Be reasonable :-)

Phrases that span a blank line are not permitted. You'll need to end the markup before the blank line, and re-start it afterward. This is to reduce the chances of false matches.
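The rules above can be sketched with a simple line-by-line substitution. This is a simplified illustration only; it doesn't reproduce the program's full handling of punctuation, multi-line phrases, or underscore-joined words:

```python
import re

def add_emphasis(line):
    """Convert paired *bold* and _italic_ markers within a single line.
    Working line by line means a pair can never span a blank line."""
    line = re.sub(r'\*([^*]+)\*', r'<B>\1</B>', line)
    line = re.sub(r'_([^_]+)_', r'<I>\1</I>', line)
    return line

print(add_emphasis("this is *important* and _subtle_"))
# -> this is <B>important</B> and <I>subtle</I>
```

Because the asterisk pass runs before the underscore pass, a nested phrase such as *_both_* comes out as bold italic, matching the nesting behaviour described above.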


3.10 Link Dictionary

3.10.1 What is the Link Dictionary?

The link dictionary allows you to add hyperlinks to particular words or phrases. You can choose the phrase to be matched, the text to be displayed and the URL to be linked to.

This can help when building a site by converting multiple text files. For example the whole www.jafsoft.com site is built from text files, and extensive use of a link dictionary is made to add links from one page to another.


3.10.2 My links aren't coming out right. Why?

Known problems include false matches, where the match text occurs somewhere you didn't intend a link to be added.

One tip is to place brackets round the [match text] in your source file... this not only makes a false match less likely, but also makes it clearer in the source files where the hyperlinks will be.


3.10.3 I can't enter links into the Link Dictionary. What gives?

The Link Dictionary support in the Windows version of the software is a little quirky. Apologies for that.

The way it should work is that you click on the "add new link definition" button.

I realize now that this is counterintuitive, and will probably address this in the next release.

If you save your policy, each link appears as a line of the form

Link definition: "match text" = "display text" + "URL"

e.g.

Link definition: "jaf" = "John Fotheringham" + "http://www.jafsoft.com/"

The whole definition must fit on one line.

You may find it easier to open your .pol file in a text editor and add these by hand.
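As an illustration, here's a Python sketch that parses such a line into its three parts. The regular expression is an assumption based on the format shown above, not the program's own parser:

```python
import re

# Pattern for: Link definition: "match text" = "display text" + "URL"
LINK_RE = re.compile(r'Link definition: "(.+?)" = "(.+?)" \+ "(.+?)"')

def parse_link_definition(line):
    """Return (match text, display text, URL), or None if malformed."""
    m = LINK_RE.match(line.strip())
    return m.groups() if m else None

line = 'Link definition: "jaf" = "John Fotheringham" + "http://www.jafsoft.com/"'
print(parse_link_definition(line))
# -> ('jaf', 'John Fotheringham', 'http://www.jafsoft.com/')
```

Note that the pattern only matches a definition that fits on one line, which mirrors the one-line restriction mentioned above.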


3.11 Batch conversion

For more information see the section "Processing several files at once" in the main documentation. The software supports wildcards, and console versions are available to registered users which are better suited for batch conversions.

In the shareware versions no more than 5 files may be converted at once. This limit is absent in the registered version (see "what's the most files I can convert at one go?").


3.11.1 How do I convert a few files at once?

If you only want a few files converted, then the simplest way is to drag and drop those files onto the program. You can either drag files onto the program's icon on the desktop, or onto the program itself.

If you drag files onto the program's icon, this approach is limited to around 10 files. The limit arises because the filenames are concatenated to make a command string, and this seems to have a Windows-imposed limit of 255 characters. This problem may be solved in later versions.

The same limit doesn't seem to apply when you drag files onto the open program.

Alternatively you can browse to select the files you want converting.


3.11.2 How do I convert lots of files at once?

If you want to convert many files in the same directory, then just enter a wildcard such as "*.txt" as the name of the files to be converted.

Registered users can obtain a console version of the software. This can accept wildcards on the command line, and is better suited to batch conversion, e.g. from inside Windows batch files (for example, it won't grab focus when executed).

If you want to convert many files in different directories, either invoke the console version multiple times using a different wildcard for each directory, converting one directory at a time, or investigate the use of a steering command file when running from the command line. See the main documentation for details.
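As a sketch of the one-directory-at-a-time approach, you could drive a console converter from a script like this. The converter name "asctohtm" and its argument style are assumptions for illustration; check the console version's own documentation for the real usage:

```python
import glob
import subprocess

def convert_directory(pattern, converter="asctohtm"):
    """Expand a wildcard pattern and run the (hypothetical) console
    converter once per matching file, returning the files processed."""
    files = sorted(glob.glob(pattern))
    for name in files:
        subprocess.run([converter, name], check=True)
    return files

# e.g. convert_directory("docs/*.txt") then convert_directory("notes/*.txt")
```

Invoking the function once per directory mirrors the "different wildcard for each directory" advice above.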


3.11.3 What's the most files I can convert at one go?

The largest number of files converted at one time using the wildcard function was reported to be around 2000. A week later someone contacted me with around 3000 files to be converted. A few weeks after that someone was claiming 7000. If you'd like to claim a higher number, let me know.

Theoretically the only limit is your disk space. The program operates on a flat memory model so that the memory used is largely independent of the number of files converted, or the size of the files being converted.

Such conversions are a testament to the program's stability and efficient use of system resources. That said, if possible we recommend you break the conversion into smaller runs to reduce your risks :-)


3.12 File splitting

3.12.1 Why isn't file splitting working for me?

The program can only split into files at headings it recognises (see "how does the program recognize headings?"). You first need to check that the program is correctly determining where the headings are, and what type they are.

Headings can be numbered, capitalised or underlined. To tell if the program is correctly detecting the headings

  1. Look at the HTML to see if <H1>, <H2> etc. tags are being added to the correct text.

  2. If the headings are wrong, check the analysis policies are being set correctly. If necessary set them yourself under

Conversion Options -> Analysis policies -> headings

Once the headings are being correctly diagnosed, you can switch on file splitting using the policies under

Conversion Options -> output policies -> file generation

Note that the "split level" is set to 1 to split at "chapter" headings, 2 to split at "chapter and major section" headings etc.

Underlined headings tend to start at level 2, depending on the underline character (see "How do I control the header level of underlined headings?")
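To illustrate how a split level interacts with numbered heading levels, here's a simplified Python sketch. It handles numbered headings only and is not the program's actual analysis:

```python
import re

def heading_level(line):
    """Guess the level of a numbered heading: "3" -> 1, "3.7" -> 2,
    "3.7.1" -> 3, and so on. A simplified sketch only."""
    m = re.match(r'\s*(\d+(?:\.\d+)*)\.?\s+\S', line)
    return m.group(1).count('.') + 1 if m else None

def split_here(line, split_level):
    """Split the output file at headings at or above the split level."""
    level = heading_level(line)
    return level is not None and level <= split_level

print(split_here("3.12 File splitting", 2))   # a "major section" heading
```

With a split level of 1 only chapter headings start a new file; raising it to 2 also splits at major section headings, and so on.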

Hopefully this will give you some pointers, but if you still can't get it to work, please mail a copy of the source file (and any policy file you're using) to info<at>support.com and I'll see what I can advise.


3.13 Miscellaneous questions

3.13.1 How do I suppress the Next/Previous navigation bar when splitting a large document?

Prior to version 4 there was a bug which meant the policy "Add navigation bar" was being ignored when splitting files (the only time it was used). This is now fixed.

However also available in version 4 is a new "HTML fragments" feature that allows you to customize some of the HTML generated by the software. This includes the navigation bars so that, for example, if you wanted to suppress just the top navigation bar, you could define the fragment NAVBAR_TOP to be empty.

See "customizing the HTML created by the software" and the "Tag Manual" for more details.


3.13.2 Why am I getting regions of <PRE> text?

The software attempts to detect pre-formatted text in your files and, when it finds some, attempts to turn it into tables. In many cases, having detected some pre-formatted text, it recognises that it cannot make a table and so resorts to using <PRE>...</PRE> markup instead (in RTF it uses a courier font), giving a "mal-formed table" error message.

These <PRE> sections actually work quite well for some documents, but in other cases they would be better not handled this way.

Happily the solution is simple. On the menu go to

Conversion Options -> Analysis policies -> What to look for

and disable "pre-formatted regions of text".


3.13.3 Do you have an HTML-to-text converter, an RTF-to-HTML converter, etc.?

No.

My converters convert from plain ASCII text into HTML or RTF. Their "unique selling point" is that they intelligently work out the structure of the text file.

However other people provide other converters.

There are a number of HTML-to-text converters, and Netscape also has a good "save as text" feature. Or you can import the HTML into Word and use Word's save-as-text features (although in my opinion these are inferior to Netscape's).

If you visit my ZDNet listing at http://www.hotfiles.com/?000M96 and click on the "related links" you'll see a number of converters listed.

There are at least two RTF-to-HTML converters called RTF2HTML and RTFtoHTML and of course Word for Windows offers this capability (it doesn't suit everyone though).

In fact, here are five products:-

RTFtoHTML can be found at http://www.sunpack.com/RTF/
RTF2HTML can be found at http://www.xwebware.com/products/rtf2html/
RTF-2-HTML can be found at http://www.easybyte.com/rtf2html.com
The IRun RTF converter (free) can be found at http://www.pilotltd.com/irun/index.html
Yet another Word converter can be found at http://www.yawcpro.com/



Previous page Back to Contents List Next page

Valid HTML 4.0! Converted from a single text file by AscToHTM
© 1997-2003 John A Fotheringham