Example hyperlink detection by AscToHTM

This file demonstrates the ability of AscToHTM and AddLinx to convert URLs.

The HTML version of this file has been converted from this source file by AscToHTM.

Contents of this file

New Top level domains
Newslinks
Email addresses
Hyperlinks
User Hyperlinks
Things we can't do (yet)
Using Policies to tailor the conversion (AscToHTM only)

New Top level domains

ICANN have added 7 new TLDs. I guess we should soon be able to visit the following sites.

www.microsoft.info
www.microsoft.museum
www.microsoft.aero
www.microsoft.coop
www.microsoft.name
www.microsoft.pro
www.microsoft.biz

Newslinks

With "news://" in front

news://msnews.somewhere.com/somewhere.public.internet.mail news://news.mozilla.org/
news:jaf.whatever

With "snews://" in front  
snews:netscape.bugs ! from a secure server

Without "news://", only those groups in alt., comp. etc are converted...

alt.answers

alt.comp.os  
comp.infosystems.www.authoring.tools ! may give error cos of "www"
uk.telecom ! rejected 'cos uk not recognised


inside a table  
  alt.answers FAQs for the alt. hierarchy
  news.answers FAQs for the news. hierarchy
  comp.answers FAQs for the comp. hierarchy
  comp.os.vms VMS discussion group
  comp.risks Risks discussion group


Email addresses

various surrounding punctuation

user@your_domain_name.com,user@your_domain_name.com,user@your_domain_name.com, user@your_domain_name.com, user@your_domain_name.com, user@your_domain_name.com,

user@your_domain_name.com. user@your_domain_name.com: [user@your_domain_name.com] <user@your_domain_name.com>

mailto:user@your_domain_name.com.
<mailto:user@your_domain_name.com> mailto:user@your_domain_name.com.
mailto:mx%"user@your_domain_name.com"
user@your_domain_name.com;roy@your_domain_name.com

rejects

%something@your_domain_name.com ! "%" at start
a@b.c.d ! too short
12334.dsadasda@hotmail.com ! begins with a number (can be switched on)
me@there ! invalid domain name (too short)
newsgroup alt. ! incomplete
newsgroup "news." ! incomplete
user@your_domain_name.com@yrl.co.uk ! 2 "@"s
(@.co.uk) ! too short


By default "addresses" beginning with numbers are ignored because

wrote in message <3816A71C.958F366B@gtech.com>
news:38154FA8.7BE4B743@gtech.com...

from a usenet article would give false links. You can toggle this behaviour.
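
As a rough illustration of that toggle, here is a minimal Python sketch. It is not AscToHTM's actual code: the pattern EMAIL_RE, the function find_emails and the allow_leading_digit flag are all hypothetical names, and the pattern does not reproduce every reject listed above.

```python
import re

# Hypothetical pattern: a local part, one "@", then a dotted domain name.
EMAIL_RE = re.compile(r"[A-Za-z0-9][\w.+-]*@[\w-]+(?:\.[\w-]+)+")

def find_emails(text, allow_leading_digit=False):
    """Return candidate addresses, skipping digit-led ones unless asked."""
    found = []
    for match in EMAIL_RE.finditer(text):
        address = match.group(0)
        # Message IDs such as 3816A71C.958F366B@gtech.com start with a digit.
        if address[0].isdigit() and not allow_leading_digit:
            continue
        found.append(address)
    return found

line = "wrote in message <3816A71C.958F366B@gtech.com>"
default_result = find_emails(line)
toggled_result = find_emails(line, allow_leading_digit=True)
```

With the flag left off, the message ID in the quoted line is ignored; with it switched on, the same text is picked up as an address.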


Hyperlinks

www.yrl.co.uk

http://ourworld.compuserve.com/homepages/NWF/
www.i.cz ! minimal length site name
www.jafsoft.com:8080/ ! contains port number
http://www.jafsoft.com:8080/ ! contains port number
http://www.jafsoft.com:8080/jaf ! contains port number
http://www.jafsoft.com:8080/jaf:.html ! contains ":" in url

inside brackets

(http://www.somewhere.com/)
(http://www.somewhere.com)
(www.somewhere.com)
(www.somewhere.com).
<http://www.somewhere.com>


<http://www.slashdot.org>;
<http://www.slashdot.org/>;
<URL:http://www.somewhere.com>

[http://www.somewhere.com]
"http://www.somewhere.com/"
"http://www.somewhere.com"
"www.somewhere.com"
"(www.somewhere.com)"



Complex domains

http://username@18.69.0.44/
http://username:password@18.69.0.44:port/ http://username:password@18.69.0.44:8080/ http://username@306511916/


with numbers

http://123.123.123.55/whatever.html    
http://999.123.123.55/whatever.html ! rejected (999)
http://123.123.55/whatever.html ! rejected (too few numbers)
http://123.aaa.123.55/whatever.html ! rejected (aaa)
http://306511/ ! number too small
http://10651191600/ ! number invalid
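
The dotted cases above can be checked with a few lines of Python. This is only a sketch under the assumption that a valid host has exactly four decimal fields in the range 0 to 255; is_dotted_quad is a hypothetical name, and the single-number hosts are not covered because the limits AscToHTM applies to them are not stated here.

```python
def is_dotted_quad(host):
    """True if host is four decimal fields, each in the range 0-255."""
    parts = host.split(".")
    if len(parts) != 4:
        return False  # e.g. 123.123.55 has too few numbers
    # isascii() guards against non-ASCII digits that int() cannot parse.
    return all(p.isascii() and p.isdigit() and int(p) <= 255 for p in parts)
```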

IP addresses and obfuscated domain names

http://216.246.17.205/
http://3640005069/
http://7934972365/
http://0330.0366.0021.0315/
http://%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d/
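
The obfuscated forms above can be decoded with Python's standard library. The sketch below is illustrative only; the function names are hypothetical, and reducing an oversized number modulo 2**32 is an assumption about how such hosts were historically resolved, not a statement about AscToHTM.

```python
import ipaddress
from urllib.parse import unquote

def dword_to_dotted(number):
    """Convert a single-integer host to dotted-quad form (wraps at 2**32)."""
    return str(ipaddress.IPv4Address(number % 2**32))

def octal_to_dotted(host):
    """Convert a dotted host whose fields are written in octal."""
    return ".".join(str(int(field, 8)) for field in host.split("."))

def percent_decode(host):
    """Undo %xx escapes in a host name."""
    return unquote(host)

print(dword_to_dotted(3640005069))             # 216.246.17.205
print(dword_to_dotted(7934972365))             # 216.246.17.205
print(octal_to_dotted("0330.0366.0021.0315"))  # 216.246.17.205
print(percent_decode("%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d"))  # lockergnome.com
```

So the three numeric examples are different spellings of the first address in the list, and the percent-encoded host is an ordinary domain name.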

from a secure server

https://www.jafsoft.com/

URLs with commas and inside comma separated lists

Here's a URL with commas in it..

<URL:http://www.news.com/News/Item/0,4,21084,00.html>

...but this is a comma separated list of URLs

http://www.news.com/News/Item/,www.jafsoft.com,www.jafsoft.com www.jafsoft.com,www.jafsoft.com,www.jafsoft.com,www.jafsoft.com,www.jafsoft.com,

...as is this, although this has spaces as well

http://www.news.com/News/Item/, www.jafsoft.com, www.jafsoft.com

... and here's a comma and space separated list of URLs with commas in them.

http://www.news.com/News/Item/0,4,21084,00.html, http://www.news.com/News/Item/0,4,21084,00.html
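
One way to tell the two kinds of comma apart is to treat a comma as a separator only when it is followed by whitespace, the end of the text, or the start of another URL. The Python sketch below assumes that heuristic; it is not taken from AscToHTM, and SEPARATOR_RE and split_url_list are hypothetical names.

```python
import re

# A comma separates URLs only if followed by whitespace, end of text,
# or something that looks like the start of a new URL.
SEPARATOR_RE = re.compile(r",(?=\s|$|https?://|www\.)|\s+")

def split_url_list(text):
    """Split a comma and/or space separated list, keeping commas inside URLs."""
    return [part for part in SEPARATOR_RE.split(text) if part]

item = "http://www.news.com/News/Item/0,4,21084,00.html"
pair = split_url_list(item + ", " + item)
mixed = split_url_list("http://www.news.com/News/Item/,www.jafsoft.com,www.jafsoft.com www.jafsoft.com,")
```

Here pair holds two copies of the full URL with its internal commas intact, while mixed holds four separate URLs.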

URLs with brackets and "URL:" added to them.

URL:www.jafsoft.com
<URL:www.jafsoft.com>
<www.jafsoft.com>

ftp links

ftp://www.somewhere.com/ ! explicit link
ftp.somewhere.com ! semi-explicit link (ftp.)
ftp://user@your_domain_name.com/ ! ftp with username
penguin.mit.edu ! very weak implicit link. Can toggle policy to get this working

penguin.mit.edu ! (same, with policy switched on)


mistyped URLs

http:/www.somewhere.com/
ftp:/www.somewhere.com/
https:/www.somewhere.com/
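
One plausible treatment of such links is to restore the missing slash before building the hyperlink. The Python sketch below shows that idea only; whether AscToHTM repairs the URL this way is an assumption, and MISTYPED_RE and repair_scheme are hypothetical names.

```python
import re

# Matches "http:/", "https:/" or "ftp:/" when NOT followed by a second slash.
MISTYPED_RE = re.compile(r"\b(https?|ftp):/(?!/)")

def repair_scheme(url):
    """Insert the missing slash after the scheme; leave correct URLs alone."""
    return MISTYPED_RE.sub(r"\1://", url)
```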

Invalid URLs (invalid domains)

www.somewhere
www.somewhere.con
www.somewhere.com.xx
www.somewhere.co.zz

Rejects

*.excite.com ! rejected. Contains a wildcard
www.com ! rejected. Domain name too short
do...this ! rejected. "..."
do..this ! rejected. ".."


a.b.c.d.e.com

*.excite.com ! rejected. Contains a wildcard
www.com ! rejected. Domain name too short
www.gozilla ! rejected. Invalid domain name ending

http://yrj/index.html ! invalid domain, but possible Intranet link, so you can toggle this
http://yrj/index.html ! "check domain name syntax" policy disabled


User Hyperlinks

AscToHTM supports a tagging system that allows you to add your own hyperlinks. Examples include

AscToHTM home page
Go to Netscape's home page

Check the source file to see how these are configured.


Things we can't do (yet)

URLs split over two lines...the line break is interpreted as a space.

http://www.news.com/News/Item/
042108400.html>

http://www.boston.com/dailyglobe/globehtml/193/ Post_office_delivers_new_codes.htm


Using Policies to tailor the conversion (AscToHTM only)

You can use policies to configure certain aspects of the URL detection process. These can be changed in the source file by using the $_$_CHANGE_POLICY preprocessor command.

Here's an example of treating the newsgroup "uk.telecom" (which is not in one of the main 7 newsgroup hierarchies).

--- (recognised groups switched off) ---
  uk.telecom
demon.local, uk.games
! rejected because uk.* not recognised
--- (switch on uk newsgroups) ---

Add a line in the source to "change policy" so that "uk." is a recognised USENET hierarchy. e.g.

$_$_CHANGE_POLICY Recognised USENET groups : uk demon

This change could be made globally via the policy file. Now the conversion gives the following results:-

uk.telecom
demon.local, uk.games
! accepted because uk.* now recognised

--- (switched off again) ----


Add a line in the source to "change policy" again back to the default

$_$_CHANGE_POLICY Recognised USENET groups : none

and we're back to the default behaviour

uk.telecom ! rejected again because uk.* recognition switched off again
demon.local, uk.games
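
The effect of the policy can be pictured as a lookup against a set of recognised hierarchy names. The Python sketch below is a simplified model, not AscToHTM's implementation: the contents of DEFAULT_HIERARCHIES are an assumed default list, and is_recognised_group and extra_hierarchies are hypothetical names.

```python
# Assumed default list of hierarchies; the real defaults may differ.
DEFAULT_HIERARCHIES = {"alt", "comp", "misc", "news", "rec", "sci", "soc", "talk"}

def is_recognised_group(name, extra_hierarchies=()):
    """True if name looks like a newsgroup in a recognised hierarchy."""
    parts = name.split(".")
    # Needs at least two non-empty components, so "alt." is incomplete.
    if len(parts) < 2 or not all(parts):
        return False
    return parts[0] in DEFAULT_HIERARCHIES | set(extra_hierarchies)

before = is_recognised_group("uk.telecom")
after = is_recognised_group("uk.telecom", extra_hierarchies=("uk", "demon"))
```

With no extra hierarchies, uk.telecom is rejected; passing "uk" and "demon" mirrors the "Recognised USENET groups : uk demon" policy line and the name is accepted.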




 


Converted by AscToHTM