HTML+ - An HTML Babelfish Translator
By Chris Vavruska
Freeware - Copyright 2020
Table of Contents
- Goals of HTML+
- What importing can and can't do
- Styles changes
- HTML+ has Style
- Why all the Extra Blank Lines?
- How Tags are Handled
- Import Options
- Remove Blanks Before Text
- HTML Tags
- Unknown Tag Action
- Why is my document blank or missing text?
- Export Options
- Reporting Bugs
- The Future of HTML+
- Undocumented Features
- Special Thanks
Files formatted for modern computers do not necessarily mesh well with a machine and operating system developed in the 1980s. In the early 1990s a company named Seven Hills Software Corporation created a utility called Babelfish that allowed you to convert files from one format to another through the use of translators. The translator system allowed developers to create their own translators to expand the number of file formats that Babelfish could understand. We live in a time where the World Wide Web (WWW) is not going anywhere. Most people use the WWW to communicate, whether person to person or to distribute information to the masses. You most likely learned about HTML+ from some site on the WWW. Babelfish already had a translator that could export HTML, written by Richard Bennett. This meant that if you wrote a document on your trusty Apple IIgs you could export it to a document that could be understood by computers all across the WWW.
But what if you wanted to import an HTML document so you could edit it on your Apple IIgs? This is where HTML+ comes into play. It not only exports HTML documents like the original HTML translator, it also imports them. In simple terms, HTML+ is a Babelfish translator that allows you to easily convert documents written in HTML into a format that can be edited on the Apple IIgs, and back again.
Goals of HTML+
Remember, most of the Apple IIgs editors use a control called TextEdit to allow users to enter and edit text. It does not have the largest feature set, so it is not possible to translate HTML to TextEdit and have it look exactly the same. The main goal of HTML+ is to import and export HTML documents that, when displayed, look as close as possible to the original within the limitations of a TextEdit window/control. This means that the HTML source that HTML+ produces during an export will most likely not even look close to the source of the original file, unless of course the file was originally created using HTML+.
Since HTML+ is a Babelfish translator, Babelfish must be installed. If you don't already have it then you can get Babelfish from http://speccie.uk/software/babelfish. To install HTML+ you need to copy the file "htmlPlus" from the HTML+ archive to the :System:SSH.Babelfish folder of your boot drive/partition. After copying the translator you must reboot your system so that Babelfish will recognize HTML+.
Selecting a file to import is easy. Babelfish has its own file dialog for selecting which file to import. If you navigate to a file that ends with .html or .htm then the translator defaults to HTML+. If the file is a text file then it dives a little deeper into the file to determine if it is an HTML file. If this test succeeds then HTML+ is also selected.
Importing an HTML file is done by parsing the source looking for known HTML tags and then translating them into a format that Babelfish understands. Babelfish receives this data and passes it on to something else (another translator or application) which can then translate it into another format. Babelfish comes with a New Desk Accessory called "Convert File" to make converting files easy. Another way to convert files is to use the application Spectrum, written by Ewen Wannop. Using the Babelfish sub-menu (found on the File menu) you can import or export HTML files.
Also included with the HTML+ package is a Finder extension, HTML View, which I also wrote. HTML View allows you to view HTML files right from the Finder.
Importing involves parsing the HTML code. It is a very time-consuming process. Be patient when importing as it can take a bit of time.
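The two-step selection described above can be sketched in Python. This is only an illustration: the function name is made up, and the content test (looking for "<html" or "<!doctype html" near the start of the file) is an assumption, since the actual test HTML+ performs on text files is not documented here.

```python
from pathlib import Path


def looks_like_html(filename, text):
    """Return True if a file would plausibly default to the HTML translator."""
    # Step 1: the file name ends with .html or .htm (case-insensitive).
    if Path(filename).suffix.lower() in (".html", ".htm"):
        return True
    # Step 2 (assumed heuristic): peek at the start of the text for HTML markers.
    head = text[:512].lower()
    return "<html" in head or "<!doctype html" in head
```

A file named "readme.txt" whose contents begin with an HTML tag would pass the second step, while a plain text note would not.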
What importing can and can't do
HTML formats text by using tags. These tags allow text to look mostly the same on different computers or even different browsers on the same computer. The Apple IIgs TextEdit control is very limited in what it can display, so only a subset of HTML tags are translated to change the formatting of a document's text. The following is a list of tags that HTML+ translates. Some of the tags listed have been deprecated or are used infrequently, but they needed to be added for completeness.
b, strong, em, i, u, br, tab, pre, h1-h6, font, p, div, span, tt, big, small, var, samp, kbd, code, ul,
ol, li, dl, dt, dd, tr, dir, basefont
I won't go into what every tag does. If you want to know what one does that I don't explain, you can find information on the web.
b, strong, em, i, and u change the formatting of text. There are other HTML tags that change the formatting of text but they are not supported by the TextEdit control so they are ignored.
HTML allows the changing of color of text. The following colors are supported.
black, blue/navy, dark green/olive, dark gray/gray, red/maroon, lilac/purple,
orange, pink/fuchsia, green/lime, aqua/cyan, light green/chartreuse,
light blue/teal, light gray/silver, cornflower blue, yellow, white
You may also specify colors using a color code. The TextEdit window only supports 16 colors, so HTML+ will try to find the best match for a color during importing. Not all colors display text clearly at smaller sizes, so using them may result in unreadable text.
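As a rough illustration of "best match", the sketch below picks the nearest of 16 colors by squared RGB distance. Everything in it is an assumption made for the example: the RGB values are the standard HTML/CSS definitions of the names listed above, not the actual Apple IIgs palette, and the real matching method used by HTML+ may differ.

```python
# Assumed RGB values (standard HTML/CSS color definitions, not the IIgs palette).
PALETTE = {
    "black": (0, 0, 0),
    "navy": (0, 0, 128),
    "olive": (128, 128, 0),
    "gray": (128, 128, 128),
    "maroon": (128, 0, 0),
    "purple": (128, 0, 128),
    "orange": (255, 165, 0),
    "fuchsia": (255, 0, 255),
    "lime": (0, 255, 0),
    "cyan": (0, 255, 255),
    "chartreuse": (127, 255, 0),
    "teal": (0, 128, 128),
    "silver": (192, 192, 192),
    "cornflowerblue": (100, 149, 237),
    "yellow": (255, 255, 0),
    "white": (255, 255, 255),
}


def parse_color_code(code):
    """Turn a six-digit code such as '#6495ED' into an (r, g, b) tuple."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def nearest_color(code):
    """Return the palette name closest to the given color code."""
    rgb = parse_color_code(code)

    def distance(name):
        # Squared Euclidean distance in RGB space.
        return sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name]))

    return min(PALETTE, key=distance)
```

For example, a code that is one step away from pure yellow still resolves to "yellow".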
The font tag can be used to change the font face, size, style and color of text by use of the following attributes.
face : fontname
size : 1-7 (9, 10, 12, 14, 18, 24, 32)
color, bgcolor : color name
The bgcolor attribute is an Apple IIgs-only attribute. HTML changes the entire background of a text block while the TextEdit control only changes the background of the text.
The default system font for the Apple IIgs is Shaston. System software includes the shaston.8 and shaston.16 fonts. The first font size used when importing HTML is 9. To better handle importing and exporting when using the system font, HTML font sizes 1-3 will use shaston.8 and 4-6 will use shaston.16. There is a shaston.32 available somewhere, so if you have it, HTML size 7 will use it.
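The size rules above can be written as a small lookup. This is a sketch only: it assumes the seven HTML sizes map one-to-one onto the seven point sizes listed for the size attribute, and clamping out-of-range values is an added assumption. The function name is hypothetical.

```python
# Assumed one-to-one mapping of HTML sizes 1-7 to the listed point sizes.
HTML_SIZE_TO_POINTS = {1: 9, 2: 10, 3: 12, 4: 14, 5: 18, 6: 24, 7: 32}


def point_size(html_size, face="Helvetica"):
    """Return the point size to use for an HTML font size and font face."""
    html_size = max(1, min(7, html_size))  # clamp to 1-7 (assumption)
    if face.lower() == "shaston":
        # Shaston only comes in a few sizes, so HTML sizes are grouped.
        if html_size <= 3:
            return 8
        if html_size <= 6:
            return 16
        return 32  # requires the optional shaston.32 font
    return HTML_SIZE_TO_POINTS[html_size]
```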
HTML+ has Style
While HTML+ does not support CSS style sheets, it does understand certain style attributes for the font, p, div and span tags. The following attributes are translated: font-size (pt and em), color, background-color, font-family, text-outline, text-shadow, font-weight (bold only) and font-style (italic only).
The following example would translate into blue text with a 24pt font.
<span style="color: blue; font-size: 24pt;">Blue text, 24pt font</span>
Attributes not listed above are lost when importing.
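To make the filtering concrete, here is a minimal Python sketch that splits a style attribute into declarations and keeps only the ones listed above. It is not the parser HTML+ uses; the function name and the exact matching rules (for example, how values are compared) are assumptions for illustration.

```python
SUPPORTED = {
    "font-size", "color", "background-color", "font-family",
    "text-outline", "text-shadow", "font-weight", "font-style",
}


def parse_style(style):
    """Return a dict of the supported declarations in a style attribute."""
    kept = {}
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # empty or malformed piece
        name, value = declaration.split(":", 1)
        name, value = name.strip().lower(), value.strip()
        if name not in SUPPORTED:
            continue  # unsupported attributes are lost
        if name == "font-weight" and value.lower() != "bold":
            continue  # bold only
        if name == "font-style" and value.lower() != "italic":
            continue  # italic only
        if name == "font-size" and not value.lower().endswith(("pt", "em")):
            continue  # pt and em units only
        kept[name] = value
    return kept
```

Applied to the span example above, this returns the color and font-size declarations and nothing else.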
Why all the Extra Blank Lines?
Since HTML+ cannot translate all tags, certain tags are ignored because they do not translate to TextEdit data. Skipping a tag while keeping the <BR> or <P> tags around it can add additional blank lines.
For example, the input tag is not translated so translating the following HTML source:
Click your choice
<input type="button" name="button1"><br>
<input type="button" name="button2"><br>
Would produce the following output:
Click your choice
How Tags are Handled
Translate - Tags that are handled internally by HTML+. You can set these tags to be handled by one of the other methods, but you cannot set non-translatable tags to be translated.
Keep - When it finds one of these tags the HTML is imported as is. Useful if you want to edit a web page with non-translatable tags and then export it later.
Strip - When it finds one of these tags it removes the tag as if it never existed but keeps the enclosed data. For example, with the <table> tag you may not need the table itself but you still want the contents of the <tr> and <td> tags.
Remove - Removes the tag and all the enclosed data. For example, <script>....</script>
Replace - Replace a tag with a blank line. The only tag that uses this is the img tag. This is useful to keep formatting similar to that outside the IIgs.
HTML+ has a pre-defined set of rules for which method to use for each tag. See the section on Import Options to learn how to change the methods to suit your needs.
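The five methods can be illustrated with a short Python sketch built on the standard html.parser module. The rule table below is hypothetical (the real defaults live inside HTML+), and "Translate" is shown with a bracketed marker as a stand-in for a real TextEdit style change.

```python
from html.parser import HTMLParser

# Hypothetical rule table for illustration only.
ACTIONS = {
    "b": "translate", "form": "keep", "table": "strip", "tr": "strip",
    "td": "strip", "script": "remove", "img": "replace",
}
UNKNOWN_ACTION = "strip"


class ActionFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_tag = None   # name of the removed tag we are inside
        self.skip_depth = 0    # nesting depth of that removed tag

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        action = ACTIONS.get(tag, UNKNOWN_ACTION)
        if action == "remove":
            self.skip_tag, self.skip_depth = tag, 1
        elif action == "replace":
            self.out.append("\n")                       # blank line
        elif action == "keep":
            self.out.append(self.get_starttag_text())   # tag as is
        elif action == "translate":
            self.out.append(f"[{tag}]")                 # stand-in marker
        # "strip": drop the tag, keep what it encloses

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == self.skip_tag:
                self.skip_depth -= 1
            return
        action = ACTIONS.get(tag, UNKNOWN_ACTION)
        if action == "keep":
            self.out.append(f"</{tag}>")
        elif action == "translate":
            self.out.append(f"[/{tag}]")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)


def apply_actions(html_source):
    parser = ActionFilter()
    parser.feed(html_source)
    parser.close()
    return "".join(parser.out)
```

Note that in this sketch a "Remove" rule on a tag that is never closed would swallow the rest of the document, which is one way a page can come out blank.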
Import Options
Not all web pages are created equal when translating into a format the Apple IIgs can handle. Certain tags don't translate well or make no sense in a document. For these reasons HTML+ allows you to control how the import is handled.
The first section of the import options allows you to specify which font and size to use. HTML documents don't usually specify a font face but in a document you may want more than the standard system font Shaston. Shaston looks good on the screen but only comes in two sizes, 8 and 16. The default font face for HTML+ is Helvetica. It comes with System 6.0.x and contains most of the necessary font sizes for importing HTML. If you have the SIS fonts installed then you will be given the option to use SIS fonts as the default font face. If you select to use the SIS fonts then SIS-1 will be used for proportional fonts and SIS-3 will be used for tags that require a monospaced font, such as the PRE tag.
The Default Size allows you to specify which font size should be used when importing.
Remove Blanks Before Text
The top of an HTML document usually contains header information such as a graphic or other tags that might create a lot of blank lines. The "Remove Blanks Before Text" option allows you to remove these extra blanks automatically. Any blank lines found after the first readable text in the document remain untouched.
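In effect the option behaves like the following Python sketch, which drops blank lines until the first line containing text and leaves everything after it alone. The function name is made up for the example.

```python
def remove_blanks_before_text(lines):
    """Drop leading blank lines; keep everything from the first text line on."""
    for index, line in enumerate(lines):
        if line.strip():
            return lines[index:]
    return []  # the document had no readable text at all
```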
HTML Tags
HTML+ comes with a default set of rules for how to handle each tag; see the section "How Tags are Handled" to learn how they can be handled. In this section you are given a list of all the tags that HTML+ knows about. Here you can override how a tag is handled for your specific situation. If an action is overridden for a tag then the action will be displayed in italics. To return an action to its default, just click the default button next to the tag.
Unknown Tag Action
If a web page you are importing contains an HTML tag that HTML+ does not have in its list, then use the "Unknown Tag Action" to tell HTML+ how to handle these tags. The default is "Strip" but you might want to change it if you find it importing more than you want.
Why is my document blank or missing data?
If HTML+ encounters a tag it doesn't know about while importing, or even a tag it does know about, it might be removing the data enclosed by the tag. Check the import options to see how the tag is handled. If the offending tag is set to "Remove" then try setting it to "Strip". Strip will remove the tag but leave the contents to be imported.
If you are not sure which tag is causing you issues, try turning on the import debug feature. Turning on the debug feature is done by creating a file named 'htmlplus.debug' in the root of your boot drive. If this file exists then HTML+ will dump out some additional information about the tags it encounters while importing. There is no debug information dumped during exporting.
What use is importing HTML if you can't export it as well? That's right, HTML+ can also export HTML.
HTML+ tries to export HTML tags that will display the same on an Apple IIgs or a more modern browser. The HTML code that it exports will most likely look nothing like the HTML code that was imported. The reason for this is that HTML+ can be told how to translate many tags into text that Babelfish can understand, but it is hard to go back the other way since TextEdit only retains certain properties of text. For example, HTML+ can extract the data from a table, but there is no way to know that the text being exported was in table form. This could change someday if someone creates a Babelfish translator to import the AppleWorks GS or GraphicWriter III file formats.
The HTML source exported from HTML+ uses a limited number of HTML tags to display the data. The reason for this is that HTML+ scans the text and formatting for patterns using a set of predetermined rules. If you follow these rules when creating a document then HTML+ will use more tags, providing cleaner HTML. The next sections will reveal these rules.
The normal font styles such as Bold, Italic and Underline will produce the appropriate tag (b, i, u). Whenever the style changes the tags will be outputted to reflect those changes.
In order for HTML+ to use the H tags the source must follow these rules: the text must start in the first column, the text must be bold, it must be preceded by a blank line, and it must not be in the default font size (see export settings below).
HTML+ can export ordered and unordered lists. For a list to be used, the text must be preceded by a blank line and each line must begin with 8 spaces or a single tab character. If the first characters are a number then an ordered list will be used; if the first character is a • ($A5) or ° ($A1) then an unordered list will be used. To end the list, add a blank line. If you are currently in a list and you want to nest a list then start a line with an additional 8 spaces or a single tab. The following example will be exported as an ordered list.
        1. List item 1
        2. List Item 2
                1. Nested List Item 1
                2. Nested List Item 2
        3. List Item 3
You can even mix ordered and unordered lists when nesting them.
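The list rules can be expressed as a small line classifier. The Python sketch below is an illustration, not the code HTML+ uses: the function name is hypothetical, the two bullet characters are identified only by their byte values $A5 and $A1 (written here as chr(0xA5) and chr(0xA1)), and accepting an optional "." or ")" after the number is an assumption.

```python
import re

# The two bullet characters, given by their byte values $A5 and $A1.
BULLETS = (chr(0xA5), chr(0xA1))


def classify(line):
    """Return (depth, kind, text) for a list item line, or None otherwise."""
    depth, rest = 0, line
    # Each leading tab or run of 8 spaces adds one nesting level.
    while True:
        if rest.startswith("\t"):
            rest = rest[1:]
        elif rest.startswith(" " * 8):
            rest = rest[8:]
        else:
            break
        depth += 1
    if depth == 0:
        return None  # list items must be indented
    match = re.match(r"\d+[.)]?\s*(.*)", rest)
    if match:
        return depth, "ol", match.group(1)   # starts with a number
    if rest[:1] in BULLETS:
        return depth, "ul", rest[1:].strip()  # starts with a bullet
    return None
```

A line with one tab and a number comes back at depth 1 as "ol"; the same line with 16 leading spaces comes back at depth 2, which is how nesting would be detected.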
The pre tag is used for preformatted text. Preformatted text uses the monospaced font Courier. As with the other tags listed above, there must be a blank line before the text, and the text must use the font face Courier. To end preformatted text, a blank line is needed at the end.
Export Options
The Babelfish export options allow you to customize how HTML+ exports HTML.
In the first section you choose which font and font size will be the default font. Choosing the correct option can not only reduce the number of tags generated when exporting HTML, it will also aid in importing the HTML in the future. When a document is exported there will be a basefont tag inserted into the header section of the document. The next time you import an HTML document this basefont will be used as the default when importing.
Selecting the "Use Selected Font as Default" option will use this font face in the basefont tag. This will not override the fonts used inside the document.
Selecting the "Use SIS Fonts as Default" option will use SIS-1 as the default font. Preformatted text will use the SIS-2 font rather than Courier. This option will only be available if you have the optional SIS fonts installed in your system font folder.
Selecting the "Use First Font as Default" option is useful if you want to make the default font be whatever font and size you first used in the document you are converting. Setting the font size is not available if you choose "Use First Font as Default".
The "Keep Known HTML Tags as Tags" option allows you to embed HTML tags inside a document without having them converted. For example, normally if you inserted <b>test</b> into a TextEdit document and exported it using HTML+, it would be exported as &lt;b&gt;test&lt;/b&gt; when you really wanted it exported as it was typed. Displaying the HTML in a browser would show you <b>test</b> with this option turned off and test (in bold) with it turned on. This is very useful if you want to export tags that HTML+ can't convert to something useful in a TextEdit window.
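The difference between the two settings can be sketched in Python. This is an illustration under assumptions: the set of known tags is a made-up subset, only tags without attributes are handled, and leaving unknown tags escaped is a guess at the behavior rather than something stated above.

```python
import html
import re

KNOWN_TAGS = {"b", "i", "u", "p", "br"}  # hypothetical subset for the example


def export_text(text, keep_known_tags):
    """Escape typed text for export, optionally leaving known tags intact."""
    escaped = html.escape(text, quote=False)
    if not keep_known_tags:
        return escaped  # every typed tag shows up literally in a browser

    def restore(match):
        slash, name = match.group(1), match.group(2)
        if name.lower() in KNOWN_TAGS:
            return f"<{slash}{name}>"
        return match.group(0)  # unknown tags stay escaped (assumption)

    return re.sub(r"&lt;(/?)([A-Za-z][A-Za-z0-9]*)&gt;", restore, escaped)
```

With the option off, the typed tags are escaped so a browser displays them as text; with it on, a known tag such as b passes through and takes effect.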
Reporting Bugs
I am sure there are still a number of bugs in both the importing and exporting of HTML documents. If you find a bug let me know through email at email@example.com. Please include a small document that reproduces the issue so that I can fix it as soon as possible.
The Future of HTML+
This release contains quite a number of tag translations but it is nowhere near a complete translation of HTML and its attributes to the Apple IIgs. I plan to continue to work on adding additional functionality. If you find something that would be helpful when using HTML+ let me know and I will try to prioritize your request.
Undocumented Features
An undocumented feature is sometimes a nice way of describing a bug, but in this case it means that HTML+ may also handle formatting not listed in this documentation. I am quite sure I have forgotten to document everything that HTML+ does.
Special Thanks
Special thanks go out to Ewen Wannop for hosting HTML+ on his web site and for pushing me and testing to get HTML+ to where it is today. I would also like to thank Kelvin Sherlock for fixing a deficiency with the Rez compiler. The Rez compiler only reads in 32K code blocks and the main guts of the HTML+ translator are a bit bigger. He was able to patch the Rez compiler through his Golden Gate tool, which allowed me to get HTML+ to where it is today.
HTML Tool & SIS Fonts
More of my software can be found at http://speccie.uk/software/