Convert Text Files to HTML documents HTML Help file

      Program version 1.0, July 1st, 2001

      'Convert Text Files to HTML' is a simple-to-use Windows program that takes any plain text file generated by a program such as Notepad and converts it to an HTML document.  If you create your text file in a word processor you should save it as plain text before attempting to input it to this program (word processing software stuffs a lot of extra information specific to its own file format into each document, and such a document will not output properly as an HTML file for that reason).  Download link - > text2html.zip 270 KB


      Normally, the way I have been making the pages on my website is that I prepare the text of the document as a text file in a word processor, and then I go through the document manually adding the HTML tags.  This program automates the task, and, as only a computer can, it takes a job that would take a human like myself a long time to do and does it in about a second.

      On this site I am posting commentaries on the Bible and analysis of the various source traditions that compose the Bible, as well as the editing (redaction) done on these documents in later times.  Because the conclusions of Biblical scholarship can be controversial I have found it good to have an HTML Bible posted on my site to which I can cross reference and thus help to make any points I need to make.  Now the HTML Bible started out as plain text files, and in the end consists of almost 1300 HTML files (a file for each chapter, plus index pages for each book).  No human being on earth could stand to do that much HTML coding.  For an example of computer generated HTML you can visit my HTML Bible pages.  As well, I have been posting Delphi Pascal source code on my site, and I have found doing this by hand to be error prone (the first three links were generated manually).  Since the source code is also text, from this time on these postings will also be automated, as will all the new pages on my site.  If you have text files from scanned documents you would like to convert to HTML, or just a lot of writings in the form of text or word processor documents that you would like to convert, this program automates the task.  As just one example, the text from Kersey Graves' book (all 45 pages) was run through this program in about 2 seconds on an 850 MHz computer.

      The program is a simple executable.  You can save it in any folder, anywhere on your computer, and simply click on the program to run it.  Since the program uses no DLLs and does nothing to alter the registry or Windows system Files, there isn't much point in using an installer program.  All the installer would do in this case would be to drop the program into a folder and then put a link on the Windows start menu.  If you decide to use the program, and you like the convenience of having a link on the Windows start menu, you can visit my page explaining how to add and remove items from the Windows Start Menu, and you can also get a few tips on how to organize your start menu so that instead of being a mile long it is a couple of inches high with everything tucked neatly into folders.  

      A shortcut is not required to use this program.  Click on the program file.  When it launches, you are presented with a screen prompting you to load the text file (or files) that you wish to convert to HTML documents.  There are four options you can choose from.  

Convert Text Files to HTML documents: screen shot of title page (mainpage.jpg - 38972 Bytes)

      The first option is to load a sequence of numbered text files (for example, file001.txt, file002.txt or 001file.txt, 002file.txt or even just 0001.txt, 0002.txt).  If you have scanned some old book, or your material can be logically arranged in chapters, this option is useful.  If you choose this option by clicking on the circle next to it, you will then be presented with a Load File Box, and you can then click on the first of the sequence of numbered files.  The program will autoload all the other files in the sequence.
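
      As a rough illustration of how a numbered sequence can be followed from its first file, here is a minimal Python sketch.  It is not the program's own code: the function name sequence_from, the rule of stopping at the first missing number, and the restriction to names ending in lower case .txt are all assumptions made for the example.

```python
import re

# Matches names such as file001.txt, 001file.txt or 0001.txt:
# optional text, a run of digits, optional text, then ".txt".
NUMBERED = re.compile(r"^(\D*)(\d+)(\D*)\.txt$")

def sequence_from(first, available):
    """Return the files that continue the numbering started by `first`.

    `available` is the list of file names in the folder.  The walk stops
    at the first missing number (an assumption for this sketch).
    """
    match = NUMBERED.match(first)
    if match is None:
        return [first]          # not a numbered name, nothing to follow
    prefix, digits, suffix = match.groups()
    names = set(available)
    found = []
    number = int(digits)
    while True:
        # Keep the same zero padding as the first file (001, 002, ...).
        candidate = f"{prefix}{number:0{len(digits)}d}{suffix}.txt"
        if candidate not in names:
            break
        found.append(candidate)
        number += 1
    return found
```

      Given a folder holding file001.txt, file002.txt, file003.txt and file005.txt, clicking on file001.txt would pick up the first three files under these assumptions.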

      The second option will load one or more text files, and if there is more than one loaded, it will sort them into alphabetical order.  If you need to link files in alphabetical order this is the option to use.  When you press the Open button, you will be presented with a Load File Box, and you can choose multiple files one at a time by holding down the Control (CTRL) key and clicking each file in sequence, or you can batch load rows of files by holding down the SHIFT key and clicking the first and last files.

      The third option works just like the second described above, with the only difference being that the files will not be sorted in alphabetical order, and if multiple files are chosen they will be converted to HTML in no particular order. (On the Properties page, described below, you can elect to link pages together in sequence or generate stand alone pages.)  If you are generating stand alone pages (or linked pages where ordering is not important), this option would be suitable.

      The fourth option allows you to choose multiple files and arrange them in the order you specify.  If you choose to link the files on the Properties page, the files will then be linked together in an order you have chosen, independent of the numerical or alphabetical sorting methods described above.  

      When you have chosen one of the four options above, you then press the Open button to load the files.  The file list will show up in the window at the bottom of the screen.  If you wish you can click on files and press Delete to remove them from the list.

      Pressing Cancel will erase all the files from the list, so you can start over.

      Pressing Exit will shut down the program.

      Pressing Save will take you to the next step in the program.

      If you have chosen to convert only one text file to HTML you will bring up a standard Windows Save File Box when you press the Save button, and you can then decide on the folder in which you wish to save the file, or you can click the icon on the box to create a new folder, and you can use the default name (the name of your text file) or any other name you would like to give to the finished HTML file.

      When naming HTML files you should keep in mind that some servers do not accept spaces in HTML file names or folder names.  So 'tabby cats food.html' may be a valid Windows name, but it may generate an error on the web, depending on the configuration of the server.  Better would be tabby_cats_food.html  (substitute underscores for the spaces).
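
      If you rename a lot of files by hand, the substitution described above is easy to script.  The following Python sketch is only an illustration (the function name html_name_for is made up for this example, and the program itself does not necessarily rename files this way): it takes the name of a text file and produces a server-friendly HTML file name.

```python
from pathlib import PurePath

def html_name_for(text_file_name):
    """Turn 'tabby cats food.txt' into 'tabby_cats_food.html'."""
    stem = PurePath(text_file_name).stem      # name without the .txt extension
    return stem.replace(" ", "_") + ".html"   # underscores instead of spaces
```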

Save options dialogue box screen shot (saveopt.jpg - 13252 Bytes)

      If you have chosen to convert more than one text file you will be given the choice of either batch processing the text files using their original names (poodle.txt would become poodle.html) or being prompted to give a name to each file as it is being converted to HTML.  

Directory to save HTML batch files (dir2save.jpg - 16795 Bytes)

      Next, if you are saving more than one HTML file, a window will come up prompting you to open the folder into which you would like to save the converted HTML files.  You have the option of creating a new folder.  When you have chosen the folder for the multiple save, press OK to move to the next step.

      The final step before the files are converted to HTML is a stop at the HTML properties page.  This consists of a tabbed notebook with 6 pages.  

Choosing the format of the HTML file (format.jpg - 67682 Bytes)

      On the First page you can check or uncheck four check boxes.  You can choose to Indent paragraphs.  You can justify text (much like a newspaper column the text will be even along all edges rather than ragged).  You can use a margin on the page.  The fourth choice converts a number of special characters to HTML.  While these characters are fine in a text file they create problems in HTML and must be coded to work properly.  The special characters that will be converted in the output HTML files are

                      <  less than

                      >  greater than

                      &  the ampersand

                      -   the hyphen

                      "   quotes

                          tabs

                          and any multiple blank spaces (HTML will only output one blank space

                               and will simply ignore the rest)


      If you do not have any HTML code in your input file, then leave this checkbox checked.  If your input text file includes HTML code then uncheck this check box to avoid altering the HTML code.
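
      To show the kind of substitution involved, here is a small Python sketch.  It is an illustration only, not the program's own routine: the entity names used (&amp;, &lt;, &gt;, &quot;, &nbsp;) are the standard HTML ones, but which codes the program actually writes, how wide it makes a tab, and how it treats the hyphen are not documented here, so the tab width of 4 is an assumption and the hyphen is left out.

```python
import re

def escape_text(line, tab_width=4):
    """Escape characters that cause trouble in HTML (illustrative sketch)."""
    # The ampersand must go first, or the entities added below
    # would themselves be escaped a second time.
    line = line.replace("&", "&amp;")
    line = line.replace("<", "&lt;").replace(">", "&gt;")
    line = line.replace('"', "&quot;")
    # Assumption: a tab becomes a fixed run of spaces.
    line = line.replace("\t", " " * tab_width)
    # A browser collapses runs of spaces into one, so keep the first
    # space and turn the rest into non-breaking spaces.
    return re.sub(r" {2,}",
                  lambda m: " " + "&nbsp;" * (len(m.group()) - 1),
                  line)
```

      This also shows why the checkbox should be unchecked when the input already contains HTML: a tag such as <b> in the text file would come out as &lt;b&gt; and be displayed literally instead of being obeyed by the browser.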

HTML link options and code insert page (linkopt.jpg - 55641 Bytes)

      The second page on the properties menu includes a check box to create a table of contents page with links to each in a series of files (if you will be outputting multiple HTML files.)  You can give the Table of Contents page an optional Title in the Edit box.  You can also include a  link back to your home page.  Next you have the choice of linking the files together, one page leading to the next, or you can generate stand alone HTML pages. An important note: if you link the files together they must remain together in the same folder (in this version of the program) if the links are to work. For example, if you linked a file named dog to a file named cat and then moved the cat file into a cat folder and the dog file into a dog folder, the links would no longer work.
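
      The reason linked files must stay together is that links of this kind normally name the neighbouring file without any folder, so the browser looks for it next to the current page.  The Python sketch below illustrates the idea; the link labels, the contents page name index.html and the function name navigation_links are assumptions for the example and are not taken from the program's actual output.

```python
def navigation_links(pages, contents_page="index.html"):
    """Build a line of Previous / Contents / Next links for each page.

    `pages` is the ordered list of output file names.  Every href is a
    bare file name, which only works while all files share one folder.
    """
    links = {}
    for i, page in enumerate(pages):
        parts = []
        if i > 0:
            parts.append(f'<a href="{pages[i - 1]}">Previous</a>')
        parts.append(f'<a href="{contents_page}">Contents</a>')
        if i < len(pages) - 1:
            parts.append(f'<a href="{pages[i + 1]}">Next</a>')
        links[page] = " | ".join(parts)
    return links
```

      With dog.html and cat.html in the list, the dog page links to href="cat.html".  If cat.html is later moved into a cat folder, that bare name no longer points at anything.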

      The two checkboxes and two boxes at the bottom of the page are for pasting (or typing) in any HTML code, Javascripts, banner code, web ring code, and so on that you want to include in the finished page or pages.  The top box will insert the code at the top of the page and the bottom box will insert the code at the bottom of the page. Please remember to check the checkbox if you will be inserting code into either of these two boxes. If the checkbox is not checked the code will not be inserted.

metatag.jpg - 82620 Bytes

      The third page gives you the option of entering Meta Tag information for your HTML file.  You can select the HTML Title (which appears along the top of your browser and is considered important for ranking pages in Search Engines).  The description meta tag is also important for some search engines, and here you briefly describe the subject matter of your page in one or two lines.  Next you can enter keywords for your page.  Keywords can consist of single words or search phrases separated by commas.  Your keywords should be relevant to the content of your page or you might be penalized by certain search engines.  As a general rule your keywords should appear in the first paragraph of your text file and then again in a summing up in the last paragraph, and it doesn't hurt to have keywords included in the text of your HTML links (the text people click on to visit an HTML page).  

      You can choose from two options.  The second option allows you to enter the meta tag information in the box at the bottom of the window.  If you choose this option, in order for the program to interpret the data correctly you must enter the information in the following order :  

                      Title - this appears in the title bar along the top edge of your browser

                      A one or two line description of the page

                      Keywords separated by commas

                      Header - this page heading appears at the top of the page in your browser window

      If you use this box the information must be entered in the order given to output properly in your HTML file.  For example we could enter:

      Sick Poodles

      An analysis of the causes and cures for various diseases and sicknesses of poodles

      poodle, sick, disease, rabies, poodle foot and mouth disease, tail dropsy

      The diagnosis and treatment of a sick poodle

      Do not include a period after the last sentence in your description as the program outputs a period automatically.  Similarly do not enter a period after your keywords.
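
      For readers unfamiliar with meta tags, the following Python sketch shows roughly where the four values end up in an HTML file.  It is a guess at the general shape, not a copy of what the program writes: the exact markup, the use of an h1 heading for the header, and the function name build_head are all assumptions.  The one detail taken from the text above is that a period is added after the description.

```python
from html import escape

def build_head(title, description, keywords, header):
    """Place the four values into the top of an HTML page (illustrative)."""
    return "\n".join([
        "<head>",
        f"<title>{escape(title)}</title>",
        # The period after the description is added automatically.
        f'<meta name="description" content="{escape(description)}.">',
        f'<meta name="keywords" content="{escape(keywords)}">',
        "</head>",
        "<body>",
        f"<h1>{escape(header)}</h1>",   # assumption: header shown as a heading
    ])
```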

      If you choose the first option on the Properties page, then rather than looking in the box below the check buttons, the program will look for the four meta tags as the first four lines in your text files.  If the values do not exist in your text files it will use default values (Untitled, no description, no keywords).  When the Meta Tag information is to be found in the text files, four special directives are recognized by the program.  They are

      <T> - the title

      <D> - the description

      <K> - the keywords separated by commas

      <H> - the header

      These must be entered at the very top of your text file in the order listed, each on its own line followed by a carriage return (with no spaces between them) for the program to interpret the directives properly.  The directives can 'word wrap' onto another line in Notepad or your word processor; 'a line' is ended only when you press the Enter key to go to a new line.  What this means is that your description tag could look like four or five lines in Notepad or a word processor, but as long as it is ended by a single carriage return (Enter key stroke) it counts as 'a line' as far as using the directive tags in this program is concerned.  (If you enter these directives into your text files, do not click the circular check box to use the directives box as described above or the text file directives will become the first lines of your output HTML file.)

      The same example given above would look as follows if it was entered into a text file.

      <T>Sick Poodles

      <D>An analysis of the causes and cures for various diseases and sicknesses of poodles

      <K>poodle, sick, disease, rabies, poodle foot and mouth disease, tail dropsy

      <H>The diagnosis and treatment of a sick poodle

      Do not enter these special text file tags into the box on the Properties page, or they will output as parts of your description, title, keywords or header (they are only needed to signal the program that the values are present in a text file).  Note that you can use small or capital letters for the tags (<T> or <t> both work).  If no tags are present in the files and you choose not to enter them into the box on the Properties page then one of two things will happen.  If you have clicked the checkbox to make a Table of Contents page, then the title entered into the Edit box next to that checkbox will become the default title of your page.  (The title entered into this box will be ignored if the check box is left unchecked.)  Otherwise your page will be titled 'Untitled' by a web browser, and assuming it gets found by a search engine, someone clicking on something called 'Untitled' when your listing is surrounded by titled pages seems less likely.
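
      The rules for the text file directives can be summed up in a few lines of Python.  This is a sketch of the behaviour described above, not the program's source: it assumes the directives sit on consecutive lines at the very top of the file, the function name read_directives is made up, and the empty default for the header is an assumption (the other defaults are the ones listed earlier).

```python
# Default values used when a directive is missing.
DEFAULTS = {"T": "Untitled", "D": "no description", "K": "no keywords", "H": ""}

def read_directives(lines):
    """Split a text file's lines into (meta values, remaining body lines).

    Looks for <T>, <D>, <K>, <H> in that order at the top of the file;
    small or capital letters are both accepted.
    """
    meta = dict(DEFAULTS)
    consumed = 0
    for expected, line in zip("TDKH", lines):
        stripped = line.strip()
        if stripped[:3].upper() != f"<{expected}>":
            break                      # directives must appear in order
        meta[expected] = stripped[3:].strip()
        consumed += 1
    return meta, lines[consumed:]
```

      Running this on the sick poodle example gives the title 'Sick Poodles' and leaves only the lines after <H> as the body of the page.  A file with no directives keeps all of its lines and gets the default values.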

      At the bottom of the window there is a checkbox you can use if you want your page titles to include 'Chapter 1'  'Chapter 2' etc.  If your text files are numbered then the number of the text files will become the chapter numbers if this checkbox is checked.

DOS formatted text files with hard carriage returns (return.jpg - 74461 Bytes)

      The fourth window is for dealing with DOS formatted text files that have hard returns at the end of each line.  (Text that was copied and pasted can also have these hard returns.)  If the check box is checked these hard returns will not be converted into line breaks in your HTML file.  If your file is in Windows format do not check the box, or your page might come out as one long paragraph, which probably isn't what you wanted.  If your HTML file comes out with lines that look 'shortened' this means that you should run it through the program again, this time with the checkbox on this page checked.  In order for the routine that formats the text to work correctly, if this checkbox is checked then your text file must have each paragraph separated by a blank line, as in the example in the screen shot above.  If you check this box and you do not have your paragraphs separated by a blank line, then your HTML file will output as one long paragraph.  This program never alters your text files, and any changes you make affect only the output HTML file (so if you were hoping to use this option to strip the hard returns from some text files it won't have the desired results).  However, if you uncheck the box to Convert special characters on the first Properties page, and then check the box to strip carriage returns, and then open the output HTML document in Notepad (choose All Files from the Open Files pop down file type list), you can then copy the formatted plain text out of the HTML document stripped of these hard returns.
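
      The blank line requirement is easier to see in code.  The Python sketch below joins hard-wrapped lines into paragraphs, treating a blank line as the only paragraph boundary.  It illustrates the general technique and is not the program's own routine; the function name unwrap_paragraphs is made up for the example.

```python
def unwrap_paragraphs(text):
    """Join hard-wrapped lines; blank lines separate paragraphs."""
    paragraphs = []
    current = []
    for line in text.splitlines():       # handles both DOS and Unix line ends
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

      With no blank lines in the input there is nothing to mark where one paragraph ends and the next begins, so everything is joined into a single paragraph, which is the 'one long paragraph' result described above.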

font.jpg - 66084 Bytes

      The fifth Window allows you to press labelled buttons to bring up pop up windows to set the color and type and style of the Font to be used on the page, the color of links, and the background color. (Note that in this version character sets such as 'Greek' and so on are not yet supported.)  As you make changes you can see how each color will look against the background you choose.  You can hit the Cancel button to discard the changes and start over.  You can also choose multiple fonts.  If you choose a font that is not installed on a user's system, the browser will fall back to its default font.  However, if you choose multiple fonts, then the browser will move down the list, hopefully finding at least one font on the list that it can use.  If you choose multiple fonts, and the user's browser cannot locate any of them, once again the browser will fall back to the default font for that user's system.

image.jpg - 55998 Bytes

      The sixth page allows you to select a background image.  A two page help file can be brought up if you need some tutoring on how to link to files, such as background images in HTML.  You can bring up this Help file  by clicking the help button. In a future version of this program, if your website folders are mirrored on your hard drive you will be able to link to the background image (and other images, files to link to, etc) by bringing up a pop up window (if the folders on your drive are the same as the folder structure of your website, the links will work correctly when you post the file to your website.) 

      When the HTML properties have all been set, you can press the OK button to create the HTML files.  Pressing Cancel will take you back to the beginning of the program and Exit will shut the program down.

convert.jpg - 14462 Bytes

      The final Window in the program contains a ProgressBar which moves along as your HTML files are being created.  When you press the Begin button the files will be created according to the parameters specified in the previous steps.  Pressing ReStart will take you back to the beginning of the program.  Pressing Cancel will take you back to the Properties page.  Exit shuts down the program.


      In a future version of this program you will also be able to change fonts and font colors at different locations on the page, insert links, pictures, and organize the page into columns, including pictures with wrap around text, in short, everything I want to do in HTML and have been doing manually up to now.  At the moment I have no plans to support frames, because I hate those things, and never use them myself.


      'Beta program' tips

      I have been using this program myself with no problem, and I believe I have tested every possible outcome, but sometimes a program can surprise you, because programs are complex things.  Since this is the first time I am posting it, and it is version 1.0, I also post the following 'Beta testing' instructions for the benefit of those users who might not know how to shut down a program if it crashes.  If for any reason the program crashes do the following.  Hold down the Control (CTRL) key and the ALT key both at the same time and then press the Delete key once.  This will bring up the pop up menu of running programs.  Click on Text_2_html and then press the 'End Task' button.  More than likely a crashed program will not shut down right away, so wait a few seconds and try it again, until finally, as Windows always does, a pop up window comes up informing you that the program is not responding, and you will then be able to shut it down.  This saves you the hassle of waiting while your computer starts up again after being shut down and ScanDisk scans your drives, which can take a while if you have a big drive (as anyone who has ever run Windows would know).  If the program does crash you can send me an email explaining what happened and what the error message was.  A crash most likely means that I missed something about the particular text file that caused it; another text file would probably work fine.

      Now as for known issues, there is only one bug in the program that I am aware of, and I am working on getting rid of it (unfortunately the computer language documentation is proving to be less than helpful at the moment: the 'exception handling routine', which is supposed to work, stubbornly refuses to work, and so it's off to Google and a search on the web for some needed information).  A numbered text file cannot have a number greater than around sixteen million.  If the number is greater than 16,777,215 the program as it is currently written will certainly crash and burn.  If you have to convert more than 16,777,215 text files then this program is not for you.  If you have a text file with a name like file20000000.txt do not bother even trying to run it through this program.  Even when the bug is fixed the program will still not process that file, but it will tell you nicely that it is not going to process the file and continue on gracefully, rather than crashing on the spot as it would now if you were to input such a numbered text file in this version.
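
      If you want to check a batch of file names before feeding them to this version, a few lines of Python will do it.  This sketch is an outside helper, not part of the program, and its names (MAX_FILE_NUMBER, file_number, can_process) are made up for the example.  The limit itself comes from the text above; 16,777,215 happens to equal 2 to the power of 24, minus 1.

```python
import re

MAX_FILE_NUMBER = 16_777_215   # the limit quoted above (2**24 - 1)

def file_number(name):
    """Return the first run of digits in a file name as an int, or None."""
    match = re.search(r"\d+", name)
    return int(match.group()) if match else None

def can_process(name):
    """True if the file's number, if it has one, is within the limit."""
    number = file_number(name)
    return number is None or number <= MAX_FILE_NUMBER
```

      Under this check file001.txt passes and file20000000.txt is rejected, so it can be set aside before the program ever sees it.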


      For future versions of this program, or any other programs that might be posted in the future, visit http://www.awitness.org/software/index.html ...

