HTML Routines

AppleScript scripts are often used to read and write HTML text. The following sub-routines help automate some common tasks involving HTML markup.


Converting RGB to HTML Color

The following sub-routine can be used to convert RGB color values to the format used in HTML documents.

An RGB color is stated as a list of three numbers, each with a value between 0 and 65535. The following sub-routine scales each of those values down to the 8-bit range (0 to 255) and then converts it to its corresponding two-digit hexadecimal value.

To use the sub-routine, pass it a list of RGB values and it will return the HTML color code matching the passed RGB color.
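
As a rough illustration of the arithmetic described above, the following sketch converts a single channel value by hand. The sample value 32768 and the variable names are assumptions chosen for this example; they are not part of the sub-routine itself.

```applescript
-- Worked example for one channel (the sample value is chosen for illustration)
set hex_digits to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
set channel_value to 32768 -- a 16-bit channel value
set eight_bit to channel_value div 256 -- 128, the 8-bit equivalent
set high_digit to eight_bit div 16 -- 8, the first hexadecimal digit
set low_digit to eight_bit mod 16 -- 0, the second hexadecimal digit
-- list items are 1-based, so add 1 to each digit before looking it up
set hex_pair to (item (high_digit + 1) of hex_digits) & (item (low_digit + 1) of hex_digits)
--> "80"
```

Repeating these steps for the red, green, and blue values and joining the three pairs after a "#" character gives the HTML color code.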

A sub-routine to convert RGB values to HTML format:
 

on RBG_to_HTML(RGB_values)
	-- NOTE: this sub-routine expects the RGB values to be from 0 to 65535
	set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
	set the hex_value to ""
	repeat with i from 1 to the count of the RGB_values
		-- scale the 16-bit value down to the 8-bit range (0 to 255)
		set this_value to (item i of the RGB_values) div 256
		-- guard against input above the expected range
		if this_value is greater than 255 then set this_value to 255
		-- look up the high and low hexadecimal digits
		set x to item ((this_value div 16) + 1) of the hex_list
		set y to item ((this_value mod 16) + 1) of the hex_list
		set the hex_value to (the hex_value & x & y) as string
	end repeat
	return ("#" & the hex_value) as string
end RBG_to_HTML

Here's an example of how to call the sub-routine using values from a color picker dialog:

Using the values from a color picker dialog:
 

set the RGB_value to (choose color default color {65535, 0, 0})
set the HTML_colorvalue to my RBG_to_HTML(RGB_value)


Removing Markup Codes From Text

This sub-routine removes angle-bracket-enclosed tags from the text passed to it.

set this_text to "This is a <B>great</B> time to own a Mac!"
remove_markup(this_text)
--> returns: "This is a great time to own a Mac!"

Here's the sub-routine:

A sub-routine for removing tags from text:
 

on remove_markup(this_text)
	set copy_flag to true
	set the clean_text to ""
	repeat with this_char in this_text
		set this_char to the contents of this_char
		if this_char is "<" then
			set the copy_flag to false
		else if this_char is ">" then
			set the copy_flag to true
		else if the copy_flag is true then
			set the clean_text to the clean_text & this_char as string
		end if
	end repeat
	return the clean_text
end remove_markup


Parsing an HTML File

The following large sub-routine can be used to extract specific tags and their contents from HTML text.

The routine will return all matches of a specific opening and closing tag combination passed to the sub-routine.

There is also a parameter for indicating whether to include the specific enclosing tags with the returned text.

You can use this sub-routine to do the following:

Return All Links in an HTML Document

Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown.

read_parse(this_file, "<A HREF=", "</A>", false)
--> <A HREF="http://www.apple.com/fileA.html">click here</A>
--> <A HREF="http://www.apple.com/fileB.html">click here</A>

Return All Images in an HTML Document

Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown. Note that the value passed for the closing tag parameter is an empty string (""). When the closing tag parameter is empty, the sub-routine returns each match as a single tag, without looking for a closing tag.

read_parse(this_file, "<IMG ", "", false)
--> <IMG SRC="gfx/clipboard.gif" BORDER="0">
--> <IMG SRC="printer_stopped.gif" ALIGN=TOP WIDTH="32" HEIGHT="32" BORDER="0">
--> <IMG SRC="printer_on.gif" ALIGN=TOP WIDTH="32" HEIGHT="32" BORDER="0">

Return All Tables in an HTML Document

Pass the file path to the sub-routine as the first parameter. Leave the other settings as shown.

read_parse(this_file, "<TABLE", "</TABLE>", false)
(*
<TABLE WIDTH="440">
 <TR>
 <TD ALIGN="CENTER" VALIGN="TOP">
 <IMG SRC="gfx/clipboard.gif" BORDER="0">
 </TD>
 </TR>
</TABLE>
*)
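
The examples above all pass false as the last parameter, so the enclosing tags are included in the results. To show the difference that parameter makes without needing a file, here is a minimal sketch that works on a string in memory. The handler name extract_first and the sample HTML are hypothetical and are not part of the original routines. The sketch uses the offset command rather than the file-reading approach of read_parse, and it returns only the first match.

```applescript
-- Hypothetical helper for illustration only: returns the first match of
-- opening_tag ... closing_tag in html_text, or "" if no match is found.
on extract_first(html_text, opening_tag, closing_tag, contents_only)
	set start_offset to offset of opening_tag in html_text
	if start_offset is 0 then return ""
	set remaining_text to text start_offset thru -1 of html_text
	-- find the end of the complete opening tag, not just the search string
	set open_end to offset of ">" in remaining_text
	set close_start to offset of closing_tag in remaining_text
	if open_end is 0 or close_start is less than or equal to open_end then return ""
	if contents_only then
		-- nothing between the tags
		if close_start is open_end + 1 then return ""
		return text (open_end + 1) thru (close_start - 1) of remaining_text
	else
		return text 1 thru (close_start + (length of closing_tag) - 1) of remaining_text
	end if
end extract_first

set sample_html to "<P>See <A HREF=\"index.html\">the index</A> for details.</P>"
set inner_text to extract_first(sample_html, "<A HREF=", "</A>", true)
--> "the index"
set whole_element to extract_first(sample_html, "<A HREF=", "</A>", false)
--> "<A HREF=\"index.html\">the index</A>"
```

With read_parse itself, passing true as the last parameter likewise returns only the text found between the opening and closing tags.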

Here's the sub-routine:

A sub-routine for extracting tags from an HTML file:
 

on read_parse(this_file, opening_tag, closing_tag, contents_only)
	try
		set this_file to this_file as text
		set this_file to open for access file this_file
		set the combined_results to ""
		set the open_tag to ""
		repeat
			read this_file before "<" -- start of a tag
			set this_tag to read this_file until ">" -- end of a tag
			-- to make up for a bug in the "read before" command
			if this_tag does not start with "<" then set this_tag to ("<" & this_tag) as string
			-- EXAMINE THE TAG
			if this_tag begins with the opening_tag then
				-- store the complete tag, not just the search string
				set the open_tag to this_tag
				-- check for single tag indicator
				if the closing_tag is "" then
					if the combined_results is "" then
						set the combined_results to the combined_results & the open_tag
					else
						set the combined_results to the combined_results & return & the open_tag
					end if
				else
					-- reset the text buffer
					set the text_buffer to ""
					-- extract the contents between the open and close tags
					repeat
						set the text_buffer to the text_buffer & ¬
							(read this_file before "<") -- start of a tag
						set the tag_buffer to read this_file until ">" -- end of a tag
						-- to make up for a bug in the "read before" command
						if the tag_buffer does not start with "<" then set the tag_buffer to ("<" & the tag_buffer)
						-- check for the closing tag
						if the tag_buffer is the closing_tag then
							if contents_only is false then
								set the text_buffer to the open_tag & the text_buffer & the tag_buffer
							end if
							if the combined_results is "" then
								set the combined_results to the combined_results & the text_buffer
							else
								set the combined_results to the combined_results & return & the text_buffer
							end if
							exit repeat
						else
							set the text_buffer to the text_buffer & the tag_buffer
						end if
					end repeat
				end if
			end if
		end repeat
		close access this_file
	on error error_msg number error_num
		try
			close access this_file
		end try
		if error_num is not -39 then return false
	end try
	return the combined_results
end read_parse