Text Encoding | Decoding

A standard practice when creating URLs is to encode spaces and special characters (including high-ASCII characters) as their hexadecimal equivalents, prefixed with a percent sign. For example, a space in a URL is converted to: %20

The following sub-routines can be used to encode text in a variety of settings.


Character Encoding Sub-Routine

This sub-routine will encode a passed character. It is called by the other sub-routine examples and must be included in your script.

encode_char("$")
--> returns: "%24"

A sub-routine for encoding high-ASCII characters:
 

on encode_char(this_char)
    set the ASCII_num to (the ASCII number this_char)
    set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
    set x to item ((ASCII_num div 16) + 1) of the hex_list
    set y to item ((ASCII_num mod 16) + 1) of the hex_list
    return ("%" & x & y) as string
end encode_char
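
As an illustration of the arithmetic inside encode_char, the sketch below works through the dollar sign by hand, starting from its ASCII number (36) rather than calling ASCII number. The names high_digit, low_digit, and encoded are illustrative only and do not appear in the sub-routine.

```applescript
-- Worked example: "$" has the ASCII number 36
set ASCII_num to 36
set hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
set high_digit to ASCII_num div 16 -- 2 (whole sixteens)
set low_digit to ASCII_num mod 16 -- 4 (what is left over)
-- the list is 1-based, so add 1 to each value before indexing
set encoded to "%" & (item (high_digit + 1) of hex_list) & (item (low_digit + 1) of hex_list)
-- encoded is "%24"
```

The same steps applied to 169 (the value shown for the copyright symbol in the examples on this page) give 10 and 9, which map to "A" and "9", hence "%A9".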


Text Encoding Sub-Routine

This sub-routine is used in conjunction with the character encoding sub-routine to encode spaces and high-ASCII characters (those above 127) in passed text strings. Two parameters control which additional characters are exempt from encoding.

The first parameter, encode_URL_A, is a true or false value that indicates whether the sub-routine should also encode most of the special characters reserved for use by URLs.

In the following example the encode_URL_A value is false, thereby exempting the asterisk (*) character, which has a special meaning in URLs, from the encoding process. Because encode_URL_B is also false, the hyphen, underscore, and period are left alone as well. Only spaces and high-ASCII characters, like the copyright symbol, are encoded.

encode_text("*smith-wilson© report_23.txt", false, false)
--> "*smith-wilson%A9%20report_23.txt"

In the following example both parameters are true, so the asterisk is included in the encoding process, along with the hyphen, underscore, and period:

encode_text("*smith-wilson© report_23.txt", true, true)
--> "%2Asmith%2Dwilson%A9%20report%5F23%2Etxt"

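The second parameter, encode_URL_B, is a true or false value that indicates whether to encode periods (.), colons (:), underscores (_), and hyphens (-). In the following example encode_URL_B is false, thereby exempting those characters from encoding, while the asterisk is still encoded because encode_URL_A is true:

encode_text("*annual smith-wilson_report.txt", true, false)
--> "%2Aannual%20smith-wilson_report.txt"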

A sub-routine for percent encoding strings:
 

-- this sub-routine is used to encode text
on encode_text(this_text, encode_URL_A, encode_URL_B)
    set the standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
    set the URL_A_chars to "$+!'/?;&@=#%><{}[]\"~`^\\|*"
    set the URL_B_chars to ".-_:"
    set the acceptable_characters to the standard_characters
    if encode_URL_A is false then set the acceptable_characters to the acceptable_characters & the URL_A_chars
    if encode_URL_B is false then set the acceptable_characters to the acceptable_characters & the URL_B_chars
    set the encoded_text to ""
    repeat with this_char in this_text
        if this_char is in the acceptable_characters then
            set the encoded_text to (the encoded_text & this_char)
        else
            set the encoded_text to (the encoded_text & encode_char(this_char)) as string
        end if
    end repeat
    return the encoded_text
end encode_text
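
To make the effect of the two parameters more concrete, the following sketch rebuilds the acceptable-characters string for the case where encode_URL_A is true and encode_URL_B is false, then checks three sample characters against it. It is only an illustration of the exemption test, not a replacement for encode_text, and is_exempt_list is an illustrative name that does not appear in the sub-routine.

```applescript
-- encode_URL_A true: the URL_A characters are NOT added to the exempt set
-- encode_URL_B false: the URL_B characters ARE added to the exempt set
set standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
set URL_B_chars to ".-_:"
set acceptable_characters to standard_characters & URL_B_chars

-- record, for each sample character, whether it would be passed through unencoded
set is_exempt_list to {}
repeat with this_char in {"*", "-", " "}
    set the end of is_exempt_list to ((contents of this_char) is in acceptable_characters)
end repeat
-- is_exempt_list is {false, true, false}
```

The asterisk and the space are not in the exempt set, so encode_text would hand them to encode_char; the hyphen is exempt and would be copied through unchanged.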


Text Decoding Routines

The following sub-routines can be used to decode previously encoded text:

A sub-routine for decoding a three-character hex string:

-- this sub-routine is used to decode a three-character hex string
on decode_chars(these_chars)
    copy these_chars to {identifying_char, multiplier_char, remainder_char}
    set the hex_list to "123456789ABCDEF"
    if the multiplier_char is in "ABCDEF" then
        set the multiplier_amt to the offset of the multiplier_char in the hex_list
    else
        set the multiplier_amt to the multiplier_char as integer
    end if
    if the remainder_char is in "ABCDEF" then
        set the remainder_amt to the offset of the remainder_char in the hex_list
    else
        set the remainder_amt to the remainder_char as integer
    end if
    set the ASCII_num to (multiplier_amt * 16) + remainder_amt
    return (ASCII character ASCII_num)
end decode_chars
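
The arithmetic in decode_chars can be traced by hand. The sketch below works through "%A9" without calling ASCII character; it relies on the fact that hex_list deliberately omits "0", so the offset of a letter within it equals that letter's hexadecimal value.

```applescript
-- Worked example: decoding the two hex digits of "%A9"
set hex_list to "123456789ABCDEF"
set multiplier_amt to the offset of "A" in hex_list -- 10, because "A" is the tenth character
set remainder_amt to "9" as integer -- 9
set ASCII_num to (multiplier_amt * 16) + remainder_amt
-- ASCII_num is 169
```

The result, 169, is the number that decode_chars passes to ASCII character.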

A sub-routine for decoding multiple encoded characters:
 

-- this sub-routine is used to decode text strings
on decode_text(this_text)
    set flag_A to false
    set flag_B to false
    set temp_char to ""
    set the character_list to {}
    repeat with this_char in this_text
        set this_char to the contents of this_char
        if this_char is "%" then
            set flag_A to true
        else if flag_A is true then
            set the temp_char to this_char
            set flag_A to false
            set flag_B to true
        else if flag_B is true then
            set the end of the character_list to my decode_chars(("%" & temp_char & this_char) as string)
            set the temp_char to ""
            set flag_A to false
            set flag_B to false
        else
            set the end of the character_list to this_char
        end if
    end repeat
    return the character_list as string
end decode_text