Last active
September 21, 2023 13:34
JUnidecode ColdFusion Demo - Convert Unicode strings to reasonable 7-bit ASCII approximations by stripping diacritics and transliterating characters.
<cfprocessingdirective pageEncoding="utf-8">
<cfsetting enablecfoutputonly="Yes">
<!---
    BLOG: https://dev.to/gamesover/convert-unicode-strings-to-ascii-with-coldfusion-junidecode-lhf
--->
<cfscript>
/* Transliterates a Unicode string to 7-bit ASCII using the JUnidecode library. */
function JUnidecode(inputString){
    var JUnidecodeLib = "";
    var response = "";
    var temp = {};
    /* Normalize only when the string can be encoded as UTF-8 (e.g. no unpaired surrogates) */
    temp.encoder = createObject("java", "java.nio.charset.Charset").forName("utf-8").newEncoder();
    temp.isUTF = temp.encoder.canEncode(arguments.inputString);
    if (temp.isUTF){
        /* NFKC: Unicode Compatibility Decomposition, followed by Canonical Composition */
        temp.normalizer = createObject("java", "java.text.Normalizer");
        temp.normalizerForm = createObject("java", "java.text.Normalizer$Form");
        arguments.inputString = temp.normalizer.normalize(javaCast("string", arguments.inputString), temp.normalizerForm.NFKC);
    }
    try {
        JUnidecodeLib = createObject("java", "net.gcardone.junidecode.Junidecode");
        response = JUnidecodeLib.unidecode(javaCast("string", arguments.inputString));
    } catch (any e) {
        /* Most likely cause: the JUnidecode JAR is not on the class path */
        response = "ERROR: JUnidecode is not installed";
    }
    /* Strip any "[?]" placeholders left in the output */
    return trim(response.replaceAll("\[\?\]", ""));
}
/* Returns true when position pos of compareArr is missing or differs from val. */
function isDiff(compareArr, val, pos){
    return (pos GT arrayLen(compareArr) OR compareArr[pos] NEQ val);
}
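/*
 * For comparison with the java.text.Normalizer approach that the page text
 * mentions: a minimal sketch, not part of JUnidecode. It decomposes the string
 * (NFD) and then removes the combining marks. stripAccentsNFD is a hypothetical
 * helper name used only for illustration. This technique only removes accents
 * from characters that have a canonical decomposition; characters such as
 * "Ø" or "Ł" and non-Latin scripts are left unchanged, which is one reason to
 * prefer a transliteration library.
 */
function stripAccentsNFD(inputString){
    var normalizer = createObject("java", "java.text.Normalizer");
    var normalizerForm = createObject("java", "java.text.Normalizer$Form");
    /* NFD: Canonical Decomposition (base letter followed by combining marks) */
    var decomposed = normalizer.normalize(javaCast("string", arguments.inputString), normalizerForm.NFD);
    /* \p{M} matches Unicode combining marks */
    return decomposed.replaceAll("\p{M}+", "");
}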
</cfscript>
<cfset TestStrings = [
    "ℰ𝒳𝒜ℳ𝓟ℒℰ",
    "ABC #chr(160)# Café “test”",
    "北亰",
    "Mr. まさゆき たけだ",
    "Łukasiński",
    "⠏⠗⠑⠍⠊⠑⠗",
    "What about Ø, Ł or æøåá",
    "ราชอาณาจักรไทย",
    "Ελληνικά",
    "Москвa",
    "Հայաստան",
    "čeština",
    "®™™™©©©Ⓒ½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅐⅛⅜⅝⅞⅑⅒●⚫⬤",
    "ÀÁÂÃÄÅÆÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåæèéêëìíîïñòóôõöøùúûüý’“”–…",
    "Häuser Bäume Höfe Gärten daß Ü ü ö ä Ä Ö ß "
]>
<cfset CFString = "cfscript">
<cfparam name="URL.testString" default="">
<cfif len(trim(URL.testString))>
    <cfset TestStrings = listToArray(trim(URL.testString))>
</cfif>
<cfsetting enablecfoutputonly="no">
<!doctype html>
<html lang="en">
<head>
<title>JUnidecode ColdFusion Demo</title>
<style>
.diff {background-color:#ff0;}
fieldset:nth-child(even) {background-color:#ededed;}
</style>
</head>
<body>
<h1>JUnidecode ColdFusion Demo</h1>
<p>by <a href="https://about.me/jamesmoberg">James Moberg</a> / <a href="https://www.sunstarmedia.com/">SunStar Media</a> (February 6, 2019)</p>
<p>This is a demo of how to use <a href="https://github.com/gcardone/junidecode">JUnidecode</a> with <a href="https://www.adobe.com/products/coldfusion-family.html">ColdFusion</a> to convert Unicode strings to reasonable 7-bit ASCII approximations by stripping diacritics and transliterating characters.</p>
<p>I've compared this Java library against <a href="https://www.bennadel.com/blog/1155-cleaning-high-ascii-values-for-web-safeness-in-coldfusion.htm">regex</a>, <a href="https://cflib.org/udf/deAccent">java.text.Normalizer</a>, <a href="https://gist.github.com/JamoCA/ec4617b066fc4bb601f620bc93bacb57">ICU4J Transliterate</a> (390 KB vs. 12 MB+) and <a href="https://www.codota.com/code/java/methods/org.apache.commons.lang3.StringUtils/stripAccents">Apache Commons Lang3 StringUtils.stripAccents()</a> (500 KB) and found that it generates more consistent results and safely converts more characters than the other solutions. I've also updated our <a href="https://gist.github.com/JamoCA/fee34a03bbe61a2f8e40">SanitizeFilename UDF</a> to use it.</p>
<p><b>Installation:</b> Download the latest <a href="https://github.com/gcardone/junidecode/releases">JUnidecode JAR</a>, place it in your Java class path and restart your ColdFusion server (or use JavaLoader).</p>
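<!---
    As an alternative to the server-wide install described above, the JAR can
    usually be loaded per application. A minimal sketch, assuming Adobe
    ColdFusion 10+ (Lucee supports a similar setting) and a hypothetical "lib"
    folder next to Application.cfc that contains the JUnidecode JAR. This would
    go in Application.cfc, not in this template; the application name and the
    folder path are placeholders.

    component {
        this.name = "JUnidecodeDemo";        // hypothetical application name
        this.javaSettings = {
            loadPaths = ["./lib"],           // hypothetical folder holding the JAR
            loadColdFusionClassPath = true,  // keep the server's own classes visible
            reloadOnChange = false           // do not watch the folder for changes
        };
    }
--->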
<p><b>Sample User-Defined Function (UDF):</b></p>
<cfoutput>
<textarea rows="7" cols="100" style="margin-left:25px;"><#CFString#>
function JUnidecode(inputString){
    var JUnidecodeLib = createObject("java", "net.gcardone.junidecode.Junidecode");
    var response = JUnidecodeLib.unidecode( javacast("string", arguments.inputString) );
    return trim(replacenocase(response, "[?]", "", "all"));
}
</#CFString#></textarea>
<p><b>Usage:</b></p>
<p style="margin-left:25px;">JUnidecode(<i>string</i>)</p>
<hr>
<h2>Form Test</h2>
<form action="" method="get">
<input type="text" name="teststring" value="" required placeholder="Enter test string"> <button type="submit">Test</button><cfif len(trim(URL.TestString))> <a href="?">Reset</a></cfif>
</form>
<h2>Test Results</h2>
<cfloop from="1" to="#ArrayLen(TestStrings)#" index="r">
    <cfset TestString = TestStrings[r]>
    <cfset TestResult = JUnidecode(TestString)>
    <cfset letters = []>
    <!--- Values are HTML-encoded because TestString can come from the URL --->
    <fieldset>
    <legend>#r#. #encodeForHTML(TestString)#</legend>
    <b>Result:</b> #encodeForHTML(TestResult)#
    <table border="1" cellspacing="0" cellpadding="0">
    <tr valign="top">
    <th>Original</th><cfloop from="1" to="#len(TestString)#" index="i">
        <cfset Letter = mid(TestString, i, 1)>
        <cfset arrayAppend(letters, Letter)><td><tt>#encodeForHTML(Letter)#</tt><br><tt>#asc(Letter)#</tt></td></cfloop>
    </tr>
    <tr valign="top">
    <th>JUnidecode</th><cfloop from="1" to="#len(TestResult)#" index="i">
        <cfset Letter = mid(TestResult, i, 1)>
        <td<cfif isDiff(letters, Letter, i)> class="diff"</cfif>><tt>#encodeForHTML(Letter)#</tt><br><tt>#asc(Letter)#</tt></td></cfloop>
    </tr>
    </table>
    </fieldset>
</cfloop>
</cfoutput>
</body>
</html>
@JamoCA : Thank you so much for your reply and for providing those links. I've got it working now!