A known minus symbol conversion issue for PHP

Suppose a string containing the "Fullwidth Hyphen-Minus" (U+FF0D) character is encoded as Shift_JIS and then converted back to UTF-8 in PHP.

After the round trip, the character comes back as a different character, "Minus Sign" (U+2212).

// U+FF0D FULLWIDTH HYPHEN-MINUS (the "\u{...}" escape needs PHP 7.0+ and double quotes)
$uff0d = "\u{FF0D}";
$uff0dSjis = mb_convert_encoding($uff0d, 'SJIS', 'UTF-8');
$uff0dUtf8 = mb_convert_encoding($uff0dSjis, 'UTF-8', 'SJIS');
$codePoint = mb_ord($uff0dUtf8, 'UTF-8');
var_dump(mb_ord($uff0d, 'UTF-8') === $codePoint);  // false

// U+2212 MINUS SIGN
$u2212 = "\u{2212}";
var_dump(mb_ord($u2212, 'UTF-8') === $codePoint);  // true

The conversion issue is not necessarily limited to PHP; it might also occur in other programming languages and systems.

It appears to be a known issue, and the mapping looks like a deliberate decision rather than a bug.

For example, IBM has a page for this issue: Japanese Shift-JIS Character Mapping
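
If the round trip has to preserve U+FF0D, one possible workaround is to swap the character back after conversion. The following is only a minimal sketch: both function names are hypothetical and not part of the original snippet, and it assumes the text never contains a genuine U+2212 that must stay as it is.

```php
<?php
/**
 * Hypothetical helper: replace every U+2212 MINUS SIGN with
 * U+FF0D FULLWIDTH HYPHEN-MINUS in a UTF-8 string.
 * Caveat: minus signs that really were U+2212 are rewritten too.
 */
function restoreFullwidthHyphenMinus(string $utf8): string
{
    // A byte-wise replace is safe here because UTF-8 sequences
    // cannot match in the middle of another character.
    return str_replace("\u{2212}", "\u{FF0D}", $utf8);
}

/**
 * Hypothetical wrapper: decode Shift_JIS bytes to UTF-8 with mbstring,
 * then undo the U+FF0D -> U+2212 remapping described above.
 */
function sjisToUtf8PreservingFf0d(string $sjisBytes): string
{
    return restoreFullwidthHyphenMinus(
        mb_convert_encoding($sjisBytes, 'UTF-8', 'SJIS')
    );
}
```

Another option that may be worth testing is converting with mbstring's 'SJIS-win' (CP932) encoding name instead of 'SJIS'. As far as I know, it follows Microsoft's mapping, in which the same byte sequence corresponds to U+FF0D, but verify this on your own PHP version before relying on it.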
