Cover: 《ハロー・レディ! -Superior Entelecheia-》
Introduction
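Our project's query strings use a "+" sign as the separator. We used spaces before, but kept running into "fake spaces" (characters that look like a space but are not one), so we switched to "+". Unexpectedly, the plus sign turned out to have exactly the same problem.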
Problem description
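Look at the two "+" signs below. Different people may see them differently; in WeChat, where I was reading, they looked identical. In a program, however, they are encoded differently. Comparing them with the Chinese-to-Unicode converter in this tool shows that they are \uff0b and \u002b respectively. As a result, a user could type a query and still fail to trigger the correct search.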
```
+  \u002b # halfwidth plus sign
＋ \uff0b # fullwidth plus sign
```
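To see how such a query fails silently, here is a small sketch. It assumes, hypothetically, that the query string is split on the halfwidth "+"; the class name `PlusSignDemo` and the sample input are made up for illustration.

```java
import java.util.Arrays;

public class PlusSignDemo {
    // Format a char as its Unicode code point, e.g. U+002B
    static String codePoint(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        char halfwidth = '+';       // U+002B PLUS SIGN
        char fullwidth = '\uFF0B';  // U+FF0B FULLWIDTH PLUS SIGN

        System.out.println(codePoint(halfwidth));   // U+002B
        System.out.println(codePoint(fullwidth));   // U+FF0B
        System.out.println(halfwidth == fullwidth); // false

        // Hypothetical query format: keywords joined with a halfwidth "+"
        String typedByUser = "java\uFF0Bunicode";
        String[] keywords = typedByUser.split("\\+");
        // The fullwidth plus is not matched, so the input stays a single keyword
        System.out.println(keywords.length); // 1
        System.out.println(Arrays.toString(keywords));
    }
}
```

Nothing throws and nothing looks wrong on screen; the search simply receives one long keyword instead of two.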
The root cause is the distinction between fullwidth and halfwidth characters. From Wikipedia:
In CJK (Chinese, Japanese and Korean) computing, graphic characters are traditionally classed into fullwidth (in Taiwan and Hong Kong: 全形;in CJK and Japanese: 全角) and halfwidth (in Taiwan and Hong Kong: 半形;in CJK and Japanese: 半角) characters. With fixed-width fonts, a halfwidth character occupies half the width of a fullwidth character, hence the name.
In the days of computer terminals and text mode computing, characters were normally laid out in a grid, often 80 columns by 24 or 25 lines. Each character was displayed as a small dot matrix, often about 8 pixels wide, and an SBCS (single byte character set) was generally used to encode characters of western languages.
For a number of practical and aesthetic reasons, Han characters would need to be twice as wide as these fixed-width SBCS characters. These “fullwidth characters” were typically encoded in a DBCS (double byte character set), although less common systems used other variable-width character sets that used more bytes per character.
Halfwidth and Fullwidth Forms is also the name of a Unicode block U+FF00–FFEF.
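In Java, the block a character belongs to can be looked up with `Character.UnicodeBlock.of`. A minimal sketch (the class and method names are illustrative) that checks the two plus signs:

```java
public class WidthBlockCheck {
    // True if c lies in the Halfwidth and Fullwidth Forms block (U+FF00 to U+FFEF)
    static boolean inHalfwidthFullwidthBlock(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS;
    }

    public static void main(String[] args) {
        System.out.println(inHalfwidthFullwidthBlock('\uFF0B')); // true
        System.out.println(inHalfwidthFullwidthBlock('+'));      // false, '+' is in Basic Latin
    }
}
```

Note that membership in this block does not by itself mean "fullwidth": the block also holds halfwidth katakana such as U+FF76.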
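Put simply, Han characters are too cramped when displayed at half width, so they are displayed at full width, and fullwidth versions of the Latin characters came along with them. System input, however, generally expects halfwidth characters.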
Converting between fullwidth and halfwidth
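Some of the material I found calls fullwidth characters SBCS and halfwidth characters DBCS. According to the Wikipedia passage above, that seems to be backwards: fullwidth should be DBCS and halfwidth SBCS.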
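One way to see the SBCS/DBCS split is to encode both plus signs in GBK, a double-byte encoding for Chinese. This is a hedged sketch: it assumes the JRE ships the GBK charset (full JDK builds usually do, but Java does not guarantee it), so it checks `Charset.isSupported` first. `ByteWidthDemo` is an illustrative name.

```java
import java.nio.charset.Charset;

public class ByteWidthDemo {
    // Number of bytes s occupies when encoded with the given charset
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("GBK")) {
            System.out.println("GBK is not available in this JRE");
            return;
        }
        Charset gbk = Charset.forName("GBK");
        System.out.println(encodedLength("+", gbk));      // 1: halfwidth, single byte
        System.out.println(encodedLength("\uFF0B", gbk)); // 2: fullwidth, double byte
    }
}
```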
Conversion
```java
public class AsciiUtil { ... }
```
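A minimal sketch of such a conversion, assuming the usual mapping: the fullwidth ASCII variants U+FF01 to U+FF5E correspond to U+0021 to U+007E with a fixed offset of 0xFEE0, and the ideographic space U+3000 corresponds to the ordinary space U+0020. The class `WidthConverter` and its method names are hypothetical, not the author's code.

```java
public class WidthConverter {
    private static final char IDEOGRAPHIC_SPACE = '\u3000';
    // Distance between a fullwidth ASCII variant and its halfwidth form
    private static final int OFFSET = 0xFEE0;

    // Fullwidth (U+FF01..U+FF5E, U+3000) -> halfwidth (U+0021..U+007E, U+0020)
    static String toHalfwidth(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (c == IDEOGRAPHIC_SPACE) {
                sb.append(' ');
            } else if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - OFFSET));
            } else {
                sb.append(c); // everything else passes through unchanged
            }
        }
        return sb.toString();
    }

    // Halfwidth printable ASCII and space -> fullwidth forms
    static String toFullwidth(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (c == ' ') {
                sb.append(IDEOGRAPHIC_SPACE);
            } else if (c >= '!' && c <= '~') {
                sb.append((char) (c + OFFSET));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String query = "java\uFF0Bunicode";
        System.out.println(toHalfwidth(query)); // java+unicode
    }
}
```

Han characters and halfwidth katakana fall outside both ranges and are left alone. For the query problem described here, running user input through `toHalfwidth` before splitting on "+" would be enough.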
Tests
```java
public class AsciiTests { ... }
```
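As an alternative to a hand-written converter (a swapped-in technique, not what the author used), the JDK's `java.text.Normalizer` with NFKC maps fullwidth ASCII variants and the ideographic space to their halfwidth forms. The sketch below exercises it with explicit checks; `QueryNormalizer`, `normalize` and `check` are hypothetical names.

```java
import java.text.Normalizer;

public class QueryNormalizer {
    // NFKC folds compatibility characters, including fullwidth ASCII
    // variants and the ideographic space, into their plain forms
    static String normalize(String raw) {
        return Normalizer.normalize(raw, Normalizer.Form.NFKC);
    }

    // Minimal check helper so the demo does not depend on the -ea flag
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        check(normalize("java\uFF0Bunicode").equals("java+unicode"), "fullwidth plus");
        check(normalize("\uFF21\uFF22\uFF23").equals("ABC"), "fullwidth letters");
        check(normalize("a\u3000b").equals("a b"), "ideographic space");
        check(normalize("+").equals("+"), "halfwidth input unchanged");
        System.out.println("all checks passed");
    }
}
```

NFKC does more than width folding: a circled digit such as U+2460 becomes "1", and halfwidth katakana become fullwidth katakana. Whether that is acceptable depends on what the query is allowed to contain.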
Closing thoughts
With my limited experience, I never expected to run into a problem like this. The saying "never trust user input" is absolutely right.