Cover: 《ハロー・レディ! -Superior Entelecheia-》

Preface

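The query syntax in our project uses the “+” sign. We originally separated terms with spaces, but we ran into “fake spaces”: characters that look like spaces but are not. So we switched to “+”. Unexpectedly, the plus sign turned out to have exactly the same problem.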

Problem description

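Look at the two “+” signs below. Different people may see them differently, but in WeChat, where I viewed them, they look identical. In a program, however, they have different encodings. A Chinese-to-Unicode converter shows that they are \uff0b and \u002b respectively. As a result, a user could type a query that looks correct and still fail to trigger the right search.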

＋   \uff0b  # fullwidth plus sign
+   \u002b  # halfwidth (ASCII) plus sign
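Here is a minimal Java sketch showing that the two characters compare unequal even when they render the same. The class name `PlusSignDemo` and the helper `codePointOf` are hypothetical and exist only for illustration.

```java
public class PlusSignDemo {
    // Format the first code point of a string the way Unicode charts do, e.g. U+002B
    static String codePointOf(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        String fullwidth = "\uFF0B"; // FULLWIDTH PLUS SIGN
        String halfwidth = "+";      // ASCII PLUS SIGN

        // The two may look identical, but they are different characters
        System.out.println(fullwidth.equals(halfwidth)); // false
        System.out.println(codePointOf(fullwidth));      // U+FF0B
        System.out.println(codePointOf(halfwidth));      // U+002B
    }
}
```

Any code that splits or matches on `'+'` will silently skip the fullwidth variant, which is exactly the bug described above.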

The root cause is the distinction between fullwidth (全角) and halfwidth (半角) characters.

In CJK (Chinese, Japanese and Korean) computing, graphic characters are traditionally classed into fullwidth (in Taiwan and Hong Kong: 全形;in CJK and Japanese: 全角) and halfwidth (in Taiwan and Hong Kong: 半形;in CJK and Japanese: 半角) characters. With fixed-width fonts, a halfwidth character occupies half the width of a fullwidth character, hence the name.
In the days of computer terminals and text mode computing, characters were normally laid out in a grid, often 80 columns by 24 or 25 lines. Each character was displayed as a small dot matrix, often about 8 pixels wide, and an SBCS (single byte character set) was generally used to encode characters of western languages.
For a number of practical and aesthetic reasons, Han characters would need to be twice as wide as these fixed-width SBCS characters. These “fullwidth characters” were typically encoded in a DBCS (double byte character set), although less common systems used other variable-width character sets that used more bytes per character.
Halfwidth and Fullwidth Forms is also the name of a Unicode block U+FF00–FFEF.

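In short, Han characters are too narrow when displayed at half width, so they are displayed at full width, and fullwidth versions of Latin letters and symbols came along with them. Systems, however, generally expect halfwidth input.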
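The Unicode block U+FF00–FFEF mentioned in the quote above can be checked directly with the JDK's `Character.UnicodeBlock`. The sketch below uses a hypothetical helper, `isHalfwidthFullwidthForm`. Membership in this block does not by itself mean "fullwidth", because the same block also holds halfwidth katakana and a few other halfwidth forms.

```java
public class BlockDemo {
    // True if the character lies in the "Halfwidth and Fullwidth Forms" block (U+FF00-U+FFEF)
    static boolean isHalfwidthFullwidthForm(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS;
    }

    public static void main(String[] args) {
        // UnicodeBlock.toString() returns the block's name
        System.out.println(Character.UnicodeBlock.of('\uFF0B')); // HALFWIDTH_AND_FULLWIDTH_FORMS
        System.out.println(Character.UnicodeBlock.of('+'));      // BASIC_LATIN
    }
}
```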

Converting between fullwidth and halfwidth

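Some of the material I found calls fullwidth SBCS and halfwidth DBCS. Going by the Wikipedia passage above, that seems to be backwards: fullwidth corresponds to DBCS and halfwidth to SBCS.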
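To make the SBCS/DBCS distinction concrete, this sketch encodes both plus signs in GBK. GBK is a multibyte, DBCS-style encoding in which ASCII stays one byte and fullwidth forms take two. It assumes the running JDK ships the GBK charset, which full OpenJDK builds normally do. The class and helper names are hypothetical.

```java
import java.nio.charset.Charset;

public class DbcsDemo {
    // Number of bytes the string occupies when encoded in the named charset
    static int byteLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        System.out.println(byteLength("+", "GBK"));      // 1: single-byte (halfwidth)
        System.out.println(byteLength("\uFF0B", "GBK")); // 2: double-byte (fullwidth)
    }
}
```

This matches the Wikipedia reading: the halfwidth character is the single-byte one, and the fullwidth character is the double-byte one.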

Conversion

public class AsciiUtil {
    public static final char DBC_SPACE = 12288; // fullwidth (ideographic) space, U+3000

    public static final char SBC_SPACE = 32; // halfwidth (ASCII) space, U+0020

    // ASCII characters 33-126 <-> Unicode 65281-65374 (U+FF01-U+FF5E)
    public static final char ASCII_START = 33;

    public static final char ASCII_END = 126;

    public static final char UNICODE_START = 65281;

    public static final char UNICODE_END = 65374;

    public static final char DBC_SBC_STEP = 65248; // offset between a fullwidth character and its halfwidth counterpart

    /**
     * Convert a single halfwidth (SBC) character to fullwidth (DBC).
     * Anything other than the space and printable ASCII (33-126) is returned unchanged.
     */
    public static char sbc2dbc(char src) {
        if (src == SBC_SPACE) {
            return DBC_SPACE;
        }

        if (src >= ASCII_START && src <= ASCII_END) {
            return (char) (src + DBC_SBC_STEP);
        }

        return src;
    }

    /**
     * Convert from SBC case to DBC case (halfwidth to fullwidth).
     *
     * @param src the string to convert, may be null
     * @return DBC case string, or null if src is null
     */
    public static String sbc2dbcCase(String src) {
        if (src == null) {
            return null;
        }
        char[] c = src.toCharArray();
        for (int i = 0; i < c.length; i++) {
            c[i] = sbc2dbc(c[i]);
        }
        return new String(c);
    }

    /**
     * Convert a single fullwidth (DBC) character to halfwidth (SBC).
     * Anything other than the fullwidth space and U+FF01-U+FF5E is returned unchanged.
     */
    public static char dbc2sbc(char src) {
        if (src == DBC_SPACE) {
            return SBC_SPACE;
        }
        if (src >= UNICODE_START && src <= UNICODE_END) {
            return (char) (src - DBC_SBC_STEP);
        }
        return src;
    }

    /**
     * Convert from DBC case to SBC case (fullwidth to halfwidth).
     *
     * @param src the string to convert, may be null
     * @return SBC case string, or null if src is null
     */
    public static String dbc2sbcCase(String src) {
        if (src == null) {
            return null;
        }

        char[] c = src.toCharArray();
        for (int i = 0; i < c.length; i++) {
            c[i] = dbc2sbc(c[i]);
        }

        return new String(c);
    }
}
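The constant `DBC_SBC_STEP` is simply the distance between the start of the fullwidth range and the start of printable ASCII. The arithmetic below, worked from the constants in `AsciiUtil`, shows where it comes from. It also shows why the space needs its own special case: shifting it by the same offset does not land on the fullwidth space.

```latex
\begin{aligned}
\text{DBC\_SBC\_STEP} &= \text{0xFF01} - \text{0x21} = 65281 - 33 = 65248 = \text{0xFEE0} \\
\text{`+'} + \text{DBC\_SBC\_STEP} &= 43 + 65248 = 65291 = \text{0xFF0B} \\
\text{`\ '} + \text{DBC\_SBC\_STEP} &= 32 + 65248 = 65280 = \text{0xFF00} \neq 12288 = \text{0x3000}
\end{aligned}
```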

Tests

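import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

public class AsciiTests {

    // Each row: halfwidth input, expected fullwidth output
    @ParameterizedTest
    @CsvSource({
        "+,＋",
        "helloworld,ｈｅｌｌｏｗｏｒｌｄ"
    })
    public void sbcDbcTest(String sbc, String dbc) {
        assertEquals(sbc, AsciiUtil.dbc2sbcCase(dbc));
        assertEquals(dbc, AsciiUtil.sbc2dbcCase(sbc));
    }
}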
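Tying this back to the original bug, here is a sketch of normalizing a "+"-separated query before splitting it. It swaps in a different technique from `AsciiUtil`: the JDK's `java.text.Normalizer` with Unicode NFKC normalization, which folds the fullwidth plus sign, fullwidth letters and digits, and the ideographic space to their ASCII counterparts. NFKC also rewrites other compatibility characters (for example circled digits, ligatures, and halfwidth katakana), so it changes more than the U+FF01–U+FF5E range does. Whether that is acceptable depends on the application. The `splitQuery` helper is hypothetical.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;

public class QueryNormalizer {
    // Apply NFKC: fullwidth forms and the ideographic space fold to ASCII equivalents
    static String normalize(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFKC);
    }

    // Hypothetical helper: normalize first, then split on the ASCII '+'
    static List<String> splitQuery(String query) {
        return Arrays.asList(normalize(query).split("\\+"));
    }

    public static void main(String[] args) {
        // The input uses the fullwidth plus sign U+FF0B as separator
        System.out.println(splitQuery("java\uFF0Bunicode")); // [java, unicode]
    }
}
```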

Reflections

With my limited experience, I never expected to run into a problem like this. The saying “never trust user input” is right on the mark.

References

Halfwidth and Fullwidth Forms

Halfwidth and fullwidth forms

java 字符全角半角转换 (Java: converting characters between fullwidth and halfwidth)