封面《ハロー・レディ！ -Superior Entelecheia-》

前言

因为项目中的查询语句使用了 “+” 号，之前就因为空格的原因碰到了 “假空格”，所以就换为 “+” 号。但是让人没有想到的换成加号也有一样的问题。

问题描述

观察下面两种 “+” 号，也许不同人看到的结果不一样。不过在我这微信上看到的两种 “+” 号是一样的。然而在程序中这两种 “+” 号的编码是不一样的，可以使用这个工具中的中文转 unicode 的比较，他们分别是 \uff0b 和 \u002b。这就导致有人输入了查询信息但是却没有触发正确的查询。

1 2	＋ \uff0b #全宽加号 + \u002b

就其原因还是全角 (fullwidth) 和半角 (Halfwidth)。

In CJK (Chinese, Japanese and Korean) computing, graphic characters are traditionally classed into fullwidth (in Taiwan and Hong Kong: 全形；in CJK and Japanese: 全角) and halfwidth (in Taiwan and Hong Kong: 半形；in CJK and Japanese: 半角) characters. With fixed-width fonts, a halfwidth character occupies half the width of a fullwidth character, hence the name.
In the days of computer terminals and text mode computing, characters were normally laid out in a grid, often 80 columns by 24 or 25 lines. Each character was displayed as a small dot matrix, often about 8 pixels wide, and an SBCS (single byte character set) was generally used to encode characters of western languages.
For a number of practical and aesthetic reasons, Han characters would need to be twice as wide as these fixed-width SBCS characters. These “fullwidth characters” were typically encoded in a DBCS (double byte character set), although less common systems used other variable-width character sets that used more bytes per character.
Halfwidth and Fullwidth Forms is also the name of a Unicode block U+FF00–FFEF.

简单来说就是汉字用半角显示太窄了，因此使用全角显示，顺便也就把英文一起全角了。不过系统输入一般都是需要半角

全角与半角转换

我查到的一些资料，称呼全角 SBCS、半角 DBCS。根据上文中 wiki 的说法似乎是反的。应该全角是 DBCS、半角是 SBCS。

转换

public class AsciiUtil {
    public static final char DBC_SPACE = 12288; // 全角空格 12288

    public static final char SBC_SPACE = 32; // 半角空格 32

    // ASCII character 33-126 <-> unicode 65281-65374
    public static final char ASCII_START = 33;

    public static final char ASCII_END = 126;

    public static final char UNICODE_START = 65281;

    public static final char UNICODE_END = 65374;

    public static final char DBC_SBC_STEP = 65248; // 全角半角转换间隔

    public static char sbc2dbc(char src) {
        if (src == SBC_SPACE) {
            return DBC_SPACE;
        }

        if (src >= ASCII_START && src <= ASCII_END) {
            return (char) (src + DBC_SBC_STEP);
        }

        return src;
    }

    /**
     * Convert from SBC case to DBC case 半角到全角
     *
     * @param src
     * @return DBC case
     */
    public static String sbc2dbcCase(String src) {
        if (src == null) {
            return null;
        }
        char[] c = src.toCharArray();
        for (int i = 0; i < c.length; i++) {
            c[i] = sbc2dbc(c[i]);
        }
        return new String(c);
    }

    public static char dbc2sbc(char src) {
        if (src == DBC_SPACE) {
            return SBC_SPACE;
        }
        if (src <= UNICODE_END && src>=UNICODE_START) {
            return (char) (src - DBC_SBC_STEP);
        }
        return src;
    }

    /**
     * Convert from DBC case to SBC case. 全角到半角
     *
     * @param src
     * @return SBC case string
     */
    public static String dbc2sbcCase(String src) {
        if (src == null) {
            return null;
        }

        char[] c = src.toCharArray();
        for (int i = 0; i < c.length; i++) {
            c[i] = dbc2sbc(c[i]);
        }

        return new String(c);
    }
}

测试

public class AsciiTests {

    @ParameterizedTest
    @CsvSource({
        "+,＋",
        "helloworld,ｈｅｌｌｏｗｏｒｌｄ"
    })
    public void sbcDbcTest(String sbc,String dbc){
        assertEquals(sbc, AsciiUtil.dbc2sbcCase(dbc));
        assertEquals(dbc, AsciiUtil.sbc2dbcCase(sbc));
    }
}