@ringsaturn
Forked from windsting/add_utf8_BOM.md
Created May 29, 2019 06:33
Show Gist options
  • Save ringsaturn/ab48af19a9722eec634a70b6d1c085f9 to your computer and use it in GitHub Desktop.
Save ringsaturn/ab48af19a9722eec634a70b6d1c085f9 to your computer and use it in GitHub Desktop.
Add a BOM to UTF-8 encoded text files

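Use case

  • Syncing a Cocos2D project between macOS and Windows

    When a Cocos2D project is edited with Xcode on macOS, UTF-8 files are saved without a BOM. Building the project with Visual Studio on Windows then fails with compile-time errors, because without a BOM the Microsoft C++ compiler decodes a source file using the system's active code page (for example GBK on Chinese-locale Windows) and misreads non-ASCII characters. Adding a BOM to these files fixes the errors and does not break the build in Xcode.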

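System requirements

This tool is a set of bash scripts and must be run from a bash shell. Make sure the following software is installed:

  • bash 4.0 or later (find-file-with-encoding.sh uses an associative array, which the default /bin/bash 3.2 on macOS does not support)
  • file
  • uconv (part of ICU)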
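The requirements can be checked before running the scripts. This is a minimal sketch, assuming `file` and `uconv` are the only external programs you need beyond the standard utilities (`awk`, `sed`, `cut`, `ls`, `tr`); `check_deps` is a hypothetical helper, not part of the gist.

```shell
#!/bin/bash
# Hypothetical helper: report missing tools and a too-old bash.
# Returns 0 if everything is available, 1 otherwise.
check_deps() {
    local missing=0 tool
    # Associative arrays (declare -A) need bash 4.0 or later;
    # macOS ships bash 3.2 as /bin/bash.
    if (( BASH_VERSINFO[0] < 4 )); then
        echo "bash >= 4 required, found $BASH_VERSION" >&2
        missing=1
    fi
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use: `check_deps file uconv || echo "install the missing tools first"`.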

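Usage

In a bash shell, run

add-bom-for-files-in-folder.sh path-of-files-to-convert

where path-of-files-to-convert is a directory. Every file under it, including files in subdirectories, that is encoded as UTF-8 without Signature is converted to UTF-8 with Signature. Because the scripts call each other by name, all three must be executable and located in a directory on your PATH.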
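A file is "UTF-8 with Signature" when it starts with the three-byte UTF-8 BOM, `EF BB BF`; the rest of the content is unchanged. The sketch below checks for those bytes directly with `od` instead of relying on the wording of `file`. `has_utf8_bom` is a hypothetical helper; it only looks at the signature and does not verify that the rest of the file is valid UTF-8.

```shell
#!/bin/bash
# Hypothetical helper: succeed if file $1 starts with the UTF-8 BOM (EF BB BF).
has_utf8_bom() {
    local head
    # od prints the first 3 bytes as hex, e.g. " ef bb bf"; strip spaces/newline.
    head=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    [ "$head" = "efbbbf" ]
}
```

For example, `printf '\357\273\277hi\n' > with.txt` creates a file that passes the check, while `printf 'hi\n' > without.txt` creates one that does not (the octal escapes keep the `printf` call portable).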

Script files

find-file-with-encoding.sh

This script lists the files that have a given encoding. Run find-file-with-encoding.sh -h for usage.

#!/bin/bash
# Requires bash 4+ (associative arrays).

# ENCODING=UTF-8Unicodetext
TARGET=.

usage(){
    echo "find all files with specified encoding in a directory and all subdirectories"
    echo ""
    echo "$0"
    echo "  -h --help           show this message and exit"
    echo "  -e --encoding       specified encoding -e=$ENCODING"
    echo "  -l --list-encodings list all encodings found, with one example file each"
    echo "  -t --target-dir     target directory to check -t=$TARGET"
    echo ""
}

parse_arg() {
    while [ "$1" != "" ]; do
        PARAM=${1%%=*}                  # text before the first "="
        # text after the first "=" (keeps any "=" inside a path)
        if [[ $1 == *=* ]]; then VALUE=${1#*=}; else VALUE=""; fi
        case $PARAM in
            -h | --help)
                usage
                exit
                ;;
            -e | --encoding)
                ENCODING=$VALUE
                ;;
            -l | --list-encodings)
                LIST="1"
                ;;
            -t | --target-dir)
                TARGET=$VALUE
                ;;
            *)
                # Errors go to stderr so they are never piped into xargs.
                echo "ERROR: unknown parameter \"$PARAM\"" >&2
                usage >&2
                exit 1
                ;;
        esac
        shift
    done
}

# Print the `file` description of $1 with all whitespace removed,
# e.g. "UTF-8 Unicode text" -> "UTF-8Unicodetext".
get_type () {
    local info
    info=$(file - < "$1" | cut -d: -f2)
    echo "${info//[[:space:]]/}"
}

declare -A EMap    # encoding -> one example file

# Typical encoding strings (the wording depends on the version of `file`):
# UTF-8Unicode(withBOM)text
# UTF-8Unicodetext
# ASCIItext
# ISO-8859text

# Named find_files so that it does not shadow the system `find` command.
find_files() {
    local file TYPE
    for file in "$1"/*
    do
        if [ -d "$file" ]
        then
            # Recurse into non-empty subdirectories only.
            if [ -n "$(ls -A "$file")" ]
            then
                find_files "$file" "$2"
            fi
        elif [ -f "$file" ]
        then
            TYPE=$(get_type "$file")
            EMap["$TYPE"]="$file"
            # Print the file if its type contains the requested encoding.
            if [ -n "$2" ] && [[ "$TYPE" == *"$2"* ]]
            then
                echo "$file"
            fi
        fi
    done
}

main(){
    parse_arg "$@"

    find_files "$TARGET" "$ENCODING"

    if [ -n "$LIST" ]
    then
        echo ""
        echo "All encodings:"
        for i in "${!EMap[@]}"
        do
            echo "$i:    ${EMap[$i]}"
        done
    fi
}

main "$@"

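The filter in find-file-with-encoding.sh works on strings: `get_type` removes every whitespace character from the `file` description, and a file is listed when the result contains the `-e` value as a substring. Because the BOM variant normalizes to `UTF-8Unicode(withBOM)text`, the value `UTF-8Unicodetext` does not match it, which is why add-bom-for-files-in-folder.sh only picks up files without a BOM. The sketch below reproduces that rule on sample descriptions; `normalize_type` and `matches_encoding` are hypothetical names, and the sample strings follow the `file` wording quoted in the script comments, which can differ between `file` versions (run the script with `-l` to see what your system reports).

```shell
#!/bin/bash
# Hypothetical re-creation of the matching rule, applied to a `file` output line.
normalize_type() {
    local info
    info=$(printf '%s\n' "$1" | cut -d: -f2)   # drop the "/dev/stdin:" prefix
    echo "${info//[[:space:]]/}"               # delete every whitespace character
}

# Succeed if the normalized description contains the encoding string $2.
matches_encoding() {
    [ -n "$2" ] && [[ "$(normalize_type "$1")" == *"$2"* ]]
}
```

With these definitions, `matches_encoding '/dev/stdin: C source, UTF-8 Unicode text' UTF-8Unicodetext` succeeds, while the same call on `'/dev/stdin: UTF-8 Unicode (with BOM) text'` fails.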
convert-to-utf8-with-signature.sh

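This script converts the given files (File) from the source encoding (SourceEncoding, UTF-8 without BOM) to UTF-8 with Signature. The SourceEncoding argument is commented out in the script, so the source encoding is currently always UTF-8.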

#!/bin/bash
# Convert each file given on the command line from $Src to UTF-8 with
# signature (BOM), in place. Files that already have a BOM are skipped.

# Print the `file` description of $1, e.g. "UTF-8 Unicode (with BOM) text".
get_type () {
    file - < "$1" | cut -d: -f2 | sed 's,^ *,,; s, *$,,'
}

if [ $# -lt 1 ]
then
    echo "Usage: $0 File..." >&2
    exit 1
fi

Src="UTF-8"
# if [ $# -ge 2 ]
# then
#     Src=$2
# fi

for File in "$@"
do
    if [ ! -f "$File" ]
    then
        echo "Not a file: $File" >&2
        continue
    fi
    if get_type "$File" | grep -q '(with BOM)'
    then
        echo "skipping $File: already has a BOM" >&2
        continue
    fi
    echo "converting $File from $Src"
    # Replace the original only if uconv succeeded; otherwise a failed
    # conversion would overwrite it with an empty or partial file.
    if uconv -f "$Src" -t UTF-8 --add-signature -o "$File.new" "$File"
    then
        mv "$File.new" "$File"
    else
        echo "Error converting $File" >&2
        rm -f "$File.new"
    fi
done
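If ICU's `uconv` is not available, a BOM can also be added with plain shell tools, because converting UTF-8 without signature to UTF-8 with signature only prepends the bytes `EF BB BF`. This is a swapped-in technique, not what the script above does: a minimal sketch that assumes the input is already valid UTF-8 (it performs no transcoding, unlike `uconv -f`). `add_bom` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: prepend a UTF-8 BOM to file $1 unless it already has one.
add_bom() {
    local f=$1 tmp
    [ -f "$f" ] || return 1
    # Skip files whose first three bytes are already EF BB BF.
    if [ "$(od -An -tx1 -N3 "$f" | tr -d ' \n')" = "efbbbf" ]; then
        return 0
    fi
    tmp=$(mktemp "${TMPDIR:-/tmp}/addbom.XXXXXX") || return 1
    # Build BOM + original content in a temporary file first.
    if { printf '\357\273\277' && cat "$f"; } > "$tmp"; then
        # Copy back over the original so it keeps its permissions.
        cat "$tmp" > "$f" && rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Calling `add_bom` twice on the same file is safe: the second call sees the BOM and leaves the file unchanged.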

add-bom-for-files-in-folder.sh

This script combines the two scripts above into a single, simpler command.

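#!/bin/bash

if [ $# -lt 1 ]
then
    echo "Usage: $0 path-of-files-to-convert" >&2
    exit 1
fi

# Turn newlines into NUL bytes so that file names containing spaces survive xargs.
find-file-with-encoding.sh -e=UTF-8Unicodetext -t="$1" | tr '\n' '\0' | xargs -0 convert-to-utf8-with-signature.sh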
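Piping file names into a plain `xargs` is fragile: by default `xargs` splits its input on blanks, so a path such as `Classes/Main Scene.cpp` arrives as two arguments. The sketch below demonstrates the difference with hypothetical file names; translating newlines to NUL bytes and using `xargs -0` (available in both GNU and BSD `xargs`) keeps each line as one argument. File names that themselves contain newlines would still break.

```shell
#!/bin/bash
# Hypothetical file list, one name per line, as find-file-with-encoding.sh prints it.
list_files() {
    printf '%s\n' "Classes/Main Scene.cpp" "Classes/AppDelegate.cpp"
}

# Default splitting: "Classes/Main" and "Scene.cpp" become separate arguments.
show_default() { list_files | xargs printf '[%s]\n'; }

# NUL-separated: each input line stays a single argument.
show_nul() { list_files | tr '\n' '\0' | xargs -0 printf '[%s]\n'; }
```

`show_default` prints three bracketed arguments (`[Classes/Main]`, `[Scene.cpp]`, `[Classes/AppDelegate.cpp]`), while `show_nul` prints two.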

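Additional notes

  1. Syncing a Cocos2D project

    After a new file is added to the project, it may also need to be listed in the Visual Studio project file (the .vcxproj file). If it is not, the build usually fails at link time with errors like: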

    Error	LNK1120	2 unresolved externals	land	E:\develop\proj\land\proj.win32\Release.win32\client.exe	1	
    Error	LNK2001	unresolved external symbol "public: static void __cdecl DialogNewbieGuide::Dialog(class cocos2d::Node *)" (?Dialog@DialogNewbieGuide@@SAXPAVNode@cocos2d@@@Z)	client	E:\develop\proj\land\proj.win32\DialogClubMain.obj	1	
    

    In this case, find the files that contain these symbols (DialogNewbieGuide in this example) and add them to the project.
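To find sources that exist on disk but are missing from the Visual Studio project, you can compare the `.cpp` files with the entries in the `.vcxproj` file, where sources appear as `<ClCompile Include="..." />` items. This is a sketch under assumptions: a typical Cocos2D layout with sources in `Classes/` and the project under `proj.win32/`, a hypothetical helper named `missing_from_vcxproj`, and matching by file name only, so two files with the same name in different folders are not told apart.

```shell
#!/bin/bash
# Hypothetical helper: print .cpp files under $1 whose name is not referenced in
# the .vcxproj file $2 (entries look like Include="..\Classes\Foo.cpp").
missing_from_vcxproj() {
    local src_dir=$1 vcxproj=$2 f name
    while IFS= read -r -d '' f; do
        name=${f##*/}
        # Require a path separator or quote before the name and a quote after it,
        # so that Foo.cpp does not match MyFoo.cpp.
        if ! grep -qF -e "\\${name}\"" -e "/${name}\"" -e "\"${name}\"" "$vcxproj"; then
            echo "$f"
        fi
    done < <(find "$src_dir" -type f -name '*.cpp' -print0)
}
```

For example, `missing_from_vcxproj Classes proj.win32/YourProject.vcxproj` (the project file name is a placeholder) lists the candidates to add in Visual Studio.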

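  2. Script files

    When copying the scripts from this page into files, use Unix line endings (LF). If you save them with Windows line endings (CRLF), running a script may fail with errors like: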

    /mnt/d/portable/_bin/add-bom-for-files-in-folder.sh: line 2: $'\r': command not found
    /mnt/d/portable/_bin/add-bom-for-files-in-folder.sh: line 10: syntax error: unexpected end of file
    
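If a script has already been saved with CRLF line endings, stripping the carriage returns fixes it. This is a minimal sketch using only `tr`, which behaves the same with GNU and BSD tools (unlike `sed -i`, whose argument syntax differs between them); `to_lf` is a hypothetical helper name. It removes every carriage return, not only those at line ends, which is fine for these scripts. If `dos2unix` is installed, it does the same job.

```shell
#!/bin/bash
# Hypothetical helper: convert file $1 from CRLF to LF line endings in place.
to_lf() {
    local f=$1 tmp status=0
    tmp=$(mktemp "${TMPDIR:-/tmp}/to_lf.XXXXXX") || return 1
    # Delete every carriage return, then copy back so the file keeps its permissions.
    { tr -d '\r' < "$f" > "$tmp" && cat "$tmp" > "$f"; } || status=1
    rm -f "$tmp"
    return "$status"
}
```

For example, `to_lf add-bom-for-files-in-folder.sh` makes the script runnable again.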