Unicode Support // P4Convert: User Guide

Unicode Support

The Unicode options below apply to both Import Mode and Convert Mode. The charset option applies only to Import Mode, where files are translated through the Perforce client. In Convert Mode, archive files are always written in UTF-8 for a Unicode-enabled Perforce server.

Defaults (for non-Unicode servers):

com.p4convert.p4.unicode=false
com.p4convert.p4.translate=true
com.p4convert.p4.charset=<none>

In some situations it is preferable to import text files as-is (untranslated). This is typically the case in a non-Unicode environment where all users are on Windows clients. To disable translation, set the following option to false:

com.p4convert.p4.translate=false

When translation is disabled, high-ASCII text uses the content type TEXT-RAW and the following warning is suppressed:

... Non-unicode server, downgrading file to text
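
As an illustration of what "high-ASCII" text means in this context, the following Python sketch checks a byte string for bytes above 0x7F. The helper name `has_high_ascii` and the sample data are hypothetical and are not part of P4Convert; the sketch only shows why such content cannot be read as UTF-8 without translation.

```python
def has_high_ascii(data: bytes) -> bool:
    """Return True if any byte falls outside the 7-bit ASCII range."""
    return any(b > 0x7F for b in data)


plain = "cafe".encode("ascii")
# In windows-1252 the character 'é' is stored as the single byte 0xE9.
accented = "caf\u00e9".encode("windows-1252")

print(has_high_ascii(plain))
print(has_high_ascii(accented))

# The same bytes are not valid UTF-8, so a UTF-8 decode fails.
try:
    accented.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)
```

Files of this kind are the ones affected by the translate option: with translation off they are imported byte for byte.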

Recommended configuration for a Unicode conversion:

com.p4convert.p4.unicode=true
com.p4convert.p4.translate=true
com.p4convert.p4.charset=utf8

For Unicode conversions, set the following JVM argument:

-Dfile.encoding=UTF-8

Some Solaris and Linux conversions may need the locale set:

export LC_ALL=en_GB.UTF-8

Once a Perforce server is switched to Unicode-enabled mode (-xi), all client workspaces need to define a character set. For details see:

http://answers.perforce.com/articles/KB_Article/Internationalization-and-Localization

Note

A non-Unicode enabled Perforce Server will accept UTF16 files.

Normalisation

Platform Unicode normalisation is detected when the configuration file is generated; however, it can be overridden by setting the following configuration option to NFC or NFD:

com.p4convert.p4.normalisation=NFD

The default detection is based on the following:

Platform	Normalisation
Windows	`NFC`
Mac	`NFD`
nix/nux	`NFC`
Sun	`NFC`
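
The difference between the two forms can be seen with a short Python sketch using the standard `unicodedata` module. The sample string is an arbitrary choice for illustration; the sketch does not reproduce how P4Convert detects or applies normalisation.

```python
import unicodedata

# 'é' written as one precomposed code point (U+00E9).
name = "caf\u00e9"

nfc = unicodedata.normalize("NFC", name)  # composed form
nfd = unicodedata.normalize("NFD", name)  # decomposed: 'e' + U+0301

print(len(nfc), len(nfd))  # 4 5

# The two forms render identically but differ as code points and as bytes.
print(nfc.encode("utf-8"))
print(nfd.encode("utf-8"))

# Normalising back to NFC recovers the composed form.
roundtrip = unicodedata.normalize("NFC", nfd)
print(roundtrip == nfc)
```

Because the two forms differ byte for byte, a path stored in NFD on one platform does not compare equal to the same path in NFC on another unless both are normalised first.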

Subversion Properties

By default, the converter parses Subversion properties as UTF-8 strings. The conversion uses properties such as svn:log and svn:author for attributes in Perforce and must decode their byte sequences as UTF-8. In some data sets, Windows users may have added high-ASCII characters from one or more code pages. To handle such cases, this release supports the following configuration option:

com.p4convert.svn.propTextType=UNKNOWN

The option accepts the following character set names:

Big5	IBM424_rtl	ISO-8859-7	UTF-16LE
BINARY	ISO-2022-CN	ISO-8859-8	UTF-32BE
EUC-JP	ISO-2022-JP	ISO-8859-9	UTF-32LE
EUC-KR	ISO-2022-KR	KOI8-R	UTF-8
GB18030	ISO-8859-1	Shift_JIS	windows-1251
IBM420_ltr	ISO-8859-2	UNKNOWN	windows-1252
IBM420_rtl	ISO-8859-5	US-ASCII	windows-1254
IBM424_ltr	ISO-8859-6	UTF-16BE	windows-1256

The first scan always uses UTF-8, followed by the character set named in the configuration option. BINARY implies a skip: the property is not decoded and the string <binary property> is inserted instead.
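
The scan order described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the converter's implementation: the function name `decode_property` is hypothetical, the handling of UNKNOWN is not described in this guide (the sketch simply substitutes replacement characters), and Python's codec lookup recognises only some of the character set names listed above.

```python
def decode_property(raw: bytes, prop_text_type: str = "UNKNOWN") -> str:
    """Decode a property value: UTF-8 first, then the configured type."""
    # First scan: always try strict UTF-8.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # BINARY: skip decoding and insert the placeholder string.
    if prop_text_type == "BINARY":
        return "<binary property>"

    # UNKNOWN: ASSUMPTION for this sketch only. The guide does not say
    # what happens here, so undecodable bytes become U+FFFD.
    if prop_text_type == "UNKNOWN":
        return raw.decode("utf-8", errors="replace")

    # Otherwise decode with the configured character set.
    return raw.decode(prop_text_type)


# A log message saved from a Windows client in code page 1252.
legacy = b"caf\xe9"
print(decode_property(legacy, "windows-1252"))

# Valid UTF-8 is accepted on the first scan regardless of the option.
print(decode_property("caf\u00e9".encode("utf-8"), "windows-1252"))

# Bytes that are not valid UTF-8, with the BINARY setting.
print(decode_property(b"\xff\xfe\x00", "BINARY"))
```

Trying UTF-8 first means that properties already stored correctly are left untouched, and the configured character set only affects values that fail the strict UTF-8 decode.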

P4Convert: User Guide (April 2015)
