P4Convert tool fails to adhere to the types.map table and continues to migrate some *.txt files as binary and some *.dat files as text (despite specific entries for each extension in the types.map table)
Reference Case # 00147162
Text files containing high-ASCII characters that fall outside the known character sets will get 'downgraded' to binary or to other types (utf16/unicode).
If you are working in a non-Unicode environment and using Windows clients, you may want to disable the translation mode (com.p4convert.p4.translate=false). Please refer to Docs:Configuration:Unicode Support for details.
OK, so if the text files contain high-ASCII characters, I understand the migration tool will direct Perforce to store them as binary. But my question is: when the types.map table forces the 'data.txt' file to be stored as "text", why does the migration tool override this?
Perforce stores files internally as UTF-8 with Unix line endings. In the converter I have to be able to determine the character set and re-encode the file as UTF-8. If I do not, then either p4java will get an encoding error when submitting (Import mode) or the user will get an encoding error during a sync (Convert mode).
If the encoding is unknown, then the file must be stored as binary (ignoring the types.map file) to avoid corruption.
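To illustrate the fallback described above, here is a minimal sketch and not the converter's actual code. The class `EncodingSketch`, the method names, and the candidate charset list are all hypothetical. It assumes detection works by attempting a strict decode in each candidate charset and honouring the type-map entry only when one of them succeeds. ISO-8859-1 is deliberately left out of the candidates because it accepts every byte sequence and would never trigger the fallback.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class EncodingSketch {
    // Hypothetical candidate list; the real tool's detection logic is not shown in this thread.
    static final List<Charset> CANDIDATES =
            List.of(StandardCharsets.US_ASCII, StandardCharsets.UTF_8);

    // Decode strictly: return the text, or null if any byte is invalid for this charset.
    static String decodeStrict(byte[] data, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Keep the type requested by the type map only if the content can be decoded
    // (and therefore re-encoded as UTF-8); otherwise fall back to "binary".
    static String chooseType(byte[] data, String typeMapType) {
        for (Charset cs : CANDIDATES) {
            if (decodeStrict(data, cs) != null) {
                return typeMapType;
            }
        }
        return "binary";
    }

    public static void main(String[] args) {
        byte[] plain = "hello\n".getBytes(StandardCharsets.UTF_8);
        byte[] unknown = {'d', 'a', 't', 'a', (byte) 0xFF, (byte) 0xFE, (byte) 0x80};
        System.out.println(chooseType(plain, "text"));
        System.out.println(chooseType(unknown, "text"));
    }
}
```

In this sketch a 'data.txt' mapped to "text" still comes out as binary when none of the candidate charsets can decode its bytes, which mirrors the override asked about above.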
If you are able to send me an example file (email) or drag-n-drop it into the chat, I would be happy to look at the encoding in detail. Please note the workshop is public, so don't send me anything confidential.
Is this CVS or SVN? If you have the converter's output from when the file was added, it may explain what is happening.
Change 12325 should help improve binary detection of CVS when the 'expand' option is set.
Could you please include the version of the tool you are using in the job report (java -jar p4convert.jar --version)?
[jsiddaga@busgn2304 workshop.main.11331]$ java -jar p4convert.jar --version
workshop.main.11331
No, we are not using Windows clients, but we are a non-Unicode workshop.