UTF-8 is the best way to deal with text, and I hope everyone can agree with that. Anyway, while Perl handles any character beautifully within variables, things get messy when you want to save and load these characters into a file, for example. God forbid it’s JSON, then you’re in for a wild ride. But there’s a simple way that just works.
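For starters, and to be on the safe side, use utf8! The pragma, I mean. It tells Perl that your source code itself is written in UTF-8, so non-ASCII string literals are read as characters rather than bytes: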
use utf8;
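A minimal sketch of what the pragma actually changes, assuming the script itself is saved as UTF-8. The same literal is counted in bytes without use utf8 and in characters with it. The pragma is lexically scoped, which is what lets no utf8 switch it off inside a single block here. The sample string 'žal' is just an illustration.

```perl
use strict;
use warnings;

my $without;
{
    no utf8;                     # literals here are read as raw bytes
    $without = length('žal');    # ž takes two bytes in UTF-8
}

use utf8;                        # from this point on, literals are characters
my $with = length('žal');

print "without: $without, with: $with\n";   # without: 4, with: 3
```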
Next, when saving text (but not binary content), open the file handle in utf8 mode. Here $file is the full path to the saved file and $data holds the file's contents:
if (open my $fh, '>:utf8', $file) {
    print $fh $data;
    close $fh;
} else {
    warn "Cannot write $file: $!";
}
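To check that the :utf8 layer really writes UTF-8 to disk, here is a small sketch that saves a short string and reads it back through a :raw handle, so no decoding happens. The temporary file from the core File::Temp module is a hypothetical stand-in for your own $file.

```perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempfile);

# Hypothetical setup: a temporary file stands in for $file
my (undef, $file) = tempfile(UNLINK => 1);
my $data = "žal\n";                     # 4 characters

if (open my $fh, '>:utf8', $file) {
    print $fh $data;
    close $fh;
}

# Read the file back with no decoding layer to count raw bytes
open my $raw, '<:raw', $file or die "Cannot read $file: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

print length($data), " characters, ", length($bytes), " bytes\n";   # 4 characters, 5 bytes
```

The extra byte is the second half of ž, which UTF-8 stores as two bytes.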
Finally, when loading text (again, not binary content unless you want a flood of warnings!), open the file handle in utf8 encoding mode:
my $data = '';
if (open my $fh, '<:encoding(utf8)', $file) {
    while (my $line = <$fh>) {
        $data .= $line;
    }
    close $fh;
}
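If you want the whole file in one go, a common alternative to the line-by-line loop is slurp mode: with $/ undefined, a single read returns the entire contents. The sketch below writes a sample file first (the file and its contents are hypothetical) and then checks that the text survives the round trip as characters.

```perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempfile);

# Hypothetical setup: write a small UTF-8 file to read back
my (undef, $file) = tempfile(UNLINK => 1);
my $original = "žal\nščir\n";
open my $out, '>:utf8', $file or die "Cannot write $file: $!";
print $out $original;
close $out;

# Slurp: with $/ undefined, one readline returns the whole file
my $data = '';
if (open my $fh, '<:encoding(utf8)', $file) {
    local $/;
    $data = <$fh>;
    close $fh;
}

print $data eq $original ? "round trip ok\n" : "mismatch\n";
print length($data), " characters\n";   # 9 characters
```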
PS. An excellent Perl file module is File::Util. Its latest version also has a nice switch to read and write utf8 files, or to use other binmodes for that matter.
JSON burn
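Now, JSON. I'm assuming you are using the most popular module, loaded with use JSON. It does some utf8 operations internally, and from my tests the four encoding/decoding functions (encode_json, decode_json, to_json and from_json) don't all handle it the same way. According to the JSON documentation, encode_json and decode_json work with UTF-8 encoded bytes, while to_json and from_json work with Perl character strings by default.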
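The difference comes down to the utf8 option of the encoder. The JSON documentation describes encode_json as JSON->new->utf8->encode and to_json as JSON->new->encode. This sketch uses JSON::PP, the pure-Perl backend bundled with Perl since 5.14, instead of JSON itself, so it runs without installing anything; the sample object is hypothetical.

```perl
use strict;
use warnings;
use utf8;
use JSON::PP;

my $object = { name => 'žal' };

# utf8 on (what encode_json does): the output is a string of UTF-8 bytes
my $as_bytes = JSON::PP->new->utf8(1)->encode($object);

# utf8 off (what to_json does): the output is a Perl character string
my $as_chars = JSON::PP->new->utf8(0)->encode($object);

print "utf8 on: ", length($as_bytes), ", utf8 off: ", length($as_chars), "\n";   # utf8 on: 15, utf8 off: 14
```

Both strings hold {"name":"žal"}; the byte version is one longer because ž is encoded as two bytes. Printing the byte version to a :utf8 handle would encode it a second time, which is exactly the mess you want to avoid.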
To make it work with what I described earlier, here's my suggestion. When creating $data to be saved into a file, use:
$data = to_json($object, {pretty => 1});
to_json has utf8 disabled by default ({utf8 => 0}), so it returns a character string and the :utf8 file handle takes care of the encoding. pretty is just useful, but you can live without it. However, the other way around, loading JSON from a file, is different. Once you have $data, you have to decode it like this:
$object = from_json($data, {utf8 => 0});
Disable utf8 explicitly. After all, the text was already decoded when you opened the file with <:encoding(utf8). Yes, using encoding to decode seems strange, and any of the Perl Monks will give you a perfectly smart explanation as to why it's like that. I am not one of them. I just test my code and find stuff that works. Enjoy!
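Putting the whole recipe together, here is a sketch of the save and load round trip. As before, it uses the core JSON::PP object interface with utf8 off as a stand-in for to_json and from_json, and the object and temporary file are hypothetical. The last lines show what happens if the decoder is told to decode UTF-8 again on text the file handle already decoded.

```perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempfile);
use JSON::PP;

# Hypothetical sample object and temporary file
my $object = { city => 'Šibenik', note => 'žal' };
my (undef, $file) = tempfile(UNLINK => 1);

# Save: utf8 off (like to_json), the :utf8 layer encodes on output
my $data = JSON::PP->new->utf8(0)->pretty->canonical->encode($object);
open my $out, '>:utf8', $file or die "Cannot write $file: $!";
print $out $data;
close $out;

# Load: the :encoding(utf8) layer decodes, so the JSON decoder must not
my $loaded = do {
    open my $in, '<:encoding(utf8)', $file or die "Cannot read $file: $!";
    local $/;
    <$in>;
};
my $copy = JSON::PP->new->utf8(0)->decode($loaded);
print $copy->{note} eq $object->{note} ? "round trip ok\n" : "mismatch\n";

# Asking the decoder to decode UTF-8 a second time makes it die
my $ok = eval { JSON::PP->new->utf8(1)->decode($loaded); 1 };
print $ok ? "decoded twice?\n" : "utf8 => 1 refused: text was already decoded\n";
```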