
How to use utf8 in Perl and not go crazy


UTF-8 is the best way to deal with text, and I hope everyone can agree with that. However, while Perl handles any character beautifully within variables, things get messy when you want to save those characters to a file and load them back, for example. God forbid it’s JSON: then you’re in for a wild ride. But there’s a simple way that just works.

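For starters, and to be on the safe side, use utf8! The pragma, I mean. It tells Perl that your source file itself is written in UTF-8, so non-ASCII string literals in your code become proper characters. It does not change how files or the terminal are read and written; that part comes next.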

use utf8;
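To see what the pragma changes, here is a small sketch: with use utf8 in effect, a non-ASCII string literal is stored as characters rather than as its raw UTF-8 bytes. The sample word is arbitrary, and the core Encode module is only used to count the bytes for comparison.

```perl
use strict;
use warnings;
use utf8;                    # this source file is UTF-8: decode literals into characters
use Encode qw(encode);

# With "use utf8" the literal below is 4 characters.
# Without the pragma Perl would see its 7 raw UTF-8 bytes instead.
my $word = 'żółw';           # sample word, Polish for "turtle"

my $chars = length $word;                    # counted in characters
my $bytes = length encode('UTF-8', $word);   # counted in UTF-8 bytes

print "characters: $chars, bytes: $bytes\n"; # characters: 4, bytes: 7
```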

Next, when saving text (but not binary content), open the file handle in utf8 mode. Here $file is the full path to the saved file and $data holds the file’s contents:

if (open my $fh, '>:utf8', $file) {
  print $fh $data;   # characters are encoded to UTF-8 bytes on the way out
  close $fh;
}
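Here is a sketch of what the :utf8 layer does on the way out, assuming a throwaway temporary file from the core File::Temp module in place of your real $file. The characters in $data end up on disk as their UTF-8 byte sequences, which the sketch checks by reading the file back in :raw mode.

```perl
use strict;
use warnings;
use utf8;                        # the literal below contains non-ASCII characters
use Encode qw(encode);
use File::Temp qw(tempfile);

my $data = 'Zażółć gęślą jaźń';   # sample text with Polish diacritics
my (undef, $file) = tempfile();   # temporary file standing in for your $file

# Write characters through the :utf8 layer, as in the snippet above.
open my $out, '>:utf8', $file or die "Cannot write $file: $!";
print $out $data;
close $out or die "Cannot close $file: $!";

# Read the file back as raw bytes to see what actually landed on disk.
open my $in, '<:raw', $file or die "Cannot read $file: $!";
my $on_disk = do { local $/; <$in> };
close $in;
unlink $file;

# The layer turned each character into its UTF-8 byte sequence.
my $matches = $on_disk eq encode('UTF-8', $data);
print "characters: ", length($data), ", bytes on disk: ", length($on_disk), "\n";
print "file holds UTF-8: ", ($matches ? 'yes' : 'no'), "\n";
```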

Finally, when loading text (again, not binary content, unless you want a flood of warnings!), open the file handle with the :encoding(utf8) layer:

my $data = '';
if (open my $fh, '<:encoding(utf8)', $file) {
  while (my $line = <$fh>) {
    $data .= $line;   # lines arrive already decoded into Perl characters
  }
  close $fh;
}
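The reading side can be sketched the same way. This assumes a hypothetical slurp helper (not part of any module) that reads a whole file through a given layer: with :encoding(utf8) you get characters back, and without a decoding layer you get raw bytes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Prepare a file holding the UTF-8 bytes of the word 'żółw' (4 characters, 7 bytes).
my ($tmp, $file) = tempfile();
binmode $tmp, ':raw';
print $tmp "\xC5\xBC\xC3\xB3\xC5\x82w";
close $tmp;

# Hypothetical helper: read the whole file through the given layer.
sub slurp {
    my ($path, $layer) = @_;
    open my $fh, "<$layer", $path or die "Cannot read $path: $!";
    local $/;                      # undefined record separator: read everything
    my $content = <$fh>;
    close $fh;
    return $content;
}

my $decoded = slurp($file, ':encoding(utf8)');  # layer decodes bytes into characters
my $raw     = slurp($file, ':raw');             # no decoding: raw bytes
unlink $file;

print "with layer: ", length($decoded), " characters\n";  # 4
print "without layer: ", length($raw), " bytes\n";        # 7
```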

PS. An excellent Perl file module is File::Util. Its latest version also has a nice switch to write and read utf8 files, or to use other binmodes for that matter.

JSON burn

Now, JSON. I’m assuming you’re using the most popular module, loaded with use JSON;. It does some utf8 handling internally, and its four encoding/decoding functions don’t all treat it the same way: to_json and from_json work on Perl character strings, while encode_json and decode_json work on UTF-8 encoded bytes.
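A sketch of the two behaviours, assuming JSON::PP (a core module since Perl 5.14) and its object interface, which the JSON module shares. Per the JSON documentation, utf8(0) corresponds to to_json/from_json and utf8(1) to encode_json/decode_json; the sample data is arbitrary.

```perl
use strict;
use warnings;
use utf8;                      # sample literal below is non-ASCII
use JSON::PP;

my $object = { name => 'żółw' };

# Character-string side, like to_json/from_json (utf8 disabled).
my $chars = JSON::PP->new->utf8(0)->encode($object);
# Byte side, like encode_json/decode_json (utf8 enabled).
my $bytes = JSON::PP->new->utf8(1)->encode($object);

print "utf8(0): ", length($chars), " characters\n";   # 15
print "utf8(1): ", length($bytes), " bytes\n";        # 18

# Each side decodes its own kind of string back to the same data.
my $from_chars = JSON::PP->new->utf8(0)->decode($chars);
my $from_bytes = JSON::PP->new->utf8(1)->decode($bytes);
print "same name: ", ($from_chars->{name} eq $from_bytes->{name} ? 'yes' : 'no'), "\n";
```

The rule of thumb: never mix the sides. A character string goes through a :utf8 or :encoding(utf8) file handle; a byte string goes through a :raw one.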

To make it work with what I described earlier, here’s my suggestion. When creating $data to be saved into a file, use:

$data = to_json($object, {pretty => 1});

to_json has {utf8 => 0} set by default, so it returns a character string, which is exactly what the utf8 file handle expects. pretty is just useful, but you can live without it. Loading JSON from a file works the other way around. Once you have read $data, decode it like this:

$object = from_json($data, {utf8 => 0});

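Disable utf8 explicitly. It is already off by default in from_json, but spelling it out makes the intent clear. After all, the text was already decoded when you opened the file with <:encoding(utf8), and with utf8 enabled the JSON decoder would treat those characters as raw bytes and try to decode them a second time. Yes, using an encoding layer to decode seems strange, and any of the Perl Monks will give you a perfectly smart explanation as to why it’s like that. I am not one of them. I just test my code and find stuff that works. Enjoy!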
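Putting it all together, here is a minimal round-trip sketch under the same assumptions: JSON::PP’s object interface with utf8(0) stands in for to_json and from_json, and a temporary file stands in for $file. It also shows what goes wrong if you leave utf8 enabled when decoding text the file layer has already decoded.

```perl
use strict;
use warnings;
use utf8;
use JSON::PP;
use File::Temp qw(tempfile);

# utf8(0): work with character strings, as to_json/from_json do.
my $codec  = JSON::PP->new->utf8(0)->pretty->canonical;
my $object = { city => 'Kraków', animal => 'żółw' };
my (undef, $file) = tempfile();

# Save: encode to a character string, let the :utf8 layer produce bytes.
open my $out, '>:utf8', $file or die "Cannot write $file: $!";
print $out $codec->encode($object);
close $out or die "Cannot close $file: $!";

# Load: the :encoding(utf8) layer decodes bytes back into characters...
open my $in, '<:encoding(utf8)', $file or die "Cannot read $file: $!";
my $data = do { local $/; <$in> };
close $in;
unlink $file;

# ...so the JSON decoder must not decode again (utf8 disabled).
my $loaded = $codec->decode($data);
print "round trip ok\n" if $loaded->{animal} eq $object->{animal};

# With utf8 enabled the decoder expects bytes, and 'ż' is a character
# above 255, so it dies instead of parsing.
my $ok = eval { JSON::PP->new->utf8(1)->decode($data); 1 };
print "utf8(1) on decoded text: ", ($ok ? 'worked' : 'died'), "\n";
```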

By Marek

I graduated from the Oxford University Computing Laboratory in 2008 and have since been a full-stack lead on many projects, in different technologies. Personally, I like to code in Perl, Solidity and JavaScript, run on Debian & Nginx, design with Adobe CC & Affinity and work remotely, but overall I always do whatever gets the job done. I like to learn new things all the time!

