    python: Rework bytes/unicode string handling · bd27203f
    Mathieu Bridon authored and Dylan Baker committed
    
    
    In both Python 2 and 3, opening a file without specifying the mode will
    open it for reading in text mode ('r').
    
    On Python 2, the read() method of a file object opened in mode 'r' will
    return byte strings, while on Python 3 it will return unicode strings.
    
    Explicitly specifying the binary mode ('rb') then decoding the byte
    string means we always handle unicode strings on both Python 2 and 3.
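
A minimal sketch of the read-then-decode pattern described above. The helper name `read_unicode` and the UTF-8 default are assumptions made for illustration; the commit message does not say which encoding the script uses.

```python
import os
import tempfile


def read_unicode(path, encoding="utf-8"):
    """Return the file's contents as a unicode string on Python 2 and 3.

    Hypothetical helper; the encoding default is an assumption.
    """
    # 'rb' makes read() return a byte string on both Python versions.
    with open(path, "rb") as f:
        data = f.read()
    # Decoding gives `unicode` on Python 2 and `str` on Python 3.
    return data.decode(encoding)


# Small demo: write known bytes to a temporary file, then read them back.
fd, demo_path = tempfile.mkstemp()
os.write(fd, u"caf\u00e9\n".encode("utf-8"))
os.close(fd)
demo_text = read_unicode(demo_path)
os.remove(demo_path)
```

Decoding once at the file boundary keeps the rest of the code working with a single string type.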
    
    Which in turn means every re.match(line) will return unicode strings
    as well.
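
To illustrate the point about regular expressions: matching against a decoded line yields unicode groups. The pattern and the sample line below are made up for the example and are not taken from the script.

```python
import re

# Hypothetical pattern and input, for illustration only.
pattern = re.compile(u"^(\\w+)=(.*)$")

# Decode the byte string first, as when reading a file opened in 'rb' mode.
line = b"name=caf\xc3\xa9".decode("utf-8")

match = pattern.match(line)
# Groups captured from a unicode string are unicode strings themselves.
key, value = match.groups()
```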
    
    If we also make expandCString return unicode strings, we don't need the
    call to the unicode() constructor any more.
    
    We were using the ugettext() method because it always returns unicode
    strings in Python 2, unlike gettext(), which returns byte strings.
    The ugettext() method doesn't exist on Python 3, so we must use the
    right method on each version of Python.
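
One way to pick the right method per version, as a sketch. The helper `get_translator` is a hypothetical name, and `NullTranslations` stands in for whatever translation object the script actually loads.

```python
import gettext
import sys


def get_translator(translations):
    """Return the bound method that yields unicode strings on this Python.

    Hypothetical helper, not part of the original script.
    """
    if sys.version_info < (3, 0):
        # Python 2: gettext() returns byte strings, ugettext() returns unicode.
        return translations.ugettext
    # Python 3: ugettext() is gone and gettext() already returns str.
    return translations.gettext


# NullTranslations returns the message unchanged, which is enough for a demo.
_ = get_translator(gettext.NullTranslations())
message = _(u"Hello")
```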
    
    The last hurdles are that Python 3 doesn't let us concatenate unicode
    and byte strings directly, and that Python 2's stdout wants encoded byte
    strings while Python 3's wants unicode strings.
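
Both hurdles can be handled roughly as follows. The names `join_text` and `emit` and the UTF-8 default are assumptions for illustration, not the script's actual code.

```python
import io
import sys


def join_text(prefix, raw, encoding="utf-8"):
    """Concatenate a unicode prefix with a byte string by decoding first."""
    # Python 3 raises TypeError on str + bytes, so decode before joining.
    return prefix + raw.decode(encoding)


def emit(text, stream=None, encoding="utf-8"):
    """Write a unicode string to a stream in the form each Python expects."""
    stream = sys.stdout if stream is None else stream
    if sys.version_info < (3, 0):
        # Python 2 streams want encoded byte strings.
        stream.write(text.encode(encoding))
    else:
        # Python 3 text streams want unicode strings.
        stream.write(text)


# Demo against an in-memory stream of the type matching the running Python.
buf = io.StringIO() if sys.version_info >= (3, 0) else io.BytesIO()
emit(join_text(u"name: ", b"caf\xc3\xa9\n"), buf)
```

Keeping the version check in one output helper means the rest of the script never branches on the Python version.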
    
    With these changes, the script gives the same output on both Python 2
    and 3.
    
    Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
    Reviewed-by: Dylan Baker <dylan@pnwbakers.com>