=pod

=encoding utf8

=head1 NAME

passphrase-encoding
- How different parts of OpenSSL treat pass phrase character encoding

=head1 DESCRIPTION

In a modern world with all sorts of character encodings, the treatment of pass
phrases has become increasingly complex.
This manual page attempts to give an overview of how this problem is
currently addressed in different parts of the OpenSSL library.

=head2 The general case

As a general rule, the OpenSSL library doesn't treat pass phrases in any
special way; it trusts the application or user to choose a suitable character
set and stick to it throughout the lifetime of the affected objects.
This means that an object that was encrypted using a pass phrase encoded in
ISO-8859-1 needs to be decrypted using a pass phrase encoded in
ISO-8859-1.
Using the wrong encoding is expected to cause a decryption failure.

=head2 PKCS#12

PKCS#12 is a bit different regarding pass phrase encoding.
The standard stipulates that the pass phrase shall be encoded as an ASN.1
BMPString, which consists of the code points of the basic multilingual plane,
encoded in big endian (UCS-2 BE).

OpenSSL tries to adapt to this requirement in one of the following manners:

=over 4

=item 1.

Treats the received pass phrase as UTF-8 encoded and tries to re-encode it to
UTF-16 (which is the same as UCS-2 for characters U+0000 to U+D7FF and U+E000
to U+FFFF, but expands to a surrogate pair for any other character), or failing
that, proceeds with step 2.

=item 2.

Assumes that the pass phrase is encoded in ASCII or ISO-8859-1 and
opportunistically prefixes each byte with a zero byte to obtain the UCS-2
encoding of the characters, which it stores as a BMPString.

Note that since there is no check of your locale, this may produce UCS-2 /
UTF-16 characters that do not correspond to the original pass phrase characters
for other character sets, such as any ISO-8859-X encoding other than
ISO-8859-1 (or, on Windows, CP 1252 with the exception of the extra "graphical"
characters in the 0x80-0x9F range).

=back
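
The two variants above can be sketched in Python as follows.
This is only an illustration of the byte-level behaviour described in this
section, not OpenSSL's actual code; the function name C<to_bmpstring> is
hypothetical, and Python's strict UTF-8 decoder stands in for OpenSSL's own
validity check.

```python
def to_bmpstring(passphrase: bytes) -> bytes:
    """Illustrative two-step conversion of pass phrase bytes to UTF-16 BE."""
    try:
        # Variant 1: treat the input as UTF-8 and re-encode it as UTF-16 BE.
        return passphrase.decode("utf-8").encode("utf-16-be")
    except UnicodeDecodeError:
        # Variant 2: assume ASCII / ISO-8859-1 and prefix every byte with 0x00.
        return b"".join(b"\x00" + bytes([b]) for b in passphrase)


# UTF-8 input: the two bytes 0xC3 0xAF become the single unit 0x00 0xEF.
print(to_bmpstring(b"na\xc3\xafve").hex(" "))
# ISO-8859-1 input: a lone 0xEF is invalid UTF-8, so variant 2 applies.
print(to_bmpstring(b"na\xefve").hex(" "))
```

Both calls print the same ten bytes, which is the intended effect of the
fallback: a pass phrase typed in either encoding yields the same BMPString.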

OpenSSL versions older than 1.1.0 use variant 2 only. Newer versions
still fall back to it so that they can read files produced by older versions.

It should be noted that this approach isn't entirely fault free.

A pass phrase encoded in ISO-8859-2 could very well have a sequence such as
0xC3 0xAF (which is the two characters "LATIN CAPITAL LETTER A WITH BREVE"
and "LATIN CAPITAL LETTER Z WITH DOT ABOVE" in ISO-8859-2 encoding), but would
be misinterpreted as the perfectly valid UTF-8 encoded code point U+00EF (LATIN
SMALL LETTER I WITH DIAERESIS) I<if the pass phrase doesn't contain anything
that would be invalid UTF-8>.
A pass phrase that contains this kind of byte sequence will give a different
outcome in OpenSSL 1.1.0 and newer than in OpenSSL older than 1.1.0.

 0x00 0xC3 0x00 0xAF                    # OpenSSL older than 1.1.0
 0x00 0xEF                              # OpenSSL 1.1.0 and newer

By the same token, anything encoded in UTF-8 that was given to OpenSSL older
than 1.1.0 was misinterpreted as ISO-8859-1 sequences.
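
The ambiguity of the 0xC3 0xAF sequence can be reproduced with Python's
standard codecs.
The sketch below does not call OpenSSL; it merely mimics the two behaviours
described above, and all variable names are chosen for this illustration.

```python
import unicodedata

raw = bytes([0xC3, 0xAF])  # the byte sequence discussed above

# What the user typed, if the environment was ISO-8859-2: two characters.
as_latin2 = [unicodedata.name(ch) for ch in raw.decode("iso-8859-2")]
# What a UTF-8 decoder sees in the same bytes: one character.
as_utf8 = [unicodedata.name(ch) for ch in raw.decode("utf-8")]

# Older than 1.1.0: every byte is zero-prefixed, which is equivalent to
# reading the bytes as ISO-8859-1 and writing them out as UTF-16 BE.
old_units = raw.decode("iso-8859-1").encode("utf-16-be")
# 1.1.0 and newer: the bytes are valid UTF-8, so they are re-encoded.
new_units = raw.decode("utf-8").encode("utf-16-be")

print(as_latin2)
print(as_utf8)
print(old_units.hex(" "))  # 00 c3 00 af
print(new_units.hex(" "))  # 00 ef
```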

=head2 OSSL_STORE

L<ossl_store(7)> acts as a general interface to access all kinds of objects,
potentially protected with a pass phrase, a PIN or something else.
This API stipulates that pass phrases should be UTF-8 encoded, and that any
other pass phrase encoding may give undefined results.
This API relies on the application to ensure UTF-8 encoding and doesn't check
that this is the case, so whatever it receives is passed on to the underlying
loader.
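
Since the check is left to the application, one option is to validate the pass
phrase before handing it to the API.
A minimal Python sketch, assuming the pass phrase is available as raw bytes;
C<is_valid_utf8> is a hypothetical helper and not part of OpenSSL:

```python
def is_valid_utf8(passphrase: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        passphrase.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"na\xc3\xafve"))  # True: well-formed UTF-8
print(is_valid_utf8(b"na\xefve"))      # False: lone 0xEF (ISO-8859-1 input)
```

A positive result only shows that the bytes are well-formed UTF-8; as the
ISO-8859-2 example in the PKCS#12 section shows, it cannot tell whether they
were I<meant> as UTF-8.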

=head1 RECOMMENDATIONS

This section assumes that you know what pass phrase was used for encryption,
but that it may have been encoded in a different character encoding than the
one used by your current input method.
For example, the pass phrase may have been used at a time when your default
encoding was ISO-8859-1 (so that "naïve" resulted in the byte sequence 0x6E 0x61
0xEF 0x76 0x65), and you're now in an environment where your default encoding
is UTF-8 (so that "naïve" results in the byte sequence 0x6E 0x61 0xC3 0xAF 0x76
0x65).
Whenever it's mentioned that you should use a certain character encoding, it
should be understood that you either change the input method to use the
mentioned encoding when you type in your pass phrase, or use some suitable tool
to convert your pass phrase from your default encoding to the target encoding.
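
As an illustration of such a conversion, the following Python sketch
reproduces the two byte sequences given above and converts one into the other.
It is just one possible tool for the job; the variable names are chosen for
this example.

```python
phrase = "na\u00efve"  # the example pass phrase, with U+00EF written as an escape

latin1_bytes = phrase.encode("iso-8859-1")
utf8_bytes = phrase.encode("utf-8")

print(latin1_bytes.hex(" "))  # 6e 61 ef 76 65
print(utf8_bytes.hex(" "))    # 6e 61 c3 af 76 65

# Converting a stored ISO-8859-1 byte sequence to UTF-8, and back again.
to_utf8 = latin1_bytes.decode("iso-8859-1").encode("utf-8")
to_latin1 = utf8_bytes.decode("utf-8").encode("iso-8859-1")
```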

Also note that the sub-sections below discuss human readable pass phrases.
This is particularly relevant for PKCS#12 objects, where human readable pass
phrases are assumed.
For other objects, it's just as legitimate to use any byte sequence (such as a
sequence of bytes from F</dev/urandom> that's been saved away), which makes any
character encoding discussion irrelevant; in such cases, simply use the same
byte sequence as it is.

=head2 Creating new objects

For creating new pass phrase protected objects, make sure the pass phrase is
encoded using UTF-8.
This is the default on most modern Unixes, but may involve an effort on other
platforms.
Specifically for Windows, setting the environment variable
C<OPENSSL_WIN32_UTF8> will have anything entered at the Windows console prompt
converted to UTF-8 (command line and separately prompted pass phrases alike).

=head2 Opening existing objects

For opening pass phrase protected objects where you know what character
encoding was used for the encryption pass phrase, make sure to use the same
encoding again.

For opening pass phrase protected objects where the character encoding that was
used is unknown, or where the producing application is unknown, try one of the
following:

=over 4

=item 1.

Try the pass phrase that you have as it is in the character encoding of your
environment.
It's possible that its byte sequence is exactly right.

=item 2.

Convert the pass phrase to UTF-8 and try with the result.
Specifically with PKCS#12, this should open up any object that was created
according to the specification.

=item 3.

Do a naïve (i.e. purely mathematical) ISO-8859-1 to UTF-8 conversion and try
with the result.
This differs from the previous attempt because ISO-8859-1 maps directly to
U+0000 to U+00FF, which other non-UTF-8 character sets do not.

This also takes care of the case when a UTF-8 encoded string was used with
OpenSSL older than 1.1.0.
For example, C<ï>, which is 0xC3 0xAF when encoded in UTF-8, would become 0xC3
0x83 0xC2 0xAF when re-encoded in the naïve manner.
The conversion to BMPString would then yield 0x00 0xC3 0x00 0xAF 0x00 0x00
(including the terminating 0x00 0x00), the erroneous/non-compliant encoding
used by OpenSSL older than 1.1.0.

=back
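
The three attempts above can be generated mechanically.
The following Python sketch assumes the pass phrase is known as text and that
the encoding of the current environment is given by name; the function
C<candidate_passphrases> is hypothetical and only produces the byte sequences
to try, leaving the actual decryption attempt to the caller.

```python
from typing import List


def candidate_passphrases(phrase: str, env_encoding: str) -> List[bytes]:
    """Return the byte sequences to try, in the order of the list above."""
    as_is = phrase.encode(env_encoding)                 # attempt 1
    utf8 = phrase.encode("utf-8")                       # attempt 2
    # Attempt 3: read the environment's bytes as ISO-8859-1, write UTF-8.
    naive = as_is.decode("iso-8859-1").encode("utf-8")
    ordered: List[bytes] = []
    for candidate in (as_is, utf8, naive):
        if candidate not in ordered:                    # skip duplicates
            ordered.append(candidate)
    return ordered


for candidate in candidate_passphrases("\u00ef", "utf-8"):
    print(candidate.hex(" "))
```

In a UTF-8 environment, attempts 1 and 2 coincide, so only two candidates
remain for the single character used here: 0xC3 0xAF and 0xC3 0x83 0xC2 0xAF.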

=head1 SEE ALSO

L<evp(7)>,
L<ossl_store(7)>,
L<EVP_BytesToKey(3)>, L<EVP_DecryptInit(3)>,
L<PEM_do_header(3)>,
L<PKCS12_parse(3)>, L<PKCS12_newpass(3)>,
L<d2i_PKCS8PrivateKey_bio(3)>

=head1 COPYRIGHT

Copyright 2018-2020 The OpenSSL Project Authors. All Rights Reserved.

Licensed under the OpenSSL license (the "License").  You may not use
this file except in compliance with the License.  You can obtain a copy
in the file LICENSE in the source distribution or at
L<https://www.openssl.org/source/license.html>.

=cut