|
Packit |
acf257 |
-----------------------------------------------------------------------------
|
|
Packit |
acf257 |
-- |
|
|
Packit |
acf257 |
-- Module : Codec.Binary.Base64
|
|
Packit |
acf257 |
-- Copyright : (c) Dominic Steinitz 2005, Warrick Gray 2002
|
|
Packit |
acf257 |
-- License : BSD-style (see the file ReadMe.tex)
|
|
Packit |
acf257 |
--
|
|
Packit |
acf257 |
-- Maintainer : dominic.steinitz@blueyonder.co.uk
|
|
Packit |
acf257 |
-- Stability : experimental
|
|
Packit |
acf257 |
-- Portability : portable
|
|
Packit |
acf257 |
--
|
|
Packit |
acf257 |
-- Base64 encoding and decoding functions provided by Warwick Gray.
|
|
Packit |
acf257 |
-- See <http://homepages.paradise.net.nz/warrickg/haskell/http/#base64>
|
|
Packit |
acf257 |
-- and <http://www.faqs.org/rfcs/rfc2045.html>.
|
|
Packit |
acf257 |
--
|
|
Packit |
acf257 |
-----------------------------------------------------------------------------
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
module Network.HTTP.Base64
|
|
Packit |
acf257 |
( encode
|
|
Packit |
acf257 |
, decode
|
|
Packit |
acf257 |
, chop72
|
|
Packit |
acf257 |
, Octet
|
|
Packit |
acf257 |
) where
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
{------------------------------------------------------------------------
|
|
Packit |
acf257 |
This is what RFC2045 had to say:
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
6.8. Base64 Content-Transfer-Encoding
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
The Base64 Content-Transfer-Encoding is designed to represent
|
|
Packit |
acf257 |
arbitrary sequences of octets in a form that need not be humanly
|
|
Packit |
acf257 |
readable. The encoding and decoding algorithms are simple, but the
|
|
Packit |
acf257 |
encoded data are consistently only about 33 percent larger than the
|
|
Packit |
acf257 |
unencoded data. This encoding is virtually identical to the one used
|
|
Packit |
acf257 |
in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
A 65-character subset of US-ASCII is used, enabling 6 bits to be
|
|
Packit |
acf257 |
represented per printable character. (The extra 65th character, "=",
|
|
Packit |
acf257 |
is used to signify a special processing function.)
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
NOTE: This subset has the important property that it is represented
|
|
Packit |
acf257 |
identically in all versions of ISO 646, including US-ASCII, and all
|
|
Packit |
acf257 |
characters in the subset are also represented identically in all
|
|
Packit |
acf257 |
versions of EBCDIC. Other popular encodings, such as the encoding
|
|
Packit |
acf257 |
used by the uuencode utility, Macintosh binhex 4.0 [RFC-1741], and
|
|
Packit |
acf257 |
the base85 encoding specified as part of Level 2 PostScript, do not
|
|
Packit |
acf257 |
share these properties, and thus do not fulfill the portability
|
|
Packit |
acf257 |
requirements a binary transport encoding for mail must meet.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
The encoding process represents 24-bit groups of input bits as output
|
|
Packit |
acf257 |
strings of 4 encoded characters. Proceeding from left to right, a
|
|
Packit |
acf257 |
24-bit input group is formed by concatenating 3 8bit input groups.
|
|
Packit |
acf257 |
These 24 bits are then treated as 4 concatenated 6-bit groups, each
|
|
Packit |
acf257 |
of which is translated into a single digit in the base64 alphabet.
|
|
Packit |
acf257 |
When encoding a bit stream via the base64 encoding, the bit stream
|
|
Packit |
acf257 |
must be presumed to be ordered with the most-significant-bit first.
|
|
Packit |
acf257 |
That is, the first bit in the stream will be the high-order bit in
|
|
Packit |
acf257 |
the first 8bit byte, and the eighth bit will be the low-order bit in
|
|
Packit |
acf257 |
the first 8bit byte, and so on.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
Each 6-bit group is used as an index into an array of 64 printable
|
|
Packit |
acf257 |
characters. The character referenced by the index is placed in the
|
|
Packit |
acf257 |
output string. These characters, identified in Table 1, below, are
|
|
Packit |
acf257 |
selected so as to be universally representable, and the set excludes
|
|
Packit |
acf257 |
characters with particular significance to SMTP (e.g., ".", CR, LF)
|
|
Packit |
acf257 |
and to the multipart boundary delimiters defined in RFC 2046 (e.g.,
|
|
Packit |
acf257 |
"-").
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
Table 1: The Base64 Alphabet
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
Value Encoding Value Encoding Value Encoding Value Encoding
|
|
Packit |
acf257 |
0 A 17 R 34 i 51 z
|
|
Packit |
acf257 |
1 B 18 S 35 j 52 0
|
|
Packit |
acf257 |
2 C 19 T 36 k 53 1
|
|
Packit |
acf257 |
3 D 20 U 37 l 54 2
|
|
Packit |
acf257 |
4 E 21 V 38 m 55 3
|
|
Packit |
acf257 |
5 F 22 W 39 n 56 4
|
|
Packit |
acf257 |
6 G 23 X 40 o 57 5
|
|
Packit |
acf257 |
7 H 24 Y 41 p 58 6
|
|
Packit |
acf257 |
8 I 25 Z 42 q 59 7
|
|
Packit |
acf257 |
9 J 26 a 43 r 60 8
|
|
Packit |
acf257 |
10 K 27 b 44 s 61 9
|
|
Packit |
acf257 |
11 L 28 c 45 t 62 +
|
|
Packit |
acf257 |
12 M 29 d 46 u 63 /
|
|
Packit |
acf257 |
13 N 30 e 47 v
|
|
Packit |
acf257 |
14 O 31 f 48 w (pad) =
|
|
Packit |
acf257 |
15 P 32 g 49 x
|
|
Packit |
acf257 |
16 Q 33 h 50 y
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
The encoded output stream must be represented in lines of no more
|
|
Packit |
acf257 |
than 76 characters each. All line breaks or other characters not
|
|
Packit |
acf257 |
found in Table 1 must be ignored by decoding software. In base64
|
|
Packit |
acf257 |
data, characters other than those in Table 1, line breaks, and other
|
|
Packit |
acf257 |
white space probably indicate a transmission error, about which a
|
|
Packit |
acf257 |
warning message or even a message rejection might be appropriate
|
|
Packit |
acf257 |
under some circumstances.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
Special processing is performed if fewer than 24 bits are available
|
|
Packit |
acf257 |
at the end of the data being encoded. A full encoding quantum is
|
|
Packit |
acf257 |
always completed at the end of a body. When fewer than 24 input bits
|
|
Packit |
acf257 |
are available in an input group, zero bits are added (on the right)
|
|
Packit |
acf257 |
to form an integral number of 6-bit groups. Padding at the end of
|
|
Packit |
acf257 |
the data is performed using the "=" character. Since all base64
|
|
Packit |
acf257 |
input is an integral number of octets, only the following cases can
|
|
Packit |
acf257 |
arise: (1) the final quantum of encoding input is an integral
|
|
Packit |
acf257 |
multiple of 24 bits; here, the final unit of encoded output will be
|
|
Packit |
acf257 |
an integral multiple of 4 characters with no "=" padding, (2) the
|
|
Packit |
acf257 |
final quantum of encoding input is exactly 8 bits; here, the final
|
|
Packit |
acf257 |
unit of encoded output will be two characters followed by two "="
|
|
Packit |
acf257 |
padding characters, or (3) the final quantum of encoding input is
|
|
Packit |
acf257 |
exactly 16 bits; here, the final unit of encoded output will be three
|
|
Packit |
acf257 |
characters followed by one "=" padding character.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
Because it is used only for padding at the end of the data, the
|
|
Packit |
acf257 |
occurrence of any "=" characters may be taken as evidence that the
|
|
Packit |
acf257 |
end of the data has been reached (without truncation in transit). No
|
|
Packit |
acf257 |
such assurance is possible, however, when the number of octets
|
|
Packit |
acf257 |
transmitted was a multiple of three and no "=" characters are
|
|
Packit |
acf257 |
present.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
Any characters outside of the base64 alphabet are to be ignored in
|
|
Packit |
acf257 |
base64-encoded data.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
Care must be taken to use the proper octets for line breaks if base64
|
|
Packit |
acf257 |
encoding is applied directly to text material that has not been
|
|
Packit |
acf257 |
converted to canonical form. In particular, text line breaks must be
|
|
Packit |
acf257 |
converted into CRLF sequences prior to base64 encoding. The
|
|
Packit |
acf257 |
important thing to note is that this may be done directly by the
|
|
Packit |
acf257 |
encoder rather than in a prior canonicalization step in some
|
|
Packit |
acf257 |
implementations.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
NOTE: There is no need to worry about quoting potential boundary
|
|
Packit |
acf257 |
delimiters within base64-encoded bodies within multipart entities
|
|
Packit |
acf257 |
because no hyphen characters are used in the base64 encoding.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
----------------------------------------------------------------------------}
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
{-
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
The following properties should hold:
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
decode . encode = id
|
|
Packit |
acf257 |
decode . chop72 . encode = id
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
I.E. Both "encode" and "chop72 . encode" are valid methods of encoding input,
|
|
Packit |
acf257 |
the second variation corresponds better with the RFC above, but outside of
|
|
Packit |
acf257 |
MIME applications might be undesireable.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
But: The Haskell98 Char type is at least 16bits (and often 32), these implementations assume only
|
|
Packit |
acf257 |
8 significant bits, which is more than enough for US-ASCII.
|
|
Packit |
acf257 |
-}
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
import Data.Array (Array, array, (!))
|
|
Packit |
acf257 |
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
|
|
Packit |
acf257 |
import Data.Char (chr, ord)
|
|
Packit |
acf257 |
import Data.Word (Word8)
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
type Octet = Word8
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
encodeArray :: Array Int Char
|
|
Packit |
acf257 |
encodeArray = array (0,64)
|
|
Packit |
acf257 |
[ (0,'A'), (1,'B'), (2,'C'), (3,'D'), (4,'E'), (5,'F')
|
|
Packit |
acf257 |
, (6,'G'), (7,'H'), (8,'I'), (9,'J'), (10,'K'), (11,'L')
|
|
Packit |
acf257 |
, (12,'M'), (13,'N'), (14,'O'), (15,'P'), (16,'Q'), (17,'R')
|
|
Packit |
acf257 |
, (18,'S'), (19,'T'), (20,'U'), (21,'V'), (22,'W'), (23,'X')
|
|
Packit |
acf257 |
, (24,'Y'), (25,'Z'), (26,'a'), (27,'b'), (28,'c'), (29,'d')
|
|
Packit |
acf257 |
, (30,'e'), (31,'f'), (32,'g'), (33,'h'), (34,'i'), (35,'j')
|
|
Packit |
acf257 |
, (36,'k'), (37,'l'), (38,'m'), (39,'n'), (40,'o'), (41,'p')
|
|
Packit |
acf257 |
, (42,'q'), (43,'r'), (44,'s'), (45,'t'), (46,'u'), (47,'v')
|
|
Packit |
acf257 |
, (48,'w'), (49,'x'), (50,'y'), (51,'z'), (52,'0'), (53,'1')
|
|
Packit |
acf257 |
, (54,'2'), (55,'3'), (56,'4'), (57,'5'), (58,'6'), (59,'7')
|
|
Packit |
acf257 |
, (60,'8'), (61,'9'), (62,'+'), (63,'/') ]
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
-- Convert between 4 base64 (6bits ea) integers and 1 ordinary integer (32 bits)
|
|
Packit |
acf257 |
-- clearly the upmost/leftmost 8 bits of the answer are 0.
|
|
Packit |
acf257 |
-- Hack Alert: In the last entry of the answer, the upper 8 bits encode
|
|
Packit |
acf257 |
-- the integer number of 6bit groups encoded in that integer, ie 1, 2, 3.
|
|
Packit |
acf257 |
-- 0 represents a 4 :(
|
|
Packit |
acf257 |
int4_char3 :: [Int] -> [Char]
|
|
Packit |
acf257 |
int4_char3 (a:b:c:d:t) =
|
|
Packit |
acf257 |
let n = (a `shiftL` 18 .|. b `shiftL` 12 .|. c `shiftL` 6 .|. d)
|
|
Packit |
acf257 |
in (chr (n `shiftR` 16 .&. 0xff))
|
|
Packit |
acf257 |
: (chr (n `shiftR` 8 .&. 0xff))
|
|
Packit |
acf257 |
: (chr (n .&. 0xff)) : int4_char3 t
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
int4_char3 [a,b,c] =
|
|
Packit |
acf257 |
let n = (a `shiftL` 18 .|. b `shiftL` 12 .|. c `shiftL` 6)
|
|
Packit |
acf257 |
in [ (chr (n `shiftR` 16 .&. 0xff))
|
|
Packit |
acf257 |
, (chr (n `shiftR` 8 .&. 0xff)) ]
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
int4_char3 [a,b] =
|
|
Packit |
acf257 |
let n = (a `shiftL` 18 .|. b `shiftL` 12)
|
|
Packit |
acf257 |
in [ (chr (n `shiftR` 16 .&. 0xff)) ]
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
int4_char3 [_] = error "Network.HTTP.Base64.int4_char3: impossible number of Ints."
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
int4_char3 [] = []
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
-- Convert triplets of characters to
|
|
Packit |
acf257 |
-- 4 base64 integers. The last entries
|
|
Packit |
acf257 |
-- in the list may not produce 4 integers,
|
|
Packit |
acf257 |
-- a trailing 2 character group gives 3 integers,
|
|
Packit |
acf257 |
-- while a trailing single character gives 2 integers.
|
|
Packit |
acf257 |
char3_int4 :: [Char] -> [Int]
|
|
Packit |
acf257 |
char3_int4 (a:b:c:t)
|
|
Packit |
acf257 |
= let n = (ord a `shiftL` 16 .|. ord b `shiftL` 8 .|. ord c)
|
|
Packit |
acf257 |
in (n `shiftR` 18 .&. 0x3f) : (n `shiftR` 12 .&. 0x3f) : (n `shiftR` 6 .&. 0x3f) : (n .&. 0x3f) : char3_int4 t
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
char3_int4 [a,b]
|
|
Packit |
acf257 |
= let n = (ord a `shiftL` 16 .|. ord b `shiftL` 8)
|
|
Packit |
acf257 |
in [ (n `shiftR` 18 .&. 0x3f)
|
|
Packit |
acf257 |
, (n `shiftR` 12 .&. 0x3f)
|
|
Packit |
acf257 |
, (n `shiftR` 6 .&. 0x3f) ]
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
char3_int4 [a]
|
|
Packit |
acf257 |
= let n = (ord a `shiftL` 16)
|
|
Packit |
acf257 |
in [(n `shiftR` 18 .&. 0x3f),(n `shiftR` 12 .&. 0x3f)]
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
char3_int4 [] = []
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
-- Retrieve base64 char, given an array index integer in the range [0..63]
|
|
Packit |
acf257 |
enc1 :: Int -> Char
|
|
Packit |
acf257 |
enc1 ch = encodeArray!ch
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
-- | Cut up a string into 72 char lines, each line terminated by CRLF.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
chop72 :: String -> String
|
|
Packit |
acf257 |
chop72 str = let (bgn,end) = splitAt 70 str
|
|
Packit |
acf257 |
in if null end then bgn else "\r\n" ++ chop72 end
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
-- Pads a base64 code to a multiple of 4 characters, using the special
|
|
Packit |
acf257 |
-- '=' character.
|
|
Packit |
acf257 |
quadruplets :: [Char] -> [Char]
|
|
Packit |
acf257 |
quadruplets (a:b:c:d:t) = a:b:c:d:quadruplets t
|
|
Packit |
acf257 |
quadruplets [a,b,c] = [a,b,c,'='] -- 16bit tail unit
|
|
Packit |
acf257 |
quadruplets [a,b] = [a,b,'=','='] -- 8bit tail unit
|
|
Packit |
acf257 |
quadruplets [_] = error "Network.HTTP.Base64.quadruplets: impossible number of characters."
|
|
Packit |
acf257 |
quadruplets [] = [] -- 24bit tail unit
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
enc :: [Int] -> [Char]
|
|
Packit |
acf257 |
enc = quadruplets . map enc1
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
dcd :: String -> [Int]
|
|
Packit |
acf257 |
dcd [] = []
|
|
Packit |
acf257 |
dcd (h:t)
|
|
Packit |
acf257 |
| h <= 'Z' && h >= 'A' = ord h - ord 'A' : dcd t
|
|
Packit |
acf257 |
| h >= '0' && h <= '9' = ord h - ord '0' + 52 : dcd t
|
|
Packit |
acf257 |
| h >= 'a' && h <= 'z' = ord h - ord 'a' + 26 : dcd t
|
|
Packit |
acf257 |
| h == '+' = 62 : dcd t
|
|
Packit |
acf257 |
| h == '/' = 63 : dcd t
|
|
Packit |
acf257 |
| h == '=' = [] -- terminate data stream
|
|
Packit |
acf257 |
| otherwise = dcd t
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
-- Principal encoding and decoding functions.
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
encode :: [Octet] -> String
|
|
Packit |
acf257 |
encode = enc . char3_int4 . (map (chr .fromIntegral))
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
{-
|
|
Packit |
acf257 |
prop_base64 os =
|
|
Packit |
acf257 |
os == (f . g . h) os
|
|
Packit |
acf257 |
where types = (os :: [Word8])
|
|
Packit |
acf257 |
f = map (fromIntegral. ord)
|
|
Packit |
acf257 |
g = decode . encode
|
|
Packit |
acf257 |
h = map (chr . fromIntegral)
|
|
Packit |
acf257 |
-}
|
|
Packit |
acf257 |
|
|
Packit |
acf257 |
decode :: String -> [Octet]
|
|
Packit |
acf257 |
decode = (map (fromIntegral . ord)) . int4_char3 . dcd
|