pack template strings
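The following is a list of the template characters used by Array#pack and String#unpack. A template character may be followed by a number giving its "length". Using '*' in place of the length means "all of the remaining data".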
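The meaning of the length differs from one template character to another, but in general a run of the same template character can be written as that character followed by a count (for instance, "cccc" can be written as "c4").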
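In the descriptions below, short and long denote 2-byte and 4-byte values respectively (the sizes of short and long on a typical 32-bit machine), regardless of the actual system. If `s', `S', `l' or `L' is followed by `_' or `!' (as in "s!"), the size is that of the system's own short or long.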
Note that the size of `i' and `I' (int) always depends on the system, whereas the sizes of `n', `N', `v' and `V' are system-independent (`!' cannot be appended to them).
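The size rules above can be checked directly. The following is a minimal sketch; the hash names `fixed_sizes` and `native_sizes` are chosen here for illustration only, and the values in `native_sizes` depend on the C compiler Ruby was built with.

```ruby
# Widths that are the same on every system.
fixed_sizes = {
  "s" => [1].pack("s").bytesize,   # short: 2 bytes
  "l" => [1].pack("l").bytesize,   # long:  4 bytes
  "n" => [1].pack("n").bytesize,   # 16 bits, big endian
  "N" => [1].pack("N").bytesize,   # 32 bits, big endian
  "v" => [1].pack("v").bytesize,   # 16 bits, little endian
  "V" => [1].pack("V").bytesize    # 32 bits, little endian
}

# Widths taken from the system's C types; no particular value is assumed.
native_sizes = {
  "s!" => [1].pack("s!").bytesize,
  "i"  => [1].pack("i").bytesize,
  "l!" => [1].pack("l!").bytesize
}

p fixed_sizes
p native_sizes
```

When a binary format must be identical across machines, the templates in `fixed_sizes` are the safer choice.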
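Whitespace in a template string is ignored. ruby 1.7 feature: in addition, everything from `#' up to the next newline or the end of the template string is treated as a comment.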
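As a small illustration of whitespace and comments in a template, the sketch below packs the same data with a compact template and with an annotated one. The variable names `compact`, `template` and `annotated` are illustrative only, and the heredoc syntax assumes a reasonably recent Ruby.

```ruby
# Compact form: one byte, one big-endian 16-bit value, one byte.
compact = [1, 2, 3].pack("CnC")

# Same template spread over several lines with comments.
template = <<~TEMPLATE
  C   # one unsigned byte
  n   # 16-bit unsigned integer, big endian
  C   # one more unsigned byte
TEMPLATE
annotated = [1, 2, 3].pack(template)

p compact.bytes     # => [1, 0, 2, 3]
p annotated.bytes   # => [1, 0, 2, 3]
```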
Where Array#pack and String#unpack interpret a template character differently, the two descriptions are separated by a slash, in the form "description for Array#pack / description for String#unpack".
-
a
ASCII string (padded with null characters / trailing null characters and spaces are kept)
    ["abc"].pack("a")     # => "a"
    ["abc"].pack("a*")    # => "abc"
    ["abc"].pack("a4")    # => "abc\0"
    "abc\0".unpack("a4")  # => ["abc\0"]
    "abc ".unpack("a4")   # => ["abc "]
-
A
ASCII string (padded with spaces / trailing null characters and spaces are removed)
    ["abc"].pack("A")     # => "a"
    ["abc"].pack("A*")    # => "abc"
    ["abc"].pack("A4")    # => "abc "
    "abc ".unpack("A4")   # => ["abc"]
    "abc\0".unpack("A4")  # => ["abc"]
-
Z
Null-terminated string (same as a / trailing null characters are removed)
    ["abc"].pack("Z")     # => "a"
    ["abc"].pack("Z*")    # => "abc"
    ["abc"].pack("Z4")    # => "abc\0"
    "abc\0".unpack("Z4")  # => ["abc"]
    "abc ".unpack("Z4")   # => ["abc "]
-
b
Bit string (from the least significant bit to the most significant bit)
    "\001\002".unpack("b*")          # => ["1000000001000000"]
    "\001\002".unpack("b3")          # => ["100"]
    ["1000000001000000"].pack("b*")  # => "\001\002"
-
B
Bit string (from the most significant bit to the least significant bit)
    "\001\002".unpack("B*")          # => ["0000000100000010"]
    "\001\002".unpack("B9")          # => ["000000010"]
    ["0000000100000010"].pack("B*")  # => "\001\002"
-
h
Hexadecimal string (low nibble first)
    "\x01\xfe".unpack("h*")  # => ["10ef"]
    "\x01\xfe".unpack("h3")  # => ["10e"]
    ["10ef"].pack("h*")      # => "\001\376"
-
H
Hexadecimal string (high nibble first)
    "\x01\xfe".unpack("H*")  # => ["01fe"]
    "\x01\xfe".unpack("H3")  # => ["01f"]
    ["01fe"].pack("H*")      # => "\001\376"
-
c
char (8-bit signed integer)
    "\001\376".unpack("c*")  # => [1, -2]
    [1, -2].pack("c*")       # => "\001\376"
    [1, 254].pack("c*")      # => "\001\376"
-
C
unsigned char (8-bit unsigned integer)
    "\001\376".unpack("C*")  # => [1, 254]
    [1, -2].pack("C*")       # => "\001\376"
    [1, 254].pack("C*")      # => "\001\376"
-
s
short (16-bit signed integer, endian-dependent) (s! is not necessarily 16 bits; its size is that of the system's short)
Little endian:
    "\001\002\376\375".unpack("s*")  # => [513, -514]
    [513, 65022].pack("s*")          # => "\001\002\376\375"
    [513, -514].pack("s*")           # => "\001\002\376\375"
Big endian:
    "\001\002\376\375".unpack("s*")  # => [258, -259]
    [258, 65277].pack("s*")          # => "\001\002\376\375"
    [258, -259].pack("s*")           # => "\001\002\376\375"
-
S
unsigned short (16-bit unsigned integer, endian-dependent) (S! is not necessarily 16 bits; its size is that of the system's short)
Little endian:
    "\001\002\376\375".unpack("S*")  # => [513, 65022]
    [513, 65022].pack("S*")          # => "\001\002\376\375"
    [513, -514].pack("S*")           # => "\001\002\376\375"
Big endian:
    "\001\002\376\375".unpack("S*")  # => [258, 65277]
    [258, 65277].pack("S*")          # => "\001\002\376\375"
    [258, -259].pack("S*")           # => "\001\002\376\375"
-
i
int (signed integer, dependent on endianness and on the size of int)
Little endian, 32-bit int:
    "\001\002\003\004\377\376\375\374".unpack("i*")  # => [67305985, -50462977]
    [67305985, 4244504319].pack("i*")                # => RangeError
    [67305985, -50462977].pack("i*")                 # => "\001\002\003\004\377\376\375\374"
Big endian, 32-bit int:
    "\001\002\003\004\377\376\375\374".unpack("i*")  # => [16909060, -66052]
    [16909060, 4294901244].pack("i*")                # => RangeError
    [16909060, -66052].pack("i*")                    # => "\001\002\003\004\377\376\375\374"
-
I
unsigned int (unsigned integer, dependent on endianness and on the size of int)
Little endian, 32-bit int:
    "\001\002\003\004\377\376\375\374".unpack("I*")  # => [67305985, 4244504319]
    [67305985, 4244504319].pack("I*")                # => "\001\002\003\004\377\376\375\374"
    [67305985, -50462977].pack("I*")                 # => "\001\002\003\004\377\376\375\374"
Big endian, 32-bit int:
    "\001\002\003\004\377\376\375\374".unpack("I*")  # => [16909060, 4294901244]
    [16909060, 4294901244].pack("I*")                # => "\001\002\003\004\377\376\375\374"
    [16909060, -66052].pack("I*")                    # => "\001\002\003\004\377\376\375\374"
-
l
long (32-bit signed integer, endian-dependent) (l! is not necessarily 32 bits; its size is that of the system's long)
Little endian, 32-bit long:
    "\001\002\003\004\377\376\375\374".unpack("l*")  # => [67305985, -50462977]
    [67305985, 4244504319].pack("l*")                # => RangeError
    [67305985, -50462977].pack("l*")                 # => "\001\002\003\004\377\376\375\374"
-
L
unsigned long (32-bit unsigned integer, endian-dependent) (L! is not necessarily 32 bits; its size is that of the system's long)
Little endian, 32-bit long:
    "\001\002\003\004\377\376\375\374".unpack("L*")  # => [67305985, 4244504319]
    [67305985, 4244504319].pack("L*")                # => "\001\002\003\004\377\376\375\374"
    [67305985, -50462977].pack("L*")                 # => "\001\002\003\004\377\376\375\374"
-
q
ruby 1.7 feature: long long (signed integer, dependent on endianness and on the size of long long) (64 bits when the C compiler cannot handle long long)
Little endian, 64-bit long long:
    "\001\002\003\004\005\006\007\010\377\376\375\374\373\372\371\370".unpack("q*")
    # => [578437695752307201, -506097522914230529]
    [578437695752307201, -506097522914230529].pack("q*")
    # => "\001\002\003\004\005\006\a\010\377\376\375\374\373\372\371\370"
    [578437695752307201, 17940646550795321087].pack("q*")
    # => "\001\002\003\004\005\006\a\010\377\376\375\374\373\372\371\370"
-
Q
ruby 1.7 feature: unsigned long long (unsigned integer, dependent on endianness and on the size of long long) (64 bits when the C compiler cannot handle long long)
Little endian, 64-bit long long:
    "\001\002\003\004\005\006\007\010\377\376\375\374\373\372\371\370".unpack("Q*")
    # => [578437695752307201, 17940646550795321087]
    [578437695752307201, 17940646550795321087].pack("Q*")
    # => "\001\002\003\004\005\006\a\010\377\376\375\374\373\372\371\370"
    [578437695752307201, -506097522914230529].pack("Q*")
    # => "\001\002\003\004\005\006\a\010\377\376\375\374\373\372\371\370"
-
m
Base64-encoded string. A newline is appended after every 60 octets of encoded output (and at the end).
Base64 is an encoding that uses only 65 ASCII characters (the 64 characters [A-Za-z0-9+/] plus '=' for padding) and converts binary data three octets at a time (8 bits * 3 = 24 bits) into four printable characters (6 bits * 4 = 24 bits). See RFC 2045 for details.
    [""].pack("m")              # => ""
    ["\0"].pack("m")            # => "AA==\n"
    ["\0\0"].pack("m")          # => "AAA=\n"
    ["\0\0\0"].pack("m")        # => "AAAA\n"
    ["\377"].pack("m")          # => "/w==\n"
    ["\377\377"].pack("m")      # => "//8=\n"
    ["\377\377\377"].pack("m")  # => "////\n"
    ["abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"].pack("m")
    # => "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJT\nVFVWV1hZWg==\n"
    ["abcdefghijklmnopqrstuvwxyz"].pack("m3")
    # => "YWJj\nZGVm\nZ2hp\namts\nbW5v\ncHFy\nc3R1\ndnd4\neXo=\n"
    "".unpack("m")              # => [""]
    "AA==\n".unpack("m")        # => ["\000"]
    "AA==".unpack("m")          # => ["\000"]
    "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJT\nVFVWV1hZWg==\n".unpack("m")
    # => ["abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"]
    "YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWg==\n".unpack("m")
    # => ["abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"]
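The 3-octets-to-4-characters conversion described for "m" can be reproduced by hand. The sketch below is only an illustration: `BASE64_ALPHABET` and `encode_triplet` are hypothetical helpers written for this example, not part of Ruby, and they handle exactly three input bytes (no padding).

```ruby
# The 64 data characters of Base64, in index order.
BASE64_ALPHABET = (("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a + ["+", "/"]).freeze

# Encode exactly three octets into four Base64 characters.
def encode_triplet(octets)
  a, b, c = octets.bytes
  bits = (a << 16) | (b << 8) | c   # one 24-bit group
  # Cut the group into four 6-bit values, most significant first.
  [18, 12, 6, 0].map { |shift| BASE64_ALPHABET[(bits >> shift) & 0x3f] }.join
end

by_hand = encode_triplet("Rub")
by_pack = ["Rub"].pack("m")

p by_hand   # => "UnVi"
p by_pack   # => "UnVi\n"
```

The only difference between the two results is the trailing newline that "m" appends.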
-
M
String encoded with the quoted-printable encoding
    ["a b c\td \ne"].pack("M")         # => "a b c\td =\n\ne=\n"
    "a b c\td =\n\ne=\n".unpack("M")   # => ["a b c\td \ne"]
-
n
unsigned short in network byte order (big endian) (16-bit unsigned integer)
    [0,1,-1,32767,-32768,65535].pack("n*")
    # => "\000\000\000\001\377\377\177\377\200\000\377\377"
    "\000\000\000\001\377\377\177\377\200\000\377\377".unpack("n*")
    # => [0, 1, 65535, 32767, 32768, 65535]
-
N
unsigned long in network byte order (big endian) (32-bit unsigned integer)
    [0,1,-1].pack("N*")  # => "\000\000\000\000\000\000\000\001\377\377\377\377"
    "\000\000\000\000\000\000\000\001\377\377\377\377".unpack("N*")  # => [0, 1, 4294967295]
-
v
unsigned short in "VAX" byte order (little endian) (16-bit unsigned integer)
    [0,1,-1,32767,-32768,65535].pack("v*")
    # => "\000\000\001\000\377\377\377\177\000\200\377\377"
    "\000\000\001\000\377\377\377\177\000\200\377\377".unpack("v*")
    # => [0, 1, 65535, 32767, 32768, 65535]
-
V
unsigned long in "VAX" byte order (little endian) (32-bit unsigned integer)
    [0,1,-1].pack("V*")  # => "\000\000\000\000\001\000\000\000\377\377\377\377"
    "\000\000\000\000\001\000\000\000\377\377\377\377".unpack("V*")  # => [0, 1, 4294967295]
-
f
Single-precision floating-point number (system-dependent)
IA-32 (x86) (IEEE754 single precision, little endian):
    [1.0].pack("f")  # => "\000\000\200?"
sparc (IEEE754 single precision, big endian):
    [1.0].pack("f")  # => "?\200\000\000"
-
d
Double-precision floating-point number (system-dependent)
IA-32 (IEEE754 double precision, little endian):
    [1.0].pack("d")  # => "\000\000\000\000\000\000\360?"
sparc (IEEE754 double precision, big endian):
    [1.0].pack("d")  # => "?\360\000\000\000\000\000\000"
-
e
Little-endian single-precision floating-point number (system-dependent)
IA-32:
    [1.0].pack("e")  # => "\000\000\200?"
sparc:
    [1.0].pack("e")  # => "\000\000\200?"
-
E
Little-endian double-precision floating-point number (system-dependent)
IA-32:
    [1.0].pack("E")  # => "\000\000\000\000\000\000\360?"
sparc:
    [1.0].pack("E")  # => "\000\000\000\000\000\000\360?"
-
g
Big-endian single-precision floating-point number (system-dependent)
IA-32:
    [1.0].pack("g")  # => "?\200\000\000"
sparc:
    [1.0].pack("g")  # => "?\200\000\000"
-
G
Big-endian double-precision floating-point number (system-dependent)
IA-32:
    [1.0].pack("G")  # => "?\360\000\000\000\000\000\000"
sparc:
    [1.0].pack("G")  # => "?\360\000\000\000\000\000\000"
-
p
Pointer to a null-terminated string
    [""].pack("p")               # => "\310\037\034\010"
    ["a", "b", "c"].pack("p3")   # => " =\030\010\340^\030\010\360^\030\010"
    [nil].pack("p")              # => "\000\000\000\000"
-
P
Pointer to a structure (fixed-length string)
    [nil].pack("P")     # => "\000\000\000\000"
    ["abc"].pack("P3")  # => "x*\024\010"
    ["abc"].pack("P4")  # => ArgumentError: too short buffer for P(3 for 4)
    [""].pack("P")      # => ArgumentError: too short buffer for P(0 for 1)
-
u
Uuencoded string
    [""].pack("u")            # => ""
    ["a"].pack("u")           # => "!80``\n"
    ["abc"].pack("u")         # => "#86)C\n"
    ["abcd"].pack("u")        # => "$86)C9```\n"
    ["a"*45].pack("u")        # => "M86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A\n"
    ["a"*46].pack("u")        # => "M86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A86%A\n!80``\n"
    ["abcdefghi"].pack("u6")  # => "&86)C9&5F\n#9VAI\n"
-
U
UTF-8
    [0].pack("U")            # => "\000"
    [1].pack("U")            # => "\001"
    [0x7f].pack("U")         # => "\177"
    [0x80].pack("U")         # => "\302\200"
    [0x7fffffff].pack("U")   # => "\375\277\277\277\277\277"
    [0x80000000].pack("U")   # => ArgumentError
    [0,256,65536].pack("U3") # => "\000\304\200\360\220\200\200"
    "\000\304\200\360\220\200\200".unpack("U3")  # => [0, 256, 65536]
    "\000\304\200\360\220\200\200".unpack("U")   # => [0]
    "\000\304\200\360\220\200\200".unpack("U*")  # => [0, 256, 65536]
-
w
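BER compressed integer
Each byte carries 7 bits of data, so that a non-negative integer of any size can be represented in as few bytes as possible. The most significant bit of every byte except the last one is always set to 1 (in other words, the most significant bit tells how far the data extends).
BER stands for Basic Encoding Rules (BER is not limited to integers; it is also used for encoding ASN.1).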
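The byte layout described for "w" can be sketched by hand and compared with what pack produces. `ber_encode` below is a hypothetical helper written for this illustration; it is not part of Ruby.

```ruby
# Encode a non-negative integer as a BER compressed integer:
# 7 data bits per byte, high bit set on every byte except the last.
def ber_encode(n)
  raise ArgumentError, "negative value" if n < 0
  groups = [n & 0x7f]                    # last byte: high bit clear
  n >>= 7
  while n > 0
    groups.unshift((n & 0x7f) | 0x80)    # earlier bytes: high bit set
    n >>= 7
  end
  groups.pack("C*")
end

[0, 127, 128, 300, 2**40].each do |n|
  packed = [n].pack("w")
  # Print the value, the hex dump of the packed form, and whether
  # the hand-written encoder agrees with pack.
  printf("%d -> %s (matches by-hand encoding: %s)\n",
         n, packed.unpack("H*")[0], packed == ber_encode(n))
end

p "\x82\x2C".b.unpack("w")   # => [300]
```

300 is 0b10_0101100, so it needs two 7-bit groups: 0x02 with the high bit set (0x82) followed by 0x2c.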
-
x
Null byte / skip 1 byte
-
X
Back up 1 byte
-
@
Move to an absolute position
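The three positioning characters x, X and @ have no examples of their own in the list above, so here is a minimal sketch of how they behave on both the pack side and the unpack side. The variable names are illustrative only.

```ruby
# pack side
with_null  = [1, 2].pack("CxC")          # "x" inserts one null byte
backed_up  = ["abc", "d"].pack("a3Xa")   # "X" drops the last byte written so far
positioned = ["a", "b"].pack("a@3a")     # "@3" pads with nulls up to offset 3

p with_null.bytes   # => [1, 0, 2]
p backed_up         # => "abd"
p positioned        # => "a\u0000\u0000b"

# unpack side
skipped = "abcdef".unpack("axa")     # "x" skips one byte
reread  = "abcdef".unpack("a2Xa2")   # "X" steps back one byte, so "b" is read twice
jumped  = "abcdef".unpack("@3a2")    # "@3" starts reading at offset 3

p skipped   # => ["a", "c"]
p reread    # => ["ab", "bc"]
p jumped    # => ["de"]
```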
Usage examples
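Below are some usage examples of pack/unpack.
Where the same result can be obtained without pack, that version is shown as well. pack templates tend to be cryptic, so we want to offer an alternative to those who would rather not use pack.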
-
Converting an array of numbers (character codes) into a string
    p [82, 117, 98, 121].pack("cccc")
    # => "Ruby"
    p [82, 117, 98, 121].pack("c4")
    # => "Ruby"
    p [82, 117, 98, 121].pack("c*")
    # => "Ruby"
    s = ""
    [82, 117, 98, 121].each {|c| s << c}
    p s
    # => "Ruby"
    p [82, 117, 98, 121].collect {|c| sprintf "%c", c}.join
    # => "Ruby"
    p [82, 117, 98, 121].inject("") {|s, c| s << c}
    # => "Ruby"
-
Converting a string into an array of numbers (character codes)
    p "Ruby".unpack('C*')
    # => [82, 117, 98, 121]
    a = []
    "Ruby".each_byte {|c| a << c}
    p a
    # => [82, 117, 98, 121]
-
"x" can be used to insert null bytes
    p [82, 117, 98, 121].pack("ccxxcc")
    # => "Ru\000\000by"
-
"x" can be used to skip bytes when reading
    p "Ru\0\0by".unpack('ccxxcc')
    # => [82, 117, 98, 121]
-
Converting a hex dump into an array of numbers
    # The string is wrapped in an array literal, since pack is an Array method.
    p ["61 62 63 64 65 66".delete(' ')].pack('H*').unpack('C*')
    # => [97, 98, 99, 100, 101, 102]
    p "61 62 63 64 65 66".split.collect {|c| c.hex}
    # => [97, 98, 99, 100, 101, 102]
-
When packing binary and hexadecimal strings, the length is not the number of bytes produced but the number of bits or nibbles
    p [0b01010010, 0b01110101, 0b01100010, 0b01111001].pack("C4")
    # => "Ruby"
    p ["01010010011101010110001001111001"].pack("B32") # 8 bits * 4
    # => "Ruby"
    p [0x52, 0x75, 0x62, 0x79].pack("C4")
    # => "Ruby"
    p ["52756279"].pack("H8") # 2 nybbles * 4
    # => "Ruby"
-
A length given to the template character 'a' applies to a single string only
    p ["RUBY", "u", "b", "y"].pack("a4")
    # => "RUBY"
    p ["RUBY", "u", "b", "y"].pack("aaaa")
    # => "Ruby"
    p ["RUBY", "u", "b", "y"].pack("a*aaa")
    # => "RUBYuby"
-
With the template character "a", a string shorter than the given length is padded with null characters
    p ["Ruby"].pack("a8")
    # => "Ruby\000\000\000\000"
-
Little endian and big endian
    p [1,2].pack("s2")
    # => "\000\001\000\002"   output on a big-endian system
    # => "\001\000\002\000"   output on a little-endian system
    p [1,2].pack("n2")
    # => "\000\001\000\002"   big endian regardless of the system
    p [1,2].pack("v2")
    # => "\001\000\002\000"   little endian regardless of the system
-
signed long in network byte order
    s = "\xff\xff\xff\xfe"
    n = s.unpack("N")[0]
    # If the sign bit (bit 31) is set, take the two's complement.
    if n[31] == 1
      n = -((n ^ 0xffff_ffff) + 1)
    end
    p n
    # => -2
-
signed long in network byte order (second version)
    s = "\xff\xff\xff\xfe"
    p n = s.unpack("N").pack("l").unpack("l")[0]
    # => -2
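As a further variation, and as an assumption that goes beyond the Ruby versions this page describes: later Ruby releases (1.9.3 and up) accept a `>` or `<` byte-order modifier after integer template characters, so a big-endian signed 32-bit value can be read in one step. The sketch below compares that with the sign-bit approach; the variable names are illustrative only.

```ruby
s = "\xff\xff\xff\xfe".b   # .b marks the string as binary data

# Approach used in the examples above: read unsigned, then fix the sign.
via_sign_bit = s.unpack("N")[0]
via_sign_bit -= 2**32 if via_sign_bit[31] == 1

# Assumes a Ruby that supports byte-order modifiers ("l>" = big-endian signed 32-bit).
via_modifier = s.unpack("l>")[0]

p via_sign_bit   # => -2
p via_modifier   # => -2
```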
-
IP address
    require 'socket'
    p Socket.gethostbyname("localhost")[3].unpack("C4").join(".")
    # => "127.0.0.1"
    p "127.0.0.1".split(".").collect {|c| c.to_i}.pack("C4")
    # => "\177\000\000\001"
-
sockaddr_in structure
    require 'socket'
    p [Socket::AF_INET,
       Socket.getservbyname('echo'),
       127, 0, 0, 1].pack("s n C4 x8")
    # => "\002\000\000\a\177\000\000\001\000\000\000\000\000\000\000\000"
ruby 1.7 feature: besides pack/unpack, you can also use the Socket.pack_sockaddr_in and Socket.unpack_sockaddr_in methods.
-
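Address of a '\0'-terminated string
The template characters "p" and "P" exist for interfacing with C-level APIs (ioctl, for example).
    p ["foo"].pack("p")
    # => "8\266\021\010"
The resulting string looks like garbage, but it is in fact the address of the string "foo\0" in binary form. It can be turned into a more familiar form as follows.
    printf "%#010x\n", "8\266\021\010".unpack("L")[0]
    # => 0x0811b638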
Until the result of pack is garbage-collected, the object the address points to ("foo\0" in this example) is guaranteed not to be garbage-collected.
unpack("p") and unpack("P") can only be applied to a result of pack.
    p ["foo"].pack("p").unpack("p")
    # => ["foo"]
    p "8\266\021\010".unpack("p")
    # => -:1:in `unpack': no associated pointer (ArgumentError)
    #            from -:1
ruby 1.7 feature: "p" and "P" treat nil specially, interpreting it as a NULL pointer. (The results below are from a typical 32-bit machine.)
    p [nil].pack("p")         #=> "\000\000\000\000"
    p "\0\0\0\0".unpack("p")  #=> [nil]
-
Address of a structure
For example, the string that represents
    struct {
      int a;
      short b;
      long c;
    } v = {1,2,3};
is
    v = [1,2,3].pack("i!s!l!")
(depending on byte alignment, appropriate padding may be needed).
The address of this structure can then be obtained with
    p [v].pack("P")
    # => "\300\265\021\010"