本文节选自《Netkiller Java 手札》
中国广东省深圳市望海路半岛城邦三期 518067 +86 13113668890 <netkiller@msn.com>
$Id: book.xml 606 2013-05-29 09:52:58Z netkiller $
版权 © 2015-2018 Neo Chan
版权声明
转载请与作者联系,转载时请务必标明文章原始出处和作者信息及本声明。
http://www.netkiller.cn |
---|
http://netkiller.github.io |
http://netkiller.sourceforge.net |
我的系列文档
编程语言
Netkiller Architect 手札 | Netkiller Developer 手札 | Netkiller Java 手札 | Netkiller Spring 手札 | Netkiller PHP 手札 | Netkiller Python 手札 |
---|---|---|---|---|---|
Netkiller Testing 手札 | Netkiller Cryptography 手札 | Netkiller Perl 手札 | Netkiller Docbook 手札 | Netkiller Project 手札 | Netkiller Database 手札 |
我们运行下面一段程序,向文件 netkiller.bin 中写入一个整形数值 1 ,然后观察文件变化
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeInt(1);
out.close();
打开终端,使用 xxd 命令查看二进制文件
neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.bin
00000000: 00000000 00000000 00000000 00000001 ....
可以看到一串二进制 00000000 00000000 00000000 00000001,运行下面程序可以讲二进制转换为十进制,注意替换掉空格。
int n = Integer.valueOf("00000000 00000000 00000000 00000001".replaceAll(" ", ""), 2);
System.out.println(n);
运行结果是 1 ,为什前面那么多 0 呢?请运行下面一段程序
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeInt(Integer.MAX_VALUE);
out.close();
现在观察结果
neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.bin
00000000: 01111111 11111111 11111111 11111111 ....
int n = Integer.valueOf("01111111 11111111 11111111 11111111".replaceAll(" ", ""), 2);
System.out.println(n);
输出结果是 2147483647, 这是 int 得最大值,2147483647 + 1 会怎么样呢?
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeInt(Integer.MAX_VALUE + 1);
out.close();
System.out.println(Integer.MAX_VALUE + 1);
输出结果是 -2147483648,正确应该是 2147483648 这就是整形溢出。整形变量得二进制表示方法是4个字节长度32位 00000000 00000000 00000000 00000000 到 01111111 11111111 11111111 11111111 , 其中第一位0表示正数1表示负数。
neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.bin
00000000: 10000000 00000000 00000000 00000000 ....
整形溢出演示,超出整形范围怎么办? 使用 Long 型。
System.out.println(Integer.MAX_VALUE);
System.out.println(Integer.MAX_VALUE + 1);
System.out.println(Integer.MIN_VALUE);
System.out.println(Integer.MIN_VALUE - 1);
输出结果如下:
2147483647
-2147483648
-2147483648
2147483647
负数演示
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeInt(-1);
out.writeInt(Integer.MAX_VALUE + 1);
out.close();
-1 得结果是 11111111 11111111 11111111 11111111
neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.bin
00000000: 11111111 11111111 11111111 11111111 ....
现在我们存储两个整形数值
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeInt(1);
out.writeInt(-1);
out.close();
很清楚的看到里面有两个数值,1 和 -1
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 4 -b netkiller.bin
00000000: 00000000 00000000 00000000 00000001 ....
00000004: 11111111 11111111 11111111 11111111 ....
读取二进制文件中的 int 数据
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
try {
int i = in.readInt();
System.out.println(i);
} catch (EOFException e) {
e.printStackTrace();
}
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeByte(1);
out.close();
byte 只占用一个字节8位
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 4 -b netkiller.bin
00000000: 00000001
如果写入 -1 结果是,由此得出 第一位 0 是正数,1 是负数,可以得出他的取值范围 -128 ~ 127。超出范围也会溢出。
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 4 -b netkiller.bin
00000000: 11111111
常常写入最小值与最大值
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeByte(Byte.MIN_VALUE);
out.writeByte(Byte.MAX_VALUE);
out.close();
运行结果
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 1 -b netkiller.bin
00000000: 10000000 .
00000001: 01111111 .
写入一个字符
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeBytes("a");
out.close();
写入结果
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 1 -b netkiller.bin
00000000: 01100001 a
从 ASCII 表中查出 01100001 十进制 97 十六进制 61 对应字母 a
写入一段字符串
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeBytes("http://www.netkiller.cn");
out.close();
运行结果
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
00000000: 01101000 01110100 01110100 01110000 00111010 00101111 00101111 01110111 http://w
00000008: 01110111 01110111 00101110 01101110 01100101 01110100 01101011 01101001 ww.netki
00000010: 01101100 01101100 01100101 01110010 00101110 01100011 01101110 ller.cn
读取二进制文件中的 byte 字符串,readAllBytes() 可以一次读取所有 byte 到 byte[] 中。
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
try {
System.out.println(new String(in.readAllBytes()));
} catch (EOFException e) {
e.printStackTrace();
}
readByte() 逐字节读取
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
try {
char c = ' ';
while (true) {
try {
c = (char) in.readByte();
System.out.print(c);
} catch (EOFException e) {
System.out.println();
break;
}
}
} catch (Exception e) {
e.printStackTrace();
}
现在我们已经掌握了 byte 的操作方法,现在我们来做一个例子,读取 int 数据,int 是由 4 个字节组成一组。所以我们每次取 4个字节。
// 这个例子中,我们写入三个数值到 netkiller.bin 文件,分别是 1024,-128,2147483647
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeInt(1024);
out.writeInt(-128);
out.writeInt(Integer.MAX_VALUE);
out.close();
二进制文件如下
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 4 -b netkiller.bin
00000000: 00000000 00000000 00000100 00000000 ....
00000004: 11111111 11111111 11111111 10000000 ....
00000008: 01111111 11111111 11111111 11111111 ....
从二进制文件读出 int 数据。
String filename = "netkiller.bin";
FileInputStream stream = new FileInputStream(filename);
byte[] buffer = new byte[4];
while (stream.read(buffer) != -1) {
ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
System.out.println(byteBuffer.getInt());
}
运行结果
1024
-128
2147483647
我们想文件写入两个布尔类型,一个是 true, 另一个是 false
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeBoolean(true);
out.writeBoolean(false);
out.close();
运行结果可以看出 boolean 使用了一个字节,最后一位 1 表示true, 0 表示 false。所以对于二进制文件最小单位就是 byte 字节,虽然boolean型只需要一个 1 bit 位,但是存储的最小单位是字节,所以前面需要补7个零 0000000。
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 1 -b netkiller.bin
00000000: 00000001 .
00000001: 00000000 .
使用 ls 命令可以看这个文件占用了 2B(两个字节)
neo@MacBook-Pro ~/workspace/netkiller % ll netkiller.bin
-rw-r--r-- 1 neo staff 2B Oct 18 13:47 netkiller.bin
读取二进制文件中的 boolean 数据
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
try {
boolean bool = in.readBoolean();
System.out.println(bool);
} catch (EOFException e) {
e.printStackTrace();
}
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeLong(1);
out.close();
有了上面 int 型数据的经验,下面一看你就会明白。long 型采用 8 个字节保存数据,是 int 的一倍。取值范围这里就不多说了,也会存在溢出现象。
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
00000000: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 ........
取值范围
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeLong(Long.MIN_VALUE);
out.writeLong(Long.MAX_VALUE);
out.close();
输出文件
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
00000000: 10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ........
00000008: 01111111 11111111 11111111 11111111 11111111 11111111 11111111 11111111 ........
读取二进制文件中的 long 数据
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
try {
long l = in.readLong();
System.out.println(l);
} catch (EOFException e) {
e.printStackTrace();
}
有符号 signed char 类型的范围为 -128~127
无符号 unsigned char 的范围为0~ 255
char 与 byte 操作类似,我们首先去 ASCII 表查找字符 A 对应 65,我们将 65 写入二进制文件。然后读取该字符,输出结果是 A。
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeChar(65);
out.close();
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
try {
char c = in.readChar();
System.out.println(c);
} catch (EOFException e) {
e.printStackTrace();
}
从二进制文件中我们可以看到 char 类型占用2个字节16位
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 2 -b netkiller.bin
00000000: 00000000 01000001 .A
使用 writeChars()写入字符串到二进制文件
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeChars("http://www.netkiller.cn");
out.close();
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
char c = ' ';
while (true) {
try {
c = in.readChar();
System.out.print(c);
} catch (EOFException e) {
System.out.println();
break;
}
}
二进制文件如下,你会发现第一个字节没有用到,很多 00000000 所以如果存储英文 byte 更适合,char 是双倍 byte 开销。
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
00000000: 00000000 01101000 00000000 01110100 00000000 01110100 00000000 01110000 .h.t.t.p
00000008: 00000000 00111010 00000000 00101111 00000000 00101111 00000000 01110111 .:././.w
00000010: 00000000 01110111 00000000 01110111 00000000 00101110 00000000 01101110 .w.w...n
00000018: 00000000 01100101 00000000 01110100 00000000 01101011 00000000 01101001 .e.t.k.i
00000020: 00000000 01101100 00000000 01101100 00000000 01100101 00000000 01110010 .l.l.e.r
00000028: 00000000 00101110 00000000 01100011 00000000 01101110 ...c.n
存储汉字
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
String s = "陈";
char name = s.charAt(s.length() - 1);
out.writeChar(name);
out.close();
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
char c = ' ';
while (true) {
try {
c = in.readChar();
System.out.print(c);
} catch (EOFException e) {
System.out.println();
break;
}
}
二进制文件如下,使用两个字节表示一个汉字
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 2 -b netkiller.bin
00000000: 10010110 01001000 .H
转成 Hex 十六进制,得到 96 48 两个数字。
neo@MacBook-Pro ~/workspace/netkiller % hexdump netkiller.bin
0000000 96 48
0000002
现在去搜索引擎搜索“汉字内码”,然后查询“陈”这个汉字,可以看到 Unicode编码16进制就是 96 48
尝试写入汉字字符串
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeChars("陈景峰");
out.close();
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
try {
char c = ' ';
while (true) {
try {
c = in.readChar();
System.out.print(c);
} catch (EOFException e) {
System.out.println();
break;
}
}
} catch (Exception e) {
e.printStackTrace();
}
neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.bin
00000000: 10010110 01001000 01100110 01101111 01011100 11110000 .Hfo\.
这次我们使用新的文件名 netkiller.txt
String filename = "netkiller.txt";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeUTF("峰");
out.close();
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
try {
System.out.println(in.readUTF());
} catch (EOFException e) {
e.printStackTrace();
}
查看二进制文件,一个汉字怎么这么多字节?
neo@MacBook-Pro ~/workspace/netkiller % xxd -b netkiller.txt
00000000: 00000000 00000011 11100101 10110011 10110000 .....
转成 16 禁止看看。
neo@MacBook-Pro ~/workspace/netkiller % hexdump netkiller.txt
0000000 00 03 e5 b3 b0
0000005
我们在网上查询 “峰” 字的汉字内码,可以看到UTF-8 内码是 E5 B3 B0。这是因为UTF8使用三个字节存储汉字。 00000000 00000011 可能是 UTF 标志位,具体我也不太清楚,总之不是 BOM 信息。
我们现在写入一个字符串试试
out.writeUTF("陈景峰");
xxd -s 2 -c 3 表示跳过两个字节,三列显示
neo@MacBook-Pro ~/workspace/netkiller % xxd -s 2 -c 3 -b netkiller.txt
00000002: 11101001 10011001 10001000 ...
00000005: 11100110 10011001 10101111 ...
00000008: 11100101 10110011 10110000 ...
UTF字符是可以直接使用文本工具查看的。
neo@MacBook-Pro ~/workspace/netkiller % cat netkiller.txt
陈景峰
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeShort(1);
out.flush();
out.close();
输出结果,Short 使用两个字节16位表示。
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 2 -b netkiller.bin
00000000: 00000000 00000001 ..
Short 分为有符号和无符号类型
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeShort(1);
out.writeShort(1);
out.writeShort(-1);
out.writeShort(-1);
out.flush();
out.close();
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
try {
System.out.println(in.readShort());
System.out.println(in.readUnsignedShort());
System.out.println(in.readShort());
System.out.println(in.readUnsignedShort());
} catch (EOFException e) {
e.printStackTrace();
}
运行结果
1
1
-1
65535
有符号的取值范围
最小值:Short.MIN_VALUE=-32768 (-2的15此方)
最大值:Short.MAX_VALUE=32767 (2的15次方-1)
无符号的取值范围是 0 ~ 65535
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeFloat(0);
out.writeFloat(1.0f);
out.writeFloat(1.1f);
out.flush();
out.close();
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
float c = 0;
while (true) {
try {
c = in.readFloat();
System.out.println(c);
} catch (EOFException e) {
System.out.println();
break;
}
}
float 使用 4 字节 32 为表示浮点类型,float 不同于前面数据类型,无法直接读取浮点数,需要经过计算才能得出,有点复杂。
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 4 -b netkiller.bin
00000000: 00000000 00000000 00000000 00000000 ....
00000004: 00111111 10000000 00000000 00000000 ?...
00000008: 00111111 10001100 11001100 11001101 ?...
浮点型示意图
/------------- 32 bit ----------------\
| 1 | 8 | 23 |
|--------------------------------------|
31 30 22 0
^ ^ ^
符号位 指数位 尾数部分 32位
首先float二进制是从后向前读。与上面所有类型相反。
符号位(Sign) : 0代表正,1代表为负
指数位(Exponent):用于存储科学计数法中的指数数据,并且采用移位存储
尾数部分(Mantissa):尾数部分
将一个内存存储的float二进制格式转化为十进制的步骤:
(1)将第22位到第0位的二进制数写出来,在最左边补一位“1”,得到二十四位有效数字。将小数点点在最左边那个“1”的右边。
(2)取出第29到第23位所表示的值n。当30位是“0”时将n各位求反。当30位是“1”时将n增1。
(3)将小数点左移n位(当30位是“0”时)或右移n位(当30位是“1”时),得到一个二进制表示的实数。
(4)将这个二进制实数化为十进制,并根据第31位是“0”还是“1”加上正号或负号即可。
1.0f = 00111111 10000000 00000000 00000000
Sign 31 位是 0 表示正数
Exponent 23~30 位 0111111 1
Mantissa 0~22 位 0000000 00000000 00000000
得到
| 0 | 0111111 1 | 0000000 00000000 00000000 |
具体细节请参考 IEEE R32.24
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeDouble(12.5d);
out.flush();
out.close();
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
double d = 0d;
while (true) {
try {
d = in.readDouble();
System.out.println(d);
} catch (EOFException e) {
System.out.println();
break;
}
}
二进制文件
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
00000000: 01000000 00101001 00000000 00000000 00000000 00000000 00000000 00000000 @)......
/------------------------- 64 bit ------------------------------\
| 1 | 11 | 52 |
|----------------------------------------------------------------|
63 62 51 0
^ ^ ^
符号位 指数位 尾数部分 64位
首先float二进制是从后向前读。与上面所有类型相反。
符号位(Sign) : 0代表正,1代表为负
指数位(Exponent):用于存储科学计数法中的指数数据,并且采用移位存储
尾数部分(Mantissa):尾数部分
详细参加考 IEEE R64.53
String filename = "netkiller.bin";
DataOutputStream out = new DataOutputStream(new FileOutputStream(filename));
out.writeInt(1024);
out.writeShort(255);
out.writeLong(100000000000L);
out.writeFloat(3.14f);
out.writeDouble(3.141592653579d);
out.writeBoolean(true);
out.writeChar(165);
out.writeChars("陈景峰");
out.writeUTF("Netkiller Java 手札 - http://www.netkiller.cn");
out.writeChars("这是最后一行\r\n");
out.flush();
out.close();
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)));
System.out.println(in.readInt());
System.out.println(in.readUnsignedShort());
System.out.println(in.readLong());
System.out.println(in.readFloat());
System.out.println(in.readDouble());
System.out.println(in.readBoolean());
System.out.println(in.readChar());
int i = 0;
String name = "";
while (i < 3) {
try {
char c = in.readChar();
name += c;
} catch (EOFException e) {
break;
}
i++;
}
System.out.println(name);
System.out.println(in.readUTF());
System.out.println(in.readUTF());
需要注意的一点是 out.writeChars("陈景峰"); 写入char字符串,在读取的时候你需要知道字符串的长度。然后循环取出char数据。
二进制文件内容
neo@MacBook-Pro ~/workspace/netkiller % xxd -c 8 -b netkiller.bin
00000000: 00000000 00000000 00000100 00000000 00000000 11111111 00000000 00000000 ........
00000008: 00000000 00010111 01001000 01110110 11101000 00000000 01000000 01001000 ..Hv..@H
00000010: 11110101 11000011 01000000 00001001 00100001 11111011 01010100 01000011 ..@.!.TC
00000018: 11001110 00101000 00000001 00000000 10100101 10010110 01001000 01100110 .(....Hf
00000020: 01101111 01011100 11110000 00000000 00101111 01001110 01100101 01110100 o\../Net
00000028: 01101011 01101001 01101100 01101100 01100101 01110010 00100000 01001010 killer J
00000030: 01100001 01110110 01100001 00100000 11100110 10001001 10001011 11100110 ava ....
00000038: 10011100 10101101 00100000 00101101 00100000 01101000 01110100 01110100 .. - htt
00000040: 01110000 00111010 00101111 00101111 01110111 01110111 01110111 00101110 p://www.
00000048: 01101110 01100101 01110100 01101011 01101001 01101100 01101100 01100101 netkille
00000050: 01110010 00101110 01100011 01101110 10001111 11011001 01100110 00101111 r.cn..f/
00000058: 01100111 00000000 01010100 00001110 01001110 00000000 10001000 01001100 g.T.N..L
00000060: 00000000 00001101 00000000 00001010 ....
16 进制编辑器更好阅读一些
neo@MacBook-Pro ~/workspace/netkiller % hexdump -C netkiller.bin
00000000 00 00 04 00 00 ff 00 00 00 17 48 76 e8 00 40 48 |..........Hv..@H|
00000010 f5 c3 40 09 21 fb 54 43 ce 28 01 00 a5 96 48 66 |..@.!.TC.(....Hf|
00000020 6f 5c f0 00 2f 4e 65 74 6b 69 6c 6c 65 72 20 4a |o\../Netkiller J|
00000030 61 76 61 20 e6 89 8b e6 9c ad 20 2d 20 68 74 74 |ava ...... - htt|
00000040 70 3a 2f 2f 77 77 77 2e 6e 65 74 6b 69 6c 6c 65 |p://www.netkille|
00000050 72 2e 63 6e 8f d9 66 2f 67 00 54 0e 4e 00 88 4c |r.cn..f/g.T.N..L|
00000060 00 0d 00 0a |....|
00000064