1. Single-disk failure test
1. Compute the MD5 checksums of file1 and file2.
2. Pull a disk while file1 is being copied into the Ceph storage system. Expected result: the write is not interrupted; after the copy completes, its MD5 checksum matches the original.
3. Re-insert the disk while file2 is being copied into the Ceph storage system. Expected result: writes are not interrupted during the data-recovery window; after the copy completes, its MD5 checksum matches the original. (A sketch of the checksum workflow follows this list.)
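A minimal sketch of the checksum workflow, assuming the Ceph storage system is mounted at /mnt/cephfs (a hypothetical mount point; a filesystem on a mapped RBD image works the same way):

# md5sum file1 file2 > /root/checksums.txt
# cp file1 /mnt/cephfs/
# cp file2 /mnt/cephfs/
# cd /mnt/cephfs && md5sum -c /root/checksums.txt

md5sum -c reports OK for every file whose checksum matches the recorded value, which is the pass criterion for both test steps.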
2. Host failure domain (host)
# ceph osd tree
ID WEIGHT  TYPE NAME                     UP/DOWN REWEIGHT PRIMARY-AFFINITY
-6 0.17599 root root_rulecopyhost1
-5 0.08800     host ceph4_rulecopyhost1
 0 0.08800         osd.0                      up  1.00000          1.00000
-7 0.08800     host ceph2_rulecopyhost1
 1 0.08800         osd.1                      up  1.00000          1.00000
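A sketch of how this hierarchy and its CRUSH rule could be created, using the bucket names from the tree above (poolhost is a hypothetical pool name; on pre-Luminous releases the pool option is crush_ruleset, on Luminous and later it is crush_rule, and the rule id comes from ceph osd crush rule dump):

# ceph osd crush add-bucket root_rulecopyhost1 root
# ceph osd crush move ceph4_rulecopyhost1 root=root_rulecopyhost1
# ceph osd crush move ceph2_rulecopyhost1 root=root_rulecopyhost1
# ceph osd crush rule create-simple rulecopyhost1 root_rulecopyhost1 host
# ceph osd pool create poolhost 128 128
# ceph osd pool set poolhost crush_ruleset 1

With the failure domain set to host, the two replicas of each object land on different hosts (here ceph4_rulecopyhost1 and ceph2_rulecopyhost1), so losing one host cannot lose both copies.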
3. Rack failure domain (rack)
# ceph osd tree
ID  WEIGHT  TYPE NAME                UP/DOWN REWEIGHT PRIMARY-AFFINITY
 -7 0.26399 root root_rulerack
 -6 0.08800     rack rack02_rulerack
 -5 0.08800         host ceph3_rulerack
  2 0.08800             osd.2             up  1.00000          1.00000
 -9 0.08800     rack rack03_rulerack
 -8 0.08800         host ceph4_rulerack
  0 0.08800             osd.0             up  1.00000          1.00000
-11 0.08800     rack rack01_rulerack
-10 0.08800         host ceph2_rulerack
  1 0.08800             osd.1             up  1.00000          1.00000
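A sketch of how the rack layer could be built, again using the names from the tree above (repeat the add-bucket/move pair for rack02_rulerack/ceph3_rulerack and rack03_rulerack/ceph4_rulerack):

# ceph osd crush add-bucket root_rulerack root
# ceph osd crush add-bucket rack01_rulerack rack
# ceph osd crush move rack01_rulerack root=root_rulerack
# ceph osd crush move ceph2_rulerack rack=rack01_rulerack
# ceph osd crush rule create-simple rulerack root_rulerack rack

With the failure domain set to rack, the replicas are placed in different racks, so an entire rack can fail without losing data.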
3.1 Rack failure domain (rack): data recovery
Power off one rack; about 10 minutes later, data recovery begins.
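The roughly 10-minute delay matches Ceph's default mon_osd_down_out_interval of 600 seconds: a down OSD is only marked out (which triggers recovery) after this interval expires. The current value can be checked on a monitor host, and changed at runtime if a different window is wanted (mon.ceph2 is an assumed monitor name):

# ceph daemon mon.ceph2 config get mon_osd_down_out_interval
# ceph tell mon.* injectargs '--mon-osd-down-out-interval 600'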
Analysis of data migration in the rack failure domain with a 2-replica pool:
1. Object location before the failure (osd.1, osd.0):
osdmap e308 pool 'poolrack' (16) object 'rbd_data.c9b2a6b8b4567.0000000000000079' -> pg 16.bf45dc7c (16.7c) -> up ([1,0], p1) acting ([1,0], p1)
2. Object location while the rack is down (osd.1 only):
osdmap e310 pool 'poolrack' (16) object 'rbd_data.c9b2a6b8b4567.0000000000000079' -> pg 16.bf45dc7c (16.7c) -> up ([1], p1) acting ([1], p1)
3. About 10 minutes after the failure, recovery starts; once it completes, the object is located on (osd.1, osd.2). The up/acting set can be followed with the watch loop sketched after this list.
osdmap e314 pool 'poolrack' (16) object 'rbd_data.c9b2a6b8b4567.0000000000000079' -> pg 16.bf45dc7c (16.7c) -> up ([1,2], p1) acting ([1,2], p1)
4. Power the failed rack back on; its OSDs come back up and the data is backfilled:
osdmap e325 pool 'poolrack' (16) object 'rbd_data.c9b2a6b8b4567.0000000000000079' -> pg 16.bf45dc7c (16.7c) -> up ([1,0], p1) acting ([1,0], p1)
5. Conclusion for the rack failure domain: data on OSDs that go down is migrated to the surviving OSDs; after the failed rack is powered back on, the data is backfilled to the original OSDs.
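The mappings in steps 1-4 come from ceph osd map; a minimal watch loop to follow the up/acting set through failure, recovery, and backfill:

# watch -n 10 'ceph osd map poolrack rbd_data.c9b2a6b8b4567.0000000000000079'

Alternatively, ceph -w streams cluster events and shows recovery and backfill progress while the test runs.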
4. Room failure domain (room)