Designing a Personal Data Backup Scheme (Part 2)

2019-04-13

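Preface

In the previous post I designed a backup scheme built around an external hard drive. This post covers my first attempt at putting that scheme into practice. Having tried it, I found that its cost is lower than I had expected and quite acceptable.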

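Theory: rsync

rsync is the workhorse of my backup system, so it deserves careful study. Below are my notes from reading man rsync.

The key to rsync is its remote-update protocol, which transfers only the differences between the two sets of files at either end. The algorithm it uses is an efficient checksum-search algorithm.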

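rsync's features include:

Support for copying links, devices, owners, groups, and permissions
exclude and exclude-from options similar to GNU tar
A CVS exclude mode for ignoring the same files that CVS would ignore
Support for ssh or rsh
No super-user privileges required
Pipelining of file transfers to minimize latency costs
Support for anonymous or authenticated rsync daemons (ideal for mirroring)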

rsync can copy files not only to another machine over the network but also locally. Here I mainly care about the latter.

Basic usage:

rsync -t *.c foo:src/

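Where:

*.c matches all files ending in .c in the current directory
foo is the host name
src/ is the destination path on foo
If a file already exists on the remote side, only the differences are transferred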

rsync -avz foo:src/bar /data/tmp

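Where:

This recursively copies the files under src/bar on machine foo into /data/tmp/bar on the local machine; note that the directory name bar is carried over
Files are transferred in archive mode, which preserves symbolic links, devices, attributes, permissions, ownerships, and so on
Compression is used to reduce the amount of data transferred
If a trailing slash is added after bar, the contents of bar are copied directly into /data/tmp/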
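The trailing-slash rule is easy to get wrong, so here is a small local sketch of it. It works only on throwaway directories created with mktemp; the function name demo_trailing_slash and the file a.txt are made up for illustration.

```shell
# Show how a trailing slash on the source changes where files land.
# Everything happens inside a temporary directory that is removed at the end.
demo_trailing_slash() {
  local work
  work=$(mktemp -d) || return 1
  mkdir -p "$work/src/bar" "$work/with_name" "$work/contents_only"
  echo "hello" > "$work/src/bar/a.txt"

  # No trailing slash: the directory "bar" itself is recreated in the destination.
  rsync -a "$work/src/bar" "$work/with_name"

  # Trailing slash: only the contents of "bar" are copied into the destination.
  rsync -a "$work/src/bar/" "$work/contents_only"

  # List the resulting files relative to the work directory.
  (cd "$work" && find with_name contents_only -type f | sort)
  rm -rf "$work"
}
```

Calling demo_trailing_slash prints contents_only/a.txt and with_name/bar/a.txt, which shows that bar is only recreated when the source has no trailing slash.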

Options of interest:

Short	Long	Description
-r	--recursive	recurse into directories
-z	--compress	compress file data during the transfer
-h	--human-readable	output numbers in a human-readable format
	--progress	show progress during transfer
	--delete	delete extraneous files from dest dirs
-m	--prune-empty-dirs	prune empty directory chains from file-list
	--stats	print a verbose set of statistics on the file transfer, showing how effective the rsync algorithm is for your data
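Since --delete removes files, it seems worth previewing a run before doing it for real. The sketch below combines the options from the table with rsync's --dry-run option, which is not in the table above. The function name preview_sync and the RSYNC override variable are my own additions, not part of rsync.

```shell
# Preview a sync without changing anything on disk.
# $1 = source directory, $2 = destination directory.
# Setting RSYNC=echo prints the arguments instead of running rsync,
# which is handy for checking what would be executed.
preview_sync() {
  "${RSYNC:-rsync}" -rhm --delete --stats --progress --dry-run "$1" "$2"
}
```

Dropping --dry-run from the command turns the preview into the real transfer.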

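Two Repositories

In the previous post I introduced the concept of a Repository. A Repository is a directory that establishes a mapping between my laptop and the external hard drive.

But I ran into a problem: my laptop has 500GB of storage and the external drive has 2TB, so the size of a Repository directory is capped by the smaller of the two disks.

To solve this, I decided to split it into two Repositories:

Main Repository: time-sensitive material that needs a two-way mapping
Secondary Repository: less time-sensitive material; it does not matter if the laptop's copy is incomplete, because the external drive holds the full set and I can fetch from it on demand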

Main Repository

I started by backing up the main Repository:

rsync -r --progress --stats /Users/maxiee/Maxiee备份库 "/Volumes/TOSHIBA EXT"

The backup started at 10:34 and finished at 11:01, taking roughly half an hour and backing up 63.8GB in total.

rsync printed the following:

Number of files: 12905
Number of files transferred: 9365
Total file size: 63802158333 bytes
Total transferred file size: 63802158333 bytes
Literal data: 63802158333 bytes
Matched data: 0 bytes
File list size: 442776
File list generation time: 0.277 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 63810826983
Total bytes received: 227290

sent 63810826983 bytes  received 227290 bytes  39621890.27 bytes/sec
total size is 63802158333  speedup is 1.00

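Note:

The first backup takes this long because every file has to be copied in full. rsync works incrementally, so later backups should be much faster. One caveat from man rsync: the quick check that skips unchanged files compares size and modification time, so it only helps when times are preserved with -t (or -a, which implies it). With -r alone, as in the command above, the next run updates every file again.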
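As a sanity check on the statistics, the elapsed time and the speedup figure can be recomputed from the numbers rsync printed. This is only a sketch: the function name transfer_summary is made up, and it assumes speedup is the total file size divided by the bytes sent plus received.

```shell
# Recompute elapsed time and speedup from the rsync --stats output.
transfer_summary() {
  local sent=63810826983        # "Total bytes sent"
  local received=227290         # "Total bytes received"
  local total_size=63802158333  # "total size is ..."
  local rate=39621890.27        # bytes/sec as reported by rsync

  awk -v s="$sent" -v r="$received" -v t="$total_size" -v rate="$rate" 'BEGIN {
    moved = s + r   # bytes actually moved in both directions
    printf "elapsed: %.1f minutes\n", moved / rate / 60
    printf "speedup: %.2f\n", t / moved
  }'
}
```

Calling transfer_summary prints an elapsed time of 26.8 minutes, which matches the 10:34 to 11:01 wall-clock times, and a speedup of 1.00, which is expected for a first run where nothing on the drive could be reused.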

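Secondary Repository

Backing up the secondary Repository is essentially the same as backing up the main one.

The difference lies in the mapping between the two sides.

The main Repository aims to keep the laptop and the external drive exactly identical.

The secondary Repository aims for the following:

The external drive holds the full set, and the laptop always holds a subset of it
The laptop may contain new content
Running rsync from the laptop to the external drive adds the new content
This way the secondary Repository can make full use of the external drive's capacity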
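One way to express the difference between the two mappings is in the rsync options themselves. The sketch below is my reading of the goals listed above, not a tested setup: the function names and the RSYNC override are made up, and using --delete for the main Repository is an assumption. It covers only the laptop-to-drive direction.

```shell
# $1 = Repository directory on the laptop, $2 = destination on the drive.
# Setting RSYNC=echo prints the arguments instead of running rsync.

# Main Repository: make the drive's copy match the laptop exactly.
# --delete removes files from the drive that no longer exist on the laptop,
# so it is worth previewing with --dry-run first.
sync_main() {
  "${RSYNC:-rsync}" -rt --delete --stats "$1" "$2"
}

# Secondary Repository: only add or update files, never delete,
# because the drive holds the full set and the laptop only part of it.
sync_secondary() {
  "${RSYNC:-rsync}" -rt --stats "$1" "$2"
}
```

The only difference is --delete: leaving it out is what lets the drive keep files that the laptop no longer has room for. Both functions add -t so that unchanged files can be skipped on later runs.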

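TODO1: Backing Up the Backup

An external hard drive is itself an unreliable medium. If the data is backed up to only one external drive, the backup is lost as soon as that drive fails.

So my next step is to back up the backup.

This requires connecting two external drives to the laptop (also designated main and secondary) and running rsync between them.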

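TODO2: Tooling

Typing the commands by hand each time is fine for now, but eventually I want to wrap them in a single command so that all I have to do is type that one command.

That is easy enough: an alias in bash_profile will do.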
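A shell function is a small step up from an alias, because it can check that the drive is mounted before running. Here is a minimal sketch that could go in bash_profile; the name backup_main and the RSYNC override are made up, and the default paths are the ones used in the main Repository command earlier.

```shell
# Back up the main Repository with one command.
# Optional arguments: $1 = source directory, $2 = destination.
# Setting RSYNC=echo prints the arguments instead of running rsync.
backup_main() {
  local src="${1:-/Users/maxiee/Maxiee备份库}"
  local dest="${2:-/Volumes/TOSHIBA EXT}"

  # Stop early if the drive is not mounted, so that rsync does not
  # write into an ordinary directory by mistake.
  if [ ! -d "$dest" ]; then
    echo "destination not found: $dest" >&2
    return 1
  fi

  "${RSYNC:-rsync}" -r --progress --stats "$src" "$dest"
}
```

After reloading the profile, typing backup_main runs the same command as before, and passing two paths overrides the defaults.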