Shasplit takes a large data block, splits it into smaller parts,
and puts those into an SHA-based content-addressed store.
Reassembling those parts is a trivial cat invocation.
Repeating parts (e.g. from previous split operations)
are stored only once,
which allows for efficient incremental backups
of whole LVM snapshots via rsync.
Shasplit shows its strengths on encrypted block devices,
but might be useful for non-encrypted data, too.
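The idea of splitting and storing each part only once can be illustrated with a minimal Python sketch. This is not Shasplit's actual code: the function names `split_and_store` and `reassemble` are hypothetical, and an in-memory dict stands in for the on-disk store.

```python
import hashlib


def split_and_store(data, store, part_size=4 * 1024 * 1024):
    """Split data into fixed-size parts and keep each part under its SHA-256 digest.

    Returns the ordered list of digests needed to reassemble the data.
    (Hypothetical helper for illustration; the dict replaces the real store.)
    """
    digests = []
    for offset in range(0, len(data), part_size):
        part = data[offset:offset + part_size]
        digest = hashlib.sha256(part).hexdigest()
        # A part that is already known is not stored a second time.
        store.setdefault(digest, part)
        digests.append(digest)
    return digests


def reassemble(digests, store):
    """Concatenate the stored parts in their original order."""
    return b"".join(store[d] for d in digests)


store = {}
first = split_and_store(b"A" * 8 + b"B" * 4, store, part_size=4)
print(len(store))  # 2: the two identical "AAAA" parts share one entry
second = split_and_store(b"A" * 8 + b"C" * 4, store, part_size=4)
print(len(store))  # 3: only the new "CCCC" part was added
```

The second run adds a single new part, which is why a later sync only has to transfer the parts that changed.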
Have fun!
Installation for a single user (assuming that ~/bin is in PATH):
git clone https://github.com/vog/shasplit.git
ln -s $(pwd)/shasplit/shasplit.py ~/bin/shasplit
Or, if you prefer a system-wide installation:
git clone https://github.com/vog/shasplit.git /opt/shasplit
ln -s /opt/shasplit/shasplit.py /usr/bin/shasplit
Shasplit stores everything in the ~/.shasplit directory.
If you don't like that, place a symlink to the desired location (here: /backup):
ln -sT /backup ~/.shasplit
By default, Shasplit splits the data into parts of size 4 MiB and hashes each part with SHA-256, but will work equally well with any other part size and any other strong secure hash algorithm.
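As a rough illustration of how part size and hash algorithm are independent parameters, the following Python sketch reads a binary stream in parts and hashes each one. The function name `hash_parts` and its parameters are assumptions made for this example, not Shasplit's real interface.

```python
import hashlib
import io


def hash_parts(stream, part_size=4 * 1024 * 1024, algorithm="sha256"):
    """Yield (hex digest, part length) for each part read from a binary stream.

    Hypothetical helper: part_size and algorithm mirror the defaults described
    above (4 MiB, SHA-256), but any size and any hashlib algorithm name work.
    Assumes a buffered stream, where read(n) returns n bytes until EOF.
    """
    while True:
        part = stream.read(part_size)
        if not part:
            break
        yield hashlib.new(algorithm, part).hexdigest(), len(part)


# Ten bytes with a part size of 4 give parts of length 4, 4 and 2.
for digest, length in hash_parts(io.BytesIO(b"x" * 10), part_size=4):
    print(length, digest)

# The same data hashed with a different algorithm and part size.
for digest, length in hash_parts(io.BytesIO(b"x" * 10), part_size=5, algorithm="sha512"):
    print(length, digest)
```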
Add a new backup from /dev/vg0/foobar with name foobar, keeping at most 7 completed backups:
shasplit add foobar 7 < /dev/vg0/foobar
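The "at most 7 completed backups" rule can be pictured with a small Python sketch. It is only an assumption about how such a retention limit might be applied: the function `instances_to_prune` is hypothetical, and the sketch assumes that incomplete instances are left untouched, which the text above does not specify.

```python
def instances_to_prune(instances, max_completed):
    """Return the timestamps of the oldest completed instances beyond the limit.

    instances: list of (timestamp, is_complete) tuples. ISO 8601 timestamps
    sort chronologically as plain strings.
    Assumption: incomplete instances are never selected for pruning here.
    """
    completed = sorted(ts for ts, is_complete in instances if is_complete)
    excess = len(completed) - max_completed
    return completed[:excess] if excess > 0 else []


instances = [
    ("2013-05-23T03:42:42", True),
    ("2013-05-22T03:42:47", False),
    ("2013-05-21T03:42:23", True),
    ("2013-05-20T03:42:10", True),
]
print(instances_to_prune(instances, 2))  # ['2013-05-20T03:42:10']
```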
If you back up a running system, don't forget to create a snapshot before the backup and to release it afterwards. For LVM volumes, Shasplit takes care of snapshots automatically if you specify the volume group:
shasplit add vg0 foobar 7
The default LVM snapshot size is 1 GiB. If this is too small or too large, you can specify the exact snapshot size in MiB as an additional argument. For example, the following command will use an LVM snapshot that is 10 GiB (10240 MiB) in size:
shasplit add vg0 foobar 7 10240
If ~/.shasplit is located on a remote file system such as NFS or SSHFS, you are done.
Otherwise, you'll have to sync the ~/.shasplit directory to the backup system.
When using rsync, you should use the options -W for improved performance and --delete-after to keep the old backups until the new backups are complete:
rsync -aW --delete-after ~/.shasplit/ backup@backupserver:.shasplit/
Show status information for all instances:
shasplit status
Example output:
foobar
    2013-05-23T03:42:42  4294967296  100%
    2013-05-22T03:42:47  4294967296   75%  incomplete
    2013-05-21T03:42:23  4294967296  100%
raboof
    2013-05-23T03:38:24   (unknown)    0%  incomplete
    2013-05-22T03:38:27   671088640  100%
(Not yet implemented) Perform a thorough integrity check and report all parts and instances that are incomplete or inconsistent:
shasplit check
(Not yet implemented) Run the internal tests:
shasplit test
Recover the latest complete backup of foobar with:
shasplit recover foobar > /dev/vg0/foobar
Recover the backup of foobar at 2013-05-23T03:42:42:
shasplit recover foobar 2013-05-23T03:42:42 > /dev/vg0/foobar
If Shasplit is not available on the target system, it is very simple to recover your data manually, using standard Unix tools.
First, you have to decide which instance you want to look at:
cd ~/.shasplit/foobar/2013-05-23T034242
Then, recovery is a trivial cat invocation:
cat */* > /dev/vg0/foobar
In case of "Argument list too long" errors, use this instead:
find . -mindepth 2 | sort | xargs cat > /dev/vg0/foobar
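Where neither Shasplit nor a convenient shell is at hand, the same concatenation can be sketched in Python. The function name `recover_instance` is hypothetical, and the sketch assumes what the commands above imply: the part files sit exactly one directory level below the instance directory, and their sorted paths give the correct order.

```python
import shutil
from pathlib import Path


def recover_instance(instance_dir, target_path):
    """Concatenate all part files of an instance, in sorted order, into target_path.

    Hypothetical helper mirroring `cat */*`: only non-hidden files one
    directory level below the instance directory are treated as parts.
    Returns the number of parts written.
    """
    instance = Path(instance_dir)
    parts = sorted(
        p for p in instance.glob("*/*")
        if p.is_file()
        and not p.name.startswith(".")
        and not p.parent.name.startswith(".")
    )
    with open(target_path, "wb") as target:
        for part in parts:
            with open(part, "rb") as source:
                # Stream each part so large backups do not need to fit in memory.
                shutil.copyfileobj(source, target)
    return len(parts)
```

A call such as `recover_instance("/root/.shasplit/foobar/2013-05-23T034242", "/dev/vg0/foobar")` would correspond to the cat command above; the paths here are only examples.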
Before recovery, you may want to run a fast check for completeness by hand:
wc -c */*; cat size
To be safe, you can also run an integrity check for that instance by hand:
cat */* | shasum -a 256; cat hash
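Both manual checks can be combined into one automated comparison, sketched below in Python. The function `check_instance` is hypothetical. It assumes that the `size` file holds the total byte count as a decimal number and that the `hash` file starts with the hex SHA-256 digest of the whole data, which matches the commands above but is not confirmed in detail here.

```python
import hashlib
from pathlib import Path


def check_instance(instance_dir):
    """Return True if the parts match the recorded size and SHA-256 hash.

    Assumptions (hypothetical, based on the manual checks above):
    - parts are the non-hidden files one level below the instance directory,
    - `size` contains the total size in bytes as a decimal number,
    - `hash` begins with the hex SHA-256 digest of the concatenated parts.
    """
    instance = Path(instance_dir)
    parts = sorted(
        p for p in instance.glob("*/*")
        if p.is_file()
        and not p.name.startswith(".")
        and not p.parent.name.startswith(".")
    )
    digest = hashlib.sha256()
    total = 0
    for part in parts:
        data = part.read_bytes()
        digest.update(data)
        total += len(data)
    expected_size = int((instance / "size").read_text().strip())
    expected_hash = (instance / "hash").read_text().split()[0]
    return total == expected_size and digest.hexdigest() == expected_hash
```

Reading each part fully into memory is acceptable for parts of a few MiB; for much larger part sizes a chunked read would be the safer choice.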
You can always enable debug output via the SHASPLIT_DEBUG environment variable:
SHASPLIT_DEBUG=1 shasplit add vg0 foobar 7
Design goals:
cat
Base directory layout:
Directory layout of each instance:
Directory layout of .data: