why back up?
- accidental deletion
- historical archive
- drive failure
we address only drive failure

basic idea: mirror over the net by copying writes
hard part: writer overruns reader
our action: write less
- RATE_HI: block # and data
- RATE_LO: block #
- RATE_OFF: nothing

kernel part:
- hooks at low level in disk drivers (currently wd and sd, others easy to add)
- cdev pseudo-device driver
- back-pressure: queue size limits (8K blocks with data, 64K without)

userland part, client host:
- talks to cdev driver
- talks over network to server
- maintains dirty bitmap
  - dirty bitmap has multiple levels, scaling factor 32
- five states
  - NOCONN: tries connection; on failure sleeps 1 minute and retries
  - SCANNING: requests block cksums, reads local blocks, compares, sends
  - CATCHUP: scans dirty bitmap, sends dirtied blocks
  - LIVE: idle, sending blocks as written
  - RESCAN: transient, used while waiting for abort

userland part, server host:
- talks to network, maintains backup file/partition

networking:
- one on server per client partition
- one connection per client
  - another connection does nothing until it passes the crypto exchange; then it replaces the current connection (which is dropped)
- after crypto exchange, simple protocol: one type byte, with more data following depending on the type
- block size (granularity of copying) is 512 bytes
- block numbers in wire protocol are 32 bits
  - => max partition size 2TB (2^32 blocks x 512 bytes)
  - relatively easy to raise this limit by revising the protocol
- one client-host-initiated TCP connection => works through NAT and firewalls
- bandwidth demands
  - must be above the average write rate or the client will never catch up
  - rescan takes a while on slow machines
  - during rescan, heavy s->c data flow
  - after rescan, almost entirely c->s data flow

crypto:
- caveat: IANAC (I am not a cryptographer)
- each end sends 16 random bytes
- each end computes hashes, generating a 256-byte arcfour key
- each end sends 16 random bytes, encrypted
- each end receives 16 bytes, decrypts, re-encrypts, and sends them back
- each end checks it got back what it sent
- shared secret is used when generating hashes, never sent even encrypted

security:
- against what?
- caveat: IANAC
- passive snooping data theft: good
- MitM data theft: good
- traffic analysis: weak
- cribs for cryptanalysis: weak (guessable block contents)
- random data disruption by active attacker: hopeless
  - will be fixed by the next undisrupted rescan
- backup server is potential weak point: has cleartext copy of disk
  - shared server means copies of many disks
  - see future work re encrypting this
- disk data exists in user VM on client host (for the paranoid)
- if server is remote, can defend against localized physical disaster
  - e.g., my work machine backs up to my home server

slashdot questions:
- der Mouse name
  - no relation to de Raadt
  - German, before I knew any German
- active filesystems?
  - yes
  - it mirrors all writes, not caring about files
- nothing new
  - never said it was
- vs raid1 over network block device
  - "producer overrunning consumer" issues
- why raw disk duping?
  - backs up unused space: yes, but "the steady state of disks is full"
  - geometries compatible: no need
- vs dump/restore, rsync, g4u (including dump -L)
  - disk-block granularity vs file granularity
  - sends only deltas during normal operation
  - liveness
  - ease of restoring
  - all eggs in one basket: no more so than dump/restore or rsync
  - back up everything: true
  - unmounted: no, an advantage over dump/restore, but not over rsync
- "shared secret" is an oxymoron
  - true to an extent, see future work

other:
- drbd
  - very similar
  - drbd expects a dedicated network
  - drbd is for Linux

OSes:
- NetBSD 1.4T+mouseisms
- NetBSD 2.0

experience:
- approaching one year in operation (2004-05-16)
- two disk failures, all data was safe each time
- rescans on boot are annoying
- roaming operation is totally painless
- fast server needed to withstand client attack on house powerup (every client rescanning at once)

possible future work:
- ports to other systems
  - hardest part is probably the kernel stuff
  - needs internals doc (esp. comments in code)
- public-key crypto
- encrypt disk data on client before sending
  - fixes server having cleartext copy of disk
  - must keep the key offline
  - alleviates crypto crib issue somewhat
- add decoy traffic to defeat traffic analysis
- strong packet signatures to defeat data disruption
- make client able to assume server copy doesn't change, and thus not rescan when it loses and regains the connection without a client restart
- add some kind of version number to the protocol
- try to improve rescans on boot
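
The "write less" policy (RATE_HI, RATE_LO, RATE_OFF) combined with the kernel's queue size limits can be pictured with the minimal C sketch below. It is an illustration, not the driver's code: the struct wq type, the functions wq_note_write(), wq_consume() and wq_reset_if_empty(), reading "8K" and "64K" as 8192 and 65536 queue entries, and the rule for returning to RATE_HI are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum rate { RATE_HI, RATE_LO, RATE_OFF };

#define MAX_DATA_ENTRIES  8192u   /* assumed meaning of "8K blocks with data" */
#define MAX_TOTAL_ENTRIES 65536u  /* assumed meaning of "64K without" */

/* Hypothetical bookkeeping for the kernel-to-userland write queue. */
struct wq {
    enum rate rate;
    size_t n_data;   /* queued entries carrying block # and data */
    size_t n_total;  /* all queued entries, with or without data */
};

/* Called for each disk write; returns the rate used to record it. */
enum rate wq_note_write(struct wq *q)
{
    if (q->rate == RATE_HI && q->n_data >= MAX_DATA_ENTRIES)
        q->rate = RATE_LO;      /* data queue full: queue block # only */
    if (q->rate == RATE_LO && q->n_total >= MAX_TOTAL_ENTRIES)
        q->rate = RATE_OFF;     /* even block #s no longer fit: queue nothing */

    switch (q->rate) {
    case RATE_HI:  q->n_data++; q->n_total++; break;
    case RATE_LO:  q->n_total++;              break;
    case RATE_OFF:                            break;
    }
    return q->rate;
}

/* Userland has taken one entry off the queue. */
void wq_consume(struct wq *q, int had_data)
{
    if (had_data)
        q->n_data--;
    q->n_total--;
}

/* Assumed policy: only once the queue is fully drained (and, after
 * RATE_OFF, the caller has arranged a rescan) go back to full rate. */
void wq_reset_if_empty(struct wq *q)
{
    if (q->n_total == 0)
        q->rate = RATE_HI;
}
```

Each step down keeps kernel memory bounded while telling userland less; after RATE_OFF the client has no record of which blocks changed and presumably has to fall back on a full comparison.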
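
One plausible reading of "dirty bitmap has multiple levels, scaling factor 32": level 0 keeps one bit per 512-byte block, and each bit of level n+1 records whether the corresponding 32-bit word of level n has any bit set. CATCHUP can then locate a dirty block by descending from a single top-level word instead of scanning the whole map. The layout and every name below (struct dirtymap, dm_init(), dm_set(), dm_clear(), dm_first()) are assumptions; this is a sketch, not the client's real data structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAXLEVELS 8    /* 2^32 blocks need 7 levels under this scheme */

struct dirtymap {
    int nlevels;
    uint32_t *lv[MAXLEVELS];    /* lv[0]: one bit per block */
};

void dm_free(struct dirtymap *m)
{
    for (int l = 0; l < m->nlevels; l++) free(m->lv[l]);
    m->nlevels = 0;
}

/* Allocate levels until a single 32-bit word summarizes everything. */
int dm_init(struct dirtymap *m, uint64_t nblocks)
{
    uint64_t nbits = nblocks;
    m->nlevels = 0;
    do {
        uint64_t nwords = (nbits + 31) / 32;
        uint32_t *w;
        if (m->nlevels == MAXLEVELS ||
            (w = calloc((size_t)nwords, sizeof *w)) == NULL) {
            dm_free(m);
            return -1;
        }
        m->lv[m->nlevels++] = w;
        nbits = nwords;         /* next level: one bit per word of this one */
    } while (nbits > 1);
    return 0;
}

/* Mark a block dirty: set its bit at every level. */
void dm_set(struct dirtymap *m, uint64_t blk)
{
    for (int l = 0; l < m->nlevels; l++) {
        m->lv[l][blk / 32] |= UINT32_C(1) << (blk % 32);
        blk /= 32;              /* index of this word at the next level up */
    }
}

/* Mark a block clean; propagate upward only while words become empty. */
void dm_clear(struct dirtymap *m, uint64_t blk)
{
    for (int l = 0; l < m->nlevels; l++) {
        uint32_t *w = &m->lv[l][blk / 32];
        *w &= ~(UINT32_C(1) << (blk % 32));
        if (*w != 0)
            break;              /* word still dirty: upper levels unchanged */
        blk /= 32;
    }
}

static int lowest_bit(uint32_t w)   /* w must be nonzero */
{
    int b = 0;
    while (!(w & 1)) { w >>= 1; b++; }
    return b;
}

/* Lowest dirty block number, or -1 if the map is clean. */
int64_t dm_first(const struct dirtymap *m)
{
    uint64_t idx = 0;           /* word index within the current level */
    for (int l = m->nlevels - 1; l >= 0; l--) {
        uint32_t w = m->lv[l][idx];
        if (w == 0)
            return -1;          /* can only happen at the top level */
        idx = idx * 32 + (uint64_t)lowest_bit(w);
    }
    return (int64_t)idx;
}
```

With 2^32 blocks this gives seven levels, so finding a dirty block touches only one word per level.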
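
The five client states can be written as a transition function. The outline names the states but not what moves the client between them, so the events below, the next_state() function, and several transitions (LIVE falling back to CATCHUP when only block numbers arrive, RESCAN leading back to SCANNING once the abort completes) are hypothetical, inferred from the state descriptions and the rate modes.

```c
#include <assert.h>

enum cstate { NOCONN, SCANNING, CATCHUP, LIVE, RESCAN };

/* Hypothetical events: the outline does not list them. */
enum cevent {
    EV_CONNECTED,       /* TCP connect and crypto exchange succeeded */
    EV_CONNECT_FAILED,  /* retry after sleeping 1 minute */
    EV_SCAN_DONE,       /* every block checksum compared */
    EV_BITMAP_CLEAN,    /* dirty bitmap has no set bits left */
    EV_BLOCKS_DIRTIED,  /* kernel sent block #s only (RATE_LO) */
    EV_NEED_RESCAN,     /* e.g. writes went unrecorded (RATE_OFF) */
    EV_ABORT_DONE,      /* in-flight work has been abandoned */
    EV_CONN_LOST
};

#define RETRY_SECONDS 60

/* Returns the next state; *sleep_secs is set when the caller should wait. */
enum cstate next_state(enum cstate s, enum cevent e, unsigned *sleep_secs)
{
    *sleep_secs = 0;
    if (e == EV_CONN_LOST)
        return NOCONN;
    switch (s) {
    case NOCONN:
        if (e == EV_CONNECTED) return SCANNING;
        if (e == EV_CONNECT_FAILED) { *sleep_secs = RETRY_SECONDS; return NOCONN; }
        break;
    case SCANNING:
        if (e == EV_SCAN_DONE) return CATCHUP;
        if (e == EV_NEED_RESCAN) return RESCAN;
        break;
    case CATCHUP:
        if (e == EV_BITMAP_CLEAN) return LIVE;
        if (e == EV_NEED_RESCAN) return RESCAN;
        break;
    case LIVE:
        if (e == EV_BLOCKS_DIRTIED) return CATCHUP;
        if (e == EV_NEED_RESCAN) return RESCAN;
        break;
    case RESCAN:
        if (e == EV_ABORT_DONE) return SCANNING;
        break;
    }
    return s;   /* event not meaningful in this state: stay put */
}
```

In this sketch every reconnect passes through SCANNING, which is the behavior the future-work item about not rescanning after a lost and regained connection would change.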
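
The wire protocol is described only as one type byte followed by type-dependent data, with 512-byte blocks and 32-bit block numbers. The sketch below encodes one plausible block-data message and checks the 2TB ceiling (2^32 blocks x 512 bytes = 2^41 bytes). The type code 'D', big-endian byte order, the message layout and the function names are assumptions, not the real protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLKSIZE 512
enum { MSG_BLOCK_DATA = 'D' };          /* hypothetical type byte */

/* Largest partition addressable with 32-bit block numbers: 2^41 bytes. */
#define MAX_PART_BYTES (((uint64_t)UINT32_MAX + 1) * BLKSIZE)

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Type byte, 4-byte block number, 512 data bytes; returns bytes written. */
size_t encode_block(unsigned char *buf, uint32_t blkno,
                    const unsigned char data[BLKSIZE])
{
    buf[0] = MSG_BLOCK_DATA;
    put_be32(buf + 1, blkno);
    memcpy(buf + 5, data, BLKSIZE);
    return 5 + BLKSIZE;
}

/* Returns 0 on success, -1 on a short buffer or an unexpected type byte. */
int decode_block(const unsigned char *buf, size_t len, uint32_t *blkno,
                 unsigned char data[BLKSIZE])
{
    if (len < 5 + BLKSIZE || buf[0] != MSG_BLOCK_DATA)
        return -1;
    *blkno = get_be32(buf + 1);
    memcpy(data, buf + 5, BLKSIZE);
    return 0;
}

int partition_fits(uint64_t bytes)
{
    return bytes <= MAX_PART_BYTES;
}
```

Widening the block-number field would raise the limit; any such change would benefit from the protocol version number listed under future work.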
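
The challenge-and-echo step of the crypto exchange can be modeled with plain arcfour (ARC4). In this sketch each side already holds two 256-byte keys, one per direction; the outline does not say which hash turns the shared secret and the exchanged 16-byte values into those keys, so that step is deliberately left out. The per-direction key split, the names struct arc4 and echo_check(), and the use of fresh cipher streams are assumptions; only one direction (A challenges B) is shown, and the other end does the mirror image.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NONCE_LEN 16

/* Plain ARC4 stream cipher state. */
struct arc4 { unsigned char s[256]; unsigned char i, j; };

void arc4_init(struct arc4 *c, const unsigned char *key, size_t keylen)
{
    unsigned char j = 0;
    for (int k = 0; k < 256; k++) c->s[k] = (unsigned char)k;
    for (int k = 0; k < 256; k++) {
        j = (unsigned char)(j + c->s[k] + key[k % keylen]);
        unsigned char t = c->s[k]; c->s[k] = c->s[j]; c->s[j] = t;
    }
    c->i = c->j = 0;
}

/* Encrypts or decrypts in place by XOR with the keystream. */
void arc4_crypt(struct arc4 *c, unsigned char *buf, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        c->i = (unsigned char)(c->i + 1);
        c->j = (unsigned char)(c->j + c->s[c->i]);
        unsigned char t = c->s[c->i]; c->s[c->i] = c->s[c->j]; c->s[c->j] = t;
        buf[k] ^= c->s[(unsigned char)(c->s[c->i] + c->s[c->j])];
    }
}

/* Hypothetical model: A encrypts its nonce with the A->B key, B decrypts
 * with its copy of that key and re-encrypts with the B->A key, and A
 * decrypts the echo and compares.  Returns 1 if A gets its nonce back. */
int echo_check(const unsigned char a_key_ab[256], const unsigned char a_key_ba[256],
               const unsigned char b_key_ab[256], const unsigned char b_key_ba[256],
               const unsigned char nonce[NONCE_LEN])
{
    struct arc4 a_out, a_in, b_in, b_out;
    unsigned char msg[NONCE_LEN];
    arc4_init(&a_out, a_key_ab, 256); arc4_init(&a_in, a_key_ba, 256);
    arc4_init(&b_in, b_key_ab, 256);  arc4_init(&b_out, b_key_ba, 256);

    memcpy(msg, nonce, NONCE_LEN);
    arc4_crypt(&a_out, msg, NONCE_LEN);   /* A -> B: encrypted nonce */
    arc4_crypt(&b_in, msg, NONCE_LEN);    /* B decrypts ... */
    arc4_crypt(&b_out, msg, NONCE_LEN);   /* ... re-encrypts, B -> A */
    arc4_crypt(&a_in, msg, NONCE_LEN);    /* A decrypts the echo */
    return memcmp(msg, nonce, NONCE_LEN) == 0;
}
```

If either side derived a different key, for instance because it holds a different shared secret, the echoed bytes decrypt to garbage and the check fails; that is how the exchange authenticates without the secret ever crossing the wire.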