why back up?
- accidental deletion
- historical archive
- drive failure
we address only drive failure

basic idea: mirror over the net by copying writes
hard part: writer overruns reader
our action: write less
- RATE_HI: block # and data
- RATE_LO: block #
- RATE_OFF: nothing

kernel part:
- hooks at low level in disk drivers
  (currently wd and sd, others easy to add)
- cdev pseudo-device driver
- back-pressure: queue size limits (8K blocks with data, 64K without)
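
A minimal sketch of how the queue limits and the three rates could fit together. The outline gives only the limits and the rate names. The rest is assumed for illustration: that "8K" and "64K" mean 8192 and 65536 queue entries, that hitting a limit is what triggers the step down to the next rate, and the WriteQueue class itself (hypothetical; the real logic lives in the kernel driver).

```python
from collections import deque

RATE_HI, RATE_LO, RATE_OFF = "RATE_HI", "RATE_LO", "RATE_OFF"

# ASSUMPTION: "8K" and "64K" are read as entry counts.
MAX_WITH_DATA = 8 * 1024
MAX_WITHOUT_DATA = 64 * 1024


class WriteQueue:
    """Hypothetical model of the driver's queue of intercepted writes."""

    def __init__(self, max_with_data=MAX_WITH_DATA,
                 max_without_data=MAX_WITHOUT_DATA):
        self.max_with_data = max_with_data
        self.max_without_data = max_without_data
        self.entries = deque()   # (block number, data or None)
        self.with_data = 0       # how many queued entries carry data
        self.rate = RATE_HI

    def record_write(self, blkno, data):
        """Queue one intercepted write; return the rate that applied to it."""
        # ASSUMPTION: a full queue is what makes the writer "write less".
        if self.rate == RATE_HI and self.with_data >= self.max_with_data:
            self.rate = RATE_LO
        if self.rate == RATE_LO and len(self.entries) >= self.max_without_data:
            self.rate = RATE_OFF

        if self.rate == RATE_HI:
            self.entries.append((blkno, data))   # block # and data
            self.with_data += 1
        elif self.rate == RATE_LO:
            self.entries.append((blkno, None))   # block # only
        # RATE_OFF: nothing is recorded
        return self.rate
```

With tiny limits (2 and 4) five writes in a row with no reader draining the queue step through all three rates; how the rate recovers once the reader catches up is not covered by this sketch.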

userland part, client host:
- talks to cdev driver
- talks over network to server
- maintains dirty bitmap
  - dirty bitmap has multiple levels, scaling factor 32
- five states
  - NOCONN: tries connection, on failure sleeps 1 minute and retries
  - SCANNING: requests block cksums, reads local blocks, compares, sends
  - CATCHUP: scans dirty bitmap, sends dirtied blocks
  - LIVE: idle, sending blocks as written
  - RESCAN: transient, used while waiting for abort
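
A minimal sketch of a multi-level dirty bitmap with scaling factor 32, as used by CATCHUP above: each bit at one level summarizes 32 bits of the level below, so a scan can skip clean regions without looking at every block. The class and method names are hypothetical, and one byte per bit is used for readability where a real implementation would presumably pack bits into words.

```python
FACTOR = 32  # scaling factor between bitmap levels


class DirtyBitmap:
    """Hypothetical multi-level dirty bitmap; levels[0] has one flag per block."""

    def __init__(self, nblocks):
        self.levels = []
        n = nblocks
        while True:
            # ASSUMPTION: one byte per flag, for clarity only.
            self.levels.append(bytearray(n))
            if n <= 1:
                break
            n = (n + FACTOR - 1) // FACTOR

    def mark(self, block):
        """Mark a block dirty and set the summary flag at every level above."""
        idx = block
        for level in self.levels:
            level[idx] = 1
            idx //= FACTOR

    def clear(self, block):
        """Clear a block, then clear each summary flag whose 32 children are clean."""
        idx = block
        self.levels[0][idx] = 0
        for depth in range(1, len(self.levels)):
            parent = idx // FACTOR
            lo = parent * FACTOR
            if any(self.levels[depth - 1][lo:lo + FACTOR]):
                break
            self.levels[depth][parent] = 0
            idx = parent

    def dirty_blocks(self):
        """Yield dirty block numbers in order, descending only into set flags."""
        def walk(depth, idx):
            if not self.levels[depth][idx]:
                return
            if depth == 0:
                yield idx
                return
            lo = idx * FACTOR
            hi = min(lo + FACTOR, len(self.levels[depth - 1]))
            for child in range(lo, hi):
                yield from walk(depth - 1, child)

        top = len(self.levels) - 1
        for idx in range(len(self.levels[top])):
            yield from walk(top, idx)
```

For 100000 blocks this builds five levels (100000, 3125, 98, 4 and 1 flags), so finding a lone dirty block costs a few dozen flag tests rather than a scan of the whole bottom level.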

userland part, server host:
- talks to network, maintains backup file/partition

networking:
- one <address,port> on server per client partition
- one connection per client
  - another connection does nothing until it passes the crypto exchange;
    then it replaces current connection (which is dropped)
  - after crypto exchange, simple protocol: one type byte, with
    more data following depending on the type
- block size (granularity of copying) is 512 bytes
- block numbers in wire protocol are 32 bits
  - => max partition size 2TB (2^32 blocks x 512 bytes)
  - relatively easy to raise this limit by revising the protocol
- one client-host-initiated TCP connection => works through NAT and firewalls
- bandwidth demands
  - must be above the average write rate or client will never catch up
  - rescan takes a while on slow machines
  - during rescan, heavy s->c data flow
  - after rescan, almost entirely c->s data flow
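
A minimal sketch of the post-handshake framing described above: one type byte, then data whose layout depends on the type, with 32-bit block numbers and 512-byte blocks. The two message types, their numeric values, and network byte order are all assumptions made for illustration; the outline does not list the real message types.

```python
import struct

BLOCK_SIZE = 512       # granularity of copying, from the outline
MAX_BLOCKS = 1 << 32   # block numbers are 32 bits on the wire

# ASSUMPTION: hypothetical type codes, not the real protocol's.
TYPE_BLOCK = 0x01      # block number followed by 512 bytes of data
TYPE_BLOCKNO = 0x02    # block number only


def encode(msg_type, blkno, data=b""):
    """Build one message: type byte, 32-bit block number, optional data."""
    if not 0 <= blkno < MAX_BLOCKS:
        raise ValueError("block number does not fit in 32 bits")
    # ASSUMPTION: network (big-endian) byte order.
    header = struct.pack("!BI", msg_type, blkno)
    if msg_type == TYPE_BLOCK:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        return header + data
    if msg_type == TYPE_BLOCKNO:
        return header
    raise ValueError("unknown message type")


def decode(buf):
    """Parse one message from buf; return (type, blkno, data, bytes consumed)."""
    msg_type = buf[0]
    (blkno,) = struct.unpack_from("!I", buf, 1)
    if msg_type == TYPE_BLOCK:
        data = bytes(buf[5:5 + BLOCK_SIZE])
        if len(data) != BLOCK_SIZE:
            raise ValueError("short message")
        return msg_type, blkno, data, 5 + BLOCK_SIZE
    if msg_type == TYPE_BLOCKNO:
        return msg_type, blkno, b"", 5
    raise ValueError("unknown message type")
```

Because the receiver learns the length from the type byte, messages can be concatenated on the TCP stream with no extra length field; the 32-bit field is what caps the partition at 2^32 blocks of 512 bytes.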

crypto:
- caveat: IANAC (I am not a cryptographer)
- each end sends 16 random bytes
- each end computes hashes, generating 256-byte arcfour key
- each end sends 16 random bytes, encrypted
- each end receives 16 bytes, decrypts, encrypts, and sends
- each end checks it got what it sent
- shared secret is used when generating hashes, never sent even encrypted
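
A minimal simulation of the exchange above, with both ends in one process. The steps follow the outline; everything else is an assumption made so the sketch runs: SHA-256 as the hash (the outline does not name one), the way the 256-byte key is expanded from it, and a separate key per direction. It illustrates the message flow only and is not a vetted design.

```python
import hashlib
import os


class Arcfour:
    """Plain RC4 stream cipher; encryption and decryption are the same call."""

    def __init__(self, key):
        self.s = list(range(256))
        j = 0
        for i in range(256):
            j = (j + self.s[i] + key[i % len(key)]) % 256
            self.s[i], self.s[j] = self.s[j], self.s[i]
        self.i = self.j = 0

    def crypt(self, data):
        out = bytearray()
        for byte in data:
            self.i = (self.i + 1) % 256
            self.j = (self.j + self.s[self.i]) % 256
            self.s[self.i], self.s[self.j] = self.s[self.j], self.s[self.i]
            out.append(byte ^ self.s[(self.s[self.i] + self.s[self.j]) % 256])
        return bytes(out)


def derive_key(secret, client_nonce, server_nonce, label):
    """Expand secret and nonces to 256 bytes. ASSUMPTION: SHA-256 with a counter."""
    material = b""
    counter = 0
    while len(material) < 256:
        material += hashlib.sha256(
            bytes([counter]) + label + secret + client_nonce + server_nonce
        ).digest()
        counter += 1
    return material[:256]


class Endpoint:
    def __init__(self, secret, is_client):
        self.secret = secret
        self.is_client = is_client
        self.nonce = os.urandom(16)      # step 1: 16 random bytes, sent in clear
        self.challenge = os.urandom(16)  # step 3: 16 random bytes, sent encrypted

    def set_keys(self, peer_nonce):
        # step 2: hash secret and nonces; the secret itself is never sent.
        c, s = ((self.nonce, peer_nonce) if self.is_client
                else (peer_nonce, self.nonce))
        c2s = derive_key(self.secret, c, s, b"c2s")  # ASSUMPTION: per-direction keys
        s2c = derive_key(self.secret, c, s, b"s2c")
        self.tx = Arcfour(c2s if self.is_client else s2c)
        self.rx = Arcfour(s2c if self.is_client else c2s)


def handshake(client_secret, server_secret):
    """Run the exchange; True only if both ends got back what they sent."""
    client, server = Endpoint(client_secret, True), Endpoint(server_secret, False)
    client.set_keys(server.nonce)
    server.set_keys(client.nonce)
    c_msg = client.tx.crypt(client.challenge)           # step 3
    s_msg = server.tx.crypt(server.challenge)
    c_echo = client.tx.crypt(client.rx.crypt(s_msg))    # step 4
    s_echo = server.tx.crypt(server.rx.crypt(c_msg))
    client_ok = client.rx.crypt(s_echo) == client.challenge  # step 5
    server_ok = server.rx.crypt(c_echo) == server.challenge
    return client_ok and server_ok
```

Calling handshake() with the same secret on both sides succeeds; with different secrets the derived keys differ, the echoed bytes come back as garbage, and the check fails.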

security:
- against what?
- caveat: IANAC
- passive snooping data theft: good
- MitM data theft: good
- traffic analysis: weak
- cribs for cryptanalysis: weak (guessable block contents)
- random data disruption by active attacker: hopeless
  - will be fixed by the next un-disrupted rescan
- backup server is potential weak point: has cleartext copy of disk
  - shared server means copies of many disks
  - see future work re encrypting this
- disk data exists in user VM on client host (for the paranoid)
- if server is remote, can defend against localized physical disaster
  - eg, my work machine backs up to my home server

slashdot questions
- der Mouse name
  - no relation to de Raadt
  - German, before I knew any German
- active filesystems?
  - yes - it mirrors all writes, not caring about files
- nothing new
  - never said it was
- vs raid1 over network block device
  - "producer overrunning consumer" issues
- why raw disk duping?
  - backs up unused space: yes, but "the steady state of disks is full"
  - geometries compatible: no need
- vs dump/restore, rsync, g4u (including dump -L)
  - disk-block granularity vs file granularity
  - sends only deltas during normal operation
  - liveness
  - ease of restoring
  - all eggs in one basket: no more so than dump/restore or rsync
  - back up everything: true
  - unmounted: no, an advantage over dump/restore, but not rsync
- "shared secret" is an oxymoron
  - true to an extent, see future work

other
- drbd
  - very similar
  - drbd expects a dedicated network
  - drbd is for Linux

OSes:
- NetBSD 1.4T+mouseisms
- NetBSD 2.0

experience:
- approaching one year in operation (2004-05-16)
- two disk failures, all data was safe each time
- rescans on boot are annoying
- roaming operation is totally painless
- fast server needed to withstand the onslaught of clients all rescanning
  at once on house powerup

possible future work:
- ports to other systems
  - hardest part is probably the kernel stuff
- needs internals doc (esp. comments in code)
- public-key crypto
- encrypt disk data on client before sending
  - fixes server having cleartext copy of disk
  - must keep the key offline
  - alleviates crypto crib issue somewhat
- add decoy traffic to defeat traffic analysis
- strong packet signatures to defeat data disruption
- make client able to assume server copy doesn't change and thus not rescan
  when it loses and regains the connection without client restart
- add some kind of version number to the protocol
- try to improve rescans on boot