Personally cloudy with a chance of redundant offsite backup.
Despite extensive searching, I've been unable to locate a storage system that is a pure drop-in implementation for all of my storage needs. For a young adult, my personal storage needs aren't that massive. I have a single home server, Fenris, which I primarily use for long-term file storage. My desktop, Odin, and my laptops, Huginn and Muninn, are clients for my storage system, and I frequently make use of it! Using a combination of a multi-disk BTRFS setup, Samba, and SSHFS, Fenris provides me with a very convenient place to dump all of my files without any worry that I'll lose everything at a moment's notice from a drive failure. At the moment, I'm only using 80% of the storage available on Fenris, and a lot of that usage comes from poor organizing on my part.
So why am I writing this then?
How much of a poor organizational habit (whether it concerns computer information or physical objects) is caused by the environment that surrounds that habit? I postulate that a full 70% or more of my bad computer information organization derives from a poor environment in which to do that organization. Needless to say, I need an environment overhaul.
I've identified a list of features that my "not perfect, but pretty close" personal storage system must have.
- Storage is not inherently centralized.
- Access to the storage system can be fully transparent (i.e. can be mounted as an arbitrary folder).
- Adding new clients / nodes to the system is relatively painless.
- Can handle frequent node connects / disconnects (laptops turned off / on).
- Frequently accessed AND user designated files / folders always cached locally.
- Data cached locally can be used without being connected to other nodes. (Laptop away from home).
- Snapshot support. Even better if on a per file, or per folder, basis.
- User-configurable redundancy settings (i.e. specify how many copies of a given file should be in the network simultaneously).
- Allow nodes to operate primarily as storage nodes or clients, allowing the server to handle the storage and redundancy, while laptops cache their files locally with permanent storage on the server.
- Some concession to offsite backup, including incremental transfer, and not requiring all of my nodes to be online simultaneously during a backup.
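To make the redundancy and caching requirements a little more concrete, here is a minimal Python sketch of how per-folder rules might look. Everything in it is hypothetical: `Policy`, `policy_for`, `under_replicated`, the folder names, and the example data are my own illustration, not the configuration format of Ceph, Coda, or any other existing system.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-folder rule; not taken from any real tool."""
    prefix: str        # folder the rule applies to
    copies: int        # how many nodes should hold each file
    pin_locally: bool  # always keep a cached copy on client machines


def policy_for(path, policies, default):
    """Return the rule whose folder contains `path`, preferring the deepest one."""
    target = PurePosixPath(path)
    best, best_depth = default, -1
    for rule in policies:
        prefix = PurePosixPath(rule.prefix)
        if prefix == target or prefix in target.parents:
            depth = len(prefix.parts)
            if depth > best_depth:
                best, best_depth = rule, depth
    return best


def under_replicated(holders, policies, default):
    """List files stored on fewer nodes than their rule asks for."""
    return sorted(
        path for path, nodes in holders.items()
        if len(nodes) < policy_for(path, policies, default).copies
    )


DEFAULT = Policy(prefix="/", copies=1, pin_locally=False)
POLICIES = [
    Policy(prefix="/documents", copies=3, pin_locally=True),
    Policy(prefix="/documents/scratch", copies=1, pin_locally=False),
]

# Which nodes currently hold each file (made-up example data).
HOLDERS = {
    "/documents/thesis.tex": {"Fenris", "Odin"},
    "/documents/scratch/tmp.txt": {"Huginn"},
    "/videos/holiday.mkv": {"Fenris"},
}

print(under_replicated(HOLDERS, POLICIES, DEFAULT))
```

The idea is that a storage node like Fenris could run a check like `under_replicated` to decide which files still need extra copies, while a laptop would only consult `pin_locally` to decide what to keep in its cache.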
http://ceph.newdream.net/ comes close, offering non-centralized storage and mountability, plus several in-the-works features for snapshots and tunable redundancy settings, but ultimately lacks support for local data caching.
http://www.coda.cs.cmu.edu/ comes even closer, offering built-in support for disconnected operation and caching, but is unacceptable for having too many centralized components with no ability to designate failover instances, and for its lack of offsite backup considerations. Even more disappointing is that Coda appears to no longer be developed, with no support in the Ubuntu package archives and no code commits in the last several years.
But wait, there's more!
My need for some kind of offsite storage doesn't stop with storing my own files. I manage my parents' home storage server, and also provide tech support to multiple friends to manage their storage systems. As everybody knows, if you don't have offsite backups for your critical data, you might as well assume you've already lost that critical data.
So I aimed requirement number 10, the last one in the list above, at giving me some means to provide automatic offsite backups of all of the storage systems I manage. By my current count, that's approaching 4 independent storage servers, in addition to another 3 or so friends who would probably jump on board if the barrier to entry for participating was low enough.
I've considered https://tahoe-lafs.org/trac/tahoe-lafs, which provides fantastic data-recovery features, but tahoe-lafs doesn't seem to provide much support for trusted storage partners. After all, who can you trust if not your family?
My reason for being wary of tahoe's hard adherence to trusting only yourself is that the filesystem loses significant opportunities for compression, and so has a massively increased need for storage space compared to the raw file sizes.
I store a lot of text files, including homework, email archives, ebooks, source code, and so on. I also have a lot of video and pictures, but primarily I'm storing a massive amount of plaintext. I know for a fact that the storage behavior of my family is similar to mine, and I strongly suspect the same can be said of most of my friends. In order to take advantage of our limited hardware capacities, I'd want the perfect distributed / redundant wide area network backup system to take full advantage of the files being stored in the system to squeeze every last drop out of our hardware. Plain text compresses best when you collect it all into the same archive, after all!
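As a rough illustration of that last point, the following Python sketch compares compressing a pile of small, similar text files one at a time against compressing them as a single stream. The sample "emails" are invented for the demonstration, and `zlib` merely stands in for whatever compressor a real backup system would use, so treat the result as a hint at the effect rather than a measurement of my actual files.

```python
import zlib


def make_note(i):
    """Build a small made-up email; only the number differs between notes."""
    return (
        "From: me@example.com\n"
        "To: family@example.com\n"
        "Subject: weekly backup report\n\n"
        f"Backup number {i} finished without errors.\n"
    ).encode("utf-8")


notes = [make_note(i) for i in range(50)]

raw = sum(len(n) for n in notes)
# Each file compressed on its own: shared text cannot be reused across files.
separate = sum(len(zlib.compress(n, 9)) for n in notes)
# All files compressed as one stream: repeated headers are stored only once.
together = len(zlib.compress(b"".join(notes), 9))

print(f"raw bytes:            {raw}")
print(f"compressed one by one: {separate}")
print(f"compressed together:   {together}")
```

A system that encrypts every file separately before it leaves the client ends up in the "one by one" case at best, which is the storage cost I'm worried about.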
Ultimately, I'm left with a bitter taste in my mouth. I can cobble something together that does meet all of the needs I espouse in this post with scripts and a few small programs, but that system won't scale well, and certainly won't be as reliable as a preexisting system.
I'm going to continue watching the development of new filesystems, and hopefully contribute in the needed direction to finally have my perfect personal cloud.
Update May 29th 2014: Now announcing Aerosta!
Check out our website: http://www.aerosta.com/
Check out a video: https://www.youtube.com/watch?v=4mpHPxVu1XA