Today someone asked me: "You speak about ZFS a lot. I know other people that talk about the latest filesystems with praise, but generally speaking they just don't have much to offer. Is ZFS that different?"
My answer is "yes." But, of course, I can't leave it at that. I'm not going to make a performance argument -- ZFS is fast in some cases and slow in others, just like everything else. I think one of the things we've seen in the last 10 years is that everyone felt the need to come out with their own filesystem -- at least on Linux. So, you have to ask yourself why. My personal opinion is that filesystems on Linux suck.
Most filesystems on the market support snapshots. No open source filesystems on Linux (that I'm aware of) support snapshots. Of course, you can use LVM to do block-level snapshots, but that's a pain in the ass w.r.t. storage provisioning. Other systems make the process of allocating and managing snapshots "not my problem" (simple and easy). Let's be frank: ext2 and ext3 are nothing to write home about. reiserfs, xfs, jfs... the list goes on and on.
There are a few closed-source filesystems that are really nice. Specifically, Veritas File System (VxFS) and its excellent layered volume manager, VxVM, which appears to have heavily inspired GEOM on FreeBSD. DEC thought it was so cool that they pulled it white-label into Tru64. Respect.
So, what makes ZFS so different? ZFS is a disruptive technology as it abolishes the sacred line in the sand between block devices, volume management and filesystems. This means it just makes storage management easy. When I say easy... I mean easy.
So you want more space? Add more disks. Want to move from failing disks to replacements? Tell zfs to add the new ones and tell it to remove the old ones. Read that report by Google about disk errors? ZFS checksums all data. My personal experience says checksums are good. Snapshots? Sure, snapshot to your heart's content. We snapshot some systems hourly and never ever delete the old ones. Snapshots are really cool, but what if you could roll back to a snapshot? zfs rollback. What if you wanted to make a read/write copy of the filesystem or an old snapshot? zfs clone. You want to store a lot of raw data? zfs has built-in compression. Oh, and it is open-source.
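To make that concrete, here is a rough sketch of what those operations look like as commands. It is a dry run in Python: it only builds and prints the zpool/zfs command lines, it never executes them. The pool name (tank), the dataset (tank/home), the device names, the "hourly-" snapshot naming scheme and the clone name are all made up for illustration; they are assumptions, not how our systems are actually laid out.

```python
from datetime import datetime


def snapshot_name(dataset, when):
    """Build a snapshot name such as tank/home@hourly-2008-04-23-05.

    The "hourly-" naming scheme is an illustrative assumption.
    """
    return f"{dataset}@hourly-{when:%Y-%m-%d-%H}"


def storage_commands(pool, dataset, new_disk, old_disk, replacement, when):
    """Return the command lines (as argv lists) for the operations above.

    Nothing is executed; this only shows the shape of each command.
    """
    snap = snapshot_name(dataset, when)
    return [
        # More space: add another disk to the pool.
        ["zpool", "add", pool, new_disk],
        # Swap a failing disk for its replacement.
        ["zpool", "replace", pool, old_disk, replacement],
        # Turn on built-in compression for the dataset.
        ["zfs", "set", "compression=on", dataset],
        # Take a snapshot, roll back to it, clone it read/write.
        ["zfs", "snapshot", snap],
        ["zfs", "rollback", snap],
        ["zfs", "clone", snap, f"{pool}/scratch"],  # hypothetical clone name
    ]


if __name__ == "__main__":
    # Hypothetical pool, dataset and Solaris-style device names.
    commands = storage_commands(
        "tank", "tank/home", "c1t2d0", "c1t0d0", "c1t3d0",
        datetime(2008, 4, 23, 5, 49),
    )
    for argv in commands:
        print(" ".join(argv))
```

Each line it prints is something you would run as root on a box with ZFS. Building the list without running it is deliberate: you can read the plan before anything touches your disks.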
Simply put. ZFS. Respect.
Wednesday, April 23. 2008 at 05:49 (Reply)
> Oh, and it is open-source.
Yeah - CDDL, great.
Wednesday, April 23. 2008 at 06:53 (Reply)
Just take a look at ext3cow and/or some of the FUSE plugins around. You can say they are hacks, but you can't say there are no free Linux filesystems with snapshot support.
Wednesday, April 23. 2008 at 07:25 (Reply)
So one thing that concerns me about ZFS is the fact that all the storage you have seems to become one `pool`.
While this is "easy", in practice am I correct in thinking this means you don't know which physical disk(s) a particular file resides on?
At home it's normal practice for me to mirror files to different physical disks to make backups, but how can you do this if all your drives make one 'pool'?
Or maybe this is a complete non-issue as you can tell ZFS to make specific data pools from specific disks?
Essentially it seems to me that ZFS makes your enterprise storage needs "easy" but doesn't simplify things for personal desktop storage...
Thursday, April 24. 2008 at 09:23 (Link) (Reply)
You can have as many pools as you like.
http://en.wikipedia.org/wiki/ZFS#Capacity
Tuesday, August 5. 2008 at 00:41 (Reply)
1: You can put disks on any pool you want.
2: You can make mirrored disks in a pool.
AND -- very relevant to your wish to back up some files
3: You can tell ZFS to keep 2 or 3 copies of everything in a given filesystem (the copies property), and it will try to place the copies of each block on distinct disks in your pool.
So, if you want to back something up in case of danger, or provide very high levels of protection, you can do it at the pool level with mirrors, or at the filesystem level.
This is MUCH easier than manually copying your stuff for a backup, and it is snapshot compatible.
Wednesday, April 23. 2008 at 07:45 (Reply)
I was going to write a long diatribe about the problems I've had with ZFS, but I'll let the Sun kids find out where their problems lie.
It is not as suitable as a filesystem as many people make out. I hope you've tested your disaster recovery well.
You know my mail address if you want to talk about it.
Thursday, April 24. 2008 at 09:12 (Link) (Reply)
ZFS is instrumental in our disaster recovery. We've been using https://labs.omniti.com/trac/zetaback with great success. We also wrap database backups and block-level incrementals up with ZFS. We've had many successful tests and real restores. Of course we back up ext3 and UFS and XFS too... and restore them. It's not rocket science.
Wednesday, April 23. 2008 at 07:50 (Reply)
Would you mind fixing the following sentence: 'We snapshot some systems hourly and ever delete the old ones.'?
Thanks!
Nice Rant ;)
Wednesday, April 23. 2008 at 07:51 (Link) (Reply)
Is this zfs as in z/OS?
Wednesday, April 23. 2008 at 08:03 (Reply)
Nice commercial... added you to my adblocker blocklist.
Wednesday, April 23. 2008 at 08:44 (Reply)
"I think one of the things we've seen in the last 10 years is that everyone felt the need to come out with their own filesystem -- at least on Linux. So, you have to ask yourself why. My personal opinion is that filesystems on Linux suck."
As opposed to non-Linux filesystems? :-)
ZFS is cool, no doubt, but there are some features I've badly wanted that I've found in only one product, PeerFS. Now that's really cool!
Wednesday, April 23. 2008 at 08:54 (Link) (Reply)
Excellent run-down of features. Thank you :]
Nit: 's/ever /never /'
-Tyler
Wednesday, April 23. 2008 at 09:13 (Reply)
ZFS is great and all, but it is useless for one reason: you can't boot from ZFS. This means that patches that need to be applied in single-user mode can't be applied to a ZFS filesystem. We tried it, but had to abandon it until they change this. This should have been a must from the beginning.
Wednesday, April 23. 2008 at 09:27 (Reply)
But released under the CDDL. Just thought people needed to know.
Wednesday, April 23. 2008 at 09:48 (Reply)
I am using ZFS for daily backups (snapshots) and I absolutely love it. It made backing up and managing 10 copies (snapshots) of 8TB of data a walk in the park.
ZFS is the best thing that happened in FS in a while. It is a game changing technology.
Sun deserves credit, where credit is due.
If they can add some sort of clustering to it, it will be like the mother of all FS.
Wednesday, April 23. 2008 at 10:07 (Link) (Reply)
I was looking at some ZFS benchmarks a week or so ago and I must say they looked very attractive.
I might have to give it a try sometime.
Wednesday, April 23. 2008 at 11:10 (Reply)
But is there a Linux kernel module implementation in the works? I'd settle for the FUSE module, but my server won't.
Wednesday, April 23. 2008 at 12:36 (Link) (Reply)
Unfortunately, the only way to use ZFS under Linux is through FUSE, which runs in user space rather than kernel space and is therefore not really optimal. That's too bad.
Wednesday, April 23. 2008 at 12:38 (Reply)
My fear (and it may be unfounded, I don't know):
I have a complex system. Perhaps 40 filesystems of 1g-40g. I lose two disks which were in some way redundant to each other. (Yes, pardon my wording this in terms of conventional filesystems/volumes.)
How contained is my data loss? Will ZFS still come up with the remaining undamaged filesystems, and will it let me know what has been damaged? Are damaged items simply corrupted, or do they not come online?
How do I go forward from this point to put in new disks so I can start with a blank filesystem to restore from backups?
My scare with ZFS is not how it works on a day-to-day basis. It is with how it responds when things go terribly wrong. In my (casual) look-over of ZFS information, I haven't seen anything but the most trivial cases covered (you lose a disk that had redundant information on it, so you replace the disk and the information is rebuilt).
If things are just plain damaged or corrupt, how can I be sure that ZFS will make the decisions that best bring back what is still there (whatever "best" means), and that it won't just say it is corrupt and leave me hanging?
Thursday, April 24. 2008 at 08:41 (Link) (Reply)
As with any filesystem, recovery from corruption and permanent data loss is a black-magic data recovery effort.
Wednesday, April 23. 2008 at 14:01 (Reply)
And yet Google uses Linux, not OpenSolaris. ;)
Tuesday, May 6. 2008 at 21:14 (Reply)
Google may not use OpenSolaris but they do use Solaris. They even have a job opening for a Solaris admin: http://www.google.ie/support/jobs/bin/answer.py?answer=70267&query=solaris&topic=&type=solaris
And there were reports that Google was testing OpenSolaris more:
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9003492
http://lists.pdxlinux.org/pipermail/plug/2007-January/052404.html
Wednesday, April 23. 2008 at 14:42 (Reply)
Just two points to add:
* ZFS is Open Source but under a license (CDDL, Sun's standard OS license) that does not allow integrating it into the Linux Kernel. ZFS also misses user/group quotas.
* ZFS does not allow removing HDDs, while the Linux LVM v2 allows this.
Thursday, April 24. 2008 at 08:38 (Link) (Reply)
Semantic detail here. The CDDL has no language preventing it from being adopted into the Linux kernel. It is free and open. The GPL has language that prevents its adoption. At the end, the effect is the same -- no ZFS in Linux. What's worse? No DTrace in Linux.
Wednesday, April 23. 2008 at 14:50 (Reply)
You can rap on any FS you like, but don't you be rapping on XFS!
If there is one conventional filesystem that can take any amount of insane abuse and deliver high performance while remaining reliable, it is XFS.
No ifs, buts, or maybes.
I have personally put XFS filesystems through abuse that no filesystem should ever take; and I've seen it put through even worse abuse by a friend of mine and still keep serving data.
You seem to forget that XFS was not a brainchild of wannabe engineer GNU/Linux geeks, but a brainchild of the BEST engineers that the computer industry ever saw: SGI.
Now that SGI is busted, Sun is numero uno. Before that, Sun was always holding second candle to SGI.
Thursday, April 24. 2008 at 08:45 (Link) (Reply)
XFS is amazing. I agree 100%. I've run it under Irix and been amazed. Under Linux it has been relatively stable. I've had about 1 kernel panic a year per machine under Linux's implementation. On 50 or so machines that's about a panic a week -- a bit frustrating. Never once have I had data loss.
XFS's goals are quite different from most filesystems with a strong focus on bounded service times for real-time streaming data. It really is impressive.
Thursday, April 24. 2008 at 00:15 (Link) (Reply)
ZFS sounds pretty awesome... so when are we going to see it rolled into standard Linux distros?
Thursday, April 24. 2008 at 09:31 (Link) (Reply)
Yeah, enough with the hype. All we have is zfs-fuse. We want to boot off it!
Thursday, April 24. 2008 at 06:58 (Link) (Reply)
Well, I share your enthusiasm for ZFS, and frankly when I first read about it about a year ago I wondered why nobody had thought of those features before. I mean, they sound like complete necessities for a filesystem. Of course, this statement neglects the fact that filesystems are not a weekend project.
I also share your view that most of the Linux filesystems are nothing spectacular. And the others that do have some niceties are marked experimental and you wouldn't want to run them on your main system.
But there are a couple of issues that turned me off ZFS for the time being.
- The fact that when I read about it, and later, there were still some stories about nasty bugs in it, mainly uncovered by hosting companies, which tend to manage a lot of data.
- I'm a Linux user, and I am not going to switch OSes just so that I can have a superior filesystem when, in my opinion, a lot of things in Linux are much better than their counterparts in Solaris.
- I am not sure that I got this part right, but it seems to be recommended only for 64-bit systems and may cause problems on 32-bit ones.
So in a nutshell, I'd love to be able to try out and advocate ZFS. But I can't, and Sun isn't making it any easier.
P.S. I know about the BSD port, but last I checked it had a LOT of problems.
Thursday, August 7. 2008 at 09:42 (Reply)
ZFS will not cause problems for your files on 32-bit systems. The problem is that ZFS is 128-bit and is quite slow on a 32-bit CPU.
I used ZFS raid on a 32-bit P4 CPU with 1 GB RAM and got about 20MB/sec. But my data was safe, with all those checksums. If you can stand slow transfer rates, then there is no problem.
BTW, I don't get the complaints about ZFS and Linux. ZFS is open. Mac has ZFS, and FreeBSD has it too. Mac has DTrace. So has QNX. Etc. If Linux has a problem with ZFS and DTrace, is that Sun's fault? Should Sun change the CDDL? Why should Sun change? Why not Linux? If Linux wants it, then Linux can change. Maybe to the CDDL, or to FreeBSD's license. Fork Linux. ZFS and DTrace ARE open. Look at Mac, FreeBSD, QNX, etc.