I spent a few years writing down my thoughts about how one should approach problems. If you're looking for a how-to guide, a cookbook, or a reference, this book is not for you. If you want to learn by challenging the way you think, pick up a copy.
We've been doing a lot of PostgreSQL work lately, and we have one largish system (terabytes) that runs on top of Apple Xserve RAIDs. While people argue that SATA is getting better, let it be understood that Fibre Channel SCSI drives rule. The difference between carrier-class storage and "enterprise" (a.k.a. commodity) storage is pretty tremendous.
While this system will eventually make good use of the Xserve RAIDs as long-term storage containers for write-once, read-many data tables (archives), the "fast" storage is currently tied up and we've been "making do" with all of our heavy read/write activity on 14 SATA drives. It is always advisable to understand the access patterns of your database so that you can put files that are heavily utilized at the same time on different spindles. This allows one set of drive heads to be occupied with one style of workload without being distracted by another. In big databases, getting more spindles is a must and usually not a budgetary concern. BUT, how do you know which files?
Oracle provides enough information inside the database itself for a DBA to assess which datafiles could benefit from running on separate spindles. PostgreSQL's statistics exist, but are very lacking in comparison. The rule of thumb is that you put indexes and data on different spindles. In big systems you don't follow "rules of thumb"; you instead use them as the basis for experimentation and empirical analysis. Big systems often vary widely in their configuration and stress points, and the "rules of thumb" address generic problems; they may hold true on your system, but "don't jump."
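(For reference, the mechanical side of that rule of thumb in PostgreSQL looks roughly like the sketch below. This assumes 8.0-style tablespaces; the mount points and object names -- /vol/data, /vol/index, orders, orders_pkey -- are made up for illustration and would need to point at mounts that really sit on separate spindles.)

    -- A minimal sketch, assuming PostgreSQL 8.0+ tablespaces; paths and
    -- object names here are hypothetical.
    CREATE TABLESPACE data_disks  LOCATION '/vol/data';
    CREATE TABLESPACE index_disks LOCATION '/vol/index';

    -- Move a hot table and its index onto different sets of drives.
    ALTER TABLE orders      SET TABLESPACE data_disks;
    ALTER INDEX orders_pkey SET TABLESPACE index_disks;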
So, what is a PostgreSQL user to do? First, run Solaris 10. Second, leverage DTrace to really answer the age-old question "where does it hurt?" I scratched together a little Perl/DTrace to measure the min/avg/max read() and write() times induced by PostgreSQL (yes, it uses read() and write() for all of its I/O ops on UNIX) and break it down by filename. I initially used the io::: provider to measure only the operations that induced physical reads against media (didn't come out of buffer cache). While it is really cool that you can do this, we would miss the reads that would happen once we moved things around and blew our buffer cache -- so I opted for all reads (less cool : more useful).
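Stripped down, the DTrace half of that looks something like the sketch below. It is not the actual script, just an illustration of the approach: time read()/write() in postgres processes with the syscall provider and aggregate per-file latency, using the fds[] built-in (where available) to turn a file descriptor back into a pathname.

    #!/usr/sbin/dtrace -s
    /*
     * Sketch only: aggregate read()/write() latency per file for PostgreSQL.
     * Assumes backend processes are named "postgres" and that the fds[]
     * built-in array is available to map a file descriptor to a pathname.
     */
    #pragma D option quiet

    syscall::read:entry,
    syscall::write:entry
    /execname == "postgres"/
    {
        self->ts = timestamp;
        self->fd = arg0;
    }

    syscall::read:return,
    syscall::write:return
    /self->ts/
    {
        /* milliseconds, to match what pg_file_stress reports */
        this->ms = (timestamp - self->ts) / 1000000;
        @ops[fds[self->fd].fi_pathname, probefunc]  = count();
        @mint[fds[self->fd].fi_pathname, probefunc] = min(this->ms);
        @avgt[fds[self->fd].fi_pathname, probefunc] = avg(this->ms);
        @maxt[fds[self->fd].fi_pathname, probefunc] = max(this->ms);
        self->ts = 0;
        self->fd = 0;
    }

    dtrace:::END
    {
        printa("%-48s %-6s %@8d %@8d %@8d %@8d\n",
            @ops, @mint, @avgt, @maxt);
    }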
After we collect metrics from all the file I/O, we connect to PostgreSQL and ask it which database objects correspond to the files we see being read from and written to. The result is a simple prstat-like output by database object of the number of reads/writes performed along with the minimum, average, and maximum turn-around time for the operation in milliseconds. Quite useful for understanding "what hurts."
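That mapping is just a catalog lookup: table and index data files live under $PGDATA/base/<database-oid>/ and are named by pg_class.relfilenode, so a query along these lines recovers the object names (the relfilenode values below are hypothetical, standing in for the numeric file names DTrace reported):

    -- Sketch: turn the numeric data-file names seen by DTrace back into
    -- relation names.  The relfilenode values are hypothetical examples.
    SELECT relfilenode,
           relname,
           relkind          -- 'r' = table, 'i' = index, 't' = TOAST
      FROM pg_class
     WHERE relfilenode IN (16392, 16398, 16401);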
So the command 'pg_file_stress -d dbname -u postgres -i 30 -n 20' would yield useful output like:
Theo shows off dtrace
For a database like PostgreSQL that likes to rely on operating system functionality to keep from reinventing wheels, figuring out how to get the best information from your operating system is extremely important. As Theo shows us, DTrace can work wonders.