Oren Hurvitz has a great post about LinkedIn’s architecture. It’s well-written and well thought out. LinkedIn’s architecture has evolved along what appears to be a steady and safe path of improvement. It is well worth a read.

I’d like to comment on something I see repeated again and again, and that is likely misinterpreted by young scalability architects: the list of what you should expect to lose when you scale up or out. Oren writes:

The presentation ends with some tips about scaling. These are oldies but goodies:
  • Can’t use just one database. Use many databases, partitioned horizontally and vertically.
  • Because of partitioning, forget about referential integrity or cross-domain JOINs.
  • Forget about 100% data integrity.
  • At large scale, cost is a problem: hardware, databases, licenses, storage, power.
  • Once you’re large, spammers and data-scrapers come a-knocking.
  • Cache!
  • Use asynchronous flows.
  • Reporting and analytics are challenging; consider them up-front when designing the system.
  • Expect the system to fail.
  • Don’t underestimate your growth trajectory.

Now, I agree with much of that. The spammers comment should be revised to “Fraud happens, and the bigger you are, the bigger the bullseye.” Be aware and protect your assets. Everything from “Cache!” on down: hard-and-fast rules. The cost argument is odd: while it is completely correct, it’s also rather obvious. If your business model ties audience size and site use to revenue (which it should), then cost should simply scale sub-linearly with revenue (i.e., no big deal). However, a few of the items remaining on that list describe things that should be cherished, and losing them should pain you.

“[You] can’t use just one database.” This is a conclusion you should arrive at after analysis, not one you assume up front. We have one client that supports 10 million users on a cluster of partitioned databases. We have another that supports 35 million users on one database without issue and with room for growth.

“Because of partitioning, forget about referential integrity or cross-domain JOINs.” Think. Think hard. Think harder. Sometimes it is possible to partition in a way that preserves integrity, for example by keeping rows that reference each other on the same partition. While I’m sure (or at least hope) that the LinkedIn guys had some sleepless nights before deciding to give up foreign key constraints, that isn’t conveyed. You should absolutely have some sleepless nights over a decision like that. My bank supports many more users and transactions than LinkedIn, and it damn well better have FKs and 100% integrity. So, while you may still end up partitioning in a way that requires giving up enforced integrity, the decision should be a heavy one.
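To make “partition in a way that preserves integrity” concrete, here is a minimal sketch. It assumes a hypothetical bank-like schema (`accounts` and `transactions`) and uses in-memory SQLite databases as stand-in shards. Parent and child rows are routed by the same key (`account_id`), so every foreign key points at a row on the same shard and each shard can enforce it locally. The table names, `NUM_SHARDS`, and the helper functions are illustrative assumptions, not LinkedIn’s (or anyone’s) actual design.

```python
import sqlite3
import zlib

# Hypothetical shard count; a real system would size this from capacity analysis.
NUM_SHARDS = 4

# Hypothetical schema: every transaction references the account that owns it.
SCHEMA = """
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    owner      TEXT NOT NULL
);
CREATE TABLE transactions (
    txn_id     INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES accounts(account_id),
    amount     INTEGER NOT NULL
);
"""


def make_shards(n=NUM_SHARDS):
    """Create n independent in-memory databases, each enforcing foreign keys."""
    shards = []
    for _ in range(n):
        conn = sqlite3.connect(":memory:")
        # SQLite only enforces FOREIGN KEY clauses when this pragma is on.
        conn.execute("PRAGMA foreign_keys = ON")
        conn.executescript(SCHEMA)
        shards.append(conn)
    return shards


def shard_for(account_id, n=NUM_SHARDS):
    """Route by the partition key; parent and child rows share this key."""
    return zlib.crc32(str(account_id).encode()) % n


def add_account(shards, account_id, owner):
    conn = shards[shard_for(account_id, len(shards))]
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO accounts VALUES (?, ?)", (account_id, owner))


def add_transaction(shards, txn_id, account_id, amount):
    # Same routing key as the parent account, so the FK check is purely local.
    conn = shards[shard_for(account_id, len(shards))]
    with conn:
        conn.execute(
            "INSERT INTO transactions VALUES (?, ?, ?)", (txn_id, account_id, amount)
        )


if __name__ == "__main__":
    shards = make_shards()
    add_account(shards, 42, "alice")
    add_transaction(shards, 1, 42, 100)  # accepted: parent row is on the same shard
    try:
        add_transaction(shards, 2, 99, 50)  # account 99 does not exist anywhere
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

The design choice that matters is the partition key: pick one that every referencing row carries, and the FKs survive partitioning. Relationships that cross partitions (say, a transfer between two accounts on different shards) still cannot be enforced this way, which is exactly where the sleepless nights belong.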

“Forget about 100% data integrity.” WTF? While I’m sure it was the end of the post and he was being clever, someone somewhere might actually take the advice to forget about data integrity. You never, ever, ever forget about it. We have some “one big database” architectures where data integrity has been an issue due to memory bit-flips (corrupt data on disk): it’s a BFP (big f@#$ing problem), and we treat it that way. Sometimes you make an architectural decision that makes a loss of integrity much more probable (partitioning and dropping FK constraints is a ripe example). Integrity still deserves great attention and diligence. You should never forget about data integrity, and you should always put in the effort required to get as close to 100% as possible. When you lose data integrity, you end up with a big pile of shit in your database. I’ll leave you with a rather crass metaphor:

There’s an expectation that there is no shit on your living room floor. Don’t shit in your living room. Don’t let your dog shit in your living room. If you’re a dog owner, you know your dog could have an accident. You bought the dog. You chose to increase the probability of finding shit in your living room. Don’t ignore it or forget it. Clean up the shit when it happens. If you suddenly get ill while playing your Wii naked and shit on your living room floor (however probable or improbable that may be)… respect yourself: clean it up. Never forget the goal: a 100% shit-free living room.