2009-08-07

How to avoid midnight conf-calls

Ten tips to get a good night's sleep
  1. Train and explain - Show others how to restart servers and services on the production platform itself, not on dev, test or a replica. So-called replica or "reference" platforms have a habit of drifting out of sync with production.
  2. Accept outsourcing models - Wake up and smell the coffee. Outsourcing is here to stay. To Vietnam if not India. The sooner you accept it and share the information that is vital for keeping systems running with the outsourced team, the better.
  3. Regular backups - Create a Definitive Software Library, then take periodic backups of the database and the filesystem, including log files.
  4. Test your backups - Backup processes tend to get clunky when you actually have to restore from them. Banks do this best: they run periodic simulated disasters to test their backups and redundancies. But redundancy is our next point.
  5. Plan redundancy/high-availability - The two are not strictly the same thing, but lumping them together will do for our purpose. It isn't fun trying to bring up a service during a power failure in the only data center you use.
  6. Restrict access - Grant access strictly on a need-to-know basis. Disable unused accounts after a few weeks. Journal user activity, and use named user accounts.
  7. Scalability testing before rolling out - That change may work well in the teeny development environment. Stress test it before rolling out to production.
  8. Batch your releases - Stack up your releases so you touch the production platform less frequently. It also means you can test them properly, provided there's a reasonable cut-off for inclusion in a batch.
  9. Use multiple platforms - Development and production should be separate, distinct platforms.
  10. Have a rollback plan
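
To make tip 4 a little more concrete, here is a minimal Python sketch of one way to check a test restore: hash every file in the original tree and in the restored copy, then list whatever is missing or different. The function names (tree_digests, verify_restore) and the choice of SHA-256 are my own assumptions for illustration, not something the tips above prescribe, and a real check would also cover database dumps and files that change while the backup runs.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch,
            # but large files would need chunked hashing.
            relative_name = path.relative_to(root).as_posix()
            digests[relative_name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def verify_restore(source_dir, restored_dir):
    """Return the sorted relative paths that are missing or differ after a restore."""
    source = tree_digests(source_dir)
    restored = tree_digests(restored_dir)
    # A file is a problem if it is absent from the restore or its digest changed.
    return sorted(name for name, digest in source.items()
                  if restored.get(name) != digest)
```

Run verify_restore against a scratch restore on a schedule, and treat anything other than an empty list as a failed backup.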
