Saturday, February 02, 2008

The inevitability of a crisis

It was 7:00 AM, I had gone 50 hours without sleep, and we were just about done. So what's going on?

We had scheduled a project to execute in 3 phases over 3 nights, since we couldn't take the servers down during the day. Our execution window started at 7 PM and had to finish by 1 AM (6 hours of scheduled downtime).

The product development team took on this deployment project, with me (the product manager) leading it, to get a first-hand feel for what the deployment and support teams face when they use our creation (the product), and to sensitise ourselves to it.

On the first day, we experienced what it feels like to execute a project that has been planned well and is carried out as per plan, with no deviations made or needed. We settled into a kind of comfort zone, feeling more and more confident as time went by that the result would be as expected.

Even though we didn't face any problems, the time estimate was way off the mark because a few servers executed slowly. Our learning was largely limited to the small challenges that came up during the execution steps. That phase ended at 6 AM.

Even though the team managed to get some sleep, I was too hyper to get any. The night's happenings, questions, answers, thoughts, learnings and notes kept flashing through my mind, keeping me wide awake. I ended up working a normal day, pondering deeply on how to build our product so that these reconfiguration exercises could actually be offloaded to the customer. As things stood, I didn't see this exercise scaling.

That same evening, we started again. This time we experienced what it feels like to execute a well-planned project that throws up an unknown, unforeseen situation and knocks the plan off track. We were faced with a corrupted database on all the servers, caused by wrong data input to the program. We had to undo the damage and redo the operation properly: an unforeseen situation for which we were not ready.

Here you are pushed out of your comfort zone, since you realize the repercussions of what might happen. It's a crisis situation, especially so at 2 AM, when you really can't think that well.

Here, however, a properly structured response with defined levels of participation is required to ensure a successful result. If we had all piled into the details and started fixing each case without adequate discussion or planning, we would have created multiple crises within the main crisis.

As an observer, I quickly stepped in to set a framework and document the failure cases with an appropriate action plan. Once this was done, I quickly stepped back. Thereafter the team handled it wonderfully, writing new scripts, meticulously executing them and ensuring that the servers were finally set up as expected.

What did we learn from this?
1. Acknowledge the inevitability of the crisis moment. There is no point in blaming, getting frustrated or getting angry. Accept and act.
2. Managers should step in at the right time and step out at the right time. Much as you might be tempted to roll up your sleeves, move everybody aside and start running the machines yourself, I suggest that, to keep team morale up, you restrict yourself to setting the framework and the direction, and then move aside.
3. Plan, act, and take unbiased, data-driven decisions responsibly.
4. Divide the work amongst the team, but work together and support each other in whatever way possible without too much interference, always staying alert to changes in the situation.
5. Don't forget to assign the coffee boy job to someone responsible :-).
6. Once the crisis is over, think prevention, to ensure such a situation doesn't recur. Ask yourself what you could have done to prevent it.

This post is being published at 1 AM on the third day of the project execution. The estimated completion time for today's phase is 6 AM. Good night.
