Last week, our whole team was poised to conduct an online exam for more than 8,000 students. We thought we were ready.
The day arrived, and the flood of traffic found the cracks we had ignored in our system. It showed us that what we thought was a solid foundation was nothing but sand.
What a catastrophic failure!
Luckily, we got another chance to conduct the test five days later. In the meantime, my colleagues worked day and night and fixed some of the problems. I say some because we know many more are still lurking underneath, and those will take more time to fix.
On the second attempt, we conducted the exam successfully, and this time our server did not give up on us. It held up under the traffic and exchanged the data well.
I consider the failure a much-needed slap in the face, one that forces us to look hard at the foundation of our product and fix it. We may not get to make another mistake like this and survive.