Service Update
We want to provide a full update on the service problems that FormSpring experienced today. On Tuesday, May 26, 2009, we experienced a number of outages over approximately 2 hours that took the FormSpring site down, affecting your ability to build, view, or submit forms. The first outage happened at 10:39am EST, and intermittent outages or slowness continued until 12:42pm EST.
What happened
The root of the outage was our migration to a new hosting environment that took place shortly before 2am. The migration itself went smoothly and there was barely a blip of downtime as we made the switch. The new environment had been running in parallel with the old for over a month, and we had run it through a barrage of tests and trial runs to help ensure that the migration would go unnoticed. The migration was one of many steps we had planned to expand FormSpring’s underlying server architecture.
We watched the new environment closely after the migration, and around 10:30am started to see elevated levels of CPU usage on the primary web server. While the level of traffic on the site wasn’t above norms, it appeared that the new environment reached a tipping point where each new connection to the site slowed down the server and new connections queued up behind others, causing the server to become even slower. In just a few minutes the site went from normal to unusably slow.
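For readers curious about how a server can go from normal to unusable so quickly, here is a minimal sketch of that kind of feedback loop. It is a toy model, not a reconstruction of what actually happened on our servers: the function name, the parameters, and all of the numbers are hypothetical, and it simply assumes that every waiting connection makes the server a little slower.

```python
def simulate_backlog(arrival_rate, base_capacity, overhead, ticks):
    """Toy model of a server whose throughput drops as connections pile up.

    All parameters are hypothetical illustration values:
      arrival_rate  - new connections arriving per tick
      base_capacity - connections the server can finish per tick when idle
      overhead      - fractional slowdown added by each waiting connection
      ticks         - number of time steps to simulate
    Returns the number of waiting connections at the end of each tick.
    """
    backlog = 0.0
    history = []
    for _ in range(ticks):
        backlog += arrival_rate
        # Assumption: each waiting connection costs a little extra work,
        # so effective capacity shrinks as the backlog grows.
        capacity = base_capacity / (1.0 + overhead * backlog)
        backlog -= min(backlog, capacity)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Slightly below the tipping point: every connection is served each tick.
    healthy = simulate_backlog(60, 100, 0.01, 120)
    # Slightly above it: the backlog feeds on itself and keeps growing.
    overloaded = simulate_backlog(65, 100, 0.01, 120)
    print("backlog after 120 ticks at 60 per tick:", healthy[-1])
    print("backlog after 120 ticks at 65 per tick:", overloaded[-1])
```

In this model a small difference in load, or equivalently a small difference in how much each connection costs the server, separates a backlog that clears every tick from one that never recovers.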
We started working with our hosting provider to troubleshoot the issue. Over the course of the next hour we tried a combination of configuration tweaks, failed over to a backup web server, and increased the capacity of the primary web server from 2 CPUs to 4, all without much success in solving the problem. We also shut down the admin side of the application so that the servers could focus on just serving forms and collecting submissions. We finally failed back to the original hosting environment, and we continue to run there now.
Status of form submissions
Fortunately, we did not see any data loss during the outages. Our primary database server continued to operate as expected, and we had multiple backups on hand in case of data corruption. However, users likely would have had trouble viewing or submitting forms, depending on the time of submission. In that case they would have seen either an error message from their browser or a FormSpring error message indicating that the site was down or a connection was lost. If that happened then the submission was never saved to the database and the form will need to be re-submitted.
Going forward
We’re fully back on the old environment and can continue to run there for weeks or months if necessary. However, we do feel it’s important to eventually resume expanding our architecture, and we are assessing our original plans for doing so.
The first step is to work with engineers at our hosting provider to figure out what happened with the new environment and either find a fix or start from scratch elsewhere. It’s apparent that our tests did not sufficiently simulate real world usage. Our benchmarks and load tests showed performance improvements of 50%-100% per server, but those numbers were clearly misleading. While it’s notoriously hard to simulate real world web traffic, we’ll be trying some new things to help make sure that we’re not surprised again by performance problems.
We know that you depend on us for many of your business transactions, and we want to assure you that we take this very seriously. We’ll keep you posted and let you know what our next steps will be. If you would like a refund or would like to discuss any specific problems you encountered, please contact our support team and we will work with you to resolve them promptly. We thank everyone for being patient with us, whether following our updates here on the blog or via Twitter, and appreciate the support as we worked to get everything working properly.





