New technology presents itself as a difference-maker. Every new application promises to streamline routine tasks or fix faulty systems. But a dose of reality can dampen the hype, as the Iowa caucuses app did this week. When apps fail in public, they inspire doubt and maybe scorn instead of hope. With democracy, there is also a fear that we don’t control our own votes: hackers or foreign powers might be trying to steal the results for their own reasons. And if people don’t understand what’s happening in the gears of the digital machine, they come to fear it will make the wrong kind of difference.
This week, it wasn’t security that provoked doubt—it was usability.
The Iowa caucuses app reportedly gave users hell when they tried to sign in, or even to download it in the first place. There was also confusion about, or lack of access to, the security codes required to upload voting results. Worst of all, the app reportedly reset whenever it was minimized on users’ phones.
I was told by at least one Democratic aide here in Iowa that the app would reset if the screen wasn’t kept open.
— Sabrina Siddiqui (@SabrinaSiddiqui) February 4, 2020
Some expected the app to have problems in Iowa given the poor connectivity across the state’s rural areas. Even once the app was downloaded, poor connectivity could also have made signing in a slow process.
Given that nearly 1,700 precinct chairs had to use the app simultaneously, there was also a risk of excessive traffic. In fact, state Democratic Party headquarters got calls throughout the day asking for help with the app. At the moment, it’s unknown what their data strategy looked like. But Nevada should take notes, since the Democratic Party apparently planned to use the same app in that state’s 2020 caucuses.
Here are the five basic steps to deploying an app quickly with a high degree of certainty about performance, security and overall user experience:
1. Design it for Scale
Expecting a flurry of activity, the app needs to handle a sudden influx of logins and data uploads. An app that will see high demand on its first outing especially needs to support high capacity.
In Iowa, where nearly 1,700 caucus chiefs were expected to upload three separate sets of data, that meant nearly 5,100 datasets in very short order, duplicated (and probably triplicated) to ensure accuracy of the data.
Here are a few best practices for developing high-scale web apps (there are many others):
- Code should be as stateless as possible, to make it easy to scale
- The different components of the application should have a clear and standard API between them
- If running in the public cloud, make sure auto-scale groups or other scaling technologies are applied to EVERY part of the application
- Allocate enough storage, compute and networking resources, especially if this is your first time
- Use managed solutions when possible so that teams who are experts in specific services (such as authentication) can handle them for you
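To make the "stateless" point concrete, here is a minimal sketch in Python. It is not how the Iowa app was built (that is not public); the store interface, the function name `handle_upload`, and the precinct and round identifiers are all hypothetical. The idea is only that the handler keeps nothing in process memory, so any number of identical instances behind a load balancer can serve any request.

```python
import json
from typing import Dict, Optional


class InMemoryStore:
    """Stand-in for a shared store (database, managed key-value service).

    Hypothetical: it exists only so the sketch runs locally.
    """

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def handle_upload(store: InMemoryStore, precinct_id: str,
                  round_name: str, tallies: Dict[str, int]) -> Dict[str, str]:
    """Store one result set without keeping any state in the handler.

    The key is derived from the request itself, so a retried upload
    overwrites the same record instead of creating a duplicate.
    """
    key = f"{precinct_id}:{round_name}"
    store.put(key, json.dumps(tallies, sort_keys=True))
    return {"status": "stored", "key": key}


if __name__ == "__main__":
    shared = InMemoryStore()
    print(handle_upload(shared, "precinct-0001", "first_alignment",
                        {"candidate_a": 12, "candidate_b": 9}))
```

Because all state lives in the shared store, adding capacity means starting more copies of the handler rather than changing the code.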
2. Instrumenting Your App for (Extended) Observability
Instrumenting your code to produce the right logs and metrics early in development is a step toward a more permanently performant and observable application.
Make sure all critical events are logged, preferably in JSON format, that the right debug logs are produced, and that the necessary availability and performance monitoring dashboards are created. This should always include events as valuable as sign-in and sign-out, so you can hunt down access problems (activity logs, access logs).
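As an illustration of JSON-formatted event logging, the sketch below uses only Python's standard `logging` module. The logger name `caucus_app`, the helper `log_event`, and the field names (`event`, `user_id`, `success`, `reason`) are assumptions made for the example, not a prescribed schema.

```python
import io
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields are passed via `extra={"fields": {...}}`.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("caucus_app")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_event(logger: logging.Logger, event: str, **fields) -> None:
    """Log a critical event (sign-in, sign-out, upload) with structured fields."""
    logger.info(event, extra={"fields": {"event": event, **fields}})


if __name__ == "__main__":
    log = build_logger(sys.stderr)
    log_event(log, "sign_in", user_id="chair-0042", success=False, reason="bad_pin")
```

One JSON object per line keeps the logs easy to parse, filter, and aggregate in whatever log analysis tool receives them.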
3. Testing Applications and Benchmarking Performance
Test out the app’s features, either within the team or with a select group of people, before an official release. As you stress the app’s resources, you will see its bottlenecks and other weaknesses. You can also start setting up alerts for certain events. With monitoring tools like Logz.io, that might include alerting on log patterns that indicate excessive errors.
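The alerting idea can be sketched independently of any particular tool. The Python below is a generic sliding-window error-rate check, not the API of Logz.io or any other product; the class name and the default thresholds (a 60-second window, a 20% error rate, at least 10 events) are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Tuple


class ErrorRateAlert:
    """Fire when the share of error events in a recent window is too high."""

    def __init__(self, window_seconds: float = 60.0,
                 threshold: float = 0.2, min_events: int = 10) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_events = min_events  # avoid alerting on a handful of events
        self._events: Deque[Tuple[float, bool]] = deque()

    def record(self, timestamp: float, is_error: bool) -> bool:
        """Add one event and return True if the alert should fire now."""
        self._events.append((timestamp, is_error))
        cutoff = timestamp - self.window_seconds
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        total = len(self._events)
        if total < self.min_events:
            return False
        errors = sum(1 for _, err in self._events if err)
        return errors / total > self.threshold


if __name__ == "__main__":
    alert = ErrorRateAlert()
    for t in range(10):
        firing = alert.record(float(t), is_error=(t < 3))
    print("alert firing:", firing)
```

Setting thresholds like these during testing, while you are deliberately stressing the app, gives you a baseline for what "excessive" means before release day.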
Testing should include a special focus on security. NPR reported that this was the main reason the Iowa caucuses app was kept secret until the day of the caucuses. Fearing election-day hackers, caucus organizers thought they were doing themselves a favor.
“Basic transparency about how it was built, how up to date the security of the caucusing app is and how it’s been tested all could be made publicly available with little cost to the DNC,” Betsy Cooper, director of the Aspen Tech Policy Hub, told NPR in January.
Even without a complete picture of what happened, it does sound like this tactic was a major disadvantage. The Iowa Democrats’ app was not properly tested in the field. Even lower-scale testing in a canary release among precinct volunteers, particularly in a controlled environment like pre-caucus training for those same volunteers, would have given organizers and DevOps teams a chance to monitor how well it worked.
Teams should have been sending sample voting data (for instance, 2016 results) to test load performance. Testers might have seen the need for more data nodes, particularly coordinator nodes, to balance anticipated traffic once the caucuses reported their results. This would have been another chance to add more alerting mechanisms to the results-reporting app to watch for flaws.
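A load test along these lines can be sketched in a few lines of Python. Everything here is an assumption for illustration: `fake_upload` stands in for a real HTTP call to a staging endpoint, the tallies are placeholder numbers rather than real results, and the concurrency level is arbitrary. The request count follows the figures above: roughly 1,700 precincts times three datasets each.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

PRECINCTS = 1700             # approximate figure from the article
DATASETS_PER_PRECINCT = 3    # three result sets per precinct


def fake_upload(payload: Dict) -> bool:
    """Placeholder for a real upload to a staging endpoint (hypothetical)."""
    return bool(payload["tallies"])


def timed_upload(payload: Dict) -> Tuple[bool, float]:
    """Run one upload and return (succeeded, latency in seconds)."""
    start = time.perf_counter()
    try:
        ok = fake_upload(payload)
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


def build_payloads() -> List[Dict]:
    """Build one payload per precinct and dataset, with placeholder tallies."""
    return [
        {"precinct": p, "dataset": d,
         "tallies": {"candidate_a": 10, "candidate_b": 7}}
        for p in range(PRECINCTS)
        for d in range(DATASETS_PER_PRECINCT)
    ]


def run_load_test(payloads: List[Dict], concurrency: int = 50) -> Dict:
    """Send all payloads concurrently and summarize failures and latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_upload, payloads))
    latencies = sorted(latency for _, latency in results)
    failures = sum(1 for ok, _ in results if not ok)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return {"requests": len(results), "failures": failures, "p95_seconds": p95}


if __name__ == "__main__":
    print(run_load_test(build_payloads()))
```

Swapping `fake_upload` for a call against a staging environment, and replaying historical results as payloads, would show where latency and failures start to climb before real users find out.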
4. Push It to Production
Now move all this from a staging ground—your testing environment—to the main field of play.
Once it is out in the field, you have to be on-task and on-call with proper alerts set up and support. For something as serious as a primary, it’s essential to have people on stand-by to deal with issues. Unreported voting results are not like products in abandoned carts in e-commerce apps—this information has to be processed immediately.
Additionally, errors recording the vote are more serious. If a customer saves a product before buying it, but that information somehow fails to save or is lost, at least the customer has not lost anything of value on the deal. In an election (or primary or caucus), the value is the vote itself.
5. Monitor Your App in Production
This is where all your instrumentation, benchmarking, and alerts get their time to shine.
Make sure you have logs and metrics analyzed in real time. On-call teams should be ready to handle infrastructure and application issues, and to assess and react immediately. Once the application is designed the right way, has the right metrics and logs, has been carefully tested, and is live at prime time, engineering teams can quickly dive into issues before users are affected.
And while this isn’t technology-specific, any major event where the app plays a central role requires stand-by support. In Iowa, there was no tech support on call. In fact, the vote-reporting phone number for the party’s state headquarters—which was already understaffed anticipating that data would go through the app—was also designated as the helpline for caucus volunteers.
Considering the stakes of the event and the sheer volume of demand on the Iowa caucuses app, support should have been constantly monitoring the app’s activity logs. It would have been immediately apparent that login/logout was not functioning properly, or that a suspiciously low number of users were on the app when massive data uploads were expected.
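Both signals described here, failing sign-ins and too few active users, reduce to simple ratio checks over counts taken from activity logs. The Python sketch below assumes those counts are already available; the function name `check_health` and the default thresholds (at least 50% of expected users active, at most 10% of sign-ins failing) are hypothetical.

```python
from typing import List


def check_health(expected_users: int, active_users: int,
                 sign_in_attempts: int, sign_in_failures: int,
                 min_active_ratio: float = 0.5,
                 max_failure_ratio: float = 0.1) -> List[str]:
    """Return human-readable alerts based on counts from activity logs."""
    alerts: List[str] = []
    # Too few users online when heavy upload traffic is expected.
    if expected_users > 0 and active_users / expected_users < min_active_ratio:
        alerts.append(
            f"low activity: {active_users} of {expected_users} expected users active"
        )
    # Too many sign-in attempts ending in failure.
    if sign_in_attempts > 0 and sign_in_failures / sign_in_attempts > max_failure_ratio:
        alerts.append(
            f"sign-in failures: {sign_in_failures} of {sign_in_attempts} attempts failed"
        )
    return alerts


if __name__ == "__main__":
    for message in check_health(expected_users=1700, active_users=400,
                                sign_in_attempts=1200, sign_in_failures=600):
        print("ALERT:", message)
```

Run on a schedule during the event, a check like this turns a pattern that someone has to notice in the logs into a page to the on-call team.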
A major event like this should benefit from newer technology like data-collecting vote machines, and apps should make that process simple. With adequate preparation, instrumentation, repeated testing of observability, and ample backup plans for high data volumes, it is more than possible to manage an event like the Iowa caucuses.
However, there is a reason that tech analysts and politicos are still cautious about using that kind of technology in this scenario. Hopefully, smaller-stakes events take a stab at running this kind of operation in the near future. Experiences in those environments would serve as great testbeds for the big time—state, regional and nationwide elections.