February 6, 2020:
Simple question. If you live somewhere other than the state of Iowa, can you find it on a map?
I'm going to go with "no."
Why? Because basically nobody cares about Iowa except every four years, when there's a US presidential nominating battle. Because Iowa has decided since forever to be the first contest in our quadrennial horse race.
And then everybody obsesses about the Iowa Caucus.
Washington State--where I live and vote--used to have a caucus system instead of a "normal" primary. If you've never seen one, you've missed out on a particularly odd form of democracy. Here's how they work.
Skip this if you're familiar with the caucus process
On caucus day (or night, in the case of Iowa), everyone shows up at their local public school gymnasium--the largest open space inside. It's noisy and confusing and crowded. First, people give speeches about why the candidate they support is the most rocking candidate ever. They always refer to their chosen human as "THE NEXT PRESIDENT OF THE UNITED STATES!" I think that's required by the Federal Election Commission, because literally everyone does it.
After the speeches, which I'm sure change absolutely no one's mind about who they are supporting (that 20-year-old covered with Andrew Yang buttons is not going to go, "Hmm, what a well-thought-out speech by the campaign surrogate for Tom Steyer. Guess I'll go vote for him instead of Yang."), you go stand in groups under a sign for your candidate. This is the initial sort. If your group doesn't get 15% of the total people in the room, your candidate is cast into the flames of Hell, so to speak, and you have to go home or find another one. That's the second--and final--sort.
After the first sort, all the people whose candidate didn't achieve viability (sort of a caucus escape velocity) get wooed by the remaining viable candidates' folks, either with logic, emotion, or even cookies, which would definitely sway my opinion. If they were chocolate chip. I mean, is there even another legit cookie? I think not.
But I digress.
In the good old days (think 2016), the final tallies were on paper, signed and countersigned to ensure honesty and fairness, and through the magic of the telephone they were reported up to Central Headquarters or wherever, for Steve Kornacki of NBC to foam at the mouth over and point to maps and stuff on TV.
And that was the Iowa Caucus, and mostly it worked, and by the next day no one was thinking about Iowa again for the next four years.
Until Monday of this week, when, as a friend of mine used to say, the defecation hit the ventilation.
Enter the Iowa Caucus app.
Some genius in the Iowa Democratic Party (IDP) decided that for 2020 they needed a better system, and so somebody wisely decided to hire a company to write an app.
Reading this, right now, I bet you can see what is coming next. Because reading this, right now, I know--and you know that I know--that you know writing an app is not like cooking a nice spaghetti dinner, or building a little garden shed out back, or sending a manned spacecraft to Mars.
Writing an app is hard.
I'm going to bet that all the well-intentioned citizens who decided that they needed an app were not software developers, had never written an app, and had no idea what was involved in creating a new, functioning piece of software that was robust, well tested, and successfully deployed to a random group of non-technical users. On time and within budget.
You know the saying "success is a journey, not a destination"? That doesn't work for mission-critical software.
What went wrong with the Iowa app?
Ooh, what's more fun than being a Monday morning quarterback?
Nothing, so let's get started.
They had a hard deadline. The caucus has a fixed date, so the app had to be deployed in time for the caucus: it was, in the language of project management, a time-driven project. But it was simultaneously a requirements-driven project, because there was a set of functions the app had to perform or it would be worthless on caucus night.
We've all been there, right? "It has to do X, Y, and Z, and it has to be finished by such-and-such date." Given the well-known project triangle of resources, time, and specs, and given you've locked two sides of the triangle (time and specs), you now have two grim choices: either add resources (knowing full well that adding people to a software development project beyond a relatively small number has the opposite effect, actually slowing down progress) or settle for reduced quality (i.e., more bugs). In the real world the usual outcome is either really crappy software (bug farm) or the date slips and the persecution of the innocent begins.
The developers had learner's permits.
The code has been reviewed by various outside experts and lots of things have been found. An important one is that it appears the coding was done by a novice, someone who tried to follow online tutorials to build a React Native app.
Everybody is a rookie sometimes. But there's a reason you don't trust important stuff to somebody who is learning on the job. Need a heart transplant? Want a first-year med student cutting you open? Or somebody with some gray hair and lots of experience under her belt? There's a very good reason airline pilots have to spend a lot of hours in the co-pilot seat before they get four stripes on their sleeves.
They used a complex stack.
The app was built on React, not exactly the most intuitive programming framework around. It required two-factor authentication, APIs, and a beta-testing platform for downloading because they ran out of time (see "hard deadline" above).
Look, frameworks are awesome. .NET made Windows programming safer, quicker, easier, and frankly more fun. That doesn't mean it's trivial to learn. And as for web development frameworks...
Maybe an app designed to accurately assist our sacred democratic process shouldn't be built with a tool designed to share pictures of cats.
A bug in software is simply a test that hasn't been written yet, said someone smarter than me. Test-driven development (TDD) is simply a way to code tests for everything into the program and make sure, with every change you make, that those tests all pass before going on. Unfortunately, TDD has a bad rep for a lot of reasons: it takes too long, it doesn't perform random user actions, and it encourages writing pointless tests (like an assert that the addition operator actually works).
Agile and CI/CD encourage the use of a full-blown testing harness that runs a set of acceptance tests on the app with every build; if the tests fail the build is not promoted.
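For the curious, here is roughly what that looks like in miniature. This is a sketch in Python, not anything from the actual Iowa app: `is_viable` and `run_acceptance_tests` are names invented for illustration, and the rule is the simplified "15% of the room" threshold described earlier, not the IDP's official math.

```python
def is_viable(group_size: int, attendees: int) -> bool:
    """Hypothetical rule: a group is viable with at least 15% of the room."""
    if attendees <= 0 or group_size < 0 or group_size > attendees:
        raise ValueError("invalid head count")
    # Integer arithmetic avoids floating-point surprises right at the boundary.
    return group_size * 100 >= 15 * attendees


def run_acceptance_tests() -> bool:
    """Return True only if every check passes; a build script could gate on this."""
    checks = [
        is_viable(15, 100) is True,   # exactly at the threshold
        is_viable(14, 100) is False,  # one person short
        is_viable(3, 20) is True,     # 15% of 20 is exactly 3
        is_viable(2, 20) is False,
    ]
    return all(checks)


# A CI job would do something like this on every build.
print("promote build" if run_acceptance_tests() else "block build")
```

In TDD proper, the checks would be written first and would fail until `is_viable` existed and behaved. Either way, notice that every check is a case somebody thought of in advance.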
Both of these approaches require someone to guess/figure out/predict what can go wrong with the app before that error has been found in the wild. See "random user actions" above. Developers who design and test around a conviction that "no user would ever do THAT" forget that users thought DVD drives were cupholders and floppy disks should be stuck to the fridge with magnets.
Yeah, those users.
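One partial defense is to stop guessing and throw random junk at the code. The following is a minimal sketch in Python, with loud assumptions: `parse_head_count` is a made-up input handler (nothing here comes from the real app), and the 200-person room is an arbitrary number. The idea is only that bad input should be rejected cleanly and never accepted or allowed to crash the program.

```python
import random
import string


def parse_head_count(text: str, attendees: int) -> int:
    """Hypothetical input handler: accept a whole number from 0 to attendees."""
    cleaned = text.strip()
    # isascii() matters: characters like a superscript 2 pass isdigit() but break int().
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"not a whole number: {text!r}")
    value = int(cleaned)
    if value > attendees:
        raise ValueError(f"{value} exceeds the {attendees} people in the room")
    return value


def fuzz(trials: int = 10_000, seed: int = 2020) -> int:
    """Feed random keyboard mashing to the parser; return how many inputs it accepted."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    accepted = 0
    for _ in range(trials):
        length = rng.randint(0, 6)
        junk = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            result = parse_head_count(junk, attendees=200)
        except ValueError:
            continue  # a clean rejection is the desired outcome for bad input
        assert 0 <= result <= 200, f"accepted bad input {junk!r}"
        accepted += 1
    return accepted
```

Calling `fuzz()` either returns quietly or blows up with the exact input that broke things. It won't find everything those users can dream up, but it finds some of what the developer never would.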
Software development creates defects (bugs) at some non-zero rate; estimates from industry guru Steve McConnell range from 10 to 50 per 1000 lines of code (KLOC). Interestingly, he cites the Space Shuttle project as having zero defects per 500 KLOC. One imagines that the amount of code review and brute testing NASA employs for things that go into outer space might exceed what the Iowa Democratic Party's chosen contractor used for this app.
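To put the per-KLOC figure in perspective, here is the back-of-the-envelope arithmetic as a few lines of Python. The 5 KLOC size is purely an assumption for illustration; I have no idea how big the Iowa app actually was, and `expected_defects` is just a name made up for this sketch.

```python
def expected_defects(kloc, low_per_kloc=10, high_per_kloc=50):
    """Scale a per-KLOC defect range (the 10 to 50 figure quoted above) to a code base."""
    if kloc < 0:
        raise ValueError("code size cannot be negative")
    return kloc * low_per_kloc, kloc * high_per_kloc


# ASSUMPTION: 5 KLOC is a made-up size, not a measurement of the Iowa app.
low, high = expected_defects(5)
print(f"A 5 KLOC app: roughly {low} to {high} defects to find and fix")
```

Even for a small app of that hypothetical size, that works out to somewhere between 50 and 250 bugs that somebody has to hunt down before caucus night.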
Unsophisticated user base
In spite of all these risk factors, the actual precinct workers who had to use the app at the Iowa Caucus on Monday night were all intelligent, educated, sophisticated, computer literate engineers, capable of downloading the app, authenticating correctly, and capturing the data.
People who help make our democracy work donate their own time and effort in a noble cause for the sake of a better country. God bless 'em, every one.
They also tend to be retired and older, and they did not grow up with computers. Think the target audience of those ads for cell phones with BIG BUTTONS. So Grandma can call the kids. And a speed dial for 9-1-1.
Nice folks, no doubt about it, but frankly--in general--more comfortable using a stick shift than an Android app. In other words, the opposite of Millennials, who are puzzled why some cars have three pedals but understand apps perfectly.
A bridge too far
I get that the IDP wanted to have a better, more reliable, more transparent system than the old one. But in trying to automate everything perhaps they reached beyond their grasp. Maybe having something--at least for 2020--that just tried to automate one part of the process would have been better. Especially if that automated bit had a solid, reliable, performant Plan B. Then, for the next election cycle, having debugged and gotten used to one automated part, they could add more functionality to the app. That's more of an agile approach, but one where the sprints are 2 years long, not 2 weeks. Just a thought.
New code is broken code
The software industry goes back to the 40s and 50s, started to take off in the 60s and 70s, and by now is around $500B worldwide, with total IT over $4 trillion. In other words, HUGE. And not exactly new anymore.
Still, we really don't know how to write software, even after all those years and programmers and projects and books and methodologies and frameworks and programming languages. The fact that there are so many languages, frameworks, books, and opinions is proof that there is no settled, solid, recognized method to write reliable software.
Before you flame me, consider the humble abode. A house. In the USA, you can go to basically any neighborhood where houses are being constructed and, except for the abundance or scarcity of standard building materials, you will find a common set of tools, standards, approaches, and methods of building. Exterior frame walls are 2x6 studs set on 16 inch centers; floor and ceiling joists are set on 24 inch centers; sheet goods are 4 x 8 and have a standard nailing pattern. A roof truss in Alabama looks just like a roof truss in Nevada and gets ordered, made, delivered, and set up the same way.
A framing carpenter from Ohio can move anywhere in the country and go to work the next day with zero training. Ask her to frame a window on that wall there and she will only need to know the rough opening size and height off the sill plate and the king, jack, and cripple studs will look identical to every other window framed since World War II when "California" framing became normal.
Ditto sheet rock installers, trim carpenters, roofers, painters, plumbers, electricians...and so on.
Compare that to creating a random piece of software. First of all, what OS is it going to run on? Windows, macOS, Android, iOS, or Linux? Responsive or fixed UI? Client-server or browser-based? What physical form factor? Which programming language (of which there is almost an infinity of choices--even .NET lets you use VB.NET, C#, or F#)? How about a framework and/or a UI library? Ok, now to methodology: BDD, TDD, Agile, Scrum, pair programming, mob programming, outsourced, in-house, or hybrid? What about tools like IDE, compiler, test tools, bug tracking tools, CI/CD tools, build tools, yada yada yada?
And once you make those decisions, good luck finding an informed consensus on how to code it. Lots of thoughtful, well-meaning, experienced software engineers opine regularly online and in conferences on the best way to write software.
The problem is, they don't agree with each other.
Whether it's models/no models, static vs dynamic types, interfaces, function design, dependencies, or whatever, there are as many disagreements--well reasoned, too--as there are agreements. As an example, remember how OOP was going to save the world from procedural languages like COBOL or C? So everyone who could got on the bandwagon and started inheriting, polymorphing, and overriding their code. Yet there's a recent line of argument that OOP is basically a satanic approach that can only lead to tears of rage. Functional programming, that's the bomb now. At least according to that bunch.
Who you gonna believe?
Old code isn't necessarily bad code
With the best will in the world, and the smartest people on the planet (just like all of your teammates, right?), software will have bugs after it's released. Over time, with luck and the right processes, you will identify and fix most of those defects. After some point in time, the code will be reliable and trustworthy, even if it's poorly designed, badly coded, and in an obsolete programming language.
This is why companies hang on to legacy applications for years and years. Rewriting would be expensive, time consuming, risky, and would invariably result in an application with new (and hidden) defects--some of which could be very damaging indeed. Companies keep legacy mission-critical systems running precisely because they work and work well. Given enough time, almost all code paths get exercised, even the obscure ones that no one thought to test (see "random users" above). Errors are identified, fixes--good, bad, or ugly--are applied, and the app keeps doing valuable work.
The Iowa Caucus app killed the Iowa Caucus
The one thing I hear a consensus on is that the Iowa caucus is dead. Gone. History. If they hadn't tried to break all the rules with this app, it would have been just another Iowa caucus: noteworthy only because of its precedence and quaint because, well, caucus. You know that this past week states like Washington that bailed on the caucus system recently are going like, "Whoa, dodged that bullet!" while states like Nevada and North Dakota are rethinking the whole caucus vs primary approach.
Software failures are nothing new, and certainly the Iowa app glitches got outsized attention compared to the actual consequences they created. A simple Google search on "costliest software failures" brings up this delightful list with a $1B UK child support system screw-up, a $125M Mars mission lost, and 42,000 pieces of luggage that didn't get on the right airplane at Heathrow. Now those are consequences.
Software is hard, it's always been hard, and it will stay hard for the foreseeable future. To borrow from an old joke, a man has a problem, so he decides to write some code. Now he has two problems. Just like the IDP.