Why did NHS Track and Trace Fail?
What happened?
Recently, close to 16,000 Covid-19 cases went unaccounted for in a colossal error which has popularly been blamed on the newly created Covid-19 Track and Trace system. In this article I plan to explain how the Track and Trace system works, and why I believe the fault does not lie with the new system but with Public Health England, a view the BBC has since confirmed.
How Track and Trace Works
Thankfully, there isn’t too much guesswork as to how the NHS CV19 system works because it’s completely open source. You can find the backend architecture repo and documentation here and under the NHSX account there are repositories for the entire ecosystem including iOS and Android native/specific repositories. (Which I enjoy for reasons I may ramble about in the future.)
The Frontend
The frontend has little bearing on the problem but is a part of the overall architecture. It enables interactions such as scanning a QR code, reading advice served by the NHS backend and handling BLE encounters, that is to say: it picks up nearby Bluetooth devices. It works by having native iOS/Android applications interact with ‘services’ provided by Apple and Google while also interacting with the backend systems created by NHSX.
The Backend
When it comes to CRUD operations, the NHS CV19 system uses Amazon Web Services (AWS) extensively. The benefits of this include not having to directly set up and control the infrastructure required to run a national Track and Trace system, and instead leaving it to a dedicated provider that is already established. This means that the NHS can get up and running as soon as it has programmed the system and configured the serverless infrastructure (which is done with Terraform).
In essence, the backend so far has absolutely nothing to do with the reported Excel problem. The data is inserted into S3 buckets/DynamoDB, which is completely fine.
The Problem
The problem occurs when external sources start to pull this data in the format they would like to retrieve it in. In the system architecture diagram we can see that Public Health England (PHE), Public Health Wales and other external systems grab this data over HTTPS in text/csv format. This is where it has left the NHS CV19 system and entered the domain of PHE.
What follows is partly guesswork, although a BBC article confirms the overall picture. At some point, PHE took that data (as a CSV, for example) and transformed it into XLS, the Excel file format that was standard between 1997 and 2003. An XLS worksheet can hold at most 65,536 rows, so it was massively underequipped to deal with the sheer amount of data involved and, as a result, failed.
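To make the failure mode concrete, here is a minimal sketch of how rows can go missing when a CSV is loaded into a sheet with a hard row limit. The two limits are the documented worksheet sizes for XLS and XLSX. Everything else is an assumption for illustration: the column names, the made-up data and the load_into_sheet helper are hypothetical and are not PHE's actual import process.

```python
import csv
import io

# Worksheet row limits of the two Excel formats.
XLS_MAX_ROWS = 65_536       # legacy .xls (Excel 97-2003)
XLSX_MAX_ROWS = 1_048_576   # modern .xlsx (Excel 2007 onwards)


def make_csv(n_rows):
    """Build an in-memory CSV with a header and n_rows of made-up records."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["record_id", "result"])  # hypothetical columns
    for i in range(n_rows):
        writer.writerow([i, "positive"])
    return buffer.getvalue()


def load_into_sheet(csv_text, max_rows):
    """Mimic a naive import: keep what fits on the sheet, drop the rest.

    Returns the rows kept (header included) and the number of rows lost.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    kept = rows[:max_rows]
    return kept, len(rows) - len(kept)


if __name__ == "__main__":
    csv_text = make_csv(70_000)  # 70,001 lines including the header

    _, lost_xls = load_into_sheet(csv_text, XLS_MAX_ROWS)
    _, lost_xlsx = load_into_sheet(csv_text, XLSX_MAX_ROWS)

    print("rows lost in XLS: ", lost_xls)   # 4465
    print("rows lost in XLSX:", lost_xlsx)  # 0
```

The point of the sketch is that nothing raises an error: the import simply stops at the limit. Comparing the number of rows read against the number of rows kept, as the second return value does, would have been enough to notice the loss.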
Conclusion
In conclusion, we can see for ourselves via Open Source Software that the new NHS CV19 system is not to blame; PHE dropped the ball, so to speak. This has been reaffirmed by the BBC. By taking this data and inserting it into an obsolete format not designed for this amount of data, PHE failed to capture the records of thousands of individuals.
Please send me a message if you want to get in touch to discuss any of the above or suggest amendments. I have not yet been able to read through all of the source code provided by NHSX and as such may have some details incorrect. You can email me via: [email protected]