In today’s environment, the concept of medical privacy is dead…



Using sophisticated algorithms, even de-identified medical data stripped of personally identifying information can be manipulated to the point where individuals can be identified and located, and information from independent databases can be added to form a comprehensive dossier containing details beyond your medical history, including your lifestyle, social activities, purchases, financial affairs, and employment.

Artificial intelligence advances threaten privacy of health data
Study finds current laws and regulations do not safeguard individuals' confidential health information

Advances in artificial intelligence have created new threats to the privacy of people's health data, a new University of California, Berkeley, study shows.

Led by UC Berkeley engineer Anil Aswani, the study suggests current laws and regulations are nowhere near sufficient to keep an individual's health status private in the face of AI development. The research was published Dec. 21 [2018] in the JAMA Network Open journal.

The mining of two years' worth of data covering more than 15,000 Americans led to the conclusion that the privacy standards associated with 1996's HIPAA (Health Insurance Portability and Accountability Act) legislation need to be revisited and reworked.

"We wanted to use NHANES (the National Health and Nutrition Examination Survey) to look at privacy questions because this data is representative of the diverse population in the U.S.," said Aswani. "The results point out a major problem. If you strip all the identifying information, it doesn't protect you as much as you'd think. Someone else can come back and put it all back together if they have the right kind of information."

"In principle, you could imagine Facebook gathering step data from the app on your smartphone, then buying health care data from another company and matching the two," he added. "Now they would have health care data that's matched to names, and they could either start selling advertising based on that or they could sell the data to others."

According to Aswani, the problem isn't with the devices, but with how the information the devices capture can be misused and potentially sold on the open market.

"I'm not saying we should abandon these devices," he said. "But we need to be very careful about how we are using this data. We need to protect the information. If we can do that, it's a net positive."

Though the study specifically looked at step data, the results suggest a broader threat to the privacy of health data.
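The kind of matching Aswani describes can be sketched in a few lines of Python. This is only an illustration, not the study's method (the researchers used machine learning on NHANES data): it assumes an attacker holds a "de-identified" health file that still contains a week of daily step counts, plus a named step log from a fitness app, and simply pairs each health record with the closest step pattern. All records, names, and field labels below are invented for the example.

```python
import math

# Hypothetical de-identified health records: names removed, step counts kept.
deidentified = [
    {"record_id": "r1", "steps": [4200, 3900, 5100, 4800, 4000, 2500, 2300],
     "condition": "condition-A"},
    {"record_id": "r2", "steps": [11000, 10400, 12050, 9800, 11500, 15000, 14200],
     "condition": "condition-B"},
]

# Hypothetical named data from a fitness app covering the same week.
# The counts differ slightly, as two devices rarely agree exactly.
app_users = [
    {"name": "user-one", "steps": [11080, 10350, 12000, 9900, 11420, 15100, 14150]},
    {"name": "user-two", "steps": [4150, 3950, 5060, 4810, 4100, 2450, 2350]},
    {"name": "user-three", "steps": [7000, 7200, 6900, 7100, 7050, 8000, 8100]},
]

def reidentify(deid_records, named_records):
    """Pair each de-identified record with the named user whose
    step pattern is closest (Euclidean distance, nearest neighbor)."""
    matches = {}
    for rec in deid_records:
        best = min(named_records,
                   key=lambda user: math.dist(rec["steps"], user["steps"]))
        matches[rec["record_id"]] = (best["name"], rec["condition"])
    return matches

matches = reidentify(deidentified, app_users)
for record_id, (name, condition) in matches.items():
    print(record_id, "->", name, condition)
```

With real populations the match would be far noisier than in this toy data, which is why the study needed machine learning and additional demographic fields. The principle is the same: the "anonymous" measurements themselves act as the identifier.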

Feasibility of Reidentifying Individuals in Large National Physical Activity Data Sets From Which Protected Health Information Has Been Removed With Use of Machine Learning

And, as we saw during the HIV/AIDS epidemic, both private and government security protections were breached, accidentally and deliberately.

Who has the “keys” to the kingdom?

It appears that a foreign-owned entity, San Francisco, California-based Datavant, which is owned by Basel, Switzerland-based pharmaceutical company Roivant Sciences Ltd, wants to provide the software and services to aggregate, de-identify, and make available your medical information if it is related to the COVID-19 pandemic. It is possible that merely being tested for the virus could open access to your data.


Leading Healthcare Companies Announce COVID-19 Research Database

A consortium of leading healthcare companies today announced the launch of the COVID-19 Research Database, a secure repository of HIPAA-compliant, de-identified and limited patient-level data sets made available to public health and policy researchers to extract insights to help combat the COVID-19 pandemic.

The database is a pro bono, cross-industry collaboration.

Researchers and policymakers seeking to better understand the COVID-19 pandemic have faced challenges because data relevant for this research are hard to access, fragmented and limited in their ability to answer critical research questions. The COVID-19 Research Database contains HIPAA-compliant, de-identified and limited, longitudinal, patient-level data sets from a consortium of institutions and organizations. It comprises a large, diverse repository of real-world data, including medical claims, pharmacy claims, electronic health records, and demographic data. In addition to the underlying data, the repository integrates privacy-preserving patient linking technology and statistical certification, connecting data sources in a HIPAA-compliant manner to provide a more complete view of the patient journey. Researchers can access the COVID-19 Research Database via an analytic platform, enabling them to conduct large-scale studies while protecting patient privacy.

“The COVID-19 pandemic has revealed many of the gaps in America’s healthcare infrastructure, one of which is the challenge of accessing and connecting relevant data to answer pressing research questions,” said Niall Brennan, President and Chief Executive Officer, Health Care Cost Institute. “The COVID-19 Research Database is focusing on getting the most valuable data to researchers as quickly as possible and represents one of the largest public-private repositories of real-world data ever assembled.” <Source>

Somehow public databases always seem to morph into private proprietary databases where access is controlled by those who can afford to purchase subscriptions. As with most public-private ventures, the profits are privatized and the losses socialized. Even free database access requires expensive data analysis tools to produce useful results.

About Datavant’s technology…


By definition, any “token” that acts as a globally unique identifier (GUID) is just another data identification key to a data aggregator. It is the analysis of the actual data itself that allows the linkage of multi-record datasets and subsequent data aggregation pointing to a specific individual. While one-way hashed keys are common in data processing, anyone who knows how the keys are assembled can hash a “dictionary” of common data elements, run it against a protected record, and use the matches to help re-identify the information.
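A minimal sketch of such a dictionary attack follows. It is not a description of Datavant's scheme: it assumes a hypothetical token built as an unsalted SHA-256 hash of name, date of birth, and sex, and an attacker who knows that recipe. Any keyed or encrypted token would also require the attacker to obtain the key. The names, dates, and the diagnosis label are made up.

```python
import hashlib
from itertools import product

def make_token(first, last, dob, sex):
    """Hypothetical token recipe (an assumption for this example):
    unsalted SHA-256 over normalized identifying fields."""
    raw = "|".join([first.strip().lower(), last.strip().lower(), dob, sex.upper()])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

# A "protected" record as an aggregator might receive it: no name, only a token.
protected_record = {
    "token": make_token("Jane", "Doe", "1980-02-29", "F"),
    "diagnosis": "example-condition",
}

# The attacker's dictionary: every combination of common data elements,
# hashed with the same recipe. Real lists would come from voter rolls,
# marketing files, and similar sources.
first_names = ["john", "jane", "maria"]
last_names = ["smith", "doe", "garcia"]
birth_dates = ["1979-12-01", "1980-02-29", "1981-07-15"]
sexes = ["F", "M"]

dictionary = {
    make_token(f, l, d, s): (f, l, d, s)
    for f, l, d, s in product(first_names, last_names, birth_dates, sexes)
}

# Run the dictionary against the protected record.
match = dictionary.get(protected_record["token"])
print(match)  # ('jane', 'doe', '1980-02-29', 'F')
```

The hash is never "reversed"; the attacker simply recomputes it for every plausible person and looks for a collision. The smaller the space of plausible inputs, the cheaper the attack.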

Overview of Datavant's De-Identification and Linking Technology for Structured Data

“We’ve designed cutting-edge, patent-pending, de-identification technology that replaces private patient information with an encrypted “token” that can’t be reverse-engineered to reveal the original information. Furthermore, our technology can create these same patient-specific tokens in any data set, which means that now two different data sets can be combined using the patient tokens to match corresponding records without ever sharing the underlying patient information.”

“Datavant is installed at sites across the healthcare spectrum. All healthcare stakeholders face the common challenge of protecting patient privacy while maximizing their data’s utility for healthcare analytics. Datavant offers a simple, reliable, and flexible way to de-identify data sets in a way that can be deemed adequate under HIPAA while still retaining the ability to link data sets from multiple sources without exposure of PHI.” <Source>
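To make the linking claim concrete, here is a rough sketch of how two data holders could produce the same token for the same patient and let a third party join their records without seeing a name. This is an assumption-laden illustration, not Datavant's patent-pending method: the keyed HMAC-SHA-256 construction, the shared key, the field names, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret shared by the participating sites (illustration only).
SHARED_KEY = b"demo-key-for-illustration-only"

def tokenize(first, last, dob):
    """Derive a patient token from normalized identifiers using a keyed hash."""
    identity = "|".join([first.strip().lower(), last.strip().lower(), dob])
    return hmac.new(SHARED_KEY, identity.encode("utf-8"), hashlib.sha256).hexdigest()

def deidentify(records, keep_fields):
    """Replace the identifying fields with a token; keep only the listed fields."""
    rows = []
    for rec in records:
        row = {"token": tokenize(rec["first"], rec["last"], rec["dob"])}
        row.update({field: rec[field] for field in keep_fields})
        rows.append(row)
    return rows

# Two independent sources describing some of the same people (made-up data).
medical_claims = [
    {"first": "Jane", "last": "Doe", "dob": "1980-02-29", "claim": "claim-001"},
    {"first": "John", "last": "Smith", "dob": "1975-06-10", "claim": "claim-002"},
]
pharmacy_claims = [
    {"first": "jane ", "last": "DOE", "dob": "1980-02-29", "drug": "drug-X"},
]

claims_deid = deidentify(medical_claims, ["claim"])
pharmacy_deid = deidentify(pharmacy_claims, ["drug"])

# The aggregator joins on the token alone.
pharmacy_by_token = {row["token"]: row for row in pharmacy_deid}
linked = [
    {**claim, "drug": pharmacy_by_token[claim["token"]]["drug"]}
    for claim in claims_deid
    if claim["token"] in pharmacy_by_token
]
print(linked)
```

The linked row carries no name, yet it now holds more about one person than either source did alone. Each additional data set joined this way narrows the pool of people the record could describe.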


Like most electronic health records, these data sets are replete with missing, faulty, or incorrect data. Diagnoses are often manipulated to maximize reimbursements from the government or private insurers and may not reflect the patient’s true medical condition. The government is offering up to a 20% hospital add-on payment for Medicare beneficiaries diagnosed with COVID-19, an attractive incentive to jigger the records.

Bottom line…

I am not expressing an opinion on the value of the COVID-19 database, Datavant, or its technology. I am warning of the inherent dangers of medical databases, the inadequacy of outdated HIPAA legislation (which is rarely enforced), and the growth of Big Brother.

I suggest that you may wish to support the Electronic Frontier Foundation, which is attempting to rationalize analog laws for a digital age to protect what is left of our personal privacy.

To illustrate how scary the world may become, consider your electronic health records stored everywhere, like a giant blockchain repository, and available to anyone with a specific key.

We are so screwed.

