Privacy Protection: AI Anonymizes Healthcare Clinical Data

Apr 12, 2023 pm 03:19 PM

Since the onset of the COVID-19 pandemic, data breaches have reached record levels, and a recent IBM report found that their cost is rising sharply as well.

Healthcare is undoubtedly one of the industries most affected by data breaches, with each data breach costing an average of $9.2 million. The type of information most often exposed in such breaches is sensitive customer data.

Pharmaceutical and healthcare companies must operate under strict guidance while protecting patient data, so any breach can be costly. For example, companies must collect, process and store personally identifiable information (PII) throughout the drug discovery phase, and once trials conclude and clinical applications are submitted, they must take care to protect patient privacy in the published results.

The European Medicines Agency (EMA) Policy 0070 and the "Public Release of Clinical Information" regulations issued by Health Canada both make specific recommendations on data anonymization, aiming to minimize the risk that published results could be used to re-identify patients.

In addition to advocating for data privacy, these regulations also require trial data to be shared so that the research community can build on it. This puts companies in a dilemma.

So how do pharmaceutical companies strike a balance between data privacy and transparency while publishing research results in a timely, cost-effective and efficient manner? In practice, AI technology can take on more than 97% of the workload in the submission process, greatly reducing the operational burden on companies.

Why is it so difficult to anonymize clinical study reports (CSRs)?

When anonymizing clinical submissions, companies face three core challenges:

Unstructured data is difficult to process: Most clinical trial data is unstructured. Study results contain large amounts of text, scanned images and tables, which makes processing inefficient. Study reports often run to thousands of pages, and finding sensitive information in them is like looking for a needle in a haystack. Furthermore, there are no standardized, ready-made technical solutions that can automate this type of processing.

Manual processes are cumbersome and error-prone: Today, pharmaceutical companies employ hundreds of people to anonymize clinical study submissions. The team has to work through more than 25 complex steps, and a typical summary document may take up to 45 days to process. When thousands of pages are reviewed by hand, the tedium often leads to errors.

Regulatory guidelines are open to interpretation: Although the regulations contain many detailed recommendations, they are still incomplete. For example, Health Canada's "Public Release of Clinical Information" regulations require the risk of re-identification to be below 9%, but they do not specify how that risk should be calculated.

Below, we look at solutions that can meet these anonymization needs from a problem-solving perspective.

Using augmented analytics to identify sensitive information in human language

The following three elements help build technology-driven anonymization solutions:

AI language models for natural language processing (NLP)

Nowadays, AI can create like an artist and diagnose like a doctor. Deep learning technology has promoted many advances in AI, and AI language models are one of the backbones. As a branch of algorithms designed to process human language, AI language models are particularly good at detecting named entities, such as patient names, social security numbers, and zip codes.
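To make the idea of detecting identifiers concrete, here is a minimal sketch. It is not the language-model approach the article describes: it assumes simple regular expressions as a stand-in, and it covers only US-style social security numbers and zip codes. Patient names have no fixed pattern, which is one reason a trained model is needed in practice. The names `PATTERNS`, `find_entities` and `redact` are hypothetical.

```python
import re

# Illustrative rules only (an assumption for this sketch); a trained
# named-entity model would replace these patterns in a real solution.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ZIP": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}


def find_entities(text):
    """Return (label, start, end, matched_text) tuples sorted by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


def redact(text):
    """Replace every detected span with a [LABEL] placeholder."""
    pieces, cursor = [], 0
    for label, start, end, _ in find_entities(text):
        if start < cursor:
            continue  # skip a span that overlaps one already redacted
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


sample = "Subject SSN 123-45-6789, ZIP 02139."
print(redact(sample))
```

For the sample sentence this prints "Subject SSN [SSN], ZIP [ZIP]."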

Almost without our noticing, these powerful AI models have spread widely, trained at scale on public documents. In addition to the well-known Wikipedia, the MIMIC-III v1.4 database, which contains de-identified data on 40,000 patients, has become a valuable resource for training AI models. To improve model performance, domain experts still need to retrain the model on internal clinical trial reports.

Improving accuracy through human-in-the-loop design

Health Canada's 9% risk threshold translates roughly into a model accuracy requirement of about 95%, usually measured by recall or precision. AI algorithms can process large amounts of data and run many training cycles to improve their accuracy. However, technical improvements alone are not enough to make them ready for clinical use; these models also need human guidance and support.
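As a worked illustration of the two metrics, the sketch below compares the identifier spans a model flagged against a human-labeled reference. The span offsets are invented for the example, and exact span matching is a simplifying assumption. For anonymization, recall is usually the metric to watch, because a missed identifier is what leaks.

```python
def precision_recall(true_spans, predicted_spans):
    """Compare predicted identifier spans with human-labeled ones.

    Spans are (start, end) character offsets; only exact matches count,
    which is a simplifying assumption for this sketch.
    """
    true_set, predicted_set = set(true_spans), set(predicted_spans)
    true_positives = len(true_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


# Hypothetical labels for one page of a report.
truth = [(0, 8), (20, 31), (40, 45), (60, 72)]      # four real identifiers
predicted = [(0, 8), (20, 31), (40, 45), (80, 85)]  # three hits, one false alarm

precision, recall = precision_recall(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here both values are 0.75: one real identifier was missed and one flagged span was not an identifier, so this hypothetical model would fall well short of a 95% target.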

To address the subjectivity of clinical trial data and improve outcomes, analytics solutions are designed to work alongside humans, an approach called augmented intelligence. Humans are part of the loop: they label data and train the model, and they also provide regular feedback after the solution goes live, which steadily improves the model's accuracy and output quality.

Solving Problems in a Collaborative Approach

Let's assume that a study involves 1,000 patients, 980 of whom are from the continental United States and the remaining 20 from South America. Does the data of these 20 patients need to be redacted (blacked out) or anonymized? Should patient samples be selected within the same country or continent? In what ways might an attacker combine this anonymized information with age, postal code and other data to re-identify a patient?
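One way to reason about the example above is to group patients who share the same visible attributes and treat one divided by the group size as the re-identification risk. That measure is an assumption for this sketch, since the regulations do not prescribe a calculation method. The age bands and the even spread of patients across them are invented for illustration.

```python
from collections import Counter


def class_risks(records, quasi_identifiers):
    """Map each combination of quasi-identifier values to 1 / group size."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    return {key: 1 / size for key, size in counts.items()}


# Hypothetical cohort: 980 US and 20 South American patients,
# spread evenly over four invented age bands.
AGE_BANDS = ["18-34", "35-49", "50-64", "65+"]
records = (
    [{"region": "US", "age_band": AGE_BANDS[i % 4]} for i in range(980)]
    + [{"region": "South America", "age_band": AGE_BANDS[i % 4]} for i in range(20)]
)

risk_by_region = class_risks(records, ["region"])
risk_by_region_and_age = class_risks(records, ["region", "age_band"])

print(max(risk_by_region.values()))          # worst case using region alone
print(max(risk_by_region_and_age.values()))  # worst case once age band is added
```

With region alone, the worst case is 1/20 = 0.05, below a 9% threshold. Adding the age band splits the 20 South American patients into groups of 5, and the worst case rises to 0.2. This is the combination effect the questions above point at.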

Unfortunately, there are no standard answers to these questions. To more clearly interpret clinical submission guidance, pharmaceutical manufacturers, clinical research organizations (CROs), technology solution providers and researchers from academia need to join forces and collaborate.

AI-driven anonymization method

With these building blocks in place, the next step is to assemble them into a complete solution. The technologies in the anonymization solution should build on the methods already used in practice.

Clinical study reports contain a variety of structured data (numeric and identity entities, such as demographic information and address entries) as well as the unstructured data elements discussed earlier. Both must be handled properly to prevent malicious hackers from tracing them back to sensitive named entities. Structured data is relatively easy to process; unstructured data remains the harder problem for AI algorithms.

Unstructured data (usually a scanned image or PDF) is therefore first converted into machine-readable form using technologies such as optical character recognition (OCR) or computer vision. AI algorithms are then applied to the documents to detect personally identifiable information. To improve algorithm performance, users can give feedback on sample results, helping the system learn how to handle lower-confidence detections.
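The feedback step can be sketched as a simple triage: detections the model is confident about are redacted automatically, and the rest are queued for a human reviewer. The `Detection` type, the `triage` function and the 0.90 cut-off are all hypothetical; a real system would tune the threshold against its measured error rates.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    text: str          # the span the model flagged
    label: str         # the entity type it assigned
    confidence: float  # model score between 0 and 1


def triage(detections, auto_threshold=0.90):
    """Split detections into auto-redact and human-review queues.

    The 0.90 default is an arbitrary value chosen for illustration.
    """
    auto, review = [], []
    for detection in detections:
        if detection.confidence >= auto_threshold:
            auto.append(detection)
        else:
            review.append(detection)
    return auto, review


# Hypothetical model output for one page.
detections = [
    Detection("123-45-6789", "SSN", 0.99),
    Detection("Jordan", "NAME", 0.62),  # could be a person or a place
    Detection("02139", "ZIP", 0.95),
]

auto, review = triage(detections)
print([d.text for d in auto])
print([d.text for d in review])
```

Reviewer decisions on the second queue are the labeled examples that can be fed into later retraining.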

After anonymization is complete, the re-identification risk must also be assessed. This usually draws on the background population and on data from other similar trials. The assessment evaluates a set of elements against three attack scenarios, known as prosecutor, journalist and marketer, each of which models an adversary trying to re-identify patients for a different purpose.

Until the risk falls to the recommended 9% threshold, the anonymization process keeps adding business rules and algorithm improvements in an iterative cycle. Then, by integrating with other technology applications and establishing a machine learning operations (MLOps) process, the entire anonymization solution can be incorporated into the actual workflow.
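The iterative cycle can be illustrated with a minimal sketch that coarsens a single attribute until the risk drops below the threshold. It assumes age as the only quasi-identifier, one divided by the smallest group size as the risk measure, and a fixed list of band widths; none of these come from the regulations, and a real solution would also apply business rules and model improvements at each pass.

```python
from collections import Counter

RISK_THRESHOLD = 0.09  # the 9% level cited from Health Canada


def max_risk(values):
    """Worst-case risk, taken here as 1 / size of the smallest group."""
    counts = Counter(values)
    return 1 / min(counts.values())


def generalize_ages(ages, widths=(1, 5, 10, 20)):
    """Widen age bands step by step until the risk meets the threshold.

    Returns (band_width, risk); band_width is None if no width suffices.
    """
    risk = None
    for width in widths:
        banded = [(age // width) * width for age in ages]
        risk = max_risk(banded)
        if risk <= RISK_THRESHOLD:
            return width, risk
    return None, risk


# Hypothetical cohort: ages 30 to 49, six patients at each exact age.
ages = [30 + (i % 20) for i in range(120)]

width, risk = generalize_ages(ages)
print(width, round(risk, 3))
```

With exact ages, each group has 6 patients and the risk is about 0.167. Five-year bands produce groups of 30 and a risk of about 0.033, so the loop stops at a width of 5.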

A harder challenge than algorithms: data quality

For pharmaceutical companies, such anonymization solutions can shorten the submission cycle by up to 97%. More importantly, this semi-automated workflow improves efficiency while ensuring human involvement. But what are the biggest challenges in building AI-powered anonymization solutions?

In fact, as in most data science work, the biggest obstacle is not the AI algorithm that identifies named entities but converting research reports into high-quality data that AI can process. Content ingestion pipelines often struggle with documents that differ in format, style and structure.

Therefore, AI anonymization solutions need constant fine-tuning to adapt to new document encoding formats, or to accurately detect where scanned pictures and tables start and end. This is the most time-consuming and labor-intensive part of AI anonymization.

New challenges of anonymization in clinical research

As technology advances rapidly, will anonymizing clinical research keep getting easier and more efficient? AI-driven solutions are impressive, but new challenges will demand attention.

First, consumer data collected through social media, device usage and online tracking greatly increases the risk of re-identification. Attackers can combine this public information with clinical research data to pinpoint patients. Even more worrying, malicious hackers are quick to adopt AI advances and may even get ahead of pharmaceutical companies.

Finally, regulations continue to evolve to accommodate country-specific practices. Some countries may soon announce their own rules on anonymizing clinical submissions, which will increase the complexity and cost of staying compliant. But as the saying goes, the future is bright though the road is winding. The maturing of AI technology at least gives the industry hope of overcoming these problems.

Statement
This article is reproduced from 51CTO.COM. In case of any infringement, please contact admin@php.cn to have it removed.