
Privacy Protection: AI Anonymizes Healthcare Clinical Data


Since the onset of the COVID-19 pandemic, we have witnessed record-breaking data breaches, and a recent IBM report found that the cost of those breaches is also rising sharply.

Healthcare is undoubtedly one of the industries hit hardest by data breaches, with each incident costing an average of $9.2 million. The information most often exposed in these breaches is sensitive customer data.

Pharmaceutical and healthcare companies must operate under strict guidance on protecting patient data, so any breach can be costly. For example, companies collect, process and store personally identifiable information (PII) throughout the drug discovery phase, and once trials conclude and clinical applications are submitted, they must take care to protect patient privacy in the published results.

The European Medicines Agency (EMA) Policy 0070 and Health Canada's "Public Release of Clinical Information" guidance both make specific recommendations on data anonymization, aiming to minimize the risk that published results can be used to re-identify patients.

In addition to advocating for data privacy, these regulations also require the sharing of trial data so that the research community can build on it. This undoubtedly puts companies in a dilemma.

So, how do pharmaceutical companies strike a balance between data privacy and transparency while publishing research results in a timely, cost-effective and efficient manner? In practice, AI can take on more than 97% of the workload in the submission process, greatly reducing companies' operational burden.

Why is it so difficult to anonymize clinical study reports (CSRs)?

In the process of implementing anonymization of clinical submissions, companies mainly face three core challenges:

Unstructured data is difficult to process: Most clinical trial data is unstructured. Study results contain large amounts of text, scanned images and tables, which makes processing inefficient. Study reports often run to thousands of pages, and identifying sensitive information in them is like finding a needle in a haystack. Furthermore, there are no standardized, off-the-shelf training solutions that can automate this kind of processing.

Manual processes are cumbersome and error-prone: Today, pharmaceutical companies employ hundreds of people to anonymize clinical study submissions. The team must work through more than 25 complex steps, and a typical summary document can take up to 45 days to process. When reviewers work through thousands of pages by hand, the tedium often leads to errors.

Open interpretation of regulatory guidelines: Although the regulations offer many detailed recommendations, they remain incomplete. For example, Health Canada's "Public Release of Clinical Information" guidance requires that the risk of re-identification be kept below 9%, but it does not specify how that risk should be calculated.

Below, we look at concrete solutions to these anonymization needs from a problem-solving perspective.

Using augmented analytics to identify sensitive information in human language

The following three elements help build technology-driven anonymization solutions:

AI language models for natural language processing (NLP)

Nowadays, AI can create like an artist and diagnose like a doctor. Deep learning has driven many advances in AI, and language models are one of its backbones. As algorithms designed to process human language, they are particularly good at detecting named entities such as patient names, social security numbers and zip codes.

Almost without us noticing, these powerful AI models have spread into every corner of the public domain and are trained at scale on public documents. Beyond the well-known Wikipedia, the MIMIC-III v1.4 database, which contains de-identified data on 40,000 patients, has also become a valuable resource for training such models. Of course, to improve performance, domain experts still need to retrain the models on internal clinical trial reports.
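As a rough illustration of how such a model flags PII in free text, the sketch below combines the open-source spaCy library with simple regular expressions. The model name, entity labels and patterns are illustrative assumptions, not part of any specific production pipeline described in this article.

```python
# Minimal sketch: flagging candidate PII in clinical text with an NLP model.
# Assumes spaCy and its small English model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import re
import spacy

nlp = spacy.load("en_core_web_sm")  # illustrative; a clinically tuned model would do better

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # US social security numbers
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")    # US zip codes

def find_candidate_pii(text: str) -> list[tuple[str, str]]:
    """Return (label, span_text) pairs for likely PII mentions."""
    doc = nlp(text)
    hits = [(ent.label_, ent.text) for ent in doc.ents
            if ent.label_ in {"PERSON", "GPE", "DATE", "ORG"}]
    hits += [("SSN", m.group()) for m in SSN_RE.finditer(text)]
    hits += [("ZIP", m.group()) for m in ZIP_RE.finditer(text)]
    return hits

if __name__ == "__main__":
    sample = "Patient John Doe (SSN 123-45-6789) of Boston, zip 02115, enrolled on 3 May 2021."
    print(find_candidate_pii(sample))
```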

Improving accuracy through human-in-the-loop design

The 9% risk threshold proposed by Health Canada translates roughly into a model accuracy requirement of about 95% (usually measured by recall or precision). AI algorithms can work through large amounts of data and run multiple training cycles to improve their accuracy. Technical improvements alone, however, are not enough to make them ready for clinical use; these models also require human guidance and support.
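For concreteness, precision and recall over detected PII spans can be checked against a small hand-labeled sample, as in the hedged sketch below; the span values are purely illustrative.

```python
# Sketch: measuring precision and recall of detected PII spans against a labeled sample.
def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Both sets hold normalized PII spans (e.g. 'john doe', '123-45-6789')."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Illustrative values only:
gold = {"john doe", "123-45-6789", "02115"}
predicted = {"john doe", "02115", "boston"}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```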

To address the subjectivity of clinical trial data and improve outcomes, these analytics solutions are designed to work alongside people, an approach known as augmented intelligence. Humans stay in the loop: they label data and train the model, and they keep providing feedback once the solution is in use, which steadily improves its accuracy and output quality.
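As a hedged sketch of how such a loop might be wired, the snippet below routes low-confidence entity predictions to a human reviewer and stores the corrections for retraining. The confidence threshold, queue and function names are all hypothetical.

```python
# Sketch of a human-in-the-loop review step (names and thresholds are illustrative).
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.90  # below this, a human reviewer decides

@dataclass
class Prediction:
    span: str
    label: str
    confidence: float

@dataclass
class ReviewQueue:
    pending: list = field(default_factory=list)
    corrections: list = field(default_factory=list)  # fed back into retraining

def triage(predictions: list[Prediction], queue: ReviewQueue) -> list[Prediction]:
    """Auto-accept confident predictions; send uncertain ones to a human."""
    accepted = []
    for p in predictions:
        if p.confidence >= CONFIDENCE_THRESHOLD:
            accepted.append(p)
        else:
            queue.pending.append(p)  # a reviewer confirms or corrects these later
    return accepted

def record_review(queue: ReviewQueue, prediction: Prediction, corrected_label: str) -> None:
    """Store the human decision so the model can be retrained on it."""
    queue.corrections.append((prediction.span, corrected_label))
```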

Solving problems through collaboration

Let’s assume that a study involves 1,000 patients, 980 of whom are from the continental United States and the remaining 20 from South America. Does the data of these 20 patients need to be redacted (blacked out) or anonymized? Should patient samples be grouped within the same country or continent? In what ways might an attacker combine this anonymized information with age, postal code and other data to ultimately re-identify a patient?
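One hedged way to reason about questions like these is to check how many records share each combination of quasi-identifiers (for example region, age band and postal prefix); the small groups are the risky ones. The pandas-based sketch below is illustrative only, and the column names and toy data are assumptions.

```python
# Sketch: spotting small equivalence classes among quasi-identifiers (illustrative columns).
import pandas as pd

records = pd.DataFrame({
    "region":     ["US"] * 8 + ["South America"] * 2,
    "age_band":   ["40-49", "40-49", "50-59", "50-59", "60-69",
                   "60-69", "40-49", "50-59", "40-49", "60-69"],
    "zip_prefix": ["021", "021", "100", "100", "941",
                   "941", "021", "100", "110", "110"],
})

# Group size k for each combination of quasi-identifiers; k = 1 means a unique,
# easily re-identifiable record.
class_sizes = (records
               .groupby(["region", "age_band", "zip_prefix"])
               .size()
               .rename("k"))

risky = class_sizes[class_sizes < 5]  # the threshold here is an illustrative choice
print(risky)
```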

Unfortunately, there are no standard answers to these questions. To interpret clinical submission guidance more clearly, pharmaceutical manufacturers, contract research organizations (CROs), technology solution providers and academic researchers need to collaborate closely.

AI-driven anonymization method

With these basic ideas in place, the next step is to piece them together into an end-to-end solution. The technologies in the anonymization pipeline should build on methods we already use in practice.

Clinical study reports contain a variety of structured data (numeric and identity entities, such as demographic information and address entries), as well as the unstructured data elements we discussed previously. All of it must be handled properly to prevent attackers from recovering sensitive named entities. Structured data is relatively easy to process; the real difficulty for AI algorithms lies in the unstructured data.

Unstructured data (usually scanned images or PDFs) is therefore first converted into machine-readable text using technologies such as optical character recognition (OCR) or computer vision. AI algorithms are then applied to the documents to detect personally identifiable information. To improve performance, users can give feedback on sample results so the system learns how to handle lower-confidence cases.
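A hedged end-to-end sketch of that first stage might look like the following, chaining pytesseract OCR to an entity detector similar to the one sketched earlier. The file path, helper names and redaction rules are assumptions for illustration.

```python
# Sketch: OCR a scanned page, then run PII detection on the extracted text.
# Assumes Tesseract is installed along with: pip install pytesseract pillow spacy
from PIL import Image
import pytesseract
import spacy

nlp = spacy.load("en_core_web_sm")  # illustrative general-purpose model

def extract_text(image_path: str) -> str:
    """Convert a scanned page into plain text via OCR."""
    return pytesseract.image_to_string(Image.open(image_path))

def redact_candidates(text: str) -> str:
    """Replace detected person names and locations with placeholder tags."""
    doc = nlp(text)
    redacted = text
    for ent in reversed(doc.ents):  # replace from the end so offsets stay valid
        if ent.label_ in {"PERSON", "GPE"}:
            redacted = redacted[:ent.start_char] + f"[{ent.label_}]" + redacted[ent.end_char:]
    return redacted

# Example usage (the path is hypothetical):
# page_text = extract_text("csr_page_0412.png")
# print(redact_candidates(page_text))
```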

[Figure: AI-driven anonymization method]

After anonymization is completed, the residual risk of re-identification must also be assessed. This usually means referencing the background population and combining data from other, similar trials. The assessment focuses on three standard risk scenarios, the prosecutor, the journalist and the marketer, each of whom would try to re-identify patients for their own purposes.

Until the measured risk falls below the recommended 9% threshold, the anonymization process keeps iterating, adding business rules and algorithm improvements in successive cycles. By integrating with other applications and establishing a machine learning operations (MLOps) process, the entire anonymization solution can then be folded into the actual workflow.
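As a rough sketch of that iterative check, the snippet below estimates a simple prosecutor-style risk as the share of records sitting in small equivalence classes, then keeps generalizing a quasi-identifier until the estimate drops below 9%. The generalization step, column names and default k are purely illustrative assumptions.

```python
# Sketch: iterate generalization until an (illustrative) re-identification risk estimate
# falls below the 9% threshold referenced in Health Canada guidance.
import pandas as pd

THRESHOLD = 0.09

def risk_estimate(df: pd.DataFrame, quasi_ids: list[str], k: int = 11) -> float:
    """Fraction of records in equivalence classes smaller than k (a crude proxy: 1/11 < 9%)."""
    sizes = df.groupby(quasi_ids)[quasi_ids[0]].transform("size")
    return float((sizes < k).mean())

def generalize_zip(df: pd.DataFrame) -> pd.DataFrame:
    """One illustrative generalization step: drop the last digit of the zip prefix."""
    out = df.copy()
    out["zip_prefix"] = out["zip_prefix"].str[:-1]
    return out

def anonymize(df: pd.DataFrame, quasi_ids: list[str]) -> pd.DataFrame:
    while risk_estimate(df, quasi_ids) > THRESHOLD and df["zip_prefix"].str.len().max() > 0:
        df = generalize_zip(df)  # in practice: richer rules, suppression and human review
    return df
```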

A more difficult challenge than algorithms: data quality

For pharmaceutical companies, such anonymization solutions can shorten the submission cycle by up to 97%. More importantly, this semi-automated workflow improves efficiency while ensuring human involvement. But what are the biggest challenges in building AI-powered anonymization solutions?

In fact, as in most data science practice, the biggest obstacle is not the AI algorithm used to identify named entities but converting study reports into high-quality data the AI can process. Content ingestion pipelines often struggle with documents that vary in format, style and structure.

AI anonymization solutions therefore need constant fine-tuning to adapt to new document encodings and to accurately detect where images and tables begin and end in scans. This is by far the most time-consuming and labor-intensive part of AI anonymization.

New challenges of anonymization in clinical research

As technology advances rapidly, will anonymizing clinical research keep getting easier and more efficient? AI-driven solutions are impressive, but new challenges will demand attention.

First, consumer data collected through social media, device usage and online tracking is greatly increasing the risk of re-identification. Attackers can combine this public information with clinical research data to accurately identify patients. More worrying still, malicious hackers are quick to adopt AI themselves and may even stay a step ahead of pharmaceutical companies.

Finally, regulations continue to evolve to accommodate country-specific practices. Some countries may soon announce specific rules on the anonymization of clinical submissions, which will certainly add complexity and cost for companies trying to stay compliant. But as the saying goes, the future is bright even if the road is winding. The maturing of AI technology at least gives the industry hope of overcoming these problems.
