Responsible machine learning – the “glass box” approach

王林
2023-04-09 12:21:03

Translator | Cui Hao

Reviewer | Sun Shujuan

Opening

Machine learning does not have to be an inscrutable technology. Complex deep neural networks, with their many parameters and hyperparameters, are only one form of cognitive computing, and even they need not be so opaque.

There are other types of machine learning, some of which also involve deep neural networks, in which the model's results, the way the model arrives at them, and the intricacies that influence the model are all highly transparent.

All of this depends on how well the organization understands the sources of its data.

In other words, you need to understand everything from the data used to train a model to the data the model handles in production. That understanding is also integral to interpreting, refining, and improving the model's results. In this way, organizations can greatly increase the business value of their models.

More importantly, it improves the fairness, accountability, and transparency of the technology, making it more reliable and more trustworthy for society as a whole.

Joel Minnick, vice president of marketing at Databricks, acknowledged: “This is why you need granular understanding of the upstream and downstream of the data to be able to do machine learning responsibly.”

Cataloging Data Lineage

Training data and production data for models pass through many technologies, including data sources, data transformation, and data integration. A mature data catalog solution can capture this information in real time, so it can be monitored at any time to see how the data feeding the model is progressing. “It gives me a clear understanding of the context in which the data is being used in the model. Also, you know, where did this data come from? What other data did we get from it? When was it generated? So I can better understand how I should use this data,” Minnick said.

Data lineage, the record of where data originated, how it moved, and how it was processed, consists of metadata, and the data catalog is where that metadata about datasets is stored. Catalogs also let users add tags and other descriptors as additional metadata, which helps trace data provenance and establish trust in the data. As Minnick describes it, data lineage can be exposed as “API-driven services” that connect a range of platforms, including those used by data scientists, data engineers, and end users.
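
As a rough illustration of lineage stored as catalog metadata, the sketch below uses a small in-memory catalog. The `DatasetEntry` and `Catalog` classes, their fields, and the dataset names are all hypothetical and chosen for illustration; they do not correspond to the API of Databricks or any other catalog product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetEntry:
    """One catalog entry: lineage metadata for a dataset (hypothetical schema)."""
    name: str
    parents: list[str] = field(default_factory=list)  # datasets this one was derived from
    transform: str = ""                                # how it was produced
    tags: set[str] = field(default_factory=set)        # user-supplied descriptors
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class Catalog:
    """Minimal in-memory catalog that answers upstream/downstream questions."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def upstream(self, name: str) -> set[str]:
        """Every dataset that `name` was derived from, directly or indirectly."""
        seen: set[str] = set()
        stack = list(self._entries[name].parents)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in self._entries:
                stack.extend(self._entries[current].parents)
        return seen

    def downstream(self, name: str) -> set[str]:
        """Every registered dataset derived from `name`, directly or indirectly."""
        return {other for other in self._entries if name in self.upstream(other)}


catalog = Catalog()
catalog.register(DatasetEntry("raw_claims", tags={"source:crm"}))
catalog.register(DatasetEntry("clean_claims", parents=["raw_claims"], transform="deduplicate"))
catalog.register(DatasetEntry("training_set", parents=["clean_claims"], transform="feature_join"))

print(sorted(catalog.upstream("training_set")))  # ['clean_claims', 'raw_claims']
print(sorted(catalog.downstream("raw_claims")))  # ['clean_claims', 'training_set']
```

The `upstream` query corresponds to "where did this data come from?" and the `downstream` query to "what other data did we get from it?", while `created_at` and `tags` stand in for the timing and descriptor metadata mentioned above.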

Data Governance: Born for Data Science

Improved traceability of training data and production data affects the results of machine learning models, and those results are in turn closely tied to data governance in data science. Data governance is therefore inextricably linked to the data science platforms on which models are created and deployed. Minnick commented that governing tables and files, notebooks, and dashboards together is the modern way to manage how data is produced and consumed. This rings true for data scientists who build models in notebooks and monitor their output through dashboards.

Clear and Transparent

Nonetheless, obtaining data lineage through an API connection to a data science platform is only one aspect of using machine learning transparently. To improve a model's output, that output also needs to be checked against what the data lineage reveals. For example, a model's data can be traced so that data scientists “can understand if something goes wrong with some data, they can isolate that part of the data,” Minnick noted.

Logically, this knowledge can be used to understand why specific types of data are causing problems, and then either to correct them or to improve the model's accuracy by removing them entirely. According to Minnick, more and more organizations are recognizing the benefits of applying data lineage to model results, “due in part to the rise of machine learning and artificial intelligence in various industries today. It’s becoming more and more common. Last year, when we launched our AutoML product, we used a ‘glass box’ to represent transparency into data sources.”
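
The isolation step described above can be sketched in a few lines, assuming each training row carries the identifier of the source it came from. The `source` field, the sample rows, and the `split_by_source` helper are hypothetical and only meant to show the idea, not any product's behavior.

```python
from collections import Counter

# Each training row records the source it came from (hypothetical data).
rows = [
    {"source": "branch_a", "feature": 0.2, "label": 0},
    {"source": "branch_b", "feature": 0.9, "label": 1},
    {"source": "branch_c", "feature": 0.4, "label": 0},
    {"source": "branch_b", "feature": 0.8, "label": 1},
]


def split_by_source(rows, suspect_sources):
    """Separate rows whose lineage points at a suspect source from the rest."""
    kept = [r for r in rows if r["source"] not in suspect_sources]
    isolated = [r for r in rows if r["source"] in suspect_sources]
    return kept, isolated


kept, isolated = split_by_source(rows, {"branch_b"})
print(Counter(r["source"] for r in isolated))  # Counter({'branch_b': 2})
print(len(kept))                                # 2
```

The isolated rows can then be inspected and corrected, or the model can be retrained on `kept` alone to see whether accuracy improves.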

Regulatory Implications and More

Some organizations also use the data lineage behind their adaptive cognitive computing models to strengthen regulatory compliance. Industries such as finance and healthcare are highly regulated, and companies in them must be able to explain clearly how they reach decisions that affect their customers. Data traceability provides a roadmap of how a machine learning model was built and how its results came about, which is invaluable for regulatory compliance.

This information also supports internal audits, letting companies see where they are falling short of regulations so that issues can be corrected before they become violations. “Being able to present very granular data lineage information to regulators, not just across tables but where that data can be used anywhere across a broad organization, is really important,” Minnick asserts. Combined with the way data provenance improves model accuracy, this advantage makes the approach likely to become a best practice for deploying the technology.

Translator Introduction

Cui Hao, 51CTO community editor and senior architect, has 18 years of software development and architecture experience and 10 years of distributed architecture experience. He was formerly a technical expert at HP. He enjoys sharing what he knows and has written many popular technical articles with more than 600,000 reads. He is the author of "Principles and Practice of Distributed Architecture".

Original title: A “Glass Box” Approach to Responsible Machine Learning, author: Jelani Harper

Statement: This article is reproduced from 51cto.com.