
At CCIG2024, Hehe Information document analysis technology solves the "famine" problem of large model corpus

WBOY (Original)
2024-05-31 22:28:49, 741 views

In 2024, the China Conference on Image and Graphics (CCIG 2024) opened in the ancient capital of Xi'an. The conference was hosted by the China Society of Image and Graphics (CSIG) and organized by the Air Force Medical University, Xi'an Jiaotong University, and Northwestern Polytechnical University. Through more than 20 forums and more than 100 showcased achievements, it highlighted work on generative artificial intelligence, large models, machine learning, brain-inspired computing, and other areas of image and graphics research.

Large model technology is being applied ever more widely as the technology evolves, meeting image processing needs across many industries. During the conference, the CSIG Document Image Analysis and Recognition Special Committee and Shanghai Hehe Information Technology Co., Ltd. (referred to as "Hehe Information") jointly hosted the forum "Large Model Technology and Its Frontier Applications". Expert representatives from South China University of Technology, Shanghai Jiao Tong University, Tsinghua University, Fudan University, Shanghai Artificial Intelligence Laboratory, Hehe Information, and other universities, research institutions, and enterprises held in-depth discussions on the development and application of large model technology in the image field.


Caption: Industry practitioners attend the "Large Model Technology and Its Frontier Applications" forum

Behind the rapid progress of large models lies an "energy crisis" in training corpora. Epoch Research, a group of artificial intelligence researchers, estimates that machine learning datasets could run out of "high-quality language data" by 2026. At present, a large amount of high-quality corpus data is locked in books, papers, research reports, corporate files, and other documents. Their complex layout structures limit both the processing of training corpora for large models and the performance of large models in document question answering. Advances in document parsing technology allow machines to identify the many elements in a document, better process multiple types of data such as text, tables, and images, and restore the document's reading order, which accelerates the training and application of large models. At the forum, Chang Yang, R&D Director of the Intelligent Innovation Division at Hehe Information, shared the company's work on intelligent document processing in the field of document parsing, offering participants a new technical perspective.
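To make "restoring the reading order" concrete, here is a minimal Python sketch of a naive column-based heuristic. It is an illustration only and not Hehe Information's method: the `Block` type, its coordinate fields, and the `reading_order` function are hypothetical names, and the approach assumes text blocks have already been detected and that the page consists of clean, non-overlapping columns.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A detected text block (hypothetical type for this sketch)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge (y grows downward)
    x1: float  # right edge


def reading_order(blocks: list[Block]) -> list[Block]:
    """Group blocks into columns by horizontal overlap, then read
    columns left to right and blocks top to bottom within a column."""
    columns: list[list[Block]] = []
    for block in sorted(blocks, key=lambda b: b.x0):
        for column in columns:
            # Horizontal extent covered by this column so far.
            left = min(b.x0 for b in column)
            right = max(b.x1 for b in column)
            if block.x0 < right and block.x1 > left:
                column.append(block)
                break
        else:
            # No overlapping column found: start a new one.
            columns.append([block])

    ordered: list[Block] = []
    for column in columns:  # columns were created in left-to-right order
        ordered.extend(sorted(column, key=lambda b: b.y0))
    return ordered


blocks = [
    Block("right column, paragraph 1", 320, 100, 580),
    Block("left column, paragraph 2", 40, 300, 300),
    Block("left column, paragraph 1", 40, 100, 300),
    Block("right column, paragraph 2", 320, 300, 580),
]
for b in reading_order(blocks):
    print(b.text)
```

A heuristic like this breaks down as soon as a page mixes full-width headings, floating figures, or tables that span columns, which is one reason reading order restoration is treated as a research problem rather than a sorting step.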

The difficulty of document parsing lies in accurately identifying the various elements in a document and understanding the logical relationships between them. This requires attention to both "physical layout analysis" and "logical layout analysis". According to Chang Yang, physical layout analysis focuses on visual features and document layout. Its main task is to aggregate highly related text into a region, such as a paragraph or a table. It is modeled as an object detection task, and a regression-based single-stage detection model is fitted to locate the various layout regions in a document. Logical layout analysis focuses on semantic features. Its main task is to model the relationships between text blocks according to their semantics, for example by using semantic hierarchy to form a directory tree structure.

In document parsing, tasks such as document element detection, text and table recognition, layout analysis, and reading order restoration all involve judging both individual layout elements and the overall page layout, which makes them typical technical problems in the field of document processing. Drawing on more than ten years of technical accumulation, Hehe Information has connected the whole intelligent document processing pipeline: electronic file parsing, scanned image processing, text recognition, table recognition, layout analysis, layout restoration, and typesetting. For both electronic documents and scanned copies, it can flexibly identify text, tables, borderless tables, cross-page tables, headers, footers, formulas, images, flow charts, and other layout elements, and accurately restore the document reading order, providing the large model field with accurate training corpora and a better document question-answering experience.
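The directory tree produced by logical layout analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the approach described at the forum: it assumes each heading already carries a predicted level (1 for a top-level section, 2 for a subsection, and so on), and the `Node`, `build_outline`, and `render` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in the outline tree (hypothetical type for this sketch)."""
    title: str
    level: int
    children: list["Node"] = field(default_factory=list)


def build_outline(headings: list[tuple[int, str]]) -> Node:
    """Build a tree from (level, title) pairs given in reading order.
    Levels are assumed to start at 1; the root sits at level 0."""
    root = Node("ROOT", 0)
    stack = [root]
    for level, title in headings:
        # Pop until the top of the stack is shallower: that is the parent.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(title, level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def render(node: Node, depth: int = 0) -> list[str]:
    """Return the outline as indented lines, one per heading."""
    lines: list[str] = []
    for child in node.children:
        lines.append("  " * depth + child.title)
        lines.extend(render(child, depth + 1))
    return lines


headings = [
    (1, "1 Introduction"),
    (2, "1.1 Background"),
    (2, "1.2 Scope"),
    (1, "2 Method"),
    (2, "2.1 Data"),
]
print("\n".join(render(build_outline(headings))))
```

The stack keeps only the current chain of ancestors, so each heading is attached in a single pass. In practice the hard part is the step this sketch assumes away, namely predicting the level of each text block from its semantics and appearance.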


Caption: University researchers and students line up to experience intelligent document processing technology

"In our research, we found that real-world documents have extremely rich layout types, which cannot simply be defined by categories such as single-column, double-column, or three-column," Chang Yang said. He added that in recent years, work on Open Vocabulary Object Detection (OVD) and visual-semantic alignment, together with cutting-edge developments such as generative models, will bring new research ideas to layout analysis. The Hehe Information technical team will continue to work in the field of intelligent document processing so that new technologies can generate value in industry more quickly.

