Information extraction from Visually Rich Documents using graph convolutional network

Trịnh Thịnh, Nguyễn Trọng Khánh

Visually rich documents, such as forms, invoices, receipts, and ID cards, are ubiquitous in daily business and life. They convey information through several channels, including text, layout, font size, and text position, and combining these elements can improve information extraction performance. However, previous works have not effectively exploited the interplay between these information sources: text detection and recognition have been performed without semantic supervision (e.g., entity name annotation), and text information extraction has relied only on serialized plain text, ignoring rich visual information. This paper presents a method for extracting information from such documents that integrates textual, non-spatial, and spatial visual features. The method consists of two main steps and uses three deep neural networks. The first step, Text Reading, employs two CNN models for OCR, Lightweight DB and C-PREN, which build on the state-of-the-art models DB and PREN with two improvements: removing the SE block of DB to reduce noise, and integrating both context and position features in PREN. The second step, Text Information Extraction, uses a graph convolutional network, RGCN, for named entity recognition. Experiments on a self-collected dataset and two public datasets demonstrate that our method improves on the original models and outperforms other state-of-the-art methods.
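The abstract does not describe how the document graph is built or how RGCN works internally, so the following is only a generic sketch of the underlying idea: text boxes become graph nodes, nearby boxes are linked, and one graph convolution step mixes each node's features with those of its neighbours. The k-nearest-neighbour linking, the mean aggregation, and all names here (`build_adjacency`, `gcn_layer`, the toy receipt) are assumptions made for illustration, not the authors' method.

```python
import math

def build_adjacency(centers, k=2):
    """Link each text box to its k nearest boxes (symmetric), plus a self-loop.

    Assumption: spatial proximity is used to define edges; the paper may differ.
    """
    n = len(centers)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so a node keeps its own features
        by_dist = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(centers[i], centers[j]),
        )
        for j in by_dist[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj

def gcn_layer(adj, feats, weight):
    """One mean-aggregation graph convolution: ReLU(D^-1 A H W)."""
    out = []
    for row in adj:
        degree = sum(row)
        # Average the features of the node and its neighbours.
        agg = [
            sum(row[j] * feats[j][c] for j in range(len(feats))) / degree
            for c in range(len(feats[0]))
        ]
        # Linear projection followed by ReLU.
        out.append([
            max(0.0, sum(agg[c] * weight[c][d] for c in range(len(agg))))
            for d in range(len(weight[0]))
        ])
    return out

# Hypothetical document with three text boxes: (x, y) centres and toy 2-d features.
centers = [(10, 10), (12, 10), (100, 200)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the output shows aggregation only

adj = build_adjacency(centers, k=1)
hidden = gcn_layer(adj, feats, weight)
```

In a full pipeline the node features would come from the recognized text and its visual attributes, and a per-node classifier on the final layer would assign entity labels.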

Published in:

Journal of Intelligent & Fuzzy Systems

Publication date:

2023

DOI:

http://dx.doi.org/10.3233/JIFS-230204

Publisher:

IOS Press

Location:

Keywords:

Graph convolutional network, OCR, text detection, text recognition, NER

Related papers

Sign Language Recognition With Self-Learning Fusion Model

Vũ Hoài Nam, Phạm Văn Cường, Hoàng Mậu Trung, Trần Tiến Công

Person re-identification from multiple surveillance cameras combining face and body feature matching

Nguyen Xuan Ha, Hoang Nhu Dong, Nguyen V Thang, Pham D An, Nguyen Duc Toan, Đặng Minh Tuấn

Performance Analysis of AV1 video codec

Nguyễn Thị Thu Hiên, Lê Thanh Thủy

Measurement of Bubbly Two-Phase Flow in a Vertical Pipe Using Ultrasonic Velocity Profiler and Digital Optical Imaging

Nguyễn Tất Thắng

Learning Binary Codes for Fast Image Retrieval with Sparse Discriminant Analysis and Deep Autoencoders

Đào Thị Thúy Quỳnh, An Hồng Sơn, Nguyễn Hữu Quỳnh, Cù Việt Dũng, Ngô Quốc Tạo

Learning Adaptive Motion Search for Fast Versatile Video Coding in Visual Surveillance Systems

Hoàng Văn Xiêm, Nguyễn Quang Sang, Bùi Thanh Hương, Vũ Hữu Tiến

Hand gesture recognition from wrist-worn camera for human-machine interaction

Nguyễn Hồng Quân, Lê Trung Hiếu, Trần Trung Kiên, Hoàng Nhật Tân, Trần Thị Thanh Hải, Lê Thị Lan, Vũ Hải, Nguyễn Thanh Phương, Nguyễn Hữu Thanh, Phạm Văn Cường