Multilingual Pretraining-Based Multi-Feature Fusion Model for English Text Classification


Ruijuan Zhang




Deep learning methods have been widely applied to English text classification in recent years and have achieved strong performance. However, current methods face two significant challenges: (1) they struggle to effectively capture long-range contextual structure information within text sequences, and (2) they do not adequately integrate linguistic knowledge into representations to enhance classifier performance. To this end, a novel multilingual pre-training based multi-feature fusion method is proposed for English text classification (MFFMP-ETC). Specifically, MFFMP-ETC consists of multilingual feature extraction, multi-level structure learning, and multi-view representation fusion. MFFMP-ETC uses Multilingual BERT as a deep semantic extractor to introduce cross-lingual information into representation learning, endowing text representations with greater robustness. It then integrates Bi-LSTM and TextCNN into the multilingual pre-training architecture to capture the global and local structure of English texts by modelling bidirectional contextual semantic dependencies and multi-granularity local semantic dependencies. Meanwhile, MFFMP-ETC devises a multi-view representation fusion within invariant semantic learning to aggregate consistent and complementary information across views. By synergistically combining Multilingual BERT's deep semantic features, Bi-LSTM's bidirectional context modelling, and TextCNN's local feature extraction, MFFMP-ETC offers a more comprehensive and effective solution for capturing long-distance dependencies and nuanced contextual information in text classification. Finally, results on three datasets show that MFFMP-ETC establishes a new baseline in terms of accuracy, sensitivity, and precision, verifying the effectiveness and advancement of MFFMP-ETC in text classification.
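The multi-view fusion described above can be sketched as follows. This is a minimal illustrative sketch only: the feature dimensions, the concatenation-based fusion, and the linear classifier head are assumptions for demonstration, not the paper's exact implementation, and the feature vectors stand in for the outputs of Multilingual BERT, Bi-LSTM, and TextCNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-text feature vectors from the three extractors.
# Dimensions are illustrative placeholders, not taken from the paper.
bert_feat = rng.standard_normal(768)     # Multilingual BERT deep semantic features
bilstm_feat = rng.standard_normal(256)   # Bi-LSTM bidirectional context features
textcnn_feat = rng.standard_normal(300)  # TextCNN multi-granularity local features

def fuse_views(*views: np.ndarray) -> np.ndarray:
    """Concatenate per-view features into one multi-view representation."""
    return np.concatenate(views)

fused = fuse_views(bert_feat, bilstm_feat, textcnn_feat)

# A linear classifier head over the fused representation; the weights here
# are random placeholders, whereas in practice they would be learned.
num_classes = 4
W = rng.standard_normal((num_classes, fused.shape[0]))
logits = W @ fused
pred = int(np.argmax(logits))
```

In a trained system the three extractors and the classifier head would be optimized jointly, and the fusion step could use a learned (e.g. attention-weighted) combination rather than plain concatenation.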