
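Text Data Entity Recognition Based on Multi-Head Convolutional Residual Connections

Cyber Security and Data Governance

Liu Wei, Li Bo, Yang Siyao

School of Information Science and Engineering, Shenyang University of Technology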

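Abstract: To build a relational database from the text data in work reports, this paper addresses two problems: extracting useful information entities from unstructured text, and the feature loss that traditional networks suffer during information extraction. It designs a deep learning-based entity recognition model, RoBERTa-MCR-BiGRU-CRF. The pre-trained model RoBERTa first serves as the encoder; the resulting word vectors are fed into a multi-head convolutional residual (MCR) network layer to enrich semantic information, then into a bidirectional gated recurrent unit (BiGRU) layer to further extract contextual features, and finally through a conditional random field (CRF) layer that decodes the label sequence. In experiments, the model reaches an F1 score of 96.64% on the work report dataset, outperforming the other compared models. On the data-name entity category, its F1 score is 3.18% and 2.87% higher than that of BERT-BiLSTM-CRF and RoBERTa-BiGRU-CRF, respectively. The results indicate that the model extracts useful information from unstructured text well.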

Keywords: deep learning; named entity recognition; neural networks; data mining

CLC number: TP391.1; Document code: A; DOI: 10.19358/j.issn.2097-1788.2024.12.008
Citation format: Liu Wei, Li Bo, Yang Siyao. Text data entity recognition based on multi-head convolutional residual connections [J]. Cyber Security and Data Governance, 2024, 43(12): 54-59.

Text data entity recognition based on multi-head convolution residual connections

Liu Wei, Li Bo, Yang Siyao

School of Information Science and Engineering, Shenyang University of Technology

Abstract: To construct a relational database for the text data in work reports, and to address both the extraction of useful information entities from unstructured text and the feature loss of traditional networks during information extraction, a deep learning-based entity recognition model named RoBERTa-MCR-BiGRU-CRF is proposed. The model first uses the pre-trained Robustly Optimized BERT Pretraining Approach (RoBERTa) as an encoder and feeds the resulting word embeddings into a Multi-head Convolutional Residual network (MCR) layer to enrich semantic information. Next, the embeddings are passed to a Bidirectional Gated Recurrent Unit (BiGRU) layer to further capture contextual features. Finally, a Conditional Random Field (CRF) layer is used for decoding and label prediction. Experimental results show that the model achieves an F1 score of 96.64% on the work report dataset, outperforming the other compared models. Additionally, on the data-name entity category, its F1 score is 3.18% and 2.87% higher than that of BERT-BiLSTM-CRF and RoBERTa-BiGRU-CRF, respectively. The results demonstrate the model's effectiveness in extracting useful information from unstructured text.

Keywords: deep learning; named entity recognition; neural networks; data mining
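The abstract states that a CRF layer decodes the final label sequence. Decoding in a linear-chain CRF is commonly done with the Viterbi algorithm, and the sketch below illustrates that idea in plain Python. It is not the authors' implementation: the three-label set (O, B, I), the score values, and the function name `viterbi_decode` are illustrative assumptions, and start/end transition scores are left out for brevity.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label sequence for one sentence.

    emissions[t][j]   : score of label j at position t (e.g. from a BiGRU).
    transitions[i][j] : score of moving from label i to label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])      # best score ending in each label so far
    backptr = []                    # backptr[t-1][j] = best previous label
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptr.append(ptrs)
    # Follow the back-pointers from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backptr):
        best = ptrs[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical labels and scores, for illustration only.
LABELS = ["O", "B", "I"]
emissions = [[2.0, 1.5, 0.0],    # token 0: "O" scores highest on its own
             [0.0, 0.2, 2.0]]    # token 1: "I" scores highest on its own
transitions = [[0.0, 0.0, -1e4],  # O -> I is heavily penalised
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
print([LABELS[k] for k in viterbi_decode(emissions, transitions)])
# → ['B', 'I']
```

Picking the best label per token independently would give the invalid sequence O, I here. Scoring the sequence as a whole lets the transition penalty steer the result to B, I, which is the kind of label consistency a CRF layer is typically added for.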

Introduction

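Entity recognition plays an important role in information extraction. At present, data extraction mainly relies on deep learning, applied to named entity recognition (NER) to extract nouns and related concepts. NER can extract useful data and discard irrelevant information, which makes it easier to build databases and to process and trace the data afterwards, thereby improving data security. It can be applied in areas such as knowledge-graph question answering systems and data traceability systems. In essence, entity recognition solves a sequence labeling problem: it assigns a label to each element of a sequence of text and numbers.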
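To make the sequence labeling view concrete, the following minimal sketch converts per-token tags into entity spans. It assumes the widely used BIO tagging scheme; the source does not say which scheme the authors use. The tokens, the `VALUE` label, and the function name `bio_to_spans` are hypothetical, and `DATA_NAME` only loosely mirrors the data-name category mentioned in the abstract.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (entity_type, start, end) spans, end exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel "O" flushes the last span
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues and etype is not None:
            spans.append((etype, start, i))  # close the currently open entity
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]        # open a new entity
        # An "I-" tag with no matching open entity is ignored.
    return spans


# Hypothetical sentence and labels, for illustration only.
tokens = ["Annual", "output", "reached", "12", "million", "units"]
tags = ["B-DATA_NAME", "I-DATA_NAME", "O", "B-VALUE", "I-VALUE", "O"]
for etype, start, end in bio_to_spans(tags):
    print(etype, tokens[start:end])
```

Each extracted span pairs an entity type with a token range, which is the form that could then be written into a relational table.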

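With the development of deep learning, entity recognition has made significant progress. Traditional rule-based and dictionary-based methods have gradually been replaced by methods based on statistical learning and neural networks. Since 2018, BERT-based pre-trained neural network models (such as BERT-BiLSTM-CRF) have achieved the best performance of their year on multiple public datasets. This paper proposes a new method that incorporates external knowledge resources to improve the performance of NER models. The model is evaluated on a self-built dataset; the experiments verify the recognition performance of the proposed method on unstructured text data and demonstrate the model's effectiveness on the NER task.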
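The performance figures in the abstract are reported as F1 scores. As a rough illustration of how such a score is commonly computed for NER, the sketch below calculates entity-level precision, recall, and F1. It assumes an exact match on (type, start, end) spans, which the source does not specify; the example spans and the function name `entity_f1` are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (entity_type, start, end)


def entity_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span matching."""
    tp = len(gold & pred)                         # spans that match exactly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted spans, for illustration only.
gold = {("DATA_NAME", 0, 2), ("VALUE", 3, 5), ("DATE", 6, 7)}
pred = {("DATA_NAME", 0, 2), ("VALUE", 3, 4)}
precision, recall, f1 = entity_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Restricting `gold` and `pred` to one entity type yields a per-category score of the kind the abstract quotes for the data-name category.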

For the full text of this article, please download:

http://www.jysgc.com/resource/share/2000006267

Author information:

Liu Wei, Li Bo, Yang Siyao

(School of Information Science and Engineering, Shenyang University of Technology, Shenyang, Liaoning 110158)


