Journal of Chinese Agricultural Mechanization

Journal of Chinese Agricultural Mechanization ›› 2025, Vol. 46 ›› Issue (3): 336-342. DOI: 10.13733/j.jcam.issn.20955553.2025.03.047

• Comprehensive Research

Named entity recognition of veterinary drug text based on character and word fusion and attention mechanism

Yan Shijun1, Zhu Hongmei1, Wang Yatong2, Zhang Liang1   

  (1. School of Information Science and Engineering, Shandong Agricultural University, Tai'an, 271018, China;
   2. Information Center of Dongfang Electronics Group Co., Ltd., Yantai, 264000, China)
  • Online: 2025-03-15 Published: 2025-03-14

基于字词融合和注意力机制的兽药文本命名实体识别

颜士军1,朱红梅1,王雅童2,张亮1   

  (1. 山东农业大学信息科学与工程学院,山东泰安,271018; 2. 东方电子集团有限公司信息中心,山东烟台,264000)
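  • Supported by:
    National Key Research and Development Program of China, Intergovernmental / Hong Kong, Macao and Taiwan Key Special Project (2019YFE0103800); Shandong Provincial Natural Science Foundation, General Program (ZR2022MG070)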

Abstract:

 Veterinary drug texts are highly specialized and strongly interrelated, with pronounced local features and polysemy, and mainstream named entity recognition models do not make full use of lexical information. To address these problems, a named entity recognition model for veterinary drug text based on character and word fusion and an attention mechanism is proposed. First, the character vectors produced by the pre-trained BERT model are fused with the word vectors produced by Word2vec. Second, a bidirectional long short-term memory network (BiLSTM) captures global contextual information, and a multi-head self-attention mechanism (MHA) is added on top of it to extract local features of the sequence. Finally, a conditional random field (CRF) determines the optimal label sequence to complete the entity recognition task. Multiple sets of comparative experiments on a veterinary drug text dataset show that the model achieves a precision, recall, and F1-score of 94.73%, 95.29%, and 95.01%, respectively, outperforming the comparison models.
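
As an illustration only, the first and last stages described in the abstract can be sketched in plain Python. This is not the authors' implementation: the abstract does not say how the character and word vectors are fused, so concatenation is assumed here, as is the zero-vector fallback for out-of-vocabulary words. The names `fuse_char_word`, `viterbi_decode`, and `f1_score` are hypothetical. The BiLSTM and MHA layers are omitted; their output is represented by the `emissions` scores passed to the decoder, which is standard Viterbi decoding for a linear-chain CRF.

```python
from typing import Dict, List, Sequence, Tuple


def fuse_char_word(
    char_vecs: Sequence[Sequence[float]],
    words: Sequence[str],
    word_vecs: Dict[str, Sequence[float]],
    word_dim: int,
) -> List[List[float]]:
    """Concatenate each character vector with the vector of its containing word.

    Assumption: fusion is done by concatenation; the abstract does not specify it.
    """
    if sum(len(w) for w in words) != len(char_vecs):
        raise ValueError("word segmentation does not cover the character sequence")
    fused: List[List[float]] = []
    pos = 0
    for word in words:
        # Assumption: words missing from the vocabulary get a zero vector.
        wv = list(word_vecs.get(word, [0.0] * word_dim))
        for _ in word:
            fused.append(list(char_vecs[pos]) + wv)
            pos += 1
    return fused


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring label path and its score.

    emissions[t][j]   : score of label j at position t
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        ptrs: List[int] = []
        for j in range(n_labels):
            # Best previous label for reaching label j at position t.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backptrs.append(ptrs)
    last = max(range(n_labels), key=lambda j: score[j])
    path = [last]
    # Follow the back-pointers from the final position to the first.
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

With the reported precision and recall, `round(f1_score(94.73, 95.29), 2)` gives 95.01, which agrees with the F1-score stated in the abstract.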

Key words: veterinary drug text, named entity recognition, character and word fusion, multi-head self-attention mechanism

摘要:

针对兽药领域信息专业性强、关联性强、局部特征明显和一词多义的特点,以及主流的命名实体识别模型未充分利用词汇信息的问题,提出一种基于字词融合和注意力机制的兽药文本命名实体识别模型。首先,将BERT预训练模型得到的字向量和Word2vec得到的词向量融合。然后,在双向长短期记忆网络中提取全局上下文特征的基础上加入多头自注意力机制挖掘序列的局部特征。最后,通过条件随机场获得最佳标签序列来完成实体识别任务。在兽药文本数据集上进行多组对比试验,结果表明,该模型识别的准确率、召回率和F1值分别为94.73%、95.29%和95.01%,性能均优于对比模型。

关键词: 兽药文本, 命名实体识别, 字词融合, 多头自注意力机制

CLC Number: