
Journal of Chinese Agricultural Mechanization, 2022, Vol. 43, Issue (12): 125-132. DOI: 10.13733/j.jcam.issn.2095-5553.2022.12.019

• Agricultural Informatization Engineering •

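Research on a rice question similarity matching method based on BiLSTM-CNN

Liu Zhichao1, Wang Xiaomin2, 3, Wu Huarui2, 3, Wang Haoriqin1, Xu Tongyu1

  1. College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang, 110866; 2. National Engineering Research Center for Information Technology in Agriculture, Beijing, 100097; 3. Beijing Research Center for Information Technology in Agriculture, Beijing, 100097
  • Online: 2022-12-15 Published: 2022-12-02
  • Supported by:
    Key Research Project of the Education Department of Liaoning Province (LSNZD202005); National Key Research and Development Program of China (2019YFD1101105); Beijing Science and Technology Program (Z191100004019007); Project of the Faculty of Agricultural Equipment, Jiangsu University (4111680005)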

Abstract: The question-and-answer (Q&A) community of the China Agricultural Technology Promotion Information Platform (NJTG) lets users interact with agricultural experts and obtain accurate answers to problems that arise in agricultural practice. Thousands of new questions about rice are posted to the community every day, so detecting questions with the same meaning is a key technical step in agricultural intelligent Q&A. To address this, character-level Word2Vec is used to initialize the question representations, and a Siamese neural network serves as the base model framework for learning the semantic features of sentences and capturing context information. A bidirectional long short-term memory (BiLSTM) network then extracts semantic temporal features. Finally, a cosine function that incorporates semantic information computes question similarity at the semantic level, and the model is compared experimentally with other semantic matching models. A similarity-pair dataset of 7 820 rice question pairs is constructed to train the model and tune its important parameters. The experimental results show that the proposed BiLSTM-CNN model efficiently extracts text features at different granularities and improves rice question similarity matching. On the constructed dataset, its accuracy and F1 value, 98.2% and 88.75% respectively, are both higher than those of the other text matching models. The proposed model also achieves higher accuracy than the comparison models on six different categories of rice question pairs, and it maintains high accuracy when the amount of data is small, which indicates good robustness.

Key words: rice, bidirectional long short-term memory network (BiLSTM), convolutional neural network, Siamese neural network, similarity matching
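
The Siamese arrangement described in the abstract, in which both questions pass through one shared encoder and are then scored with a cosine function, can be illustrated with a minimal sketch. This is not the authors' implementation. The random character vectors stand in for trained character-level Word2Vec embeddings, mean pooling stands in for the BiLSTM-CNN feature extractor, and the names `CharEmbedder`, `encode`, `question_similarity`, the dimension `DIM`, and the example questions are all hypothetical.

```python
import math
import random

DIM = 16  # hypothetical embedding size, chosen only for this sketch


class CharEmbedder:
    """Toy character-level embedding table.

    Assumption: stands in for trained character-level Word2Vec vectors;
    the values here are random, not learned.
    """

    def __init__(self, dim=DIM, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def vector(self, ch):
        # Create a vector the first time a character is seen, then reuse it.
        if ch not in self.table:
            self.table[ch] = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return self.table[ch]


def encode(question, embedder):
    """Shared encoder used by both Siamese branches.

    Placeholder: mean pooling over character vectors, used here in place
    of the BiLSTM-CNN feature extractor named in the abstract.
    """
    chars = [c for c in question if not c.isspace()]
    if not chars:
        return [0.0] * embedder.dim
    vecs = [embedder.vector(c) for c in chars]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine_similarity(u, v):
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def question_similarity(q1, q2, embedder):
    # Siamese arrangement: the same encoder and parameters handle both inputs.
    return cosine_similarity(encode(q1, embedder), encode(q2, embedder))


if __name__ == "__main__":
    emb = CharEmbedder()
    # Made-up example questions, not taken from the paper's dataset.
    print(question_similarity("水稻叶子发黄怎么办", "水稻叶片发黄如何处理", emb))
    print(question_similarity("水稻叶子发黄怎么办", "玉米什么时候播种", emb))
```

Because the embeddings are untrained, the printed scores reflect only character overlap. In a trained model of the kind the abstract describes, the shared encoder would be learned from labelled question pairs so that semantically equivalent questions score higher.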

CLC number: