"Medical Knowledge Graph QA System" Code Walkthrough (Part 2)

2023-10-26

question_classifier.py: walkthrough of the question classifier code

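"Medical Knowledge Graph QA System" Code Walkthrough (Part 1)
"Medical Knowledge Graph QA System" Code Walkthrough (Part 3)
"Medical Knowledge Graph QA System" Code Walkthrough (Part 4)
"Medical Knowledge Graph QA System" Code Walkthrough (Part 5)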

#!/usr/bin/env python3
# coding: utf-8
# File: question_classifier.py
# Author: lhy<lhy_in_blcu@126.com,https://huangyong.github.io>
# Date: 18-10-4

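# Operating-system interface module
import os
# ahocorasick (the pyahocorasick package) implements the Aho-Corasick automaton:
# it matches many keywords in one pass and returns every keyword that occurs in a string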
import ahocorasick

# Question classifier class
class QuestionClassifier:
    def __init__(self):
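        # cur_dir is the directory that contains this file; os.path.dirname drops the
        # file name and, unlike splitting on '/', also works with Windows path separators
        cur_dir = os.path.dirname(os.path.abspath(__file__))
        # Paths of the feature-word dictionaries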
        self.disease_path = os.path.join(cur_dir, 'dict/disease.txt')
        self.department_path = os.path.join(cur_dir, 'dict/department.txt')
        self.check_path = os.path.join(cur_dir, 'dict/check.txt')
        self.drug_path = os.path.join(cur_dir, 'dict/drug.txt')
        self.food_path = os.path.join(cur_dir, 'dict/food.txt')
        self.producer_path = os.path.join(cur_dir, 'dict/producer.txt')
        self.symptom_path = os.path.join(cur_dir, 'dict/symptom.txt')
        self.deny_path = os.path.join(cur_dir, 'dict/deny.txt')
        # Load the feature words. encoding='utf-8' is passed explicitly; without it, my PyCharm raised an error
        self.disease_wds= [i.strip() for i in open(self.disease_path,encoding='utf-8') if i.strip()]
        self.department_wds= [i.strip() for i in open(self.department_path,encoding='utf-8') if i.strip()]
        self.check_wds= [i.strip() for i in open(self.check_path,encoding='utf-8') if i.strip()]
        self.drug_wds= [i.strip() for i in open(self.drug_path,encoding='utf-8') if i.strip()]
        self.food_wds= [i.strip() for i in open(self.food_path,encoding='utf-8') if i.strip()]
        self.producer_wds= [i.strip() for i in open(self.producer_path,encoding='utf-8') if i.strip()]
        self.symptom_wds= [i.strip() for i in open(self.symptom_path,encoding='utf-8') if i.strip()]
        self.region_words = set(self.department_wds + self.disease_wds + self.check_wds + self.drug_wds + self.food_wds + self.producer_wds + self.symptom_wds)
        self.deny_words = [i.strip() for i in open(self.deny_path,encoding='utf-8') if i.strip()]
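        # Build the Aho-Corasick automaton over all domain words
        self.region_tree = self.build_actree(list(self.region_words))
        # Map each word to its list of types, e.g. {'感冒': ['disease'], ...}
        self.wdtype_dict = self.build_wdtype_dict()
        # Question keywords for each question category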
        self.symptom_qwds = ['症状', '表征', '现象', '症候', '表现']
        self.cause_qwds = ['原因','成因', '为什么', '怎么会', '怎样才', '咋样才', '怎样会', '如何会', '为啥', '为何', '如何才会', '怎么才会', '会导致', '会造成']
        self.acompany_qwds = ['并发症', '并发', '一起发生', '一并发生', '一起出现', '一并出现', '一同发生', '一同出现', '伴随发生', '伴随', '共现']
        self.food_qwds = ['饮食', '饮用', '吃', '食', '伙食', '膳食', '喝', '菜' ,'忌口', '补品', '保健品', '食谱', '菜谱', '食用', '食物','补品']
        self.drug_qwds = ['药', '药品', '用药', '胶囊', '口服液', '炎片']
        self.prevent_qwds = ['预防', '防范', '抵制', '抵御', '防止','躲避','逃避','避开','免得','逃开','避开','避掉','躲开','躲掉','绕开',
                             '怎样才能不', '怎么才能不', '咋样才能不','咋才能不', '如何才能不',
                             '怎样才不', '怎么才不', '咋样才不','咋才不', '如何才不',
                             '怎样才可以不', '怎么才可以不', '咋样才可以不', '咋才可以不', '如何可以不',
                             '怎样才可不', '怎么才可不', '咋样才可不', '咋才可不', '如何可不']
        self.lasttime_qwds = ['周期', '多久', '多长时间', '多少时间', '几天', '几年', '多少天', '多少小时', '几个小时', '多少年']
        self.cureway_qwds = ['怎么治疗', '如何医治', '怎么医治', '怎么治', '怎么医', '如何治', '医治方式', '疗法', '咋治', '怎么办', '咋办', '咋治']
        self.cureprob_qwds = ['多大概率能治好', '多大几率能治好', '治好希望大么', '几率', '几成', '比例', '可能性', '能治', '可治', '可以治', '可以医']
        self.easyget_qwds = ['易感人群', '容易感染', '易发人群', '什么人', '哪些人', '感染', '染上', '得上']
        self.check_qwds = ['检查', '检查项目', '查出', '检查', '测出', '试出']
        self.belong_qwds = ['属于什么科', '属于', '什么科', '科室']
        self.cure_qwds = ['治疗什么', '治啥', '治疗啥', '医治啥', '治愈啥', '主治啥', '主治什么', '有什么用', '有何用', '用处', '用途',
                          '有什么好处', '有什么益处', '有何益处', '用来', '用来做啥', '用来作甚', '需要', '要']

        print('model init finished ......')

        return

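    '''Main classification function'''
    def classify(self, question):
        data = {}
        # check_medical (defined below) returns the entities found in the question
        # together with their types, e.g. {'感冒': ['disease']}
        medical_dict = self.check_medical(question)
        # No medical entity found: nothing to classify
        if not medical_dict:
            return {}
        data['args'] = medical_dict
        # Collect the entity types that appear in the question
        types = []
        for type_ in medical_dict.values():
            types += type_
        # Default question type
        question_type = 'others'
        question_types = []

        # Symptoms: disease -> symptoms, or symptom -> possible diseases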
        if self.check_words(self.symptom_qwds, question) and ('disease' in types):
            question_type = 'disease_symptom'
            question_types.append(question_type)
        if self.check_words(self.symptom_qwds, question) and ('symptom' in types):
            question_type = 'symptom_disease'
            question_types.append(question_type)

        # Causes
        if self.check_words(self.cause_qwds, question) and ('disease' in types):
            question_type = 'disease_cause'
            question_types.append(question_type)

        # Complications
        if self.check_words(self.acompany_qwds, question) and ('disease' in types):
            question_type = 'disease_acompany'
            question_types.append(question_type)

        # Foods recommended or to be avoided for a disease
        if self.check_words(self.food_qwds, question) and 'disease' in types:
            deny_status = self.check_words(self.deny_words, question)
            if deny_status:
                question_type = 'disease_not_food'
            else:
                question_type = 'disease_do_food'
            question_types.append(question_type)

        # Given a food, find the related diseases
        if self.check_words(self.food_qwds+self.cure_qwds, question) and 'food' in types:
            deny_status = self.check_words(self.deny_words, question)
            if deny_status:
                question_type = 'food_not_disease'
            else:
                question_type = 'food_do_disease'
            question_types.append(question_type)

        # Recommended drugs
        if self.check_words(self.drug_qwds, question) and 'disease' in types:
            question_type = 'disease_drug'
            question_types.append(question_type)

        # Which diseases a drug treats
        if self.check_words(self.cure_qwds, question) and 'drug' in types:
            question_type = 'drug_disease'
            question_types.append(question_type)

        # Checks (examinations) for a disease
        if self.check_words(self.check_qwds, question) and 'disease' in types:
            question_type = 'disease_check'
            question_types.append(question_type)

        # Given a check item, find the related diseases
        if self.check_words(self.check_qwds+self.cure_qwds, question) and 'check' in types:
            question_type = 'check_disease'
            question_types.append(question_type)

        # Disease prevention
        if self.check_words(self.prevent_qwds, question) and 'disease' in types:
            question_type = 'disease_prevent'
            question_types.append(question_type)

        # Treatment duration
        if self.check_words(self.lasttime_qwds, question) and 'disease' in types:
            question_type = 'disease_lasttime'
            question_types.append(question_type)

        # Treatment methods
        if self.check_words(self.cureway_qwds, question) and 'disease' in types:
            question_type = 'disease_cureway'
            question_types.append(question_type)

        # Cure probability
        if self.check_words(self.cureprob_qwds, question) and 'disease' in types:
            question_type = 'disease_cureprob'
            question_types.append(question_type)

        # Population susceptible to the disease
        if self.check_words(self.easyget_qwds, question) and 'disease' in types :
            question_type = 'disease_easyget'
            question_types.append(question_type)

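        # No specific question type matched but a disease is mentioned: return the disease description
        if question_types == [] and 'disease' in types:
            question_types = ['disease_desc']

        # No specific question type matched but a symptom is mentioned: look up the diseases with that symptom
        if question_types == [] and 'symptom' in types:
            question_types = ['symptom_disease']

        # Store the list of matched question types in the result dict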
        data['question_types'] = question_types

        return data

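    '''Build the word -> types mapping'''
    def build_wdtype_dict(self):
        wd_dict = dict()
        # region_words is the union of all seven dictionaries
        for wd in self.region_words:
            wd_dict[wd] = []
            # Append every type whose dictionary contains the word; a word can have several types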
            if wd in self.disease_wds:
                wd_dict[wd].append('disease')
            if wd in self.department_wds:
                wd_dict[wd].append('department')
            if wd in self.check_wds:
                wd_dict[wd].append('check')
            if wd in self.drug_wds:
                wd_dict[wd].append('drug')
            if wd in self.food_wds:
                wd_dict[wd].append('food')
            if wd in self.symptom_wds:
                wd_dict[wd].append('symptom')
            if wd in self.producer_wds:
                wd_dict[wd].append('producer')
        return wd_dict

    '''Build the Aho-Corasick automaton to speed up matching'''
    def build_actree(self, wordlist):
        # Aho-Corasick generalizes KMP-style matching to many patterns at once
        actree = ahocorasick.Automaton()
        for index, word in enumerate(wordlist):
            actree.add_word(word, (index, word))
        actree.make_automaton()
        return actree

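    '''Filter the question for medical entities'''
    def check_medical(self, question):
        region_wds = []
        # region_tree is the automaton built from region_words; it quickly finds every
        # dictionary word that occurs in the question. Matches can overlap: a question
        # containing "瓜烧白菜" also matches the shorter word "白菜"
        for i in self.region_tree.iter(question):
            # Each item is (end_index, (index, word)); i[1][1] is the matched word
            wd = i[1][1]
            region_wds.append(wd)
        # Collect matches that are substrings of a longer match
        stop_wds = []
        for wd1 in region_wds:
            for wd2 in region_wds:
                # wd1 is contained in a different, longer word wd2: mark the shorter wd1 for removal
                if wd1 in wd2 and wd1 != wd2:
                    stop_wds.append(wd1)
        # Keep only the words that are not contained in a longer match
        final_wds = [i for i in region_wds if i not in stop_wds]
        # Map each remaining word to its types, e.g. {'感冒': ['disease'], ...}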
        final_dict = {i:self.wdtype_dict.get(i) for i in final_wds}
        return final_dict

    '''Keyword-based check: True if any word in wds occurs in sent'''
    def check_words(self, wds, sent):
        for wd in wds:
            if wd in sent:
                return True
        return False


if __name__ == '__main__':
    handler = QuestionClassifier()
    # Read questions in a loop and print the classification result
    while 1:
        question = input('input a question:')
        data = handler.classify(question)
        print(data)
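
The overlap filtering done in check_medical can be tried in isolation. The sketch below is not the author's code: it swaps the Aho-Corasick automaton for a naive substring scan so that it runs with the standard library alone, and the word list, the question, and the helper names find_matches and drop_nested are hypothetical.

```python
def find_matches(question, words):
    """Naive stand-in for region_tree.iter(): every dictionary word found in the question."""
    return [w for w in words if w in question]


def drop_nested(matches):
    """Keep only the matches that are not a proper substring of another match."""
    nested = {a for a in matches for b in matches if a != b and a in b}
    return [m for m in matches if m not in nested]


# Hypothetical mini dictionary and question, for illustration only
words = ['白菜', '瓜烧白菜', '感冒']
question = '感冒了可以吃瓜烧白菜吗'

matches = find_matches(question, words)   # '白菜' matches too, inside '瓜烧白菜'
final_wds = drop_nested(matches)          # the shorter, nested '白菜' is dropped
print(final_wds)
```

With a real dictionary of tens of thousands of words, the naive scan is what the automaton replaces: Aho-Corasick finds all matches in a single pass over the question.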

Summary

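In short, the classifier extracts the medical keywords from the question and then assigns one or more question types based on them and on the question words. If anything here is wrong or missing, corrections are welcome.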
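
The same idea (question keywords plus entity types give question types) can be condensed into a small table-driven sketch. This is a simplified illustration, not the original implementation: the RULES table holds only three hypothetical rules, and the entities are passed in directly instead of coming from check_medical.

```python
# Each rule: (question keywords, required entity type, resulting question type).
# A hypothetical subset of the keyword lists defined in __init__.
RULES = [
    (['症状', '表现'], 'disease', 'disease_symptom'),
    (['原因', '为什么'], 'disease', 'disease_cause'),
    (['怎么治', '怎么办'], 'disease', 'disease_cureway'),
]


def classify(question, entities):
    """entities maps each matched word to its list of types, e.g. {'感冒': ['disease']}."""
    if not entities:
        return {}
    types = {t for ts in entities.values() for t in ts}
    question_types = [
        label for keywords, required, label in RULES
        if required in types and any(k in question for k in keywords)
    ]
    # Fallback, as in the original: a disease with no recognized question word
    if not question_types and 'disease' in types:
        question_types = ['disease_desc']
    return {'args': entities, 'question_types': question_types}


result = classify('感冒有什么症状，怎么治', {'感冒': ['disease']})
print(result['question_types'])
```

A question can trigger several rules at once, which is why the result carries a list of question types rather than a single label.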
