
Using a TrieTree (Trie) to Implement Sensitive-Word Filtering


1. Definition of a trie

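A trie (TrieTree, also called a dictionary tree or prefix tree) is a tree structure typically used to count, sort, and store large numbers of strings (though not only strings; a 01-trie over bits is another example). Its core idea is to share the common prefixes of strings so that storage space is saved. A trie supports two main operations: insertion and lookup.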
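A trie follows these rules:

    1. The root node holds no character; every other node holds exactly one character.
    2. Concatenating the characters on the path from the root to a node yields a string, such as him, her, cat, no, and nova in the original figure.
    3. Each string corresponds to one path in the trie.
    4. In an implementation, a node carries a flag that records whether it ends a string; this example calls it isEnd. The flag cannot be limited to leaf nodes, because a word such as no ends at an inner node on the path to nova.

For insertion, deletion, and other trie operations, see the following article:

"Let me explain my understanding of the trie."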
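To make the prefix-sharing rule concrete, here is a minimal sketch that inserts the five example words and counts how many nodes are actually created. The class name PrefixSharingDemo and the helper countNodes are hypothetical and exist only for this illustration; they are not part of the article's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the article's code.
class PrefixSharingDemo {

    // A bare node: the character itself is the key in the parent's map.
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    // Inserts every word and returns the number of nodes created (root excluded).
    static int countNodes(String... words) {
        Node root = new Node();
        int created = 0;
        for (String word : words) {
            Node node = root;
            for (char ch : word.toCharArray()) {
                Node next = node.children.get(ch);
                if (next == null) {
                    // Only characters that are not already on a shared prefix cost a new node.
                    next = new Node();
                    node.children.put(ch, next);
                    created++;
                }
                node = next;
            }
            node.isEnd = true;
        }
        return created;
    }

    public static void main(String[] args) {
        // The words contain 15 characters in total, but "h" (him/her)
        // and "no" (no/nova) are stored only once.
        System.out.println(countNodes("him", "her", "cat", "no", "nova")); // prints 12
    }
}
```

Three characters are saved here; with a large word list that shares many prefixes, the saving grows accordingly.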
Below is a simple TrieTree implementation in Java.

import java.util.HashMap;
import java.util.Map;

public class TrieTree {

    // One node per character. The character itself is the key in the parent's map;
    // isEnd marks the last character of a sensitive word.
    private static class TrieNode {
        boolean isEnd;
        final Map<Character, TrieNode> children = new HashMap<>();
    }

    private final TrieNode root = new TrieNode();

    // Insert a sensitive word.
    public void insert(String str) {
        if (str == null || str.isEmpty()) {
            return;
        }
        TrieNode node = root;
        for (int i = 0; i < str.length(); i++) {
            node = node.children.computeIfAbsent(str.charAt(i), k -> new TrieNode());
        }
        node.isEnd = true;
    }

    // Return true if the text contains any sensitive word.
    public boolean match(String str) {
        if (str == null || str.isEmpty()) {
            return false;
        }
        // Try every start position. Resetting to the root on a mismatch without
        // re-reading the current character would miss words such as "大学生" in "大大学生".
        for (int start = 0; start < str.length(); start++) {
            TrieNode node = root;
            for (int i = start; i < str.length(); i++) {
                node = node.children.get(str.charAt(i));
                if (node == null) {
                    break;
                }
                if (node.isEnd) {
                    return true;
                }
            }
        }
        return false;
    }

    // Remove a sensitive word.
    public void remove(String str) {
        // Require an exact entry; match() only tests whether the text contains some word.
        if (str == null || str.isEmpty() || !contains(str)) {
            return;
        }
        remove(root, str, 0);
    }

    // Return true if str itself was inserted as a word.
    private boolean contains(String str) {
        TrieNode node = root;
        for (int i = 0; i < str.length() && node != null; i++) {
            node = node.children.get(str.charAt(i));
        }
        return node != null && node.isEnd;
    }

    private void remove(TrieNode t, String str, int index) {
        char ch = str.charAt(index);
        TrieNode child = t.children.get(ch);
        if (index == str.length() - 1) {
            // Last character: clear the end flag.
            child.isEnd = false;
        } else {
            remove(child, str, index + 1);
        }
        // While backtracking, drop a node only if it has no children and ends no other word.
        if (child.children.isEmpty() && !child.isEnd) {
            t.children.remove(ch);
        }
    }

    public static void main(String[] args) {
        String[] sensitive = {"华南理工", "大学生", "泰裤辣"};
        TrieTree trieTree = new TrieTree();
        for (String word : sensitive) {
            trieTree.insert(word);
        }
        System.out.println(trieTree.match("我是华南大学的学生")); // false
        System.out.println(trieTree.match("华北理工大学泰裤")); // false
        System.out.println(trieTree.match("华南理工大学")); // true
        System.out.println(trieTree.match("大学生")); // true
        System.out.println(trieTree.match("大学生泰裤辣")); // true
        System.out.println(trieTree.match("人之初性本善性相近习相远华南大学泰山崩于前而面不改色泰裤辣哈哈哈哈哈哈")); // true
        trieTree.remove("华南理工");
        System.out.println(trieTree.match("华南理工大学")); // false
        trieTree.remove("大学生");
        System.out.println(trieTree.match("大学生")); // false
        trieTree.remove("泰裤辣");
        System.out.println(trieTree.match("人之初性本善性相近习相远华南大学泰山崩于前而面不改色泰裤辣哈哈哈哈哈哈")); // false
    }
}

The test results are as follows:
[Screenshot: console output of the test run]
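The implementation above only reports whether a text contains a sensitive word. Filtering in the literal sense usually means masking the offending words, so the following is a rough sketch of how that could be layered on the same structure. The class MaskingFilter and its mask method are hypothetical and not part of the article's TrieTree; the sketch scans from every start position and masks the longest word found there, which costs O(n·m) for a text of length n and a longest word of length m.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration; the article's TrieTree has no mask method.
class MaskingFilter {

    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    private final Node root = new Node();

    void insert(String word) {
        if (word == null || word.isEmpty()) {
            return;
        }
        Node node = root;
        for (char ch : word.toCharArray()) {
            node = node.children.computeIfAbsent(ch, k -> new Node());
        }
        node.isEnd = true;
    }

    // Replaces every character that belongs to a sensitive word with '*'.
    String mask(String text) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        StringBuilder out = new StringBuilder(text);
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            int matchEnd = -1; // index of the last character of the longest match from start
            for (int i = start; i < text.length(); i++) {
                node = node.children.get(text.charAt(i));
                if (node == null) {
                    break;
                }
                if (node.isEnd) {
                    matchEnd = i;
                }
            }
            for (int i = start; i <= matchEnd; i++) {
                out.setCharAt(i, '*');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        MaskingFilter filter = new MaskingFilter();
        filter.insert("华南理工");
        filter.insert("大学生");
        filter.insert("泰裤辣");
        System.out.println(filter.mask("华南理工大学生说泰裤辣")); // prints *******说***
    }
}
```

For very large word lists or long texts, an Aho-Corasick automaton would avoid rescanning from each position, at the cost of a more involved construction.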

2. Using the trie to check for sensitive words when a topic is published

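First create three tables: m_user (users), m_topic (topics), and m_sensitive (sensitive words). Their structures are as follows:
[Screenshots: structure of the m_user, m_topic, and m_sensitive tables]
Some sample rows from the tables:
[Screenshots: sample table data]
Create a Maven project and add the following dependencies: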

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.young</groupId>
    <artifactId>trie01</artifactId>
    <version>1.0-SNAPSHOT</version>

    <parent>
        <artifactId>spring-boot-starter-parent</artifactId>
        <groupId>org.springframework.boot</groupId>
        <version>2.7.0</version>
    </parent>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
        </dependency>
        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.4.3</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.83</version>
        </dependency>
    </dependencies>

    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
    </properties>
</project>

application.yml

server:
  port: 8089
spring:
  datasource:
    username: root
    password: 123456
    url: jdbc:mysql://localhost:3306/young?useSSL=false&serverTimezone=UTC
    # The Spring Boot property is driver-class-name; a key named "driver" is ignored.
    driver-class-name: com.mysql.cj.jdbc.Driver
mybatis-plus:
  global-config:
    db-config:
      logic-not-delete-value: 0
      logic-delete-value: 1

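The entity classes are shown below. The User class has an isEnabled field that can be used to restrict a user: when a user repeatedly publishes topics containing inappropriate content, that user's isEnabled is set to 0. To keep the demo simple, this feature is not implemented here.
[Screenshot: entity classes]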
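The article leaves the isEnabled mechanism unimplemented. As a rough idea of what the counting part could look like, here is a minimal in-memory sketch. Everything in it is an assumption: the class name ViolationTracker, the threshold of three violations, the Long user id, and keeping counts in memory instead of in the database. In the real project the caller would presumably react to a true result by setting the user's isEnabled to 0 through UserMapper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper; the article does not implement this feature.
class ViolationTracker {

    private final int maxViolations;
    // userId -> number of rejected posts; in-memory only, lost on restart (assumption).
    private final Map<Long, Integer> counts = new ConcurrentHashMap<>();

    ViolationTracker(int maxViolations) {
        this.maxViolations = maxViolations;
    }

    // Records one violation and returns true once the user has reached the limit,
    // i.e. when the caller should disable the account.
    boolean recordViolation(long userId) {
        int total = counts.merge(userId, 1, Integer::sum);
        return total >= maxViolations;
    }

    public static void main(String[] args) {
        ViolationTracker tracker = new ViolationTracker(3); // threshold is an assumption
        System.out.println(tracker.recordViolation(1L)); // false
        System.out.println(tracker.recordViolation(1L)); // false
        System.out.println(tracker.recordViolation(1L)); // true
    }
}
```

ConcurrentHashMap.merge updates the counter atomically, so concurrent requests from the same user are not lost.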
The related mappers:
UserMapper.java

package com.young.mapper;

import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.young.entity.User;
import org.apache.ibatis.annotations.Mapper;

@Mapper
public interface UserMapper extends BaseMapper<User> {
}

SensitiveMapper.java

package com.young.mapper;

import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.young.entity.Sensitive;
import org.apache.ibatis.annotations.Mapper;

@Mapper
public interface SensitiveMapper extends BaseMapper<Sensitive> {
}

TopicMapper.java

package com.young.mapper;

import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.young.entity.Topic;
import org.apache.ibatis.annotations.Mapper;

@Mapper
public interface TopicMapper extends BaseMapper<Topic> {
}

UserService.java

package com.young.service;

import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.young.entity.User;
import com.young.mapper.UserMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    @Autowired
    private UserMapper userMapper;

    // Returns the matching user, or null if the username/password pair is not found.
    public User login(String username, String password) {
        LambdaQueryWrapper<User> queryWrapper = new LambdaQueryWrapper<>();
        queryWrapper.eq(User::getUsername, username)
                .eq(User::getPassword, password);
        return userMapper.selectOne(queryWrapper);
    }
}

TopicService.java

package com.young.service;

import com.young.entity.Topic;
import com.young.exception.BusinessException;
import com.young.mapper.TopicMapper;
import com.young.vo.TrieTree;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;

@Service
public class TopicService {

    @Autowired
    private TopicMapper topicMapper;

    @Resource
    private TrieTree trieTree;

    public boolean saveTopic(Topic topic) {
        // Reject the topic if its title or content contains a sensitive word.
        if (trieTree.match(topic.getTitle()) || trieTree.match(topic.getContent())) {
            throw new BusinessException("The post contains inappropriate words. Please comply with applicable laws and regulations and help keep the online environment healthy!");
        }
        return topicMapper.insert(topic) > 0;
    }
}
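saveTopic only learns that some sensitive word is present. If the error message should also name the offending word, the trie walk can return the matched substring instead of a boolean. The sketch below shows one way to do that; the class WordFinder and its findFirst method are hypothetical and are not part of the article's TrieTree.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical class for illustration only.
class WordFinder {

    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    private final Node root = new Node();

    void insert(String word) {
        if (word == null || word.isEmpty()) {
            return;
        }
        Node node = root;
        for (char ch : word.toCharArray()) {
            node = node.children.computeIfAbsent(ch, k -> new Node());
        }
        node.isEnd = true;
    }

    // Returns the sensitive word with the earliest start position
    // (the shortest one at that position), or empty if the text is clean.
    Optional<String> findFirst(String text) {
        if (text == null) {
            return Optional.empty();
        }
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.children.get(text.charAt(i));
                if (node == null) {
                    break;
                }
                if (node.isEnd) {
                    return Optional.of(text.substring(start, i + 1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        WordFinder finder = new WordFinder();
        finder.insert("大学生");
        finder.insert("泰裤辣");
        System.out.println(finder.findFirst("我是大学生泰裤辣").orElse("none")); // prints 大学生
        System.out.println(finder.findFirst("你好").orElse("none")); // prints none
    }
}
```

Whether to echo the matched word back to the user is a product decision; logging it server-side is often enough.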

SensitiveService.java, which loads the sensitive-word list from the database

package com.young.service;

import com.young.entity.Sensitive;
import com.young.mapper.SensitiveMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

@Service
public class SensitiveService {

    @Autowired
    private SensitiveMapper sensitiveMapper;

    public List<Sensitive> getAllSensitive() {
        return sensitiveMapper.selectList(null);
    }

    public List<String> getAllSensitiveWord() {
        List<Sensitive> allSensitive = getAllSensitive();
        if (allSensitive != null && !allSensitive.isEmpty()) {
            return allSensitive.stream()
                    .map(Sensitive::getWord)
                    .collect(Collectors.toList());
        }
        return new ArrayList<>();
    }
}

TrieTreeConfig.java, which registers the TrieTree as a bean so it can be injected later

package com.young.config;

import com.young.service.SensitiveService;
import com.young.vo.TrieTree;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.List;

@Configuration
public class TrieTreeConfig {

    @Autowired
    private SensitiveService sensitiveService;

    // Builds the trie once at startup from the words stored in m_sensitive.
    @Bean
    public TrieTree constructTrieTree() {
        System.out.println("Initializing the trie ======================");
        List<String> words = sensitiveService.getAllSensitiveWord();
        TrieTree trieTree = new TrieTree();
        for (String word : words) {
            trieTree.insert(word);
        }
        return trieTree;
    }
}
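Because the bean is built once at startup, words added to m_sensitive later are not picked up until the application restarts. One possible way around this, sketched here under assumptions, is to rebuild a fresh trie and swap it in atomically. The class ReloadableWordFilter and its reload method are hypothetical; how and when reload would be triggered (a scheduled task, an admin endpoint) is left open.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical wrapper; not part of the article's project.
class ReloadableWordFilter {

    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    // Readers always see a fully built trie: either the old one or the new one.
    private final AtomicReference<Node> rootRef = new AtomicReference<>(new Node());

    // Builds a new trie from scratch, then publishes it in one atomic step.
    void reload(Collection<String> words) {
        Node root = new Node();
        for (String word : words) {
            if (word == null || word.isEmpty()) {
                continue;
            }
            Node node = root;
            for (char ch : word.toCharArray()) {
                node = node.children.computeIfAbsent(ch, k -> new Node());
            }
            node.isEnd = true;
        }
        rootRef.set(root);
    }

    // Returns true if the text contains any word of the currently published trie.
    boolean match(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        Node root = rootRef.get(); // one snapshot for the whole scan
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.children.get(text.charAt(i));
                if (node == null) {
                    break;
                }
                if (node.isEnd) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ReloadableWordFilter filter = new ReloadableWordFilter();
        filter.reload(Arrays.asList("大学生"));
        System.out.println(filter.match("我是大学生")); // true
        filter.reload(Arrays.asList("泰裤辣"));
        System.out.println(filter.match("我是大学生")); // false
    }
}
```

The published trie is never mutated after the swap, which is what makes lock-free reads safe in this sketch.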

BusinessException.java

package com.young.exception;

// Thrown for business-rule violations; the message is returned to the client.
public class BusinessException extends RuntimeException {

    public BusinessException(String msg) {
        super(msg);
    }
}

GlobalExceptionHandler.java

package com.young.exception;

import com.young.util.ResultVOUtil;
import com.young.vo.ResultVO;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

@RestControllerAdvice
@Slf4j
public class GlobalExceptionHandler {

    @ExceptionHandler(BusinessException.class)
    public ResultVO<Void> businessExceptionHandler(BusinessException e) {
        // Pass the message for the placeholder and the exception last so the stack trace is logged.
        log.error("businessException: {}", e.getMessage(), e);
        return ResultVOUtil.fail(400, e.getMessage());
    }

    @ExceptionHandler(Exception.class)
    public ResultVO<Void> exceptionHandler(Exception e) {
        log.error("exception: {}", e.getMessage(), e);
        return ResultVOUtil.fail(400, e.getMessage());
    }
}

The related VO classes:
[Screenshot: VO classes]
ResultVOUtil.java

package com.young.util;

import com.young.vo.ResultVO;

// Static factory methods for building ResultVO responses.
public class ResultVOUtil {

    public static <T> ResultVO<T> success() {
        return new ResultVO<>(200, "Success");
    }

    public static <T> ResultVO<T> success(T data) {
        return new ResultVO<>(200, "Success", data);
    }

    public static <T> ResultVO<T> fail() {
        return new ResultVO<>(400, "Operation failed");
    }

    public static <T> ResultVO<T> fail(Integer code, String msg) {
        return new ResultVO<>(code, msg);
    }
}

DemoController.java. To keep the demo simple, the user information is stored in the session.

package com.young.controller;

import com.young.entity.Topic;
import com.young.entity.User;
import com.young.service.TopicService;
import com.young.service.UserService;
import com.young.util.ResultVOUtil;
import com.young.vo.ResultVO;
import com.young.vo.UserVO;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.servlet.http.HttpServletRequest;

@RestController
@RequestMapping("/user")
public class DemoController {

    @Autowired
    private UserService userService;

    @Autowired
    private TopicService topicService;

    @PostMapping("/login")
    public ResultVO<User> login(@RequestBody UserVO userVO, HttpServletRequest request) {
        User user = userService.login(userVO.getUsername(), userVO.getPassword());
        if (user == null) {
            return ResultVOUtil.fail(400, "Incorrect username or password");
        }
        // Keep the logged-in user in the session for later requests.
        request.getSession().setAttribute("user", user);
        return ResultVOUtil.success(user);
    }

    @PostMapping("/topic")
    public ResultVO<Void> addTopic(@RequestBody Topic topic, HttpServletRequest request) {
        User user = (User) request.getSession().getAttribute("user");
        if (user == null) {
            return ResultVOUtil.fail(400, "User is not logged in");
        }
        topic.setUserId(user.getId());
        // saveTopic throws BusinessException when a sensitive word is found;
        // GlobalExceptionHandler turns that into an error response.
        if (topicService.saveTopic(topic)) {
            return ResultVOUtil.success();
        }
        return ResultVOUtil.fail(400, "Failed to publish the topic");
    }
}

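Run the project and log in as a user:
[Screenshot: login request]
Publish a topic that contains sensitive words:
[Screenshots: publishing a topic that contains sensitive words]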

