
ES elasticsearch-analysis-dynamic-synonym: Connecting to a Database to Dynamically Update Synonyms

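 Preface

        In many search scenarios we want results that match the search term and also results that match its synonyms. For example, in product search, a query for "瓠瓜" (bottle gourd) should also return "西葫芦" (zucchini) products, but because the product name "西葫芦" does not contain "瓠瓜", those products are not found.
        In that case, "瓠瓜" needs to be expanded into "瓠瓜" and "西葫芦". The ES synonym and synonym_graph token filters provide exactly this: they map a term to its synonyms, which are then tokenized.
        The following declares an analyzer in which "瓠瓜" and "西葫芦" are defined as synonyms: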

 

// Define a custom analyzer
PUT info_goods_v1/_settings
{
  "analysis": {
    "filter": {
      "my_synonyms": {
        "type": "synonym_graph",
        "synonyms": ["瓠瓜,西葫芦"]
      }
    },
    "analyzer": {
      "my_analyzer": {
        "type": "custom",
        "tokenizer": "ik_max_word",
        "filter": ["lowercase", "my_synonyms"]
      }
    }
  }
}

// Analyze "瓠瓜"
GET info_goods_v1/_analyze
{
  "analyzer": "my_analyzer",
  "text": "瓠瓜"
}

// Result:
{
  "tokens": [
    {"token": "西葫芦", "start_offset": 0, "end_offset": 2, "type": "SYNONYM", "position": 0},
    {"token": "瓠", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0, "positionLength": 2},
    {"token": "葫芦", "start_offset": 0, "end_offset": 2, "type": "SYNONYM", "position": 1, "positionLength": 2},
    {"token": "瓜", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 2}
  ]
}

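        As you can see, "瓠瓜" was tokenized into "西葫芦", "葫芦", "瓠", and "瓜". This is because the custom analyzer defines "瓠瓜" and "西葫芦" as synonyms ("瓠瓜 => 瓠瓜, 西葫芦"), which is equivalent to first converting "瓠瓜" into "瓠瓜" and "西葫芦", and then tokenizing each term in that synonym set (that is, "瓠瓜" and "西葫芦") in turn.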
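        To make the expansion step concrete, here is a minimal sketch in plain Java. It is not Lucene's implementation; it only assumes a comma-separated rule line with expansion enabled, so that every term maps to the whole group. The names SynonymExpandSketch and expand are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: mimics what an "expand" synonym rule means,
// not how Lucene's SynonymMap is actually built.
class SynonymExpandSketch {

    /** Parses one rule line such as "a,b" and maps every term to the full group. */
    static Map<String, List<String>> expand(String ruleLine) {
        List<String> group = new ArrayList<>();
        for (String term : ruleLine.split(",")) {
            String trimmed = term.trim();
            if (!trimmed.isEmpty()) {
                group.add(trimmed);
            }
        }
        Map<String, List<String>> expanded = new LinkedHashMap<>();
        for (String term : group) {
            // Each term expands to itself plus all of its synonyms.
            expanded.put(term, new ArrayList<>(group));
        }
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(expand("瓠瓜,西葫芦"));
    }
}
```

        Each expanded term is then passed through the tokenizer, which is why the analyze result contains pieces of both words.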

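        Dizzy from all the "瓠瓜" and "西葫芦"? No rush, take a breath and let's continue...

        What if the synonyms change? One option is to close the index, update its analyzer, and reopen it. Alternatively, the elasticsearch-analysis-dynamic-synonym plugin can update synonyms dynamically. It supports updates from a remote endpoint and from a file, but not from a database. No matter: a small modification gets us there, and that is the main subject of this article.

        The process is as follows.

Modifying the source to fetch synonyms from a database

        Download elasticsearch-analysis-dynamic-synonym and open the project.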

1. Modify pom.xml

        Add the dependency:

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.21</version>
        </dependency>

        Change the project version so that it matches your ES version. Mine is 7.17.7:

<version>7.17.7</version>

2. Modify main/assemblies/plugin.xml

        Add the following under the <dependencySets> tag:

        <dependencySet>
            <outputDirectory/>
            <useProjectArtifact>true</useProjectArtifact>
            <useTransitiveFiltering>true</useTransitiveFiltering>
            <includes>
                <include>mysql:mysql-connector-java</include>
            </includes>
        </dependencySet>

        Add the following under the <assembly> tag:

    <fileSets>
        <fileSet>
            <directory>${project.basedir}/config</directory>
            <outputDirectory>config</outputDirectory>
        </fileSet>
    </fileSets>

3. JDBC configuration file

        Create config/jdbc.properties in the project root with the following content:

jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.url=jdbc:mysql://cckg.liulingjie.cn:3306/test?useUnicode=true&characterEncoding=utf8&autoReconnect=true&useSSL=false&serverTimezone=Asia/Shanghai
jdbc.username=<your username>
jdbc.password=<your password>
# SQL that selects the synonyms (the result column must be aliased as words)
synonym.word.sql=SELECT `keys` AS words FROM es_synonym WHERE ifdel = '0'
# SQL that returns the last update time of the synonyms, used to detect changes (the result column must be aliased as maxModitime)
synonym.lastModitime.sql=SELECT MAX(moditime) AS maxModitime FROM es_synonym
interval=10
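
        A missing key in this file only surfaces later as a null value or a NumberFormatException, so it can help to validate the file when it is loaded. The following is a minimal sketch using only java.util.Properties; the class JdbcPropertiesSketch, its methods, and the default of 10 seconds are assumptions for illustration and are not part of the plugin.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: loads jdbc.properties-style content and fails fast on missing keys.
class JdbcPropertiesSketch {

    // Assumed fallback, mirroring the interval=10 value in the sample file.
    static final int DEFAULT_INTERVAL_SECONDS = 10;

    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException("could not read properties", e);
        }
        return props;
    }

    /** Returns the trimmed value, or throws if the key is absent or blank. */
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("missing required property: " + key);
        }
        return value.trim();
    }

    /** Returns the configured interval, or the assumed default if it is not set. */
    static int interval(Properties props) {
        String raw = props.getProperty("interval");
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_INTERVAL_SECONDS;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        String sample = "jdbc.driver=com.mysql.cj.jdbc.Driver\ninterval=10\n";
        Properties props = load(new StringReader(sample));
        System.out.println(require(props, "jdbc.driver"));
        System.out.println(interval(props));
    }
}
```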

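4. Write the synonym loader class

        In the com.bellszhu.elasticsearch.plugin.synonym.analysis package there are several classes that load synonyms; for example, RemoteSynonymFile loads them from a remote endpoint.
        Create a class DynamicSynonymFromDb in this package that implements the SynonymFile interface. It reads the synonyms from the database. The code is as follows: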

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Properties;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.elasticsearch.env.Environment;
// Also import DynamicSynonymPlugin (from the plugin project) and PathUtils
// (its package depends on your ES version).

/**
 * @author liulingjie
 * @date 2023/4/12 19:43
 */
public class DynamicSynonymFromDb implements SynonymFile {

    /** Name of the configuration file. */
    private final static String DB_PROPERTIES = "jdbc.properties";

    private static Logger logger = LogManager.getLogger("dynamic-synonym");

    private String format;
    private boolean expand;
    private boolean lenient;
    private Analyzer analyzer;
    private Environment env;

    /** Dynamic configuration type. */
    private String location;

    /** Group the synonyms apply to. */
    private String group;

    private long lastModified;
    private Path conf_dir;
    private JdbcConfig jdbcConfig;

    DynamicSynonymFromDb(Environment env, Analyzer analyzer,
                         boolean expand, boolean lenient, String format, String location, String group) {
        this.analyzer = analyzer;
        this.expand = expand;
        this.lenient = lenient;
        this.format = format;
        this.env = env;
        this.location = location;
        this.group = group;
        // Read the configuration file
        setJdbcConfig();
        // Load the JDBC driver
        try {
            Class.forName(jdbcConfig.getDriver());
        } catch (ClassNotFoundException e) {
            logger.error("JDBC driver not found", e);
        }
        // Check whether a reload is needed (this also records the current modification time)
        isNeedReloadSynonymMap();
    }

    /**
     * Reads the configuration file.
     */
    private void setJdbcConfig() {
        // Locate the config directory next to the plugin jar
        Path filePath = PathUtils.get(
                new File(DynamicSynonymPlugin.class.getProtectionDomain().getCodeSource().getLocation().getPath())
                        .getParent(), "config").toAbsolutePath();
        this.conf_dir = filePath.resolve(DB_PROPERTIES);
        File file = conf_dir.toFile();
        Properties properties = new Properties();
        // try-with-resources so the stream is always closed
        try (InputStream in = new FileInputStream(file)) {
            properties.load(in);
        } catch (Exception e) {
            logger.error("load jdbc.properties failed", e);
        }
        jdbcConfig = new JdbcConfig(
                properties.getProperty("jdbc.driver"),
                properties.getProperty("jdbc.url"),
                properties.getProperty("jdbc.username"),
                properties.getProperty("jdbc.password"),
                properties.getProperty("synonym.word.sql"),
                properties.getProperty("synonym.lastModitime.sql"),
                Integer.valueOf(properties.getProperty("interval")));
    }

    /**
     * Loads the synonym dictionary into a SynonymMap.
     *
     * @return SynonymMap
     */
    @Override
    public SynonymMap reloadSynonymMap() {
        try {
            logger.info("start reload local synonym from {}.", location);
            Reader rulesReader = getReader();
            SynonymMap.Builder parser =
                    RemoteSynonymFile.getSynonymParser(rulesReader, format, expand, lenient, analyzer);
            return parser.build();
        } catch (Exception e) {
            // The exception must be the last argument so that {} is filled with the location
            logger.error("reload local synonym {} error!", location, e);
            throw new IllegalArgumentException("could not reload local synonyms file to build synonyms", e);
        }
    }

    /**
     * Decides whether the synonyms need to be reloaded.
     *
     * @return true or false
     */
    @Override
    public boolean isNeedReloadSynonymMap() {
        try {
            Long lastModify = getLastModify();
            // lastModify is null if the query failed or the table is empty
            if (lastModify != null && lastModified < lastModify) {
                lastModified = lastModify;
                return true;
            }
        } catch (Exception e) {
            logger.error(e);
        }
        return false;
    }

    /**
     * Returns the time the synonym table was last modified,
     * used to decide whether the synonyms need to be reloaded.
     *
     * @return last modification time in milliseconds, or null if unknown
     */
    public Long getLastModify() {
        Connection connection = null;
        Statement statement = null;
        ResultSet resultSet = null;
        Long last_modify_long = null;
        try {
            connection = DriverManager.getConnection(
                    jdbcConfig.getUrl(), jdbcConfig.getUsername(), jdbcConfig.getPassword());
            statement = connection.createStatement();
            resultSet = statement.executeQuery(jdbcConfig.getSynonymLastModitimeSql());
            while (resultSet.next()) {
                Timestamp last_modify_dt = resultSet.getTimestamp("maxModitime");
                if (last_modify_dt != null) {
                    last_modify_long = last_modify_dt.getTime();
                }
            }
        } catch (SQLException e) {
            logger.error("failed to get the last modification time of the synonyms", e);
        } finally {
            try {
                if (resultSet != null) {
                    resultSet.close();
                }
                if (statement != null) {
                    statement.close();
                }
                if (connection != null) {
                    connection.close();
                }
            } catch (SQLException e) {
                logger.error("failed to close JDBC resources", e);
            }
        }
        return last_modify_long;
    }

    /**
     * Queries the synonyms from the database.
     *
     * @return one synonym rule per list entry
     */
    public ArrayList<String> getDBData() {
        ArrayList<String> arrayList = new ArrayList<>();
        Connection connection = null;
        Statement statement = null;
        ResultSet resultSet = null;
        try {
            connection = DriverManager.getConnection(
                    jdbcConfig.getUrl(), jdbcConfig.getUsername(), jdbcConfig.getPassword());
            statement = connection.createStatement();
            String sql = jdbcConfig.getSynonymWordSql();
            if (group != null && !"".equals(group.trim())) {
                sql = String.format("%s AND `key_group` = '%s'", sql, group);
            }
            resultSet = statement.executeQuery(sql);
            while (resultSet.next()) {
                String theWord = resultSet.getString("words");
                arrayList.add(theWord);
            }
        } catch (SQLException e) {
            logger.error("failed to query the synonyms from the database", e);
        } finally {
            try {
                if (resultSet != null) {
                    resultSet.close();
                }
                if (statement != null) {
                    statement.close();
                }
                if (connection != null) {
                    connection.close();
                }
            } catch (SQLException e) {
                logger.error("failed to close JDBC resources", e);
            }
        }
        return arrayList;
    }

    /**
     * Builds a Reader over the synonym rules, one rule per line.
     *
     * @return Reader
     */
    @Override
    public Reader getReader() {
        StringBuilder sb = new StringBuilder();
        try {
            for (String rule : getDBData()) {
                sb.append(rule).append(System.getProperty("line.separator"));
            }
            logger.info("load the synonym from db");
        } catch (Exception e) {
            logger.error("reload synonym from db failed:", e);
        }
        return new StringReader(sb.toString());
    }
}

/**
 * Configuration class I created myself.
 *
 * @author liulingjie
 * @date 2022/11/30 16:03
 */
public class JdbcConfig {

    /** Driver class name. */
    private String driver;

    /** Database URL. */
    private String url;

    /** Database username. */
    private String username;

    /** Database password. */
    private String password;

    /** SQL that selects the synonyms; the result column must be aliased as words. */
    private String synonymWordSql;

    /** SQL that returns the last update time of the synonyms. */
    private String synonymLastModitimeSql;

    /** Interval; currently unused. */
    private Integer interval;

    public JdbcConfig() {
    }

    public JdbcConfig(String driver, String url, String username, String password,
                      String synonymWordSql, String synonymLastModitimeSql, Integer interval) {
        this.url = url;
        this.username = username;
        this.password = password;
        this.synonymWordSql = synonymWordSql;
        this.synonymLastModitimeSql = synonymLastModitimeSql;
        this.interval = interval;
        this.driver = driver;
    }

    // Getters used by DynamicSynonymFromDb
    public String getDriver() { return driver; }
    public String getUrl() { return url; }
    public String getUsername() { return username; }
    public String getPassword() { return password; }
    public String getSynonymWordSql() { return synonymWordSql; }
    public String getSynonymLastModitimeSql() { return synonymLastModitimeSql; }
    public Integer getInterval() { return interval; }
}
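
        getDBData appends the group filter with String.format, which breaks (or worse) if the group value ever contains a quote. One possible alternative, sketched below, is to bind the value through a PreparedStatement. The class GroupQuerySketch and its methods are hypothetical; only the key_group column and the words alias come from the code above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the group filter that uses a bound parameter.
class GroupQuerySketch {

    static boolean hasGroup(String group) {
        return group != null && !group.trim().isEmpty();
    }

    /** Appends a placeholder-based filter only when a group is given. */
    static String buildSql(String baseSql, String group) {
        return hasGroup(group) ? baseSql + " AND `key_group` = ?" : baseSql;
    }

    /** Runs the query and collects the "words" column; the caller owns the connection. */
    static List<String> loadWords(Connection connection, String baseSql, String group) throws SQLException {
        List<String> words = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(buildSql(baseSql, group))) {
            if (hasGroup(group)) {
                statement.setString(1, group);
            }
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    words.add(rows.getString("words"));
                }
            }
        }
        return words;
    }
}
```

        As in the original code, this assumes that the configured synonym.word.sql already ends in a WHERE clause, so that appending AND is valid.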

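        Then, in the getSynonymFile method of the DynamicSynonymTokenFilterFactory class, add a branch that creates a DynamicSynonymFromDb when synonyms_path is set to fromDB (the value used in step 7).

        Note that the group field is something I added myself; you can remove it or pass an empty value!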
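
        As a rough illustration of that branch (not the author's actual change), the sketch below shows one way the choice of loader could be made from the synonyms_path value. It assumes the fromDB marker used in the settings in step 7 and assumes that remote sources are recognized by an http:// or https:// prefix; SynonymSourceSketch and its Source enum are made up so that the example is self-contained.

```java
// Hypothetical, self-contained stand-in for the dispatch inside getSynonymFile.
class SynonymSourceSketch {

    enum Source { DATABASE, REMOTE, LOCAL }

    // Marker value taken from the "synonyms_path": "fromDB" setting in step 7.
    static final String DB_MARKER = "fromDB";

    /** Decides which kind of loader a synonyms_path value refers to. */
    static Source resolve(String location) {
        if (DB_MARKER.equals(location)) {
            // In the plugin this is where something like
            // new DynamicSynonymFromDb(env, analyzer, expand, lenient, format, location, group)
            // would be created.
            return Source.DATABASE;
        }
        if (location.startsWith("http://") || location.startsWith("https://")) {
            return Source.REMOTE; // assumed: handled by RemoteSynonymFile
        }
        return Source.LOCAL; // assumed: anything else is a file path
    }

    public static void main(String[] args) {
        System.out.println(resolve("fromDB"));
    }
}
```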

 5. Package

        Finally, run the Maven package goal to build the plugin.

        The resulting zip archive can be found under ~\target\releases.

6. Install into ES

        Create a dynamic-synonym folder under the plugins directory of your ES installation, and extract the zip archive from the previous step into it.

        Finally, restart ES and check the startup log to confirm that the plugin was loaded.

7. Try it out

        Next, we use the new filter type. A reference request looks like this:

// "interval" is the refresh interval in seconds
POST info_goods/_close

PUT info_goods/_settings
{
  "analysis": {
    "filter": {
      "my_synonyms": {
        "type": "dynamic_synonym",
        "synonyms_path": "fromDB",
        "interval": 30
      }
    },
    "analyzer": {
      "my_analyzer": {
        "type": "custom",
        "tokenizer": "ik_max_word",
        "filter": ["lowercase", "my_synonyms"]
      }
    }
  }
}

POST info_goods/_open
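
        The interval setting implies periodic polling: every N seconds the filter asks whether the synonyms changed and reloads only if they did, which is the comparison isNeedReloadSynonymMap performs against the maxModitime value. The sketch below shows that pattern in isolation. It is not the plugin's own scheduler; ReloadPollerSketch and its methods are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical poller: reloads only when the "last modified" timestamp moves forward.
class ReloadPollerSketch {

    private final LongSupplier lastModifiedSource; // e.g. a query for MAX(moditime)
    private final Runnable reloadAction;           // e.g. rebuilding the synonym map
    private long lastSeen = Long.MIN_VALUE;

    ReloadPollerSketch(LongSupplier lastModifiedSource, Runnable reloadAction) {
        this.lastModifiedSource = lastModifiedSource;
        this.reloadAction = reloadAction;
    }

    /** One polling step; returns true if a reload was triggered. */
    synchronized boolean pollOnce() {
        long current = lastModifiedSource.getAsLong();
        if (current > lastSeen) {
            lastSeen = current;
            reloadAction.run();
            return true;
        }
        return false;
    }

    /** Starts polling at a fixed rate; the caller should shut the scheduler down when done. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // Swallow the error so that later polls still run.
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

        With this pattern, a change in the database becomes visible after at most one interval, without closing and reopening the index again.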

           Let's give it a quick try:

# Analyze "瓠瓜"
GET info_goods/_analyze
{
  "analyzer": "my_analyzer",
  "text": "瓠瓜"
}

# Result
{
  "tokens": [
    {"token": "西葫芦", "start_offset": 0, "end_offset": 2, "type": "SYNONYM", "position": 0},
    {"token": "瓠", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0, "positionLength": 2},
    {"token": "葫芦", "start_offset": 0, "end_offset": 2, "type": "SYNONYM", "position": 1, "positionLength": 2},
    {"token": "瓜", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 2}
  ]
}

         

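        It works! Mission accomplished! ^_^

        I know you're lazy, so I have uploaded the source code and the final plugin package; download them if you need them ^_^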