A First Look at Developing an Elasticsearch Filter Plugin
This article uses the Elasticsearch 7.10 source code for its examples.
Plugin categories
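Elasticsearch groups plugins into several categories, and each category serves a different purpose. The categories are listed in the Javadoc of the org.elasticsearch.plugins.Plugin class: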
/**
 * <ul>
 * <li>{@link ActionPlugin}
 * <li>{@link AnalysisPlugin}
 * <li>{@link ClusterPlugin}
 * <li>{@link DiscoveryPlugin}
 * <li>{@link IngestPlugin}
 * <li>{@link MapperPlugin}
 * <li>{@link NetworkPlugin}
 * <li>{@link RepositoryPlugin}
 * <li>{@link ScriptPlugin}
 * <li>{@link SearchPlugin}
 * <li>{@link ReloadablePlugin}
 * </ul>
 */
public abstract class Plugin implements Closeable {
    // omitted...
}
Overview of plugin types [official site]: https://www.elastic.co/guide/en/elasticsearch/plugins/master/intro.html
The following descriptions are excerpted from online sources:
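- ActionPlugin: REST API plugin. Developers can add the REST commands they need, or add extra handling to existing REST requests. If built-in Elasticsearch commands such as _all, _cat or _cat/health do not meet your needs, you can develop your own REST commands.
- AnalysisPlugin: analysis plugin. Extends index-time analysis to cover gaps in the analysis features built into ES; the well-known IK tokenizer plugin is an example.
- IngestPlugin: pre-processing plugin. Processes documents before they are indexed, for example the geoip processor plugin, which adds geographic information based on an IP address.
- MapperPlugin: mapping plugin. Adds new data types to ES.
- NetworkPlugin: network transport plugin.
- ScriptPlugin: scripting plugin. Mainly extends ES scripting, for example with custom scoring functions or support for additional scripting languages.
- SearchPlugin: search plugin. Extends ES query functionality.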
Writing a filter plugin
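If the goal is to process text, the plugin should be an AnalysisPlugin. Elasticsearch already ships many built-in analysis components; take a look at this class:
org.elasticsearch.analysis.common.CommonAnalysisPlugin
It registers many commonly used analyzers, tokenizers, char filters and token filters, and a custom plugin can follow the same patterns.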
Next we write a plugin that encrypts characters. After studying part of the ICU plugin source (icu_normalizer), proceed as follows:
Add an EncPlugin class
package com.xx.plugin.es.enc;

import com.xx.plugin.es.enc.token.EncCharTokenFilterFactory;
import org.elasticsearch.index.analysis.TokenFilterFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;
import org.elasticsearch.plugins.AnalysisPlugin;
import org.elasticsearch.plugins.MapperPlugin;
import org.elasticsearch.plugins.Plugin;

import java.util.HashMap;
import java.util.Map;

// MapperPlugin is declared but none of its methods are overridden yet.
public class EncPlugin extends Plugin implements AnalysisPlugin, MapperPlugin {

    // Registers the token filter under the name "enc_normalizer";
    // index analysis settings refer to the filter by this name.
    @Override
    public Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
        Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> extra = new HashMap<>();
        extra.put("enc_normalizer", EncCharTokenFilterFactory::new);
        return extra;
    }
}
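By implementing AnalysisPlugin and overriding its methods, this class can hand CharFilters, TokenFilters, Tokenizers, Analyzers and so on to ES. For how these concepts relate to each other, see:
- [Elastic brief] The difference between normalizer and analyzer: https://developer.aliyun.com/article/1082061
- [What are tokenizer, analyzer and filter in Elasticsearch]: https://www.cnblogs.com/a-du/p/16272901.html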
EncCharTokenFilterFactory
package com.xx.plugin.es.enc.token;

import org.apache.lucene.analysis.TokenStream;
import org.elasticsearch.common.logging.DeprecationLogger;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;
import org.elasticsearch.index.analysis.NormalizingTokenFilterFactory;

/*
 * References:
 * org.elasticsearch.analysis.common.PatternReplaceCharFilterFactory
 * org.elasticsearch.analysis.common.MappingCharFilterFactory
 */
public class EncCharTokenFilterFactory extends AbstractTokenFilterFactory implements NormalizingTokenFilterFactory {

    private static final DeprecationLogger deprecationLogger = DeprecationLogger.getLogger(EncCharTokenFilterFactory.class);

    private final EncTokenNormalizer normalizer;

    public EncCharTokenFilterFactory(IndexSettings indexSettings, String name, Settings settings) {
        super(indexSettings, name, settings);
        this.normalizer = new EncTokenNormalizer();
    }

    // EncCharTokenFilterFactory::new in EncPlugin resolves to this 4-argument constructor
    // (it matches AnalysisProvider#get), so it must initialise the normalizer as well.
    public EncCharTokenFilterFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
        this(indexSettings, name, settings);
    }

    // Wraps the upstream token stream with our filter.
    @Override
    public TokenStream create(TokenStream tokenStream) {
        return new EncTokenFilter(tokenStream, normalizer);
    }
}
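Which constructor does EncCharTokenFilterFactory::new in EncPlugin actually call? The map value type is AnalysisModule.AnalysisProvider<TokenFilterFactory>, whose get method takes four arguments (IndexSettings, Environment, String, Settings), so the method reference resolves to the 4-argument constructor, and that is the constructor that has to initialise normalizer. The sketch below reproduces this resolution with stand-in types so it runs without Elasticsearch on the classpath; DemoProvider, DemoNormalizer, DemoFilterFactory and DemoRegistry are hypothetical names, not ES classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for AnalysisModule.AnalysisProvider (NOT the real ES type).
// The real get(...) takes (IndexSettings, Environment, String, Settings); plain Objects
// and a Map replace those types so the example compiles with the JDK alone.
@FunctionalInterface
interface DemoProvider<T> {
    T get(Object indexSettings, Object environment, String name, Map<String, String> settings);
}

// Hypothetical placeholder for EncTokenNormalizer.
class DemoNormalizer { }

// Mirrors the two constructors of EncCharTokenFilterFactory.
class DemoFilterFactory {
    final String name;
    final DemoNormalizer normalizer;

    // 3-argument constructor: its arity does not match DemoProvider#get,
    // so DemoFilterFactory::new can never select it.
    DemoFilterFactory(Object indexSettings, String name, Map<String, String> settings) {
        this.name = name;
        this.normalizer = new DemoNormalizer();
    }

    // 4-argument constructor: the one DemoFilterFactory::new binds to.
    // Delegating to the 3-argument constructor guarantees the normalizer is set.
    DemoFilterFactory(Object indexSettings, Object environment, String name, Map<String, String> settings) {
        this(indexSettings, name, settings);
    }
}

class DemoRegistry {
    // Same registration pattern as EncPlugin#getTokenFilters.
    static Map<String, DemoProvider<DemoFilterFactory>> tokenFilters() {
        Map<String, DemoProvider<DemoFilterFactory>> extra = new HashMap<>();
        extra.put("enc_normalizer", DemoFilterFactory::new);
        return extra;
    }
}
```

Looking up "enc_normalizer" and calling get(...) returns a factory whose normalizer is non-null. If the 4-argument constructor only called super(...) without assigning the field, the factory would hand a null normalizer to every filter it creates.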
EncTokenNormalizer
An empty class for now; its logic will be implemented later.
package com.xx.plugin.es.enc.token;

public class EncTokenNormalizer {
}
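EncTokenNormalizer is still empty. As one possible direction, the sketch below assumes deterministic AES encryption (AES/ECB/PKCS5Padding from the JDK's javax.crypto) with URL-safe Base64 output; the class name DemoEncNormalizer, its methods and the key handling are hypothetical and not part of the original plugin. Determinism is what keeps an encrypted term searchable: the same plaintext term yields the same token at index time and at query time. The trade-off is that any deterministic scheme reveals which terms are equal, and a real plugin would load its key from a secure location instead of from code.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of what EncTokenNormalizer could do; the name, methods
// and the choice of AES/ECB are assumptions, not the author's implementation.
class DemoEncNormalizer {
    private static final String TRANSFORMATION = "AES/ECB/PKCS5Padding";
    private final SecretKeySpec key;

    DemoEncNormalizer(byte[] rawKey) {
        // AES accepts 16-, 24- or 32-byte keys.
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    // Deterministic: equal terms always map to equal tokens, so index-time
    // and query-time encryption of the same term match.
    String normalize(String term) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] out = cipher.doFinal(term.getBytes(StandardCharsets.UTF_8));
            // URL-safe Base64 without padding keeps the token free of '+', '/' and '='.
            return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reverses normalize(); useful for tests or for decrypting stored values.
    String decrypt(String token) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key);
            byte[] in = Base64.getUrlDecoder().decode(token);
            return new String(cipher.doFinal(in), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

A new Cipher is obtained on every call because Cipher objects are not thread-safe, and analysis may run on several threads at once.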
EncTokenFilter
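package com.xx.plugin.es.enc.token;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

import java.io.IOException;

// Declared final: Lucene asserts that TokenStream implementations
// (or at least their incrementToken()) are final.
public final class EncTokenFilter extends TokenFilter {

    private final EncTokenNormalizer normalizer;

    public EncTokenFilter(TokenStream input, EncTokenNormalizer normalizer) {
        super(input);
        this.normalizer = normalizer;
    }

    @Override
    public boolean incrementToken() throws IOException {
        // Pull the next token from upstream; returning false ends the stream.
        if (!input.incrementToken()) {
            return false;
        }
        System.out.println(">>>>>>>>>>>");
        // Transform the current token here as needed;
        // for now it is passed through unchanged.
        return true;
    }
}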
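Lucene token streams are pull-based: the consumer calls incrementToken() until it returns false, and a TokenFilter first pulls a token from its input and then modifies it. A filter that returns false without consulting its input therefore ends the stream immediately, and nothing gets indexed. The stand-alone model below mimics that contract with plain Java; DemoTokenStream, DemoListTokenizer, DemoEncFilter and DemoPipeline are hypothetical names, not Lucene classes, and a single String field stands in for Lucene's CharTermAttribute.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Simplified stand-in for Lucene's TokenStream (NOT the Lucene API).
abstract class DemoTokenStream {
    String term; // current token; plays the role of CharTermAttribute
    abstract boolean incrementToken();
}

// Source stream over a fixed list of tokens (plays the role of a Tokenizer).
final class DemoListTokenizer extends DemoTokenStream {
    private final Iterator<String> it;

    DemoListTokenizer(List<String> tokens) {
        this.it = tokens.iterator();
    }

    @Override
    boolean incrementToken() {
        if (!it.hasNext()) return false;
        term = it.next();
        return true;
    }
}

// Filter: pull one token from upstream, transform it, expose the result.
final class DemoEncFilter extends DemoTokenStream {
    private final DemoTokenStream input;
    private final UnaryOperator<String> normalizer;

    DemoEncFilter(DemoTokenStream input, UnaryOperator<String> normalizer) {
        this.input = input;
        this.normalizer = normalizer;
    }

    @Override
    boolean incrementToken() {
        if (!input.incrementToken()) return false; // upstream exhausted: stop
        term = normalizer.apply(input.term);       // transform the current token
        return true;                               // a token is available
    }
}

final class DemoPipeline {
    // Drains a stream the way a consumer loops over incrementToken().
    static List<String> drain(DemoTokenStream ts) {
        List<String> out = new ArrayList<>();
        while (ts.incrementToken()) out.add(ts.term);
        return out;
    }
}
```

For instance, draining new DemoEncFilter(new DemoListTokenizer(List.of("a", "b")), s -> s.toUpperCase()) yields [A, B]; in the real filter the lambda would be replaced by a call into EncTokenNormalizer.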
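Configuration files
Add two configuration files under the resources directory (src/main/resources):
- plugin-descriptor.properties: the variables in this file are defined in the pom and are substituted by maven-assembly-plugin at packaging time (the file is marked as filtered in the assembly descriptor).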
classname=${elasticsearch.plugin.classname}
name=${elasticsearch.plugin.name}
description=${project.description}
version=${project.version}
elasticsearch.version=${elasticsearch.version}
java.version=${maven.compiler.target}
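- plugin-security.policy: declares the permissions the plugin requests. The example below grants AllPermission; adjust it to what your plugin actually needs.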
grant {
    permission java.security.AllPermission;
};
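Because the ${...} variables are only replaced when the file is filtered during packaging, it is worth checking the generated plugin-descriptor.properties before installing. The hypothetical helper below (not part of Elasticsearch) loads the file with java.util.Properties and reports keys that are missing, blank, or still contain an unresolved placeholder. The list of required keys is an assumption based on the keys used above; confirm it against PluginInfo in the 7.10 source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-install check for plugin-descriptor.properties.
final class DescriptorCheck {
    // Assumption: the keys ES 7.10 expects; verify against PluginInfo.
    static final List<String> REQUIRED = List.of(
            "name", "description", "version", "elasticsearch.version", "java.version", "classname");

    // Returns one message per required key that is missing, blank,
    // or still holds a Maven placeholder such as ${project.version}.
    static List<String> problems(Reader descriptor) {
        Properties props = new Properties();
        try {
            props.load(descriptor);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                problems.add(key + ": missing");
            } else if (value.contains("${")) {
                problems.add(key + ": unresolved placeholder " + value);
            }
        }
        return problems;
    }
}
```

For example, pass Files.newBufferedReader(path) for the descriptor extracted from the built zip; an empty list means every checked key was filled in.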
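Add a packaging descriptor in the assembly directory (src/main/assembly/plugin.xml, the path referenced in the pom) that builds a zip file: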
<?xml version="1.0"?>
<assembly>
    <dependencySets>
        <dependencySet>
            <outputDirectory>enc-filter</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <useTransitiveFiltering>true</useTransitiveFiltering>
        </dependencySet>
    </dependencySets>
    <fileSets>
        <fileSet>
            <directory>${project.basedir}/config</directory>
            <outputDirectory>config</outputDirectory>
        </fileSet>
    </fileSets>
    <files>
        <file>
            <fileMode>0755</fileMode>
            <filtered>true</filtered>
            <outputDirectory>enc-filter/</outputDirectory>
            <source>${project.basedir}/src/main/resources/plugin-descriptor.properties</source>
        </file>
        <file>
            <fileMode>0755</fileMode>
            <filtered>true</filtered>
            <outputDirectory>enc-filter/</outputDirectory>
            <source>${project.basedir}/src/main/resources/plugin-security.policy</source>
        </file>
    </files>
    <formats>
        <format>zip</format>
    </formats>
    <id>plugin-develop</id>
    <includeBaseDirectory>false</includeBaseDirectory>
</assembly>
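The directory structure is as follows:
Packaging
Note: plugin-descriptor.properties and plugin-security.policy must not be packaged into the plugin jar. The pom below excludes them from the jar's resources, and the assembly descriptor above copies them into the zip instead.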
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>enc-plugin</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>14</maven.compiler.source>
        <maven.compiler.target>14</maven.compiler.target>
        <elasticsearch.version>7.10.1</elasticsearch.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <!-- values substituted into plugin-descriptor.properties -->
        <elasticsearch.plugin.classname>com.xx.plugin.es.enc.EncPlugin</elasticsearch.plugin.classname>
        <elasticsearch.plugin.name>enc_plugin</elasticsearch.plugin.name>
        <project.description>this is a test</project.description>
    </properties>

    <dependencies>
        <!-- provided: Elasticsearch supplies these classes at runtime -->
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>${elasticsearch.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.6</version>
                <configuration>
                    <appendAssemblyId>false</appendAssemblyId>
                    <descriptors>
                        <descriptor>${basedir}/src/main/assembly/plugin.xml</descriptor>
                    </descriptors>
                    <outputDirectory>${project.build.directory}/releases/</outputDirectory>
                </configuration>
                <executions>
                    <execution>
                        <goals>
                            <goal>single</goal>
                        </goals>
                        <phase>package</phase>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <groupId>org.apache.maven.plugins</groupId>
                <version>3.8.1</version>
                <configuration>
                    <encoding>${project.build.sourceEncoding}</encoding>
                    <source>${maven.compiler.source}</source>
                    <target>${maven.compiler.target}</target>
                </configuration>
            </plugin>
        </plugins>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <!-- keep the descriptor and policy out of the jar; the assembly adds them to the zip -->
                <excludes>
                    <exclude>*.properties</exclude>
                    <exclude>*.policy</exclude>
                </excludes>
                <filtering>false</filtering>
            </resource>
        </resources>
    </build>
</project>
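With this pom, mvn package writes the zip to target/releases/ (the outputDirectory configured for maven-assembly-plugin); since appendAssemblyId is false, the file name should be enc-plugin-1.0-SNAPSHOT.zip. A quick way to confirm the layout produced by the assembly descriptor, that is, where plugin-descriptor.properties, plugin-security.policy and the jar ended up, is to list the zip entries. ZipLayout is a hypothetical helper that uses only java.util.zip.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists entry names of a built plugin zip, in archive order.
final class ZipLayout {
    static List<String> entries(Path zip) {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> e = zf.entries();
            while (e.hasMoreElements()) {
                names.add(e.nextElement().getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```

Printing ZipLayout.entries(Path.of("target/releases/enc-plugin-1.0-SNAPSHOT.zip")) shows whether the files sit under enc-filter/ as configured, and makes it easy to spot a descriptor that was left out.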
Installation
todo
Verification
todo
The demo plugin from this article is available on [Gitee]: https://gitee.com/aqu415/enc-plugin
References:
https://cloud.tencent.com/developer/article/2213726