#normalization
GET _analyze
{"text":"Mr. Ma is an excellent teacher","analyzer":"english"}
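# For comparison (a sketch, not part of the original notes): the same text run through the built-in standard analyzer.
# By default it lowercases tokens but does not stem them or remove English stop words, so the effect of the english analyzer is easier to see.
GET _analyze
{"text":"Mr. Ma is an excellent teacher","analyzer":"standard"}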
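# 2. Character filter: preprocessing applied before tokenization to strip or replace unwanted characters
# HTML Strip Character Filter: type html_strip
# Parameter: escaped_tags, the HTML tags to keep (they are not stripped)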
##HTML Strip Character Filter
### Test data: <p>I'm so <a>happy</a>!</p>
DELETE my_index
PUT my_index
{"settings":{"analysis":{"char_filter":{"my_char_filter":{"type":"html_strip","escaped_tags":["a"]}},"analyzer":{"my_analyzer":{"tokenizer":"keyword","char_filter":["my_char_filter"]}}}}}
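# A possible request for trying my_analyzer on the test data (a sketch; this request is not in the original notes).
# With escaped_tags set to ["a"], the <p> tags should be stripped while the <a> tags are kept.
GET my_index/_analyze
{"analyzer":"my_analyzer","text":"<p>I'm so <a>happy</a>!</p>"}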
# Mapping Character Filter: type mapping
##Mapping Character Filter
DELETE my_index
PUT my_index
{"settings":{"analysis":{"char_filter":{"my_char_filter":{"type":"mapping","mappings":["滚 => *","垃 => *","圾 => *"]}},"analyzer":{"my_analyzer":{"tokenizer":"keyword","char_filter":["my_char_filter"]}}}}}
# Each mapped character in the text is replaced with *
GET my_index/_analyze
{"analyzer":"my_analyzer","text":"你就是个垃圾!滚"}
# Pattern Replace Character Filter: type pattern_replace
##Pattern Replace Character Filter
# Test data: 17611001200
DELETE my_index
PUT my_index
{"settings":{"analysis":{"char_filter":{"my_char_filter":{"type":"pattern_replace","pattern":"(\\d{3})\\d{4}(\\d{4})","replacement":"$1****$2"}},"analyzer":{"my_analyzer":{"tokenizer":"keyword","char_filter":["my_char_filter"]}}}}}
# Masks the middle four digits of an 11-digit phone number
GET my_index/_analyze
{"analyzer":"my_analyzer","text":"您的手机号是17611001200"}
#Custom analyzer
DELETE custom_analysis
PUT custom_analysis
{"settings":{"analysis":{"char_filter":{"my_char_filter":{"type":"mapping","mappings":["& => and","| => or"]},"html_strip_char_filter":{"type":"html_strip","escaped_tags":["a"]}},"filter":{"my_stopword":{"type":"stop","stopwords":["is","in","the","a","at","for"]}},"tokenizer":{"my_tokenizer":{"type":"pattern","pattern":"[ ,.!?]"}},"analyzer":{"my_analyzer":{"type":"custom","char_filter":["my_char_filter","html_strip_char_filter"],"filter":["my_stopword","lowercase"],"tokenizer":"my_tokenizer"}}}}}
GET custom_analysis/_analyze
{"analyzer":"my_analyzer","text":["What is ,<a>as.df</a> ss<p> in ? &</p> | is ! in the a at for "]}
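# To see what each stage contributes (a sketch, not part of the original notes), the same request can be sent with "explain": true.
# For a custom analyzer the response should then list the text after each char filter, the tokenizer output, and the tokens after each token filter.
GET custom_analysis/_analyze
{"analyzer":"my_analyzer","text":["What is ,<a>as.df</a> ss<p> in ? &</p> | is ! in the a at for "],"explain":true}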