public class SmartChineseLuceneAnalyzer extends AbstractBookAnalyzer

A wrapper for SmartChineseAnalyzer, which takes an overlapping
two-character tokenization approach that leads to a larger index size, like
org.apache.lucene.analysis.cjk.CJKAnalyzer. This analyzer's stop list
consists only of punctuation. It stems English text.

See also: The GNU Lesser General Public License for details.

| Modifier and Type | Field and Description |
|---|---|
| private org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer | myAnalyzer |
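
The overlapping two-character tokenization that the class description refers to can be illustrated with a small, self-contained sketch. This is not the code of SmartChineseAnalyzer or CJKAnalyzer; `BigramSketch` and `bigrams` are hypothetical names chosen for illustration, and the sketch assumes the input is a single run of CJK characters with no punctuation or whitespace.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {

    /**
     * Splits a run of characters into overlapping two-character tokens.
     * A run of N characters (N >= 2) yields N - 1 tokens, so every inner
     * character is indexed twice, which is why the index grows.
     */
    public static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        // Work on code points so that supplementary characters stay intact.
        int[] codePoints = text.codePoints().toArray();
        if (codePoints.length == 1) {
            // A lone character is emitted as a single-character token.
            tokens.add(new String(codePoints, 0, 1));
            return tokens;
        }
        for (int i = 0; i + 1 < codePoints.length; i++) {
            tokens.add(new String(codePoints, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four characters produce three overlapping tokens.
        System.out.println(bigrams("中华人民")); // prints [中华, 华人, 人民]
    }
}
```

Because adjacent tokens share a character, a query for any two-character sequence in the text matches without a dictionary, at the cost of roughly one token per character in the index.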

Fields inherited from class AbstractBookAnalyzer: book, doStemming, doStopWords, stopSet

| Constructor and Description |
|---|
| SmartChineseLuceneAnalyzer() |

| Modifier and Type | Method and Description |
|---|---|
| org.apache.lucene.analysis.TokenStream | reusableTokenStream(String fieldName, Reader reader) |
| org.apache.lucene.analysis.TokenStream | tokenStream(String fieldName, Reader reader) |

Methods inherited from class AbstractBookAnalyzer: getBook, getDoStopWords, setBook, setDoStemming, setDoStopWords, setStopWords

public final org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)

tokenStream in class org.apache.lucene.analysis.Analyzer

public final org.apache.lucene.analysis.TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException

reusableTokenStream in class org.apache.lucene.analysis.Analyzer

Throws: IOException
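
In the Lucene 3.x Analyzer contract, tokenStream creates a new stream on every call, while reusableTokenStream is allowed to hand back the stream the same thread received earlier, reset to the new Reader. The sketch below illustrates that contract using only the standard library. It is not the implementation of SmartChineseLuceneAnalyzer: `ReuseSketch` and `WordStream` are hypothetical stand-ins, the stream simply splits on whitespace, and I/O errors are rethrown unchecked to keep the example short.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReuseSketch {

    /** Hypothetical stand-in for a token stream: yields whitespace-separated words. */
    public static final class WordStream {
        private Reader input;

        public WordStream(Reader input) {
            this.input = input;
        }

        /** Points this stream at new input so the object can be reused. */
        public void reset(Reader newInput) {
            this.input = newInput;
        }

        /** Returns the next word, or null when the input is exhausted. */
        public String next() {
            StringBuilder word = new StringBuilder();
            try {
                int c;
                while ((c = input.read()) != -1) {
                    if (Character.isWhitespace(c)) {
                        if (word.length() > 0) {
                            return word.toString();
                        }
                    } else {
                        word.append((char) c);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return word.length() > 0 ? word.toString() : null;
        }
    }

    // One saved stream per thread, so reuse never crosses thread boundaries.
    private final ThreadLocal<WordStream> saved = new ThreadLocal<>();

    /** Always builds a fresh stream. */
    public WordStream tokenStream(String fieldName, Reader reader) {
        return new WordStream(reader);
    }

    /** Reuses this thread's previous stream when there is one. */
    public WordStream reusableTokenStream(String fieldName, Reader reader) {
        WordStream stream = saved.get();
        if (stream == null) {
            stream = new WordStream(reader);
            saved.set(stream);
        } else {
            stream.reset(reader);
        }
        return stream;
    }

    public static void main(String[] args) {
        ReuseSketch analyzer = new ReuseSketch();
        WordStream first = analyzer.reusableTokenStream("content", new StringReader("one two"));
        WordStream second = analyzer.reusableTokenStream("content", new StringReader("three"));
        System.out.println(first == second); // prints true
        System.out.println(second.next());   // prints three
    }
}
```

The practical consequence for callers is that a stream obtained from reusableTokenStream should be fully consumed before the same thread asks for another one, since the second call redirects the same object to the new input.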