Introduction to full-text retrieval
Basic introduction
Full-text retrieval is the retrieval of arbitrary content from entire books and articles stored in a database. It can return chapters, sections, paragraphs, sentences, and words from the full text as needed, much as if every word in the book had been labeled, and it also supports various kinds of statistics and analysis. For example, it can quickly answer a question such as: how many times does "Lin Daiyu" appear in "Dream of the Red Chamber"?
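The kind of counting question mentioned above can be sketched in a few lines. This is only a naive illustration that scans the raw text rather than using an index; the function name occurrences_by_chapter and the sample chapters are hypothetical and are not taken from any real system or from the novel.

```python
from collections import Counter


def occurrences_by_chapter(chapters: dict[str, str], term: str) -> Counter:
    """Count non-overlapping occurrences of `term` in each chapter's text."""
    if not term:
        raise ValueError("term must not be empty")
    return Counter({title: text.count(term) for title, text in chapters.items()})


# Hypothetical sample data, not quoted from the novel.
chapters = {
    "Chapter 1": "Lin Daiyu arrived. Everyone greeted Lin Daiyu.",
    "Chapter 2": "Jia Baoyu spoke with Lin Daiyu.",
    "Chapter 3": "The garden was quiet.",
}
counts = occurrences_by_chapter(chapters, "Lin Daiyu")
print(counts["Chapter 1"], counts["Chapter 2"], counts["Chapter 3"])  # 2 1 0
print(sum(counts.values()))  # 3
```

A real full-text system would answer the same question from a prebuilt index instead of rescanning the text on every query.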
Related topics
Stemming
Tokenizer: unigram, bigram, and n-gram
Word segmentation
Inverted index
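Two of the topics listed above, n-gram tokenization and the inverted index, can be illustrated with a minimal sketch. The helper names char_ngrams and build_inverted_index are hypothetical, and splitting on whitespace is an assumption made for brevity; production systems use far more elaborate analyzers.

```python
from collections import defaultdict


def char_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping character n-grams (n=1 unigram, n=2 bigram)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercased, whitespace-separated token to the IDs of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return dict(index)


# Hypothetical sample documents.
docs = {
    1: "full text retrieval",
    2: "inverted index for text",
    3: "boolean retrieval model",
}
index = build_inverted_index(docs)
print(char_ngrams("全文检索", 2))  # ['全文', '文检', '检索']
print(sorted(index["text"]))       # [1, 2]
```

Character bigrams are shown on a Chinese string because n-gram tokenization is a common fallback when no word boundaries are marked in the text.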
Algorithms and retrieval models
Boolean model
Statistical (probabilistic) model
Vector space model
Latent semantic model
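As a rough illustration of two of the models listed above, the sketch below shows a Boolean AND query evaluated over posting sets and a vector space comparison using cosine similarity on raw term counts. The function names, the tiny index, and the use of unweighted term frequencies are all assumptions made for the example; practical systems usually apply weighting schemes such as TF-IDF.

```python
import math
from collections import Counter


def boolean_and(index: dict[str, set[int]], terms: list[str]) -> set[int]:
    """Boolean model: return IDs of documents that contain every query term."""
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Vector space model: cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical inverted index: term -> IDs of documents containing it.
index = {"text": {1, 2}, "retrieval": {1, 3}, "model": {3}}
print(boolean_and(index, ["text", "retrieval"]))  # {1}

query = Counter("text retrieval".split())
doc = Counter("full text retrieval".split())
print(round(cosine_similarity(query, doc), 3))  # 0.816
```

The Boolean model only says whether a document matches, while the vector space score allows the matching documents to be ranked.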
Introduction to system retrieval
Evaluation criteria
Two measures are used to judge retrieval effectiveness:
Recall = number of relevant items retrieved / total number of relevant items (%)
Precision = number of relevant items retrieved / total number of items retrieved (%)
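The two ratios defined above, recall and precision (relevant items retrieved divided by all items retrieved), can be computed directly from two sets of document IDs. The function name and the document IDs below are hypothetical, and returning 0.0 for an empty set is a convention chosen here for the sketch.

```python
def recall_and_precision(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (recall, precision) for one query, each as a fraction between 0 and 1."""
    hits = len(retrieved & relevant)  # relevant items that were actually retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical result set and relevance judgments.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
recall, precision = recall_and_precision(retrieved, relevant)
print(f"recall={recall:.0%} precision={precision:.0%}")  # recall=67% precision=50%
```

Here two of the three relevant documents were found, and two of the four retrieved documents were relevant.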
Open source full-text retrieval systems
Apache Solr
BaseX
Clusterpoint server (free license for a single server)
DataparkSearch
Ferret
ht://Dig
Hyper Estraier
KinoSearch
Lemur/Indri
Lucene
mnoGoSearch
Sphinx
Swish-e
Xapian
Elasticsearch
The concept of theme optimization
Chinese-language issues
Word segmentation
Syntactic analysis
Classical (ancient) texts
Mixed-language text
Optimization
Stop words
Part-of-speech tagging
Authority file
Knowledge system, ontology
Page ranking technology
History and future trends
Free sentence search