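I'm building a full-text search tool with whoosh, but I don't really understand how to search for a phrase. Could someone point me in the right direction?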
http://whoosh.readthedocs.org/en/latest/quickstart.html
>>> from whoosh.index import create_in
>>> from whoosh.fields import *
>>> schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
>>> ix = create_in("indexdir", schema)
>>> writer = ix.writer()
>>> writer.add_document(title=u"First document", path=u"/a",
... content=u"This is the first document we've added!")
>>> writer.add_document(title=u"Second document", path=u"/b",
... content=u"The second one is even more interesting!")
>>> writer.commit()
>>> from whoosh.qparser import QueryParser
>>> with ix.searcher() as searcher:
...     query = QueryParser("content", ix.schema).parse("first")
...     results = searcher.search(query)
...     results[0]
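This is the code I used, but I then found that the string passed to parse("first") gets split into separate words. So if I use parse("first man to buy a apple"), it matches documents that contain the individual words rather than this exact phrase.
Then I noticed that QueryParser has the following parameter. I don't know whether it can do what I want, and I don't know how to use it: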
phraseclass – the query class to use for phrases. The default is whoosh.query.Phrase.
class whoosh.query.Phrase(fieldname, words, slop=1, boost=1.0, char_ranges=None)
Matches documents containing a given phrase.
