I want to create a big inverted index of around 10^6 terms. What method would you suggest? I'm thinking of fast binary key-value stores like Tokyo Cabinet, Voldemort, etc.

Edit: I've tried MySQL in the past, storing the inverted index as a table of two integer columns, but even with a database index on the first column, queries were very slow. I think that for this kind of workload a SQL database has too much overhead: transactions, query parsing, and so on. I'm looking for technologies or algorithmic approaches that scale while keeping good response times. I'm rolling my own solution for research purposes.
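The question is a bit vague, so the only answer I think I can give is: use a "Generalized Inverted Index" (GIN index, http://www.postgresql.org/docs/8.4/static/gin-intro.html) in PostgreSQL to build whatever kind of inverted index you want. All the hard work is already done for you: it uses write-ahead logging for crash safety, it uses a btree structure internally for performance, and it is part of a mature database management system.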
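To make the structure concrete, here is a minimal in-memory sketch of the idea: sorted keys, each pointing to a sorted posting list. It is only an illustration, not how PostgreSQL implements GIN; the class `InvertedIndex` and its methods are hypothetical names, and a sorted Python list stands in for the btree over keys. Terms and documents are plain integers, as in the two-integer table from the question.

```python
from bisect import bisect_left, insort


class InvertedIndex:
    """Toy inverted index: sorted term ids, each with a sorted posting list."""

    def __init__(self):
        self._keys = []      # sorted term ids (stand-in for a btree over keys)
        self._postings = {}  # term id -> sorted list of document ids

    def add(self, term, doc_id):
        """Record that doc_id contains term; duplicates are ignored."""
        plist = self._postings.get(term)
        if plist is None:
            insort(self._keys, term)          # keep the key list sorted
            plist = self._postings[term] = []
        pos = bisect_left(plist, doc_id)      # binary search for the slot
        if pos == len(plist) or plist[pos] != doc_id:
            plist.insert(pos, doc_id)

    def lookup(self, term):
        """Return a copy of the sorted posting list for term (empty if absent)."""
        return list(self._postings.get(term, []))

    def range_terms(self, lo, hi):
        """Return the terms t with lo <= t < hi, using the sorted key list."""
        start = bisect_left(self._keys, lo)
        end = bisect_left(self._keys, hi)
        return self._keys[start:end]


index = InvertedIndex()
for term, doc in [(7, 3), (7, 1), (2, 1), (7, 3), (9, 4)]:
    index.add(term, doc)
```

Here `index.lookup(7)` returns the documents containing term 7 in sorted order. A real on-disk index would replace both the list and the dict with a paged tree, which is the part GIN already provides.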
If your problem is full-text search, then PostgreSQL's full text search (http://www.postgresql.org/docs/8.4/static/textsearch.html) is already built for you and can use GIN internally.
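For a sense of what a full-text query does with an inverted index, the sketch below tokenizes a few documents, builds a word-to-documents map, and answers an AND query by intersecting posting sets. It is a simplification under stated assumptions: `tokenize`, `build_index` and `search` are hypothetical helpers, the sample documents are made up, and there is no stemming, stop-word handling or ranking, all of which PostgreSQL's text search does provide.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the smallest posting sets first to keep intermediates small.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return sorted(result)


# Made-up sample documents, for illustration only.
docs = {
    1: "Tokyo Cabinet is a fast key-value store",
    2: "PostgreSQL ships a GIN index for full text search",
    3: "A fast inverted index maps each term to its documents",
}
index = build_index(docs)
```

With these documents, `search(index, "fast index")` matches only document 3, since it is the only one that contains both words.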