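Full-text search with MongoDB + Elasticsearch (retry)
Continued from the previous post.
Is kuromoji failing because of a version mismatch?
Elasticsearch is the latest version (1.2.1), so I assumed the latest kuromoji would work with it.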
elasticsearch/elasticsearch-analysis-kuromoji · GitHub
In order to install the plugin, simply run: bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/2.1.0.
The top of the kuromoji page says 2.1.0 is the latest... or so I thought, but further down there was a 2.2.0!
The branch page for the Elasticsearch 1.2 series,
elasticsearch/elasticsearch-analysis-kuromoji at es-1.2 · GitHub
In order to install the plugin, simply run: bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/2.2.0.
says 2.2.0 too.
So in the end, they just forgot to update it (っω-`。)
It said 'simply run', so I had copy-pasted it as is.
Reinstalling kuromoji
$ plugin --remove elasticsearch/elasticsearch-analysis-kuromoji/2.1.0
-> Removing elasticsearch/elasticsearch-analysis-kuromoji/2.1.0
Removed elasticsearch/elasticsearch-analysis-kuromoji/2.1.0
Remove it,
$ plugin --install elasticsearch/elasticsearch-analysis-kuromoji/2.2.0
-> Installing elasticsearch/elasticsearch-analysis-kuromoji/2.2.0...
Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-analysis-kuromoji/elasticsearch-analysis-kuromoji-2.2.0.zip...
Downloading ...(略)...DONE
Installed elasticsearch/elasticsearch-analysis-kuromoji/2.2.0 into /usr/local/Cellar/elasticsearch/1.2.1/plugins/analysis-kuromoji
and install it.
Analyzing Japanese with kuromoji again
$ curl -XGET 'http://localhost:9200/sandbox/_analyze?analyzer=kuromoji&pretty' -d '貧乳はステータスだ!希少価値だ!'
{
"tokens" : [ {
"token" : "貧",
"start_offset" : 0,
"end_offset" : 1,
"type" : "word",
"position" : 1
}, {
"token" : "乳",
"start_offset" : 1,
"end_offset" : 2,
"type" : "word",
"position" : 2
}, {
"token" : "ステータス",
"start_offset" : 3,
"end_offset" : 8,
"type" : "word",
"position" : 4
}, {
"token" : "希少",
"start_offset" : 10,
"end_offset" : 12,
"type" : "word",
"position" : 6
}, {
"token" : "価値",
"start_offset" : 12,
"end_offset" : 14,
"type" : "word",
"position" : 7
} ]
}
Oh, this time it worked.
It apparently doesn't know the word 貧乳, though, and splits it into 貧 and 乳. Ignorant thing.
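The pretty-printed response is long, and only the token strings matter when checking how a sentence was split. A minimal sketch for pulling them out with grep and sed follows. The function name extract_tokens is hypothetical, and it assumes the '"token" : "..."' layout shown above rather than parsing JSON properly.

```shell
# Hypothetical helper: print one token per line from pretty-printed _analyze output.
# Assumes the '"token" : "..."' layout shown above; this is not a general JSON parser.
extract_tokens() {
  grep -o '"token" : "[^"]*"' | sed 's/^"token" : "\(.*\)"$/\1/'
}

# Usage sketch (requires a running Elasticsearch with the sandbox index):
#   curl -s -XGET 'http://localhost:9200/sandbox/_analyze?analyzer=kuromoji&pretty' \
#     -d '貧乳はステータスだ!希少価値だ!' | extract_tokens
```

For the response above this should list 貧, 乳, ステータス, 希少 and 価値, one per line.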
Integrating with MongoDB data
Registering the river settings
$ curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "test",
"collection": "pages"
},
"index": {
"name": "test",
"type": "pages"
}
}'
I think it is fine to treat the Elasticsearch index name as the MongoDB database and the index type as the MongoDB collection.
{"_index":"_river","_type":"mongodb","_id":"_meta","_version":1,"created":true}
However, running it resulted in an error.
[2014-06-12 09:18:38,610][WARN ][river ] [Wyatt Wingfoot] failed to create river [mongodb][mongodb]
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [mongodb]
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:87)
at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:58)
at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
at org.elasticsearch.river.RiversService.createRiver(RiversService.java:137)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:275)
at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:269)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:93)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ClassNotFoundException: mongodb
at java.net.URLClassLoader$1.run(URLClassLoader.java:359)
at java.net.URLClassLoader$1.run(URLClassLoader.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:347)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:73)
... 9 more
[2014-06-12 09:18:38,623][INFO ][cluster.metadata ] [Wyatt Wingfoot] [_river] update_mapping [mongodb] (dynamic)
A stack trace shows up on the server side as well.
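Stack traces like this are easy to lose among the INFO lines. Below is a small sketch for listing only the class-loading failures in a log file. The function name find_class_errors is hypothetical, and the log path in the usage comment is an assumption that depends on how Elasticsearch was installed.

```shell
# Hypothetical helper: list class-loading failures found in an Elasticsearch log file.
# $1 = path to the log file
find_class_errors() {
  grep -E 'NoClassSettingsException|NoClassDefFoundError|ClassNotFoundException' "$1"
}

# Usage sketch (the path is an assumption; adjust it to your installation):
#   find_class_errors /usr/local/var/log/elasticsearch/elasticsearch.log
```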
An issue that caught my eye here:
MongoDB 2.6 support · Issue #265 · richardwilly98/elasticsearch-river-mongodb · GitHub
Apparently MongoDB 2.6 is not supported yet.
A version problem again...
Downgrading MongoDB
So, I will downgrade MongoDB.
$ mkdir backup_2014-06-12
$ mongodump --out backup_2014-06-12
...
Back up the current data,
$ pkill mongod
$ brew uninstall mongo
and remove the current MongoDB.
$ brew versions mongo
Warning: brew-versions is unsupported and may be removed soon.
Please use the homebrew-versions tap instead:
https://github.com/Homebrew/homebrew-versions
2.6.1 git checkout db4fb7f /usr/local/Library/Formula/mongodb.rb
2.6.0 git checkout 1ccbeaa /usr/local/Library/Formula/mongodb.rb
2.4.10 git checkout 10e8328 /usr/local/Library/Formula/mongodb.rb
...
Try the latest release in the 2.4 series.
$ cd /usr/local
$ git checkout 10e8328 /usr/local/Library/Formula/mongodb.rb
$ brew install mongodb
...
$ mongo --version
MongoDB shell version: 2.4.10
Installed.
$ launchctl load ~/Library/LaunchAgents/homebrew.mxcl.mongodb.plist
Start it,
$ cd ~/work
$ mongorestore --drop backup_2014-06-12
...
and the data is restored as well.
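The dump and the restore both refer to the backup directory by name, so a mistyped name on the restore side is easy to make. Here is a minimal sketch that builds the name once and reuses it. The function name backup_dir_name is hypothetical; mongodump --out and mongorestore --drop are the same options used above.

```shell
# Hypothetical helper: build the backup directory name for a date (default: today).
# $1 = optional date in YYYY-MM-DD form
backup_dir_name() {
  printf 'backup_%s\n' "${1:-$(date +%Y-%m-%d)}"
}

# Usage sketch (run both commands from the same working directory):
#   dir=$(backup_dir_name)
#   mongodump --out "$dir"
#   ... downgrade MongoDB ...
#   mongorestore --drop "$dir"
```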
Downgrading Elasticsearch
I thought this would do it, but when I started Elasticsearch it failed to start with the following error.
[2014-06-12 17:44:44,434][INFO ][plugins ] [Ultragirl] loaded [mongodb-river, marvel, analysis-kuromoji], sites [marvel, head, river-mongodb]
{1.2.1}: Initialization Failed ...
- ExecutionError[java.lang.NoClassDefFoundError: org/elasticsearch/rest/XContentRestResponse]
NoClassDefFoundError[org/elasticsearch/rest/XContentRestResponse]
ClassNotFoundException[org.elasticsearch.rest.XContentRestResponse]
This is the same as the first error I got last time.
So, reluctantly, I downgraded Elasticsearch as well.
$ brew uninstall elasticsearch
Uninstalling /usr/local/Cellar/elasticsearch/1.2.1...
Remove it once,
$ brew versions elasticsearch
Warning: brew-versions is unsupported and may be removed soon.
Please use the homebrew-versions tap instead:
https://github.com/Homebrew/homebrew-versions
1.2.1 git checkout be34fbb Library/Formula/elasticsearch.rb
1.2.0 git checkout 53d3a63 Library/Formula/elasticsearch.rb
1.1.1 git checkout e81e0c2 Library/Formula/elasticsearch.rb
1.1.0 git checkout c7f653b Library/Formula/elasticsearch.rb
1.0.1 git checkout 9b8103f Library/Formula/elasticsearch.rb
1.0.0 git checkout 1fb5dda Library/Formula/elasticsearch.rb
...
check the list of versions,
$ cd /usr/local
$ git checkout 1fb5dda Library/Formula/elasticsearch.rb
check out 1.0.0,
$ brew install elasticsearch
==> Downloading https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.0.0.tar.gz
######################################################################## 100.0%
==> Caveats
Data: /usr/local/var/elasticsearch/elasticsearch_hentai-kun/
Logs: /usr/local/var/log/elasticsearch/elasticsearch_hentai-kun.log
Plugins: /usr/local/var/lib/elasticsearch/plugins/
To have launchd start elasticsearch at login:
ln -sfv /usr/local/opt/elasticsearch/*.plist ~/Library/LaunchAgents
Then to load elasticsearch now:
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.elasticsearch.plist
Or, if you don't want/need launchctl, you can just run:
elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml
==> Summary
🍺 /usr/local/Cellar/elasticsearch/1.0.0: 31 files, 20M, built in 36 seconds
Installed.
$ vi /usr/local/Cellar/elasticsearch/1.0.0/config/elasticsearch.yml
cluster.name: elasticsearch_sandbox
Change the cluster name,
$ plugin --install elasticsearch/elasticsearch-analysis-kuromoji/2.0.0
$ plugin --install elasticsearch/elasticsearch-mapper-attachments/2.0.0
$ plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.0
and reinstall the plugins.
$ elasticsearch
[2014-06-12 17:56:49,798][INFO ][node ] [Namor the Sub-Mariner] version[1.0.0], pid[31622], build[a46900e/2014-02-12T16:18:34Z]
[2014-06-12 17:56:49,799][INFO ][node ] [Namor the Sub-Mariner] initializing ...
[2014-06-12 17:56:49,839][INFO ][plugins ] [Namor the Sub-Mariner] loaded [mongodb-river, analysis-kuromoji, mapper-attachments], sites [river-mongodb]
[2014-06-12 17:56:52,616][INFO ][node ] [Namor the Sub-Mariner] initialized
[2014-06-12 17:56:52,617][INFO ][node ] [Namor the Sub-Mariner] starting ...
[2014-06-12 17:56:52,791][INFO ][transport ] [Namor the Sub-Mariner] bound_address {inet[/127.0.0.1:9300]}, publish_address {inet[/127.0.0.1:9300]}
[2014-06-12 17:56:55,833][INFO ][cluster.service ] [Namor the Sub-Mariner] new_master [Namor the Sub-Mariner][Egx-oSyVQZy3f0Ump0RcRg][Air.local][inet[/127.0.0.1:9300]], reason: zen-disco-join (elected_as_master)
[2014-06-12 17:56:55,860][INFO ][discovery ] [Namor the Sub-Mariner] elasticsearch_sandbox/Egx-oSyVQZy3f0Ump0RcRg
[2014-06-12 17:56:55,881][INFO ][http ] [Namor the Sub-Mariner] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[/127.0.0.1:9200]}
[2014-06-12 17:56:55,905][INFO ][gateway ] [Namor the Sub-Mariner] recovered [0] indices into cluster_state
[2014-06-12 17:56:55,906][INFO ][node ] [Namor the Sub-Mariner] started
Then check that it works.
$ curl -XPUT 'http://localhost:9200/sandbox'
{"acknowledged":true}
$ curl -XGET 'http://localhost:9200/sandbox/_analyze?analyzer=kuromoji&pretty' -d '貧乳はステータスだ!希少価値だ!'
{
"tokens" : [ {
"token" : "貧",
"start_offset" : 0,
"end_offset" : 1,
"type" : "word",
"position" : 1
}, {
"token" : "乳",
"start_offset" : 1,
"end_offset" : 2,
"type" : "word",
"position" : 2
}, {
"token" : "ステータス",
"start_offset" : 3,
"end_offset" : 8,
"type" : "word",
"position" : 4
}, {
"token" : "希少",
"start_offset" : 10,
"end_offset" : 12,
"type" : "word",
"position" : 6
}, {
"token" : "価値",
"start_offset" : 12,
"end_offset" : 14,
"type" : "word",
"position" : 7
} ]
}
kuromoji works too.
MongoDB integration again
Now, on to the MongoDB integration...
$ curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "test",
"collection": "pages"
},
"index": {
"name": "test",
"type": "pages"
}
}'
{"_index":"_river","_type":"mongodb","_id":"_meta","_version":1,"created":true}
Finally, it worked!
[2014-06-12 18:02:28,144][INFO ][cluster.metadata ] [Jack Frost] [_river] creating index, cause [auto(index api)], shards [1]/[1], mappings []
[2014-06-12 18:02:28,338][INFO ][cluster.metadata ] [Jack Frost] [_river] update_mapping [mongodb] (dynamic)
[2014-06-12 18:02:29,371][INFO ][river.mongodb ] Parse river settings for mongodb
[2014-06-12 18:02:29,397][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] Starting river mongodb
[2014-06-12 18:02:29,409][INFO ][cluster.metadata ] [Jack Frost] [_river] update_mapping [mongodb] (dynamic)
[2014-06-12 18:02:29,410][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] MongoDB River Plugin - version[2.0.0] - hash[a0c23f1] - time[2014-02-23T20:40:05Z]
[2014-06-12 18:02:29,410][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] starting mongodb stream. options: secondaryreadpreference [false], drop_collection [false], include_collection [], throttlesize [5000], gridfs [false], filter [null], db [test], collection [pages], script [null], indexing to [test]/[pages]
[2014-06-12 18:02:29,484][INFO ][cluster.metadata ] [Jack Frost] [test] creating index, cause [api], shards [5]/[1], mappings []
[2014-06-12 18:02:30,165][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] MongoDB version - 2.4.10
[2014-06-12 18:02:30,185][INFO ][cluster.metadata ] [Jack Frost] [_river] update_mapping [mongodb] (dynamic)
[2014-06-12 18:02:30,228][INFO ][org.elasticsearch.river.mongodb.Slurper] MongoDBRiver is beginning initial import of test.pages
[2014-06-12 18:02:32,016][INFO ][org.elasticsearch.river.mongodb.Slurper] Collection pages - count: 1825569
[2014-06-12 18:02:32,142][INFO ][cluster.metadata ] [Jack Frost] [test] update_mapping [pages] (dynamic)
[2014-06-12 18:02:32,320][INFO ][cluster.metadata ] [Jack Frost] [test] update_mapping [pages] (dynamic)
[2014-06-12 18:02:34,609][INFO ][cluster.metadata ] [Jack Frost] [test] update_mapping [pages] (dynamic)
The server side started working too, and the CPU got serious.
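The river registration above repeats the database and collection names as the index name and type. A small sketch that generates the _meta body from those two names follows. The function name river_meta is hypothetical, and it inserts the names verbatim, so they must not contain quotes or backslashes.

```shell
# Hypothetical helper: print the river _meta body for one MongoDB collection.
# Uses the mapping from this post: database -> index name, collection -> index type.
# $1 = MongoDB database, $2 = MongoDB collection (inserted verbatim, no escaping)
river_meta() {
  printf '{"type":"mongodb","mongodb":{"db":"%s","collection":"%s"},"index":{"name":"%s","type":"%s"}}\n' \
    "$1" "$2" "$1" "$2"
}

# Usage sketch (requires a running Elasticsearch with the river plugin):
#   curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d "$(river_meta test pages)"
```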
Changing the settings to use kuromoji by default
$ vi /usr/local/Cellar/elasticsearch/1.0.0/config/elasticsearch.yml
index.analysis.analyzer.default.type: custom
index.analysis.analyzer.default.tokenizer: kuromoji_tokenizer
Add these two lines.
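The two lines above set the default for the whole node. The same analyzer definition can also be given per index when the index is created. The sketch below only prints such a settings body. The function name default_kuromoji_settings and the index name in the usage comment are hypothetical, and I have not verified it against the exact versions in this post.

```shell
# Hypothetical helper: print index settings that make kuromoji_tokenizer the default analyzer.
# Mirrors the two elasticsearch.yml lines above, scoped to a single index.
default_kuromoji_settings() {
  cat <<'EOF'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "kuromoji_tokenizer"
        }
      }
    }
  }
}
EOF
}

# Usage sketch (index name "sandbox2" is made up; the index must not exist yet):
#   curl -XPUT 'http://localhost:9200/sandbox2' -d "$(default_kuromoji_settings)"
```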
$ curl -XDELETE 'http://localhost:9200/_river/mongodb/'
$ curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "test",
"collection": "pages"
},
"index": {
"name": "test",
"type": "pages"
}
}'
(I don't know whether this is necessary, but) I deleted the river registration once and recreated it.
Trying a search
$ echo 貧乳 | nkf -WwMQ | tr = %
%E8%B2%A7%E4%B9%B3
$ curl -XGET 'http://localhost:9200/test/_search?q=title:%E8%B2%A7%E4%B9%B3&pretty'
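(Omitted, because the results are full of NSFW words)
This searches the entries from the MongoDB pages collection whose title contains 貧乳.
The results look plausible. And it's fast!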
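The percent-encoding step above depends on nkf, which may not be installed everywhere. A minimal sketch doing the same job with od, tr and sed follows. The function name urlencode is hypothetical, and it encodes every byte, including plain ASCII, which is valid but more verbose than necessary.

```shell
# Hypothetical helper: percent-encode every byte of the argument.
# $1 = string to encode (encoded as the bytes the shell passes in, e.g. UTF-8)
urlencode() {
  printf '%s' "$1" \
    | od -An -tx1 \
    | tr -d ' \n' \
    | tr 'a-f' 'A-F' \
    | sed 's/../%&/g'
}

# Usage sketch (requires a running Elasticsearch with the test index):
#   curl -XGET "http://localhost:9200/test/_search?q=title:$(urlencode '貧乳')&pretty"
```

In a UTF-8 terminal, urlencode '貧乳' gives %E8%B2%A7%E4%B9%B3, the same string as the nkf pipeline above.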
Trying Elasticsearch 1.1.1
While I was at it, I checked whether it also works on 1.1.1.
$ git checkout e81e0c2 Library/Formula/elasticsearch.rb
$ brew uninstall elasticsearch
$ brew install elasticsearch
$ plugin -install elasticsearch/elasticsearch-analysis-kuromoji/2.0.0
-> Installing elasticsearch/elasticsearch-analysis-kuromoji/2.0.0...
Failed to install elasticsearch/elasticsearch-analysis-kuromoji/2.0.0, reason: plugin directory /usr/local/var/lib/elasticsearch/plugins/analysis-kuromoji already exists. To update the plugin, uninstall it first using -remove elasticsearch/elasticsearch-analysis-kuromoji/2.0.0 command
Is the plugin already installed?
$ plugin --list
Installed plugins:
- analysis-kuromoji
- mapper-attachments
- river-mongodb
It is indeed there.
Apparently the plugin install location differs by version:
- 1.2.1 /usr/local/Cellar/elasticsearch/1.2.1/plugins/
- 1.1.1 /usr/local/var/lib/elasticsearch/plugins/
- 1.0.0 /usr/local/var/lib/elasticsearch/plugins/
It seems to have changed starting with the 1.2 series.
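Because 1.0.0 and 1.1.1 share one plugin directory, plugins installed under one version are still there after switching to the other, which is why the install above failed. A small sketch for checking before installing follows. The function name plugin_installed is hypothetical.

```shell
# Hypothetical helper: succeed if a plugin directory already exists.
# $1 = plugins directory, $2 = plugin directory name (e.g. analysis-kuromoji)
plugin_installed() {
  [ -d "$1/$2" ]
}

# Usage sketch, using the 1.0.0/1.1.1 location listed above:
#   if plugin_installed /usr/local/var/lib/elasticsearch/plugins analysis-kuromoji; then
#     echo "analysis-kuromoji is already present; remove it first to reinstall"
#   fi
```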
While I'm at it, I'll also install the officially distributed management plugin.
$ plugin --install elasticsearch/marvel/latest
-> Installing elasticsearch/marvel/latest...
Trying http://download.elasticsearch.org/elasticsearch/marvel/marvel-latest.zip...
Downloading ...(略)...DONE
Installed elasticsearch/marvel/latest into /usr/local/var/lib/elasticsearch/plugins/marvel
The management UI can be viewed at
http://localhost:9200/_plugin/marvel/
The screen looks cool.
$ vi /usr/local/Cellar/elasticsearch/1.1.1/config/elasticsearch.yml
cluster.name: elasticsearch_sandbox
...
index.analysis.analyzer.default.type: custom
index.analysis.analyzer.default.tokenizer: kuromoji_tokenizer
Then start it.
$ elasticsearch
It started.
$ curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
"type": "mongodb",
"mongodb": {
"db": "test",
"collection": "pages"
},
"index": {
"name": "test",
"type": "pages"
}
}'
The index for the MongoDB data was created too.
$ echo 貧乳 | nkf -WwMQ | tr = %
%E8%B2%A7%E4%B9%B3
$ curl -XGET 'http://localhost:9200/test/_search?q=title:%E8%B2%A7%E4%B9%B3&pretty'
(Again, omitted because the results are full of NSFW words)
The results showed up properly.
Conclusion
It looks like MongoDB River Plugin for ElasticSearch 2.0.0 can handle versions up to:
- Elasticsearch 1.1.1
- MongoDB 2.4.10
and nothing newer, it seems.
Update: this combination appears to work at first glance, but it does not run stably. Details are in a separate post.
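The upper limits observed above can be written down as a quick pre-flight check. This is only a sketch of what this post found for river plugin 2.0.0, not an official compatibility table, and passing it says nothing about stability. The function names version_le and river_supported are hypothetical, and the comparison assumes three-part numeric versions such as 1.1.1.

```shell
# Hypothetical helper: succeed if version $1 <= version $2.
# Assumes three-part numeric versions (MAJOR.MINOR.PATCH).
version_le() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
  [ "$lowest" = "$1" ]
}

# Hypothetical helper: check the limits observed in this post for river plugin 2.0.0.
# $1 = Elasticsearch version, $2 = MongoDB version
river_supported() {
  version_le "$1" 1.1.1 && version_le "$2" 2.4.10
}

# Usage sketch:
#   river_supported 1.2.1 2.6.1 || echo "newer than the combination that worked here"
```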
Using it from Node.js
I'm tired, so that's for next time.