Solr runs out of memory when importing MySQL data
When using Solr's DataImportHandler, index creation failed on a test machine because it had too little memory.
### Enable batchSize
Add batchSize="-1" to the dataSource element in data-config.xml. See http://wiki.apache.org/solr/DataImportHandlerFaq:
I’m using DataImportHandler with a MySQL database. My table is huge and DataImportHandler is going out of memory. Why does DataImportHandler bring everything to memory?
DataImportHandler is designed to stream rows one by one. It passes a fetch size value (default: 500) to Statement#setFetchSize, which some drivers do not honor. For MySQL, add the batchSize property to the dataSource configuration with value -1. This will pass Integer.MIN_VALUE to the driver as the fetch size and keep it from going out of memory for large tables.
It should look like:

```xml
<dataSource type="JdbcDataSource" name="ds-2" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:8889/mysqldatabase" batchSize="-1" user="root" password="root"/>
```
### Batch processing
Use LIMIT in the query attribute of data-config.xml to process the data in batches, for example:

```
query="select * from tb_content limit ${dataimporter.request.begin},50000 "
```
Here a begin request parameter is used to process 50,000 records per batch.
Then, in the request URL parameters, pass clean=true on the first request to clear the old index, and clean=false on subsequent requests to keep it. The request URL looks like this:
```
http://localhost:8080/solr/core1/dataimport?wt=json&commit=true&clean=true&command=full-import&begin=0
```
PHP script for batch indexing:
```php
<?php
```
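The PHP script above is not shown in full here. As a rough illustration of the same batching logic, here is a minimal sketch in Python that builds the sequence of full-import request URLs, using the host, core, and 50,000-row batch size from the examples above (adjust for your setup):

```python
# Sketch of the batch-indexing loop: one DataImportHandler request per
# 50,000-row slice, clean=true only on the first request.
BASE = "http://localhost:8080/solr/core1/dataimport"
BATCH = 50000

def batch_urls(total_rows, base=BASE, batch=BATCH):
    """Yield one full-import request URL per batch of rows.

    Only the first request passes clean=true (wiping the old index);
    later requests pass clean=false so earlier batches are kept.
    """
    for begin in range(0, total_rows, batch):
        clean = "true" if begin == 0 else "false"
        yield (f"{base}?wt=json&commit=true&clean={clean}"
               f"&command=full-import&begin={begin}")

# In a real script, each URL would be fetched in turn, e.g. with
# urllib.request.urlopen(url), waiting for each import to complete
# before starting the next batch.
```

Each generated URL corresponds to one request like the example shown earlier, with begin advancing by the batch size each time.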