Tuesday, February 17, 2015

Configuring Cloudera Search to use local directory indexes

If you are using Cloudera Search, Solr collections created with solrctl commands are backed by HDFS (org.apache.solr.core.HdfsDirectoryFactory) by default. If you want certain collections to use local indexes instead, you can do the following:
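
For example, with the default setup you can usually see one directory per HDFS-backed collection under the Solr home in HDFS (commonly /solr in CDH; the exact path depends on your solr.hdfs.home setting):

hdfs dfs -ls /solr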

1. Log in to one of the SOLR nodes and generate a new instance directory locally using the solrctl command:


solrctl instancedir --generate collectionConfigDir

The above command will generate the folder 'collectionConfigDir' with the default configuration.
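
The generated instance directory contains a conf folder holding the configuration files, including the solrconfig.xml edited in the next step; the exact set of files varies by version, but a quick listing confirms the layout:

ls collectionConfigDir/conf
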
2. Open solrconfig.xml (under collectionConfigDir/conf) in an editor and make the following changes:

  • Comment out the <directoryFactory> section - the default generated config uses HdfsDirectoryFactory. Add a new <directoryFactory> section that uses solr.NRTCachingDirectoryFactory instead (an optional tuning variant is shown after the commented-out block below):


<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory">
</directoryFactory>

<!--  <directoryFactory name="DirectoryFactory" class="org.apache.solr.core.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
    <str name="solr.hdfs.confdir">${solr.hdfs.confdir:}</str>
    <str name="solr.hdfs.security.kerberos.enabled">${solr.hdfs.security.kerberos.enabled:false}</str>
    <str name="solr.hdfs.security.kerberos.keytabfile">${solr.hdfs.security.kerberos.keytabfile:}</str>
    <str name="solr.hdfs.security.kerberos.principal">${solr.hdfs.security.kerberos.principal:}</str>
    <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
    <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
    <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:true}</bool>
    <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
    <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int> 
    <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
    <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:true}</bool>
    <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
    <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
    <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
  </directoryFactory> 
-->
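
Optionally, NRTCachingDirectoryFactory accepts tuning parameters that control how much recently flushed or merged data is cached in RAM before being written to the local directory. This is only a sketch; the values below roughly match the usual Solr defaults, so set them only if you need to deviate:

<directoryFactory name="DirectoryFactory" class="solr.NRTCachingDirectoryFactory">
  <double name="maxMergeSizeMB">4</double>
  <double name="maxCachedMB">48</double>
</directoryFactory>
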
  • Point the Solr data directory at a local directory on the server. Make sure the directory exists, with write permissions for the user running Solr, on every node that will host a replica; see the commands after the snippet below.

<!-- <dataDir>${solr.data.dir:}</dataDir>-->
<dataDir>/solr</dataDir>
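
On each Solr node that will host a replica, the directory must exist and be writable by the user running Solr. As a rough sketch (assuming 'solr' is the service user, which is typical on a Cloudera install, and /solr is the path chosen above):

sudo mkdir -p /solr
sudo chown solr:solr /solr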

  • Change the lock type from 'hdfs' to 'simple'; this is controlled by the solr.lock.type property:

<!--<lockType>${solr.lock.type:hdfs}</lockType>-->
<lockType>simple</lockType>
3. Once these changes are done, the instance directory can be uploaded to ZooKeeper as a new configuration using the following command:

solrctl instancedir --create collectionConfig collectionConfigDir
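
To verify that the configuration was uploaded, the registered instance directories can be listed with:

solrctl instancedir --list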

4. Now the new collection can be created; it will use the local directory for storing its indexes.

solrctl collection --create newCollection -c collectionConfig -s 1 -r 2

The above command will create a collection (with 1 shard and 2 replicas) that uses the local directory (/solr) for storing its indexes.
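
To confirm that the collection is really using local storage, check the data directory on the nodes hosting the replicas and the Solr home in HDFS; index segment files should show up locally under /solr, and no new directory should appear in HDFS for this collection (paths assume the defaults used in this post):

ls -l /solr/index
hdfs dfs -ls /solr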
