I recently returned from Solr training by LucidWorks. The training was excellent. I had been using Solr for a couple of months, experimenting with various queries to improve recall on my particular data. The clarification of the various caches was very valuable. But enough of that; this post is about setting up SolrCloud on Windows.
This tutorial uses:
- Solr 4.1.0
- Java 1.7.0_13
After installing Java I added this to the system Path:
C:\Program Files\Java\jdk1.7.0_13;C:\Program Files\Java\jre7\bin
I also added this environment variable:
with the value:
Of course those values need to reflect where you installed Java.
Check your Java install by opening a command prompt in some directory other than where you installed Java and running "java -version". You should see output like the following:
java version "1.7.0_13"
Java(TM) SE Runtime Environment (build 1.7.0_13-b20)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
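If you want to script this check, here is a minimal Python sketch. The function names are my own, not part of Java or Solr, and it assumes the version report looks like the output shown above.

```python
import re
import subprocess


def parse_java_version(output):
    """Extract the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version`; return the version string, or None if java is not on the Path."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # java -version writes its report to stderr rather than stdout.
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Java version found:", installed_java_version())
```

A result of None suggests the Path entry above is not being picked up by the command window.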
Solr
Download Solr 4.1.
I downloaded the zip file, solr-4.1.0.zip.
Install Solr
Unzip Solr.
Modify Contents of the Example Directory and Prepare for Shards and Replicas
Go into the directory that contains Solr. In my install the zip file created a directory structure like this:
I will refer to the above path as SOLRHOME.
Here is the listing of the directory:
02/27/2013  03:31 PM    <DIR>          .
02/27/2013  03:31 PM    <DIR>          ..
02/27/2013  03:28 PM           286,759 CHANGES.txt
02/27/2013  03:29 PM    <DIR>          contrib
02/27/2013  03:30 PM    <DIR>          dist
02/27/2013  03:31 PM    <DIR>          docs
02/27/2013  03:31 PM    <DIR>          example
02/27/2013  03:28 PM            12,872 LICENSE.txt
02/27/2013  03:31 PM    <DIR>          licenses
02/27/2013  03:28 PM            24,495 NOTICE.txt
02/27/2013  03:28 PM             5,464 README.txt
02/27/2013  03:28 PM               805 SYSTEM_REQUIREMENTS.txt
               5 File(s)        330,395 bytes
               7 Dir(s)  50,699,972,608 bytes free
Go to SOLRHOME\example\solr
Here I want to simulate setting up a custom collection. So, rename the directory "collection1" to "junk".
Still in the same directory, edit the "solr.xml" file. We are interested in the portion at the bottom that specifies the cores.
Here are the original contents of the core entry in the solr.xml file:
Change each instance of collection1 in the XML file to junk. The results are shown here:
Make sure the host port in the solr.xml file is set to the Jetty port.
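The directory rename and the solr.xml edit described above can also be scripted. This is only a sketch: the function name rename_core is my own, and it does a plain text replacement, which assumes the string "collection1" appears in solr.xml only as the core name.

```python
from pathlib import Path


def rename_core(solr_dir, old="collection1", new="junk"):
    """Rename the core directory and update every mention of it in solr.xml.

    solr_dir is the directory that holds solr.xml (SOLRHOME/example/solr in
    this post). Returns the number of occurrences replaced in solr.xml.
    """
    solr_dir = Path(solr_dir)
    old_dir = solr_dir / old
    new_dir = solr_dir / new
    # Only rename when the old directory exists and the new name is free.
    if old_dir.is_dir() and not new_dir.exists():
        old_dir.rename(new_dir)

    solr_xml = solr_dir / "solr.xml"
    text = solr_xml.read_text(encoding="utf-8")
    occurrences = text.count(old)
    if occurrences:
        solr_xml.write_text(text.replace(old, new), encoding="utf-8")
    return occurrences
```

I chose a text replacement over an XML parser so that comments and formatting in solr.xml stay untouched. Review the file afterwards, as you would after editing it by hand.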
In the SOLRHOME directory I duplicated the "example" directory. I want to set up two shards, each with a replica, so I need four directories in total. So I copied the example directory until I had four directories named example1, example2, example3, and example4. (By the way, I did this in Windows Explorer. You could do the same, or use a Command window; it doesn't matter.)
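If you would rather script the copying, here is a minimal sketch. The function name make_node_dirs is hypothetical, and it assumes the edited "example" directory sits directly under SOLRHOME. It leaves the original "example" directory in place and makes four copies.

```python
import shutil
from pathlib import Path


def make_node_dirs(solr_home, count=4, template="example"):
    """Copy <solr_home>/<template> to <template>1 .. <template><count>.

    Existing target directories are left alone so the script can be re-run.
    Returns the list of node directories, in order.
    """
    solr_home = Path(solr_home)
    source = solr_home / template
    if not source.is_dir():
        raise FileNotFoundError(f"template directory not found: {source}")

    node_dirs = []
    for i in range(1, count + 1):
        target = solr_home / f"{template}{i}"
        if not target.exists():
            shutil.copytree(source, target)
        node_dirs.append(target)
    return node_dirs
```

Make the copies after editing solr.xml and renaming the collection directory, so that every copy carries the same configuration.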
Start Solr
Go to SOLRHOME\example1.
From a command prompt, execute the following:
java -Dbootstrap_confdir=./solr/junk/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
The value of -Dcollection.configName is only the name under which the uploaded configuration is stored in ZooKeeper, so the command works the same with a different name, like this:
java -Dbootstrap_confdir=./solr/junk/conf -Dcollection.configName=solrconfig -DzkRun -DnumShards=2 -jar start.jar
Notice that this command specifies the number of shards. Here is something to remember: re-sharding means re-indexing. If you set up two shards, load data into them, and then decide you want three shards, at the time of writing this blog you have to re-index (re-import) all of your data.
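To see why changing the shard count forces a re-index, here is a simplified illustration. It is not Solr's routing code. It assumes a toy router that assigns each document to a shard by taking a hash of its id modulo the number of shards, and the document ids are made up.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Pick a shard from a stable hash of the document id (illustration only)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def moved_documents(doc_ids, old_shards, new_shards):
    """Return the ids whose shard changes when the shard count changes."""
    return [d for d in doc_ids
            if shard_for(d, old_shards) != shard_for(d, new_shards)]


doc_ids = [f"doc{i}" for i in range(1000)]
moved = moved_documents(doc_ids, 2, 3)
print(f"{len(moved)} of {len(doc_ids)} documents would land on a different shard")
```

With any scheme of this kind, a large share of the documents no longer live where the router expects them once the shard count changes, so the data has to be loaded again.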
The command also launches an instance of ZooKeeper because of the -DzkRun parameter. What is ZooKeeper? It is a coordination service that SolrCloud uses to manage the cluster. ZooKeeper comes with the Solr install (as does Jetty) and is launched for you as a convenience. In a production system you would not want ZooKeeper running on the same box as Solr, because that creates a single point of failure. ZooKeeper should also be run as an ensemble of at least three instances. You can look up ZooKeeper if you want more details. The command runs ZooKeeper at the Solr port + 1000. The default Solr port is 8983, therefore ZooKeeper is at 9983.
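The port arithmetic above, plus a quick check that something is actually listening on that port, can be sketched like this. Both function names are my own, and the check only tests that a TCP connection is accepted; it does not prove that the listener is ZooKeeper.

```python
import socket


def embedded_zk_port(solr_port=8983):
    """Port of the ZooKeeper started by -DzkRun: the Solr port plus 1000."""
    return solr_port + 1000


def is_listening(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    port = embedded_zk_port()
    state = "up" if is_listening("localhost", port) else "not reachable"
    print(f"ZooKeeper on localhost:{port} is {state}")
```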
Open a browser (I use Firefox; I have experienced problems with IE) and go to this URL:
You should see this:
The page will show you the graph of the cloud. Remember, our collection is named "junk" and we set up two shards and two replicas by making four "example" directories.
Starting the Second Shard
Starting the second shard is very simple.
- Launch another command window and go to SOLRHOME\example2.
- Run this command:
- java -Djetty.port=8984 -DzkHost=localhost:9983 -jar start.jar
The parameter -DzkHost specifies where ZooKeeper is running.
After running the command, the Solr Cloud page should show that shard2 is now running.
Starting the Replicas
Starting the replicas is like starting the second shard above: go into each remaining directory (example3 and example4) and run the same command, specifying a different port for each instance.
- Launch another command window and go to SOLRHOME\example3.
- Run this command:
- java -Djetty.port=8985 -DzkHost=localhost:9983 -jar start.jar
- Launch another command window and go to SOLRHOME\example4.
- Run this command:
- java -Djetty.port=8986 -DzkHost=localhost:9983 -jar start.jar
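The four launch commands follow a pattern: the first node uploads the configuration and runs ZooKeeper, and every later node gets its own Jetty port and points at that ZooKeeper. Here is a small sketch that generates them. The function name node_commands and its parameters are my own; the flags are the ones used in this post.

```python
def node_commands(node_count=4, base_port=8983, num_shards=2,
                  conf_dir="./solr/junk/conf", config_name="myconf"):
    """Return one java command line (as an argument list) per node."""
    zk_host = f"localhost:{base_port + 1000}"
    commands = []
    for i in range(node_count):
        if i == 0:
            # First node: upload the config, run the embedded ZooKeeper,
            # and fix the number of shards.
            args = ["java",
                    f"-Dbootstrap_confdir={conf_dir}",
                    f"-Dcollection.configName={config_name}",
                    "-DzkRun",
                    f"-DnumShards={num_shards}",
                    "-jar", "start.jar"]
        else:
            # Later nodes: a unique Jetty port plus the ZooKeeper address.
            args = ["java",
                    f"-Djetty.port={base_port + i}",
                    f"-DzkHost={zk_host}",
                    "-jar", "start.jar"]
        commands.append(args)
    return commands


for number, args in enumerate(node_commands(), start=1):
    print(f"example{number}: {' '.join(args)}")
```

Each command has to be run from its own directory (example1 to example4), in its own command window, with the first node started before the others.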
Miscellaneous
In the Solr Dashboard, if you select "Tree" under "Cloud", it shows you the information that ZooKeeper uses to configure the shards and replicas.
Mistakes I Made Trying to Figure this Out
Before I tried to set up my first SolrCloud configuration I had been running Solr for about two months. During that time I was experimenting with various schemas, field types, and Lucene queries. My problem set is one of "recall" based on "scoring".
Sometime during this experimentation I had altered many of the files, and I must have messed up the solr.xml file. I would follow the same steps described above, and no other shards would ever appear. Finally I just reinstalled Solr, and everything started working.
I suspect one culprit was that I had experimented with DistributedSearch, where you manually set up shards. In the solr.xml file you can specify the core information with shard details, and that may have been "floating" around somewhere.
Another mistake I made was forgetting to specify the ZooKeeper param when launching what I thought would be a new shard or replica. So, make sure you don't forget to tell Solr where ZooKeeper is with the -DzkHost param.
If things don't seem to be working, you can go to "example1", delete the zoo_data directory, and try launching things again.
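That cleanup can be scripted too. This is a sketch under assumptions: the function name remove_zoo_data is my own, and because I don't give the exact location of zoo_data above, it searches the whole node directory for any directory with that name.

```python
import shutil
from pathlib import Path


def remove_zoo_data(node_dir):
    """Delete every zoo_data directory found under node_dir; return their paths."""
    removed = []
    # sorted() collects the matches before anything is deleted.
    for path in sorted(Path(node_dir).rglob("zoo_data")):
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Stop all the Solr instances before running it. This throws away the ZooKeeper state, so the first node has to be started again with the bootstrap command.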