Recently I walked you through the process of deploying the enterprise-grade search platform, Apache Solr. With this tool, you can take massive amounts of data and run powerful search queries against them with hit-highlighting, real-time indexing, dynamic clustering and more.
Once you have Apache Solr deployed, you’re going to need to be able to add your data to a collection so it can be searched. Here, we’ll import a CSV list of data (which can be of any size) into a new collection, and then run a query against the new data.
What you’ll need
To follow along with this, you’ll need a running instance of Apache Solr (with the Solr user credentials) and a CSV data file. I’ll create a sample CSV data file that you can use as a template.
How to create a CSV file for import
The first thing you’ll need to do is log into the server hosting Apache Solr, either via SSH or a local login. Once logged in, create the new file with the command:
nano ~/solrdata.csv
You can name this file whatever you like and house it in any directory. Create a top row that names each column. I’m going to demonstrate with a CSV file defining countries; its top line defines several items (such as country-code, region and sub-region) and looks like this:
name,alpha-2,alpha-3,country-code,iso_3166-2,region,sub-region,intermediate-region,region-code,sub-region-code,intermediate-region-code
The remainder of the file contains entries like this:
Afghanistan,AF,AFG,004,ISO 3166-2:AF,Asia,Southern Asia,"",142,034,""
Åland Islands,AX,ALA,248,ISO 3166-2:AX,Europe,Northern Europe,"",150,154,""
Albania,AL,ALB,008,ISO 3166-2:AL,Europe,Southern Europe,"",150,039,""
Algeria,DZ,DZA,012,ISO 3166-2:DZ,Africa,Northern Africa,"",002,015,""
American Samoa,AS,ASM,016,ISO 3166-2:AS,Oceania,Polynesia,"",009,061,""
Andorra,AD,AND,020,ISO 3166-2:AD,Europe,Southern Europe,"",150,039,""
Angola,AO,AGO,024,ISO 3166-2:AO,Africa,Sub-Saharan Africa,Middle Africa,002,202,017
You can download the entire sample country.csv file with the command:
wget https://cdn.wsform.com/wp-content/uploads/2018/09/country.csv
Run that command on the machine hosting Apache Solr so the file is saved to its local drive.
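Before importing, it can help to confirm that every row has the same number of columns as the header. Here is a minimal sketch of such a check, not part of Solr itself. The function name check_csv_columns and the temporary sample file are my own assumptions for illustration. The check splits on every comma, so a quoted field that contains a comma will be flagged and needs a manual look.

```shell
#!/usr/bin/env bash
# Naive column-count check (hypothetical helper, not a Solr tool).
# It splits on every comma, so quoted fields containing commas are
# reported as mismatches even though they may be valid CSV.
check_csv_columns() {
  local file="$1"
  awk -F',' '
    NR == 1 { expected = NF; next }   # the header row sets the expected count
    NF != expected {                   # report any row that differs
      printf "line %d: %d fields (expected %d)\n", NR, NF, expected
      bad++
    }
    END { exit (bad > 0) }             # non-zero exit status if any row differed
  ' "$file"
}

# Sample file built from a few of the rows shown above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
name,alpha-2,alpha-3,country-code,iso_3166-2,region,sub-region,intermediate-region,region-code,sub-region-code,intermediate-region-code
Afghanistan,AF,AFG,004,ISO 3166-2:AF,Asia,Southern Asia,"",142,034,""
Albania,AL,ALB,008,ISO 3166-2:AL,Europe,Southern Europe,"",150,039,""
Angola,AO,AGO,024,ISO 3166-2:AO,Africa,Sub-Saharan Africa,Middle Africa,002,202,017
EOF

if check_csv_columns "$sample"; then
  echo "all rows match the header"
fi
rm -f "$sample"
```

To check your own file, call the function with its path, for example check_csv_columns ~/solrdata.csv.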
How to create a new collection
Let’s now create a new collection to house our country data. We’ll call this collection “country_data” and create it with the command:
su - solr -c "/opt/solr/bin/solr create -c country_data -n data_driven_schema_configs"
You’ll be prompted for the Solr user password. Once you successfully authenticate, the collection will be created, and you’re ready to move on.
How to import the data
Change into the directory housing Solr with the command:
cd /opt/solr
We can then import the data with the command:
./bin/post -c country_data /path/to/country.csv
Where /path/to is the exact path to the directory housing the newly downloaded country.csv file.
You should see output similar to this:
Posting files to [base] url http://localhost:8983/solr/country_data/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file country.csv (text/csv) to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/country_data/update...
Time spent: 0:00:02.674
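The output above shows that the post tool simply sends the file to the collection’s update URL as text/csv. If you’d rather do the same thing by hand, the following is a rough sketch using curl. It assumes Solr is listening on localhost:8983 as in the output above, and the helper names solr_update_url and solr_post_csv are hypothetical, not part of Solr.

```shell
#!/usr/bin/env bash
# Assumption: Solr is reachable at this base URL (taken from the output above).
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Print the update URL for a collection. commit=true asks Solr to commit
# once the request finishes, so the new documents become searchable.
solr_update_url() {
  printf '%s/%s/update?commit=true\n' "$SOLR_BASE" "$1"
}

# Upload a CSV file to a collection (hypothetical helper).
solr_post_csv() {
  local collection="$1" csv_file="$2"
  curl --fail --silent --show-error \
    -H 'Content-Type: text/csv' \
    --data-binary "@${csv_file}" \
    "$(solr_update_url "$collection")"
}

# Show the URL that would be used. The upload itself is left commented out
# so the snippet can be run without a live Solr instance.
solr_update_url country_data
# solr_post_csv country_data /path/to/country.csv
```

Uncomment the last line (and adjust /path/to) to perform the actual upload.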
How to view the new data
Log in to the Apache Solr web interface by pointing a browser to http://SERVER:8983 (where SERVER is the IP address of the hosting server). Select country_data from the collection drop-down in the left navigation. In the resulting window (Figure A), click on Query.
Figure A
In the resulting window, click Execute Query without changing anything and the imported entries will be listed (Figure B). With the default settings, Solr returns only the first 10 matches; raise the rows value to see more.
Figure B
Let’s say you want to search for Ireland. Type “Ireland” in the q section (under common) and hit Execute Query. The result will only list the entry for, you guessed it, Ireland (Figure C).
Figure C
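The same search can be run from the command line against the collection’s select handler. This is a minimal sketch, assuming Solr answers on localhost:8983 and that the collection is named country_data as above. The helper names solr_select_url and solr_query are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: Solr is reachable at this base URL.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# Print the select (query) URL for a collection.
solr_select_url() {
  printf '%s/%s/select\n' "$SOLR_BASE" "$1"
}

# Run a query (hypothetical helper). -G sends the parameters as a GET
# query string, and --data-urlencode escapes spaces and punctuation.
solr_query() {
  local collection="$1" query="$2" rows="${3:-10}"
  curl --fail --silent --show-error -G \
    --data-urlencode "q=${query}" \
    --data-urlencode "rows=${rows}" \
    "$(solr_select_url "$collection")"
}

# Show the URL that would be queried. The queries themselves are commented
# out so the snippet can be run without a live Solr instance.
solr_select_url country_data
# solr_query country_data 'Ireland'
# solr_query country_data '*:*' 5
```

The optional third argument sets how many results come back, which mirrors the rows field in the web interface.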
An even easier way to import CSV data
There’s an even easier way to import CSV data into Apache Solr.
Let’s say you’ve created a new collection, called datacollection, and you want to import the country.csv file from the web-based interface. Log into Apache Solr, select datacollection from the drop-down, and then click Documents in the left navigation. In the resulting window, select CSV from the Document Type drop-down and then copy/paste the entire contents of the country.csv file into the Documents section (Figure D).
Figure D
Click Submit Document and you should eventually see (in the right pane) the following output:
Status: success
Response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 3533
  }
}
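A status of 0 in the responseHeader is what tells you the import worked. If you ever capture this response in a script, a quick check might look like the sketch below. The function name solr_status is hypothetical, and it uses a plain-text match rather than a real JSON parser, so treat it as a starting point only.

```shell
#!/usr/bin/env bash
# Print the first numeric status found in a Solr JSON response read from stdin.
# This is a text match, not a JSON parser (hypothetical helper).
solr_status() {
  grep -Eo '"status"[[:space:]]*:[[:space:]]*[0-9]+' | head -n 1 | grep -Eo '[0-9]+$'
}

# The response shown above, used here as sample input.
response='{
  "responseHeader": {
    "status": 0,
    "QTime": 3533
  }
}'

if [ "$(printf '%s\n' "$response" | solr_status)" = "0" ]; then
  echo "import succeeded"
else
  echo "import failed or status not found"
fi
```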
You should now be able to query your imported data in the same way you did earlier.
And that’s all there is to importing CSV-formatted data into Apache Solr. This is a very powerful tool that makes searching massive collections of data very simple. If your business relies on data, this might be one of the many tools you need.
Subscribe to TechRepublic’s How To Make Tech Work on YouTube for all the latest tech advice for business pros from Jack Wallen.