Image: Looker_Studio/Adobe Stock

Recently I walked you through the process of deploying the enterprise-grade search platform, Apache Solr. With this tool, you can take massive amounts of data and run powerful search queries against it, with hit highlighting, real-time indexing, dynamic clustering and more.

Once you have Apache Solr deployed, you’re going to need to be able to add your data to a collection so it can be searched. Here, we’ll import a CSV list of data (which can be of any size) into a new collection, and then run a query against the new data.


What you’ll need

To follow along with this, you’ll need a running instance of Apache Solr (with the Solr user credentials) and a CSV data file. I’ll create a sample CSV data file that you can use as a template.

How to create a CSV file for import

The first thing you’ll need to do is log into the server hosting Apache Solr, either via SSH or a local login. Once logged in, create the new file with the command:

nano ~/solrdata.csv

You can name this file whatever you like and save it in any directory. Start with a header row that names each column. I'm going to demonstrate with a CSV file that defines countries, so the header row names several fields (such as country-code, region and sub-region) and looks like this:

name,alpha-2,alpha-3,country-code,iso_3166-2,region,sub-region,intermediate-region,region-code,sub-region-code,intermediate-region-code

The remainder of the file contains entries like this:

Afghanistan,AF,AFG,004,ISO 3166-2:AF,Asia,Southern Asia,"",142,034,""
Åland Islands,AX,ALA,248,ISO 3166-2:AX,Europe,Northern Europe,"",150,154,""
Albania,AL,ALB,008,ISO 3166-2:AL,Europe,Southern Europe,"",150,039,""
Algeria,DZ,DZA,012,ISO 3166-2:DZ,Africa,Northern Africa,"",002,015,""
American Samoa,AS,ASM,016,ISO 3166-2:AS,Oceania,Polynesia,"",009,061,""
Andorra,AD,AND,020,ISO 3166-2:AD,Europe,Southern Europe,"",150,039,""
Angola,AO,AGO,024,ISO 3166-2:AO,Africa,Sub-Saharan Africa,Middle Africa,002,202,017
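If you'd rather not type the rows into nano, a shell here-document can write the same header and sample rows in one step. This is a minimal sketch that reuses the ~/solrdata.csv path from above; adjust the path or the rows to suit your own data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the header row and the sample rows shown above to ~/solrdata.csv.
# Quoting 'EOF' stops the shell from expanding anything inside the here-document.
cat > ~/solrdata.csv <<'EOF'
name,alpha-2,alpha-3,country-code,iso_3166-2,region,sub-region,intermediate-region,region-code,sub-region-code,intermediate-region-code
Afghanistan,AF,AFG,004,ISO 3166-2:AF,Asia,Southern Asia,"",142,034,""
Åland Islands,AX,ALA,248,ISO 3166-2:AX,Europe,Northern Europe,"",150,154,""
Albania,AL,ALB,008,ISO 3166-2:AL,Europe,Southern Europe,"",150,039,""
Algeria,DZ,DZA,012,ISO 3166-2:DZ,Africa,Northern Africa,"",002,015,""
American Samoa,AS,ASM,016,ISO 3166-2:AS,Oceania,Polynesia,"",009,061,""
Andorra,AD,AND,020,ISO 3166-2:AD,Europe,Southern Europe,"",150,039,""
Angola,AO,AGO,024,ISO 3166-2:AO,Africa,Sub-Saharan Africa,Middle Africa,002,202,017
EOF

# Print the header row, then count the data rows (every line after the header).
head -n 1 ~/solrdata.csv
echo "data rows: $(tail -n +2 ~/solrdata.csv | wc -l)"
```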

You can download the entire sample country.csv file with the command:

wget https://cdn.wsform.com/wp-content/uploads/2018/09/country.csv

Run that command on the Apache Solr hosting machine so the file is saved to its local drive.

How to create a new collection

Let’s now create a new collection to house our country data. We’ll call this collection “country_data” and create it with the command:

su - solr -c "/opt/solr/bin/solr create -c country_data -n data_driven_schema_configs"

You’ll be prompted for the Solr user password. Once you successfully authenticate, the collection will be created, and you’re ready to move on.
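To confirm the new collection exists before importing anything, you can ask Solr's CoreAdmin API for its status. In a standalone (non-SolrCloud) install, which is what this sketch assumes, the collection created above is a Solr core. The solr_core_status function and the SOLR_URL variable are hypothetical names used only for illustration, and the default URL assumes Solr listens on localhost:8983. If you've enabled basic authentication, you'd also need to pass your credentials with curl's -u option.

```shell
#!/usr/bin/env bash

# Hypothetical helper: request the status of a single core from the CoreAdmin API.
# SOLR_URL is an assumed variable; it falls back to http://localhost:8983/solr when unset.
solr_core_status() {
  local core="$1"
  curl -s "${SOLR_URL:-http://localhost:8983/solr}/admin/cores?action=STATUS&core=${core}"
}
```

Running solr_core_status country_data should return JSON in which the status section contains a country_data entry with details such as its instanceDir.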

How to import the data

Change into the directory housing Solr with the command:

cd /opt/solr

We can then import the data with the command:

./bin/post -c country_data /path/to/country.csv

Here, /path/to is the full path to the directory that contains the newly downloaded country.csv file.
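If you'd rather not use the bin/post tool, or you want to load data from another machine, you can send the same CSV file straight to the collection's /update handler with curl. This is a minimal sketch assuming Solr listens on localhost:8983 without authentication; solr_post_csv and SOLR_URL are hypothetical names, and you'd point SOLR_URL at your server when running it remotely.

```shell
#!/usr/bin/env bash

# Hypothetical helper: POST a CSV file to a collection's /update handler.
# commit=true asks Solr to commit once the upload finishes, so the new
# documents become searchable right away.
solr_post_csv() {
  local collection="$1" csv_file="$2"
  curl -s "${SOLR_URL:-http://localhost:8983/solr}/${collection}/update?commit=true" \
    -H 'Content-Type: application/csv' \
    --data-binary "@${csv_file}"
}
```

Usage: solr_post_csv country_data /path/to/country.csv. The --data-binary option matters here: curl's plain -d option strips newlines from a file it reads, which would collapse every CSV row into a single line.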

You should see output similar to this:

Posting files to [base] url http://localhost:8983/solr/country_data/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file country.csv (text/csv) to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/country_data/update...
Time spent: 0:00:02.674

How to view the new data

Log in to the Apache Solr web interface by pointing a browser to http://SERVER:8983 (where SERVER is the IP address of the hosting server). Select country_data from the Core Selector drop-down in the left navigation. In the resulting window (Figure A), click Query.

Figure A

Image: Jack Wallen/TechRepublic. The country_data collection houses our imported data.

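In the resulting window, click Execute Query without changing anything. The default query (*:*) matches every record in the imported file (Figure B). By default, only the first 10 results are displayed; raise the rows value to see more.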

Figure B

Image: Jack Wallen/TechRepublic. Our entire country CSV file is now searchable.

Let’s say you want to search for Ireland. Type “Ireland” in the q field (under common) and click Execute Query. The result will only list the entry for, you guessed it, Ireland (Figure C).
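The same searches can be run outside the browser against the collection's /select handler, which is the request handler the Query screen uses by default. A minimal sketch, again assuming Solr on localhost:8983 with no authentication; solr_query and SOLR_URL are hypothetical names.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a query against a collection's /select handler.
# -G sends the --data-urlencode pair as a URL-encoded query string on a GET
# request, so queries containing spaces or special characters are escaped.
solr_query() {
  local collection="$1" query="$2"
  curl -s -G "${SOLR_URL:-http://localhost:8983/solr}/${collection}/select" \
    --data-urlencode "q=${query}"
}
```

solr_query country_data Ireland mirrors the search above, while solr_query country_data '*:*' matches every document, like the default query. Quote *:* so the shell doesn't try to expand it as a file pattern.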

Figure C

Image: Jack Wallen/TechRepublic. Ireland has been searched for and found.

An even easier way to import CSV data

There’s an even easier way to import CSV data into Apache Solr.

Let’s say you’ve created a new collection, called datacollection, and you want to import the country.csv file from the web-based interface. Log into Apache Solr, select datacollection from the drop-down, and then click Documents in the left navigation. In the resulting window, select CSV from the Document Type drop-down and then copy/paste the entire contents of the country.csv file into the Documents section (Figure D).

Figure D

Image: Jack Wallen/TechRepublic. Importing our CSV file from within the Apache Solr web-based interface.

Click Submit Document and you should eventually see (in the right pane) the following output:

Status: success
Response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 3533
  }
}

You should now be able to query your imported data in the same way you did earlier.

And that’s all there is to importing CSV-formatted data into Apache Solr. Solr is a powerful tool that makes searching massive collections of data simple. If your business relies on data, this might be one of the many tools you need.


