Showing posts with label SOLR Search. Show all posts

Sunday 30 December 2018

Indexing Process in Solr

This post describes how the indexing process in Solr works.

Solr offers several ways to index data, such as:

  • Indexing using the post.jar
  • Indexing using the dataImport handlers.
  • Indexing by executing the curl commands.

We will concentrate on the first two approaches.

Indexing using the post.jar

As I mentioned in previous posts, Solr ships with example documents (exampledocs) that we can use to get started.

Navigate to C:\Dev\solr-7.5.0\example\exampledocs

This folder contains sample XML and JSON files that we can use to index data. The same folder also contains post.jar, which processes these documents and indexes them.

C:\Dev\solr-7.5.0\bin>java -jar -Dc=example -Dauto C:\Dev\solr-7.5.0\example\exampledocs\post.jar C:\MicroservicesPOC\solr-7.5.0\solr-7.5.0\example\exampledocs\ .*

Where -Dc is the name of the core.

-Dauto tells post.jar to detect the type of each file from its extension. The path that follows it is the location where post.jar resides.

post.jar reads the documents given to it and indexes them into the specified core. The documents have to follow the format that post.jar expects, otherwise the indexing will not happen.
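
Under the hood, post.jar sends the documents to the update handler of the core over HTTP. The following is a minimal sketch of that idea, not the post.jar implementation: it assumes a local Solr on the default port, the core named example from above, and made-up document fields (id, name). The class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UpdatePostSketch {

    // Builds the update URL of a core; commit=true makes the documents searchable right away.
    static String updateUrl(String solrBase, String core) {
        return solrBase + "/" + core + "/update?commit=true";
    }

    // Sends a JSON array of documents to the update handler and returns the HTTP status code.
    static int post(String url, String jsonDocs) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = con.getOutputStream()) {
            out.write(jsonDocs.getBytes(StandardCharsets.UTF_8));
        }
        return con.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Assumed values: local Solr, core "example", hypothetical document fields.
        String url = updateUrl("http://localhost:8983/solr", "example");
        String docs = "[{\"id\":\"demo-1\",\"name\":\"Sample document\"}]";
        System.out.println("POST " + url);
        // Only contact Solr when asked to, so the sketch also runs without a server.
        if (args.length > 0 && args[0].equals("send")) {
            System.out.println("HTTP status: " + post(url, docs));
        }
    }
}
```

Running it without arguments only prints the target URL; passing send as the first argument performs the request against a running Solr.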

Indexing using the dataImport handlers

For the second way of indexing, see my detailed post on the DataImport handler here.

Happy Indexing!!!

Monday 24 December 2018

Core Creation in Solr

Before doing anything in Solr, we have to create a core. A core is a running instance of a Lucene index together with all the Solr configuration files needed to use it. We need a core to perform operations such as indexing and analyzing. A Solr application may contain one or more cores.

It is similar to creating an Endeca app. A core can be created in two ways.


Creating through Command

Navigate to C:\Dev\solr-7.5.0\bin and execute the following command: C:\Dev\solr-7.5.0\bin>solr.cmd create -c example

This creates the core in Solr.

WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
         To turn off: bin\solr config -c example -p 8983 -action set-user-property -property update.autoCreateFields -value false
INFO  - 2018-12-25 12:03:30.613; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop

Created a new core 'example'

Creating through Solr Admin UI

Navigate to Admin UI > Core Admin > Add Core

Fill in the following popup:

name:<name of the core>

instanceDir:<Directory where solrconfig.xml is available> If solrconfig.xml has not been created yet, we can use the one from the default configset that ships with Solr.

C:\Dev\solr-7.5.0\server\solr\example

Copy the conf directory from

C:\Dev\solr-7.5.0\server\solr\configsets\_default

to

C:\Dev\solr-7.5.0\server\solr\example\conf

Give the Instance Directory as C:\Dev\solr-7.5.0\server\solr\example

dataDir:<Directory where the index files are stored>

C:\Dev\solr-7.5.0\server\solr\example\data

Leave the remaining config and schema fields as they are.

It is important to understand these two folders:

conf> Where the Solr configuration files are stored.

data> Where the indexed data files are stored, in a non-readable format.
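
The fields of the Add Core popup correspond to parameters of Solr's CoreAdmin API (action=CREATE with name, instanceDir and dataDir). As a rough sketch, assuming Solr runs locally on the default port and the conf directory has already been copied into the instance directory, the equivalent request URL could be built like this. The class and method names are only illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class CoreCreateUrlSketch {

    // URL-encodes a single parameter value.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds the CoreAdmin CREATE request for the given core.
    static String createCoreUrl(String solrBase, String name, String instanceDir, String dataDir) {
        return solrBase + "/admin/cores?action=CREATE"
                + "&name=" + encode(name)
                + "&instanceDir=" + encode(instanceDir)
                + "&dataDir=" + encode(dataDir);
    }

    public static void main(String[] args) {
        // The directories are the ones used in this post; adjust them to your installation.
        System.out.println(createCoreUrl(
                "http://localhost:8983/solr",
                "example",
                "C:\\Dev\\solr-7.5.0\\server\\solr\\example",
                "C:\\Dev\\solr-7.5.0\\server\\solr\\example\\data"));
    }
}
```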

Happy Coring !!!!

Understanding Solr and Admin Console

We have seen how to download and install Solr in our previous posts. Now it is time to understand it further.
After unzipping the archive, observe the folder structure.

Folder Structure

C:\Dev\solr-7.5.0\




bin> Contains the command files used to start and stop Solr.

contrib> Contains add-on plugins for specialized features of Solr.

dist> Contains the main Solr .jar files.

docs> Contains a link to the online documentation.

example> Contains the example documents, which can be used for learning and getting started.

licenses> Contains the licenses of all the third-party libraries used by Solr.

server> This is the core of Solr; the official documentation describes it as the heart of the application. It contains the following:

server>solr-webapp> -->Solr’s Admin UI

server>lib> -->Jetty libraries

server>logs> -->Log files

server>solr>configsets> --> Sample configsets


Solr Admin UI.

Solr has a built-in admin UI that can be accessed on port 8983:

http://127.0.0.1:8983/solr/ or http://localhost:8983/solr/ 





It provides the core selector, logging, schema selection, query execution, memory stats and more. For developers coming from Endeca, it is similar to the jspref (orange) application in Endeca, with more features.

Happy Structuring !!!

Installing Solr

The installation of Solr is very simple. Compared with the other search platforms I have worked with, this is the simplest one to install.

Download the zip file from the official Solr site. It is good practice to move it to a development folder before proceeding, instead of leaving it in the Downloads folder.

Prerequisites:

Make sure your Java version is compatible with the version of Solr you download.

Make sure your JAVA_HOME and PATH variables are set.

Follow the steps below for the installation.

1. Unzip the zip file that we downloaded. Usually the file name has the format solr-7.X.X.zip

Once you have unzipped it, congratulations, you are done with your installation.

The folder structure is explained in a separate post.

Starting the Solr.

Normal Mode

Assume Solr is in the directory C:\Dev\solr-7.5.0.

Navigate to C:\Dev\solr-7.5.0\bin, open the command prompt in this location

and execute the following command: C:\Dev\solr-7.5.0\bin>solr.cmd start

Solr starts and prints the following logs on the prompt.

INFO  - 2018-12-25 10:39:32.458; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
Waiting up to 30 to see Solr running on port 8983
Started Solr server on port 8983. Happy searching!

Debug Mode

If you want to run Solr in debug mode, execute the command below in the same location.

C:\Dev\solr-7.5.0\bin>solr.cmd start -a "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=18983"

This starts Solr in debug mode, with the debugger listening on port 18983. If you need a detailed explanation, check my post here.

By default Solr runs on port 8983. If that port is already occupied, either stop the process using that port or start Solr on a different port.
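
Whether the port is occupied can also be checked from code before starting Solr. The following is a small sketch of one possible check, assuming that "free" simply means a server socket can be bound to the port on this machine. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortCheckSketch {

    // A port counts as free when a server socket can be bound to it.
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Binding failed, typically because another process already listens on the port.
            return false;
        }
    }

    public static void main(String[] args) {
        int port = 8983; // default Solr port
        System.out.println("Port " + port + (isPortFree(port) ? " is free" : " is already in use"));
    }
}
```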

Stopping the Process running on the port

1. Identifying the process running on the port (Windows)

netstat -ano | findstr :8983 lists the connections on port 8983 together with the ID (PID) of the process that owns them.

2. Killing the process running on the port (Windows)

taskkill /PID <PID_NO> /F

C:\Dev\solr-7.5.0\bin>netstat -ano | findstr :8983
  TCP    0.0.0.0:8983           0.0.0.0:0              LISTENING       18596
  TCP    [::]:8983              [::]:0                 LISTENING       18596

C:\Dev\solr-7.5.0\bin>taskkill /PID 18596 /F
SUCCESS: The process with PID 18596 has been terminated.

Starting the Solr in a different port.

C:\Dev\solr-7.5.0\bin>solr.cmd start -p 8990

This starts Solr on port 8990 instead of the default port.


Stopping the solr.


C:\Dev\solr-7.5.0\bin>solr.cmd stop -all

This stops all the running Solr instances, whatever port they are running on.

Happy Installation !!!!!

Monday 7 August 2017

Customizing “q” Parameter in SOLR

Hi All,
         Today we are going to see how to customize the search parameter in Solr. Almost all of our projects have a requirement to customize this parameter. Follow the instructions below to change it easily.

package com.mycommercesearch.solr;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.handler.component.SearchHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

// Search handler that accepts the custom "search" parameter in place of "q".
public class SearchEndecaHanlder extends SearchHandler {

       // Name of the custom parameter sent by the client.
       public static final String SEARCH = "search";

       // Name of the standard Solr query parameter.
       public static final String QUERY_PARAM = "q";

       @Override
       public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
              String queryParam = req.getParams().get(SEARCH);
              // Rewrite the parameters only when "search" was actually sent,
              // so that a request using plain "q" still works.
              if (queryParam != null) {
                     ModifiableSolrParams params = new ModifiableSolrParams(req.getParams());
                     params.remove(SEARCH);
                     params.set(QUERY_PARAM, queryParam);
                     req.setParams(params);
              }
              super.handleRequestBody(req, rsp);
       }
}

In this scenario I used "search" instead of "q", and it worked. Create a JAR file from the above class and copy it to <SOLR_INSTALLED_DIR>solr-6.6.0\server\solr-webapp\webapp\WEB-INF\lib\
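
The renaming step itself can be tried without a running Solr. Below is a minimal sketch that uses a plain Map as a stand-in for the Solr request parameters (an assumption made only to keep the example self-contained). The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ParamRenameSketch {

    // Returns a copy of the parameters in which "from" is renamed to "to".
    // When "from" is absent, the copy is returned unchanged.
    static Map<String, String> rename(Map<String, String> params, String from, String to) {
        Map<String, String> copy = new LinkedHashMap<>(params);
        String value = copy.remove(from);
        if (value != null) {
            copy.put(to, value);
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> request = new LinkedHashMap<>();
        request.put("search", "yacht");
        request.put("rows", "10");
        // prints {rows=10, q=yacht}
        System.out.println(rename(request, "search", "q"));
    }
}
```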

Registering the Custom Handler in Solr

Open solrconfig.xml and add the entry below.

<requestHandler name="/mysearch" class="com.mycommercesearch.solr.SearchEndecaHanlder">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <!-- <str name="df">text</str> -->
    </lst>
  </requestHandler>

After restarting Solr, access the handler with the URL below to get the results.



Happy Searching !!!!

Monday 31 July 2017

Defining multiple entity In Solr

When implementing search for a site, the data to be processed usually does not come from a single table with the same fields. For information on how to index data from a database, see my previous blog here. This post deals only with data from multiple data sources or from different tables.

Open db-data-config.xml and edit it. I am going to set up the customer data for search here.

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
<dataSource name="ds1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/classicmodels" user="root" password="root"/>
<dataSource name="ds2" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/customerdata" user="root" password="root"/>

This is where the different data sources are defined. Here I have configured two data sources, one called ds1 and another called ds2. Other kinds of sources, such as HSQL or XML, can also be defined for processing.

<document>
   <entity name="products" dataSource="ds1" pk="id" query="select * from products" deltaImportQuery="select * from products"
   deltaQuery="select * from products where last_modified > '${dataimporter.last_index_time}'">
     <field column="productCode" name="id"/>
     <field column="productName" name="name"/>
     <field column="productDescription" name="description"/>
                 <field column="productLine" name="category"/>
    </entity>    
    
In the products entity the unique key is id, and the data source to be used is specified here as well.

   <entity name="customers" dataSource="ds2"  pk="customerNumber" query="select * from customers" deltaImportQuery="select * from customers"
   deltaQuery="select * from customers where last_modified > '${dataimporter.last_index_time}'">
     <field column="customerNumber" name="id"/>
     <field column="customerName" name="customerName"/>
     <field column="contactLastName" name="contactLastName"/>
                 <field column="contactFirstName" name="contactFirstName"/>      
                 <field column="phone" name="phone"/>
                 <field column="addressLine1" name="addressLine1"/>
                 <field column="addressLine2" name="addressLine2"/>
                 <field column="city" name="city"/>
                 <field column="state" name="state"/>
                 <field column="postalCode" name="postalCode"/>
                 <field column="country" name="country"/>
     <field column="salesRepEmployeeNumber" name="salesRepEmployeeNumber"/>
                 <field column="creditLimit" name="creditLimit"/>
  </entity>
</document>
</dataConfig>


If you are introducing a new entity, it must have a field called id, which is used to identify each record uniquely.

Querying for products


Querying for customers


df (default field) names the field that Solr searches when the query does not specify one. Here refrence is used for products and customer is used for customers.


Errors:

2017-04-22 10:24:17.256 WARN  (Thread-14) [   x:refrence] o.a.s.h.d.SolrWriter Error creating document : SolrInputDocument(fields: [category=Ships, id=S72_3212, name=Pont Yacht, description=Measures 38 inches Long x 33 3/4 inches High. Includes a stand.
Many extras including rigging, long boats, pilot house, anchors, etc. Comes with 2 masts, all square-rigged, _version_=1565373662235721728])
org.apache.solr.common.SolrException: [doc=S72_3212] missing required field: city


When you face this error, either set required="false" on the field in managed-schema or remove the required attribute altogether, as shown below:

<field name="city" type="string" indexed="true" stored="true" required="false" multiValued="false" />

<field name="city" type="string" indexed="true" stored="true" multiValued="false" />


If you face the error below:

Solr Error Document is missing mandatory uniqueKey field id


It means your document does not have the id property, which is defined as <uniqueKey>id</uniqueKey> in managed-schema.


Happy learning !!!!

Faceting in Solr

Faceting can be defined as grouping search results by the values of their fields, so that users can narrow down their search. Solr comes with a simple implementation of it.




Parameter: Explanation
facet: If set to true, facets are enabled for the current search.
facet.field: The field for which facets are returned. It can be specified several times to facet on several fields.
facet.prefix: Returns only the facet values that start with the given prefix.
facet.contains: Returns only the facet values that contain the given term.
facet.sort: Controls the order of the facet values, either by count or by index order.
facet.limit: Limits the number of facet values returned.
facet.offset: Returns the facet values starting from the given offset.
facet.mincount: Returns only the facet values whose count is at least the given number.
facet.missing: Additionally returns the count of the documents that match the query but have no value in the facet field.
facet.method: The algorithm to be used for faceting.
facet.range: The field on which range faceting is performed.
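
To show how these parameters fit together in a request, here is a small sketch that builds a facet query URL. It assumes a local core named example and a field called category (borrowed from the products entity of the previous post). The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FacetUrlSketch {

    // URL-encodes a single parameter value.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds a select URL that asks for facet counts on the given fields.
    static String facetUrl(String coreUrl, String query, int minCount, String... facetFields) {
        StringBuilder url = new StringBuilder(coreUrl)
                .append("/select?q=").append(encode(query))
                .append("&facet=true")
                .append("&facet.mincount=").append(minCount);
        // facet.field may be repeated, once per field to facet on.
        for (String field : facetFields) {
            url.append("&facet.field=").append(encode(field));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/example/select?q=*%3A*&facet=true&facet.mincount=1&facet.field=category
        System.out.println(facetUrl("http://localhost:8983/solr/example", "*:*", 1, "category"));
    }
}
```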

Standard Query Parser Parameters

q Parameter

Whatever we ask Solr to search for has to be passed in the q parameter. When Solr sees this parameter, it returns the search results matching it.

Usage:


Specifying Terms for the Standard Query Parser

A query to the standard query parser is broken up into terms and operators.
There are two types of terms: single terms and phrases (a group of words surrounded by double quotes).

Wildcard Searches

A wildcard search looks for all the possible matches of a pattern, and the same concept applies in Solr. It can be defined in two ways.

1) For a single character within a word (the ? wildcard), we can define it in the following way.

Usage:

When we define it this way, Solr also searches for words like texting and testing.


2) For all the possibilities (the * wildcard)

Usage:

When we define it like this, Solr searches for all the possible continuations of the given word, like testing, tested etc.

Fuzzy Searches

Solr supports fuzzy searches with the tilde (~) operator, which match terms that are similar to the searched term.

Usage:

When we define the term as above, Solr searches for words like beat, feat etc. We can also specify the edit distance, and Solr will then match only the terms within that distance.

Usage:

It will match only feat and beat, but not foat.
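
The edit distance is the number of single-character changes needed to turn one word into another. The sketch below computes the plain Levenshtein distance for illustration. Solr's fuzzy matching is based on the Damerau-Levenshtein distance, which additionally counts swapping two adjacent characters as one edit, so this is not Solr's own code. The searched term beat is an assumption made for the example.

```java
class EditDistanceSketch {

    // Plain Levenshtein distance: minimum number of insertions,
    // deletions and substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("beat", "feat")); // prints 1
        System.out.println(distance("beat", "foat")); // prints 2
    }
}
```

With an edit distance of 1, a search for the assumed term beat would therefore accept feat but reject foat.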

Proximity Searches

A proximity search matches only when the two words occur within the specified distance of each other.

Usage:

This will return results, since the words are within the specified distance.

Whereas


This will not return results, since there is a distance of 5 words between them.

Range Searches

A range search returns only the results whose values fall within the specified range.

Usage:

Boosting a Term with ^

We can use ^ to boost a term, which makes the documents containing it more relevant.

Usage:

Boolean Operators in Solr

AND (&&)

NOT (!)

OR (||)

+

-




Happy Learning !!!!!

Common Query Parameters in Solr

This tutorial deals with Solr's common query parameters, which are supported by the search request handlers.

After the data setup, it is time to learn about the querying part of Solr. Solr comes with simple parameters for querying.

Parameter: Explanation
start: The offset from which Solr starts displaying the results. The default value is 0. Setting the start parameter to some other number, such as 4, causes Solr to skip over the preceding records and start at the document identified by the offset.
rows: The number of documents to be returned in the result set.
fq (Filter Query): Restricts the results to the documents that also match this filter, without affecting the score. The fq parameter can be used several times in one request, and several clauses can also be combined in a single fq.
fl (Field List): Limits the fields returned for each document to the ones listed. Only the mentioned fields will be returned, so it is good practice to set it when the indexed documents have many fields.
debug: Returns debug information about the query.
explainOther: Takes a query such as id:S10_4698. Besides the debug information, Solr explains how the documents matching this value score against the main query, so that they can be compared.
wt: Selects the Response Writer that Solr should use to format the query's response.
omitHeader: May be set to either true or false. If set to true, the header is excluded from the returned results. The header contains information about the request, such as the time it took to complete. The default value is false.
logParamsList: By default, Solr logs all parameters of requests. From version 4.7, set this parameter to restrict which parameters of a request are logged. For example, when it is set to q,fq only the 'q' and 'fq' parameters will be logged. This may help limit logging to the parameters considered important to your organization.
echoParams: Controls what information about the request parameters is included in the response header.
sort: Arranges the search results in either ascending (asc) or descending (desc) order.
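
As an illustration of how these parameters combine, the sketch below builds a paginated select URL. It assumes a local core named example and the field names id, name and category from the earlier posts. The class and method names are hypothetical. The start offset is derived from a 1-based page number as (page - 1) * rows.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryUrlSketch {

    // URL-encodes a single parameter value.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // start is the zero-based offset of the first document on the requested page.
    static int startFor(int page, int rows) {
        return (page - 1) * rows;
    }

    // Builds a select URL from the common query parameters.
    static String selectUrl(String coreUrl, String q, int page, int rows,
                            String fq, String fl, String sort) {
        return coreUrl + "/select?q=" + encode(q)
                + "&start=" + startFor(page, rows)
                + "&rows=" + rows
                + "&fq=" + encode(fq)
                + "&fl=" + encode(fl)
                + "&sort=" + encode(sort)
                + "&wt=json";
    }

    public static void main(String[] args) {
        // Third page of 10 results: start is 20.
        System.out.println(selectUrl("http://localhost:8983/solr/example",
                "name:yacht", 3, 10, "category:Ships", "id,name", "id asc"));
    }
}
```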