Apache Solr Interview Questions
Q1.
What do you understand by the term Apache Lucene?
Apache Solr is a standalone full-text search platform that can search across multiple websites and index documents using XML and HTTP. Built on a Java library called Lucene, Solr supports a rich schema specification that offers flexibility in dealing with a wide range of document fields. It also provides an extensive search plugin API for developing custom search behavior.
Supported by the Apache Software Foundation, Apache Lucene is a free, open-source, high-performance text search engine library, originally written in Java by Doug Cutting. Lucene provides full-featured searching, highlighting, indexing and spellchecking of text extracted from documents in various formats such as MS Office docs, HTML, PDF, plain text and others. Solr is built on top of Lucene.
Advantages | Disadvantages
Has a powerful query language | Learning the syntax takes a lot of time
Lets clients run precise searches for any query, whether simple or complex | Requires expert programmers who can write the code
Both Solr and Elasticsearch are popular open source search engines built on top of Lucene. Both have vibrant communities and are well documented. The difference is in the way each builds a wrapper and implements features on top of Lucene.
Q2.
Describe the term Request Handler.
A Request Handler is a plugin that handles incoming requests for a specific purpose. When a client runs a search in Solr, a request handler processes the query. SolrRequestHandler is the Solr plugin interface that defines the logic to be executed for any request.
Q3.
List the different types of information that can be retrieved from a field type.
The different types of information that can be retrieved from a field type include the following:
· Name of the field type
· Field type properties
· An implementation class name
· A description of the field analysis for the field type, in case the field type is TextField
Q4.
What do you understand by the term Field Analyzer?
When working with textual data in Solr, a Field Analyzer examines the text of a field and produces a token stream. This analysis of input text is performed both at index time and at query time. Many Solr applications use custom analyzers defined by users. However, it is essential to keep in mind that every analyzer has exactly one tokenizer.
Field analyzers are used both during ingestion, when a document is indexed, and at query time. An analyzer examines the text of fields and generates a token stream. Analyzers may be a single class or they may be composed of a series of tokenizer and filter classes.
Tokenizers break field data into lexical units, or tokens.
Filters examine a stream of tokens and keep them, transform or discard them, or create new ones. Tokenizers and filters may be combined to form pipelines, or chains, where the output of one is input to the next. Such a sequence of tokenizers and filters is called an analyzer and the resulting output of an analyzer is used to match query results or build indices.
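The tokenizer-plus-filters chain described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Solr code: the function names (whitespace_tokenizer, lowercase_filter, stop_filter, make_analyzer) and the stop-word list are assumptions made up for this example.

```python
from typing import Callable, List

Tokens = List[str]

# Hypothetical stand-ins for a tokenizer and two filters; not Solr classes.
def whitespace_tokenizer(text: str) -> Tokens:
    """Break field text into tokens on runs of whitespace."""
    return text.split()

def lowercase_filter(tokens: Tokens) -> Tokens:
    """Transform every token to lower case."""
    return [t.lower() for t in tokens]

STOP_WORDS = frozenset({"a", "an", "the", "of"})  # illustrative list only

def stop_filter(tokens: Tokens) -> Tokens:
    """Discard tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def make_analyzer(tokenizer: Callable[[str], Tokens],
                  *filters: Callable[[Tokens], Tokens]) -> Callable[[str], Tokens]:
    """Combine exactly one tokenizer with zero or more filters into a chain."""
    def analyze(text: str) -> Tokens:
        tokens = tokenizer(text)
        for token_filter in filters:
            # The output of one stage is the input to the next.
            tokens = token_filter(tokens)
        return tokens
    return analyze

analyzer = make_analyzer(whitespace_tokenizer, lowercase_filter, stop_filter)
print(analyzer("The Art of Search"))  # ['art', 'search']
```

Running the same analyzer over both the indexed text and the query text is what lets a query for "search" match a document containing "Search".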
Q5.
List the various categories of highlighters.
Different categories of highlighters available in Apache Solr include the following:
Standard Highlighter: gives precise matches, even for advanced query parsers.
FastVector Highlighter: Though less advanced than the Standard Highlighter, it works better for more languages and supports Unicode break iterators.
Postings Highlighter: More precise, compact and efficient than the vector-based highlighter. However, it is unsuitable for a large number of query terms.
Q6.
What does the term Highlighting refer to?
Highlighting means that fragments of the documents matching the user's query are included in the query response. These fragments are returned in a dedicated section of the response, which clients use to present the snippets to users. Solr contains several highlighting utilities that offer control over different fields. The highlighting utilities can be called by request handlers and can be reused with the standard query parsers.
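As a rough illustration of what a highlighted fragment looks like, the sketch below wraps matching terms in marker tags. It is a simplification written for this answer, not one of Solr's highlighters; the function name highlight and the pre/post tag parameters are assumptions.

```python
import re
from typing import Iterable

def highlight(text: str, terms: Iterable[str],
              pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every whole-word, case-insensitive match of a query term in tags."""
    escaped = [re.escape(t) for t in terms]
    if not escaped:
        return text  # nothing to highlight
    pattern = re.compile(r"\b(" + "|".join(escaped) + r")\b", re.IGNORECASE)
    # Keep the original casing of the matched text inside the tags.
    return pattern.sub(lambda m: pre + m.group(0) + post, text)

print(highlight("Solr is built on Lucene", ["lucene"]))
# Solr is built on <em>Lucene</em>
```

A client would typically render such fragments next to each result so the user can see why the document matched.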
Q7.
How can one utilize Apache Solr for achieving maximum potential for
performance?
Solr can return fast search responses because, rather than searching the text directly, it searches an index. This is like finding the pages in a book that relate to a keyword by scanning the index at the back of the book, rather than reading every word on every page.
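The book-index analogy corresponds to an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal toy version under simple assumptions (whitespace tokenization, lower-casing, in-memory storage); the names build_index and search and the sample documents are invented for illustration and say nothing about how Lucene stores its index.

```python
from collections import defaultdict
from typing import Dict, List, Set

def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lower-cased term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index: Dict[str, Set[int]], term: str) -> List[int]:
    """Look the term up in the index instead of scanning every document."""
    return sorted(index.get(term.lower(), set()))

docs = {
    1: "Solr is built on Lucene",
    2: "Lucene is a Java library",
    3: "ZooKeeper coordinates the cluster",
}
index = build_index(docs)
print(search(index, "lucene"))  # [1, 2]
```

The cost of building the index is paid once at indexing time, so each later lookup is a dictionary access rather than a scan over all the text.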
Q8.
List and describe the various building blocks of Apache Solr.
The chief building blocks associated with Apache Solr include the following:
Request Handler: A request handler processes the requests sent to Solr, such as queries or index updates. Based on the requirement of the user, the most appropriate one can be picked from a variety of request handlers to do the job.
Search Component: A search component is a type of feature provided by Apache Solr search. These features might include spell checking, faceting, highlighting, etc. that might be particularly required by the user.
Query Parser: This building block of Apache Solr parses each query and checks it for syntactical errors. Once a query has been parsed, it is translated to a format that Lucene understands.
Response Writer: A response writer in Apache Solr generates the formatted output for each query placed by the user. Formats supported by Apache Solr include JSON, XML, CSV, and so on. Each type of response has a different response writer assigned to it.
Analyzer/Tokenizer: Lucene recognizes data in the form of tokens. Apache Solr analyzes the content, divides it into tokens, and passes these tokens on to Lucene. The analyzer examines the text of fields and produces a token stream; the role of the tokenizer within the analyzer is to break the field text into tokens.
Update Request Processor: When an update request is sent to Apache Solr, the request is run through a set of plugins that are collectively known as update request processors.
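The idea that each response format has its own response writer can be sketched as a lookup from format name to a writer function. This is a hedged illustration only: write_json, write_csv, WRITERS and write_response are hypothetical names, and the output layout is not Solr's actual response structure.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Doc = Dict[str, str]

def write_json(docs: List[Doc]) -> str:
    """Serialize the result documents as a JSON object."""
    return json.dumps({"docs": docs})

def write_csv(docs: List[Doc]) -> str:
    """Serialize the result documents as CSV with a header row."""
    if not docs:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(docs[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(docs)
    return buf.getvalue()

# One writer per response format, selected by name.
WRITERS: Dict[str, Callable[[List[Doc]], str]] = {
    "json": write_json,
    "csv": write_csv,
}

def write_response(docs: List[Doc], wt: str = "json") -> str:
    """Pick the writer registered for the requested format and apply it."""
    if wt not in WRITERS:
        raise ValueError("no response writer registered for format: " + wt)
    return WRITERS[wt](docs)

results = [{"id": "1", "title": "Solr"}, {"id": "2", "title": "Lucene"}]
print(write_response(results, "csv"))
```

Keeping the writers in a registry means a new format can be supported by adding one entry, without touching the code that runs the query.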
Q9.
List the different types of Fields that are used in Apache Solr.
The different types of fields used in Apache Solr include the following:
· date
· double
· float
· long
· text
Q10.
What do you infer by the term Dynamic Fields with respect to Apache Solr?
When a user has neglected to define some necessary field, dynamic fields are the ideal choice to consider. One can create multiple dynamic fields, and they are highly flexible for indexing fields that are not explicitly defined in the schema.
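Dynamic fields are declared with wildcard names such as `*_s`, and a field that is not explicitly defined picks up the type of the pattern it matches. The sketch below illustrates that lookup under simplifying assumptions: the field names, the type names, and the first-match-wins rule are chosen for the example and are not taken from a real schema or from Solr's own matching rules.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

# Illustrative schema contents; these names are assumptions for the example.
EXPLICIT_FIELDS: Dict[str, str] = {"id": "string", "title": "text_general"}
DYNAMIC_FIELDS: List[Tuple[str, str]] = [
    ("*_s", "string"),
    ("*_i", "pint"),
    ("*_dt", "pdate"),
]

def resolve_field_type(name: str) -> Optional[str]:
    """Return the type for a field name, or None if nothing matches."""
    # Explicitly defined fields take precedence over dynamic patterns.
    if name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[name]
    # Simplification: the first matching wildcard pattern wins.
    for pattern, field_type in DYNAMIC_FIELDS:
        if fnmatchcase(name, pattern):
            return field_type
    return None

print(resolve_field_type("author_s"))  # string
print(resolve_field_type("price_i"))   # pint
```

With such patterns in place, a document can carry a new field like author_s without any change to the schema.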
Q11.
Explain the term SolrCloud.
Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability. This is called SolrCloud. These capabilities provide distributed indexing and search, along with the following features:
· Central configuration for the entire cluster
· Automatic load balancing and failover for queries
· ZooKeeper integration for cluster coordination and configuration.
In other words, SolrCloud is flexible distributed search and indexing, without a master node to allocate nodes, shards, and replicas. Instead, Solr uses ZooKeeper to manage these locations, depending on configuration files and schemas. Documents can be sent to any server and ZooKeeper will figure it out.
Q12.
List the various categories of query parameters used in Apache Solr.
The various categories of query parameters used in Apache Solr include the following:
fl: specifies the list of fields to be returned for each document in the result
fq: represents a set of filter queries that restrict the set of documents that can be returned, without affecting their scores
rows: represents the number of documents to be returned per page; the default number is 10
start: represents the initial offset for a particular page; the default number is 0
sort: specifies the comma-separated list of fields by which the results of the query are to be sorted
q: this is the main query parameter of Apache Solr; documents are scored by their similarity to the terms in this parameter
wt: represents the response format in which the user wants to see the result
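The parameters listed above are normally sent together in one request URL. The sketch below only builds such a URL and does not contact a server. The host, port and the collection name "articles" in BASE_URL are assumptions for illustration, as are the field names used in the example call; build_select_url is a hypothetical helper, not part of any Solr client library.

```python
from typing import List, Optional, Sequence, Tuple
from urllib.parse import urlencode

# Hypothetical endpoint: a local Solr with a collection named "articles".
BASE_URL = "http://localhost:8983/solr/articles/select"

def build_select_url(q: str,
                     fq: Sequence[str] = (),
                     fl: Optional[Sequence[str]] = None,
                     rows: int = 10,
                     start: int = 0,
                     sort: Optional[str] = None,
                     wt: str = "json") -> str:
    """Assemble a query URL from the common query parameters."""
    params: List[Tuple[str, object]] = [("q", q)]
    # fq may be repeated, once per filter query.
    params += [("fq", f) for f in fq]
    if fl:
        params.append(("fl", ",".join(fl)))
    params += [("rows", rows), ("start", start)]
    if sort:
        params.append(("sort", sort))
    params.append(("wt", wt))
    return BASE_URL + "?" + urlencode(params)

url = build_select_url("title:solr", fq=["year:2020"], fl=["id", "title"],
                       rows=5, start=10, sort="year desc")
print(url)
```

Here rows=5 and start=10 together ask for the third page of five results, which is how paging is usually expressed with these two parameters.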
Q13.
List the various configuration files used by Apache Solr.
The various configuration files used by Apache Solr include the following:
solr.xml - This file is in the $SOLR_HOME directory and contains SolrCloud-related information.
schema.xml - It contains the entire schema.
solrconfig.xml - It contains the definitions and core-specific configurations related to request handling and response formatting.
core.properties - This file contains the configurations specific to the core.
Q14.
What do you understand by the term Apache Solr core?
An Apache Solr core is a running instance of a Lucene index together with all the Solr configuration files it needs. A Solr core must be created to perform operations like indexing and analyzing. A Solr application may contain one or multiple cores. If required, the cores in a Solr application can communicate with one another.
Q15.
Features of Apache Solr
1. Allows scalable, high-performance indexing and near real-time indexing
2. Standards-based open interfaces such as XML, HTTP, and JSON
3. Flexible and adaptable faceting
4. Advanced and accurate full-text search
5. Linearly scalable, with auto index replication, auto failover, and recovery
6. Allows concurrent searching and updating.
7. Comprehensive HTML administration interfaces
8. Provides cross-platform solutions that are index-compatible
Q16.
Pros of Apache Solr
1. Easy access to resources: Whether you are dealing with a setup issue or trying to learn some of the more advanced features, there are plenty of resources to help you get up and running.
2. Excellent performance: Apache Solr allows for a lot of custom tuning (if necessary) and gives great out-of-the-box performance for searching large data sets.
3. Maintenance: After setting up Solr in a production environment, there are plenty of tools provided to help you maintain and update your application. Apache Solr comes with great fault tolerance built in and has proven to be very reliable.
Q17.
Cons of Apache Solr
1. Indexing data can sometimes be slow, which means it can take a long time to get a large collection up and running if you have many fields that need to be indexed.
Q18. What is SolrJ?
Q19. Can you compare the features of Apache Solr vs Elasticsearch?