ElasticSearch interview questions

  • Shom Das
  • 05th May, 2021

Elasticsearch is a highly scalable, open-source full-text search and analytics engine developed in Java. It is widely adopted, and it is easy to start working with but hard to master in the long run. Elasticsearch lets you store, search, and analyze large volumes of data quickly and in near real time. For e-commerce websites, it can help improve conversion by offering a high level of flexibility, reliability, and relevance, along with features that support adoption, conversion, and retention.

1) What is Elasticsearch?

Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java.

2) What are advantages of Elasticsearch?

Major advantages of using Elasticsearch are:

  • Document-oriented
  • Scalability
  • Query fine-tuning
  • RESTful API
  • Multi-tenancy
  • Distributed approach
  • Supports Cloud
  • Faster data retrieval
  • Faceted search/analysis capability
  • Developer friendly.

3) Where is Elasticsearch used?

Elasticsearch is used for a lot of different use cases: "classical" full-text search, analytics store, auto-completer, spell checker, alerting engine, and as a general-purpose document store.

4) What are software requirements to install Elasticsearch?

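Elasticsearch is built using Java, so a compatible Java runtime is needed to install and run it. For the older releases this answer targets, that means:

  • The latest version of the Java 8 series
  • Java version 1.8.0_131 is recommended.

Elasticsearch 7.0 and later ship with a bundled JDK, so a separate Java installation is optional for those releases.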

5) How to start the Elasticsearch server?

Run the following commands in your terminal to start the Elasticsearch server:

cd elasticsearch
./bin/elasticsearch

To check whether the Elasticsearch server is running, use the command curl 'http://localhost:9200/?pretty'.

6) What are dedicated master nodes in ElasticSearch?

Dedicated master nodes are nodes that perform cluster management tasks but do not hold data or respond to data upload requests. Amazon Elasticsearch Service uses them to increase cluster stability: offloading cluster management tasks to these nodes increases the stability of your domain.

Dedicated master nodes can perform the following cluster management tasks:

  • Track all nodes in the cluster
  • Track the number of indices in the cluster
  • Track the number of shards belonging to each index
  • Maintain routing information for nodes in the cluster
  • Update the cluster state after state changes, such as creating an index and adding or removing nodes in the cluster
  • Replicate changes to the cluster state across all nodes in the cluster
  • Monitor the health of all cluster nodes by sending heartbeat signals, periodic signals that monitor the availability of the data nodes in the cluster

7) What is an index in Elasticsearch?

An index in Elasticsearch is a collection of documents and is often compared to a table in a relational database. One difference is that a relational table always stores the actual values, whereas storing them is optional in Elasticsearch.

An index is defined as:

  • An index is like a 'database' in a relational database. It has a mapping which defines multiple types (releases from 6.0 onward allow only one type per index).
  • An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.

8) What is a node in Elasticsearch?

A node in Elasticsearch is a single running instance of the Elasticsearch server. Every time you start an instance of Elasticsearch, you are starting a node.

A collection of one or more connected nodes is called a cluster. Every node in the cluster can handle HTTP and transport traffic by default. The transport layer is used exclusively for communication between nodes and the Java TransportClient; the HTTP layer is used only by external REST clients.

Here are steps to add a node to an ElasticSearch cluster.

  • Step 1: Set up a new Elasticsearch instance.
  • Step 2: Specify the name of the cluster in its cluster.name attribute. For example, to add a node to the logging-prod cluster, set cluster.name: "logging-prod" in elasticsearch.yml.
  • Step 3: Start Elasticsearch. The node automatically discovers and joins the specified cluster.

9) What is a Cluster in Elasticsearch?

A collection of one or more connected nodes is called a cluster. Every node in the cluster can handle HTTP and transport traffic by default. The transport layer is used exclusively for communication between nodes and the Java TransportClient; the HTTP layer is used only by external REST clients.

10) What is type in Elasticsearch?

Type in Elasticsearch represents a class of similar documents and has a name such as "customer" or "item". Types are a convenient way to store several types of data in the same index. The type name of each document is stored in a document metadata field called _type. Note that mapping types were deprecated in Elasticsearch 7.0 and removed in 8.0, so this answer applies to older releases.

11) What is Document in Elasticsearch?

A document is a JSON document that is stored in Elasticsearch. It is like a row in a table in a relational database. Each document is stored in an index and has a type and an id.

12) What are SHARDS in Elasticsearch?

A shard in Elasticsearch is an indivisible unit, in the sense that a shard can only reside on one machine (node). An index, which is a group of shards, can spread across multiple machines (Elasticsearch nodes), but a single shard cannot. So the ratio of your data size to the number of shards determines your cluster's scalability limits.
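
To make the relationship between documents and shards concrete, here is a minimal sketch of how a document ID could be mapped to exactly one primary shard: hash the routing value (the document ID by default) and take it modulo the number of primary shards. Elasticsearch itself uses a Murmur3 hash, so the zlib.crc32 stand-in and the function name pick_shard are assumptions made purely for illustration.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a primary shard number (illustrative only)."""
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Stand-in hash: Elasticsearch itself uses Murmur3, not CRC32.
    hashed = zlib.crc32(doc_id.encode("utf-8"))
    return hashed % num_primary_shards


# The same ID always maps to the same shard as long as the number of
# primary shards does not change.
for doc_id in ["1", "2", "42"]:
    print(doc_id, "->", pick_shard(doc_id, 5))
```

This is also why the number of primary shards cannot simply be changed on an existing index: a different shard count would send the same ID to a different shard.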

13) What are REPLICAS in Elasticsearch?

Replicas are copies of the shards and provide reliability if a node is lost. In ElasticSearch an index is broken into shards in order to distribute them and scale. There are two types of shards, the primary shard, and a copy, or replica.

14) What is a Tokenizer in ElasticSearch?

A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, a whitespace tokenizer breaks text into tokens whenever it sees any whitespace.

Elasticsearch uses a tokenization and analysis method to find relevant results based on your query.
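
The whitespace tokenizer described above can be sketched in a few lines of Python. This only mimics the behavior locally and is not how Elasticsearch implements it; the function name whitespace_tokenize is an assumption for this example.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Split text into tokens wherever whitespace occurs."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty strings.
    return text.split()


print(whitespace_tokenize("Quick brown\tfox  jumps"))
# ['Quick', 'brown', 'fox', 'jumps']
```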

15) How to create an index in ElasticSearch Cluster?

Create index API is used to add a new index to an Elasticsearch cluster. When creating an index, you can specify the following:

  • Settings for the index
  • Mappings for fields in the index
  • Index aliases

Syntax

PUT /index_name
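
As a rough illustration, the request body for the create index API can carry all three parts listed above. The sketch below only builds and prints such a body in Python; the index name, field names, and alias name are hypothetical, and the typeless mappings layout assumes Elasticsearch 7.x or later.

```python
import json

index_name = "my_index"  # hypothetical index name

body = {
    # Settings for the index
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
    # Mappings for fields in the index (hypothetical fields)
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "created": {"type": "date"},
        }
    },
    # Index aliases (hypothetical alias name)
    "aliases": {"my_alias": {}},
}

print(f"PUT /{index_name}")
print(json.dumps(body, indent=2))
```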

16) How to delete an index in Elasticsearch?

Delete index API is used to delete an index from the ElasticSearch cluster. You can also apply delete index API to more than one index, by either using a comma-separated list or on all indices (be careful!) by using _all or * as index.

Syntax

DELETE /index_name

17) How to list all indexes of a Cluster in ElasticSearch?

You can get a concise list of all indices in your cluster by calling

curl http://localhost:9200/_cat/indices

If you want it pretty-printed as JSON, add format=json and pretty=true:

curl 'http://localhost:9200/_cat/indices?format=json&pretty=true'

18) How to add a Mapping in an Index in ElasticSearch?

Use the Elasticsearch put mapping API to add new fields or mappings to an existing index.

A mapping type has:

  • Meta-fields
  • Fields or properties

Field types include text, keyword, date, long, double, boolean, ip, geo_point, geo_shape, and completion, as well as JSON object and nested types.

19) How does aggregation work in Elasticsearch?

Elasticsearch Aggregations provide you with the ability to group and perform calculations and statistics (such as sums and averages) on your data by using a simple search query. An aggregation can be viewed as a working unit that builds analytical information across a set of documents.
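
As a sketch of what this looks like in practice, the snippet below shows a request body for a terms aggregation with an avg sub-aggregation, followed by a plain-Python function that computes the same grouping locally so the result is easy to follow. The documents and the field names category and price are hypothetical, and the terms aggregation assumes category is mapped as a keyword field.

```python
from collections import defaultdict

# Hypothetical documents, used only to illustrate the calculation.
docs = [
    {"category": "books", "price": 10.0},
    {"category": "books", "price": 20.0},
    {"category": "games", "price": 60.0},
]

# Search request body: group by category, then average price per group.
agg_request = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "by_category": {
            "terms": {"field": "category"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}


def average_price_by_category(documents):
    """Local equivalent: bucket documents by category, average the prices."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc["category"]].append(doc["price"])
    return {key: sum(prices) / len(prices) for key, prices in buckets.items()}


print(average_price_by_category(docs))
# {'books': 15.0, 'games': 60.0}
```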

20) Where data is stored in Elasticsearch?

Where Elasticsearch stores its data depends on the operating system and on how it was installed:

  • If you've installed Elasticsearch on Linux, the default data storage directory is /var/lib/elasticsearch
  • If you're on CentOS or Ubuntu, the storage directory may instead be /var/lib/elasticsearch/data, depending on the version and package used
  • If you're on Windows, or if you've simply extracted Elasticsearch from the ZIP/TGZ file, you should have a data sub-folder in the extraction folder.

In all cases, the path.data setting in elasticsearch.yml determines the actual location.

21) How to check whether the Elasticsearch server is running?

Run the command below to check whether the Elasticsearch server is running:

service elasticsearch status

It should report a successful status. Otherwise, look into the log files, which you can find in the /var/log/elasticsearch/ directory.

22) What is an Analyzer in ElasticSearch?

An Analyzer in Elasticsearch is actually a component of the underlying engine that does search and analytics, named Apache Lucene. The Analyzer component is used to break down a text in multiple terms (a.k.a tokens) that are used to build the inverted index which exists inside Lucene for implementing full-text search.

Elasticsearch ships with a wide range of built-in analyzers, which can be used in any index without further configuration.

23) What is Ingest Node in Elasticsearch?

Ingest nodes are an integral part of the Elasticsearch cluster and have been available since the 5.0 release. You can use an ingest node to process documents prior to indexing. The ingest node intercepts bulk and index requests, applies transformations, and then passes the documents back to the index or bulk APIs.

24) What is Tribe Node in Elasticsearch?

A tribe node is a node that joins several clusters and gathers their cluster information into a single federated view, keeping it in sync. It allows separate clusters to act together, even if they are in different data centers, and is used to connect multiple clusters. Tribe nodes were deprecated in later releases in favor of cross-cluster search.

25) What is a quorum in Elasticsearch?

A quorum is a subset (a majority) of the master-eligible nodes in the cluster. Electing a master node and changing the cluster state are the two fundamental tasks that master-eligible nodes must work together to perform, and Elasticsearch requires a quorum to agree before either task succeeds.

26) What is fuzzy search in Elasticsearch?

Fuzzy search or Query in Elasticsearch is a powerful tool for a multitude of situations. It creates a set of all possible variations, or expansions, of the search term within a specified edit distance.
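
The notion of edit distance can be illustrated with a small sketch. The code below computes the plain Levenshtein distance (insertions, deletions, and substitutions) and filters a list of candidate terms by it. It is a local illustration, not Elasticsearch's implementation: by default the fuzzy query also counts a swap of two adjacent characters as a single edit, which this sketch does not. The function names and sample terms are assumptions.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term: str, candidates: List[str], max_edits: int = 2) -> List[str]:
    """Keep the candidates that lie within max_edits of the search term."""
    return [c for c in candidates if edit_distance(term, c) <= max_edits]


print(fuzzy_matches("serch", ["search", "stretch", "server"], max_edits=1))
# ['search']
```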

27) What query language does Elasticsearch use?

Elasticsearch uses Query DSL, a domain-specific language based on JSON, to define queries. Separately, Elasticsearch DSL is a high-level Python library whose aim is to help with writing and running queries against Elasticsearch. It provides a more convenient and idiomatic way to write and manipulate queries.
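
For a feel of what a JSON-based query looks like, here is a minimal sketch that builds a bool query as a Python dictionary and prints the JSON that would be sent as the search request body. The field names title and year and their values are hypothetical.

```python
import json

query = {
    "query": {
        "bool": {
            # Full-text condition that contributes to relevance scoring
            "must": [{"match": {"title": "search engine"}}],
            # Exact condition applied as a filter, without scoring
            "filter": [{"range": {"year": {"gte": 2015}}}],
        }
    }
}

print(json.dumps(query, indent=2))
```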

28) What is Character Filter in Elasticsearch?

Character filters are used to preprocess the stream of characters before it is passed to the tokenizer. Elasticsearch has a number of built-in character filters that can be used to build custom analyzers, such as the HTML Strip Character Filter (html_strip), the Mapping Character Filter (mapping), and the Pattern Replace Character Filter (pattern_replace).
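
The ordering (character filters first, then the tokenizer) can be sketched locally in Python. The two helper functions below only imitate the spirit of html_strip and mapping; the regular expression is a crude stand-in that does not decode HTML entities the way the real filter does, and all function names here are assumptions.

```python
import re
from typing import Dict, List


def strip_html(text: str) -> str:
    """Crude stand-in for html_strip: replace anything tag-like with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def mapping_char_filter(text: str, mappings: Dict[str, str]) -> str:
    """Replace each configured key with its value, like a mapping filter."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def analyze(text: str) -> List[str]:
    """Run the character filters, then a whitespace tokenizer."""
    filtered = mapping_char_filter(strip_html(text), {"&": " and "})
    return filtered.split()


print(analyze("<b>Tom</b> & Jerry"))
# ['Tom', 'and', 'Jerry']
```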

29) List different types of Filters available in Elasticsearch?

Filters in Elasticsearch are used to add a filter or condition to a Query DSL request.

Below is a list of the different types of filters available in Elasticsearch:

  • IDs Filter
  • Match All Filter
  • Nested Filter
  • Prefix Filter
  • Query Filter
  • Range Filter
  • Regexp Filter
  • Script Filter
  • Term Filter
  • Terms Filter
  • Type Filter
  • Character Filter

30) What is an inverted index in Elasticsearch?

An inverted index in Elasticsearch is a special type of data structure that enables very fast full-text searches. An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.
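
A toy version of this data structure fits in a few lines of Python. The sketch below lowercases and splits each document on whitespace, which is far simpler than a real analyzer, and the sample documents are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(documents: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map every unique word to the set of document IDs that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {1: "The quick brown fox", 2: "The lazy dog", 3: "Quick dog"}
index = build_inverted_index(docs)

print(sorted(index["quick"]))  # [1, 3]
print(sorted(index["dog"]))    # [2, 3]
```

Looking up a word is then a single dictionary access instead of a scan over every document, which is where the speed of full-text search comes from.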

31) What is a split brain problem in Elasticsearch?

Split brain in Elasticsearch is a situation where master-eligible nodes lose contact with each other and each elects itself as the new master (thinking that the other master-eligible node has died), and the result is two clusters.

To avoid the split-brain situation, the first parameter to look at is discovery.zen.minimum_master_nodes. This parameter determines how many master-eligible nodes need to be in communication in order to elect a master. Its default value is 1. The rule of thumb is that this should be set to N/2 + 1, where N is the number of master-eligible nodes in the cluster. For example, in a cluster with 3 master-eligible nodes, minimum_master_nodes should be set to 3/2 + 1 = 2 (rounding 3/2 down to the nearest integer). From 7.0 onward, Elasticsearch manages the voting quorum itself and ignores this setting.
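
The N/2 + 1 rule of thumb is easy to check with integer division. The function name below is an assumption chosen to mirror the setting name.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum size: more than half of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    # Integer division rounds N/2 down before adding 1.
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 4, 5):
    print(n, "->", minimum_master_nodes(n))
# 1 -> 1
# 2 -> 2
# 3 -> 2
# 4 -> 3
# 5 -> 3
```

Because two disjoint groups of nodes cannot both contain more than half of the master-eligible nodes, at most one side of a network partition can elect a master.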

32) What is dynamic mapping in Elasticsearch?

Dynamic mapping reflects one of the most important features of Elasticsearch: it tries to get out of your way and let you start exploring your data as quickly as possible.

To index a document in Elasticsearch, you don't have to first create an index, define a mapping type, and define your fields - you can just index a document and the index, type, and fields will spring to life automatically:

PUT data/_doc/1 
{ "count": 5 }

The automatic detection and addition of new fields is called dynamic mapping. The dynamic mapping rules can be customised to suit your purposes.
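
A heavily simplified sketch of the idea: look at each JSON value in a document and guess a field type from it. The rules below cover only a few cases and are an approximation for illustration. Real dynamic mapping also handles date and numeric detection in strings, arrays, and null values, and maps strings to text with a keyword sub-field. The function names are assumptions.

```python
from typing import Any, Dict


def infer_field_type(value: Any) -> str:
    """Guess an Elasticsearch field type from a JSON value (simplified)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"no rule in this sketch for {type(value).__name__}")


def infer_mapping(document: Dict[str, Any]) -> Dict[str, str]:
    """Apply the guess to every top-level field of a document."""
    return {field: infer_field_type(value) for field, value in document.items()}


print(infer_mapping({"count": 5}))
# {'count': 'long'}
```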

33) What are different types of Node in ElasticSearch?

There are 4 different types of node in Elasticsearch: master node, data node, client node, and tribe node.

34) How many Shards and replica are available by default in Elasticsearch Index?

In Elasticsearch releases before 7.0, each index is allocated 5 primary shards and 1 replica by default, which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and another 5 replica shards (1 complete replica), for a total of 10 shards per index. From 7.0 onward, the default is 1 primary shard and 1 replica.
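
The arithmetic behind the total is simple: every primary shard has one copy per configured replica. The function name total_shards is an assumption for this example.

```python
def total_shards(primary_shards: int, replicas: int) -> int:
    """Primaries plus one full set of copies for each replica."""
    return primary_shards * (1 + replicas)


print(total_shards(5, 1))  # 10
print(total_shards(1, 1))  # 2
```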

35) What are segments in Elasticsearch?

An Elasticsearch segment is a small Lucene index. Lucene searches all segments sequentially. Lucene creates a segment when a new writer is opened and when a writer commits or is closed. When you add new documents to your Elasticsearch index, Lucene creates a new segment and writes it. Segments in Elasticsearch are immutable.

36) What is a mapping in ElasticSearch?

Mapping is the process of defining how a document should be mapped to the Search Engine, including its searchable characteristics such as which fields are searchable and if/how they are tokenized. Elasticsearch allows one to associate multiple mapping definitions for each mapping type.

37) How to check cluster health in elasticsearch?

You can use the cluster health API to check the health of a cluster in Elasticsearch. This API returns a simple status on the health of the cluster.

Usage

GET /_cluster/health?wait_for_status=yellow&timeout=50s

The cluster health status is: green, yellow or red. On the shard level, a red status indicates that the specific shard is not allocated in the cluster, yellow means that the primary shard is allocated but replicas are not, and green means that all shards are allocated.
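
The rule for deriving the overall status from shard allocation can be written as a short sketch. The ShardGroup type and its field names are hypothetical and exist only for this illustration; the real API reports the status directly.

```python
from typing import List, NamedTuple


class ShardGroup(NamedTuple):
    """One primary shard and its replicas (hypothetical structure)."""
    primary_assigned: bool
    replicas_total: int
    replicas_assigned: int


def cluster_status(groups: List[ShardGroup]) -> str:
    """Return red, yellow, or green following the rules described above."""
    # Red: at least one primary shard is not allocated.
    if any(not g.primary_assigned for g in groups):
        return "red"
    # Yellow: all primaries are allocated, but some replica is not.
    if any(g.replicas_assigned < g.replicas_total for g in groups):
        return "yellow"
    # Green: every primary and every replica is allocated.
    return "green"


print(cluster_status([ShardGroup(True, 1, 1), ShardGroup(True, 1, 0)]))
# yellow
```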
