Apache Solr vs Elasticsearch, the two leading open-source search engines... What are the main differences between these technologies?
Which one's faster? And which one's more scalable? How about ease of use?
Which one should you choose? Which search engine is the perfect fit for your own project?
Obviously, there's no universally applicable answer. Yet, there are certain parameters you can use when evaluating these two technologies.
And this is precisely what we've come up with: a list of 10 key criteria for evaluating the two search engines, revealing both their main strengths and their most discouraging weaknesses.
So you can compare, weigh the pros and cons and... draw your own conclusions.
But First, A Few Words About The Two “Contestants”
I find it only natural to start any Apache Solr vs Elasticsearch comparison by briefly shedding some light on their common origins:
Both open-source search engine “giants” are built on the Apache Lucene library. And this is precisely why they share such a significant number of similar features.
Already a mature and versatile technology, with a broad user community (including some heavyweight names: Netflix, Amazon CloudSearch, Instagram), Apache Solr is an open-source search platform built on Lucene, a Java library.
And no wonder these internet giants have chosen Solr. Its capabilities for indexing and searching multiple sites are complemented by a full set of other powerful features, too:
NoSQL features & rich document handling
Elasticsearch is a (younger) distributed, open-source (RESTful) search engine built on top of the Apache Lucene library.
In practice, it emerged as a solution to Solr's limitations in meeting the scalability requirements specific to modern cloud environments. Moreover, it's a:
... search engine that “spoils” its users with schema-free JSON documents and an HTTP web interface.
And here's how Elasticsearch works:
It holds multiple indices, each of which can easily be divided into shards; each shard, in turn, can have its own set of replicas.
Each Elasticsearch node can host one or more shards, and the search engine itself is “in charge” of routing operations to the right shards.
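The routing step can be sketched in a few lines. This is a minimal sketch, assuming the simplified rule from the Elasticsearch documentation, shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document's ID. Elasticsearch itself uses a Murmur3 hash (and, in recent versions, an extra routing factor); the MD5-based hash and the function name route_to_shard are illustrative stand-ins, not Elasticsearch's API.

```python
import hashlib


def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a document: hash(routing) % num_primary_shards.

    Assumption: MD5 stands in for Elasticsearch's Murmur3 hash; only the
    modulo-based routing idea is illustrated here.
    """
    if num_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    hashed = int.from_bytes(digest[:4], "big")  # stable 32-bit hash value
    return hashed % num_primary_shards


# A hypothetical index with 3 primary shards: the same ID always maps to the
# same shard, so reads and writes for a document go to one place.
for doc_id in ["order-1", "order-2", "order-3", "order-4"]:
    print(doc_id, "-> shard", route_to_shard(doc_id, 3))
```

Because the target shard depends on the number of primary shards, changing that number would remap existing documents, which is one reason it cannot simply be changed on an existing index.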
Now, if I were to highlight some of its most powerful features:
grouping & aggregation
1. User and Developer Communities: Truly Open-Source vs Technically Open-Source
A contrast that we could define as:
“Community over code” philosophy vs an open codebase that anyone can contribute to, but that only “certified” committers can actually apply changes to.
And by “certified” I do mean Elasticsearch employees only.
So, you get the picture:
If it's a fully open-source technology that you're looking for, Apache Solr is the one. Its robust community of contributors and committers, drawn from many well-known companies, together with its large user base, is the best proof.
It has a healthy project pipeline and everyone can contribute, so no single company holds a monopoly over its codebase, deciding which changes make it in and which don't.
Elasticsearch, on the other hand, is a technology backed by a single commercial entity. Its code is right there, open and available to everyone on GitHub, and anyone can submit pull requests.
And yet, it's only Elasticsearch employees who can actually commit new code to the Elasticsearch codebase.
2. What Specific Use Cases Do They Address?
As you can probably guess:
In any Apache Solr vs Elasticsearch debate, which one is the better fit depends entirely on your use case.
So, let's see first what use cases are more appropriate for Apache Solr:
applications that rely heavily on text-search functionality
complex scenarios with entire ecosystems of apps (microservices) using multiple search indexes and processing a heavy load of search requests
And now some (modern) use cases that call for Elasticsearch:
applications relying (besides the standard text-search functionality) on complex search-time aggregations, too
open-source log management, with many organizations indexing their logs in Elasticsearch to make them more searchable
use cases depending on high(er) query rates
data stores “supercharged” with capabilities for handling analytical queries (besides text searching)
… and pretty much any new project that you need to jump right into, since Elasticsearch is much easier to get started with: you can set up a cluster in no time.
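To make “search-time aggregations” more concrete, here is a short sketch. The dictionary is a request body in Elasticsearch's query DSL that combines a full-text match query with a terms aggregation; it would be sent to an index's _search endpoint. The field names message and status are hypothetical. The helper emulate_terms_agg is a deliberately naive local stand-in (no text analysis, no scoring), included only to show what the aggregation computes: bucket counts over the documents that matched the query.

```python
import json
from collections import Counter

# Full-text query plus a "terms" aggregation, evaluated at search time over
# the matching documents. Field names are hypothetical.
request_body = {
    "query": {"match": {"message": "timeout"}},
    "size": 0,  # return only the aggregation buckets, not the hits
    "aggs": {"by_status": {"terms": {"field": "status"}}},
}
print(json.dumps(request_body, indent=2))


def emulate_terms_agg(docs, word, field):
    """Naive local emulation: count `field` values among docs whose message
    contains `word` as a whitespace-separated token."""
    hits = [d for d in docs if word in d["message"].lower().split()]
    counts = Counter(d[field] for d in hits)
    # Buckets ordered by document count, descending.
    return [{"key": k, "doc_count": c} for k, c in counts.most_common()]


sample_docs = [
    {"message": "upstream timeout", "status": "error"},
    {"message": "request timeout", "status": "error"},
    {"message": "timeout retried", "status": "warning"},
    {"message": "all good", "status": "ok"},
]
print(emulate_terms_agg(sample_docs, "timeout", "status"))
```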
3. Apache Solr vs Elasticsearch: Which One's Best in Terms of Performance?
And a performance benchmark must be on top of your list when doing an Apache Solr vs Elasticsearch comparison, right?
Well, the truth is that, performance-wise, the two search engines are comparable. And this is mostly because they're both built on Lucene.
In short: there are specific use cases where one “scores” a better performance than the other.
Now, when it comes to search speed, you should know that:
Solr performs best with static data (thanks to its ability to use an uninverted reader for sorting and faceting, and thanks to its caches, as well)
Elasticsearch, being “dynamic by nature”, performs better when used in... dynamic environments, such as log analysis use cases
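The idea behind “uninverting” can be illustrated with a toy model. An inverted index maps each term to the documents containing it, which suits searching; sorting and faceting need the opposite view, the field value of each document. The sketch below, assuming a single-valued field and integer document IDs, builds that per-document column once and reuses it for both a facet count and a sort. It is not Solr's implementation; it only shows why the approach pays off on static data, since the column (like Solr's caches) has to be rebuilt when the index changes. All names are hypothetical.

```python
from collections import Counter


def uninvert(inverted_index, num_docs):
    """Turn term -> [doc ids] into a per-document column: doc id -> term.

    Assumes a single-valued field; documents without a value stay None.
    """
    column = [None] * num_docs
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            column[doc_id] = term
    return column


def facet_counts(column, matching_docs):
    """Count field values among the documents that matched a query."""
    return Counter(column[d] for d in matching_docs if column[d] is not None)


def sort_by_field(column, matching_docs):
    """Order matching documents by field value, documents without one last."""
    return sorted(matching_docs, key=lambda d: (column[d] is None, column[d] or ""))


# Hypothetical "color" field for five documents (IDs 0 to 4).
inverted = {"red": [0, 2], "blue": [1, 3], "green": [4]}
column = uninvert(inverted, num_docs=5)  # built once, reused below
print(facet_counts(column, [0, 1, 2]))
print(sort_by_field(column, [0, 1, 2, 3, 4]))
```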
4. Installation and Configuration
Elasticsearch is a clear winner at this test:
It's considerably easier to install, suitable even for a newbie, and lighter, too.
And yet (for there is a “yet”), this ease of deployment and use can easily backfire, particularly when the Elasticsearch cluster is not well managed.
For instance, if you need to add comments to every single setting in a file, the JSON-based configuration, otherwise a surprisingly simple one, can turn into a problem, since standard JSON has no syntax for comments.
In short, what you should keep in mind here is that:
Elasticsearch is the best option if you're already using JSON
if not, then Apache Solr would make a better choice, thanks to its well-documented solrconfig.xml and schema.xml
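The comment problem is easy to demonstrate with Python's standard library. This sketch feeds a JSON fragment and an XML fragment, each carrying an inline comment, to the stock parsers. The fragments are illustrative only (loosely modelled on an Elasticsearch index setting and a solrconfig.xml element) and are not complete configuration files. Standard JSON has no comment syntax, so the first fragment is rejected, while XML's <!-- ... --> comments are part of the format.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative fragments only (not complete config files): the same kind of
# annotated setting, once in JSON and once in XML.
JSON_WITH_COMMENT = """{
  // one replica is enough for this small test cluster
  "number_of_replicas": 1
}"""

XML_WITH_COMMENT = """<autoCommit>
  <!-- commit at most every 15 seconds -->
  <maxTime>15000</maxTime>
</autoCommit>"""


def parses_as_json(text: str) -> bool:
    """Return True if the standard-library JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def parses_as_xml(text: str) -> bool:
    """Return True if ElementTree accepts the text (comments are allowed)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print("JSON with a comment parses:", parses_as_json(JSON_WITH_COMMENT))
print("XML with a comment parses:", parses_as_xml(XML_WITH_COMMENT))
```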
5. Which One Scales Better?
And Elasticsearch wins this Apache Solr vs Elasticsearch test, too.
As already mentioned, it was developed precisely as an answer to some of Apache Solr's well-known scalability shortcomings.
It's true, though, that Apache Solr comes with SolrCloud, yet its younger “rival”:
comes with better built-in scalability
is designed, from the ground up, with cloud environments in mind
And so, Elasticsearch can be scaled to accommodate very large clusters considerably more easily than Apache Solr. This is what makes it a far better fit for cloud and distributed environments.
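Part of what makes scaling out straightforward is that shard copies can be spread over however many nodes the cluster has. The toy allocator below assumes a simple round-robin placement plus one rule Elasticsearch does enforce: two copies of the same shard (for example a primary and its replica) never share a node. The real allocator also weighs disk usage, awareness attributes and more; the function allocate and the node names are hypothetical.

```python
def allocate(num_primaries: int, num_replicas: int, nodes: list[str]) -> dict[str, list[str]]:
    """Round-robin shard copies over nodes, never putting two copies of the
    same shard on one node. "P3" is primary shard 3, "R3" one of its replicas.
    """
    copies_per_shard = num_replicas + 1
    if copies_per_shard > len(nodes):
        # Elasticsearch would leave such replicas unassigned (yellow health);
        # this simplified sketch refuses instead.
        raise ValueError("not enough nodes to separate every copy of a shard")
    layout = {node: [] for node in nodes}
    for shard in range(num_primaries):
        for copy in range(copies_per_shard):
            # Consecutive nodes from a per-shard offset are all distinct,
            # because copies_per_shard <= len(nodes).
            node = nodes[(shard + copy) % len(nodes)]
            layout[node].append(("P" if copy == 0 else "R") + str(shard))
    return layout


# One index with 3 primaries and 1 replica each: adding a third node
# spreads the same 6 shard copies more thinly.
print(allocate(3, 1, ["node-a", "node-b"]))
print(allocate(3, 1, ["node-a", "node-b", "node-c"]))
```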
And this is the END of PART 1. Stay tuned, for I have 5 more key aspects “in store” for you: 5 more “criteria” to consider when running an Apache Solr vs Elasticsearch comparison!
Just curious: judging by these first 5 key criteria alone, which search engine do you think suits your project best?