And I'm back, as promised, with 5 more key differences meant to help you solve your Apache Solr vs Elasticsearch dilemma.
They're here to help you properly evaluate the 2 open source search engines and, therefore, identify the right fit for your own use case and your project's particular needs.
6. Node Discovery
Another aspect that clearly differentiates the 2 search engines is the way they handle node discovery. That is, whenever a new node joins the cluster or one of the existing nodes fails, the cluster needs to react immediately, following certain criteria.
The 2 technologies handle this node-discovery challenge differently:
- Apache Solr uses Apache ZooKeeper, already a “veteran” with plenty of projects in its “portfolio”, which means running external ZooKeeper instances (minimum 3 for a fault-tolerant SolrCloud cluster).
- Elasticsearch relies on its built-in Zen Discovery module for this, with 3 dedicated master-eligible nodes needed for it to properly carry out its discovery “mission”
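Why 3 in both cases? Both ZooKeeper ensembles and master election in Elasticsearch rely on a majority (quorum) of voting nodes agreeing. Here is a minimal sketch of that arithmetic; the function names are mine, not part of either product:

```python
def quorum(nodes: int) -> int:
    """Smallest majority of `nodes` voting members."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many voting members can fail while a majority still survives."""
    return nodes - quorum(nodes)


for n in (1, 2, 3, 5):
    print(f"{n} voting nodes: quorum={quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

With 1 or 2 voting nodes no failure can be tolerated, so 3 is the smallest cluster size that survives the loss of a node.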
7. Apache Solr vs Elasticsearch: Machine Learning
Machine learning has far too powerful an influence on the technological landscape these days for us not to take it into consideration in our Apache Solr vs Elasticsearch comparison here.
So, how do these 2 open source search engines support and leverage machine learning algorithms?
- Apache Solr, for instance, comes with a built-in dedicated contrib module, plus machine learning support on top of its streaming aggregations framework; this makes it easy for you to use machine-learning ranking models right on top of Solr
- Elasticsearch comes with its own X-Pack commercial plugin, along with the plugin for Kibana (supporting machine learning algorithms), geared at detecting anomalies and outliers in time series data
8. Full-Text Search Features
In any Apache Solr vs Elasticsearch comparison, the first one's richness in full-text search related features is just... striking!
Its codebase is simply “overcrowded” with text-focused features, such as:
- the functionality to correct user spelling mistakes
- a rich collection of request parsers
- configurable, extensive highlight support
Even so, Elasticsearch “strikes back” with its own dedicated suggesters API. What this feature does, precisely, is hide implementation details from the user, so that we can add our suggestions far more easily.
And we can't leave out its highlighting functionality (both search engines rely on Lucene for this), which is less configurable than in Apache Solr.
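To give you an idea of what a suggester request looks like, here is a small sketch that builds the JSON body for Elasticsearch's term suggester, which would be sent to an index's `_search` endpoint. The field name `title`, the suggestion name `spellcheck` and the helper function are assumptions made up for this example:

```python
import json


def term_suggest_body(text, field, name="spellcheck"):
    """Build a search body asking the term suggester to correct `text`.

    `name` is just a label for this suggestion in the response.
    """
    return {
        "suggest": {
            name: {
                "text": text,
                "term": {"field": field},
            }
        }
    }


# Hypothetical misspelled user input, checked against a `title` field.
body = term_suggest_body("elasticsaerch", "title")
print(json.dumps(body, indent=2))
```

Notice that nothing in the request says how the suggestions get computed; that's the “hidden implementation details” part.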
9. Indexing & Searching: Text Searching vs Filtering & Grouping
As already mentioned in this post, any Apache Solr vs Elasticsearch debate comes down to this contrast:
A text-search oriented approach vs a filtering-and-grouping, analytical-queries oriented one.
Therefore, the 2 technologies are built, from the ground up, to address different, specific use cases:
- Solr is geared at text search
- Elasticsearch is generally a far better fit for those apps where analytical queries and complex search-time aggregations need to be handled
Moreover, each one comes with its own “toolbox” of tokenizers and analyzers for tackling text, for breaking it down into several terms/tokens to be indexed.
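If the “tokenizers and analyzers” part sounds abstract, here is a toy illustration of the idea: lowercase the text, split it into tokens, drop stop words. It is a simplified stand-in that I wrote for this post, not the actual Lucene analysis chain that Solr and Elasticsearch use, and the stop-word list is made up:

```python
import re
from typing import List

# Illustrative stop-word list; real analyzers ship their own.
STOPWORDS = {"a", "an", "and", "of", "the"}


def analyze(text: str) -> List[str]:
    """Lowercase, split on non-alphanumeric characters, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


print(analyze("The Quick-Brown fox and the Lazy dog"))
# ['quick', 'brown', 'fox', 'lazy', 'dog']
```

The resulting terms are what ends up in the index, which is why the choice of analyzer affects what a search can match.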
Speaking of which (indexing), I should also point out that the two search engine “giants” handle it differently:
- Apache Solr has the single-shard join index “rule”: the join index lives on a single shard, which gets replicated across all nodes (so that inter-document relationships can be searched)
- Elasticsearch seems to be playing its “efficiency card” better, since it enables you to retrieve such documents using parent/child queries such as has_child (plus top_children in older releases)
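Here is roughly what the two approaches look like from the query side: Solr's join query parser on one hand, and an Elasticsearch `has_child` query on the other. The field names (`parent_id`, `id`, `body`), the child type `comment` and both helper functions are assumptions for this sketch:

```python
import json


def solr_join_query(from_field, to_field, child_query):
    """Solr join query parser: match child docs, return the joined parents."""
    return "{!join from=%s to=%s}%s" % (from_field, to_field, child_query)


def es_has_child_query(child_type, field, text):
    """Elasticsearch query returning parents that have a matching child."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"match": {field: text}},
            }
        }
    }


solr_q = solr_join_query("parent_id", "id", "body:lucene")
es_body = es_has_child_query("comment", "body", "lucene")

print(solr_q)
print(json.dumps(es_body, indent=2))
```

The Solr string goes into the `q` parameter of a regular search request, while the Elasticsearch body is posted to the index's `_search` endpoint.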
10. Shard Placement: Static by Nature vs Dynamic By Definition
Shard placement: the last test that our two contestants here need to pass, so you can have your final answer to your “Apache Solr vs Elasticsearch” dilemma.
In this respect, Apache Solr is static, at least far more static than Elasticsearch. It calls for manual work for migrating shards whenever a Solr node joins or leaves the cluster.
Nothing impossible, simply less convenient and slightly more cumbersome for you:
- you'll need to create a replica
- wait till it synchronizes the data
- remove the “outdated” node
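With SolrCloud, those three steps typically map onto Collections API calls (`ADDREPLICA`, `CLUSTERSTATUS`, `DELETEREPLICA`). The sketch below only builds the URLs and doesn't send anything; the host, the collection name `articles`, the node name and the replica name `core_node1` are all placeholders I made up:

```python
from urllib.parse import urlencode

# Assumed local Solr instance; adjust to your own setup.
BASE = "http://localhost:8983/solr/admin/collections"


def collections_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    return BASE + "?" + urlencode({"action": action, **params})


steps = [
    # 1. create a replica of the shard on the new node
    collections_url("ADDREPLICA", collection="articles",
                    shard="shard1", node="newhost:8983_solr"),
    # 2. poll the cluster state until the new replica is reported as active
    collections_url("CLUSTERSTATUS", collection="articles"),
    # 3. remove the replica that lives on the outdated node
    collections_url("DELETEREPLICA", collection="articles",
                    shard="shard1", replica="core_node1"),
]

for url in steps:
    print(url)
```

Nothing impossible, as I said, but you (or your tooling) have to drive each step.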
Luckily for you, Elasticsearch is not just “more”, but “highly” dynamic and, therefore, far more independent.
It's capable of moving shards and indices around, while you're granted total control over shard placement:
- by using awareness tags, you get to control where those shards should/shouldn't be placed
- by using an API call you can guide Elasticsearch into moving shards around on demand
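Both options boil down to small JSON bodies. Below is a sketch of a cluster settings update that turns on allocation awareness, and of a `move` command for the cluster reroute API. The attribute `rack_id`, the index name and the node names are placeholders, and the helper functions are mine:

```python
import json


def awareness_settings(attribute):
    """Body for PUT _cluster/settings enabling shard allocation awareness.

    Each node is expected to be tagged with a matching attribute
    in its own configuration.
    """
    return {
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": attribute
        }
    }


def move_shard_command(index, shard, from_node, to_node):
    """Body for POST _cluster/reroute moving one shard between two nodes."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


settings_body = awareness_settings("rack_id")
reroute_body = move_shard_command("articles", 0, "node-1", "node-2")

print(json.dumps(settings_body, indent=2))
print(json.dumps(reroute_body, indent=2))
```

The first body influences where Elasticsearch places shards on its own; the second one is you stepping in and moving a specific shard on demand.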
The END! Now if you come to think about it, my 10-point comparative overview here could be summed up in 2 key ideas worth remembering:
- go for Apache Solr if it's a standard text-search focused app that you're planning to build, if you already have hands-on experience working with it and if you're particularly drawn to the open-source philosophy
- go for Elasticsearch if it's a modern, real-time search application that you have in mind, one perfectly “equipped” to handle analytical queries, or if your scenario calls for a distributed/cloud environment (since Elasticsearch is built with out-of-the-ordinary scalability in mind)