Effective Approaches to NoSQL Query Optimization

A database can execute a query in several different ways, all of which lead to the same result. The query optimizer evaluates these possibilities and selects the most efficient path. A query’s efficiency is measured in throughput or latency, depending primarily on the workload. In a cost-based optimization approach, the CPU, disk, and memory costs are added up to give the overall cost of a plan.
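The idea of cost-based selection can be sketched in a few lines of Python. The plan names, the cost figures, and the simple additive cost model are illustrative assumptions, not values taken from any real optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution plan with hypothetical per-resource costs."""
    name: str
    cpu_cost: float
    disk_cost: float
    memory_cost: float

    def total_cost(self) -> float:
        # Assumed cost model: the resource costs are simply added together.
        return self.cpu_cost + self.disk_cost + self.memory_cost


def choose_plan(plans: list[Plan]) -> Plan:
    """Return the candidate with the lowest total estimated cost."""
    return min(plans, key=Plan.total_cost)


# Every plan returns the same result; only the estimated cost differs.
candidates = [
    Plan("full_scan", cpu_cost=40.0, disk_cost=120.0, memory_cost=10.0),
    Plan("index_scan", cpu_cost=15.0, disk_cost=20.0, memory_cost=5.0),
    Plan("index_scan_with_sort", cpu_cost=25.0, disk_cost=20.0, memory_cost=30.0),
]
best = choose_plan(candidates)
print(best.name, best.total_cost())
```

A real optimizer would estimate these costs from statistics and would usually weight each resource differently; the sketch only shows the final "pick the cheapest" step.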

Now, the majority of NoSQL databases also support a SQL-like query language, so a functional optimizer is necessary. Without a good optimizer, developers have to live with feature restrictions and DBAs have to live with performance issues. A capable database optimizer addresses both problems.

Database optimizer

An ideal query optimizer will choose the optimal index and access paths to execute the given query. At a high level, a SQL optimizer decides the following aspects before creating an execution tree.

  • Query rewrite based on heuristics, cost, or both.
  • Index selection: choosing the optimal index or indexes for each table (called a keyspace in Couchbase N1QL and a collection in MongoDB). Based on the index selected, the optimizer chooses which predicates to push down, determines whether the query is covered, and decides on the sort and pagination strategy.
  • Join reordering.

Consider a specific MongoDB restriction: a collection can have at most one text index. The MongoDB documentation lists other restrictions alongside it. Let us discuss why you should care about this one.

NoSQL databases like MongoDB encourage users to denormalize or aggregate the schema into a single large document representing a given object, such as a partner, a user, or a customer. As a result, the majority of operations happen on a single JSON document. In this case, a single customer document will contain the customer’s information, orders, shipping and billing information, and so on. Having a single search index means that you have to create one large index combining all the fields you may want to search. Here is the problem: when you search for the customer address, you may not want to match the shipping address. Likewise, when you search for a shipping order ID, you may not want to match an item return order ID.
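To make the problem concrete, here is a small pure-Python sketch that does not use MongoDB at all. The customer document, its field names, and the toy tokenizer are assumptions made up for illustration; the point is only that a single combined index loses track of which field a term came from.

```python
import re

# Hypothetical denormalized customer document.
customer = {
    "name": "Ana Diaz",
    "address": "12 Oak Street, Austin",
    "shipping": {"address": "99 Pine Avenue, Denver"},
    "billing": {"address": "12 Oak Street, Austin"},
}


def tokens(text: str) -> set[str]:
    """Split text into lowercase alphanumeric tokens (a toy tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def flatten(doc: dict, prefix: str = "") -> dict[str, str]:
    """Map dotted field paths to their string values."""
    out: dict[str, str] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, path + "."))
        else:
            out[path] = str(value)
    return out


fields = flatten(customer)

# One combined index: tokens from every field, with no record of the source field.
combined_index = set().union(*(tokens(value) for value in fields.values()))


def search_combined(term: str) -> bool:
    """Search the single combined index."""
    return term.lower() in combined_index


def search_field(path: str, term: str) -> bool:
    """Search the tokens of one specific field only."""
    return term.lower() in tokens(fields[path])


print(search_combined("Denver"))                  # matches via the shipping address
print(search_field("address", "Denver"))          # the customer address does not match
print(search_field("shipping.address", "Denver"))
```

With the combined index, a search meant for the customer address also matches on the shipping address, which is exactly the unwanted result described above.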

Search indexes are usually built with an inverted-tree structure; MongoDB, however, has chosen to build its text index on a B-tree. That choice alone is unlikely to be a problem in practice. A text index generates an array of tokens for each document and indexes them, so it is effectively an array index. The size of such an index can grow quickly, because it increases linearly with the number of words indexed, not with the number of documents. This may also cause issues.
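The relationship between word count and index size can be illustrated with a toy tokenizer in Python. This is a simplified sketch under stated assumptions: a real text index also applies stemming and stop-word removal, and the sample strings are invented.

```python
import re


def index_entries(doc_id: int, text: str) -> list[tuple[str, int]]:
    """Return one (token, doc_id) index entry per distinct token in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted((word, doc_id) for word in words)


short_doc = "fast shipping"
long_doc = "fast shipping to the billing address on the original customer order"

short_entries = index_entries(1, short_doc)
long_entries = index_entries(2, long_doc)

# Both inputs are a single document, but the entry count follows the word count.
print(len(short_entries), len(long_entries))
```

Each document contributes one index entry per distinct token, so two collections with the same number of documents can have very different index sizes.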

Is this a problem with the database optimizer? When a collection has many indexes, the optimizer must choose the most appropriate index for the given query. If you restrict it to only one index, the choice is much easier. However, this can also be a symptom of a larger problem in the MongoDB optimizer, where ad hoc decisions end in restrictions like this one.
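To see why having only one candidate makes the choice trivial, consider a toy rule-based index picker in Python. The "longest usable prefix" heuristic and the index names are assumptions chosen for illustration; this is not how MongoDB or any particular product selects indexes.

```python
from typing import Optional


def usable_prefix(index_fields: tuple[str, ...], predicate_fields: set[str]) -> int:
    """Count the leading index fields that the query has predicates on."""
    count = 0
    for field in index_fields:
        if field not in predicate_fields:
            break
        count += 1
    return count


def pick_index(indexes: dict[str, tuple[str, ...]],
               predicate_fields: set[str]) -> Optional[str]:
    """Return the index with the longest usable prefix, or None if none applies."""
    best_name, best_len = None, 0
    for name, index_fields in indexes.items():
        length = usable_prefix(index_fields, predicate_fields)
        if length > best_len:
            best_name, best_len = name, length
    return best_name


# Hypothetical indexes on an orders collection.
indexes = {
    "by_city": ("city",),
    "by_city_status": ("city", "status"),
    "by_order_date": ("order_date",),
}

choice = pick_index(indexes, {"city", "status"})
print(choice)
```

With several candidates the picker has to compare them; with a single permitted index there is nothing left to compare, which is what makes the restriction convenient for the optimizer.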

MongoDB’s query language is simplistic, even where it tries to mimic SQL. Let us look at how the MongoDB optimizer handles each of the optimizer tasks.

  • Query rewrite: Unsupported. MongoDB’s basic queries are simple save(), find(), remove(), and update() calls, while the aggregation pipeline is verbose and procedural. Although a rewrite is theoretically possible, nothing in the documentation or in the query plan indicates that any query rewrite takes place.
  • Index selection: Supported. The MongoDB optimizer tries to pick a suitable index for each part of the query where an index can be used effectively.
  • Join reordering: Unsupported. MongoDB’s $lookup is part of its convoluted aggregation framework, in which the query is written like a Unix pipeline, which is a procedural approach.
  • Join type selection: Unsupported, as only one type of join is available in MongoDB. MongoDB offers constrained left outer join support through the $lookup stage, and arrays are not supported in the join condition. When you use $lookup, the optimizer automatically uses its default join algorithm, and there is no mention of what type of join is performed.
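The procedural, pipeline-style nature of $lookup is easier to see in an example. Below is a sketch of an aggregation pipeline written as Python data in the form a driver such as pymongo accepts. The collection and field names (orders, customers, customer_id, status) are hypothetical, and the pipeline is only constructed here, not executed, since running it needs a live server.

```python
# Stages run strictly in the order written, like commands in a Unix pipeline.
pipeline = [
    {"$match": {"status": "shipped"}},
    {
        "$lookup": {
            "from": "customers",          # hypothetical collection to join with
            "localField": "customer_id",  # hypothetical field in the orders documents
            "foreignField": "_id",        # field in the customers documents
            "as": "customer",             # output array holding the matched documents
        }
    },
]

# The stage order is fixed by the author of the query, not chosen by an optimizer.
stage_order = [next(iter(stage)) for stage in pipeline]
print(stage_order)

# With pymongo and a running server, this would be executed as:
#   results = db.orders.aggregate(pipeline)
```

Because the author spells out the join as one step in a fixed sequence, there is little room for the kind of join reordering a declarative SQL optimizer performs.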

Overall, the MongoDB query optimizer does perform index selection before creating an execution plan, but it tends to select indexes in an odd fashion that is based neither on rules nor on statistics.

To manage the resulting performance issues and uncertainties, MongoDB offers several APIs for managing the query plan cache, flushing specific cache entries, and flushing the whole plan cache. Instead of developing applications, developers and MongoDB DBAs may end up having to manage a plan cache. DBAs and developers do not have to manage a plan cache like this in other enterprise databases.
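As a rough illustration of what this management looks like, the sketch below builds the documents for MongoDB’s planCacheClear command and the $planCacheStats aggregation stage in Python. The helper function name, the orders collection, and the sample query are assumptions; the documents are only constructed here, because sending them requires a driver connection to a running server.

```python
from typing import Optional


def clear_plan_cache_command(collection: str, query: Optional[dict] = None) -> dict:
    """Build a planCacheClear command document (hypothetical helper).

    Without a query, the command clears every cached plan for the collection.
    With a query, it clears only the entries for that query shape.
    """
    command: dict = {"planCacheClear": collection}
    if query is not None:
        command["query"] = query
    return command


# Flush the whole plan cache of a hypothetical orders collection.
flush_all = clear_plan_cache_command("orders")

# Flush only the cache entries for one query shape.
flush_one = clear_plan_cache_command("orders", {"status": "shipped"})

# Aggregation pipeline that lists the cached plans for a collection.
stats_pipeline = [{"$planCacheStats": {}}]

print(flush_all)
print(flush_one)

# With pymongo and a running server, these would be sent as:
#   db.command(flush_all)
#   list(db.orders.aggregate(stats_pipeline))
```

Deciding when to run these commands, and for which query shapes, is the kind of operational work the paragraph above refers to.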

Coming back to the original question, “why can users not create multiple text indexes in MongoDB?”, building multiple indexes would not be an issue in itself if it were allowed. The actual problem is that when a query includes a text predicate, the MongoDB optimizer may be unable to make the right index choice, because it cannot validate the text indexes against the given predicates. The restriction exists because the MongoDB optimizer does not follow a natural, logical framework.