Overcoming Lucene Pitfalls in Kibana with Kibana Advisor

Even though search is the primary function of Elasticsearch, getting search right can be tough, and sometimes even confusing. To retrieve your data from Elasticsearch in the most efficient way, you’ll sometimes need to overcome some of Lucene’s obstacles. While you need to familiarize yourself with Lucene Query Syntax for advanced Kibana use, Lucene’s implementation within Elasticsearch still has some challenges. That is why we are introducing our latest feature, the Logz.io Kibana Advisor.

At Logz.io we always try to make our users’ lives easier. This little tool detects most of the major Lucene pitfalls (and thus Kibana pitfalls) that exist. This guide will cover some of the most common drawbacks in Lucene and offer tips on formatting searches to get better results.

Kibana Advisor example: Advising users on the use of lowercase as a string literal in Kibana and Lucene

What is Lucene?

Lucene is a search library from Apache and is essential to Elasticsearch, the search and analytics engine of the ELK Stack. It stores data in ways that ensure extremely fast searches. You can query Elasticsearch with the Elasticsearch REST API or via Kibana, the ELK Stack’s UI.
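Under the hood, a Kibana search is a Lucene query string passed to Elasticsearch. As a rough sketch, the same text you type into the Kibana search bar could also be sent directly to the Elasticsearch REST API with a query_string query. The host and the logstash-* index pattern below are placeholders for your own setup:

curl -X GET "http://localhost:9200/logstash-*/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string": {
      "query": "session AND created"
    }
  }
}'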

Over the years, we have all gotten used to popular search engines like Google, which can work against our intuition when using something like Lucene. In this piece, we cover some of the limitations that we found in Lucene and provide examples to help overcome them.

To help, this guide walks you through the ins and outs of the search query pain points our users report to us most often and sets you up for smoother searching.

1. Uppercase Operators

Boolean operators: In the age of Google, it’s easy to fall into the habit of running case-insensitive searches. But with Lucene, you’ll need to capitalize the Boolean operators `AND`, `OR`, and `NOT`.

Lucene won’t recognize their lowercase versions — instead, it finds the actual string literals.

For example:

session AND created – results in all documents that contain both “session” and “created” in a single log

session and created – results in all documents that contain any of the words “session”, “and”, or “created”

Enclosing the search expression in quotes finds the exact phrase, with no Boolean operators involved. In this case, “session and created” and “session AND created” will provide the same case-insensitive results for the exact phrase.

Range operator: The operator TO in range queries is also case sensitive. 

For example:

response:[400 TO 599] – returns all error values (400 to 599) of the response field, while response:[400 to 599] is invalid syntax and the Kibana search button will be disabled.

2. Unary NOT operator

When searching in a specific field, boolean operators are used after the field name as in this example (find all occurrences of the animal field with cat or fox values):

animal:(cat OR fox) 

The NOT operator is different and should be used outside (and before) the field name (find all occurrences of the animal field with non dog values):

NOT animal:dog
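The unary NOT can also be placed in front of a grouped field expression. As a small sketch reusing the same example values:

NOT animal:(cat OR fox) – finds all documents where the animal field is neither cat nor fox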


3. Free search

In Kibana, you can either direct your search to a specific field in your index or just use free search. The way Logz.io indexes your data in Elasticsearch affects the results of your free searches. 

We recommend avoiding free search as much as possible. Free searches are executed against multiple fields and will be slower than a search on a specific field. Parse your searchable data into independent fields to search more efficiently.

In free search, as opposed to Google search, the search string is tokenized before it is run. Tokenization is a process of breaking the search string into sections of strings or terms (called tokens) whenever a separator character is found.

When you run a free search in Logz.io, the following special characters, along with whitespace, are used as separators: ~ ` ! @ # $ % ^ & * ( ) + = { } [ ] | \ / : ; " ' < > , ?

For example, if you search for the path value “/items/games”, this actually results in searching for items OR games. The slash character is dropped as a separator!

In another example, if you are looking to find exact matches for a UUID like “8fc19c4e-74db-4556-8c3b-6f1800bf1ab0,” this will actually result in searching the parts of the id separately—i.e., 8fc19c4e OR 74db OR 4556 OR 8c3b OR 6f1800bf1ab0—which could return unintended results, as seen in the screenshot below.

The solution: Make sure the intended value is parsed out to a separate field, and then perform a field search (see below for our parsing service). Note that you could also get around this by using Regular Expressions, which allow special characters within a free search.

For example, searching the UUID of our previous example in the sessionId field results in exact matches ONLY:
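Spelled out, that field search would look something like this (quoting the value so the hyphens are not interpreted as operators):

sessionId:"8fc19c4e-74db-4556-8c3b-6f1800bf1ab0"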

The same issue will happen if you search directly on the message field. The message field is a special field (an Analyzed field) and therefore will behave similarly to free searches.

Note: This pitfall WILL NOT be resolved even if you surround your search string with quotes ("). An exception is a search string whose separator is the dot character (.).

For example, searching for the IP address 36.105.224.90 will return many unintended results due to tokenization of the search string. However, searching for "36.105.224.90" (in quotes) will perform much better.

To get help with your parsing, Logz.io provides “parsing as a service” as part of our 24/7 support. Just click the support chat bubble and a support engineer will pick up your request and handle it promptly. If you feel comfortable parsing the data yourself, Logz.io provides an easy-to-use parsing wizard, which you can find by hitting “Settings” and hovering over “Tools.”

4. Field searches

The pitfall: When using field searches, simple search strings require a full match of the field value. They will not return results for partial matches.

For example, searching for path_name:demolive will not return results if the path_name value is demolive.py; only path_name:demolive.py will return the expected logs.

The solution: To find partial matches, use wildcards or Regular Expressions.

For example:

path_name:demolive*

path_name:/demolive.*/

5. Regular Expressions

Regular Expressions can be tricky to handle. However, they can serve as a very useful tool to improve your search options in Kibana. Elasticsearch uses its own regex flavor that might be a bit different from what you are used to. We have a few examples to show you here:

Pitfall 1: Using a Regular Expression can slow the search significantly, especially if this is a complex expression. 

The solution: Speed up your search by scoping the expression to a specific field rather than running it as a free search across all fields. For example, prefer a field-scoped pattern such as

path_name:/demolive.*/

over the equivalent free-search pattern

/demolive.*/

Pitfall 2: Some characters are part of the Regular Expression syntax.

The solution: To search for these special characters, escape them with a backslash (\). The characters that need escaping are: . ? + * | { } [ ] ( ) " \ # @ & < > ~

For example:

\@    # renders as a literal '@'
\\    # renders as a literal '\'
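For instance, here is a hypothetical field-scoped regular expression (the email field and its value are made up for illustration, and the field is assumed to be unanalyzed) that escapes both the @ and the dot from the list above:

email:/user\@example\.com/

Without the backslashes, the dot would match any single character and the @ could be treated as a regex operator rather than a literal character.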

Pitfall 3: If you write Regular Expressions, you might be used to anchors. Anchors are not supported in Lucene regular expressions: you cannot use the operators ^ (beginning of line) or $ (end of line).

The solution: To match a term, the regular expression must match the entire string.
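So instead of anchoring with ^ or $, pad the pattern with .* wherever a partial match is intended. A quick sketch reusing the path_name field from earlier:

path_name:/.*demolive.*/ – matches any path_name value that contains “demolive” anywhere in the term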

6. Save Your Searches

Writing a long and complex search may take some time. To quickly reuse your work, don’t forget to use the ‘Save’ option in Kibana. Saved searches can be loaded with a click of a button.

7. Use Quotes Wisely

When searching on a specific field, you don’t necessarily need to surround your search term with quotes to get an exact match. However, if your search string contains multiple terms separated by a space or another special character, it is best to enclose it in quotes to ensure you get the results you expect.

For example: 

Searching for a city name with geoip.city_name:San Francisco brings up results such as “San Francisco” but also “Francisco Beltrão”, a city in Brazil. This is because only “San” is scoped to the field; the query is interpreted as geoip.city_name:San combined with a free search for “Francisco”.

To get the exact match, search for geoip.city_name:"San Francisco".

Another example is geoip.timezone:America/Sao_Paulo. This results in an invalid query (the Update button is disabled).

Enclosing the search term in quotes, geoip.timezone:"America/Sao_Paulo", makes the query valid and returns accurate results.

"failed on rule"

Summary

Kibana Discover provides a very powerful search interface using the Lucene syntax. It gives you lightning-fast access to your data for troubleshooting the day-to-day incidents of your production system.

Understanding Lucene syntax is essential to mastering search creation. However, it is not enough on its own, as Lucene and Elasticsearch together introduce some additional challenges. The tips in this article will help you fine-tune searches and get the results you want faster and more accurately.

If you have additional questions or face some search challenges that are not addressed here, go ahead and contact our support team through the Support Chat on the Logz.io application (bottom right corner). Our support engineers are available around the clock to address and solve any problem.

