Performance Considerations

AnalytiX-BI Server is a lightweight, read-optimized, in-memory only database. This means that data ingested in online data models is entirely loaded inside the AnalytiX-BI Server process and queries are executed against the AnalytiX-BI memory storage.

To keep memory consumption at the minimal possible levels, AnalytiX-BI Server is built as a columnar database: table data is not stored in a row layout, but instead columns are stored individually. Since columns are essentially long lists of values of the same data type, they can be easily and efficiently compressed using lightweight compression schemes. Columnar layout also allows for increased efficiency in query execution, as columns can be scanned individually without having to process the entire row.

Memory Usage

AnalytiX-BI Server employs several compression techniques to reduce the memory footprint of a column, but as a rule of thumb the lower the column cardinality (amount of unique values) the better the compression will be. When creating data flows for ingesting data into models, consider dropping any column that will not be used in runtime, especially high cardinality columns (i.e.: identifiers). For numeric columns consider dropping non-significant decimal digits by configuring ta flow to round the value using a Transform Column step and the rounto function, as this might reduce the column’s cardinality.

Sorted columns offer even more opportunities for compression, as they usually contain long runs of the same value and lend themselves well to be compressed using Run-Length Encoding. Run-Length Encoding represents a run of a value with just a triple: the value itself, the number of repetitions and the position of the first occurrence, which saves memory the longer the runs are.

For example, let’s assume that we have imported 1 year worth of historical data logged at 15 minutes interval for 1,000 tags – this translates to a table with just a little over 35 million rows. If all the values have with good quality, the Quality column in the table will contain 35 million zeros: this can be stored using Run-Length Encoding with just three integers [0; 35,040,000; 0] – a compression ratio of over 99%.

For this reason, consider pre-sorting data on applicable attributes before ingesting it into a data model.

Query Execution

As columns are stored individually by AnalytiX-BI Server, tables with dozens (or even hundreds) of columns do not impact operations like column scans. However, since the query results must be ultimately stitched together in row format, queries that request lots of columns will require more time to reconstruct the rows. For this reason, when writing a query, it is recommended to only select the necessary attributes.

AnalytiX-BI Server also uses dictionary encoding for column compression, which allows the query engine to operate on encoded values rather than original values when performing comparisons for WHERE predicate evaluation. Some comparison operators cannot be evaluated against the encoded values however, notably the LIKE operator for strings. For this reason, it is recommended to use the equals (=) or not equals (<>) operators for string comparison whenever possible.

Lastly, AnalytiX-BI Server employs a temporary cache for query results. Results of a query are stored in the cache using the query itself as a key, which means that if two different clients request the same query the query will only be executed once, cached, and the clients will receive the result from the cache. This is especially important when subscribing to AnalytiX-BI queries as process points (such as from a GraphWorX64 display).

For example, let’s assume that we want to display the average product unit price per category with in several process points in GraphWorX64. We could use the following set of point names to get these results, specifying the CategoryID in each query:

bi:Models:Northwind(SELECT AVG(Products.UnitPrice), Products.CategoryID WHERE Products.CategoryID = 1)[UnitPrice][0]

bi:Models:Northwind(SELECT Categories.CategoryName WHERE Categories.CategoryID = 1)[CategoryName][0]

bi:Models:Northwind(SELECT AVG(Products.UnitPrice), Products.CategoryID WHERE Products.CategoryID = 2)[UnitPrice][0]

bi:Models:Northwind(SELECT Categories.CategoryName WHERE Categories.CategoryID = 2)[CategoryName][0]

…

bi:Models:Northwind(SELECT AVG(Products.UnitPrice), Products.CategoryID WHERE Products.CategoryID = 8)[UnitPrice][0]

bi:Models:Northwind(SELECT Categories.CategoryName WHERE Categories.CategoryID = 8)[CategoryName][0]

The result is that we are submitting 16 different queries to the AnalytiX-BI Server to get the value for our process points. Each query returns a single column and single row. This is not necessarily an issue. However, we can improve performance by taking advantage of the AnalytiX-BI Serve cache. The results shown above can all be produced in a single dataset by the following query:

SELECT AVG(Products.UnitPrice), Categories.CategoryName, Categories.CategoryID ORDER BY Categories.CategoryID

This will return all of the data we want in a single dataset of two columns and eight rows. We can then display individual elements in from the dataset with our process points in the following way:

bi:Models:Northwind(SELECT AVG(Products.UnitPrice), Categories.CategoryName, Categories.CategoryID ORDER BY Categories.CategoryID)[UnitPrice][0]

bi:Models:Northwind(SELECT AVG(Products.UnitPrice), Categories.CategoryName, Categories.CategoryID ORDER BY Categories.CategoryID)[CategoryName][0]

…

bi:Models:Northwind(SELECT AVG(Products.UnitPrice), Categories.CategoryName, Categories.CategoryID ORDER BY Categories.CategoryID)[UnitPrice][7]

bi:Models:Northwind(SELECT AVG(Products.UnitPrice), Categories.CategoryName, Categories.CategoryID ORDER BY Categories.CategoryID)[CategoryName][7]

Now all 16 points are executing the exact same query, and each point is specifying a different row or column from the query results. This means that AnalytiX-BI Server will only have to execute the query once, cache it, and then serve the cached results to all sixteen process points.

The lifespan of cached queries is determined by the ItemCacheSlidingExpirationMin setting. See Point Manager Parameters for more information about this setting. If a table is refreshed, all queries referencing that table will be dropped from the cache.