The benchANT-Score
The benchANT score is the primary combined benchmarking assessment metric. The score condenses all performance KPIs into one overarching metric and makes it possible to sort setups and identify good ones.
How is the benchANT score calculated?
How is it to be understood?
And what benefits does it offer you?
What Is the benchANT Score?
The benchANT score is a unified multi-dimensional scoring algorithm that combines evaluation criteria from different, non-comparable dimensions into a single score.
The goal is to evaluate the "performance" of different cloud database setups from the multi-dimensional performance metrics - "throughput", "READ latency", "WRITE latency" and "cloud costs". Based on this, a ranking will be created, which will allow a pre-selection of the numerous setups for a more detailed analysis.
The higher the benchANT score, the "better" the setup is to be classified.
How Is the benchANT Score Calculated?
The calculation of the benchANT score is done in 2 steps:
- calculation of the score per performance dimension
- addition of the individual points to the benchANT score.
The first calculation step in particular is not trivial. The calculation is carried out in detail as follows:
- Based on all measurement results of the setups for a performance indicator e.g. "throughput", the median value is determined.
- 3 points: all setups with a result within ± 5% of the median.
- 4 points: setups with results that are 5-15% better than the median.
- 5 points: only setups that are more than 15% better than the median.
- 2 points: setups with results that are 5-15% worse than the median.
- 1 point: setups with results that are 15-50% worse than the median.
- 0 points: outliers below that range.
Note: "more is better" applies to throughput, but "less is better" applies to costs and latencies. "Better" and "worse" above are meant in that sense, so a high score always represents a better-performing setup.
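The per-dimension scoring above can be sketched in a few lines of Python. This is a minimal illustration, not benchANT's actual implementation: the function names, the sample throughput values, and the handling of values that fall exactly on a boundary (5%, 15%, 50%) are assumptions, since the text does not specify them.

```python
from statistics import median


def dimension_score(value, med, higher_is_better=True):
    """Score one measurement against the median of its dimension (0-5 points)."""
    # Relative deviation from the median, oriented so that a positive
    # number always means "better" (flipped for costs and latencies).
    rel = (value - med) / med
    if not higher_is_better:
        rel = -rel
    # Boundary handling (> vs >=) is an assumption of this sketch.
    if rel > 0.15:
        return 5
    if rel > 0.05:
        return 4
    if rel >= -0.05:
        return 3
    if rel >= -0.15:
        return 2
    if rel >= -0.50:
        return 1
    return 0


def score_dimension(values, higher_is_better=True):
    """Score every measurement of one dimension relative to their median."""
    med = median(values)
    return [dimension_score(v, med, higher_is_better) for v in values]


# Hypothetical throughput results in ops/s; the median is 8000.
throughputs = [3000, 5000, 7000, 7900, 8000, 8100, 9000, 10000, 12000]
print(score_dimension(throughputs))
```

For a "less is better" dimension such as latency, passing `higher_is_better=False` flips the sign, so a latency 10% below the median lands in the 4-point band.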
The chart helps to understand the point calculation per dimension. The median is defined by setup 5 at 8,000 ops/s. All values in the range of 7,601-8,400 also receive 3 points. The ranges for "1 point", "2 points" and "4 points" each have a width of 800 ops/s, which corresponds to 10% of the median value.
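The band edges in the chart example follow directly from the 5% and 15% thresholds around the median. The short sketch below reproduces that arithmetic for the 2-, 3- and 4-point bands; the helper name `band_edges` is hypothetical, and whether an edge value itself belongs to the lower or upper band is left open here (the text gives 7,601 as the lower edge of the 3-point range).

```python
MEDIAN_OPS = 8000  # median throughput from the chart example, in ops/s


def band_edges(median_value):
    """Return (lower, upper) edges of the 2-, 3- and 4-point bands."""

    def pct(p):
        # Integer percentage of the median; exact for this example value.
        return median_value * p // 100

    return {
        2: (pct(85), pct(95)),    # 5-15% worse than the median
        3: (pct(95), pct(105)),   # within 5% of the median
        4: (pct(105), pct(115)),  # 5-15% better than the median
    }


edges = band_edges(MEDIAN_OPS)
for pts, (low, high) in edges.items():
    print(f"{pts} points: {low}-{high} ops/s (width {high - low})")
```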
Special characteristics
- If only one setup is measured, the benchANT score is 12. Each of the 4 performance indicators yields a score of 3, because a single value is its own median and therefore always lies in the 3-point range.
- Each additional measurement result of a setup can shift the median value and the point boundaries. As a result, the benchANT score can also change due to further measurements. The scoring is always relative and depends dynamically on the sample.
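Both special characteristics can be reproduced with a small sketch that sums the per-dimension points into a total. The setup names, dimension keys, and measurement values below are made up for illustration, and the boundary handling is an assumption; this is not benchANT's own code.

```python
from statistics import median

# (dimension key, higher_is_better) for the four dimensions named in the text.
DIMENSIONS = [
    ("throughput", True),
    ("read_latency", False),
    ("write_latency", False),
    ("cost", False),
]


def points(value, med, higher_is_better):
    """Points (0-5) for one value relative to the median of its dimension."""
    rel = (value - med) / med
    if not higher_is_better:
        rel = -rel  # for latencies and costs, lower values are better
    if rel > 0.15:
        return 5
    if rel > 0.05:
        return 4
    if rel >= -0.05:
        return 3
    if rel >= -0.15:
        return 2
    if rel >= -0.50:
        return 1
    return 0


def benchant_scores(setups):
    """Sum the per-dimension points of each setup into one total score."""
    totals = {name: 0 for name in setups}
    for dim, higher_is_better in DIMENSIONS:
        med = median(s[dim] for s in setups.values())
        for name, s in setups.items():
            totals[name] += points(s[dim], med, higher_is_better)
    return totals


# Hypothetical measurements for two setups.
a = {"throughput": 8000, "read_latency": 2.0, "write_latency": 3.0, "cost": 500}
b = {"throughput": 12000, "read_latency": 1.0, "write_latency": 2.0, "cost": 700}

print(benchant_scores({"A": a}))          # {'A': 12}
print(benchant_scores({"A": a, "B": b}))  # {'A': 8, 'B': 16}
```

Setup A's own measurements never change, yet its score drops from 12 to 8 once setup B enters the sample and shifts the medians.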
Benefits & Limitations
- The benchANT score makes it possible to bundle the different performance dimensions of a setup into one score for a high-level overview. It provides a good first overview of the measurement results.
- The benchANT score enables dynamic scoring depending on the measured values. An absolute scoring based on fixed limits does not make sense due to the large variance of all possible existing setups.
- The benchANT score is only useful for a high-level overview. On the one hand, it treats the different performance dimensions as equivalent and weights them equally. On the other hand, the score ranges are not fine-grained enough to derive further insights and decisions from them, nor is that the aim of this score.
Background: The Z-Score
The benchANT scoring model is based on the well-known Z-score. The Z-score assigns points according to how many standard deviations a value lies from the mean of a distribution. Thus, a value that is 3 standard deviations away from the mean receives a Z-score of 3, and one that is 1.6 standard deviations away receives a Z-score of 1.6.
The Z-score is a good scoring method especially for distributions that are identical or similar to the normal distribution.
Database benchmarking measurements with a sample of 100 setups rarely resemble a normal distribution.
Database benchmarking procedures are optimisation procedures that lead to an accumulation of measured values at the optimum. In many cases, the standard deviation provides too large a range and too "similar" results.
Furthermore, an expansion of the sample (=more measurements) results not only in a new mean, but always also in a new standard deviation, which changes the scoring.
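For comparison, the Z-score and its sensitivity to sample expansion can be sketched as follows. The sample values are invented for illustration, and the population standard deviation (`statistics.pstdev`) is used as an assumption; the text does not say which variant is meant.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score of each value: distance from the mean in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    return [(v - mu) / sigma for v in values]


# Hypothetical sample with mean 5 and standard deviation 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
before = z_scores(sample)
print(before[-1])  # Z-score of the value 9

# One additional measurement changes both the mean and the standard
# deviation, so the Z-score of every existing value changes as well.
after = z_scores(sample + [25])
print(after[-2])   # Z-score of the same value 9 in the larger sample
```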
These reasons led to the development of the benchANT score, which provides better results for a high-level score.