
How to interpret percentiles in Google monitoring dashboards?

  • Writer: Andy Brave
  • Apr 10, 2023
  • 2 min read

Are you in charge of designing a dashboard for measuring performance and don't know how to use the metrics provided by the platform? Don't worry; I've got you covered.

Here is an example of how you could use them to your advantage.


Imagine that you are building a dashboard for measuring cloud function performance. You want to track execution time: the time taken to execute a function from start to finish. It can be summarized as an average, minimum, and maximum, and broken down by percentiles (e.g., the 95th percentile) to understand your function's performance better.


Interpreting execution time by percentiles can provide valuable insights into your function's performance. Percentiles help you understand the distribution of execution times and identify any outliers or bottlenecks. Here's an example of how you might interpret these metrics for a hypothetical function:


  1. Minimum execution time: 50 ms

  2. Maximum execution time: 2000 ms

  3. Average execution time: 300 ms

  4. 50th percentile (median) execution time: 250 ms

  5. 95th percentile execution time: 1500 ms

  6. 99th percentile execution time: 1800 ms
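
Numbers like these can be reproduced from raw execution times. The following is a minimal sketch, assuming you have exported the durations as a plain list of milliseconds. The `percentile` and `summarize` helpers are hypothetical names, not part of any platform API. The nearest-rank method used here is only one of several percentile definitions, so a dashboard may report slightly different values.

```python
import math
from statistics import mean


def percentile(durations_ms, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(durations_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(durations_ms):
    """Return the statistics discussed in this post for a non-empty list of durations."""
    return {
        "min": min(durations_ms),
        "max": max(durations_ms),
        "avg": mean(durations_ms),
        "p50": percentile(durations_ms, 50),
        "p95": percentile(durations_ms, 95),
        "p99": percentile(durations_ms, 99),
    }


# Hypothetical sample: 100 invocations taking 1 ms, 2 ms, ..., 100 ms.
sample = list(range(1, 101))
print(summarize(sample))
```

With this evenly spread sample the average and the median sit close together. Real execution times rarely look like that, which is why the percentiles matter.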


In this example, the minimum execution time is 50 ms, indicating that the function can sometimes execute very quickly. The maximum execution time is 2000 ms, 40 times the minimum, suggesting that there might be occasional performance issues or slow external dependencies.


The average execution time of 300 ms gives a general sense of the function's performance but may not provide the whole picture, especially when the distribution contains extreme values.


The median (50th percentile) execution time of 250 ms shows that half of the invocations took 250 ms or less, and the other half took at least that long. Since the median is lower than the average of 300 ms, the distribution is likely skewed to the right: a small number of very slow executions pull the average up.
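
To see how a single outlier separates the average from the median, here is a small sketch using Python's standard `statistics` module. The durations are made up for illustration and are not taken from a real function.

```python
from statistics import mean, median

# Hypothetical durations in ms: nine typical invocations and one slow outlier.
durations_ms = [250] * 9 + [2000]

average_ms = mean(durations_ms)   # pulled upward by the single 2000 ms call
median_ms = median(durations_ms)  # unaffected by how slow the outlier is

print(f"average: {average_ms} ms, median: {median_ms} ms")
```

Here the average comes out at 425 ms while the median stays at 250 ms, even though nine out of ten invocations took exactly 250 ms.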


The 95th percentile execution time of 1500 ms indicates that 95% of the invocations took 1500 ms or less to execute. This means that the remaining 5% of the invocations took longer than 1500 ms and can be considered slow outliers. Identifying and investigating these slow cases can help optimize your function's performance.
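
One way to find the slow cases is to filter the raw invocations against the 95th percentile threshold. This is a rough sketch, assuming each record is an `(invocation_id, duration_ms)` pair that you have exported yourself. The record format and the `slow_invocations` helper are hypothetical, and the threshold uses the nearest-rank percentile definition.

```python
import math


def slow_invocations(records, p=95):
    """Return (invocation_id, duration_ms) pairs slower than the p-th percentile.

    Uses the nearest-rank definition of a percentile.
    """
    if not records:
        return []
    ordered = sorted(duration for _, duration in records)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    threshold = ordered[rank - 1]
    return [(inv_id, d) for inv_id, d in records if d > threshold]


# Hypothetical log: 20 invocations taking 100 ms, 200 ms, ..., 2000 ms.
records = [(f"inv-{i}", i * 100) for i in range(1, 21)]
slow = slow_invocations(records)
print(slow)
```

The returned IDs are the ones worth looking up in your logs or traces to see what those invocations were doing.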



The 99th percentile execution time of 1800 ms shows that only 1% of the invocations took longer than 1800 ms. These extreme cases might indicate rare performance issues or edge cases that are worth investigating.



Conclusion


By looking at the execution time broken down by percentiles, you can better understand the distribution of your function's performance and identify areas for optimization. Monitoring these metrics over time is essential to detect trends, regressions, or improvements in your function's implementation.
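
Tracking a percentile over time can be as simple as comparing two windows of data. The sketch below assumes you have the raw durations for a baseline window and a current window. The `has_regressed` helper and the 20% tolerance are illustrative choices, not a recommendation from any monitoring platform.

```python
import math


def p95(durations_ms):
    """Nearest-rank 95th percentile of a non-empty list of durations."""
    ordered = sorted(durations_ms)
    return ordered[math.ceil(95 * len(ordered) / 100) - 1]


def has_regressed(baseline_ms, current_ms, tolerance=0.20):
    """True if the current window's p95 exceeds the baseline p95 by more than `tolerance`."""
    return p95(current_ms) > p95(baseline_ms) * (1 + tolerance)


# Hypothetical windows: last week's durations versus today's.
last_week = [200] * 95 + [1500] * 5   # 5% of invocations are slow
today = [200] * 90 + [1500] * 10      # 10% of invocations are slow
print(has_regressed(last_week, today))
```

In this made-up data the slow tail grows from 5% to 10% of invocations, which is enough to move the 95th percentile from 200 ms to 1500 ms and trigger the check.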


Happy monitoring! May the bits be ever in your favor.




 
 
 


©2022 by andybravo.
