Allow for different time-based failure thresholds

Lately we have seen a system alert in the middle of the night after 2 failures in a row. Overnight we are more fault tolerant: we scale our services down a bit and know they will scale back up if demand exists. Because a scaling event takes a few minutes, 2 consecutive failures are bound to happen during those periods, and they trigger our incident.io alert flow. By the time the escalation is sent and acknowledged, the check has usually recovered because our systems have scaled up.

I know one option is to require 4 failures in a row instead of 2 all day long. But I'd really prefer a rule that lets me say: between 8 am and 8 pm, 2 failures in a row trigger an alert, and from 8 pm to 8 am, it takes 4 failures.
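To make the request concrete, here is a rough sketch of the rule I have in mind. It is only an illustration: the function names and constants are hypothetical, not an existing configuration option or API, and it assumes the time passed in is already in the team's local time zone.

```python
from datetime import time

# Hypothetical values taken from the request above:
# 2 failures in a row by day, 4 in a row overnight.
DAY_START = time(8, 0)    # 8 am, inclusive
DAY_END = time(20, 0)     # 8 pm, exclusive
DAY_THRESHOLD = 2
NIGHT_THRESHOLD = 4


def failure_threshold(local_time: time) -> int:
    """Return how many consecutive failures are required to alert at this time."""
    if DAY_START <= local_time < DAY_END:
        return DAY_THRESHOLD
    return NIGHT_THRESHOLD


def should_alert(consecutive_failures: int, local_time: time) -> bool:
    """True once the failure streak reaches the threshold for the current window."""
    return consecutive_failures >= failure_threshold(local_time)
```

With this rule, 2 failures in a row at 3 am would not alert, while the same 2 failures at noon would.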

I’m going to look into adding an alert counter in incident.io to help with this, but I feel this should be an alert configuration option rather than a workaround.

Status

In Review

Board

Feature Request

Tags

Alerting

Date

11 days ago

Author

Rick Clymer
