Severity | Critical / High / Medium / Low |
---|---|
Root cause | Describe the root cause of the issue, if discovered. |
Incident duration | 1.5 hours |
Time to detect the incident | MMMM DDD @ hh:mm a |
Time to mitigate the incident | MMMM DDD @ hh:mm a |
Keep it brief, 1-2 paragraphs.
What happened?
Keep it brief and direct, just a few sentences.
Who was impacted?
Keep it brief and direct, just a few sentences.
Why did it happen?
Keep it brief and direct, just a few sentences.
Product(s) affected:
List the properties and/or services affected.
User impact:
Keep it brief and direct, just a few sentences.
Detection:
Keep it brief and direct, just a few sentences.
Resolution:
Keep it brief and direct, just a few sentences.
Duration:
Describe duration, for example:
Approximately 1.5 hours (from 9:00 to 10:30 am on June 5th UTC).
Timeline (in UTC): (Example)
- June 5th, 9:00 am: Updown monitor reported outage
- June 5th, 9:10 am: HC notified client about the incident
- June 5th, 9:40 am: HC completed its review of the issue and began mitigation measures.
- June 5th, 9:50 am: Client notified HC of additional issues.
- June 5th, 10:05 am: HC deployed fix.
- June 5th, 10:05 am: HC confirmed the website was back up and running.
- June 5th, 10:15 am: Client confirmed the website was back up and running.
- June 5th, 10:30 am: HC completed comprehensive testing of the website.
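The duration figures reported in a post-mortem can be derived directly from timeline entries like the ones above. The following is a minimal sketch in Python, not a prescribed tool: the year and the names `utc`, `minutes_between`, `outage_reported`, `fix_deployed`, and `testing_completed` are assumptions introduced for illustration, since the example timeline gives only a month, day, and time.

```python
from datetime import datetime, timezone

# Assumption: the example timeline does not state a year, so one is picked here.
YEAR = 2024


def utc(month, day, hour, minute):
    """Build a timezone-aware UTC timestamp for a timeline entry."""
    return datetime(YEAR, month, day, hour, minute, tzinfo=timezone.utc)


def minutes_between(start, end):
    """Return the elapsed time between two timestamps in minutes."""
    return (end - start).total_seconds() / 60


# Entries taken from the example timeline (all times in UTC).
outage_reported = utc(6, 5, 9, 0)      # Updown monitor reported outage
fix_deployed = utc(6, 5, 10, 5)        # HC deployed fix
testing_completed = utc(6, 5, 10, 30)  # HC completed comprehensive testing

# Time from first alert to the deployed fix, in minutes.
time_to_mitigate_min = minutes_between(outage_reported, fix_deployed)

# Total incident duration, from first alert to completed testing, in hours.
incident_duration_hours = minutes_between(outage_reported, testing_completed) / 60

print(f"Time to mitigate: {time_to_mitigate_min:.0f} minutes")
print(f"Incident duration: {incident_duration_hours:.1f} hours")
```

Keeping every timestamp in UTC, as the timeline does, avoids off-by-an-hour errors when responders and clients sit in different time zones.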
Lessons learned:
List the lessons learned and areas for improvement.
- Lesson learned 1
- Lesson learned 2
- Lesson learned 3
What went wrong:
List the failures that led to the issue.
- Failure 1
- Failure 2
- Failure 3
What went well:
List the things that went well during the incident (mitigation plans, response time, etc.).
- Success 1
- Success 2
- Success 3
Where we were lucky:
Identify and list how the incident could have led to a bigger problem.
- Luck 1
- Luck 2
- Luck 3
Create a list of action items that we can take to prevent a similar incident from happening in the future.
- Action item 1
- Action item 2
- Action item 3