Skip to content

Instantly share code, notes, and snippets.

@mnichols
Last active December 5, 2017 06:24
Show Gist options
  • Save mnichols/219b0690bd7c8921332100ecd1762654 to your computer and use it in GitHub Desktop.
Save mnichols/219b0690bd7c8921332100ecd1762654 to your computer and use it in GitHub Desktop.
incident cause

Terms to lookup: -"Handler failed" this is the composed events handler error prefix

  • "Emailer" this is the notifications ''' {"level":"error","timestamp":"2017-12-04T20:05:40.109Z","logger":"conversations-service.notifications.email_unreads","caller":"notifications/email_unreads.go:109","msg":"Email batch failed","owner":"red","service":"conversations-service","error":"0 Emails failed to be sent","stacktrace":"github.com/InVisionApp/conversations-service/vendor/go.uber.org/zap.Stack\n\t/go/src/github.com/InVisionApp/conversations-service/vendor/go.uber.org/zap/field.go:191\ngithub.com/InVisionApp/conversations-service/vendor/go.uber.org/zap.(*Logger).check\n\t/go/src/github.com/InVisionApp/conversations-service/vendor/go.uber.org/zap/logger.go:301\ngithub.com/InVisionApp/conversations-service/vendor/go.uber.org/zap.(*Logger).Error\n\t/go/src/github.com/InVisionApp/conversations-service/vendor/go.uber.org/zap/logger.go:202\ngithub.com/InVisionApp/conversations-service/notifications.(*unreadsEmailer).handleCommentMade\n\t/go/src/github.com/InVisionApp/conversations-service/notifications/email_unreads.go:109\ngithub.com/InVisionApp/conversations-service/notifications.(*unreadsEmailer).HandleMessages\n\t/go/src/github.com/InVisionApp/conversations-service/notifications/email_unreads.go:77\ngithub.com/InVisionApp/conversations-service/queues.(*EventsWorker).handleMessage\n\t/go/src/github.com/InVisionApp/conversations-service/queues/events.go:96\ngithub.com/InVisionApp/conversations-service/queues.(*EventsWorker).Run\n\t/go/src/github.com/InVisionApp/conversations-service/queues/events.go:140\ngithub.com/InVisionApp/conversations-service/queues.(*PushableRunner).Run\n\t/go/src/github.com/InVisionApp/conversations-service/queues/runner.go:57"} msg:Email batch failed '''

then

{"level":"error","timestamp":"2017-12-04T21:42:48.327Z","logger":"conversations-service.queues.event_handlers","caller":"queues/event_handlers.go:42","msg":"Handler failed","owner":"red","service":"conversations-service","user_id":"26","team_id":"0","correlation_id":"cjasnm93100133i7fw90cwmka","event":"threadResolved","event_id":"cjaso371s00020779gqapig1t","event_provider_id":"cjaa1dxda00083i7f6lmxq5ub","event_payload":"{Event:           \"threadResolved\",\n EventProviderID: \"cjaa1dxda00083i7f6lmxq5ub\",\n CorrelationID:   \"cjasnm93100133i7fw90cwmka\",\n Revision:        112,\n Payload:         {ContextID:  \"cjaa1dxda00083i7f6lmxq5ub\",\n                   ThreadID:   \"cjao8am5y000f3i7fuql2cq07\",\n                   ResolverID: \"26\"},\n EventID:         \"cjaso371s00020779gqapig1t\",\n Timestamp:       0001-01-01 00:00:00 +0000 UTC,\n UserID:          \"26\",\n TeamID:          \"0\",\n Dispatched:      0}","error":"ThreadNotFound on context cjaa1dxda00083i7f6lmxq5ub with id cjao8am5y000f3i7fuql2cq07\n%!v(MISSING)","stacktrace":"github.com/InVisionApp/conversations-service/vendor/go.uber.org/zap.Stack\n\t/go/src/github.com/InVisionApp/conversations-service/vendor/go.uber.org/zap/field.go:191\ngithub.com/InVisionApp/conversations-service/vendor/go.uber.org/zap.(*Logger).check\n\t/go/src/github.com/InVisionApp/conversations-service/vendor/go.uber.org/zap/logger.go:301\ngithub.com/InVisionApp/conversations-service/vendor/go.uber.org/zap.(*Logger).Error\n\t/go/src/github.com/InVisionApp/conversations-service/vendor/go.uber.org/zap/logger.go:202\ngithub.com/InVisionApp/conversations-service/queues.(*instrumentedHandler).Handle\n\t/go/src/github.com/InVisionApp/conversations-service/queues/event_handlers.go:42\ngithub.com/InVisionApp/conversations-service/queues.(*instrumentedHandler).Handle-fm\n\t/go/src/github.com/InVisionApp/conversations-service/queues/event_handlers.go:84\ngithub.com/InVisionApp/conversations-service/messaging.(*Handlers).Handle\n\t/go/src/github.com/InVisionApp/conversations-service/messaging/handlers.go:31\ngithub.com/InVisionApp/conversations-service/queues.(*EventHandlers).HandleAll\n\t/go/src/github.com/InVisionApp/conversations-service/queues/event_handlers.go:92\ngithub.com/InVisionApp/conversations-service/queues.(*EventsWorker).handleMessage\n\t/go/src/github.com/InVisionApp/conversations-service/queues/events.go:81\ngithub.com/InVisionApp/conversations-service/queues.(*EventsWorker).Run\n\t/go/src/github.com/InVisionApp/conversations-service/queues/events.go:142\ngithub.com/InVisionApp/conversations-service/queues.(*PushableRunner).Run\n\t/go/src/github.com/InVisionApp/conversations-service/queues/runner.go:57"}

Note that the email_unreads error failed first. Looking at DD graphs something happened at 1:05 - 2:42 today that set in motion failures/discrepancies in BOTH commands and events. I didnt see any CPU spikes during that time on our API dashboard.

This logs show our service start 11secs before the handler for threadResolved failed.

{"level":"info","timestamp":"2017-12-04T21:42:37.732Z","logger":"conversations-service.queues.redrive_policy","caller":"queues/redrive_policy.go:26","msg":"Constructed RedrivePolicy","owner":"red","service":"conversations-service","queue_url":"https://sqs.us-east-2.amazonaws.com/416817738788/k8s-use1-prod-1-early-v6-cluster-conversations.fifo","dead_letter_arn":"arn:aws:sqs:us-east-2:416817738788:k8s-use1-prod-1-early-v6-cluster-conversations-deadletter.fifo","maxReceiveCount":"10"}

ref: https://invisionmain.loggly.com/search#terms=%22conversations.fifo%22&from=2017-12-04T20:05:40.109Z&until=2017-12-04T21:42:48.327Z&source_group=&filter=tag;use1-prod-1-cluster&filter=json.kubernetes.namespace_name;early-v7 ref: https://invisionmain.loggly.com/search#terms=%22Commands%20service%20enabled%22&from=2017-12-04T20:05:40.109Z&until=2017-12-04T21:42:48.327Z&source_group=&filter=tag;use1-prod-1-cluster&filter=json.kubernetes.namespace_name;early-v7 Shows our service restarting at the same time as the threadResolved event was failing. So something tore us down.

There are discrepancies in messages for commands queue at that time but there are zero errors from the commands queue logs.

I DID find this though: https://invisionmain.loggly.com/search#terms=%22index%20out%20of%20range%22&from=2017-12-04T20:05:40.109Z&until=2017-12-04T21:42:48.327Z&source_group=&filter=tag;use1-prod-1-cluster&filter=json.kubernetes.namespace_name;early-v7&filter=json.kubernetes.pod_name;conversations-service-deployment-4187827765-xwmhd

This came as a result of a DeleteComment command:

github.com/InVisionApp/conversations-service/queues.(*instrumentedHandlers).HandleCommands(0xc42071b100, 0xc42000e018, 0x1, 0x1, 0xc42071b100, 0x0)
ip-10-10-12-158.ec2.internal
/go/src/github.com/InVisionApp/conversations-service/queues/instrumented_handlers.go:56 +0xf9
ip-10-10-12-158.ec2.internal
github.com/InVisionApp/conversations-service/queues.(*instrumentedHandlers).HandleCommand(0xc42071b100, 0xc4203cb3e0, 0x0, 0x0)
ip-10-10-12-158.ec2.internal
/go/src/github.com/InVisionApp/conversations-service/messaging/handlers.go:31 +0x62
ip-10-10-12-158.ec2.internal
github.com/InVisionApp/conversations-service/messaging.(*Handlers).Handle(0xc4203cb2e0, 0xa36c80, 0xc420709140, 0x0, 0x0)
ip-10-10-12-158.ec2.internal
/go/src/github.com/InVisionApp/conversations-service/domain/handlers.go:41 +0x3e
ip-10-10-12-158.ec2.internal
github.com/InVisionApp/conversations-service/domain.(*Handlers).HandleCommand-fm(0xa36c80, 0xc420709140, 0x10, 0x9a505e)
ip-10-10-12-158.ec2.internal
/go/src/github.com/InVisionApp/conversations-service/domain/handlers.go:55 +0x36d
ip-10-10-12-158.ec2.internal
github.com/InVisionApp/conversations-service/domain.(*Handlers).HandleCommand(0xc4203cb300, 0xa36c80, 0xc420709140, 0x0, 0x0)
ip-10-10-12-158.ec2.internal
/go/src/github.com/InVisionApp/conversations-service/domain/handlers.go:130 +0xb7
ip-10-10-12-158.ec2.internal
github.com/InVisionApp/conversations-service/domain.(*Handlers).handleDeleteComment(0xc4203cb300, 0xc420709140, 0xeb3ec0, 0xc420709100, 0xc420064680, 0xc4201f6980)
ip-10-10-12-158.ec2.internal
/go/src/github.com/InVisionApp/conversations-service/domain/contexts/context.go:38 +0xbd5
ip-10-10-12-158.ec2.internal
github.com/InVisionApp/conversations-service/domain/contexts.(*Context).DeleteComment(0xc4204310b0, 0xc4201a7d00, 0x19, 0xc4201897a0, 0x2, 0xeb3ec0, 0xc420709100, 0x1, 0xc420430fc0)
ip-10-10-12-158.ec2.internal
goroutine 187 [running]:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment