FES consumes events from the `fes-async-in` and `sts-delivery-tracking-events` Kafka event streams. The latter has never posed a problem, so this document focuses on the former. Our order management systems send and receive Kafka messages through `fes-async-in` to communicate with our fulfillment management system. Prior to the solution, we in FMS would occasionally receive a Kafka message that we couldn't consume because of a bug, bad data, etc. When this happened, our Kafka consumer retried the message again and again, indefinitely. This "just try again" policy is great for transient errors, such as database locking. However, it is a problem when the error isn't transient: every message behind the failing one is blocked and will never be processed. This is what we call a "stuck message".
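
To make the failure mode concrete, here is a minimal simulation of an in-order consumer with a retry-on-failure policy. It is only a sketch and does not use Kafka or any FES code: the names `consume_in_order` and `make_handler`, the message values, and the `max_attempts` cap are all hypothetical. A true retry-forever consumer has no cap; the cap exists here only so the simulation terminates.

```python
from collections import deque


def consume_in_order(messages, handler, max_attempts):
    """Process messages strictly in order, retrying the head message on failure.

    Returns (processed, blocked). `max_attempts` is an assumption added so the
    demo ends; a retry-forever consumer would loop on the head message forever.
    """
    queue = deque(messages)
    processed = []
    while queue:
        message = queue[0]
        for _ in range(max_attempts):
            try:
                handler(message)
            except Exception:
                continue  # "just try again"
            else:
                processed.append(queue.popleft())
                break
        else:
            # Every attempt failed: the head message is stuck, and everything
            # queued behind it is blocked.
            return processed, list(queue)
    return processed, []


def make_handler():
    """Hypothetical handler: one transient failure and one permanent failure."""
    lock_failures = {"order-2": 2}  # fails twice (e.g. a database lock), then succeeds

    def handler(message):
        if message == "order-3":
            raise ValueError("bad data")  # non-transient: fails on every attempt
        if lock_failures.get(message, 0) > 0:
            lock_failures[message] -= 1
            raise RuntimeError("database locked")  # transient

    return handler


processed, blocked = consume_in_order(
    ["order-1", "order-2", "order-3", "order-4", "order-5"],
    make_handler(),
    max_attempts=5,
)
print("processed:", processed)
print("blocked:", blocked)
```

In this sketch the transient failure on `order-2` clears after a couple of retries, which is the case the retry policy handles well. The permanent failure on `order-3` never clears, so `order-4` and `order-5` are never reached even though nothing is wrong with them.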
Inspired by this Uber blog post, we wrote [special "punt" code](https://github.com/blueapron/fulfillment-engine/pull