This method allows performing other methods as batch operations.
https://yourdomain.com/api/batch.submit/{method}
Its body should be an array of valid inputs to the respective method.
For example, /batch.submit/user.update would accept the following request:
[
{
"user_id": "wLQPaRVUME7f",
"first_name": "John"
},
{
"user_id": "qwIHFEihqw15",
"last_name": "Doe",
"language": "en"
}
]
A successful response returns the ID of the job running the batch operation, which can be used to query the status of the operation afterwards.
{
"status": "success",
"data": {
"job_id": "agDjGj24Gdjk"
}
}
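For illustration, here is a minimal client-side sketch in TypeScript that submits a user.update batch and extracts the job_id. It assumes a global fetch (Node 18+ or browser) and bearer-token authentication; neither the auth scheme nor the function names below are part of the API described here.

type SubmitResponse = {
  status: "success" | "error";
  data?: { job_id: string };
  error?: string;
};

// Submit an array of user.update inputs and return the job_id of the batch job.
async function submitBatch(entries: object[], token: string): Promise<string> {
  const res = await fetch("https://yourdomain.com/api/batch.submit/user.update", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumption: bearer-token auth; the real auth mechanism is not specified here.
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify(entries),
  });
  const body = (await res.json()) as SubmitResponse;
  if (body.status !== "success" || !body.data) {
    throw new Error(`batch.submit failed: ${body.error ?? "unknown error"}`);
  }
  return body.data.job_id;
}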
This method allows querying the status of a previously submitted batch operation.
https://yourdomain.com/api/batch.status
Argument | Required | Description |
---|---|---|
job_id | Required | The ID obtained after submitting the respective batch operation |
{
"job_id": "agDjGj24Gdjk"
}
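A client would typically poll this endpoint until the job reaches a terminal state. A hypothetical polling sketch follows; the 5-second interval, the bearer-token auth, and treating finished and canceled as the terminal states are all assumptions, not part of the spec.

type StatusResponse = {
  status: "success" | "error";
  data?: { state: string; [key: string]: unknown };
  error?: string;
};

// Poll batch.status until the job reaches a terminal state and return the final data.
async function waitForJob(jobId: string, token: string): Promise<{ state: string }> {
  while (true) {
    const res = await fetch("https://yourdomain.com/api/batch.status", {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
      body: JSON.stringify({ job_id: jobId }),
    });
    const body = (await res.json()) as StatusResponse;
    if (body.status !== "success" || !body.data) {
      throw new Error(`batch.status failed: ${body.error ?? "unknown error"}`);
    }
    if (body.data.state === "finished" || body.data.state === "canceled") {
      return body.data;
    }
    // Assumed interval; could be adapted based on estimated_end (see the running state below).
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}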
There are 4 possible states for a job, each with associated data:
The job has not yet started to be processed.
{
"status": "success",
"data": {
"state": "pending",
}
}
The job is currently being processed.
{
"status": "success",
"data": {
"state": "running",
"started_at": "2019-11-13T10:00:41.188Z",
"succeeded": 302,
"failed": 2,
"estimated_end": "2019-11-13T12:23:41.188Z"
}
}
TBD
The job has finished and the result for each individual operation will be included in the same order as received. For more information, check the expected response for the respective method being processed in batch.
For example, the finished state for a batch of user.create operations would be:
{
"status": "success",
"data": {
"state": "finished",
"finished_at": "2019-11-13T10:00:41.188Z",
"results": [
{
"status": "success",
"data": { "user_id": "wLQPaRVUME7f" }
},
{
"status": "error",
"error": "login_id_exists"
},
{
"status": "success",
"data": { "user_id": "FqhqHF24Ghks" }
}
]
}
}
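Since results are returned in the same order as the submitted entries, a client can pair them back up by index. A small hypothetical sketch (names are illustrative only):

type BatchResult =
  | { status: "success"; data: Record<string, unknown> }
  | { status: "error"; error: string };

// Pair each result with the entry that produced it, relying on the order guarantee,
// and report the entries that failed along with their error codes.
function reportFailures(entries: object[], results: BatchResult[]): void {
  results.forEach((result, index) => {
    if (result.status === "error") {
      console.error(`Entry ${index} failed with ${result.error}:`, entries[index]);
    }
  });
}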
Here I'll add some points that were discussed in Slack:
- I added an estimated_end field to the running state so clients don't need to constantly poll the status; it could be calculated by taking the average time per operation so far and extrapolating for the rest of the batch (see the sketch at the end of these notes). I don't know if this is necessary or desirable.
- There's still no code in api2 for adding a job to the queue; it'll need to be implemented.
- Decoding the body should be easy to parameterize, so that we can reuse the same method when uploading a CSV file instead of JSON.
- The JSON should be parsed and validated before returning the job ID, to avoid queuing a job filled with nothing but errors.
- Not necessary at this stage, but it would be nice to refactor the UI handlers so that we can connect them easily.
- The rate limiter for batch operations needs different params. We can refactor the existing one to allow customization or add it directly in the network layer. Would the limit be applied for all batch operations or per method? What would be the params?
- We'll need to increase the size limit for requests (currently we use the default 100KB), maybe only for the batch methods. We can take a look at previous manual jobs to define an initial limit, and log the size of batch requests so we can track and adjust if necessary.
- Maybe it would be interesting to add a /batch.abort method? Although it could be difficult to implement.
- We need a way to keep track of jobs, so we'll need to log updates while they're running (I don't know if per entry).
- If a job gets stuck, it may need to be aborted (either automatically by a timeout, through the API if we implement it, or manually). That makes it necessary to use transactions at least at the command level so that, if it gets aborted, it doesn't leave the DB in an inconsistent state. It's also helpful to have enough information in the update logs to know the state of the job prior to the cancellation (which entries were processed, which failed and which succeeded).
- In which situations would a job be put into the canceled state? When aborted while running? When aborted while pending?
- What should the logs include? The whole request (filtering out sensitive data such as passwords) and the whole response (basically reusing the logs that already exist in the current route handlers, mapped over each entry), or some kind of custom summary? Is there a limit in Papertrail that we need to take into account? What kind of information would be needed to allow compiling stats for the orgs?
- Since the amount of data to be stored in the response is huge, we'll need to set some kind of expiration date so it gets cleaned from Redis. Should we include this date (or an estimate) somewhere in the statuses? What would be a sensible duration?
- Is it enough to keep the results in the same order, or do we need a more direct way to match each result to its entry? By the way, are the operations going to be performed sequentially? If not, we'll need another representation for sure.
- The status updates will need to be persisted in Redis so they're accessible to the /batch.status query.
- A possible representation for the statuses is:
type jobStatus('event, 'error) =
  | Pending(submittedTime, submitter, id, command) /* queued, not yet started */
  | Running(submittedTime, submitter, id, command, startedTime, percentage, list(result('event, 'error))) /* in progress, with results so far */
  | Cancelled(submittedTime, submitter, id, command, cancelledTime) /* aborted before completion */
  | Finished(submittedTime, submitter, id, command, finishedTime, list(result('event, 'error))); /* done, one result per entry */
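Regarding the estimated_end point above, here is a rough sketch of the extrapolation (average time per processed operation so far, projected over the remaining entries). All names are hypothetical; nothing like this exists in api2 yet.

// Estimate when a running job will finish by extrapolating from the average
// time per operation processed so far. Returns null if nothing has been processed yet.
function estimatedEnd(startedAt: Date, processed: number, total: number): Date | null {
  if (processed === 0) return null;
  const elapsedMs = Date.now() - startedAt.getTime();
  const avgMsPerOperation = elapsedMs / processed;
  const remaining = total - processed;
  return new Date(Date.now() + avgMsPerOperation * remaining);
}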