Created January 27, 2021 22:25
sort the given jsonl file by the given key, writing the output to STDOUT.
"""
sort the given jsonl document (distinct json documents separated by newlines)
by the given key, writing the output to STDOUT.

example:

    python sort-jsonl-by-key.py log.jsonl "timestamp"

this requires reading the entire document into memory first.
a future revision could use mmap to avoid keeping everything in memory.
"""
import re
import sys
import json

with open(sys.argv[1], "rb") as f:
    buf = f.read().decode("utf-8")

key = sys.argv[2]

# collect (key value, start offset, end offset) for each non-empty line,
# so that only small tuples are sorted rather than copies of the lines.
lines = []
for match in re.finditer(r"^(.*)$", buf, re.M):
    line = match.group(0)
    if not line.strip():
        # skip blank lines, such as the one after a trailing newline
        continue

    linedoc = json.loads(line)
    linekey = linedoc[key]
    lines.append((linekey, match.start(), match.end()))

# tuples sort by key value first; ties fall back to the original file order
lines.sort()

for _, start, end in lines:
    print(buf[start:end])
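
The docstring suggests that a future revision could use mmap. The following is a minimal sketch of what that might look like, not the author's implementation. The function name `sort_jsonl_by_key` and its `out` parameter are assumptions made for illustration. It maps the file read-only, records only each line's key and byte offsets, and then writes the lines out in sorted order. The index of keys and offsets still lives in memory, so this reduces memory use rather than eliminating it.

```python
import json
import mmap


def sort_jsonl_by_key(path, key, out):
    """Write the lines of the JSONL file at `path` to the binary stream `out`, sorted by `key`."""
    with open(path, "rb") as f:
        # mmap cannot map an empty file, so handle that case up front.
        if f.seek(0, 2) == 0:
            return

        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            index = []  # (key value, start offset, end offset)
            size = len(buf)
            start = 0
            while start < size:
                end = buf.find(b"\n", start)
                if end == -1:
                    end = size  # final line without a trailing newline

                line = buf[start:end]
                if line.strip():
                    # json.loads accepts UTF-8 encoded bytes directly
                    index.append((json.loads(line)[key], start, end))
                start = end + 1

            index.sort()

            for _, start, end in index:
                # drop a trailing carriage return so output uses plain "\n"
                out.write(buf[start:end].rstrip(b"\r"))
                out.write(b"\n")
```

To mirror the original command line, this could be called as `sort_jsonl_by_key(sys.argv[1], sys.argv[2], sys.stdout.buffer)` after `import sys`. Writing bytes to `sys.stdout.buffer` avoids decoding and re-encoding each line.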