Skip to content

Crash in Python/assemble.c:301: write_location_info_entry: Assertion `column >= -1' failed. #130775

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
yijan4845 opened this issue Mar 3, 2025 · 11 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@yijan4845
Copy link

yijan4845 commented Mar 3, 2025

Crash report

What happened?

Bug Description

This is a bug that only affects DEBUG builds.

The reproducer is as follow:

import ast

tree = ast.Module(body=[
    ast.Import(names=[ast.alias(name='traceback', lineno=0, col_offset=0)], lineno=0, col_offset=-2)
], type_ignores=[])

compile(tree, "<string>", "exec")

This code fails with:

python: ../Python/assemble.c:301: write_location_info_entry: Assertion `column >= -1' failed.
fish: Job 1, '/home/yijan/Tools/cpython/debug…' terminated by signal SIGABRT (Abort)

backtrace

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff6c4527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff6c288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff6c2881b in __assert_fail_base (fmt=0x7ffff6dd01e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x555557212940 "column >= -1", 
    file=file@entry=0x555557211f60 "../Python/assemble.c", line=line@entry=301, function=function@entry=0x555557213040 <__PRETTY_FUNCTION__.11> "write_location_info_entry") at ./assert/assert.c:96
#6  0x00007ffff6c3b517 in __assert_fail (assertion=assertion@entry=0x555557212940 "column >= -1", file=file@entry=0x555557211f60 "../Python/assemble.c", line=line@entry=301, 
    function=function@entry=0x555557213040 <__PRETTY_FUNCTION__.11> "write_location_info_entry") at ./assert/assert.c:105
#7  0x0000555556bc1760 in write_location_info_entry (a=a@entry=0x7ffff4727fb0, loc=..., isize=isize@entry=6) at ../Python/assemble.c:301
#8  0x0000555556bc1974 in assemble_emit_location (a=a@entry=0x7ffff4727fb0, loc=..., isize=isize@entry=6) at ../Python/assemble.c:336
#9  0x0000555556bc1d36 in assemble_location_info (a=a@entry=0x7ffff4727fb0, instrs=instrs@entry=0x7ffff48c3840, firstlineno=<optimized out>) at ../Python/assemble.c:355
#10 0x0000555556bc487b in assemble_emit (a=a@entry=0x7ffff4727fb0, instrs=instrs@entry=0x7ffff48c3840, first_lineno=<optimized out>, const_cache=const_cache@entry=0x50800009e040) at ../Python/assemble.c:433
#11 0x0000555556bc4cd9 in _PyAssemble_MakeCodeObject (umd=umd@entry=0x51900008e710, const_cache=const_cache@entry=0x50800009e040, consts=consts@entry=0x5070001d8d80, maxdepth=maxdepth@entry=2, 
    instrs=instrs@entry=0x7ffff48c3840, nlocalsplus=nlocalsplus@entry=0, code_flags=<optimized out>, filename=<optimized out>) at ../Python/assemble.c:752
#12 0x0000555556cf6ba8 in optimize_and_assemble_code_unit (u=u@entry=0x51900008e390, const_cache=const_cache@entry=0x50800009e040, code_flags=code_flags@entry=0, filename=filename@entry=0x50700011db30)
    at ../Python/compile.c:1341
#13 0x0000555556cfe374 in _PyCompile_OptimizeAndAssemble (c=c@entry=0x50b000075480, addNone=addNone@entry=1) at ../Python/compile.c:1369
#14 0x0000555556cfe44c in compiler_mod (c=c@entry=0x50b000075480, mod=mod@entry=0x5250000231e8) at ../Python/compile.c:825
--Type <RET> for more, q to quit, c to continue without paging--
#15 0x0000555556cfe4da in _PyAST_Compile (mod=mod@entry=0x5250000231e8, filename=filename@entry=0x50700011db30, pflags=pflags@entry=0x7ffff4727b40, optimize=optimize@entry=-1, arena=arena@entry=0x5080001299b0)
    at ../Python/compile.c:1382
#16 0x0000555556bf755c in builtin_compile_impl (module=module@entry=0x508000049f40, source=source@entry=0x5070001d8af0, filename=<optimized out>, mode=mode@entry=0x5070000459c8 "exec", flags=flags@entry=0, 
    dont_inherit=dont_inherit@entry=0, optimize=<optimized out>, feature_version=<optimized out>) at ../Python/bltinmodule.c:868
#17 0x0000555556bf7e98 in builtin_compile (module=<optimized out>, args=<optimized out>, args@entry=0x529000005280, nargs=nargs@entry=3, kwnames=kwnames@entry=0x0) at ../Python/clinic/bltinmodule.c.h:363
#18 0x000055555692f626 in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x50800004a540, args=0x529000005280, nargsf=<optimized out>, kwnames=0x0) at ../Objects/methodobject.c:452
#19 0x00005555567fb2e6 in _PyObject_VectorcallTstate (tstate=0x55555867f8f8 <_PyRuntime+329752>, callable=callable@entry=0x50800004a540, args=args@entry=0x529000005280, nargsf=9223372036854775811, 
    kwnames=kwnames@entry=0x0) at ../Include/internal/pycore_call.h:167
#20 0x00005555567fb42f in PyObject_Vectorcall (callable=callable@entry=0x50800004a540, args=args@entry=0x529000005280, nargsf=<optimized out>, kwnames=kwnames@entry=0x0) at ../Objects/call.c:327
#21 0x0000555556c21af5 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x55555867f8f8 <_PyRuntime+329752>, frame=frame@entry=0x529000005220, throwflag=throwflag@entry=0) at ../Python/generated_cases.c.h:1023
#22 0x0000555556c9d225 in _PyEval_EvalFrame (tstate=tstate@entry=0x55555867f8f8 <_PyRuntime+329752>, frame=frame@entry=0x529000005220, throwflag=throwflag@entry=0) at ../Include/internal/pycore_ceval.h:116
#23 0x0000555556c9d743 in _PyEval_Vector (tstate=tstate@entry=0x55555867f8f8 <_PyRuntime+329752>, func=func@entry=0x51000003d560, locals=locals@entry=0x50800009c340, args=args@entry=0x0, 
    argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at ../Python/ceval.c:1921
#24 0x0000555556c9da3e in PyEval_EvalCode (co=co@entry=0x514000033450, globals=globals@entry=0x50800009c340, locals=locals@entry=0x50800009c340) at ../Python/ceval.c:660
#25 0x0000555556e22f41 in run_eval_code_obj (tstate=tstate@entry=0x55555867f8f8 <_PyRuntime+329752>, co=co@entry=0x514000033450, globals=globals@entry=0x50800009c340, locals=locals@entry=0x50800009c340)
    at ../Python/pythonrun.c:1338
#26 0x0000555556e23378 in run_mod (mod=mod@entry=0x5250000191e0, filename=filename@entry=0x50b000041080, globals=globals@entry=0x50800009c340, locals=locals@entry=0x50800009c340, 
    flags=flags@entry=0x7ffff4927e30, arena=arena@entry=0x5080000c27b0, interactive_src=0x0, generate_new_source=0) at ../Python/pythonrun.c:1423
#27 0x0000555556e25e70 in pyrun_file (fp=fp@entry=0x51500000fa80, filename=filename@entry=0x50b000041080, start=start@entry=257, globals=globals@entry=0x50800009c340, locals=locals@entry=0x50800009c340, 
    closeit=closeit@entry=1, flags=0x7ffff4927e30) at ../Python/pythonrun.c:1256
#28 0x0000555556e27d8b in _PyRun_SimpleFileObject (fp=fp@entry=0x51500000fa80, filename=filename@entry=0x50b000041080, closeit=closeit@entry=1, flags=flags@entry=0x7ffff4927e30) at ../Python/pythonrun.c:491
#29 0x0000555556e280de in _PyRun_AnyFileObject (fp=fp@entry=0x51500000fa80, filename=filename@entry=0x50b000041080, closeit=closeit@entry=1, flags=flags@entry=0x7ffff4927e30) at ../Python/pythonrun.c:78
#30 0x0000555556eb16f3 in pymain_run_file_obj (program_name=program_name@entry=0x50b000041130, filename=filename@entry=0x50b000041080, skip_source_first_line=<optimized out>) at ../Modules/main.c:410
#31 0x0000555556eb19cd in pymain_run_file (config=config@entry=0x55555864aa88 <_PyRuntime+113064>) at ../Modules/main.c:429
#32 0x0000555556eb4731 in pymain_run_python (exitcode=exitcode@entry=0x7ffff4640760) at ../Modules/main.c:697
#33 0x0000555556eb48f8 in Py_RunMain () at ../Modules/main.c:776
#34 0x0000555556eb4b0d in pymain_main (args=args@entry=0x7ffff470a0a0) at ../Modules/main.c:806
#35 0x0000555556eb4e92 in Py_BytesMain (argc=2, argv=0x7fffffffda78) at ../Modules/main.c:830
#36 0x0000555556590ba6 in main (argc=<optimized out>, argv=<optimized out>) at ../Programs/python.c:15

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a4+ (heads/main-dirty:75f59bb6293, Jan 23 2025, 22:33:08) [GCC 13.3.0]

Linked PRs

@yijan4845 yijan4845 added the type-crash A hard crash of the interpreter, possibly with a core dump label Mar 3, 2025
@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Mar 3, 2025
@picnixz
Copy link
Member

picnixz commented Mar 3, 2025

I'm tempted to raise when constructing artificial AST nodes, but I'm not sure I'm not sure whether it could break code. Maybe people are using fake AST nodes with bad location values just for their own needs, like for a sentinel value. OTOH, checking the consistency of all AST nodes in compile may seem an overkill. I don't have enough insight on that part of the codebase to know how to avoid that assertion.

@JelleZijlstra any recommendation here?

@ZeroIntensity
Copy link
Member

Is this causing a problem in practice, or just when fuzzing?

@picnixz
Copy link
Member

picnixz commented Mar 3, 2025

Is this causing a problem in practice, or just when fuzzing?

In practice, it's causing an assertion failure and a crash. So yes, it could be a problem in practice for static analysis tools that may perhaps incorrectly determine some values.

In addition, it may cause issues since without the assertion, the location data would be incorrect and we could end up with incorrect bytecode.

@sobolevn
Copy link
Member

sobolevn commented Mar 3, 2025

Or maybe convert assert to a regular exception?

    if (column < -1 || end_column < -1) {
        const char *msg = column < -1 ? "column" : "end_column";
        int value = column < -1 ? column : end_column;
        PyErr_Format(PyExc_ValueError,
                     "invalid %s value: %d",
                     msg, value);
        return ERROR;
    }

Will produce:

» ./python.exe
Python 3.14.0a5+ (heads/main-dirty:8f11af45de6, Mar  3 2025, 14:19:56) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ast
... 
... tree = ast.Module(body=[
...     ast.Import(names=[ast.alias(name='traceback', lineno=0, col_offset=0)], lineno=0, \
col_offset=-2)
... ], type_ignores=[])
... 
... compile(tree, "<string>", "exec")
... 
Traceback (most recent call last):
  File "<python-input-0>", line 7, in <module>
    compile(tree, "<string>", "exec")
    ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid column value: -2

@picnixz
Copy link
Member

picnixz commented Mar 3, 2025

Yes, but my question was: should we do it when we construct the AST nodes or should we do it in assemble.c?

@sobolevn
Copy link
Member

sobolevn commented Mar 3, 2025

In my opinion, it is better to do in assemble.c, but I am not a big expert in this module.

@picnixz
Copy link
Member

picnixz commented Mar 3, 2025

I can think of some advantages of doing in either module:

  • in AST: traceback may be nicer, and users usually craft AST nodes rather than bytecode. Probably where errors would arise the most (like, wrongly crafting an AST node with a negative column value is something I can totally imagine)
  • in assemble: we only need one check at that place. It could prevent footguns from other places I'm not aware of and that can hit that line (I don't remember whether it's possible to craft a wrong bytecode explicitly and assemble it without passing by a pure python module or if requires the C API for that)

@JelleZijlstra
Copy link
Member

I think it's better to do it in the assembler. There are other things you can do with ASTs than compiling, and for those it may make sense to use invalid column values. For example, if you're constructing an AST and then stringifying it with something like ast.unparse (for example, for a lint autofix), you may want to use -1 as a column value because you don't have a real one.

@picnixz
Copy link
Member

picnixz commented Mar 3, 2025

column = -1 seems valid according to both the assertion and the way we internally represent such locations, but I think it would be good to explicitly reject column = -2. I don't know if I have time for working on this one so anyone else can pick it up before I assign the issue to myself (for future contributors, just drop a comment say that you're working on it)

@sobolevn sobolevn self-assigned this Mar 3, 2025
@sobolevn
Copy link
Member

sobolevn commented Mar 3, 2025

I now fully sure that adding this to assemble.c is better, because we don't validate any other values in ast, for example, importing an empty module name:

>>> import ast
... 
... tree = ast.Module(body=[
...     ast.Import(names=[ast.alias(name='', lineno=0, col_offset=0)], lineno=0, col_offset=-1)
... ], type_ignores=[])
... 
>>> eval(compile(tree, "<string>", "exec"))
Traceback (most recent call last):
  File "<python-input-10>", line 1, in <module>
    eval(compile(tree, "<string>", "exec"))
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 0, in <module>
ValueError: Empty module name

I have a PR with the change above.

@sobolevn
Copy link
Member

sobolevn commented Mar 3, 2025

Sorry, I was not really correct with my phrase:

because we don't validate any other values in ast

We do validate some values, for example, locations:

cpython/Python/ast.c

Lines 29 to 48 in b3c18bf

#define VALIDATE_POSITIONS(node) \
if (node->lineno > node->end_lineno) { \
PyErr_Format(PyExc_ValueError, \
"AST node line range (%d, %d) is not valid", \
node->lineno, node->end_lineno); \
return 0; \
} \
if ((node->lineno < 0 && node->end_lineno != node->lineno) || \
(node->col_offset < 0 && node->col_offset != node->end_col_offset)) { \
PyErr_Format(PyExc_ValueError, \
"AST node column range (%d, %d) for line range (%d, %d) is not valid", \
node->col_offset, node->end_col_offset, node->lineno, node->end_lineno); \
return 0; \
} \
if (node->lineno == node->end_lineno && node->col_offset > node->end_col_offset) { \
PyErr_Format(PyExc_ValueError, \
"line %d, column %d-%d is not a valid range", \
node->lineno, node->col_offset, node->end_col_offset); \
return 0; \
}

So, it might be a good idea to fix this macro. It does not make any sense to use -2 as the value for any locations.

sobolevn added a commit to sobolevn/cpython that referenced this issue Mar 3, 2025
sobolevn added a commit that referenced this issue Apr 7, 2025
sobolevn added a commit to sobolevn/cpython that referenced this issue Apr 7, 2025
…30795)

(cherry picked from commit bc5233b)

Co-authored-by: sobolevn <[email protected]>
Co-authored-by: Victor Stinner <[email protected]>
Yhg1s pushed a commit that referenced this issue Apr 8, 2025
)

(cherry picked from commit bc5233b)

Co-authored-by: Victor Stinner <[email protected]>
sobolevn added a commit to sobolevn/cpython that referenced this issue Apr 8, 2025
…30795)

(cherry picked from commit bc5233b)

Co-authored-by: sobolevn <[email protected]>
Co-authored-by: Victor Stinner <[email protected]>
Yhg1s pushed a commit that referenced this issue Apr 8, 2025
)

(cherry picked from commit bc5233b)

Co-authored-by: sobolevn <[email protected]>
Co-authored-by: Victor Stinner <[email protected]>
@picnixz picnixz closed this as completed Apr 8, 2025
seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

5 participants