public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
@ 2023-11-06 21:00 sjames at gcc dot gnu.org
  2023-11-06 21:00 ` [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA " sjames at gcc dot gnu.org
                   ` (55 more replies)
  0 siblings, 56 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-06 21:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

            Bug ID: 112415
           Summary: [14 regression] Python 3.11 miscompiled with new RTL
                    fold mem offset pass, since r14-4664-g04c9cf5c786b94
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sjames at gcc dot gnu.org
                CC: danglin at gcc dot gnu.org, manolis.tsamis at vrull dot eu
  Target Milestone: ---

I've bisected this twice and come to r14-4664-g04c9cf5c786b94 ('Implement new
RTL optimizations pass: fold-mem-offsets'). -fno-fold-mem-offsets makes things
work.
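
For readers unfamiliar with the pass, here is a minimal C-level sketch of the kind of rewrite fold-mem-offsets aims for: an add that only feeds memory accesses is removed and its constant is merged into each access's displacement. The function names and the constants are hypothetical and chosen only for illustration; the real pass works on RTL, not on C source.

```c
#include <assert.h>
#include <stdint.h>

/* Unfolded form: an explicit add creates a temporary base and the loads
   use small displacements from it. */
int32_t sum_unfolded(const int32_t *base)
{
    const int32_t *tmp = base + 4;   /* add: tmp = base + 16 bytes */
    return tmp[0] + tmp[1];          /* loads at 0(tmp) and 4(tmp) */
}

/* Folded form: the add is gone and its constant has been merged into
   each load's displacement. */
int32_t sum_folded(const int32_t *base)
{
    return base[4] + base[5];        /* loads at 16(base) and 20(base) */
}

/* A valid fold must leave the result unchanged for every input. */
void check_fold(const int32_t *base)
{
    assert(sum_unfolded(base) == sum_folded(base));
}
```

A miscompile from such a pass would show up as the two forms disagreeing, for example because a use of the temporary was missed when the add was removed.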

Python 3.11.6 fails to build on HPPA since that commit with the built-Python
segfaulting during the build.

```
hppa2.0-unknown-linux-gnu-gcc -c -Wsign-compare -DNDEBUG     -O2 -pipe
-march=2.0 -fdiagnostics-color=always -frecord-gcc-switches -ggdb3 -fwrapv
-std=c11 -Wextra -Wno-unused-parameter -Wno-missing-field-initializers
-Wstrict-prototypes -Werror=implicit-function-declaration
-fvisibility=hidden  -I./Include/internal  -I. -I./Include
-I/usr/include/ncursesw  -fPIC -DPy_BUILD_CORE -o Python/frozen.o
Python/frozen.c
./_bootstrap_python ./Tools/scripts/deepfreeze.py \
Python/frozen_modules/importlib._bootstrap.h:importlib._bootstrap \
Python/frozen_modules/importlib._bootstrap_external.h:importlib._bootstrap_external
\
Python/frozen_modules/zipimport.h:zipimport \
Python/frozen_modules/abc.h:abc \
Python/frozen_modules/codecs.h:codecs \
Python/frozen_modules/io.h:io \
Python/frozen_modules/_collections_abc.h:_collections_abc \
Python/frozen_modules/_sitebuiltins.h:_sitebuiltins \
Python/frozen_modules/genericpath.h:genericpath \
Python/frozen_modules/ntpath.h:ntpath \
Python/frozen_modules/posixpath.h:posixpath \
Python/frozen_modules/os.h:os \
Python/frozen_modules/site.h:site \
Python/frozen_modules/stat.h:stat \
Python/frozen_modules/importlib.util.h:importlib.util \
Python/frozen_modules/importlib.machinery.h:importlib.machinery \
Python/frozen_modules/runpy.h:runpy \
Python/frozen_modules/__hello__.h:__hello__ \
Python/frozen_modules/__phello__.h:__phello__ \
Python/frozen_modules/__phello__.ham.h:__phello__.ham \
Python/frozen_modules/__phello__.ham.eggs.h:__phello__.ham.eggs \
Python/frozen_modules/__phello__.spam.h:__phello__.spam \
Python/frozen_modules/frozen_only.h:frozen_only \
-o Python/deepfreeze/deepfreeze.c
make: *** [Makefile:1298: Python/deepfreeze/deepfreeze.c] Segmentation fault
(core dumped)
make: *** Waiting for unfinished jobs....
hppa2.0-unknown-linux-gnu-gcc -c -I./Modules/_decimal/libmpdec -DCONFIG_32=1
-DANSI=1 -Wsign-compare -DNDEBUG     -O2 -pipe -march=2.0
-fdiagnostics-color=always -frecord-gcc-switches -ggdb3 -fwrapv -std=c11
-Wextra -Wno-unused-parameter -Wno-missing-field-initializers
-Wstrict-prototypes -Werror=implicit-function-declaration -fvisibility=hidden 
-I./Include/internal  -I. -I./Include -I/usr/include/ncursesw  -fPIC -fPIC -o
Modules/_decimal/libmpdec/mpdecimal.o ./Modules/_decimal/libmpdec/mpdecimal.c
 * ERROR: dev-lang/python-3.11.6::gentoo failed (compile phase):
 *   emake failed
```

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
@ 2023-11-06 21:00 ` sjames at gcc dot gnu.org
  2023-11-06 21:01 ` sjames at gcc dot gnu.org
                   ` (54 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-06 21:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Sam James <sjames at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[14 regression] Python 3.11 |[14 regression] Python 3.11
                   |miscompiled with new RTL    |miscompiled on HPPA with
                   |fold mem offset pass, since |new RTL fold mem offset
                   |r14-4664-g04c9cf5c786b94    |pass, since
                   |                            |r14-4664-g04c9cf5c786b94

--- Comment #1 from Sam James <sjames at gcc dot gnu.org> ---
Backtrace from the crashing Python:
```
(gdb) r
Starting program:
/var/tmp/portage/dev-lang/python-3.11.6/work/Python-3.11.6/_bootstrap_python
./Tools/scripts/deepfreeze.py
Python/frozen_modules/importlib._bootstrap.h:importlib._bootstrap
Python/frozen_modules/importlib._bootstrap_external.h:importlib._bootstrap_external
Python/frozen_modules/zipimport.h:zipimport Python/frozen_modules/abc.h:abc
Python/frozen_modules/codecs.h:codecs Python/frozen_modules/io.h:io
Python/frozen_modules/_collections_abc.h:_collections_abc
Python/frozen_modules/_sitebuiltins.h:_sitebuiltins
Python/frozen_modules/genericpath.h:genericpath
Python/frozen_modules/ntpath.h:ntpath
Python/frozen_modules/posixpath.h:posixpath Python/frozen_modules/os.h:os
Python/frozen_modules/site.h:site Python/frozen_modules/stat.h:stat
Python/frozen_modules/importlib.util.h:importlib.util
Python/frozen_modules/importlib.machinery.h:importlib.machinery
Python/frozen_modules/runpy.h:runpy Python/frozen_modules/__hello__.h:__hello__
Python/frozen_modules/__phello__.h:__phello__
Python/frozen_modules/__phello__.ham.h:__phello__.ham
Python/frozen_modules/__phello__.ham.eggs.h:__phello__.ham.eggs
Python/frozen_modules/__phello__.spam.h:__phello__.spam
Python/frozen_modules/frozen_only.h:frozen_only -o
Python/deepfreeze/deepfreeze.c
warning: File "/usr/lib/libthread_db.so.1" auto-loading has been declined by
your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /usr/lib/libthread_db.so.1
line to your configuration file "/root/.config/gdb/gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/root/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
warning: Unable to find libthread_db matching inferior's thread library, thread
debugging will not be available.

Program received signal SIGSEGV, Segmentation fault.
0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=<optimized out>) at
Python/symtable.c:396
396         PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name);
(gdb) bt
#0  0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=<optimized out>) at
Python/symtable.c:396
#1  _PyST_GetScope (ste=<optimized out>, name=0xf9a33a60) at
Python/symtable.c:406
#2  0x411bb8f8 in compiler_nameop (c=0xf7b03b88, name=<optimized out>,
ctx=Load) at Python/compile.c:4274
#3  0x411be074 in compiler_visit_expr (c=0x1, e=<optimized out>) at
Python/compile.c:5969
#4  0x411bcc88 in compiler_visit_expr1 (c=0xf7b03b88, e=0x1) at
Python/compile.c:5915
#5  0x411be074 in compiler_visit_expr (c=0x1, e=<optimized out>) at
Python/compile.c:5969
#6  0x411bceac in compiler_call (e=0x1, c=0xf7b03b88) at Python/compile.c:4952
#7  compiler_visit_expr1 (c=0xf7b03b88, e=0x1) at Python/compile.c:5905
#8  0x411c1f34 in compiler_visit_expr (e=<optimized out>, c=0xf9a33a60) at
Python/compile.c:5969
#9  compiler_decorators (decos=0x8d, c=0xf9a33a60) at Python/compile.c:2327
#10 compiler_class (c=0xf9a33a60, s=0x414e4490) at Python/compile.c:2702
#11 0x411c566c in compiler_body (c=0xf7b03b88, stmts=0xf9a33a60) at
Python/compile.c:2180
#12 0x411c7e98 in compiler_mod (mod=0xf7b03b88, c=0x0) at Python/compile.c:2197
#13 _PyAST_Compile (mod=0xf7b03b88, filename=0x8d, flags=<optimized out>,
optimize=<optimized out>, arena=<optimized out>) at Python/compile.c:581
#14 0x411fe7b8 in Py_CompileStringObject (str=0xf7b03b88
"\371\240\277\220\371\236\353`\371\257\221\260\367\260:t", filename=0x8d,
start=-139445336, flags=0xf9a33a60, optimize=<optimized out>)
    at Python/pythonrun.c:1799
#15 0x4119c334 in builtin_compile_impl (module=<optimized out>,
feature_version=<optimized out>, optimize=<optimized out>,
dont_inherit=<optimized out>, flags=<optimized out>, mode=<optimized out>,
    filename=0xf998db68, source=0x8d) at Python/bltinmodule.c:831
#16 builtin_compile (module=<optimized out>, args=<optimized out>,
nargs=<optimized out>, kwnames=<optimized out>) at
Python/clinic/bltinmodule.c.h:328
#17 0x410f3ae4 in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0xf9a33a60,
args=0x8d, nargsf=<optimized out>, kwnames=<optimized out>) at
./Include/cpython/methodobject.h:52
#18 0x4109fa88 in _PyVectorcall_Call (tstate=0xf7b03b88, func=<optimized out>,
callable=0xf9a33a60, tuple=<optimized out>, kwargs=<optimized out>) at
Objects/call.c:257
#19 0x4109fd28 in _PyObject_Call (tstate=0xf9a33a60, callable=0x1,
args=0xf7b03ba8, kwargs=0x8d) at Objects/call.c:328
#20 0x4109fdb8 in PyObject_Call () at Objects/call.c:352
#21 0x411a47c8 in do_call_core (tstate=0x8d, func=0x1, callargs=0xf9a33a60,
kwdict=0xf7b03b88, use_tracing=<optimized out>) at Python/ceval.c:7315
#22 0x411ab5dc in _PyEval_EvalFrameDefault (tstate=0xf7b03ba8,
frame=0xf9a33a60, throwflag=1) at Python/ceval.c:5367
#23 0x411af42c in _PyEval_EvalFrame (throwflag=0, frame=0xf9a33a60, tstate=0x1)
at ./Include/internal/pycore_ceval.h:73
#24 _PyEval_Vector (tstate=0x1, func=<optimized out>, locals=<optimized out>,
args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>) at
Python/ceval.c:6425
#25 0x4109fe48 in _PyFunction_Vectorcall (func=<optimized out>,
stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at
Objects/call.c:396
#26 0x410a0a0c in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized
out>, args=0xf998db68, callable=0xf7b03b88, tstate=0xf7b03ba8) at
./Include/internal/pycore_call.h:92
#27 object_vacall (tstate=0xf7b03ba8, base=<optimized out>,
callable=0xf7b03b88, vargs=<optimized out>) at Objects/call.c:819
#28 0x410a0be0 in PyObject_CallMethodObjArgs (obj=<optimized out>,
name=<optimized out>) at Objects/call.c:879
#29 0x411dd9e8 in import_find_and_load (abs_name=0xf7b03ba8, tstate=0xf9a33a60)
at Python/import.c:1737
#30 PyImport_ImportModuleLevelObject (name=0x1, globals=<optimized out>,
locals=<optimized out>, fromlist=0xf7b03b88, level=<optimized out>) at
Python/import.c:1836
#31 0x411aefbc in import_name (level=<optimized out>, fromlist=<optimized out>,
name=<optimized out>, frame=<optimized out>, tstate=<optimized out>) at
Python/ceval.c:7415
#32 _PyEval_EvalFrameDefault (tstate=0xf7b03ba8, frame=0xf9a33a60, throwflag=1)
at Python/ceval.c:3937
#33 0x411af42c in _PyEval_EvalFrame (throwflag=0, frame=0xf9a33a60, tstate=0x1)
at ./Include/internal/pycore_ceval.h:73
#34 _PyEval_Vector (tstate=0x1, func=<optimized out>, locals=<optimized out>,
args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>) at
Python/ceval.c:6425
#35 0x411af4e4 in PyEval_EvalCode (co=0xf9a33a60, globals=<optimized out>,
locals=0xf7b03b88) at Python/ceval.c:1140
#36 0x4119b6d4 in builtin_exec_impl (module=<optimized out>, closure=<optimized
out>, locals=0xf7b03ba8, globals=0x8d, source=0xf998db68) at
Python/bltinmodule.c:1077
#37 builtin_exec (module=<optimized out>, args=<optimized out>,
nargs=<optimized out>, kwnames=<optimized out>) at
Python/clinic/bltinmodule.c.h:465
#38 0x410f3ae4 in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0xf9a33a60,
args=0x8d, nargsf=<optimized out>, kwnames=<optimized out>) at
./Include/cpython/methodobject.h:52
#39 0x4109fa14 in _PyVectorcall_Call (tstate=0xf7b03b88, func=<optimized out>,
callable=0xf9a33a60, tuple=<optimized out>, kwargs=<optimized out>) at
Objects/call.c:245
#40 0x4109fd28 in _PyObject_Call (tstate=0xf9a33a60, callable=0x1,
args=0xf7b03ba8, kwargs=0x8d) at Objects/call.c:328
#41 0x4109fdb8 in PyObject_Call () at Objects/call.c:352
#42 0x411a47c8 in do_call_core (tstate=0x8d, func=0x1, callargs=0xf9a33a60,
kwdict=0xf7b03b88, use_tracing=<optimized out>) at Python/ceval.c:7315
#43 0x411ab5dc in _PyEval_EvalFrameDefault (tstate=0xf7b03ba8,
frame=0xf9a33a60, throwflag=1) at Python/ceval.c:5367
#44 0x411af42c in _PyEval_EvalFrame (throwflag=0, frame=0xf9a33a60, tstate=0x1)
at ./Include/internal/pycore_ceval.h:73
#45 _PyEval_Vector (tstate=0x1, func=<optimized out>, locals=<optimized out>,
args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>) at
Python/ceval.c:6425
#46 0x4109fe48 in _PyFunction_Vectorcall (func=<optimized out>,
stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at
Objects/call.c:396
#47 0x410a0a0c in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized
out>, args=0xf998db68, callable=0xf7b03b88, tstate=0xf7b03ba8) at
./Include/internal/pycore_call.h:92
#48 object_vacall (tstate=0xf7b03ba8, base=<optimized out>,
callable=0xf7b03b88, vargs=<optimized out>) at Objects/call.c:819
#49 0x410a0be0 in PyObject_CallMethodObjArgs (obj=<optimized out>,
name=<optimized out>) at Objects/call.c:879
#50 0x411dd9e8 in import_find_and_load (abs_name=0xf7b03ba8, tstate=0xf9a33a60)
at Python/import.c:1737
#51 PyImport_ImportModuleLevelObject (name=0x1, globals=<optimized out>,
locals=<optimized out>, fromlist=0xf7b03b88, level=<optimized out>) at
Python/import.c:1836
#52 0x411aefbc in import_name (level=<optimized out>, fromlist=<optimized out>,
name=<optimized out>, frame=<optimized out>, tstate=<optimized out>) at
Python/ceval.c:7415
#53 _PyEval_EvalFrameDefault (tstate=0xf7b03ba8, frame=0xf9a33a60, throwflag=1)
at Python/ceval.c:3937
#54 0x411af42c in _PyEval_EvalFrame (throwflag=0, frame=0xf9a33a60, tstate=0x1)
at ./Include/internal/pycore_ceval.h:73
#55 _PyEval_Vector (tstate=0x1, func=<optimized out>, locals=<optimized out>,
args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>) at
Python/ceval.c:6425
#56 0x411af4e4 in PyEval_EvalCode (co=0xf9a33a60, globals=<optimized out>,
locals=0xf7b03b88) at Python/ceval.c:1140
#57 0x411fa628 in run_eval_code_obj (tstate=0xf7b03b88, co=0xf9a33a60,
globals=0x1, locals=0x8d) at Python/pythonrun.c:1710
#58 0x411fa8d8 in run_mod (mod=<optimized out>, filename=<optimized out>,
globals=0xf9a33a60, locals=0x8d, flags=<optimized out>, arena=<optimized out>)
at Python/pythonrun.c:1731
#59 0x411faa50 in pyrun_file (fp=0x0, filename=0x8d, start=<optimized out>,
globals=0xf7b03b88, locals=<optimized out>, closeit=<optimized out>,
flags=<optimized out>) at Python/pythonrun.c:1626
#60 0x411fdc38 in _PyRun_SimpleFileObject (fp=0xf998db68, filename=0x8d,
closeit=-139445336, flags=0x0) at Python/pythonrun.c:440
#61 0x411fe30c in _PyRun_AnyFileObject (fp=0xf9a33a60, filename=0x1,
closeit=141, flags=0xf7b03b88) at Python/pythonrun.c:79
#62 0x41222278 in pymain_run_file_obj (skip_source_first_line=1095637024,
filename=0xf7b03ba8, program_name=0x8e) at Modules/main.c:360
#63 pymain_run_file (config=0x1) at Modules/main.c:379
#64 pymain_run_python (exitcode=0x8d) at Modules/main.c:601
#65 Py_RunMain () at Modules/main.c:680
#66 0x4104c4c8 in main (argc=<optimized out>, argv=<optimized out>) at
Programs/_bootstrap_python.c:109
(gdb)
```


* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
  2023-11-06 21:00 ` [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA " sjames at gcc dot gnu.org
@ 2023-11-06 21:01 ` sjames at gcc dot gnu.org
  2023-11-06 21:03 ` pinskia at gcc dot gnu.org
                   ` (53 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-06 21:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #2 from Sam James <sjames at gcc dot gnu.org> ---
I'll grab a bad vs good build directory next and upload both, and then try to
see which objects differ.

Dave, can you reproduce?


* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
  2023-11-06 21:00 ` [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA " sjames at gcc dot gnu.org
  2023-11-06 21:01 ` sjames at gcc dot gnu.org
@ 2023-11-06 21:03 ` pinskia at gcc dot gnu.org
  2023-11-06 21:31 ` dave.anglin at bell dot net
                   ` (52 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-06 21:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
   Target Milestone|---                         |14.0
             Target|                            |hppa2.0-unknown-linux-gnu


* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-11-06 21:03 ` pinskia at gcc dot gnu.org
@ 2023-11-06 21:31 ` dave.anglin at bell dot net
  2023-11-06 22:09 ` sjames at gcc dot gnu.org
                   ` (51 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: dave.anglin at bell dot net @ 2023-11-06 21:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #3 from dave.anglin at bell dot net ---
On 2023-11-06 4:00 p.m., sjames at gcc dot gnu.org wrote:
> Program received signal SIGSEGV, Segmentation fault.
> 0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=<optimized out>) at
> Python/symtable.c:396
> 396         PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name);
> (gdb) bt
> #0  0x412083fc in _PyST_GetSymbol (name=0xf9a33a60, ste=<optimized out>) at
> Python/symtable.c:396
> #1  _PyST_GetScope (ste=<optimized out>, name=0xf9a33a60) at
> Python/symtable.c:406
Probably, ste is NULL or in page 0, and it's symtable.c that's miscompiled.

There's not a lot of testing of gcc-14 on hppa yet.


* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-11-06 21:31 ` dave.anglin at bell dot net
@ 2023-11-06 22:09 ` sjames at gcc dot gnu.org
  2023-11-06 22:11 ` sjames at gcc dot gnu.org
                   ` (50 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-06 22:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #4 from Sam James <sjames at gcc dot gnu.org> ---
Created attachment 56520
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56520&action=edit
list_of_differing_files.txt


* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-11-06 22:09 ` sjames at gcc dot gnu.org
@ 2023-11-06 22:11 ` sjames at gcc dot gnu.org
  2023-11-06 22:20 ` law at gcc dot gnu.org
                   ` (49 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-06 22:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Sam James <sjames at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #5 from Sam James <sjames at gcc dot gnu.org> ---
Built with 14.0.0 20231029.

*
https://dev.gentoo.org/~sam/bugs/gcc/gcc-python-hppa/cpython-3.11.6-good.tar.xz
*
https://dev.gentoo.org/~sam/bugs/gcc/gcc-python-hppa/cpython-3.11.6-bad.tar.xz


* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-11-06 22:11 ` sjames at gcc dot gnu.org
@ 2023-11-06 22:20 ` law at gcc dot gnu.org
  2023-11-06 22:33 ` dave.anglin at bell dot net
                   ` (48 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: law at gcc dot gnu.org @ 2023-11-06 22:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #6 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Do we have assembly code around the faulting point (x/20i $pc) and a register
dump (i r)?  The biggest concern I'd have with f-m-o on the PA would be the
implicit segment selection that happens on the base register -- but it would
only be an issue if we are faulting on an unscaled indexed addressing mode and
only if the linux-gnu port was actually putting different values into the space
registers.
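
A rough sketch of the concern, assuming the usual PA-RISC short-pointer rule that the space register (sr4..sr7) is picked from the two most significant bits of the 32-bit base register rather than from the final effective address. The helper names are hypothetical; the point is only that changing which register (or which value) acts as the base can change the selected space register even when the effective address is the same.

```c
#include <assert.h>
#include <stdint.h>

/* Space register chosen by implicit selection: sr4..sr7, indexed by the
   top two bits of the base register (assumed short-pointer rule). */
unsigned implicit_space_reg(uint32_t base)
{
    return 4u + (base >> 30);
}

/* Space register that would match the quadrant of the effective address. */
unsigned effective_space_reg(uint32_t base, uint32_t index)
{
    return 4u + ((base + index) >> 30);
}

/* With unscaled indexed addressing either operand could be the base.
   The effective address is identical, the selected space is not. */
void check_swap_changes_space(void)
{
    uint32_t ptr = 0x40000000u, off = 0x10u;
    assert(effective_space_reg(ptr, off) == effective_space_reg(off, ptr));
    assert(implicit_space_reg(ptr) == 5u);  /* pointer as base: sr5 */
    assert(implicit_space_reg(off) == 4u);  /* offset as base: sr4 */
}
```

If all of sr4..sr7 hold the same value, the difference is harmless, which is why this only matters when the port loads different values into them.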

WRT testing -- we did test this on hppa1.1-linux-gnu.  Just a bootstrap and
regression test of the compiler itself.


* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-11-06 22:20 ` law at gcc dot gnu.org
@ 2023-11-06 22:33 ` dave.anglin at bell dot net
  2023-11-06 22:49 ` sjames at gcc dot gnu.org
                   ` (47 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: dave.anglin at bell dot net @ 2023-11-06 22:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #7 from dave.anglin at bell dot net ---
On 2023-11-06 5:20 p.m., law at gcc dot gnu.org wrote:
> The biggest concern I'd have with f-m-o on the PA would be the
> implicit segment selection that happens on the base register -- but it would
> only be an issue if we are faulting on an unscaled indexed addressing mode and
> only if the linux-gnu port was actually putting different values into the space
> registers.
The linux-gnu port does not put different values into the space registers.


* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-11-06 22:33 ` dave.anglin at bell dot net
@ 2023-11-06 22:49 ` sjames at gcc dot gnu.org
  2023-11-06 23:11 ` sjames at gcc dot gnu.org
                   ` (46 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-06 22:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #8 from Sam James <sjames at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #6)

Program received signal SIGSEGV, Segmentation fault.
0x412083f0 in _PyST_GetSymbol (name=0xf9a34a00, ste=<optimized out>) at
Python/symtable.c:396
396         PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name);
(gdb) x/20i $pc
=> 0x412083f0 <_PyST_GetScope+20>:      ldw c(r26),r26
   0x412083f4 <_PyST_GetScope+24>:      movb,= ret0,r26,0x41208414
<_PyST_GetScope+56>
   0x412083f8 <_PyST_GetScope+28>:      copy r4,r19
   0x412083fc <_PyST_GetScope+32>:      b,l 0x410d6900 <PyLong_AsLong>,rp
   0x41208400 <_PyST_GetScope+36>:      nop
   0x41208404 <_PyST_GetScope+40>:      ldw -54(sp),rp
   0x41208408 <_PyST_GetScope+44>:      extrw,u ret0,20,4,ret0
   0x4120840c <_PyST_GetScope+48>:      bve (rp)
   0x41208410 <_PyST_GetScope+52>:      ldw,mb -40(sp),r4
   0x41208414 <_PyST_GetScope+56>:      copy r26,ret0
   0x41208418 <_PyST_GetScope+60>:      ldw -54(sp),rp
   0x4120841c <_PyST_GetScope+64>:      bve (rp)
   0x41208420 <_PyST_GetScope+68>:      ldw,mb -40(sp),r4
   0x41208424 <_Py_SymtableStringObjectFlags>:  stw rp,-14(sp)
   0x41208428 <_Py_SymtableStringObjectFlags+4>:        stw,ma r8,80(sp)
   0x4120842c <_Py_SymtableStringObjectFlags+8>:        copy r23,r8
   0x41208430 <_Py_SymtableStringObjectFlags+12>:       stw r7,-7c(sp)
   0x41208434 <_Py_SymtableStringObjectFlags+16>:       copy r24,r7
   0x41208438 <_Py_SymtableStringObjectFlags+20>:       stw r6,-78(sp)
   0x4120843c <_Py_SymtableStringObjectFlags+24>:       copy r25,r6
(gdb)

(gdb) i r
flags          <unavailable>
r1             0x411bc688          1092339336
rp             0x412083f7          1092649975
r3             0x1                 1
r4             0x4136c000          1094107136
r5             0xf9a34a00          4188228096
r6             0x8d                141
r7             0xf7b03b88          4155521928
r8             0xf7b03ba8          4155521960
r9             0xf9953b68          4187306856
r10            0x0                 0
r11            0x8e                142
r12            0x414e1820          1095637024
r13            0x414e4490          1095648400
r14            0xf9a76498          4188497048
r15            0x1                 1
r16            0xf99bb5e8          4187731432
r17            0xf9ae11b4          4188934580
r18            0xf99e3b68          4187896680
r19            0x4136c000          1094107136
r20            0x411bc7f0          1092339696
r21            0x41450268          1095041640
r22            0x8d                141
r23            0x1                 1
r24            0x1                 1
r25            0xf9a34a00          4188228096
r26            0x34                52
dp             0x4136c000          1094107136
ret0           0xf9964020          4187373600
ret1           0x8d                141
sp             0xf7b04080          4155523200
r31            0x1                 1
sar            0x3d                61
pcoqh          0x412083f3          1092649971
pcsqh          <unavailable>
pcoqt          0x410e4c0f          1091456015
pcsqt          <unavailable>
eiem           <unavailable>
iir            <unavailable>
isr            <unavailable>
ior            <unavailable>
ipsw           0xeff0f             982799
goto           <unavailable>
sr4            <unavailable>
sr0            <unavailable>
sr1            <unavailable>
sr2            <unavailable>
sr3            <unavailable>
sr5            <unavailable>
sr6            <unavailable>
sr7            <unavailable>
cr0            <unavailable>
cr8            <unavailable>
cr9            <unavailable>
ccr            <unavailable>
cr12           <unavailable>
cr13           <unavailable>
cr24           <unavailable>
cr25           <unavailable>
cr26           0xeff0f             982799
mpsfu_high     0xf7afa500          4155483392
mpsfu_low      <unavailable>
mpsfu_ovflo    <unavailable>
pad            <unavailable>
fpsr           <unavailable>
fpe1           <unavailable>
fpe2           <unavailable>
fpe3           <unavailable>
fpe4           <unavailable>
fpe5           <unavailable>
fpe6           <unavailable>
fpe7           <unavailable>
(gdb)


* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-11-06 22:49 ` sjames at gcc dot gnu.org
@ 2023-11-06 23:11 ` sjames at gcc dot gnu.org
  2023-11-06 23:18 ` dave.anglin at bell dot net
                   ` (45 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-06 23:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #9 from Sam James <sjames at gcc dot gnu.org> ---
I think the key object is Python/compile.o, but I'm not certain yet.


* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-11-06 23:11 ` sjames at gcc dot gnu.org
@ 2023-11-06 23:18 ` dave.anglin at bell dot net
  2023-11-07 14:08 ` manolis.tsamis at vrull dot eu
                   ` (44 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: dave.anglin at bell dot net @ 2023-11-06 23:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #10 from dave.anglin at bell dot net ---
On 2023-11-06 5:49 p.m., sjames at gcc dot gnu.org wrote:
> Program received signal SIGSEGV, Segmentation fault.
> 0x412083f0 in _PyST_GetSymbol (name=0xf9a34a00, ste=<optimized out>) at
> Python/symtable.c:396
> 396         PyObject *v = PyDict_GetItemWithError(ste->ste_symbols, name);
> (gdb) x/20i $pc
> => 0x412083f0 <_PyST_GetScope+20>:      ldw c(r26),r26
r26=0x34, so the ldw will fault.  It appears r26 and r25 have been exchanged in
the code prior to <_PyST_GetScope+20>.  In any case, the problem is with the
ste argument passed to _PyST_GetSymbol.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-11-06 23:18 ` dave.anglin at bell dot net
@ 2023-11-07 14:08 ` manolis.tsamis at vrull dot eu
  2023-11-07 21:12 ` sjames at gcc dot gnu.org
                   ` (43 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-11-07 14:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #11 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
Hi all,

I will also go ahead and try to reproduce that, although it may take me some
time due to my limited experience with HPPA. Once I manage to reproduce, most
f-m-o issues are straightforward to locate by bisecting the transformed
instructions.

> I think the key object is Python/compile.o, but not certain yet.

In this case the dump file of fold-mem-offsets
(-fdump-rtl-fold_mem_offsets-all) could also be useful, as it contains all the
information needed to see whether a transformation is valid. If it would be
easy for anyone to provide the dump file, I could look at it and see if
anything stands out (until I manage to reproduce this).

Thanks,
Manolis

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2023-11-07 14:08 ` manolis.tsamis at vrull dot eu
@ 2023-11-07 21:12 ` sjames at gcc dot gnu.org
  2023-11-08  1:36 ` sjames at gcc dot gnu.org
                   ` (42 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-07 21:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #12 from Sam James <sjames at gcc dot gnu.org> ---
(In reply to Manolis Tsamis from comment #11)
> Hi all,
> 
> I will also go ahead and try to reproduce that, although it may take me some
> time due to my limited experience with HPPA. Once I manage to reproduce,
> most f-m-o issues are straightforward to locate by bisecting the transformed
> instructions.

Thanks! You are very welcome to have access to some HPPA machines for this kind
of work. Please email me an SSH public key + desired username if that sounds
helpful.

> 
> > I think the key object is Python/compile.o, but not certain yet.
> 
> In this case the dump file of fold-mem-offsets
> (-fdump-rtl-fold_mem_offsets-all) could also be useful, as it contains all
> the information needed to see whether a transformation is valid. If it would
> be easy for anyone to provide the dump file, I could look at it and see if
> anything stands out (until I manage to reproduce this).

I'll get the dumps in a moment, thanks.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2023-11-07 21:12 ` sjames at gcc dot gnu.org
@ 2023-11-08  1:36 ` sjames at gcc dot gnu.org
  2023-11-08  2:24 ` dave.anglin at bell dot net
                   ` (41 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-08  1:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #13 from Sam James <sjames at gcc dot gnu.org> ---
Created attachment 56527
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56527&action=edit
compile.c.323r.fold_mem_offsets.bad.xz

Output from
```
hppa2.0-unknown-linux-gnu-gcc -c  -DNDEBUG -g -fwrapv -O3 -Wall -O2   -std=c11
-Werror=implicit-function-declaration -fvisibility=hidden 
-I/home/sam/git/cpython/Include/internal -IObjects -IInclude -IPython -I.
-I/home/sam/git/cpython/Include    -DPy_BUILD_CORE -o Python/compile.o
/home/sam/git/cpython/Python/compile.c -fdump-rtl-fold_mem_offsets-all
```

If I instrument certain functions in compile.c with a no-optimisation
attribute, or build the file with -fno-fold-mem-offsets, Python works, so I'm
reasonably sure this is the relevant object.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-11-08  1:36 ` sjames at gcc dot gnu.org
@ 2023-11-08  2:24 ` dave.anglin at bell dot net
  2023-11-08 10:09 ` manolis.tsamis at vrull dot eu
                   ` (40 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: dave.anglin at bell dot net @ 2023-11-08  2:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #14 from dave.anglin at bell dot net ---
On 2023-11-07 8:36 p.m., sjames at gcc dot gnu.org wrote:
> If I instrument certain functions in compile.c with a no-optimisation
> attribute, or build the file with -fno-fold-mem-offsets, Python works, so I'm
> reasonably sure this is the relevant object.
I believe this bug is related to https://gcc.gnu.org/PR97431
I see the same fault when building with debian/rules and the
-finline-small-functions option.

Debian has been building with -fno-inline-small-functions on sh and hppa.  This
hides the problem.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2023-11-08  2:24 ` dave.anglin at bell dot net
@ 2023-11-08 10:09 ` manolis.tsamis at vrull dot eu
  2023-11-08 14:42 ` jeffreyalaw at gmail dot com
                   ` (39 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-11-08 10:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #15 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
(In reply to Sam James from comment #13)
> Created attachment 56527 [details]
> compile.c.323r.fold_mem_offsets.bad.xz
> 
> Output from
> ```
> hppa2.0-unknown-linux-gnu-gcc -c  -DNDEBUG -g -fwrapv -O3 -Wall -O2  
> -std=c11 -Werror=implicit-function-declaration -fvisibility=hidden 
> -I/home/sam/git/cpython/Include/internal -IObjects -IInclude -IPython -I.
> -I/home/sam/git/cpython/Include    -DPy_BUILD_CORE -o Python/compile.o
> /home/sam/git/cpython/Python/compile.c -fdump-rtl-fold_mem_offsets-all
> ```
> 
> If I instrument certain functions in compile.c with a no-optimisation
> attribute, or build the file with -fno-fold-mem-offsets, Python works, so I'm
> reasonably sure this is the relevant object.

Thanks for the dump file! There are 66 folded/eliminated instructions in this
object file; I did look at each case and there doesn't seem to be anything
strange. In fact most of the transformations are straightforward:

 - All except a couple of cases don't involve any arithmetic, so it's just
moving a constant around.
 - The majority of the transformations are 'trivial' and consist of a single
add and then a memory operation: a sequence like X = Y + Const, R = MEM[X + 0]
is folded to X = Y, R = MEM[X + Const]. I wonder why so many of these exist and
are not optimized elsewhere.
 - There are some cases with negative offsets, but the calculations look
correct.
 - There are a few more complicated cases, but I've worked through these on
paper and they also look correct.

Of course I could be missing some more complicated effect, but what I want to
say is that everything looks sensible in this particular file.
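
To make the 'trivial' fold above concrete, here is a minimal C model of it. This
is only an illustrative sketch: the names load_unfolded and load_folded and the
constant 12 are hypothetical and not taken from the dump, and the real pass
works on RTL after register allocation, not on C source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical constant offset; stands in for "Const" in the text. */
enum { CONST_OFF = 12 };

/* Before folding: X = Y + Const; R = MEM[X + 0]. */
static int32_t load_unfolded(const unsigned char *y)
{
    const unsigned char *x = y + CONST_OFF;  /* X = Y + Const */
    int32_t r;
    memcpy(&r, x + 0, sizeof r);             /* R = MEM[X + 0] */
    return r;
}

/* After folding: X = Y; R = MEM[X + Const]. */
static int32_t load_folded(const unsigned char *y)
{
    const unsigned char *x = y;              /* X = Y */
    int32_t r;
    memcpy(&r, x + CONST_OFF, sizeof r);     /* R = MEM[X + Const] */
    return r;
}
```

Both functions load the same bytes. What changes is the value left in X, so in
this model the two forms are interchangeable only if nothing else depends on X
holding Y + Const.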

> Thanks! You are very welcome to have access to some HPPA machines for
> this kind of work. Please email me an SSH public key + desired username
> if that sounds helpful.

Yes, since I couldn't find anything interesting in the dump, that would
definitely be helpful. Thanks!

Manolis

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2023-11-08 10:09 ` manolis.tsamis at vrull dot eu
@ 2023-11-08 14:42 ` jeffreyalaw at gmail dot com
  2023-11-08 18:59 ` dave.anglin at bell dot net
                   ` (38 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: jeffreyalaw at gmail dot com @ 2023-11-08 14:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #16 from Jeffrey A. Law <jeffreyalaw at gmail dot com> ---
On 11/8/23 03:09, manolis.tsamis at vrull dot eu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
> 
> --- Comment #15 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
> (In reply to Sam James from comment #13)
>> Created attachment 56527 [details]
>> compile.c.323r.fold_mem_offsets.bad.xz
>>
>> Output from
>> ```
>> hppa2.0-unknown-linux-gnu-gcc -c  -DNDEBUG -g -fwrapv -O3 -Wall -O2
>> -std=c11 -Werror=implicit-function-declaration -fvisibility=hidden
>> -I/home/sam/git/cpython/Include/internal -IObjects -IInclude -IPython -I.
>> -I/home/sam/git/cpython/Include    -DPy_BUILD_CORE -o Python/compile.o
>> /home/sam/git/cpython/Python/compile.c -fdump-rtl-fold_mem_offsets-all
>> ```
>>
>> If I instrument certain functions in compile.c with a no-optimisation
>> attribute, or build the file with -fno-fold-mem-offsets, Python works, so I'm
>> reasonably sure this is the relevant object.
> 
> Thanks for the dump file! There are 66 folded/eliminated instructions in this
> object file; I did look at each case and there doesn't seem to be anything
> strange. In fact most of the transformations are straightforward:
> 
>   - All except a couple of cases don't involve any arithmetic, so it's just
> moving a constant around.
>   - The majority of the transformations are 'trivial' and consist of a single
> add and then a memory operation: a sequence like X = Y + Const, R = MEM[X + 0]
> is folded to X = Y, R = MEM[X + Const]. I wonder why so many of these exist and
> are not optimized elsewhere.
>   - There are some cases with negative offsets, but the calculations look
> correct.
>   - There are a few more complicated cases, but I've worked through these on
> paper and they also look correct.
The PA port is "weird".  Its addressing modes aren't a good match for
GCC (they're not symmetrical across loads vs stores and across fp vs
integer) and they have the implicit space register problem.  But I don't
immediately recall needing to avoid propagation of constants into memory
references or anything like that.

I'd probably continue with the process of narrowing down what code is 
affected using the attributes.  We already know the file, narrowing it 
down to a function might help considerably with the evaluation effort.

Note that QEMU has a functional PA port.  So you might be able to just 
take a root filesystem, add the tarball referenced earlier and play 
around to narrow things down further.

I haven't done work on the PA in about 20 years at this point, but I can
probably still grok its code.  Between David and myself I'm sure we can
help interpret what's going on.


Jeff

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2023-11-08 14:42 ` jeffreyalaw at gmail dot com
@ 2023-11-08 18:59 ` dave.anglin at bell dot net
  2023-11-08 19:07 ` pinskia at gcc dot gnu.org
                   ` (37 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: dave.anglin at bell dot net @ 2023-11-08 18:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #17 from dave.anglin at bell dot net ---
On 2023-11-08 9:42 a.m., jeffreyalaw at gmail dot com wrote:
> I'd probably continue with the process of narrowing down what code is
> affected using the attributes.  We already know the file, narrowing it
> down to a function might help considerably with the evaluation effort.
The problem seems to be in compiler_visit_expr().

-static int compiler_visit_expr(struct compiler *, expr_ty);
+static int compiler_visit_expr(struct compiler *, expr_ty)
+    __attribute__((optimize("no-inline-small-functions")));

Python builds okay if this function is not inlined, if it is compiled at -O1,
or if -fno-inline-small-functions is specified as above.  -fno-fold-mem-offsets
can't be specified as a function attribute.
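
For anyone wanting to try the same per-function narrowing elsewhere, the
declaration change above follows the usual pattern for GCC's optimize function
attribute. A minimal standalone sketch, assuming GCC; the macro NO_INLINE_SMALL
and the function visit_expr_demo are hypothetical stand-ins and not part of
CPython:

```c
/* GCC-specific attribute; other compilers get an empty macro here. */
#if defined(__GNUC__) && !defined(__clang__)
#  define NO_INLINE_SMALL \
       __attribute__((optimize("no-inline-small-functions")))
#else
#  define NO_INLINE_SMALL
#endif

/* Hypothetical stand-in for compiler_visit_expr(): the attribute goes on
   the forward declaration, as in the diff above. */
static int visit_expr_demo(int depth) NO_INLINE_SMALL;

/* Trivial recursive body so the function has something to do. */
static int visit_expr_demo(int depth)
{
    return depth <= 0 ? 0 : 1 + visit_expr_demo(depth - 1);
}
```

The attribute changes only how this one function is optimized, which is what
makes it useful for bisecting a miscompile inside a single object file.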

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2023-11-08 18:59 ` dave.anglin at bell dot net
@ 2023-11-08 19:07 ` pinskia at gcc dot gnu.org
  2023-11-08 19:16 ` law at gcc dot gnu.org
                   ` (36 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-08 19:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #18 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I wonder if -fno-strict-aliasing works around the issue too?
I get the feeling that `fold mem offset pass` allows the aliasing code to have
a better time with the offset and that might expose more aliasing issues.

The other thing to try is add `-fno-schedule-insns2 -fno-schedule-insns`
instead of `-fno-strict-aliasing` as the scheduler is normally where the
aliasing issues are exposed on the RTL level ...
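
As a reminder of the kind of source-level pattern that -fno-strict-aliasing
tolerates, here is a generic sketch. It is not taken from CPython, the function
names are hypothetical, and it assumes a 32-bit IEEE 754 float:

```c
#include <stdint.h>
#include <string.h>

/* Reads a float object through a uint32_t lvalue.  This violates the C
   aliasing rules, so with -fstrict-aliasing the compiler may assume such
   an access never overlaps a float store and reorder around it. */
uint32_t float_bits_punned(const float *f)
{
    return *(const uint32_t *)f;   /* undefined behaviour in ISO C */
}

/* Well-defined alternative: copy the object representation instead. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

If the failing code contained something like the first form, either flag could
plausibly hide it; that is a guess at the mechanism, not something established
here.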

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2023-11-08 19:07 ` pinskia at gcc dot gnu.org
@ 2023-11-08 19:16 ` law at gcc dot gnu.org
  2023-11-08 19:40 ` dave.anglin at bell dot net
                   ` (35 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: law at gcc dot gnu.org @ 2023-11-08 19:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #19 from Jeffrey A. Law <law at gcc dot gnu.org> ---
f-m-o runs post-allocation, so the scope of where its behavior can change
things is narrower.  So testing with -fno-schedule-insns isn't going to be
useful, but -fno-schedule-insns2 might.

I'm a bit concerned that we can't turn off f-m-o with an attribute.  That would
indicate something isn't wired up right in the options handling.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2023-11-08 19:16 ` law at gcc dot gnu.org
@ 2023-11-08 19:40 ` dave.anglin at bell dot net
  2023-11-08 23:33 ` pinskia at gcc dot gnu.org
                   ` (34 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: dave.anglin at bell dot net @ 2023-11-08 19:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #20 from dave.anglin at bell dot net ---
On 2023-11-08 2:07 p.m., pinskia at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>
> --- Comment #18 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> I wonder if -fno-strict-aliasing works around the issue too?
> I get the feeling that `fold mem offset pass` allows the aliasing code to have
> a better time with the offset and that might expose more aliasing issues.
>
> The other thing to try is add `-fno-schedule-insns2 -fno-schedule-insns`
> instead of `-fno-strict-aliasing` as the scheduler is normally where the
> aliasing issues are exposed on the RTL level ...
Both -fno-strict-aliasing and -fno-schedule-insns2 applied to
compiler_visit_expr() work around the issue.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2023-11-08 19:40 ` dave.anglin at bell dot net
@ 2023-11-08 23:33 ` pinskia at gcc dot gnu.org
  2023-11-08 23:40 ` danglin at gcc dot gnu.org
                   ` (33 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-08 23:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #21 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to dave.anglin from comment #20)
> Both -fno-strict-aliasing and -fno-schedule-insns2 applied to
> compiler_visit_expr() work around the issue.

The other option to try is -fstack-reuse=none. There are definitely known
issues with the code that coalesces stack variables together too (see PR 111843
for examples).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2023-11-08 23:33 ` pinskia at gcc dot gnu.org
@ 2023-11-08 23:40 ` danglin at gcc dot gnu.org
  2023-11-08 23:51 ` sjames at gcc dot gnu.org
                   ` (32 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: danglin at gcc dot gnu.org @ 2023-11-08 23:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #22 from John David Anglin <danglin at gcc dot gnu.org> ---
Created attachment 56542
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56542&action=edit
Preprocessed source and assembly files for Python/compile.c

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2023-11-08 23:40 ` danglin at gcc dot gnu.org
@ 2023-11-08 23:51 ` sjames at gcc dot gnu.org
  2023-11-09  0:00 ` dave.anglin at bell dot net
                   ` (31 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-08 23:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #23 from Sam James <sjames at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #21)
> The other option to try is -fstack-reuse=none. There are definitely known
> issues with the code that coalesces stack variables together too (see PR
> 111843 for examples).

I had a good feeling about this but no, didn't help when applied to compile.o.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2023-11-08 23:51 ` sjames at gcc dot gnu.org
@ 2023-11-09  0:00 ` dave.anglin at bell dot net
  2023-11-09  0:02 ` sjames at gcc dot gnu.org
                   ` (30 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: dave.anglin at bell dot net @ 2023-11-09  0:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #24 from dave.anglin at bell dot net ---
On 2023-11-08 6:51 p.m., sjames at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>
> --- Comment #23 from Sam James <sjames at gcc dot gnu.org> ---
> (In reply to Andrew Pinski from comment #21)
>> The other option to try is -fstack-reuse=none. There are definitely known
>> issues with the code that coalesces stack variables together too (see PR
>> 111843 for examples).
> I had a good feeling about this but no, didn't help when applied to compile.o.
At this point, I don't know whether this is a Python or GCC bug.  I scanned
for unions in compile.i that might be problematic but I didn't find anything
obvious.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2023-11-09  0:00 ` dave.anglin at bell dot net
@ 2023-11-09  0:02 ` sjames at gcc dot gnu.org
  2023-11-09  0:07 ` law at gcc dot gnu.org
                   ` (29 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-09  0:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #25 from Sam James <sjames at gcc dot gnu.org> ---
I am having the same thoughts. It would not be the first time Python had
something dubious, like...
* https://wiki.gentoo.org/wiki/Project:Python/Strict_aliasing ->
https://www.python.org/dev/peps/pep-3123/
* https://github.com/python/cpython/issues/111178

So far, I did not see this failure on any other target (-> makes me think it's
a gcc bug). But also, I didn't yet see any other software break on hppa (->
makes me think it might be a Python bug).

I tried ubsan on amd64 with Python 3.12 at least and got a lot of different
errors, although ubsan does not diagnose aliasing issues...

I am undecided myself still.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2023-11-09  0:02 ` sjames at gcc dot gnu.org
@ 2023-11-09  0:07 ` law at gcc dot gnu.org
  2023-11-09  0:08 ` dave.anglin at bell dot net
                   ` (28 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: law at gcc dot gnu.org @ 2023-11-09  0:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #26 from Jeffrey A. Law <law at gcc dot gnu.org> ---
As a compiler junkie, I tend to think compiler first until I can prove it
otherwise.  I wouldn't get too hung up on aliasing issues and such at this
point.

Do we already have a dump for the key function?  Presumably f-m-o doesn't
trigger *that* much.  And if this is triggering w/o LTO we can probably move to
cross debugging and analysis of those dump files and assembly code with and
without f-m-o enabled, narrowing our focus on the key function.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2023-11-09  0:07 ` law at gcc dot gnu.org
@ 2023-11-09  0:08 ` dave.anglin at bell dot net
  2023-11-09  0:23 ` dave.anglin at bell dot net
                   ` (27 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: dave.anglin at bell dot net @ 2023-11-09  0:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #27 from dave.anglin at bell dot net ---
On 2023-11-08 7:00 p.m., John David Anglin wrote:
> On 2023-11-08 6:51 p.m., sjames at gcc dot gnu.org wrote:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>>
>> --- Comment #23 from Sam James <sjames at gcc dot gnu.org> ---
>> (In reply to Andrew Pinski from comment #21)
>>> The other option to try is -fstack-reuse=none. There are definitely known
>>> issues with the code that coalesces stack variables together too (see PR
>>> 111843 for examples).
>> I had a good feeling about this but no, didn't help when applied to compile.o.
> At this point, I don't know whether this is a python or gcc bug. I scanned for unions in compile.i
> that might be problematic but I didn't find anything obvious.
Note that -fno-strict-aliasing affects the inlining of compiler_visit_expr.  It
is not inlined with -fno-strict-aliasing.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (27 preceding siblings ...)
  2023-11-09  0:08 ` dave.anglin at bell dot net
@ 2023-11-09  0:23 ` dave.anglin at bell dot net
  2023-11-09 18:04 ` danglin at gcc dot gnu.org
                   ` (26 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: dave.anglin at bell dot net @ 2023-11-09  0:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #28 from dave.anglin at bell dot net ---
On 2023-11-08 7:07 p.m., law at gcc dot gnu.org wrote:
> Do we already have a dump for the key function?  Presumably f-m-o doesn't
> trigger *that* much.  And if this is triggering w/o LTO we can probably move to
> cross debugging and analysis of those dump files and assembly code with and
> without f-m-o enabled, narrowing our focus on the key function.
I tried looking at the difference with and without f-m-o and it was quite
large.  The difference with and without strict aliasing is much smaller.  The
main differences that I saw relate to the inlining of compiler_visit_expr and
compiler_visit_expr1.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (28 preceding siblings ...)
  2023-11-09  0:23 ` dave.anglin at bell dot net
@ 2023-11-09 18:04 ` danglin at gcc dot gnu.org
  2023-11-09 19:17 ` danglin at gcc dot gnu.org
                   ` (25 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: danglin at gcc dot gnu.org @ 2023-11-09 18:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #29 from John David Anglin <danglin at gcc dot gnu.org> ---
The miscompilation is in compiler_visit_expr:

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program:
/home/dave/debian/python3.11/python3.11-3.11.6/build-static/Programs/_freeze_module
importlib._bootstrap ../Lib/importlib/_bootstrap.py
Python/frozen_modules/importlib._bootstrap.h
warning: Unable to find libthread_db matching inferior's thread library, thread
debugging will not be available.

Breakpoint 2, compiler_jump_if (c=0xf8f02508, e=0x5763f8, next=0xfaeaa908,
cond=0) at ../Python/compile.c:2898
2898    {
(gdb) watch *0xfaea51b8
Watchpoint 3: *0xfaea51b8
(gdb) c
Continuing.

Watchpoint 3: *0xfaea51b8

Old value = -85046408
New value = 43
0x0019c688 in compiler_visit_expr (e=0x576308, c=0xf8f02508) at
../Python/compile.c:5968
5968        SET_LOC(c, e);
(gdb) bt
#0  0x0019c688 in compiler_visit_expr (e=0x576308, c=0xf8f02508)
    at ../Python/compile.c:5968
#1  compiler_call_helper (c=0xf8f02508, n=0, args=<optimized out>,
    keywords=0x0) at ../Python/compile.c:5138
#2  0x0019ec70 in compiler_visit_expr (e=<optimized out>, c=0xf8f02508)
    at ../Python/compile.c:5969
#3  compiler_jump_if (c=0xf8f02508, e=<optimized out>, next=0x0,
    cond=<optimized out>) at ../Python/compile.c:2988
#4  0x001a0770 in compiler_if (s=0x0, c=0x5763c0) at ../Python/compile.c:3090
#5  compiler_visit_stmt (c=0x5763c0, s=0x0) at ../Python/compile.c:4118
#6  0x001a1378 in compiler_for (s=0x0, c=0x5763c0) at ../Python/compile.c:3124
#7  compiler_visit_stmt (c=0x5763c0, s=0x0) at ../Python/compile.c:4114
#8  0x001a3170 in compiler_function (c=0x2, s=<optimized out>,
    is_async=<optimized out>) at ../Python/compile.c:2670
#9  0x001a3438 in compiler_body (c=0x0, stmts=0x5763c0)
    at ../Python/compile.c:2180
#10 0x001a5cdc in compiler_mod (mod=0x0, c=0xf8f02528)
    at ../Python/compile.c:2197
#11 _PyAST_Compile (mod=0x0, filename=0xf8f02528, flags=<optimized out>,
    optimize=<optimized out>, arena=<optimized out>) at ../Python/compile.c:581
#12 0x001dea00 in Py_CompileStringObject (optimize=0, flags=0x5763c0, start=0,
    filename=0x2, str=0x0) at ../Python/pythonrun.c:1799
#13 Py_CompileStringExFlags (str=0x0, filename_str=<optimized out>, start=0,
    flags=0x5763c0, optimize=<optimized out>) at ../Python/pythonrun.c:1812
#14 0x000167a4 in compile_and_marshal (text=0x0,
    name=0x2 <error: Cannot access memory at address 0x2>)
    at ../Programs/_freeze_module.c:125
#15 main (argc=0, argv=<optimized out>) at ../Programs/_freeze_module.c:230
(gdb) disass $pc-16,$pc+16
Dump of assembler code from 0x19c678 to 0x19c698:
   0x0019c678 <compiler_call_helper+576>:       ldw 14(r25),ret1
   0x0019c67c <compiler_call_helper+580>:       ldw 18(r25),r31
   0x0019c680 <compiler_call_helper+584>:       ldw 1c(r25),ret0
   0x0019c684 <compiler_call_helper+588>:       stw r23,0(r22)
=> 0x0019c688 <compiler_call_helper+592>:       stw ret1,0(r21)
   0x0019c68c <compiler_call_helper+596>:       stw r31,0(r20)
   0x0019c690 <compiler_call_helper+600>:       b,l 0x198d58 <compiler_visit_expr1>,rp
   0x0019c694 <compiler_call_helper+604>:       stw ret0,0(r19)
End of assembler dump.

The code at 0x0019c688 clobbers the value at c->u->u_ste:
(gdb) p/x $r21
$35 = 0xfaea51b8
(gdb) p/x *c
$36 = {c_filename = 0xfaed9480, c_st = 0xfaeafd10, c_future = 0xfaef7030,
  c_flags = 0xf8f02544, c_optimize = 0x0, c_interactive = 0x0,
  c_nestlevel = 0x2, c_const_cache = 0xfae81280, u = 0xfaea51b8,
  c_stack = 0xfae57a88, c_arena = 0xfaec0c90}
(gdb) p/x *c->u
$37 = {u_ste = 0x2b, u_name = 0xfae7ff80, u_qualname = 0xfae7ff80,
  u_scope_type = 0x2, u_consts = 0xfaeaa7f8, u_names = 0xfaeaa7d0,
  u_varnames = 0xfaeaa780, u_cellvars = 0xfaeaa7a8, u_freevars = 0xfaeaa758,
  u_private = 0x0, u_argcount = 0x2, u_posonlyargcount = 0x0,
  u_kwonlyargcount = 0x0, u_blocks = 0xfaeaa908, u_curblock = 0xfaeaa868,
  u_nfblocks = 0x1, u_fblock = {{fb_type = 0x1, fb_block = 0xfaeaa840,
      fb_exit = 0xfaeaa8b8, fb_datum = 0x0}, {fb_type = 0x0, fb_block = 0x0,
      fb_exit = 0x0, fb_datum = 0x0} <repeats 19 times>},
  u_firstlineno = 0x28, u_lineno = 0x2b, u_col_offset = 0xb,
  u_end_lineno = 0x2b, u_end_col_offset = 0x20,
  u_need_new_implicit_block = 0x0}
(gdb) p/x $r23
$38 = 0x2b

#define SET_LOC(c, x)                           \
    (c)->u->u_lineno = (x)->lineno;             \
    (c)->u->u_col_offset = (x)->col_offset;     \
    (c)->u->u_end_lineno = (x)->end_lineno;     \
    (c)->u->u_end_col_offset = (x)->end_col_offset;

(gdb) p/x *e
$40 = {kind = 0x18, v = {BoolOp = {op = 0xfaeb8b60, values = 0x1},
    NamedExpr = {target = 0xfaeb8b60, value = 0x1}, BinOp = {
      left = 0xfaeb8b60, op = 0x1, right = 0x0}, UnaryOp = {op = 0xfaeb8b60,
      operand = 0x1}, Lambda = {args = 0xfaeb8b60, body = 0x1}, IfExp = {
      test = 0xfaeb8b60, body = 0x1, orelse = 0x0}, Dict = {keys = 0xfaeb8b60,
      values = 0x1}, Set = {elts = 0xfaeb8b60}, ListComp = {elt = 0xfaeb8b60,
      generators = 0x1}, SetComp = {elt = 0xfaeb8b60, generators = 0x1},
    DictComp = {key = 0xfaeb8b60, value = 0x1, generators = 0x0},
    GeneratorExp = {elt = 0xfaeb8b60, generators = 0x1}, Await = {
      value = 0xfaeb8b60}, Yield = {value = 0xfaeb8b60}, YieldFrom = {
      value = 0xfaeb8b60}, Compare = {left = 0xfaeb8b60, ops = 0x1,
      comparators = 0x0}, Call = {func = 0xfaeb8b60, args = 0x1,
      keywords = 0x0}, FormattedValue = {value = 0xfaeb8b60, conversion = 0x1,
      format_spec = 0x0}, JoinedStr = {values = 0xfaeb8b60}, Constant = {
      value = 0xfaeb8b60, kind = 0x1}, Attribute = {value = 0xfaeb8b60,
      attr = 0x1, ctx = 0x0}, Subscript = {value = 0xfaeb8b60, slice = 0x1,
      ctx = 0x0}, Starred = {value = 0xfaeb8b60, ctx = 0x1}, Name = {
      id = 0xfaeb8b60, ctx = 0x1}, List = {elts = 0xfaeb8b60, ctx = 0x1},
    Tuple = {elts = 0xfaeb8b60, ctx = 0x1}, Slice = {lower = 0xfaeb8b60,
      upper = 0x1, step = 0x0}}, lineno = 0x2b, col_offset = 0x18,
  end_lineno = 0x2b, end_col_offset = 0x1f}

Seems like an offset issue.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: danglin at gcc dot gnu.org @ 2023-11-09 19:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #30 from John David Anglin <danglin at gcc dot gnu.org> ---
   0x0019c684 <+588>:   stw r23,0(r22)
=> 0x0019c688 <+592>:   stw ret1,0(r21)
   0x0019c68c <+596>:   stw r31,0(r20)
   0x0019c690 <+600>:   b,l 0x198d58 <compiler_visit_expr1>,rp
   0x0019c694 <+604>:   stw ret0,0(r19)

These instructions are in a loop:

    /* No * or ** args, so can use faster calling sequence */
    for (i = 0; i < nelts; i++) {
        expr_ty elt = asdl_seq_GET(args, i);
        assert(elt->kind != Starred_kind);
        VISIT(c, expr, elt);
    }

r21 is clobbered by the VISIT call.  Its value is okay in the first iteration.

The initialization instructions are outside the loop:

   0x0019c638 <+512>:   ldo 184(r19),r22
   0x0019c63c <+516>:   ldw 184(r19),r14
   0x0019c640 <+520>:   ldo 188(r19),r21
   0x0019c644 <+524>:   ldw 188(r19),r13
   0x0019c648 <+528>:   ldo 18c(r19),r20
   0x0019c64c <+532>:   ldw 18c(r19),r12
   0x0019c650 <+536>:   ldw 190(r19),r11
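
As a rough illustration of the failure mode described above (not taken from
the dumps), the following C sketch models r21 as a global that is set once
before the loop and overwritten by the callee.  The names r21, visit and
stores_on_target, and the value the callee leaves behind, are assumptions made
for the example; only the offset 0x188 and the zero-displacement store come
from the disassembly.

```c
#include <assert.h>

#define NWORDS   128        /* 512 bytes, enough to cover offset 0x188 */
#define INTENDED 0x188u     /* offset from `ldo 188(r19),r21` */

/* Models the call-clobbered register as a byte offset into `mem`. */
static unsigned r21;

/* Stand-in for the VISIT call.  A callee may leave any value in a
   call-clobbered register; 0 is a hypothetical choice. */
static void visit(void)
{
    r21 = 0;
}

/* Runs `trips` loop iterations and returns how many stores landed at
   INTENDED.  If `init_inside_loop` is nonzero, the address register is
   recomputed on every trip instead of only once before the loop. */
static int stores_on_target(int trips, int init_inside_loop)
{
    unsigned mem[NWORDS] = {0};
    int hits = 0;

    assert(trips >= 0);
    r21 = INTENDED;                    /* initialization outside the loop */
    for (int i = 0; i < trips; i++) {
        if (init_inside_loop)
            r21 = INTENDED;            /* recomputed after each call */
        mem[INTENDED / 4] = 0;
        mem[r21 / 4] = 43;             /* models `stw ret1,0(r21)` */
        if (mem[INTENDED / 4] == 43)
            hits++;
        visit();                       /* the call clobbers r21 */
    }
    return hits;
}
```

With these assumptions, stores_on_target(3, 0) counts one store at the
intended offset (the first iteration only), while stores_on_target(3, 1)
counts three.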

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: law at gcc dot gnu.org @ 2023-11-09 20:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #31 from Jeffrey A. Law <law at gcc dot gnu.org> ---
IIRC r21 is call-clobbered.  So I guess the question turns into what was the
sequence before f-m-o got involved -- was it assuming r21 would be preserved,
or did f-m-o make r21 live across the call?

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: dave.anglin at bell dot net @ 2023-11-09 20:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #32 from dave.anglin at bell dot net ---
At this point, I don't have gcc-14 builds that bracket the f-m-o change.  Maybe
Sam can check.

I'm trying to determine the RTL pass where things go bad.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: danglin at gcc dot gnu.org @ 2023-11-09 23:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #33 from John David Anglin <danglin at gcc dot gnu.org> ---
Created attachment 56549
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56549&action=edit
ira and reload dumps for compiler_call_helper

The incorrect code for insn 246 in compiler_call_helper appears in the reload
pass.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: danglin at gcc dot gnu.org @ 2023-11-11 19:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #34 from John David Anglin <danglin at gcc dot gnu.org> ---
The same wrong code is generated with an x86-64 cross to hppa-linux-gnu.  Thus
it seems this bug is not due to gcc being miscompiled.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: sjames at gcc dot gnu.org @ 2023-11-11 19:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #35 from Sam James <sjames at gcc dot gnu.org> ---
If you still need dumps off me, please let me know which. I've attached those
w/ f-m-o on for the fold-mem-offsets pass. If you need others, just say.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: danglin at gcc dot gnu.org @ 2023-11-11 20:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #36 from John David Anglin <danglin at gcc dot gnu.org> ---
Created attachment 56562
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56562&action=edit
fold_mem_offsets, prop_hardreg, rtl_dce and bbro dumps

Comment #33 is wrong.  The issue is not reload.  It's okay to pick a
call clobbered register as the code stands.

The initialization of the register used for the store at
offset 392B ends up outside the loop.  It ends up in a call-clobbered
register and is clobbered by the call to compiler_visit_expr1 in the loop.
This occurs around the second call to compiler_visit_expr1 in
compiler_call_helper.

Various initializations get moved out of the loop between the f-m-o and bbro
passes.  I think it's the bbro pass that's at fault but it could be something
that happens before that causes the initialization to get moved outside the
loop.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: danglin at gcc dot gnu.org @ 2023-11-11 20:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #37 from John David Anglin <danglin at gcc dot gnu.org> ---
(In reply to Sam James from comment #35)
> If you still need dumps off me, please let me know which. I've attached
> those w/ f-m-o on for the fold-mem-offsets pass. If you need others, just
> say.

I have a set of dumps.  The problem is determining where the wrong RTL
occurs in compiler_call_helper.  It changes a lot from pass to pass.

Many of the changes in f-m-o seem to get destroyed by later transformations.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: sjames at gcc dot gnu.org @ 2023-11-11 20:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Sam James <sjames at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-11-11

--- Comment #38 from Sam James <sjames at gcc dot gnu.org> ---
Confirming since Dave repro'd too.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: danglin at gcc dot gnu.org @ 2023-11-11 21:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #39 from John David Anglin <danglin at gcc dot gnu.org> ---
In the f-m-o pass, the following three insns that set call-clobbered
registers r20-r22 are pulled out of the loop:

(insn 186 183 190 29 (set (reg/f:SI 22 %r22 [478])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 388 [0x184]))) "../Python/compile.c":5964:9 120 {addsi3}
     (nil))
(insn 190 186 187 29 (set (reg/f:SI 21 %r21 [479])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 392 [0x188]))) "../Python/compile.c":5964:9 120 {addsi3}
     (nil))
(insn 194 191 195 29 (set (reg/f:SI 20 %r20 [480])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 396 [0x18c]))) "../Python/compile.c":5964:9 120 {addsi3}
     (nil))

They are used in the following insns before the call to compiler_visit_expr1:

(insn 242 238 258 32 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int
*)prephitmp_37 + 388B]+0 S4 A32])
        (reg:SI 23 %r23 [orig:173 vect__102.2442 ] [173]))
"../Python/compile.c":5968:22 42 {*pa.md:2193}
     (expr_list:REG_DEAD (reg:SI 23 %r23 [orig:173 vect__102.2442 ] [173])
        (expr_list:REG_DEAD (reg/f:SI 22 %r22 [478])
            (nil))))
(insn 258 242 246 32 (set (reg:SI 26 %r26)
        (reg/v/f:SI 5 %r5 [orig:198 c ] [198])) "../Python/compile.c":5969:15
42 {*pa.md:2193}
     (nil))
(insn 246 258 250 32 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int
*)prephitmp_37 + 392B]+0 S4 A32])
        (reg:SI 29 %r29 [orig:169 vect__102.2443 ] [169]))
"../Python/compile.c":5968:22 42 {*pa.md:2193}
     (expr_list:REG_DEAD (reg:SI 29 %r29 [orig:169 vect__102.2443 ] [169])
        (expr_list:REG_DEAD (reg/f:SI 21 %r21 [479])
            (nil))))
(insn 250 246 254 32 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int
*)prephitmp_37 + 396B]+0 S4 A32])
        (reg:SI 31 %r31 [orig:145 vect__102.2444 ] [145]))
"../Python/compile.c":5968:22 42 {*pa.md:2193}
     (expr_list:REG_DEAD (reg:SI 31 %r31 [orig:145 vect__102.2444 ] [145])
        (expr_list:REG_DEAD (reg/f:SI 20 %r20 [480])
            (nil))))

After the call, we have:

(insn 1241 269 273 30 (set (reg/f:SI 22 %r22 [478])
        (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
"../Python/compile.c":5970:20 -1
     (nil))
(insn 273 1241 1242 30 (set (mem:SI (plus:SI (reg/f:SI 22 %r22 [478])
                (const_int 388 [0x184])) [4 MEM[(int *)_107 + 388B]+0 S4 A32])
        (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
(insn 1242 273 277 30 (set (reg/f:SI 21 %r21 [479])
        (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
"../Python/compile.c":5970:20 -1
     (nil))
(insn 277 1242 1243 30 (set (mem:SI (plus:SI (reg/f:SI 21 %r21 [479])
                (const_int 392 [0x188])) [4 MEM[(int *)_107 + 392B]+0 S4 A32])
        (reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
(insn 1243 277 281 30 (set (reg/f:SI 20 %r20 [480])
        (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
"../Python/compile.c":5970:20 -1
     (nil))
(insn 281 1243 299 30 (set (mem:SI (plus:SI (reg/f:SI 20 %r20 [480])
                (const_int 396 [0x18c])) [4 MEM[(int *)_107 + 396B]+0 S4 A32])
        (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))

We have lost the offsets that were added initially to r20, r21 and r22.

The previous ce3 pass had:

(insn 272 269 273 30 (set (reg/f:SI 22 %r22 [478])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 388 [0x184]))) "../Python/compile.c":5970:20 120
{addsi3}
     (nil))
(insn 273 272 276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int *)_107 +
388B]+0 S4 A32])
        (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
(insn 276 273 277 30 (set (reg/f:SI 21 %r21 [479])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 392 [0x188]))) "../Python/compile.c":5970:20 120
{addsi3}
     (nil))
(insn 277 276 280 30 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int *)_107 +
392B]+0 S4 A32])
        (reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
(insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120
{addsi3}
     (nil))
(insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)_107 +
396B]+0 S4 A32])
        (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))

So, this is an f-m-o bug.
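
To make the difference between the two dumps concrete, here is a small C model
of the two store forms.  It is only a sketch: store_result, store_ce3,
store_fmo and zero_disp_store_addr are made-up names, and whether the register
value left by insn 1242 actually reaches a zero-displacement store such as
insn 246 is an assumption, not something the dumps above establish.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t reg;    /* value left in the address register afterwards */
    uint32_t addr;   /* effective address the store writes to */
} store_result;

/* ce3 form (insns 276/277): reg = base + off; store to mem[reg]. */
static store_result store_ce3(uint32_t base, uint32_t off)
{
    store_result r = { base + off, base + off };
    return r;
}

/* f-m-o form (insns 1242/277): reg = base; store to mem[reg + off]. */
static store_result store_fmo(uint32_t base, uint32_t off)
{
    store_result r = { base, base + off };
    return r;
}

/* Hypothetical later use of the same register by a store with a zero
   displacement, in the style of insn 246.  It writes wherever the
   register currently points. */
static uint32_t zero_disp_store_addr(store_result prev)
{
    return prev.reg;
}
```

For base 0xfaea51b8 (the value of c->u in comment #29) and offset 0x188, both
forms store to 0xfaea5340.  They differ in what stays in the register: under
the stated assumption, a later 0(r21) store would write to 0xfaea5340 after
the ce3 form but to 0xfaea51b8 after the f-m-o form, which is the address the
watchpoint fired on.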

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: danglin at gcc dot gnu.org @ 2023-11-12 15:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #40 from John David Anglin <danglin at gcc dot gnu.org> ---
Jeff,

I don't think these split instructions make a lot of sense on PA-RISC.

(insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120
{addsi3}
     (nil))
(insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)_107 +
396B]+0 S4 A32])
        (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))

They increase code size and register pressure.  That may lead to unnecessary
spills and longer branches.  They increase the probability of problems like the
one in this PR.

I suspect the two instructions generated are actually slower than one with a
nonzero memory offset.  It's not clear that memory accesses with a zero offset
are faster than ones with nonzero offsets.

Integer loads and stores on pa support fairly large offsets.

I think we need to look at why this happens frequently.

Thoughts?

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: law at gcc dot gnu.org @ 2023-11-12 15:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #41 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I would agree.  In fact, the whole point of the f-m-o pass is to bring those
immediates into the memory reference.  It'd be really useful to know why that
isn't happening.

The only thing I can think of would be if multiple instructions needed the %r20
in the RTL you attached.  Which might point to a refinement we should make in
f-m-o, specifically the transformation isn't likely profitable if we aren't
able to fold away a term or fold a constant term into the actual memory
reference.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: danglin at gcc dot gnu.org @ 2023-11-12 23:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #42 from John David Anglin <danglin at gcc dot gnu.org> ---
The problem is we are limiting displacements to five bits in
pa_legitimate_address_p.  The comment is somewhat confusing but
we may have reload issues if we allow 14-bit displacements before
reload completes.  Testing a patch to see if we can allow 14-bit
displacements before reload.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: law at gcc dot gnu.org @ 2023-11-13  0:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #43 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I would expect allowing larger offsets before reload to be a significant
problem.

The core issue is integer memory operations allow 14 bits while FP only allows
5.  During reloading we don't know if any given memory reference is FP or
integer.  xmpyu plays a role here too since it's going to require FP registers
in integer modes.

But what I don't understand is why f-m-o fails to push the offset into the
memory reference -- it should be conditional on the insn being recognized.  And
since it's after reload we know if we're doing an FP or integer load.
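
For reference, the 5-bit versus 14-bit limit can be checked with a few lines
of C.  This is a sketch that assumes the displacement fields are signed
two's-complement immediates; fits_signed is a hypothetical helper, not a GCC
function.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if `disp` fits in a signed two's-complement field that
   is `bits` wide, i.e. lies in [-2^(bits-1), 2^(bits-1) - 1]. */
static bool fits_signed(long disp, int bits)
{
    assert(bits >= 1 && bits <= 31);
    long lo = -(1L << (bits - 1));
    long hi = (1L << (bits - 1)) - 1;
    return disp >= lo && disp <= hi;
}
```

Under that assumption the offsets 388, 392 and 396 from the dumps fit in a
14-bit displacement but not in a 5-bit one, whose range is only -16..15.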

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
From: manolis.tsamis at vrull dot eu @ 2023-11-13  9:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #44 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
(In reply to John David Anglin from comment #39)
> In the f-m-o pass, the following three insns that set call clobbered
> registers r20-r22 are pulled from loop:
> 
> (insn 186 183 190 29 (set (reg/f:SI 22 %r22 [478])
>         (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>             (const_int 388 [0x184]))) "../Python/compile.c":5964:9 120
> {addsi3}
>      (nil))
> (insn 190 186 187 29 (set (reg/f:SI 21 %r21 [479])
>         (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>             (const_int 392 [0x188]))) "../Python/compile.c":5964:9 120
> {addsi3}
>      (nil))
> (insn 194 191 195 29 (set (reg/f:SI 20 %r20 [480])
>         (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>             (const_int 396 [0x18c]))) "../Python/compile.c":5964:9 120
> {addsi3}
>      (nil))
> 
> They are used in the following insns before call to compiler_visit_expr1:
> 
> (insn 242 238 258 32 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int
> *)prephitmp_37 + 388B]+0 S4 A32])
>         (reg:SI 23 %r23 [orig:173 vect__102.2442 ] [173]))
> "../Python/compile.c":5968:22 42 {*pa.md:2193}
>      (expr_list:REG_DEAD (reg:SI 23 %r23 [orig:173 vect__102.2442 ] [173])
>         (expr_list:REG_DEAD (reg/f:SI 22 %r22 [478])
>             (nil))))
> (insn 258 242 246 32 (set (reg:SI 26 %r26)
>         (reg/v/f:SI 5 %r5 [orig:198 c ] [198]))
> "../Python/compile.c":5969:15 42 {*pa.md:2193}
>      (nil))
> (insn 246 258 250 32 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int
> *)prephitmp_37 + 392B]+0 S4 A32])
>         (reg:SI 29 %r29 [orig:169 vect__102.2443 ] [169]))
> "../Python/compile.c":5968:22 42 {*pa.md:2193}
>      (expr_list:REG_DEAD (reg:SI 29 %r29 [orig:169 vect__102.2443 ] [169])
>         (expr_list:REG_DEAD (reg/f:SI 21 %r21 [479])
>             (nil))))
> (insn 250 246 254 32 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int
> *)prephitmp_37 + 396B]+0 S4 A32])
>         (reg:SI 31 %r31 [orig:145 vect__102.2444 ] [145]))
> "../Python/compile.c":5968:22 42 {*pa.md:2193}
>      (expr_list:REG_DEAD (reg:SI 31 %r31 [orig:145 vect__102.2444 ] [145])
>         (expr_list:REG_DEAD (reg/f:SI 20 %r20 [480])
>             (nil))))
> 
> After the call, we have:
> 
> (insn 1241 269 273 30 (set (reg/f:SI 22 %r22 [478])
>         (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
> "../Python/compile.c":5970:20 -1
>      (nil))
> (insn 273 1241 1242 30 (set (mem:SI (plus:SI (reg/f:SI 22 %r22 [478])
>                 (const_int 388 [0x184])) [4 MEM[(int *)_107 + 388B]+0 S4
> A32])
>         (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167]))
> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>      (nil))
> (insn 1242 273 277 30 (set (reg/f:SI 21 %r21 [479])
>         (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
> "../Python/compile.c":5970:20 -1
>      (nil))
> (insn 277 1242 1243 30 (set (mem:SI (plus:SI (reg/f:SI 21 %r21 [479])
>                 (const_int 392 [0x188])) [4 MEM[(int *)_107 + 392B]+0 S4
> A32])
>         (reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156]))
> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>      (nil))
> (insn 1243 277 281 30 (set (reg/f:SI 20 %r20 [480])
>         (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
> "../Python/compile.c":5970:20 -1
>      (nil))
> (insn 281 1243 299 30 (set (mem:SI (plus:SI (reg/f:SI 20 %r20 [480])
>                 (const_int 396 [0x18c])) [4 MEM[(int *)_107 + 396B]+0 S4
> A32])
>         (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>      (nil))
> 
> We have lost the offsets that were added initially to r20, r21 and r22.
> 
> Previous ce3 pass had:
> 
> (insn 272 269 273 30 (set (reg/f:SI 22 %r22 [478])
>         (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>             (const_int 388 [0x184]))) "../Python/compile.c":5970:20 120
> {addsi3}
>      (nil))
> (insn 273 272 276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int
> *)_107 + 388B]+0 S4 A32])
>         (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167]))
> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>      (nil))
> (insn 276 273 277 30 (set (reg/f:SI 21 %r21 [479])
>         (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>             (const_int 392 [0x188]))) "../Python/compile.c":5970:20 120
> {addsi3}
>      (nil))
> (insn 277 276 280 30 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int
> *)_107 + 392B]+0 S4 A32])
>         (reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156]))
> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>      (nil))
> (insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
>         (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>             (const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120
> {addsi3}
>      (nil))
> (insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int
> *)_107 + 396B]+0 S4 A32])
>         (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>      (nil))
> 
> So, this is a f-m-o bug.

Hi Dave,

I don't see an f-m-o bug here. The offsets aren't lost; they're just moved into
the corresponding memory loads/stores. If you look at the stores in ce3, they
don't have offsets, whereas after f-m-o they do. E.g. in ce3: (insn 273 272
276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) ...) but in f-m-o it is (insn 273
1241 1242 30 (set (mem:SI (plus:SI (reg/f:SI 22 %r22 [478]) (const_int 388
[0x184]) ...).

This is the way that f-m-o works. It can also be seen in the f-m-o dumps, where
offset changes to memory ops are reported as 'Memory offset changed' and
instructions whose offset was propagated (like insns 272, 276, 280) are
reported as 'Instruction folded':

Memory offset changed from 0 to 388 for instruction:
(insn 273 272 276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int *)_107 +
388B]+0 S4 A32])
        (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
deferring rescan insn with uid = 273.
Memory offset changed from 0 to 392 for instruction:
(insn 277 276 280 30 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int *)_107 +
392B]+0 S4 A32])
        (reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
deferring rescan insn with uid = 277.
Memory offset changed from 0 to 396 for instruction:
(insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)_107 +
396B]+0 S4 A32])
        (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
deferring rescan insn with uid = 281.
Memory offset changed from 0 to 400 for instruction:
(insn 285 301 286 30 (set (mem:SI (reg/f:SI 19 %r19 [481]) [4 MEM[(int *)_107 +
400B]+0 S4 A32])
        (reg:SI 11 %r11 [orig:133 vect_pretmp_36.2453 ] [133]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
deferring rescan insn with uid = 285.
Instruction folded:(insn 272 269 273 30 (set (reg/f:SI 22 %r22 [478])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 388 [0x184]))) "../Python/compile.c":5970:20 120
{addsi3}
     (nil))
deferring rescan insn with uid = 1241.
deferring rescan insn with uid = 1241.
deferring deletion of insn with uid = 272.
Instruction folded:(insn 276 273 277 30 (set (reg/f:SI 21 %r21 [479])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 392 [0x188]))) "../Python/compile.c":5970:20 120
{addsi3}
     (nil))
deferring rescan insn with uid = 1242.
deferring rescan insn with uid = 1242.
deferring deletion of insn with uid = 276.
Instruction folded:(insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120
{addsi3}
     (nil))
deferring rescan insn with uid = 1243.
deferring rescan insn with uid = 1243.
deferring deletion of insn with uid = 280.
Instruction folded:(insn 284 281 299 30 (set (reg/f:SI 19 %r19 [481])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 400 [0x190]))) "../Python/compile.c":5970:20 120
{addsi3}
     (nil))

If I'm missing something that makes this illegal, please explain it to me.

Thanks,
Manolis

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (44 preceding siblings ...)
  2023-11-13  9:33 ` manolis.tsamis at vrull dot eu
@ 2023-11-13  9:37 ` manolis.tsamis at vrull dot eu
  2023-11-13 13:20 ` manolis.tsamis at vrull dot eu
                   ` (9 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-11-13  9:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #45 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
(In reply to Jeffrey A. Law from comment #41)
> I would agree.  In fact,the whole point of the f-m-o pass is to bring those
> immediates into the memory reference.  It'd be really useful to know why
> that isn't happening.
> 
> The only thing I can think of would be if multiple instructions needed the
> %r20 in the RTL you attached.  Which might point to a refinement we should
> make in f-m-o, specifically the transformation isn't likely profitable if we
> aren't able to fold away a term or fold a constant term into the actual
> memory reference.

Jeff,

I'm confused about "It'd be really useful to know why that isn't happening.".
It can be seen in Dave's dumps that it *is* happening, e.g.:

Memory offset changed from 0 to 396 for instruction:
(insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int *)_107 +
396B]+0 S4 A32])
        (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
"../Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))

Instruction folded:(insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120
{addsi3}
     (nil))

If you look at the RTL in f-m-o, all these offsets are indeed moved into the
respective load/store. I don't know if cprop afterwards manages to eliminate
the unwanted move, but f-m-o does what it's supposed to do in this case.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (45 preceding siblings ...)
  2023-11-13  9:37 ` manolis.tsamis at vrull dot eu
@ 2023-11-13 13:20 ` manolis.tsamis at vrull dot eu
  2023-11-13 15:06 ` dave.anglin at bell dot net
                   ` (8 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-11-13 13:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #46 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
I have reproduced the segfault with f-m-o limited to only fold insn 272 from
compiler_call_helper. The exact transformation is:

Memory offset changed from 0 to 388 for instruction:
(insn 273 272 276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(intD.1
*)_107 + 388B]+0 S4 A32])
        (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2448D.32932 ] [167]))
"Python/compile.c":5970:20 42 {*pa.md:2193}
     (nil))
deferring rescan insn with uid = 273.
Instruction folded:(insn 272 269 273 30 (set (reg/f:SI 22 %r22 [478])
        (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
            (const_int 388 [0x184]))) "Python/compile.c":5970:20 120 {addsi3}
     (nil))

This instruction is also among the ones that Dave mentioned. Again, if
I'm missing something as to why this transformation is illegal, please tell me.
Given these are also consecutive instructions, all I'm seeing here is that

%r22 = %r19 + 388
[%r22] = %r14

is transformed to

%r22 = %r19
[%r22 + 388] = %r14

I haven't tracked all other uses of %r22 yet, but in theory if there was any
non-foldable use of that register then the transformation wouldn't be made.
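
As a toy model of that two-instruction view (a sketch only, in Python, with a
dict standing in for memory; none of these names come from GCC), both
sequences store to the same address but leave different values in %r22:

```python
def before_fold(r19, r14):
    """%r22 = %r19 + 388; [%r22] = %r14"""
    mem = {}
    r22 = r19 + 388
    mem[r22] = r14
    return mem, r22


def after_fold(r19, r14):
    """%r22 = %r19; [%r22 + 388] = %r14"""
    mem = {}
    r22 = r19
    mem[r22 + 388] = r14
    return mem, r22


mem_a, r22_a = before_fold(0x1000, 7)
mem_b, r22_b = after_fold(0x1000, 7)
print(mem_a == mem_b)  # the store itself is unchanged
print(r22_a - r22_b)   # %r22 differs by the folded offset
```

The store is the same in both cases, so the rewrite is only safe when every
other use of %r22 reached by this definition is folded as well, or the
register is dead afterwards.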

Manolis

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (46 preceding siblings ...)
  2023-11-13 13:20 ` manolis.tsamis at vrull dot eu
@ 2023-11-13 15:06 ` dave.anglin at bell dot net
  2023-11-13 15:26 ` manolis.tsamis at vrull dot eu
                   ` (7 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: dave.anglin at bell dot net @ 2023-11-13 15:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #47 from dave.anglin at bell dot net ---
On 2023-11-13 4:33 a.m., manolis.tsamis at vrull dot eu wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415
>
> --- Comment #44 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
> (In reply to John David Anglin from comment #39)
>> In the f-m-o pass, the following three insns that set call clobbered
>> registers r20-r22 are pulled from loop:
>>
>> (insn 186 183 190 29 (set (reg/f:SI 22 %r22 [478])
>>          (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>>              (const_int 388 [0x184]))) "../Python/compile.c":5964:9 120
>> {addsi3}
>>       (nil))
>> (insn 190 186 187 29 (set (reg/f:SI 21 %r21 [479])
>>          (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>>              (const_int 392 [0x188]))) "../Python/compile.c":5964:9 120
>> {addsi3}
>>       (nil))
>> (insn 194 191 195 29 (set (reg/f:SI 20 %r20 [480])
>>          (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>>              (const_int 396 [0x18c]))) "../Python/compile.c":5964:9 120
>> {addsi3}
>>       (nil))
>>
>> They are used in the following insns before call to compiler_visit_expr1:
>>
>> (insn 242 238 258 32 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int
>> *)prephit
>> mp_37 + 388B]+0 S4 A32])
>>          (reg:SI 23 %r23 [orig:173 vect__102.2442 ] [173]))
>> "../Python/compile.c"
>> :5968:22 42 {*pa.md:2193}
>>       (expr_list:REG_DEAD (reg:SI 23 %r23 [orig:173 vect__102.2442 ] [173])
>>          (expr_list:REG_DEAD (reg/f:SI 22 %r22 [478])
>>              (nil))))
>> (insn 258 242 246 32 (set (reg:SI 26 %r26)
>>          (reg/v/f:SI 5 %r5 [orig:198 c ] [198]))
>> "../Python/compile.c":5969:15 42 {*pa.md:2193}
>>       (nil))
>> (insn 246 258 250 32 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int
>> *)prephitmp_37 + 392B]+0 S4 A32])
>>          (reg:SI 29 %r29 [orig:169 vect__102.2443 ] [169]))
>> "../Python/compile.c":5968:22 42 {*pa.md:2193}
>>       (expr_list:REG_DEAD (reg:SI 29 %r29 [orig:169 vect__102.2443 ] [169])
>>          (expr_list:REG_DEAD (reg/f:SI 21 %r21 [479])
>>              (nil))))
>> (insn 250 246 254 32 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int
>> *)prephitmp_37 + 396B]+0 S4 A32])
>>          (reg:SI 31 %r31 [orig:145 vect__102.2444 ] [145]))
>> "../Python/compile.c":5968:22 42 {*pa.md:2193}
>>       (expr_list:REG_DEAD (reg:SI 31 %r31 [orig:145 vect__102.2444 ] [145])
>>          (expr_list:REG_DEAD (reg/f:SI 20 %r20 [480])
>>              (nil))))
>>
>> After the call, we have:
>>
>> (insn 1241 269 273 30 (set (reg/f:SI 22 %r22 [478])
>>          (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
>> "../Python/compile.c":5970:20 -1
>>       (nil))
>> (insn 273 1241 1242 30 (set (mem:SI (plus:SI (reg/f:SI 22 %r22 [478])
>>                  (const_int 388 [0x184])) [4 MEM[(int *)_107 + 388B]+0 S4
>> A32])
>>          (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167]))
>> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>>       (nil))
>> (insn 1242 273 277 30 (set (reg/f:SI 21 %r21 [479])
>>          (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
>> "../Python/compile.c":5970:20 -1
>>       (nil))
>> (insn 277 1242 1243 30 (set (mem:SI (plus:SI (reg/f:SI 21 %r21 [479])
>>                  (const_int 392 [0x188])) [4 MEM[(int *)_107 + 392B]+0 S4
>> A32])
>>          (reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156]))
>> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>>       (nil))
>> (insn 1243 277 281 30 (set (reg/f:SI 20 %r20 [480])
>>          (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127]))
>> "../Python/compile.c":5970:20 -1
>>       (nil))
>> (insn 281 1243 299 30 (set (mem:SI (plus:SI (reg/f:SI 20 %r20 [480])
>>                  (const_int 396 [0x18c])) [4 MEM[(int *)_107 + 396B]+0 S4
>> A32])
>>          (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
>> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>>       (nil))
>>
>> We have lost the offsets that were added initially to r20, r21 and r22.
>>
>> Previous ce3 pass had:
>>
>> (insn 272 269 273 30 (set (reg/f:SI 22 %r22 [478])
>>          (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>>              (const_int 388 [0x184]))) "../Python/compile.c":5970:20 120
>> {addsi3}
>>       (nil))
>> (insn 273 272 276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) [4 MEM[(int
>> *)_107 + 388B]+0 S4 A32])
>>          (reg:SI 14 %r14 [orig:167 vect_pretmp_36.2450 ] [167]))
>> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>>       (nil))
>> (insn 276 273 277 30 (set (reg/f:SI 21 %r21 [479])
>>          (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>>              (const_int 392 [0x188]))) "../Python/compile.c":5970:20 120
>> {addsi3}
>>       (nil))
>> (insn 277 276 280 30 (set (mem:SI (reg/f:SI 21 %r21 [479]) [4 MEM[(int
>> *)_107 + 392B]+0 S4 A32])
>>          (reg:SI 13 %r13 [orig:156 vect_pretmp_36.2451 ] [156]))
>> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>>       (nil))
>> (insn 280 277 281 30 (set (reg/f:SI 20 %r20 [480])
>>          (plus:SI (reg/f:SI 19 %r19 [orig:127 prephitmp_37 ] [127])
>>              (const_int 396 [0x18c]))) "../Python/compile.c":5970:20 120
>> {addsi3}
>>       (nil))
>> (insn 281 280 284 30 (set (mem:SI (reg/f:SI 20 %r20 [480]) [4 MEM[(int
>> *)_107 + 396B]+0 S4 A32])
>>          (reg:SI 12 %r12 [orig:134 vect_pretmp_36.2452 ] [134]))
>> "../Python/compile.c":5970:20 42 {*pa.md:2193}
>>       (nil))
>>
>> So, this is a f-m-o bug.
> Hi Dave,
>
> I don't see an f-m-o bug here. The offsets aren't lost; they're just moved
> into the corresponding memory loads/stores. If you look at the stores in ce3,
> they don't have offsets, whereas after f-m-o they do. E.g. in ce3: (insn 273
> 272 276 30 (set (mem:SI (reg/f:SI 22 %r22 [478]) ...) but in f-m-o it is
> (insn 273 1241 1242 30 (set (mem:SI (plus:SI (reg/f:SI 22 %r22 [478])
> (const_int 388 [0x184]) ...).
>
> This is the way that f-m-o works. It can also be seen in the f-m-o dumps, where
> offset changes to memory ops are reported as 'Memory offset changed' and
> instructions whose offset was propagated (like insns 272, 276, 280) are
> reported as 'Instruction folded':
Hi Manolis,

If you look at the f-m-o transformation applied to insn 272 and insn 273, you
will see that "reg/f:SI 22 %r22 [478]" is not dead after these insns.  The
transformation changes the value of r22 which is wrong without changing all
uses of the register and adjusting the other sets for the register.  It only
changed the use in insn 273 and not the uses earlier in the loop.
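
A small Python sketch of this failure mode follows. It is a deliberately
simplified model, not the real CFG: the loop shape (insn 242, the call, insn
272/1241, insn 273, then back to insn 242) and the function name are
assumptions made for illustration only.

```python
def run_loop(fold_insn_272, iterations=2, r19=0x1000):
    """Return (insn, address) pairs for the stores modelled on insns 242/273.

    Assumed control flow: 242 -> call -> 272 (or 1241) -> 273 -> back to 242.
    """
    stores = []
    r22 = r19 + 388                    # insn 186: initial set of %r22
    for _ in range(iterations):
        stores.append(("242", r22))    # insn 242: [%r22] = ..., no offset
        r22 = None                     # the call clobbers %r22
        if fold_insn_272:
            r22 = r19                  # insn 1241: %r22 = %r19
            stores.append(("273", r22 + 388))  # insn 273: [%r22 + 388]
        else:
            r22 = r19 + 388            # insn 272: %r22 = %r19 + 388
            stores.append(("273", r22))        # insn 273: [%r22]
    return stores


print(run_loop(fold_insn_272=False))
print(run_loop(fold_insn_272=True))
```

Under these assumptions, the unfolded run writes to r19 + 388 every time,
while the folded run writes to r19 + 0 at insn 242 on the second iteration,
because that use still expects the offset to be in the register.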

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (47 preceding siblings ...)
  2023-11-13 15:06 ` dave.anglin at bell dot net
@ 2023-11-13 15:26 ` manolis.tsamis at vrull dot eu
  2023-11-13 21:46 ` danglin at gcc dot gnu.org
                   ` (6 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-11-13 15:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #48 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
(In reply to dave.anglin from comment #47)
> Hi Manolis,
> 
> If you look at the f-m-o transformation applied to insn 272 and insn 273,
> you will see that "reg/f:SI 22 %r22 [478]" is not dead after these insns.
> The transformation changes the value of r22 which is wrong without
> changing all uses of the register and adjusting the other sets for the
> register.  It only changed the use in insn 273 and not the uses earlier in
> the loop.

I see, thanks for pointing that out! I'll debug this further and see why
f-m-o's use detection code misses it.

Manolis

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (48 preceding siblings ...)
  2023-11-13 15:26 ` manolis.tsamis at vrull dot eu
@ 2023-11-13 21:46 ` danglin at gcc dot gnu.org
  2023-11-16 17:43 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: danglin at gcc dot gnu.org @ 2023-11-13 21:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #49 from John David Anglin <danglin at gcc dot gnu.org> ---
Created attachment 56576
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56576&action=edit
Patch to improve reg+d address handling

This patch revises pa_legitimate_address_p to allow 14-bit displacements
for all memory accesses before reload.  Comments and flow in this routine
are improved.

So far, I haven't seen any issues related to reloading out-of-range
floating-point accesses.

This significantly improves code generation and saves more than two
thousand instructions in compile.s.  I was able to successfully build
python with the patched compiler.

This is version two of the change and it still needs more testing.
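
To illustrate why the displacement width matters for the offsets in this PR,
here is a quick Python check. It assumes the 5-bit and 14-bit displacement
fields are signed two's-complement immediates; the helper name is made up for
this sketch and is not taken from the patch.

```python
def fits_signed(value, bits):
    """True if value fits in a signed two's-complement field of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return lo <= value <= hi


# Offsets taken from the stores discussed in this PR.
for disp in (388, 392, 396):
    print(disp, fits_signed(disp, 5), fits_signed(disp, 14))
```

Under that assumption none of 388, 392 or 396 fits in a 5-bit field, while all
of them fit in a 14-bit one.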

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (49 preceding siblings ...)
  2023-11-13 21:46 ` danglin at gcc dot gnu.org
@ 2023-11-16 17:43 ` cvs-commit at gcc dot gnu.org
  2023-11-27 20:55 ` sjames at gcc dot gnu.org
                   ` (4 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-16 17:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #50 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by John David Anglin <danglin@gcc.gnu.org>:

https://gcc.gnu.org/g:d2934eb6ae92471484469d8ddd039eb34ef400b1

commit r14-5538-gd2934eb6ae92471484469d8ddd039eb34ef400b1
Author: John David Anglin <danglin@gcc.gnu.org>
Date:   Thu Nov 16 17:42:26 2023 +0000

    hppa: Revise REG+D address support to allow long displacements before
reload

    In analyzing PR rtl-optimization/112415, I realized that restricting
    REG+D offsets to 5-bits before reload results in very poor code and
    complexities in optimizing these instructions after reload.  The
    general problem is long displacements are not allowed for floating
    point accesses when generating PA 1.1 code.  Even with PA 2.0, there
    is a ELF linker bug that prevents using long displacements for
    floating point loads and stores.

    In the past, enabling long displacements before reload caused issues
    in reload.  However, there have been fixes in the handling of reloads
    for floating-point accesses.  This change allows long displacements
    before reload and corrects a couple of issues in the constraint
    handling for integer and floating-point accesses.

    2023-11-16  John David Anglin  <danglin@gcc.gnu.org>

    gcc/ChangeLog:

            PR rtl-optimization/112415
            * config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
            displacements before reload.  Simplify logic flow.  Revise
            comments.
            * config/pa/pa.h (TARGET_ELF64): New define.
            (INT14_OK_STRICT): Update define and comment.
            * config/pa/pa64-linux.h (TARGET_ELF64): Define.
            * config/pa/predicates.md (base14_operand): Don't check
            alignment of short displacements.
            (integer_store_memory_operand): Don't return true when
            reload_in_progress is true.  Remove INT_5_BITS check.
            (floating_point_store_memory_operand): Don't return true when
            reload_in_progress is true.  Use INT14_OK_STRICT to check
            whether long displacements are always okay.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (50 preceding siblings ...)
  2023-11-16 17:43 ` cvs-commit at gcc dot gnu.org
@ 2023-11-27 20:55 ` sjames at gcc dot gnu.org
  2023-11-28 12:39 ` manolis.tsamis at vrull dot eu
                   ` (3 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: sjames at gcc dot gnu.org @ 2023-11-27 20:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #51 from Sam James <sjames at gcc dot gnu.org> ---
Manolis, did you have a chance to look at the remaining pass issue? You'll need
to revert Dave's commit locally, since it made the issue latent for building
Python.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (51 preceding siblings ...)
  2023-11-27 20:55 ` sjames at gcc dot gnu.org
@ 2023-11-28 12:39 ` manolis.tsamis at vrull dot eu
  2024-03-18  0:22 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  55 siblings, 0 replies; 57+ messages in thread
From: manolis.tsamis at vrull dot eu @ 2023-11-28 12:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #52 from Manolis Tsamis <manolis.tsamis at vrull dot eu> ---
(In reply to Sam James from comment #51)
> manolis, did you have a chance to look at the remaining pass issue? You'll
> need to revert Dave's commit locally which made the issue latent for
> building Python.

Hi Sam, I had to work on some other things, so I haven't found a fix yet,
but I'll be working on that again now (in light of the new info from PR111601
too).

Thanks for the ping,
Manolis

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (52 preceding siblings ...)
  2023-11-28 12:39 ` manolis.tsamis at vrull dot eu
@ 2024-03-18  0:22 ` cvs-commit at gcc dot gnu.org
  2024-03-18  0:39 ` danglin at gcc dot gnu.org
  2024-03-22 13:34 ` law at gcc dot gnu.org
  55 siblings, 0 replies; 57+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-03-18  0:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #53 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by John David Anglin <danglin@gcc.gnu.org>:

https://gcc.gnu.org/g:f0fda1aff0b752e4182c009c5526b9306bd35f7c

commit r14-9511-gf0fda1aff0b752e4182c009c5526b9306bd35f7c
Author: John David Anglin <danglin@gcc.gnu.org>
Date:   Mon Mar 18 00:19:36 2024 +0000

    hppa: Improve handling of REG+D addresses when generating PA 2.0 code

    In looking at PR 112415, it became clear that improvements could be
    made in the handling of loads and stores using REG+D addresses.  A
    change in 2002 conflated two issues:

    1) We can't generate insns with 14-bit displacements before reload
    completes when generating PA 1.x code since floating-point loads and
    stores only support 5-bit offsets in PA 1.x.

    2) The GNU ELF 32-bit linker lacks relocation support for PA 2.0
    floating-point instructions with 14-bit displacements.  These
    relocations affect instructions with symbolic references.

    The result of the change was to block creation of PA 2.0 instructions
    with 14-bit REG+D displacements for SImode, DImode, SFmode and DFmode
    on the GNU/Linux target before reload.  This was unnecessary as these
    instructions don't need relocation.

    This change revises the INT14_OK_STRICT define to allow creation
    of instructions with 14-bit REG+D addresses before reload when
    generating PA 2.0 code.

    2024-03-17  John David Anglin  <danglin@gcc.gnu.org>

    gcc/ChangeLog:

            PR rtl-optimization/112415
            * config/pa/pa.cc (pa_emit_move_sequence): Revise condition
            for symbolic memory operands.
            (pa_legitimate_address_p): Revise LO_SUM condition.
            * config/pa/pa.h (INT14_OK_STRICT): Revise define.  Move
            comment about GNU linker to predicates.md.
            * config/pa/predicates.md (floating_point_store_memory_operand):
            Revise condition for symbolic memory operands.  Update
            comment.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (53 preceding siblings ...)
  2024-03-18  0:22 ` cvs-commit at gcc dot gnu.org
@ 2024-03-18  0:39 ` danglin at gcc dot gnu.org
  2024-03-22 13:34 ` law at gcc dot gnu.org
  55 siblings, 0 replies; 57+ messages in thread
From: danglin at gcc dot gnu.org @ 2024-03-18  0:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #54 from John David Anglin <danglin at gcc dot gnu.org> ---
The fold-mem-offsets (f-m-o) issue is probably fixed.

* [Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94
  2023-11-06 21:00 [Bug rtl-optimization/112415] New: [14 regression] Python 3.11 miscompiled with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94 sjames at gcc dot gnu.org
                   ` (54 preceding siblings ...)
  2024-03-18  0:39 ` danglin at gcc dot gnu.org
@ 2024-03-22 13:34 ` law at gcc dot gnu.org
  55 siblings, 0 replies; 57+ messages in thread
From: law at gcc dot gnu.org @ 2024-03-22 13:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #55 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Per comment #54. If it turns out we're wrong, we can always reopen or file a
new report.
