From mboxrd@z Thu Jan  1 00:00:00 1970
From: "alexander.grund@tu-dresden.de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug fortran/100799] New: Stackoverflow in optimized code on PPC
Date: Thu, 27 May 2021 11:20:38 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

            Bug ID: 100799
           Summary: Stackoverflow in optimized code on PPC
           Product: gcc
           Version: 10.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexander.grund@tu-dresden.de
  Target Milestone: ---

Created attachment 50879
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50879&action=edit
Disassembly of dbgebal_ in debug and release modes

Quick summary of the use case:
When using FlexiBLAS with OpenBLAS I noticed corruption of the parameters passed
to OpenBLAS functions. FlexiBLAS basically provides a BLAS interface in which
each function is a stub that forwards its arguments to a real BLAS library,
such as OpenBLAS.

Example:

    void FC_GLOBAL(dgebal,DGEBAL)(char* job, blasint* n, double* a, blasint* lda,
                                  blasint* ilo, blasint* ihi, double* scale,
                                  blasint* info)
    {
        void (*fn) (void* job, void* n, void* a, void* lda, void* ilo,
                    void* ihi, void* scale, void* info);
        fn = current_backend->lapack.dgebal.f77_blas_function;
        fn((void*) job, (void*) n, (void*) a, (void*) lda, (void*) ilo,
           (void*) ihi, (void*) scale, (void*) info);
        return;
    }
    void dgebal(char* job, blasint* n, double* a, blasint* lda, blasint* ilo,
                blasint* ihi, double* scale, blasint* info)
        __attribute__((alias(MTS(FC_GLOBAL(dgebal,DGEBAL)))));

Because of the alias, and because the real BLAS library is loaded after
FlexiBLAS, calls from one OpenBLAS function to another are also routed through
FlexiBLAS.

Now I noticed that the parameter "N" at
https://github.com/xianyi/OpenBLAS/blob/v0.3.15/lapack-netlib/SRC/dgeev.f#L369
gets messed up during the call at
https://github.com/xianyi/OpenBLAS/blob/v0.3.15/lapack-netlib/SRC/dgeev.f#L363
which I traced to the following: FlexiBLAS pushes the register that holds it,
calls the OpenBLAS DGEBAL and restores the register afterwards, but the stack
entry it is restored from gets changed by DGEBAL.

So the actual bug here is that GCC generates code for DGEBAL which writes
outside of the allocated stack frame.

The disassembly of the dgebal_ function shows "stdu r1,-368(r1)" in the
prologue and "std r25,440(r1)" later, which is the instruction that overwrites
the saved register of the calling function.
As far as I can tell, an offset of 440 from r1, which is bigger than the 368
"allocated" by the stdu, is invalid.
The line reported by GDB for the overwriting instruction is
https://github.com/xianyi/OpenBLAS/blob/v0.3.15/lapack-netlib/SRC/dgebal.f#L328

The command used to compile the file is:

    gfortran -fno-math-errno -Wall -frecursive -fno-optimize-sibling-calls \
        -m64 -fopenmp -fPIC -O2 -fno-fast-math -mcpu=power9 -mtune=power9 \
        -DUSE_OPENMP -fopenmp -fno-optimize-sibling-calls -g -c \
        -o dgebal.o dgebal.f

Replacing the "O2" by "Og" changes the prologue to "stdu r1,-336(r1)", and the
maximum offset used for std on r1 is then 328. That build works with FlexiBLAS,
hence I suspect an optimization issue which leads to more spills but doesn't
update the stack size.

Reproduced with GCC 10.2.0, 10.3.0, 11.1.0