From mboxrd@z Thu Jan 1 00:00:00 1970
From: "dimitri.gorokhovik at free dot fr"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/94485] Inter-dependency between { tree-sra, ABI, inlining, loop-unrolling } leads to mis-optimization
Date: Thu, 24 Sep 2020 20:25:13 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: unknown
X-Bugzilla-Severity: normal
X-Bugzilla-Who: dimitri.gorokhovik at free dot fr
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
List-Id: Gcc-bugs mailing list

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94485

--- Comment #8 from Dimitri Gorokhovik ---
I was able to reduce the same code (see the attached file bug-6.cpp).

-- when compiled correctly, running it produces the following (expected) output:

cube: ({ 0, 0, 0 }, { 1, 1, 1 })
cube: ({ 0, 0, 1 }, { 1, 1, 2 })
cube: ({ 0, 0, 2 }, { 1, 1, 3 })
cube: ({ 0, 1, 0 }, { 1, 2, 1 })
cube: ({ 0, 1, 1 }, { 1, 2, 2 })
cube: ({ 0, 1, 2 }, { 1, 2, 3 })
cube: ({ 0, 2, 0 }, { 1, 3, 1 })
cube: ({ 0, 2, 1 }, { 1, 3, 2 })
cube: ({ 0, 2, 2 }, { 1, 3, 3 })
cube: ({ 1, 0, 0 }, { 2, 1, 1 })
cube: ({ 1, 0, 1 }, { 2, 1, 2 })
cube: ({ 1, 0, 2 }, { 2, 1, 3 })
cube: ({ 1, 1, 0 }, { 2, 2, 1 })
cube: ({ 1, 1, 1 }, { 2, 2, 2 })
cube: ({ 1, 1, 2 }, { 2, 2, 3 })
cube: ({ 1, 2, 0 }, { 2, 3, 1 })
cube: ({ 1, 2, 1 }, { 2, 3, 2 })
cube: ({ 1, 2, 2 }, { 2, 3, 3 })
cube: ({ 2, 0, 0 }, { 3, 1, 1 })
cube: ({ 2, 0, 1 }, { 3, 1, 2 })
cube: ({ 2, 0, 2 }, { 3, 1, 3 })
cube: ({ 2, 1, 0 }, { 3, 2, 1 })
cube: ({ 2, 1, 1 }, { 3, 2, 2 })
cube: ({ 2, 1, 2 }, { 3, 2, 3 })
cube: ({ 2, 2, 0 }, { 3, 3, 1 })
cube: ({ 2, 2, 1 }, { 3, 3, 2 })
cube: ({ 2, 2, 2 }, { 3, 3, 3 })
count = 27

-- when compiled incorrectly, it prints out:

count = 0

Tested with build g++ (GCC) 11.0.0 20200924 (experimental).

In order to compile and run:

g++ -std=c++17 -O3 -o bug-6 bug-6.cpp && ./bug-6

This builds for implicit '-m64' (x86_64) and produces invalid output.
To get valid output, compile with any one of the following:

  -m32
  -O0 (instead of -O3)
  -fno-tree-sra
  one of: -DFIX_0, -DFIX_1, -DFIX_2, -DFIX_3, -DFIX_4

From my limited understanding of tree dumps, here is roughly what happens:

-- the routine 'begin()', line 183, returns 'struct iterator' by value. The latter has a size of 14 bytes and so is returned "in registers". Forcing it to be returned via memory ==> issue goes away. (Methods to force: make it bigger than 16 bytes, make some fields volatile, use -m32.) Note also that, when the routine is evaluated as constexpr (in static_assert), the issue is not reproduced.

-- all called routines (pretty much) are inlined inside one call, to 'count_them ()'. Prevent the inlining of the routine 'can_be_incremented ()' ==> issue goes away. (Methods to prevent: define FIX_1.)

-- SRA replaces several fields of the 'struct iterator' (line 150), most importantly 'idx_' (line 153). Disable SRA ==> issue goes away (-fno-tree-sra or -O0).

This replacement by tree-SRA somehow doesn't propagate the writes to the replacement vars of idx_ to the original parts of the structure living "in the return registers". When the return value lives in memory, the writes are propagated correctly.

The compiler then eliminates the loop in 'can_be_incremented' and evaluates the call to that routine to 'false' (line 163). Forcibly keeping the loop (-DFIX_2) or replacing it by non-loop code (-DFIX_0) ==> issue goes away.
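
To illustrate the shape of code described above, here is a minimal sketch. It is not the attached bug-6.cpp and is not claimed to reproduce the miscompile. The names 'begin', 'can_be_incremented', 'count_them' and 'idx_' are taken from the description; the types 'SmallIter' and 'BigIter', the field layout, and all function bodies are assumptions made up for illustration. The sketch only shows a 14-byte trivially copyable struct returned by value (small enough for the x86_64 System V convention to return it in registers), a variant padded past 16 bytes (which would be returned via memory), and a loop-based check of the kind the compiler is reported to fold to 'false'.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the report's 'struct iterator'; the real layout
// is in bug-6.cpp and is not reproduced here.
struct SmallIter {
    std::uint16_t idx_[3];  // current position, one counter per axis
    std::uint16_t end_[3];  // exclusive upper bound per axis
    std::uint16_t tag_;     // filler so the size matches the 14 bytes mentioned
};

// Same fields plus padding that pushes the size past 16 bytes, one of the
// workarounds listed in the report ("make it bigger than 16 bytes").
struct BigIter {
    SmallIter it;
    char pad[4];
};

static_assert(sizeof(SmallIter) == 14, "seven 16-bit fields, no padding");
static_assert(sizeof(BigIter) > 16, "too large for two 8-byte registers");
static_assert(std::is_trivially_copyable<SmallIter>::value,
              "only trivially copyable types qualify for register return");

// Returns the iterator by value, as the report says 'begin()' does.
SmallIter begin() { return SmallIter{{0, 0, 0}, {3, 3, 3}, 0}; }

// Loop-based check, loosely modelled on the described 'can_be_incremented ()':
// true while at least one axis has room to advance.
bool can_be_incremented(const SmallIter& it) {
    for (int d = 0; d < 3; ++d)
        if (it.idx_[d] + 1 < it.end_[d]) return true;
    return false;
}

// Odometer-style step: advance the last axis, carrying into earlier ones.
void increment(SmallIter& it) {
    for (int d = 2; d >= 0; --d) {
        if (it.idx_[d] + 1 < it.end_[d]) { ++it.idx_[d]; return; }
        it.idx_[d] = 0;
    }
}

// Visits every position of the 3x3x3 grid and counts them.
int count_them() {
    int count = 0;
    SmallIter it = begin();
    for (;;) {
        ++count;
        if (!can_be_incremented(it)) break;
        increment(it);
    }
    return count;
}
```

With a correct compilation, calling count_them() from main should return 27, matching the "count = 27" line of the expected output; a result of 0 or 1 would indicate that the check in can_be_incremented() was wrongly folded away.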