From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 7120 invoked by alias); 28 Oct 2015 14:39:21 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 129188 invoked by uid 48); 28 Oct 2015 14:39:17 -0000
From: "ktkachov at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/68136] New: missed tree-level optimization with redundant computations
Date: Wed, 28 Oct 2015 14:39:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 6.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: ktkachov at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status keywords bug_severity priority component assigned_to reporter target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-10/txt/msg02373.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68136

            Bug ID: 68136
           Summary: missed tree-level optimization with redundant
                    computations
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

Take the testcase gcc.dg/ifcvt-3.c:

typedef long long s64;

int
foo (s64 a, s64 b, s64 c)
{
  s64 d = a - b;

  if (d == 0)
    return a + c;
  else
    return b + d + c;
}

on aarch64 this produces the simplest possible:
foo:
        add     w0, w2, w0
        ret

However,
this is due to RTL-level if-conversion.
The final tree dump is more complex:

foo (s64D.2694 aD.2695, s64D.2694 bD.2696, s64D.2694 cD.2697)
{
  s64D.2694 dD.2700;
  intD.7 _1;
  unsigned int _5;
  unsigned int _7;
  unsigned int _8;
  intD.7 _9;
  unsigned int _10;
  unsigned int _11;
  unsigned int _13;
  unsigned int _14;
  intD.7 _15;
  unsigned int _17;

;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 0, next block 3, flags: (NEW, REACHABLE)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  d_4 = a_2(D) - b_3(D);
  if (d_4 == 0)
    goto <bb 3>;
  else
    goto <bb 4>;
;;    succ:       3 [39.0%]  (TRUE_VALUE,EXECUTABLE)
;;                4 [61.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 0, freq 3900, maybe hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE)
;;    pred:       2 [39.0%]  (TRUE_VALUE,EXECUTABLE)
  # RANGE [0, 4294967295]
  _5 = (unsigned int) a_2(D);
  # RANGE [0, 4294967295]
  _7 = (unsigned int) c_6(D);
  # RANGE [0, 4294967295]
  _8 = _5 + _7;
  _9 = (intD.7) _8;
  goto <bb 5>;
;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 4, loop depth 0, count 0, freq 6100, maybe hot
;;    prev block 3, next block 5, flags: (NEW, REACHABLE)
;;    pred:       2 [61.0%]  (FALSE_VALUE,EXECUTABLE)
  # RANGE [0, 4294967295]
  _10 = (unsigned int) b_3(D);
  # RANGE [0, 4294967295]
  _11 = (unsigned int) d_4;
  # RANGE [0, 4294967295]
  _13 = (unsigned int) c_6(D);
  # RANGE [0, 4294967295]
  _17 = _10 + _13;
  # RANGE [0, 4294967295]
  _14 = _11 + _17;
  _15 = (intD.7) _14;
;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 5, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 4, next block 1, flags: (NEW, REACHABLE)
;;    pred:       3 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                4 [100.0%]  (FALLTHRU,EXECUTABLE)
  # _1 = PHI <_9(3), _15(4)>
  # VUSE <.MEM_16(D)>
  return _1;
;;    succ:       EXIT [100.0%]

}

It's probably a good idea to detect this earlier and produce a
"return a + c;" at the tree level.
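
To spell out why "return a + c;" is valid on both paths: the then arm
already returns a + c, and in the else arm
b + d + c = b + (a - b) + c = a + c. The following is only a rough sketch
that checks this on a small grid of inputs. foo_simplified and
check_equivalence are hypothetical names that are not part of
gcc.dg/ifcvt-3.c, and the inputs are deliberately kept small so that no
signed overflow or lossy narrowing to int is involved.

```c
#include <assert.h>

typedef long long s64;

/* The function from the testcase, unchanged.  */
int
foo (s64 a, s64 b, s64 c)
{
  s64 d = a - b;

  if (d == 0)
    return a + c;
  else
    return b + d + c;
}

/* Hypothetical hand-simplified form: b + (a - b) + c == a + c,
   so b drops out and both arms collapse into one expression.  */
int
foo_simplified (s64 a, s64 b, s64 c)
{
  (void) b;
  return a + c;
}

/* Compare both forms for every a, b, c in [-4, 4] and return the
   number of triples checked.  The range is an arbitrary assumption,
   chosen to stay far away from overflow.  */
int
check_equivalence (void)
{
  int checked = 0;

  for (s64 a = -4; a <= 4; a++)
    for (s64 b = -4; b <= 4; b++)
      for (s64 c = -4; c <= 4; c++)
        {
          assert (foo (a, b, c) == foo_simplified (a, b, c));
          checked++;
        }

  return checked;
}
```

Calling check_equivalence () from a small driver exercises both the
d == 0 and the d != 0 paths. It is a sanity check of the algebra on a few
values, not a proof that the transformation is valid for all inputs.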