From mboxrd@z Thu Jan  1 00:00:00 1970
From: "z.zhanghaijian at huawei dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/94274] fold phi whose incoming args are
 defined from binary operations
Date: Tue, 24 Mar 2020 13:42:36 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 10.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: z.zhanghaijian at huawei dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-94274-4-mlnlEMeQgf@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-94274-4@http.gcc.gnu.org/bugzilla/>
References: <bug-94274-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94274
--- Comment #3 from z.zhanghaijian at huawei dot com ---
(In reply to Marc Glisse from comment #1)
> Detecting common beginnings / endings in branches is something gcc does very
> seldom. Even at -Os, for if(cond)f(b);else f(c); we need to wait until
> rtl-optimizations to get a single call to f. (of course the reverse
> transformation of duplicating a statement that was after the branches into
> them, if it simplifies, is nice as well, and they can conflict)
> I don't know if handling one such very specific case (binary operations with
> a common argument) separately is a good idea when we don't even handle unary
> operations.

I tested this fold on SPECint 2017 and found some performance gains on
500.perlbench_r. I then compared the generated assembly and found some
improvements.

For example:

S_invlist_max, which is inlined into many functions, such as
S__append_range_to_invlist, S_ssc_anything, Perl__invlist_invert ...

invlist_inline.h:
#define FROM_INTERNAL_SIZE(x) ((x)/ sizeof(UV))

S_invlist_max (inlined into S__append_range_to_invlist, S_ssc_anything,
Perl__invlist_invert, ....):
    return SvLEN(invlist) == 0  /* This happens under _new_invlist_C_array */
           ? FROM_INTERNAL_SIZE(SvCUR(invlist)) - 1
           : FROM_INTERNAL_SIZE(SvLEN(invlist)) - 1;

Tree dump from phiopt:

  <bb 3> [local count: 536870911]:
  _46 = pretmp_112 >> 3;
  iftmp.1123_47 = _46 + 18446744073709551615;
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 536870911]:
  _48 = _44 >> 3;
  iftmp.1123_49 = _48 + 18446744073709551615;

  <bb 5> [local count: 1073741823]:
  # iftmp.1123_50 = PHI <iftmp.1123_47(3), iftmp.1123_49(4)>

This can be replaced with:

  <bb 3> [local count: 536870912]:

  <bb 4> [local count: 1073741823]:
  # _48 = PHI <_44(2), pretmp_112(3)>
  _49 = _48 >> 3;
  iftmp.1123_50 = _49 + 18446744073709551615;

Assembly before the fold:

lsr     x5, x6, #3
lsr     x3, x3, #3
sub     x20, x5, #0x1
sub     x3, x3, #0x1
csel    x20, x3, x20, ne

Assembly after the fold:

csel    x3, x3, x4, ne
lsr     x3, x3, #3
sub     x20, x3, #0x1

This eliminates two instructions.
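
For reference, the fold can be pictured at the source level roughly as below.
This is only a sketch under assumptions, not the perlbench source: UV is
assumed to be a 64-bit unsigned type, and max_before, max_after and
check_forms_agree are hypothetical names used for illustration. With an
8-byte UV, the division by sizeof(UV) is the ">> 3" in the dump, and the
"- 1" is the "+ 18446744073709551615".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for Perl's UV, taken here to be 64 bits wide. */
typedef uint64_t UV;

#define FROM_INTERNAL_SIZE(x) ((x) / sizeof(UV))

/* Shape of the original source: the divide and the subtract appear in
   both arms of the conditional, so each arm computes its own result. */
UV max_before(UV len, UV cur)
{
    return len == 0 ? FROM_INTERNAL_SIZE(cur) - 1
                    : FROM_INTERNAL_SIZE(len) - 1;
}

/* Shape after the fold: select the operand first, then apply the
   common operations once to the selected value. */
UV max_after(UV len, UV cur)
{
    UV v = len == 0 ? cur : len;
    return FROM_INTERNAL_SIZE(v) - 1;
}

/* Hypothetical helper: checks that both shapes agree on a few inputs,
   including values where the unsigned subtraction wraps around. */
void check_forms_agree(void)
{
    static const UV samples[] = { 0, 1, 7, 8, 9, 64, 1000, UINT64_MAX };
    size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(max_before(samples[i], samples[j])
                   == max_after(samples[i], samples[j]));
}
```

The two functions return the same value for every input because both arms
apply the same operations with the same constants; only the first operand
differs, which is what allows the select to be moved ahead of them.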