From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
From: "jamborm at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/94427] 456.hmmer is 8-17% slower when
 compiled at -Ofast than with GCC 9
Date: Tue, 31 Mar 2020 23:12:32 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 10.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jamborm at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-94427-4-IlIOMfQwTp@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-94427-4@http.gcc.gnu.org/bugzilla/>
References: <bug-94427-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94427
--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
OK, so it turns out the identified commit only allows us to shoot
ourselves in the foot - and there is one branch too few, not one too many.

The hottest loop, consuming most of the time, is:

Percent         Instructions
------------------------------------------------
  0.03 │ fb0:┌─+add     -0x8(%r9,%rcx,4),%eax
  5.03 │     │  mov     %eax,-0x4(%r13,%rcx,4)
  2.48 │     │  mov     -0x8(%r8,%rcx,4),%esi
  0.02 │     │  add     -0x8(%rdx,%rcx,4),%esi
  0.06 │     │  cmp     %eax,%esi
  4.49 │     │  cmovge  %esi,%eax
 17.17 │     │  mov     %ecx,%esi
  0.03 │     │  cmp     $0xc521974f,%eax
  3.50 │     │  cmovl   %ebx,%eax   <----------- this used to be a branch
 21.84 │     │  mov     %eax,-0x4(%r13,%rcx,4)
  3.88 │     │  add     $0x1,%rcx
  0.00 │     │  cmp     %rdi,%rcx
  0.04 │     └──jne     fb0

where the marked conditional move was a branch one revision before,
because after fwprop3 the IL looked like:

  <bb 16> [local count: 955630217]:
  # cstore_281 = PHI <[fast_algorithms.c:142:53] sc_223(14),
[fast_algorithms.c:142:53] cstore_249(15)>
  [fast_algorithms.c:142:49] MEM <int> [(void *)_72] = cstore_281;
  [fast_algorithms.c:143:13] _78 = [fast_algorithms.c:143:13] *_72;
  [fast_algorithms.c:143:10] if (_78 < -987654321)
    goto <bb 18>; [50.00%]
  else
    goto <bb 17>; [50.00%]

  <bb 17> [local count: 477815109]:

  <bb 18> [local count: 955630217]:
  # cstore_250 = PHI <[fast_algorithms.c:143:33] -987654321(16),
[fast_algorithms.c:143:33] cstore_281(17)>
  [fast_algorithms.c:143:29] MEM <int> [(void *)_72] = cstore_250;

The aforementioned revision turned this into more optimized code:

  <bb 16> [local count: 955630217]:
  # cstore_281 = PHI <[fast_algorithms.c:142:53] sc_223(14),
[fast_algorithms.c:142:53] _73(15)>
  [fast_algorithms.c:143:10] if (cstore_281 < -987654321)
    goto <bb 18>; [50.00%]
  else
    goto <bb 17>; [50.00%]

  <bb 17> [local count: 477815109]:

  <bb 18> [local count: 955630217]:
  # cstore_250 = PHI <[fast_algorithms.c:143:33] -987654321(16),
[fast_algorithms.c:143:33] cstore_281(17)>
  [fast_algorithms.c:143:29] MEM <int> [(void *)_72] = cstore_250;

Which phiopt3 then changed to:

  cstore_248 = MAX_EXPR <cstore_249, -987654321>;
  [fast_algorithms.c:143:29] MEM <int> [(void *)_72] = cstore_248;

and the expander apparently always expands MAX_EXPR into a conditional
move if it can(?).
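
For illustration, the two shapes of the line-143 statement can be
sketched in C roughly as follows.  This is a reduced example and not
the benchmark source: the function names and the LOWER_BOUND macro are
made up here, and only the constant -987654321 is taken from the IL
dumps above.  Which machine code each form ends up as depends on the
compiler version and flags.

```c
/* Hypothetical name for the clamp value; the IL compares against
   -987654321.  */
#define LOWER_BOUND (-987654321)

/* Shape before the revision: store, reload, compare, and store
   again only when the value is below the bound.  */
void clamp_branch(int *cell, int sc)
{
    *cell = sc;
    if (*cell < LOWER_BOUND)
        *cell = LOWER_BOUND;
}

/* Shape after phiopt: a single maximum followed by one
   unconditional store, i.e. what MAX_EXPR expresses.  */
void clamp_max(int *cell, int sc)
{
    *cell = sc > LOWER_BOUND ? sc : LOWER_BOUND;
}
```

Both functions leave the same value in *cell for every input; the
difference is only whether the clamp is written as control flow or as
a data dependency.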

When I hacked phiopt not to do the transformation for - ehm - any
GIMPLE_COND statement originating from source line 143, I recovered
the original run-time of the benchmark.  On both AMD and Intel.