From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 4FE543858C56; Fri, 14 Oct 2022 08:52:45 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4FE543858C56
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1665737565;
	bh=Hr85dqtZSDB2RDiUVDDRVms7x2uoxjdBiQX/X9bFiMY=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=oHEolwuSICz0NIRd53jR72bOVlyuWGsFnh8hcNUKrt43EUKVgHoDJ9JzH7tkXn8OD
	 fS8pCz0D2zc4Yww7GTqeSG2YHFLmTUz6hudXvmd9Eq+EOI2yRCS+KV5SLpp9+VLGix
	 Q9xv54Ofu6yfdPt4no5zokazySOQVYvaxR6GRnhI=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107254] [11/12/13 Regression] Wrong
 vectorizer code (Fortran) since r11-1501-gda2b7c7f0a136b4d
Date: Fri, 14 Oct 2022 08:52:44 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.3.1
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.4
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-107254-4-ZAUcWQ07vM@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-107254-4@http.gcc.gnu.org/bugzilla/>
References: <bug-107254-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So I think the issue is that we have a live operation:

  <bb 3> [local count: 955630225]:
  # jc_42 = PHI <jc_36(7), 1(6)>
  _2 = (integer(kind=8)) jc_42;
  _3 = _2 * stride.1_23;
  _4 = _3 + offset.2_24;
  _5 = _4 + 1;
  _6 = (*h_28(D))[_5];
  _7 = _6 * 9.0000000000000002220446049250313080847263336181640625e-1;
  _8 = _4 + 2;
  _9 = (*h_28(D))[_8];
  temp_29 = _7 + _9;
  _10 = _9 * 9.0000000000000002220446049250313080847263336181640625e-1;
  _11 = _10 - _6;
  (*h_28(D))[_8] = _11;
  (*h_28(D))[_5] = temp_29;
  _12 = (*t_32(D))[_5];
  _13 = _12 * 9.0000000000000002220446049250313080847263336181640625e-1;
  _14 = (*t_32(D))[_8];
  _15 = _13 + _14;
  _16 = _14 * 9.0000000000000002220446049250313080847263336181640625e-1;
  _17 = _16 - _12;
  (*t_32(D))[_8] = _17;
  (*t_32(D))[_5] = _15;
  jc_36 = jc_42 + 1;
  if (_1 < jc_36)
    goto <bb 4>; [11.00%]
  else
    goto <bb 7>; [89.00%]

  <bb 7> [local count: 850510901]:
  goto <bb 3>; [100.00%]

  <bb 4> [local count: 105119324]:
  # _72 = PHI <_15(3)>
  # _71 = PHI <_17(3)>
^^^

that we fail to vectorize, which keeps the _12 and _14 loads live, but
those are from the wrong iteration.  The out-of-loop stores are
caused by invariant motion of 'temp2' (it's memory because it's passed
to dlartg), which we apparently cannot disambiguate against the
stores to t, so we re-materialize them on the loop exit.
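
To make the "wrong iteration" point concrete, here is a rough C sketch of
the t half of the loop.  It is not the vectorizer's output and not the
Fortran testcase: the names scalar_loop, blocked_loop, VF and the lane
parameter are made up for illustration, and t1/t2 stand in for t(1,j)
and t(2,j).  It only shows that the values leaving the loop must be
those of the last scalar iteration, that is, the last lane of the last
block.

```c
#include <assert.h>

#define VF 2  /* hypothetical vectorization factor, for illustration only */

/* Scalar reference for the t updates in <bb 3>.  t1[j] stands in for
   t(1,j) and t2[j] for t(2,j); both names are assumptions.  */
static void
scalar_loop (double *t1, double *t2, int n, double *out15, double *out17)
{
  double v15 = 0.0, v17 = 0.0;
  for (int j = 0; j < n; j++)
    {
      double a = t1[j];         /* _12 */
      double b = t2[j];         /* _14 */
      v15 = a * 0.9 + b;        /* _15 */
      v17 = b * 0.9 - a;        /* _17 */
      t2[j] = v17;
      t1[j] = v15;
    }
  *out15 = v15;                 /* _72 = PHI <_15(3)> */
  *out17 = v17;                 /* _71 = PHI <_17(3)> */
}

/* The same loop processed VF iterations at a time.  LANE selects which
   iteration of the final block feeds the live-out values; only
   LANE == VF - 1 corresponds to the last scalar iteration.  */
static void
blocked_loop (double *t1, double *t2, int n, int lane,
              double *out15, double *out17)
{
  double v15[VF] = { 0.0 }, v17[VF] = { 0.0 };
  assert (n % VF == 0 && lane >= 0 && lane < VF);
  for (int j = 0; j < n; j += VF)
    {
      /* Compute all lanes from the values loaded in this block.  */
      for (int l = 0; l < VF; l++)
        {
          double a = t1[j + l];
          double b = t2[j + l];
          v15[l] = a * 0.9 + b;
          v17[l] = b * 0.9 - a;
        }
      /* Store the block back, in the same order as the scalar loop.  */
      for (int l = 0; l < VF; l++)
        {
          t2[j + l] = v17[l];
          t1[j + l] = v15[l];
        }
    }
  *out15 = v15[lane];
  *out17 = v17[lane];
}
```

With identical inputs, blocked_loop called with lane VF - 1 returns the
same live-out pair as scalar_loop, while any other lane returns values
computed from loads of an earlier iteration.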

The vectorizer sees

t.f90:27:9: note:   init: stmt relevant? _15 = _13 + _14;
t.f90:27:9: note:   vec_stmt_relevant_p: used out of loop.
t.f90:27:9: note:   vect_is_simple_use: operand _12 * 9.0000000000000002220446049250313080847263336181640625e-1, type of def: internal
t.f90:27:9: note:   vec_stmt_relevant_p: stmt live but not relevant.
t.f90:27:9: note:   mark relevant 1, live 1: _15 = _13 + _14;

and later

t.f90:27:9: note:   op: VEC_PERM_EXPR
t.f90:27:9: note:       stmt 0 _15 = _13 + _14;
t.f90:27:9: note:       stmt 1 _17 = _16 - _12;
t.f90:27:9: note:       lane permutation { 0[0] 1[1] }
t.f90:27:9: note:       children 0x3358e90 0x3358f18
t.f90:27:9: note:   node 0x3358e90 (max_nunits=1, refcnt=1) vector(4) real(kind=8)
t.f90:27:9: note:   op template: _15 = _13 + _14;
t.f90:27:9: note:       { }
t.f90:27:9: note:       children 0x3358c70 0x3358e08
...
t.f90:27:9: note:   node 0x3358f18 (max_nunits=1, refcnt=1) vector(4) real(kind=8)
t.f90:27:9: note:   op template: _17 = _16 - _12;
t.f90:27:9: note:       { }
t.f90:27:9: note:       children 0x3358c70 0x3358e08
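
For readers unfamiliar with the dump, my reading of
"lane permutation { 0[0] 1[1] }" is that each entry is child[lane]:
result lane 0 comes from lane 0 of child 0 (the all-plus node) and
result lane 1 from lane 1 of child 1 (the all-minus node).  A small C
sketch of that selection follows, assuming this reading is right; the
lane_sel struct and apply_lane_perm are hypothetical names, not GCC's
internal representation.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a lane permutation as printed in the dump: "child[lane]".
   Hypothetical type, for illustration only.  */
struct lane_sel
{
  unsigned child;
  unsigned lane;
};

/* Build the result by picking, for every output lane I, lane
   PERM[I].lane of child PERM[I].child.  */
static void
apply_lane_perm (const double *const *children, const struct lane_sel *perm,
                 size_t nlanes, double *out)
{
  assert (children != NULL && perm != NULL && out != NULL);
  for (size_t i = 0; i < nlanes; i++)
    out[i] = children[perm[i].child][perm[i].lane];
}
```

Applied with perm = { {0, 0}, {1, 1} } to an all-plus child and an
all-minus child, this blends the two so that lane 0 holds the _15 style
result and lane 1 the _17 style result.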

investigating further.