From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 567443858D35; Thu, 29 Jun 2023 03:15:04 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 567443858D35
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1688008504;
	bh=yt1DFOnhp6hpnSjkUqmUh/C5wSDG9qjwM/uk3/KiA2Y=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=PHAHjBVpP40zQf+wMQUhXGxZnSvMN+p8dqm2rOLrJt0Xpf4sr/SDgQt3yccVjLHLw
	 S+t7J2NAmKWqs6UEGKLeHsAjQlAdF84W+SCMnpv5OcP8QjI0qndlOMYTP64V6qGp2Q
	 Lx+UFUyJoAcEOEuCJ6v/ysT2kt13UwB0uo9DSBHM=
From: "hliu at amperecomputing dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/110449] Vect: use a small step to calculate
 the loop induction if the loop is unrolled during loop vectorization
Date: Thu, 29 Jun 2023 03:15:03 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: hliu at amperecomputing dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-110449-4-23dUeyIQuk@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-110449-4@http.gcc.gnu.org/bugzilla/>
References: <bug-110449-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110449
--- Comment #2 from Hao Liu <hliu at amperecomputing dot com> ---
That looks better than the currently generated code (it saves one "MOV"
instruction). Yes, it has the loop-carried dependency advantage. But it still
uses one more register for "8*step". (That may cause a register pressure
problem in complicated code, though not in this simple case.)

This is still a floating point precision problem. PR84201 discusses the same
problem for X86: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84201. The
larger step makes the floating point result deviate further from the original
scalar result. E.g. the SPEC2017 fp benchmark 549.fotonik may result in a VE
(Validation Error) after unrolling a loop over doubles:
   319    do ifreq = 1, tmppower%nofreq <------ HERE
   320      frequency(ifreq,ipower) = freq
   321      freq = freq + freqstep
   322    end do

It uses 4*step for the unrolled vectorized version rather than the 2*step of
the non-unrolled vectorized version. SPEC checks the "relative tolerance" of
the fp results, and the error is higher than the current threshold (i.e. the
compare command line option "--reltol 1e-10").
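
To make the step-size effect concrete, here is a rough C sketch that models the
induction described above. It is a simplified model, not what GCC emits: the
helper names (scalar_induction, vector_induction, max_rel_diff) and the lane
initialisation as start + lane*step are assumptions made for illustration. The
"lanes" parameter stands for the effective step multiplier, e.g. 2 for the
non-unrolled and 4 for the unrolled version in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: the value stored at 0-based iteration i when freq is
   advanced by one "freq = freq + step" per iteration. */
static double scalar_induction(double start, double step, size_t i)
{
    double freq = start;
    for (size_t k = 0; k < i; k++)
        freq += step;
    return freq;
}

/* Simplified model of a vectorized induction (assumption, not GCC output):
   lane l starts at start + l*step and every vector iteration adds
   lanes*step to it. Returns the value seen at scalar iteration i. */
static double vector_induction(double start, double step, size_t lanes, size_t i)
{
    assert(lanes > 0);
    double lane_step = (double)lanes * step;        /* e.g. 2*step or 4*step */
    double v = start + (double)(i % lanes) * step;  /* initial lane value */
    for (size_t k = 0; k < i / lanes; k++)
        v += lane_step;
    return v;
}

/* Largest relative difference between the two schemes over the first n
   iterations; this is the quantity a "--reltol" style check would bound. */
static double max_rel_diff(double start, double step, size_t lanes, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = scalar_induction(start, step, i);
        double v = vector_induction(start, step, lanes, i);
        double diff = v - s;
        if (diff < 0.0)
            diff = -diff;
        double mag = s < 0.0 ? -s : s;
        if (mag == 0.0)
            continue;                               /* avoid dividing by zero */
        if (diff / mag > worst)
            worst = diff / mag;
    }
    return worst;
}
```

Calling max_rel_diff(start, step, 2, n) and max_rel_diff(start, step, 4, n)
with the same inputs gives a quick way to compare how far each step size
drifts from the scalar sequence. When step and all partial sums are exactly
representable (e.g. start 1.0, step 0.5), both schemes match the scalar
result exactly; for other steps the rounding differs and the results may
diverge.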