From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 0466A386F45F; Sun, 27 Sep 2020 06:48:09 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0466A386F45F
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/96789] x264: sub4x4_dct() improves when vectorization is
 disabled
Date: Sun, 27 Sep 2020 06:48:09 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: linkw at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-96789-4-6uK4lYmceq@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-96789-4@http.gcc.gnu.org/bugzilla/>
References: <bug-96789-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Sun, 27 Sep 2020 06:48:10 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D96789
--- Comment #24 from rguenther at suse dot de <rguenther at suse dot de> ---
On September 27, 2020 4:56:43 AM GMT+02:00, crazylht at gmail dot com
<gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D96789
>
>--- Comment #22 from Hongtao.liu <crazylht at gmail dot com> ---
>>One of my workmates found that if we disable vectorization for
>SPEC2017 >525.x264_r function sub4x4_dct in source file
>x264_src/common/dct.c with ?>explicit function attribute
>__attribute__((optimize("no-tree-vectorize"))), it >can speed up by 4%.
>
>For CLX, if we disable slp vectorization in sub4x4_dct by=20
>__attribute__((optimize("no-tree-slp-vectorize"))), it can also speed
>up by 4%.
>
>> Thanks Richi! Should we take care of this case? or neglect this kind
>of
>> extension as "no instruction"? I was intent to handle it in target
>specific
>> code, but it isn't recorded into cost vector while it seems too heavy
>to do
>> the bb_info slp_instances revisits in finish_cost.
>
>For i386 backend unsigned char --> unsigned short is no "no
>instruction", but
>in this case
>---
>1033  _134 =3D MEM[(pixel *)pix1_295 + 2B];=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
>=20=20=20=20=20=20=20=20
>1034  _135 =3D (short unsigned int) _134;
>---
>
>It could be combined and optimized to=20
>---
>movzbl  19(%rcx), %r8d
>---
>
>So, if "unsigned char" variable is loaded from memory, then the
>convertion
>would also be "no instruction", i'm not sure if backend cost model
>could handle
>such situation.

I think all attempts to address this from the side of the scalar cost is go=
ing
to be difficult and fragile..=