From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 0466A386F45F; Sun, 27 Sep 2020 06:48:09 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0466A386F45F
From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled
Date: Sun, 27 Sep 2020 06:48:09 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: linkw at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 27 Sep 2020 06:48:10 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789

--- Comment #24 from rguenther at suse dot de ---
On September 27, 2020 4:56:43 AM GMT+02:00, crazylht at gmail dot com wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789
>
>--- Comment #22 from Hongtao.liu ---
>>One of my workmates found that if we disable vectorization for SPEC2017
>>525.x264_r function sub4x4_dct in source file x264_src/common/dct.c with
>>explicit function attribute __attribute__((optimize("no-tree-vectorize"))),
>>it can speed up by 4%.
>
>For CLX, if we disable slp vectorization in sub4x4_dct by
>__attribute__((optimize("no-tree-slp-vectorize"))), it can also speed
>up by 4%.
>
>> Thanks Richi! Should we take care of this case? or neglect this kind of
>> extension as "no instruction"? I intended to handle it in target specific
>> code, but it isn't recorded into cost vector while it seems too heavy to do
>> the bb_info slp_instances revisits in finish_cost.
>
>For i386 backend unsigned char --> unsigned short is not "no instruction", but
>in this case
>---
>1033 _134 = MEM[(pixel *)pix1_295 + 2B];
>1034 _135 = (short unsigned int) _134;
>---
>
>It could be combined and optimized to
>---
>movzbl 19(%rcx), %r8d
>---
>
>So, if "unsigned char" variable is loaded from memory, then the conversion
>would also be "no instruction", I'm not sure if backend cost model could
>handle such situation.

I think all attempts to address this from the side of the scalar cost are
going to be difficult and fragile..
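
For readers following along, here is a minimal sketch of the pattern under
discussion. It is not the x264 source: pixel_sub_4x4, the NO_SLP macro and
the block layout are hypothetical stand-ins. The only parts taken from the
thread are the attribute string and the fact that each pixel is loaded as
unsigned char and widened before the subtraction, which is the load plus
conversion that the scalar code can fold into a single zero-extending load.

```c
#include <stdint.h>

typedef uint8_t pixel;

/* The optimize attribute is GCC-specific, so it is guarded here;
   on other compilers the macro expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_SLP __attribute__((optimize("no-tree-slp-vectorize")))
#else
#define NO_SLP
#endif

/* Hypothetical stand-in for the widening subtraction feeding a 4x4 DCT.
   Each pixel is read as unsigned char and widened before subtracting;
   in scalar code the load and the widening can be one instruction. */
NO_SLP void pixel_sub_4x4(int16_t diff[16],
                          const pixel *pix1, int stride1,
                          const pixel *pix2, int stride2)
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            diff[x + y * 4] = (int16_t)(pix1[x] - pix2[x]);
        pix1 += stride1;
        pix2 += stride2;
    }
}
```

Building this with and without the attribute and comparing the generated
assembly is one way to see whether the widening shows up as separate
instructions in the vectorized version; what you actually get depends on
the GCC version and target flags.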