From: "crazylht at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/110015] openjpeg is slower when built with gcc13 compared to clang16
Date: Wed, 01 Nov 2023 01:15:22 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015

--- Comment #3 from Hongtao.liu ---
test.c:85:23: note: vect_is_simple_use: operand max_38 = PHI , type of def: unknown
test.c:85:23: missed: Unsupported pattern.
test.c:62:24: missed: not vectorized: unsupported use in stmt.
test.c:85:23: missed: unexpected pattern.
test.c:85:23: note: ***** Analysis failed with vector mode V8SI
test.c:85:23: note: ***** The result for vector mode V32QI would be the same
test.c:85:23: missed: couldn't vectorize loop
test.c:65:13: note: vectorized 0 loops in function.
Removing basic block 5
;; basic block 5, loop depth 2
;; pred: 16
;;       43
# max_38 = PHI
# i_42 = PHI
# datap_44 = PHI
tmp_24 = *datap_44;
_35 = tmp_24 < 0;
_56 = (unsigned int) tmp_24;
_51 = -_56;
_1 = (int) _51;
_25 = MAX_EXPR <_1, max_38>;
_31 = _1 | -2147483648;
iftmp.0_27 = (unsigned int) _31;
.MASK_STORE (datap_44, 8B, _35, iftmp.0_27);
_26 = MAX_EXPR ;
max_5 = _35 ? _25 : _26;
i_29 = i_42 + 1;
datap_30 = datap_44 + 4;
if (w_22 > i_29)
  goto ; [89.00%]
else
  goto ; [11.00%]
;; succ: 16

So here we have a reduction on MAX_EXPR, but there are two MAX_EXPRs, which can be merged together with MAX_EXPR >

If I manually change the loop as below, it can be vectorized.

for (j = 0; j < t1->h; ++j) {
    const OPJ_UINT32 w = t1->w;
    for (i = 0; i < w; ++i, ++datap) {
        OPJ_INT32 tmp = *datap;
        if (tmp < 0) {
            OPJ_UINT32 tmp_unsigned;
            tmp_unsigned = opj_to_smr(tmp);
            memcpy(datap, &tmp_unsigned, sizeof(OPJ_INT32));
            tmp = -tmp;
        }
        max = opj_int_max(max, tmp);
    }
}

Maybe it's related to phiopt?
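
For reference, the two loop shapes can be compared in a standalone file. This is only a sketch under assumptions: max_two_branches is a reconstruction of the original loop inferred from the GIMPLE dump (the original source is not quoted in this comment), max_single_reduction is the inner loop of the rewritten version, the typedefs and the bodies of opj_int_max and opj_to_smr are stand-ins inferred from the dump rather than copied from openjpeg, and forms_agree is a hypothetical helper added for the comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in types; assumed to match openjpeg's 32-bit typedefs. */
typedef int32_t OPJ_INT32;
typedef uint32_t OPJ_UINT32;

/* Stand-in helper (assumption): plain integer maximum. */
static OPJ_INT32 opj_int_max(OPJ_INT32 a, OPJ_INT32 b)
{
    return a > b ? a : b;
}

/* Stand-in helper (assumption, inferred from "_1 | -2147483648" in the dump):
   sign-magnitude form, i.e. the magnitude with the top bit set for negatives. */
static OPJ_UINT32 opj_to_smr(OPJ_INT32 x)
{
    return x >= 0 ? (OPJ_UINT32)x : ((0u - (OPJ_UINT32)x) | 0x80000000u);
}

/* Reconstructed shape: one max update per branch, so the loop body ends up
   with two MAX_EXPRs selected by the condition. Inputs must not contain
   INT32_MIN, since -tmp would overflow. */
static OPJ_INT32 max_two_branches(OPJ_INT32 *datap, OPJ_UINT32 w, OPJ_INT32 max)
{
    for (OPJ_UINT32 i = 0; i < w; ++i, ++datap) {
        OPJ_INT32 tmp = *datap;
        if (tmp < 0) {
            OPJ_UINT32 tmp_unsigned;
            max = opj_int_max(max, -tmp);
            tmp_unsigned = opj_to_smr(tmp);
            memcpy(datap, &tmp_unsigned, sizeof(OPJ_INT32));
        } else {
            max = opj_int_max(max, tmp);
        }
    }
    return max;
}

/* Rewritten shape from the comment: negate inside the branch, then do a
   single max update after it. */
static OPJ_INT32 max_single_reduction(OPJ_INT32 *datap, OPJ_UINT32 w, OPJ_INT32 max)
{
    for (OPJ_UINT32 i = 0; i < w; ++i, ++datap) {
        OPJ_INT32 tmp = *datap;
        if (tmp < 0) {
            OPJ_UINT32 tmp_unsigned;
            tmp_unsigned = opj_to_smr(tmp);
            memcpy(datap, &tmp_unsigned, sizeof(OPJ_INT32));
            tmp = -tmp;
        }
        max = opj_int_max(max, tmp);
    }
    return max;
}

/* Hypothetical helper: run both forms on identical data and report whether
   they leave the same stored values behind (the maxima are asserted equal). */
static int forms_agree(void)
{
    OPJ_INT32 a[] = {3, -7, 0, 5, -2, 1};
    OPJ_INT32 b[] = {3, -7, 0, 5, -2, 1};
    OPJ_UINT32 w = (OPJ_UINT32)(sizeof a / sizeof a[0]);
    OPJ_INT32 max_a = max_two_branches(a, w, 0);
    OPJ_INT32 max_b = max_single_reduction(b, w, 0);
    assert(max_a == max_b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both functions compute the same maximum and store the same values, so the difference is only in how the reduction is expressed. One way to see what the vectorizer does with each form is to compile with -O3 -fopt-info-vec-missed; the outcome depends on the GCC version and the target.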