From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 100103858418; Wed,  1 Nov 2023 01:15:26 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 100103858418
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1698801326;
	bh=8bj0+CSquI3v/c5U5DQv7ICdGoJCPqMr3+jHZc8OdIU=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=UuakM6owUXQ4w4MDsMMm4v5M31DoRjqD67up0OPB5e4N5GN8vVAqoBWppK8Wj+hXV
	 JwTymTFmvPd0PWKZOhuD4NKZF0OgTQ2FcLpXgZ2fIuhDghYmY/s2IiX+ACSGgA48aj
	 QwPya50pDOiw3Ty4BsBa0XIb5raUZb+V+poI5dSw=
From: "crazylht at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/110015] openjpeg is slower when built with gcc13
 compared to clang16
Date: Wed, 01 Nov 2023 01:15:22 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: crazylht at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-110015-4-QB36VboRco@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-110015-4@http.gcc.gnu.org/bugzilla/>
References: <bug-110015-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
test.c:85:23: note:   vect_is_simple_use: operand max_38 = PHI <max_5(16), max_40(43)>, type of def: unknown
test.c:85:23: missed:   Unsupported pattern.
test.c:62:24: missed:   not vectorized: unsupported use in stmt.
test.c:85:23: missed:  unexpected pattern.
test.c:85:23: note:  ***** Analysis  failed with vector mode V8SI
test.c:85:23: note:  ***** The result for vector mode V32QI would be the same
test.c:85:23: missed: couldn't vectorize loop
test.c:65:13: note: vectorized 0 loops in function.
Removing basic block 5
;; basic block 5, loop depth 2
;;  pred:       16
;;              43
# max_38 = PHI <max_5(16), max_40(43)>
# i_42 = PHI <i_29(16), 0(43)>
# datap_44 = PHI <datap_30(16), datap_46(43)>
tmp_24 = *datap_44;
_35 = tmp_24 < 0;
_56 = (unsigned int) tmp_24;
_51 = -_56;
_1 = (int) _51;
_25 = MAX_EXPR <_1, max_38>;
_31 = _1 | -2147483648;
iftmp.0_27 = (unsigned int) _31;
.MASK_STORE (datap_44, 8B, _35, iftmp.0_27);
_26 = MAX_EXPR <tmp_24, max_38>;
max_5 = _35 ? _25 : _26;
i_29 = i_42 + 1;
datap_30 = datap_44 + 4;
if (w_22 > i_29)
  goto <bb 16>; [89.00%]
else
  goto <bb 9>; [11.00%]
;;  succ:       16

So here we have a MAX_EXPR reduction, but there are 2 MAX_EXPRs (_25 and _26)
selected by the condition _35. They could be merged into a single
MAX_EXPR <max_38, ABS_EXPR <tmp_24>>.
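
To illustrate the merge, here is a minimal C sketch of the two forms. The
function names reduce_two_max and reduce_one_max are hypothetical and not
from openjpeg or GCC. It assumes tmp != INT_MIN so that -tmp does not
overflow (the GIMPLE above does the negation in unsigned int instead).

```c
#include <stdlib.h>

static int int_max(int a, int b) { return a > b ? a : b; }

/* Shape of the GIMPLE above: two MAX_EXPRs, then a select on tmp < 0.
   Assumes tmp != INT_MIN so that -tmp is well defined. */
static int reduce_two_max(int max, int tmp)
{
    int negated  = -tmp;                  /* _1  */
    int max_neg  = int_max(negated, max); /* _25 */
    int max_same = int_max(tmp, max);     /* _26 */
    return tmp < 0 ? max_neg : max_same;  /* max_5 = _35 ? _25 : _26 */
}

/* Merged form: a single max over the absolute value. */
static int reduce_one_max(int max, int tmp)
{
    return int_max(max, abs(tmp));
}
```

Both functions return the same value for every tmp other than INT_MIN, which
is why a single MAX_EXPR reduction would be enough here.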

If the loop is manually changed to the following, it can be vectorized:

    for (j = 0; j < t1->h; ++j) {
        const OPJ_UINT32 w = t1->w;
        for (i = 0; i < w; ++i, ++datap) {
            OPJ_INT32 tmp = *datap;
            if (tmp < 0)
              {
                OPJ_UINT32 tmp_unsigned;
                tmp_unsigned = opj_to_smr(tmp);
                memcpy(datap, &tmp_unsigned, sizeof(OPJ_INT32));
                tmp = -tmp;
              }
            max = opj_int_max(max, tmp);
        }
    }
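
For experimenting outside openjpeg, a self-contained sketch of the same loop
might look like the following. The names to_sign_magnitude and
convert_and_find_max are hypothetical. The stand-in for opj_to_smr is inferred
from the `_1 | -2147483648` statement in the dump above, and the initial
max of 0 is an assumption, since the original initialization is not shown.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for opj_to_smr: sign-magnitude representation,
   inferred from "_31 = _1 | -2147483648" in the dump.
   Assumes x != INT32_MIN so that -x does not overflow. */
static uint32_t to_sign_magnitude(int32_t x)
{
    return x >= 0 ? (uint32_t)x : ((uint32_t)(-x) | 0x80000000u);
}

/* Rewritten loop: negative values are stored back in sign-magnitude form,
   and a single max reduction runs over the absolute value. */
static int32_t convert_and_find_max(int32_t *datap, uint32_t w, uint32_t h)
{
    int32_t max = 0; /* assumed initial value */
    for (uint32_t j = 0; j < h; ++j) {
        for (uint32_t i = 0; i < w; ++i, ++datap) {
            int32_t tmp = *datap;
            if (tmp < 0) {
                uint32_t tmp_unsigned = to_sign_magnitude(tmp);
                memcpy(datap, &tmp_unsigned, sizeof(int32_t));
                tmp = -tmp;
            }
            max = tmp > max ? tmp : max;
        }
    }
    return max;
}
```

Compiling a file like this with -O3 and -fopt-info-vec-missed should show
whether the vectorizer accepts the loop on a given GCC version.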

Maybe it's related to phiopt?