From: Thomas Schwinge
To: Richard Biener, Andrew Stubbs, Julian Brown
Subject: Re: [PATCH]
 tree-optimization/113073 - amend PR112736 fix
In-Reply-To: <20231219123224.3D481385E459@sourceware.org>
References: <20231219123224.3D481385E459@sourceware.org>
Date: Wed, 20 Dec 2023 10:50:07 +0100
Message-ID: <871qbh4300.fsf@euler.schwinge.homeip.net>

Hi!

On 2023-12-19T13:30:58+0100, Richard Biener wrote:
> The PR112736 testcase fails on RISC-V because the aligned exception
> uses the wrong check.  The alignment support scheme can be
> dr_aligned even when the access isn't aligned to the vector size
> but some targets are happy with element alignment.  The following
> fixes that.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

I've noticed that this regresses the GCN target as follows:

    PASS: gcc.dg/vect/bb-slp-pr78205.c (test for excess errors)
    PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "optimized: basic block" 3
    PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "BB vectorization with gaps at the end of a load is not supported" 1
    [-PASS:-]{+FAIL:+} gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times optimized " = c\\[4\\];" 1

As so often, I've got no clue whether that's a vectorizer, GCN back end,
or test case issue.  ;-)

'diff'ing before vs.
after:

--- bb-slp-pr78205.c.191t.slp2 2023-12-20 09:49:45.834344620 +0100
+++ bb-slp-pr78205.c.191t.slp2 2023-12-20 09:10:14.706300941 +0100
[...]
@@ -505,8 +505,9 @@
 [...]/bb-slp-pr78205.c:9:8: note: create vector_type-pointer variable to type: vector(4) double vectorizing a pointer ref: c[0]
 [...]/bb-slp-pr78205.c:9:8: note: created &c[0]
 [...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.7_19 = MEM <vector(4) double> [(double *)&c];
-[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.8_20 = MEM <vector(4) double> [(double *)&c + 32B];
-[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.9_21 = VEC_PERM_EXPR ;
+[...]/bb-slp-pr78205.c:9:8: note: add new stmt: _20 = MEM[(double *)&c + 32B];
+[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.8_21 = {_20, 0.0, 0.0, 0.0};
+[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.9_22 = VEC_PERM_EXPR ;
 [...]/bb-slp-pr78205.c:9:8: note: ------>vectorizing SLP node starting from: a[0] = _1;
 [...]/bb-slp-pr78205.c:9:8: note: vect_is_simple_use: operand c[0], type of def: internal
 [...]/bb-slp-pr78205.c:9:8: note: vect_is_simple_use: operand c[1], type of def: internal
 [...]
@@ -537,9 +538,10 @@
 [...]/bb-slp-pr78205.c:13:8: note: transform load.
ncopies = 1
 [...]/bb-slp-pr78205.c:13:8: note: create vector_type-pointer variable to type: vector(4) double vectorizing a pointer ref: c[2]
 [...]/bb-slp-pr78205.c:13:8: note: created &c[2]
-[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.14_23 = MEM [(double *)&c];
-[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.15_24 = MEM [(double *)&c + 32B];
-[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__1.16_25 = VEC_PERM_EXPR ;
+[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.14_24 = MEM [(double *)&c];
+[...]/bb-slp-pr78205.c:13:8: note: add new stmt: _25 = MEM[(double *)&c + 32B];
+[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.15_26 = {_25, 0.0, 0.0, 0.0};
+[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__1.16_27 = VEC_PERM_EXPR ;
 [...]/bb-slp-pr78205.c:13:8: note: ------>vectorizing SLP node starting from: b[0] = _3;
 [...]/bb-slp-pr78205.c:13:8: note: vect_is_simple_use: operand c[2], type of def: internal
 [...]/bb-slp-pr78205.c:13:8: note: vect_is_simple_use: operand c[3], type of def: internal
 [...]
@@ -580,18 +582,22 @@
   double _4;
   double _5;
   vector(2) double _17;
+  double _20;
+  double _25;

   [local count: 1073741824]:
   vect__1.7_19 = MEM [(double *)&c];
-  vect__1.9_21 = VEC_PERM_EXPR ;
+  _20 = MEM[(double *)&c + 32B];
+  vect__1.9_22 = VEC_PERM_EXPR ;
   _1 = c[0];
   _2 = c[1];
-  MEM [(double *)&a] = vect__1.9_21;
-  vect__3.14_23 = MEM [(double *)&c];
-  vect__1.16_25 = VEC_PERM_EXPR ;
+  MEM [(double *)&a] = vect__1.9_22;
+  vect__3.14_24 = MEM [(double *)&c];
+  _25 = MEM[(double *)&c + 32B];
+  vect__1.16_27 = VEC_PERM_EXPR ;
   _3 = c[2];
   _4 = c[3];
-  MEM [(double *)&b] = vect__1.16_25;
+  MEM [(double *)&b] = vect__1.16_27;
   _5 = c[4];
   _17 = {_5, _5};
   MEM [(double *)&x] = _17;

--- bb-slp-pr78205.c.265t.optimized 2023-12-20 09:49:45.838344586 +0100
+++ bb-slp-pr78205.c.265t.optimized 2023-12-20 09:10:14.706300941 +0100
@@ -6,17 +6,17 @@
   vector(4) double vect__1.16;
   vector(4) double vect__1.9;
   vector(4) double vect__1.7;
-  double _5;
   vector(2) double _17;
+  double _20;

   [local count: 1073741824]:
   vect__1.7_19 = MEM [(double *)&c];
-  vect__1.9_21 = VEC_PERM_EXPR ;
-  MEM [(double *)&a] = vect__1.9_21;
-  vect__1.16_25 = VEC_PERM_EXPR ;
-  MEM [(double *)&b] = vect__1.16_25;
-  _5 = c[4];
-  _17 = {_5, _5};
+  _20 = MEM[(double *)&c + 32B];
+  vect__1.9_22 = VEC_PERM_EXPR ;
+  MEM [(double *)&a] = vect__1.9_22;
+  vect__1.16_27 = VEC_PERM_EXPR ;
+  MEM [(double *)&b] = vect__1.16_27;
+  _17 = {_20, _20};
   MEM [(double *)&x] = _17;
   return;

--- bb-slp-pr78205.s 2023-12-20 09:49:45.846344519 +0100
+++ bb-slp-pr78205.s 2023-12-20 09:10:14.722300807 +0100
@@ -41,7 +41,17 @@
        v_addc_co_u32 v7, s[22:23], 0, v7, s[22:23]
        flat_load_dwordx2 v[6:7], v[6:7] offset:0
        s_waitcnt 0
+       v_writelane_b32 v8, s12, 0
+       v_writelane_b32 v9, s13, 0
+       s_mov_b64 exec, 1
+       v_add_co_u32 v8, vcc, 32, v8
+       v_addc_co_u32 v9, vcc, 0, v9, vcc
+       flat_load_dwordx2 v[8:9], v[8:9]
+       s_waitcnt 0
+       v_readlane_b32 s12, v8, 0
+       v_readlane_b32 s13, v9, 0
        s_mov_b64 s[18:19], 10
+       s_mov_b64 exec, 15
        v_cndmask_b32 v0, 0, 4, s[18:19]
        s_mov_b64 exec, 15
        v_mov_b32 v11, v7
@@ -73,15 +83,6 @@
        v_mov_b32 v5, s19
        v_addc_co_u32 v5, s[22:23], 0, v5, s[22:23]
        flat_store_dwordx2 v[4:5], v[6:7] offset:0
-       v_writelane_b32 v4, s12, 0
-       v_writelane_b32 v5, s13, 0
-       s_mov_b64 exec, 1
-       v_add_co_u32 v4, vcc, 32, v4
-       v_addc_co_u32 v5, vcc, 0, v5, vcc
-       flat_load_dwordx2 v[4:5], v[4:5]
-       s_waitcnt 0
-       v_readlane_b32 s12, v4, 0
-       v_readlane_b32 s13, v5, 0
        s_mov_b64 exec, 3
        v_mov_b32 v6, s12
        v_mov_b32 v7, s13

I haven't looked at the full context, but this appears to effectively
just move this block of code, and use different registers.  In:

    +       s_mov_b64 exec, 15
            v_cndmask_b32 v0, 0, 4, s[18:19]
            s_mov_b64 exec, 15

... isn't the second (pre-existing) 's_mov_b64 exec, 15' now redundant,
though?


Grüße
 Thomas


> 	PR tree-optimization/113073
> 	* tree-vect-stmts.cc (vectorizable_load): Properly ensure
> 	to exempt only vector-size aligned overreads.
> ---
>  gcc/tree-vect-stmts.cc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index fc6923cf68a..e9ff728dfd4 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -11476,7 +11476,9 @@ vectorizable_load (vec_info *vinfo,
>                                   - (group_size * vf - gap), nunits))
>                    /* DR will be unused.  */
>                    ltype = NULL_TREE;
> -                else if (alignment_support_scheme == dr_aligned)
> +                else if (known_ge (vect_align,
> +                                   tree_to_poly_uint64
> +                                     (TYPE_SIZE_UNIT (vectype))))
>                    /* Aligned access to excess elements is OK if
>                       at least one element is accessed in the
>                       scalar loop.  */
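
For reference, the condition the quoted patch switches to can be sketched in plain C. This is only an illustration under assumptions, not GCC code: 'overread_exempt' is a hypothetical helper name, and plain integers stand in for the poly_int values that 'known_ge' compares in the vectorizer. The idea, as I understand the patch, is that loading a full vector beyond the elements actually needed is only exempted when the known alignment of the access is at least the vector size in bytes; being 'dr_aligned' alone may mean no more than element alignment on some targets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC.  Mirrors the shape of the amended
   check: an overread of excess vector elements is exempted only if the
   known alignment (in bytes) is at least the vector size (in bytes).
   Plain uint64_t values are an assumption standing in for poly_int.  */
static bool
overread_exempt (uint64_t known_align_bytes, uint64_t vector_bytes)
{
  return known_align_bytes >= vector_bytes;
}
```

With 'vector(4) double' being 32 bytes, 'overread_exempt (32, 32)' holds, whereas an access only known to be element-aligned, 'overread_exempt (8, 32)', does not, so the latter no longer takes the "aligned access to excess elements" path.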