Date: Wed, 20 Dec 2023 14:44:29 +0100 (CET)
From: Richard Biener
To: Thomas Schwinge
cc: Andrew Stubbs, Julian Brown, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] tree-optimization/113073 - amend PR112736 fix
In-Reply-To: <2798rq4n-6qq4-6804-p37s-6845p837s57r@fhfr.qr>
Message-ID: <712143qr-8oo8-p656-2568-p19q92o27r5s@fhfr.qr>
References: <20231219123224.3D481385E459@sourceware.org> <871qbh4300.fsf@euler.schwinge.homeip.net>
 <2798rq4n-6qq4-6804-p37s-6845p837s57r@fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Wed, 20 Dec 2023, Richard Biener wrote:

> On Wed, 20 Dec 2023, Thomas Schwinge wrote:
>
> > Hi!
> >
> > On 2023-12-19T13:30:58+0100, Richard Biener wrote:
> > > The PR112736 testcase fails on RISC-V because the aligned exception
> > > uses the wrong check.  The alignment support scheme can be
> > > dr_aligned even when the access isn't aligned to the vector size
> > > but some targets are happy with element alignment.  The following
> > > fixes that.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> >
> > I've noticed this regresses the GCN target as follows:
> >
> > PASS: gcc.dg/vect/bb-slp-pr78205.c (test for excess errors)
> > PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "optimized: basic block" 3
> > PASS: gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times slp2 "BB vectorization with gaps at the end of a load is not supported" 1
> > [-PASS:-]{+FAIL:+} gcc.dg/vect/bb-slp-pr78205.c scan-tree-dump-times optimized " = c\\[4\\];" 1
> >
> > As so often, I've got no clue whether that's a vectorizer, GCN back end,
> > or test case issue.  ;-)
> >
> > 'diff'ing before vs. after:
> >
> > --- bb-slp-pr78205.c.191t.slp2 2023-12-20 09:49:45.834344620 +0100
> > +++ bb-slp-pr78205.c.191t.slp2 2023-12-20 09:10:14.706300941 +0100
> > [...]
> > @@ -505,8 +505,9 @@
> >  [...]/bb-slp-pr78205.c:9:8: note: create vector_type-pointer variable to type: vector(4) double vectorizing a pointer ref: c[0]
> >  [...]/bb-slp-pr78205.c:9:8: note: created &c[0]
> >  [...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.7_19 = MEM [(double *)&c];
> > -[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.8_20 = MEM [(double *)&c + 32B];
> > -[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.9_21 = VEC_PERM_EXPR ;
> > +[...]/bb-slp-pr78205.c:9:8: note: add new stmt: _20 = MEM[(double *)&c + 32B];
> > +[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.8_21 = {_20, 0.0, 0.0, 0.0};
> > +[...]/bb-slp-pr78205.c:9:8: note: add new stmt: vect__1.9_22 = VEC_PERM_EXPR ;
> >  [...]/bb-slp-pr78205.c:9:8: note: ------>vectorizing SLP node starting from: a[0] = _1;
> >  [...]/bb-slp-pr78205.c:9:8: note: vect_is_simple_use: operand c[0], type of def: internal
> >  [...]/bb-slp-pr78205.c:9:8: note: vect_is_simple_use: operand c[1], type of def: internal
> > [...]
> > @@ -537,9 +538,10 @@
> >  [...]/bb-slp-pr78205.c:13:8: note: transform load. ncopies = 1
> >  [...]/bb-slp-pr78205.c:13:8: note: create vector_type-pointer variable to type: vector(4) double vectorizing a pointer ref: c[2]
> >  [...]/bb-slp-pr78205.c:13:8: note: created &c[2]
> > -[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.14_23 = MEM [(double *)&c];
> > -[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.15_24 = MEM [(double *)&c + 32B];
> > -[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__1.16_25 = VEC_PERM_EXPR ;
> > +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.14_24 = MEM [(double *)&c];
> > +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: _25 = MEM[(double *)&c + 32B];
> > +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__3.15_26 = {_25, 0.0, 0.0, 0.0};
> > +[...]/bb-slp-pr78205.c:13:8: note: add new stmt: vect__1.16_27 = VEC_PERM_EXPR ;
> >  [...]/bb-slp-pr78205.c:13:8: note: ------>vectorizing SLP node starting from: b[0] = _3;
> >  [...]/bb-slp-pr78205.c:13:8: note: vect_is_simple_use: operand c[2], type of def: internal
> >  [...]/bb-slp-pr78205.c:13:8: note: vect_is_simple_use: operand c[3], type of def: internal
> > [...]
> > @@ -580,18 +582,22 @@
> >    double _4;
> >    double _5;
> >    vector(2) double _17;
> > +  double _20;
> > +  double _25;
> >
> >    [local count: 1073741824]:
> >    vect__1.7_19 = MEM [(double *)&c];
> > -  vect__1.9_21 = VEC_PERM_EXPR ;
> > +  _20 = MEM[(double *)&c + 32B];
> > +  vect__1.9_22 = VEC_PERM_EXPR ;
> >    _1 = c[0];
> >    _2 = c[1];
> > -  MEM [(double *)&a] = vect__1.9_21;
> > -  vect__3.14_23 = MEM [(double *)&c];
> > -  vect__1.16_25 = VEC_PERM_EXPR ;
> > +  MEM [(double *)&a] = vect__1.9_22;
> > +  vect__3.14_24 = MEM [(double *)&c];
> > +  _25 = MEM[(double *)&c + 32B];
>
> that looks like a noop (but it's odd we keep the unused load)
>
> > +  vect__1.16_27 = VEC_PERM_EXPR ;
> >    _3 = c[2];
> >    _4 = c[3];
> > -  MEM [(double *)&b] = vect__1.16_25;
> > +  MEM [(double *)&b] = vect__1.16_27;
> >    _5 = c[4];
> >    _17 = {_5, _5};
> >    MEM [(double *)&x] = _17;
> >
> > --- bb-slp-pr78205.c.265t.optimized 2023-12-20 09:49:45.838344586 +0100
> > +++ bb-slp-pr78205.c.265t.optimized 2023-12-20 09:10:14.706300941 +0100
> > @@ -6,17 +6,17 @@
> >    vector(4) double vect__1.16;
> >    vector(4) double vect__1.9;
> >    vector(4) double vect__1.7;
> > -  double _5;
> >    vector(2) double _17;
> > +  double _20;
> >
> >    [local count: 1073741824]:
> >    vect__1.7_19 = MEM [(double *)&c];
> > -  vect__1.9_21 = VEC_PERM_EXPR ;
> > -  MEM [(double *)&a] = vect__1.9_21;
> > -  vect__1.16_25 = VEC_PERM_EXPR ;
> > -  MEM [(double *)&b] = vect__1.16_25;
> > -  _5 = c[4];
> > -  _17 = {_5, _5};
> > +  _20 = MEM[(double *)&c + 32B];
>
> that looks similar in the end, but trades c[4] for MEM here.  That's
> because we CSEd c[4] with the "dead" load above.
>
> I think we could simply remove that extra scan-tree-dump-times, we
> should instead look to _not_ see a vector load starting at c[4],
> but scan-tree-dump-not is also not very reliable (in a false negative
> sense).
>
> I'll see what that "dead" load is.

Should be fixed by r14-6748-ga8f0278ade1353

Richard.
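
For readers without the testcase at hand, here is a minimal C sketch of the access pattern the dumps above suggest. It is a hypothetical reconstruction, not the actual gcc.dg/vect/bb-slp-pr78205.c source: the array sizes, the stores to a[2..3] and b[2..3], and the helper names c4_byte_offset and check_foo are assumptions made for illustration. It also shows why the scalar load "MEM[(double *)&c + 32B]" can be CSEd with "c[4]": assuming an 8-byte double, both name the same location, so once the MEM form survives, a scan for the text " = c[4];" in the optimized dump no longer matches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction inferred from the quoted dumps; NOT the
   actual testcase.  Array sizes and the a[2..3]/b[2..3] stores are
   assumptions.  */
double x[2], a[4], b[4], c[5];

void
foo (void)
{
  /* Two SLP groups loading from c[0..1] and c[2..3].  */
  a[0] = c[0];  a[1] = c[1];  a[2] = c[0];  a[3] = c[1];
  b[0] = c[2];  b[1] = c[3];  b[2] = c[2];  b[3] = c[3];
  /* Scalar load of c[4], splat into x; this is the load the failing
     scan expects to see spelled as "c[4]".  */
  x[0] = c[4];
  x[1] = c[4];
}

/* Byte offset of c[4] from the start of c; 32 when sizeof (double) is 8,
   which is the "+ 32B" seen in the dumps.  */
ptrdiff_t
c4_byte_offset (void)
{
  return (char *) &c[4] - (char *) &c[0];
}

/* Fill c with 1.0 .. 5.0, run foo, and check the results.  */
void
check_foo (void)
{
  for (int i = 0; i < 5; i++)
    c[i] = i + 1.0;
  foo ();
  assert (a[0] == 1.0 && a[1] == 2.0 && a[2] == 1.0 && a[3] == 2.0);
  assert (b[0] == 3.0 && b[1] == 4.0 && b[2] == 3.0 && b[3] == 4.0);
  assert (x[0] == 5.0 && x[1] == 5.0);
}
```

Whichever spelling the compiler keeps for that load, the values stored to x are the same; only the dump text that the testsuite scans for differs.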