From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x12d.google.com (mail-lf1-x12d.google.com [IPv6:2a00:1450:4864:20::12d]) by sourceware.org (Postfix) with ESMTPS id 5F9043858C62 for ; Wed, 12 Jul 2023 11:05:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5F9043858C62 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lf1-x12d.google.com with SMTP id 2adb3069b0e04-4fb7589b187so10991597e87.1 for ; Wed, 12 Jul 2023 04:05:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1689159906; x=1691751906; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=le+NVpx113bkoNSRfvrUDiIG8PEGa2IORU7Ipb7WFbo=; b=EwDaVvrWiuwOlLRLld3HCgr8pkEZo3GHXoNNTUqlnvR0zbRblfbqt0Lx2U95odpGev ocFhLasDCd4oaIeRrAt1SRBHjdassaLreGF/VSbWpCoBbmaPZB/gVPvZm60J7yFK6QD+ 5ccWIBB10Dr8j92kGsy/4yfxjJGkpjqt8xBb+OfXeCHVINHmFXftmr9vzC1m4XNDF4yS 6G/VSaptwoxPSt3VpfBOgF5Si5SAU5dXrmIhg+PGDa+4+c3N3v6CngE22d5E9YybT5CR gGl6bTvyJx7HOFK6+IixxXwCQ2no2BF7jYfPztezNZj01ZWg1kqqQggofrJfSwS7mvo7 fYDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689159906; x=1691751906; h=content-transfer-encoding:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=le+NVpx113bkoNSRfvrUDiIG8PEGa2IORU7Ipb7WFbo=; b=VaobKBdeHlFyG/vtOtDa0AlvaueHAe3RgH5HkRQD1PxskCc2F1htTvFXru3TcnWX72 X/iSYkIu389Kbei4Lb9cDD9ohkWdlJeH1G9vgQOSzTzBL9oD635MviIISl3ffMapmJGR a3V6b1oooSqDvtu6F1VQFp6klosCzprb+BBuHj0+i4PSqgdHFWIuL0yvPe3FczS1C9af agw0akp5NVLpEzFiL5cuk+BQXv6TLQ0D/tJ8RnlrqJKF+bN1BxvaPbd8SF42JgC4AWoC eF4MWrEXtEDAR3ECTv1lCryQVvcWS+Emc7zjvaUOYBNWJWEVg39hGypNS3Mo8EKN7q54 +Hvg== X-Gm-Message-State: ABy/qLZ/j5uCL2k41XTzPEM0PuZJWCuzUChTc5sdJCMe50gg8DrqwPFM ttdv3KWs99z5t/pYS0UTPE4tzXk497UVTx13MIWZFAkt2uwsgQ== X-Google-Smtp-Source: APBJJlF0GcjWW4ev5uOB4J4u30Lke7mW2F2SrGCGOjYasj5n3cBSjP+fAYkVD6QlFj3WnPAkR6YGUCC8qADV/lj5meU= X-Received: by 2002:a05:6512:2525:b0:4f9:556b:93c5 with SMTP id be37-20020a056512252500b004f9556b93c5mr17726886lfb.40.1689159906019; Wed, 12 Jul 2023 04:05:06 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Uros Bizjak Date: Wed, 12 Jul 2023 13:04:54 +0200 Message-ID: Subject: Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206] To: Richard Biener via Gcc-patches , Richard Biener , richard.sandiford@arm.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, Jul 12, 2023 at 12:58=E2=80=AFPM Uros Bizjak wr= ote: > > On Wed, Jul 12, 2023 at 12:23=E2=80=AFPM Richard Sandiford > wrote: > > > > Richard Biener via Gcc-patches writes: > > > On Mon, Jul 10, 2023 at 1:01=E2=80=AFPM Uros Bizjak wrote: > > >> > > >> On Mon, Jul 10, 2023 at 11:47=E2=80=AFAM Richard Biener > > >> wrote: > > >> > > > >> > On Mon, Jul 10, 2023 at 11:26=E2=80=AFAM Uros Bizjak wrote: > > >> > > > > >> > > On Mon, Jul 10, 2023 at 11:17=E2=80=AFAM Richard Biener > > >> > > wrote: > > >> > > > > > >> > > > On Sun, Jul 9, 2023 at 10:53=E2=80=AFAM Uros Bizjak via Gcc-pa= tches > > >> > > > wrote: > > >> > > > > > > >> > > > > As shown in the PR, simplify_gen_subreg call in simplify_rep= lace_fn_rtx: > > >> > > > > > > >> > > > > (gdb) list > > >> > > > > 469 if (code =3D=3D SUBREG) > > >> > > > > 470 { > > >> > > > > 471 op0 =3D simplify_replace_fn_rtx (SUBREG_RE= G (x), > > >> > > > > old_rtx, fn, data); > > >> > > > > 472 if (op0 =3D=3D SUBREG_REG (x)) > > >> > > > > 473 return x; > > >> > > > > 474 op0 =3D simplify_gen_subreg (GET_MODE (x),= op0, > > >> > > > > 475 GET_MODE (SUBRE= G_REG (x)), > > >> > > > > 476 SUBREG_BYTE (x)= ); > > >> > > > > 477 return op0 ? op0 : x; > > >> > > > > 478 } > > >> > > > > > > >> > > > > simplifies with following arguments: > > >> > > > > > > >> > > > > (gdb) p debug_rtx (op0) > > >> > > > > (const_vector:V4QI [ > > >> > > > > (const_int -52 [0xffffffffffffffcc]) repeated x4 > > >> > > > > ]) > > >> > > > > (gdb) p debug_rtx (x) > > >> > > > > (subreg:V16QI (reg:V4QI 98) 0) > > >> > > > > > > >> > > > > to: > > >> > > > > > > >> > > > > (gdb) p debug_rtx (op0) > > >> > > > > (const_vector:V16QI [ > > >> > > > > (const_int -52 [0xffffffffffffffcc]) repeated x16 > > >> > > > > ]) > > >> > > > > > > >> > > > > This simplification is invalid, it is not possible to get V1= 6QImode vector > > >> > > > > from V4QImode vector, even when all elements are duplicates. > > >> > > > >> > ^^^ > > >> > > > >> > I think this simplification is valid. A simplification to > > >> > > > >> > (const_vector:V16QI [ > > >> > (const_int -52 [0xffffffffffffffcc]) repeated x4 > > >> > (const_int 0 [0]) repeated x12 > > >> > ]) > > >> > > > >> > would be valid as well. > > >> > > > >> > > > > The simplification happens in simplify_context::simplify_sub= reg: > > >> > > > > > > >> > > > > (gdb) list > > >> > > > > 7558 if (VECTOR_MODE_P (outermode) > > >> > > > > 7559 && GET_MODE_INNER (outermode) =3D=3D GET_M= ODE_INNER (innermode) > > >> > > > > 7560 && vec_duplicate_p (op, &elt)) > > >> > > > > 7561 return gen_vec_duplicate (outermode, elt); > > >> > > > > > > >> > > > > but the above simplification is valid only for non-paradoxic= al registers, > > >> > > > > where outermode <=3D innermode. We should not assume that e= lements outside > > >> > > > > the original register are valid, let alone all duplicates. > > >> > > > > > >> > > > Hmm, but looking at the audit trail the x86 backend expects th= em to be zero? > > >> > > > Isn't that wrong as well? > > >> > > > > >> > > If you mean Comment #10, it is just an observation that > > >> > > simplify_replace_rtx simplifies arguments from Comment #9 to: > > >> > > > > >> > > (gdb) p debug_rtx (src) > > >> > > (const_vector:V8HI [ > > >> > > (const_int 204 [0xcc]) repeated x4 > > >> > > (const_int 0 [0]) repeated x4 > > >> > > ]) > > >> > > > > >> > > instead of: > > >> > > > > >> > > (gdb) p debug_rtx (src) > > >> > > (const_vector:V8HI [ > > >> > > (const_int 204 [0xcc]) repeated x8 > > >> > > ]) > > >> > > > > >> > > which is in line with the statement below. > > >> > > > > > >> > > > That is, I think putting any random value into the upper lanes= when > > >> > > > constant folding > > >> > > > a paradoxical subreg sounds OK to me, no? > > >> > > > > >> > > The compiler is putting zero there as can be seen from the above= new RTX. > > >> > > > > >> > > > Of course we might choose to not do such constant propagation = for > > >> > > > efficiency reason - at least > > >> > > > when the resulting CONST_* would require a larger constant poo= l entry > > >> > > > or more costly > > >> > > > construction. > > >> > > > > >> > > This is probably a follow-up improvement, where this patch tries= to > > >> > > fix a specific invalid simplification of simplify_replace_rtx th= at is > > >> > > invalid universally. > > >> > > > >> > How so? What specifies the values of the paradoxical subreg for t= he > > >> > bytes not covered by the subreg operand? > > >> > > >> I don't know why 0 is generated here (and if it is valid) for > > >> paradoxical bytes, but 0xcc is not correct, since it sets REG_EQUAL = to > > >> the wrong constant and triggers unwanted propagation later on. > > > > > > Quoting what I wrote in the PR below. I think pragmatically the fix = is > > > good - we might miss some opportunistic folding this way but we for > > > sure may not optimistically register an equality via REG_EQUAL withou= t > > > enforcing it (removing the producer and replacing it with the optimis= tic > > > constant). > > > > > > So consider the patch approved if no other RTL maintainer chimes in > > > within 48h. > > > > Sorry, can you hold off a bit longer? Wanted to have a look but the > > deadline is about to expire. > > No problem, I will wait for you. Please also note Comment #14 in the PR. With the patch, the compiler sets (const_vector:V8HI [ (const_int 204 [0xcc]) repeated x4 (const_int 0 [0]) repeated x4 ]) as a new REG_EQUAL note. This also does not look OK to me. IMO, the compiler should not emit REG_EQUAL note when some of the elements are derived from undefined values outside the original register. Uros.