From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x236.google.com (mail-lj1-x236.google.com [IPv6:2a00:1450:4864:20::236]) by sourceware.org (Postfix) with ESMTPS id D074C3858D28 for ; Fri, 19 Jan 2024 13:50:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D074C3858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D074C3858D28 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::236 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705672209; cv=none; b=Y2aQCyzwpYbptjpIe9FaBcMeSGoXQDOcitxNnZXyBualRHdmVEkimQ1YgTRbiwL0jB2FnR44bXi1+YkOBrAqjuXX2uxaOBPttaAlAFRH0pnesIIQUX5lNjsyVTO9XBUvVqO3Rs9qmt2Sur5DazzKta9XTu0/1c4mVAMtLnDR6Yo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705672209; c=relaxed/simple; bh=gXnMYPImo6LX/HWAyD2y8Q2NasEbMJ36yQ55RdvOT4c=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=cpKqtpVCvtqyax3GuU75GVw+p2cynNIFuh0ks802DI/pf1nL1b8rlP2VZKpjcmzOos+OikQFhDHhikCuxqPHOQvaUfSRdGC/5IASUzHJB6U5+dgUG8GplwPcO0bwVt7I/XKInb+w7ODhP4p3qa9Z2/HoF7H9AC4WtmyCk5VrPzA= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lj1-x236.google.com with SMTP id 38308e7fff4ca-2ccae380df2so9184611fa.1 for ; Fri, 19 Jan 2024 05:50:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705672205; x=1706277005; darn=gcc.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=9cyqDSKnNnjPhh3tYri9Ule+TdnupPzRwfTCESZRmq4=; b=XDyMcri45Mzeozr80azevyrh4w+oWEFKemNrJ2zDIRhhtHHWWzhhC7r5ew6xBTtmrl 6hXgL8mCJh88PYGaWz3VMK0+gA4iKEbAbNhCWYadTUFppLiMw6nNKU6LnGp4B+FW05Ri jFImIVVCCF9V5uTE/YFTPdcpHdBUttbVtZdawAhup42iNGRNQlEdffubBwZi4PLF4cTf ZLRxitUKA3nRozbaB3WL4hVKUknWuyE6BjsFJTTa51UoRFAee9oI2UV6xNPV6wsAXtAW rYtzaUYQ7+wka1wGRExVxVmya7xp9vEbF8grrcFZjsiq1wlyvCkF577mOEIoMhN/6QMn 5p5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705672205; x=1706277005; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9cyqDSKnNnjPhh3tYri9Ule+TdnupPzRwfTCESZRmq4=; b=A/JYaTOWliL+RE15VawymKMrdtA/dLa+quGKU2e1KY0fW3wjqpyJkMvFbEUy3d5Sx5 oNm8xKKzuaPhmlek2wSTD5Sa5ByFWeTe7nkpmiNEkB0huymriiMBll9K732SbR8t922G dvqEm957YgjKMicTWytoxrjAKTYRlnftTD4GAbUAbt2Mt7vBD5cY7k7p6/TpYmcY8ttg 0/AP1Ir3bcLwYzE65d+Vj7SgPvteTK82s+Mc0iMzoB5sDtqyzo4i7oAVymWDUxDVWtlQ N+4OJyuYXRAD+GENhZIUaMYVzU8lOj1cpEp2yZzO/nymKI1nKhHUK7khmdXXjolrI6rS WgGw== X-Gm-Message-State: AOJu0Yyslij5/CzDyR64wbiouO0zqhUjaKBlmKPGF1bPKPXdVaM9IY5c BxZM1LeNLal+4ehNA9GVraYc2s6N2zz73BLQ/yTAIV3sJ8SYZMaR3ZQ15cFuQ4BRAyVyEZVsHLr 62LK9gWr3qzTh61mIwG5p2c5MZ0E8aEFv X-Google-Smtp-Source: AGHT+IH4mD/tMRRMOw4oXGUmrJUJp1FwOhehnHct4aC2obFB+x7LQqE5Y3AczBJKhqI3pZMBJLo9MyySWyOCZn0vAtY= X-Received: by 2002:a2e:361a:0:b0:2cd:edab:9417 with SMTP id d26-20020a2e361a000000b002cdedab9417mr1389398lja.70.1705672204831; Fri, 19 Jan 2024 05:50:04 -0800 (PST) MIME-Version: 1.0 References: <023501da4a48$320e7540$962b5fc0$@nextmovesoftware.com> <002f01da4adb$16799150$436cb3f0$@nextmovesoftware.com> In-Reply-To: <002f01da4adb$16799150$436cb3f0$@nextmovesoftware.com> From: Richard Biener Date: Fri, 19 Jan 2024 14:49:53 +0100 Message-ID: Subject: Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates. To: Roger Sayle Cc: gcc-patches@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Jan 19, 2024 at 2:26=E2=80=AFPM Roger Sayle wrote: > > > Hi Richard, > > Thanks for the speedy review. I completely agree this patch > can wait for stage1, but it's related to some recent work Andrew > Pinski has been doing in match.pd, so I thought I'd share it. > > Hypothetically, recognizing (x<<4)+(x>>60) as a rotation at the > tree-level might lead to a code quality regression, if RTL > expansion doesn't know to lower it back to use PLUS on > those targets with lea but without rotate. > > > From: Richard Biener > > Sent: 19 January 2024 11:04 > > On Thu, Jan 18, 2024 at 8:55=E2=80=AFPM Roger Sayle > > wrote: > > > > > > This patch tweaks RTL expansion of multi-word shifts and rotates to > > > use PLUS rather than IOR for disjunctive operations. During expansio= n > > > of these operations, the middle-end creates RTL like (X<>C2= ) > > > where the constants C1 and C2 guarantee that bits don't overlap. > > > Hence the IOR can be performed by any any_or_plus operation, such as > > > IOR, XOR or PLUS; for word-size operations where carry chains aren't > > > an issue these should all be equally fast (single-cycle) instructions= . > > > The benefit of this change is that targets with shift-and-add insns, > > > like x86's lea, can benefit from the LSHIFT-ADD form. > > > > > > An example of a backend that benefits is ARC, which is demonstrated b= y > > > these two simple functions: > > > > > > unsigned long long foo(unsigned long long x) { return x<<2; } > > > > > > which with -O2 is currently compiled to: > > > > > > foo: lsr r2,r0,30 > > > asl_s r1,r1,2 > > > asl_s r0,r0,2 > > > j_s.d [blink] > > > or_s r1,r1,r2 > > > > > > with this patch becomes: > > > > > > foo: lsr r2,r0,30 > > > add2 r1,r2,r1 > > > j_s.d [blink] > > > asl_s r0,r0,2 > > > > > > unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); > > > } > > > > > > which with -O2 is currently compiled to 6 insns + return: > > > > > > bar: lsr r12,r0,30 > > > asl_s r3,r1,2 > > > asl_s r0,r0,2 > > > lsr_s r1,r1,30 > > > or_s r0,r0,r1 > > > j_s.d [blink] > > > or r1,r12,r3 > > > > > > with this patch becomes 4 insns + return: > > > > > > bar: lsr r3,r1,30 > > > lsr r2,r0,30 > > > add2 r1,r2,r1 > > > j_s.d [blink] > > > add2 r0,r3,r0 > > > > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check, both with and without --target_board=3Dunix{-m32} > > > with no new failures. Ok for mainline? > > > > For expand_shift_1 you add > > > > + where C is the bitsize of A. If N cannot be zero, > > + use PLUS instead of IOR. > > > > but I don't see a check ensuring this other than mabe CONST_INT_P (op1) > > suggesting that we enver end up with const0_rtx here. OTOH why is N ze= ro a > > problem and why is it not in the optabs.cc case where I don't see any s= uch check > > (at least not obvious)? > > Excellent question. A common mistake in writing a rotate function in C > or C++ is to write something like (x>>n)|(x<<(64-n)) or (x<>(64-n)= ) > which invokes undefined behavior when n =3D=3D 0. It's OK to recognize t= hese > as rotates (relying on the undefined behavior), but correct/portable code > (and RTL) needs the correct idiom(x>>n)|(x<<((-n)&63), which never invoke= s > undefined behaviour. One interesting property of this idiom, is that shi= ft > by zero is then calculated as (x>>0)|(x<<0) which is x|x. This should th= en > reveal the problem, for all non-zero values the IOR can be replaced by PL= US, > but for zero shifts, X|X isn't the same as X+X or X^X. > > This only applies for single word rotations, and not multi-word shifts > nor multi-word rotates, which explains why this test is only in one place= . > > In theory, we could use ranger to check whether a rotate by a variable > amount can ever be by zero bits, but the simplification used here is to > continue using IOR for variable shifts, and PLUS for fixed/known shift > values. The last remaining insight is that we only need to check for > CONST_INT_P, as rotations/shifts by const0_rtx are handled earlier in > this function (and eliminated by the tree-optimizers), i.e. rotation by > a known constant is implicitly a rotation by a known non-zero constant. Ah, I see. It wasn't obvious the expmed.cc case was for rotations only. The patch is OK as-is for stage1 (which also gives others plenty of time to comment). I wonder if you can add a testcase though? Thanks, Richard. > This is a little clearer if you read/cite more of the comment that was > changed. Fortunately, this case is also well covered by the testsuite. > I'd be happy to change the code to read: > > (CONST_INT_P (op1) && op1 !=3D const0_rtx) > ? add_optab > : ior_optab > > But the test "if (op1 =3D=3D const0_rtx)" already appears on line 2570 > of expmed.cc. > > > > Since this doesn't seem to fix a regression it probably has to wait for > > stage1 to re-open. > > > > Thanks, > > Richard. > > > > > 2024-01-18 Roger Sayle > > > > > > gcc/ChangeLog > > > * expmed.cc (expand_shift_1): Use add_optab instead of ior_op= tab > > > to generate PLUS instead or IOR when unioning disjoint bitfie= lds. > > > * optabs.cc (expand_subword_shift): Likewise. > > > (expand_binop): Likewise for double-word rotate. > > > > > > Thanks again. >