From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id D4512385B511; Thu, 16 Feb 2023 17:50:44 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D4512385B511
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1676569844;
	bh=DlXYPN3ovSCFtI0+RC/UwG9dUBaiUbuRpc1ixFSvYV8=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=wgYeMiu0lYzjZJLPY3PVB+Eok3XLBzYMSWw/Fll3dNps0FytM1JOksFhKg0uLHQk6
	 VPEef1amoVCIJZV14DUMKJz/x8LYGUaQlgQ2fNKiHEt00da4LZKlOzLEuBUI/YZxCD
	 +at+XN6DMEAxBNEZGZebKkWJN+TZJKFtX0aaCr8Q=
From: "jakub at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit
 rotate on aarch64-unknown-linux-gnu with -Og
Date: Thu, 16 Feb 2023 17:50:44 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 10.5
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-108803-4-dlNU5iUorH@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108803-4@http.gcc.gnu.org/bugzilla/>
References: <bug-108803-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
--- gcc/optabs.cc.jj    2023-01-02 09:32:53.309838465 +0100
+++ gcc/optabs.cc       2023-02-16 18:04:54.794871019 +0100
@@ -596,6 +596,16 @@ expand_doubleword_shift_condmove (scalar
 {
   rtx outof_superword, into_superword;

+  if (shift_mask < BITS_PER_WORD - 1)
+    {
+      rtx tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
+                                               GET_MODE (superword_op1)),
+                                     GET_MODE (superword_op1));
+      superword_op1
+       = simplify_expand_binop (op1_mode, and_optab, superword_op1, tmp,
+                                0, true, methods);
+    }
+
   /* Put the superword version of the output into OUTOF_SUPERWORD and
      INTO_SUPERWORD.  */
   outof_superword = outof_target != 0 ? gen_reg_rtx (word_mode) : 0;
@@ -617,6 +627,16 @@ expand_doubleword_shift_condmove (scalar
        return false;
     }

+  if (shift_mask < BITS_PER_WORD - 1)
+    {
+      rtx tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
+                                               GET_MODE (subword_op1)),
+                                     GET_MODE (subword_op1));
+      subword_op1
+       = simplify_expand_binop (op1_mode, and_optab, subword_op1, tmp,
+                                0, true, methods);
+    }
+
   /* Put the subword version directly in OUTOF_TARGET and INTO_TARGET.  */
   if (!expand_subword_shift (op1_mode, binoptab,
                             outof_input, into_input, subword_op1,
The patch above indeed fixes the miscompilation, but unfortunately with e.g.
__attribute__((noipa)) __int128
foo (__int128 a, unsigned k)
{
  return a << k;
}

__attribute__((noipa)) __int128
bar (__int128 a, unsigned k)
{
  return a >> k;
}
it results in one extra insn in each of the functions.  The superword_op1
case is fine, because aarch64 (among other arches) has a pattern that
matches a shift with a masked count.  In the subword_op1 case that doesn't
work, because expand_subword_shift actually emits 3 shifts instead of just
one: one with (BITS_PER_WORD - 1) - op1 as the shift count and two with op1.
If the op1 &= (BITS_PER_WORD - 1) masking is done in the caller, then it
can't be easily merged with the shifts.
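
To make the shift count concrete, here is a rough C model of the subword
path of a doubleword left shift with the mask applied up front.  It is only
a sketch, not the RTL GCC emits: the dword struct, the function name and the
fixed 64-bit BITS_PER_WORD are illustrative assumptions, and it assumes the
carry is formed by a constant shift by 1 followed by the variable shift.

```c
#include <stdint.h>

#define BITS_PER_WORD 64  /* assumption: 64-bit words, as on aarch64 */

/* Hypothetical two-word value; lo is the least significant word.  */
typedef struct { uint64_t lo, hi; } dword;

/* Model of the subword path (count < BITS_PER_WORD) of a doubleword
   left shift with the count masked once, before any shift.  */
static dword
subword_shl_caller_mask (dword in, unsigned op1)
{
  dword out;

  op1 &= BITS_PER_WORD - 1;                 /* the extra AND insn */
  unsigned tmp = (BITS_PER_WORD - 1) - op1; /* always in [0, 63] */

  /* shift1: bits carried from the low word into the high word.  The
     constant shift by 1 keeps every variable count below BITS_PER_WORD.  */
  uint64_t carries = (in.lo >> 1) >> tmp;

  out.hi = (in.hi << op1) | carries;        /* shift2 */
  out.lo = in.lo << op1;                    /* shift3 */
  return out;
}
```

Here the single AND feeds a subtraction and two shifts, so a
shift-with-masked-count pattern has no one shift to fold it into.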
The masking could instead be done separately in expand_subword_shift, under
some new bool argument.  In that case, instead of using
  op1 &= (BITS_PER_WORD - 1); shift1 by ((BITS_PER_WORD - 1) - op1);
  shift2 by op1; shift3 by op1
we would use
  tmp = ((BITS_PER_WORD - 1) - op1) & (BITS_PER_WORD - 1); shift1 by tmp;
  op1 &= (BITS_PER_WORD - 1); shift2 by op1; shift3 by op1
but that would be larger code if the target doesn't have the
shift-with-masking patterns that would trigger on it.  Perhaps have some
target hook?  Or try to recog the combined instruction?
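

As a sanity check on the two sequences discussed above, the following C
sketch models both and lets them be compared for arbitrary counts.  The
struct and function names and the 64-bit BITS_PER_WORD are assumptions made
for illustration, and this is plain C arithmetic rather than anything GCC
generates.  The two agree because unsigned subtraction wraps modulo a power
of two that is a multiple of BITS_PER_WORD, so masking after the subtraction
gives the same value as masking op1 before it.

```c
#include <stdint.h>

#define BITS_PER_WORD 64  /* assumption: 64-bit words */

/* Hypothetical two-word value; lo is the least significant word.  */
typedef struct { uint64_t lo, hi; } dword;

/* Mask op1 once, then derive the carry count from the masked value.  */
static dword
shl_mask_first (dword in, unsigned op1)
{
  dword out;
  op1 &= BITS_PER_WORD - 1;
  uint64_t carries = (in.lo >> 1) >> ((BITS_PER_WORD - 1) - op1);
  out.hi = (in.hi << op1) | carries;
  out.lo = in.lo << op1;
  return out;
}

/* Give each shift count its own AND, so that every AND sits directly
   in front of the shift(s) that consume it.  */
static dword
shl_mask_per_shift (dword in, unsigned op1)
{
  dword out;
  unsigned tmp = ((BITS_PER_WORD - 1) - op1) & (BITS_PER_WORD - 1);
  uint64_t carries = (in.lo >> 1) >> tmp;   /* shift1 by tmp */
  op1 &= BITS_PER_WORD - 1;
  out.hi = (in.hi << op1) | carries;        /* shift2 by op1 */
  out.lo = in.lo << op1;                    /* shift3 by op1 */
  return out;
}
```

In the second variant there are two ANDs instead of one, which is the larger
code mentioned above whenever the target cannot fold them into the shifts.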