From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/108803] [10/11/12/13 Regression] wrong code for 128bit rotate on aarch64-unknown-linux-gnu with -Og
Date: Thu, 16 Feb 2023 17:50:44 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 10.5
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108803

--- Comment #2 from Jakub Jelinek ---
--- gcc/optabs.cc.jj    2023-01-02 09:32:53.309838465 +0100
+++ gcc/optabs.cc       2023-02-16 18:04:54.794871019 +0100
@@ -596,6 +596,16 @@ expand_doubleword_shift_condmove (scalar
 {
   rtx outof_superword, into_superword;
 
+  if (shift_mask < BITS_PER_WORD - 1)
+    {
+      rtx tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
+                                                GET_MODE (superword_op1)),
+                                      GET_MODE (superword_op1));
+      superword_op1
+        = simplify_expand_binop (op1_mode, and_optab, superword_op1, tmp,
+                                 0, true, methods);
+    }
+
   /* Put the superword version of the output into OUTOF_SUPERWORD and
      INTO_SUPERWORD.  */
   outof_superword = outof_target != 0 ? gen_reg_rtx (word_mode) : 0;
@@ -617,6 +627,16 @@ expand_doubleword_shift_condmove (scalar
         return false;
     }
 
+  if (shift_mask < BITS_PER_WORD - 1)
+    {
+      rtx tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
+                                                GET_MODE (subword_op1)),
+                                      GET_MODE (subword_op1));
+      subword_op1
+        = simplify_expand_binop (op1_mode, and_optab, subword_op1, tmp,
+                                 0, true, methods);
+    }
+
   /* Put the subword version directly in OUTOF_TARGET and INTO_TARGET.  */
   if (!expand_subword_shift (op1_mode, binoptab,
                              outof_input, into_input, subword_op1,

indeed fixes the miscompilation, but unfortunately, with e.g.

__attribute__((noipa)) __int128
foo (__int128 a, unsigned k)
{
  return a << k;
}

__attribute__((noipa)) __int128
bar (__int128 a, unsigned k)
{
  return a >> k;
}

it results in one extra insn in each of the functions.

While the superword_op1 case is fine because aarch64 (among other arches) has a
pattern to catch shift with masked count, in the subword_op1 case that doesn't
work, because expand_subword_shift actually emits 3 shifts instead of just one,
one with (BITS_PER_WORD - 1) - op1 as shift count and two with op1.
If the op1 &= (BITS_PER_WORD - 1) masking is done in the caller, then it can't
be easily merged with the shifts.

We could do that also separately in expand_subword_shift under some new bool
and in that case instead of using
  op1 &= (BITS_PER_WORD - 1);
  shift1 by ((BITS_PER_WORD - 1) - op1); shift2 by op1; shift3 by op1
use
  tmp = (63 - op1) & (BITS_PER_WORD - 1); shift1 by tmp;
  op1 &= (BITS_PER_WORD - 1); shift2 by op1; shift3 by op1,
but that would be larger code if the target doesn't have those shift with
masking patterns that trigger on it.  Perhaps have some target hook?  Or try to
recog the combined instruction?
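
For illustration only, here is a rough C model of the masked conditional-move sequence discussed above. It is not GCC's RTL expansion: the dword type, WORD_BITS and the function names are made up for this sketch, a 64-bit word is assumed, and the right shift is the logical one rather than the arithmetic one that bar uses. It only shows that, once the count is masked, every variable-count shift in both candidates stays within 0..63.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64u  /* assumption: 64-bit word, as on aarch64 */

/* Hypothetical two-word value; lo is the least significant word.  */
typedef struct { uint64_t lo, hi; } dword;

/* Doubleword left shift by k (0 <= k < 128), conditional-move style:
   both candidates are computed, then one is selected.  */
static dword dword_shl(dword a, unsigned k)
{
    assert(k < 2 * WORD_BITS);
    unsigned n = k & (WORD_BITS - 1);  /* masked count, always 0..63 */

    /* Subword candidate (k < 64): one shift by (63 - n), two shifts by n,
       plus a constant shift by 1 so that n == 0 yields no carried bits.  */
    uint64_t carries = (a.lo >> 1) >> ((WORD_BITS - 1) - n);
    dword sub = { a.lo << n, (a.hi << n) | carries };

    /* Superword candidate (k >= 64): the low word moves into the high word.  */
    dword sup = { 0, a.lo << n };

    return k < WORD_BITS ? sub : sup;
}

/* Logical doubleword right shift, mirroring dword_shl.  */
static dword dword_shr(dword a, unsigned k)
{
    assert(k < 2 * WORD_BITS);
    unsigned n = k & (WORD_BITS - 1);

    uint64_t carries = (a.hi << 1) << ((WORD_BITS - 1) - n);
    dword sub = { (a.lo >> n) | carries, a.hi >> n };
    dword sup = { a.hi >> n, 0 };

    return k < WORD_BITS ? sub : sup;
}
```

In this model the subword candidate is the part with three variable-count shifts, which is why a single mask applied by the caller cannot be folded into one masked-shift instruction there.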