From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Roger Sayle"
To: "'GCC Patches'"
Subject: [xstormy16 PATCH] Recognize/support swpn (swap nibbles) instruction.
Date: Sat, 29 Apr 2023 17:24:13 +0100
Message-ID: <004001d97ab7$0a1989f0$1e4c9dd0$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0041_01D97ABF.6BE03BE0"

This is a multipart message in MIME format.

------=_NextPart_000_0041_01D97ABF.6BE03BE0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

This patch adds support for xstormy16's swap nibbles instruction (swpn).
For the test case:

short foo(short x) {
  return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}

GCC with -O2 currently generates the nine-instruction sequence:

foo:    mov r7,r2
        asr r2,#4
        and r2,#15
        mov.w r6,#-256
        and r6,r7
        or r2,r6
        shl r7,#4
        and r7,#255
        or r2,r7
        ret

With this patch, we now generate:

foo:    swpn r2
        ret

To achieve this using combine's four-instruction "combinations" requires
a little wizardry.  Firstly, define_insn_and_split patterns are
introduced to treat logical shifts followed by bitwise-AND as macro
instructions that are split after reload.
This is sufficient to recognize a QImode nibble swap, which can be
implemented by swpn followed by either a zero-extension or a
sign-extension from QImode to HImode.  Then finally, in the correct
context, a QImode swap-nibbles pattern can be combined to preserve the
high-byte of a HImode word, matching the xstormy16's swpn semantics.

The naming of the new code iterators is taken from i386.md.  The
any_rotate code iterator is used in my next (split out) patch.

This patch has been tested by building a cross-compiler to xstormy16-elf
from x86_64-pc-linux-gnu and confirming the new test cases pass.
Ok for mainline?


2023-04-29  Roger Sayle

gcc/ChangeLog
	* config/stormy16/stormy16.md (any_lshift): New code iterator.
	(any_or_plus): Likewise.
	(any_rotate): Likewise.
	(*<code>_and_internal): New define_insn_and_split to recognize
	a logical shift followed by an AND, and split it again after
	reload.
	(*swpn): New define_insn matching xstormy16's swpn.
	(*swpn_zext): New define_insn recognizing swpn followed by
	zero_extendqihi2, i.e. with the high byte set to zero.
	(*swpn_sext): Likewise, for swpn followed by cbw.
	(*swpn_sext_2): Likewise, for an alternate RTL form.
	(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
	sequence is split in the correct place to recognize the *swpn_zext
	followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/swpn-1.c: New QImode test case.
	* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
	* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
	* gcc.target/xstormy16/swpn-4.c: New HImode test case.

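For illustration only (this is not part of the patch), here is a minimal
C sketch of the swpn semantics described above, assuming a 16-bit word
whose high byte is preserved while the two nibbles of the low byte are
exchanged.  The names swpn_model, swpn_xor, swpn_sum and swpn_check_forms
are hypothetical.  Because the three masked terms occupy disjoint bits,
combining them with |, ^ or + yields the same value, which is presumably
why a single any_or_plus iterator can cover all three forms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of swpn on a 16-bit word: keep the high
   byte and swap the two nibbles of the low byte.  */
static uint16_t swpn_model(uint16_t x)
{
    return (uint16_t)((x & 0xff00u) | ((x << 4) & 0x00f0u) | ((x >> 4) & 0x000fu));
}

/* The same three masked terms combined with XOR.  */
static uint16_t swpn_xor(uint16_t x)
{
    return (uint16_t)((x & 0xff00u) ^ ((x << 4) & 0x00f0u) ^ ((x >> 4) & 0x000fu));
}

/* The same three masked terms combined with addition; no carries can
   occur because the masks 0xff00, 0x00f0 and 0x000f do not overlap.  */
static uint16_t swpn_sum(uint16_t x)
{
    return (uint16_t)((x & 0xff00u) + ((x << 4) & 0x00f0u) + ((x >> 4) & 0x000fu));
}

/* Exhaustively compare the three forms over all 16-bit inputs and
   return the number of inputs checked.  */
static long swpn_check_forms(void)
{
    long checked = 0;
    for (uint32_t i = 0; i <= 0xffffu; i++) {
        uint16_t x = (uint16_t)i;
        assert(swpn_xor(x) == swpn_model(x));
        assert(swpn_sum(x) == swpn_model(x));
        /* Swapping the nibbles twice restores the original value.  */
        assert(swpn_model(swpn_model(x)) == x);
        checked++;
    }
    return checked;
}
```

Calling swpn_check_forms() from a host-side test program walks all 65536
inputs; this only models the arithmetic and says nothing about which RTL
forms combine actually produces.
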
Thanks in advance,
Roger
--

------=_NextPart_000_0041_01D97ABF.6BE03BE0
Content-Type: text/plain; name="patchxs1.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="patchxs1.txt"

diff --git a/gcc/config/stormy16/stormy16.md b/gcc/config/stormy16/stormy16.md
index b2e86ee..be1ee04 100644
--- a/gcc/config/stormy16/stormy16.md
+++ b/gcc/config/stormy16/stormy16.md
@@ -48,6 +48,10 @@
    (CARRY_REG 16)
   ]
 )
+
+(define_code_iterator any_lshift [ashift lshiftrt])
+(define_code_iterator any_or_plus [plus ior xor])
+(define_code_iterator any_rotate [rotate rotatert])
 ^L
 ;; ::::::::::::::::::::
 ;; ::
@@ -1301,3 +1323,86 @@
   [(parallel [(set (match_dup 2) (match_dup 1))
               (set (match_dup 1) (match_dup 2))])])
 
+;; Recognize shl+and and shr+and as macro instructions.
+(define_insn_and_split "*<code>_and_internal"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+        (and:HI (any_lshift:HI (match_operand 1 "register_operand" "0")
+                               (match_operand 2 "const_int_operand" "i"))
+                (match_operand 3 "const_int_operand" "i")))
+   (clobber (reg:BI CARRY_REG))]
+  "IN_RANGE (INTVAL (operands[2]), 0, 15)"
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (any_lshift:HI (match_dup 1) (match_dup 2)))
+              (clobber (reg:BI CARRY_REG))])
+   (set (match_dup 0) (and:HI (match_dup 0) (match_dup 3)))])
+
+;; Swap nibbles instruction
+(define_insn "*swpn"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+        (any_or_plus:HI
+          (any_or_plus:HI
+            (and:HI (ashift:HI (match_operand:HI 1 "register_operand" "0")
+                               (const_int 4))
+                    (const_int 240))
+            (and:HI (lshiftrt:HI (match_dup 1) (const_int 4))
+                    (const_int 15)))
+          (and:HI (match_dup 1) (const_int -256))))]
+  ""
+  "swpn %0")
+
+(define_insn "*swpn_zext"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+        (any_or_plus:HI
+          (and:HI (ashift:HI (match_operand:HI 1 "register_operand" "0")
+                             (const_int 4))
+                  (const_int 240))
+          (and:HI (lshiftrt:HI (match_dup 1) (const_int 4))
+                  (const_int 15))))]
+  ""
+  "swpn %0 | and %0,#255"
+  [(set_attr "length" "6")])
+
+(define_insn "*swpn_sext"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+        (sign_extend:HI
+          (rotate:QI (subreg:QI (match_operand:HI 1 "register_operand" "0") 0)
+                     (const_int 4))))]
+  ""
+  "swpn %0 | cbw %0"
+  [(set_attr "length" "4")])
+
+(define_insn "*swpn_sext_2"
+  [(set (match_operand:HI 0 "register_operand" "=r")
+        (sign_extend:HI
+          (subreg:QI
+            (any_or_plus:HI
+              (ashift:HI (match_operand:HI 1 "register_operand" "0")
+                         (const_int 4))
+              (subreg:HI (lshiftrt:QI (subreg:QI (match_dup 1) 0)
+                                      (const_int 4)) 0)) 0)))]
+  ""
+  "swpn %0 | cbw %0"
+  [(set_attr "length" "4")])
+
+;; Recognize swpn_zext+ior as a macro instruction.
+(define_insn_and_split "*swpn_zext_ior"
+  [(set (match_operand:HI 0 "register_operand")
+        (any_or_plus:HI
+          (any_or_plus:HI
+            (and:HI (ashift:HI (match_operand:HI 1 "register_operand")
+                               (const_int 4))
+                    (const_int 240))
+            (and:HI (lshiftrt:HI (match_dup 1) (const_int 4))
+                    (const_int 15)))
+          (match_operand:HI 2 "nonmemory_operand")))]
+  "can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 3) (ior:HI (and:HI (ashift:HI (match_dup 1) (const_int 4))
+                                      (const_int 240))
+                              (and:HI (lshiftrt:HI (match_dup 1) (const_int 4))
+                                      (const_int 15))))
+   (set (match_dup 0) (ior:HI (match_dup 3) (match_dup 2)))]
+  "operands[3] = gen_reg_rtx (HImode);")
+
diff --git a/gcc/testsuite/gcc.target/xstormy16/swpn-1.c b/gcc/testsuite/gcc.target/xstormy16/swpn-1.c
new file mode 100644
index 0000000..a2c9316
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xstormy16/swpn-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+unsigned char ior_1(unsigned char x) { return (x>>4) | (x<<4); }
+unsigned char ior_2(unsigned char x) { return (x<<4) | (x>>4); }
+unsigned char xor_1(unsigned char x) { return (x>>4) ^ (x<<4); }
+unsigned char xor_2(unsigned char x) { return (x<<4) ^ (x>>4); }
+unsigned char sum_1(unsigned char x) { return (x>>4) + (x<<4); }
+unsigned char sum_2(unsigned char x) { return (x<<4) + (x>>4); }
+/* { dg-final { scan-assembler-times "swpn r2" 6 } } */
+
diff --git a/gcc/testsuite/gcc.target/xstormy16/swpn-2.c b/gcc/testsuite/gcc.target/xstormy16/swpn-2.c
new file mode 100644
index 0000000..f26c296
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xstormy16/swpn-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned short ior_1(unsigned short x) { return ((x&0xf0)>>4) | ((x&0x0f)<<4); }
+unsigned short xor_1(unsigned short x) { return ((x&0xf0)>>4) ^ ((x&0x0f)<<4); }
+unsigned short sum_1(unsigned short x) { return ((x&0xf0)>>4) + ((x&0x0f)<<4); }
+
+unsigned short ior_2(unsigned short x) { return ((x&0x0f)<<4) | ((x&0xf0)>>4); }
+unsigned short xor_2(unsigned short x) { return ((x&0x0f)<<4) ^ ((x&0xf0)>>4); }
+unsigned short sum_2(unsigned short x) { return ((x&0x0f)<<4) + ((x&0xf0)>>4); }
+
+/* { dg-final { scan-assembler-times "swpn r2" 6 } } */
+/* { dg-final { scan-assembler-times "and r2,#255" 6 } } */
+
diff --git a/gcc/testsuite/gcc.target/xstormy16/swpn-3.c b/gcc/testsuite/gcc.target/xstormy16/swpn-3.c
new file mode 100644
index 0000000..6109c6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xstormy16/swpn-3.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+short ior_1(unsigned short x) {
+  return (signed char)(((x&0xf0)>>4) | ((x&0x0f)<<4));
+}
+
+short xor_1(unsigned short x) {
+  return (signed char)(((x&0xf0)>>4) ^ ((x&0x0f)<<4));
+}
+
+short sum_1(unsigned short x) {
+  return (signed char)(((x&0xf0)>>4) + ((x&0x0f)<<4));
+}
+
+short ior_2(unsigned short x) {
+  return (signed char)(((x&0x0f)<<4) | ((x&0xf0)>>4));
+}
+
+short xor_2(unsigned short x) {
+  return (signed char)(((x&0x0f)<<4) ^ ((x&0xf0)>>4));
+}
+
+short sum_2(unsigned short x) {
+  return (signed char)(((x&0x0f)<<4) + ((x&0xf0)>>4));
+}
+
+/* { dg-final { scan-assembler-times "cbw" 6 } } */
diff --git a/gcc/testsuite/gcc.target/xstormy16/swpn-4.c b/gcc/testsuite/gcc.target/xstormy16/swpn-4.c
new file mode 100644
index 0000000..4a31dc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xstormy16/swpn-4.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+short ior_abc(short x) { return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f); }
+short ior_acb(short x) { return (x&0xff00) | ((x>>4)&0x0f) | ((x<<4)&0xf0); }
+short ior_bac(short x) { return ((x<<4)&0xf0) | (x&0xff00) | ((x>>4)&0x0f); }
+short ior_bca(short x) { return ((x<<4)&0xf0) | ((x>>4)&0x0f) | (x&0xff00); }
+short ior_cab(short x) { return ((x>>4)&0x0f) | (x&0xff00) | ((x<<4)&0xf0); }
+short ior_cba(short x) { return ((x>>4)&0x0f) | ((x<<4)&0xf0) | (x&0xff00); }
+
+short xor_abc(short x) { return (x&0xff00) ^ ((x<<4)&0xf0) ^ ((x>>4)&0x0f); }
+short xor_acb(short x) { return (x&0xff00) ^ ((x>>4)&0x0f) ^ ((x<<4)&0xf0); }
+short xor_bac(short x) { return ((x<<4)&0xf0) ^ (x&0xff00) ^ ((x>>4)&0x0f); }
+short xor_bca(short x) { return ((x<<4)&0xf0) ^ ((x>>4)&0x0f) ^ (x&0xff00); }
+short xor_cab(short x) { return ((x>>4)&0x0f) ^ (x&0xff00) ^ ((x<<4)&0xf0); }
+short xor_cba(short x) { return ((x>>4)&0x0f) ^ ((x<<4)&0xf0) ^ (x&0xff00); }
+
+short sum_abc(short x) { return (x&0xff00) + ((x<<4)&0xf0) + ((x>>4)&0x0f); }
+short sum_acb(short x) { return (x&0xff00) + ((x>>4)&0x0f) + ((x<<4)&0xf0); }
+short sum_bac(short x) { return ((x<<4)&0xf0) + (x&0xff00) + ((x>>4)&0x0f); }
+short sum_bca(short x) { return ((x<<4)&0xf0) + ((x>>4)&0x0f) + (x&0xff00); }
+short sum_cab(short x) { return ((x>>4)&0x0f) + (x&0xff00) + ((x<<4)&0xf0); }
+short sum_cba(short x) { return ((x>>4)&0x0f) + ((x<<4)&0xf0) + (x&0xff00); }
+
+/* { dg-final { scan-assembler-times "swpn r2" 18 } } */

------=_NextPart_000_0041_01D97ABF.6BE03BE0--