[committed] Improve single bit zero extraction on H8.

public inbox for gcc-patches@gcc.gnu.org

From: Jeff Law @ 2023-11-10  0:45 UTC
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]

When zero-extracting a single-bit bitfield from bits 16..31 on the H8,
we currently generate some pretty bad code.

The fundamental issue is that the H8 can't shift efficiently, and
there's no trivial way to extract a single bit out of the high half
word of an SImode value.

What usually happens is we use a synthesized right shift to get the 
single bit into the desired position, then a bit-and to mask off 
everything we don't care about.

The shifts are expensive, even using tricks like half and quarter word
moves to implement shift-by-16 and shift-by-8.  Additionally, a logical
right shift must clear out the upper bits, which is redundant since
we're going to mask things with &1 later.
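
For illustration only, here is the sort of source that hits this case.
It is a minimal sketch in C; the function names and the choice of bit
21 are made up and are not taken from the testsuite.  The patterns in
this patch match the form (and:SI (lshiftrt:SI ...) (const_int 1)).
The second function shows why the zero fill is redundant: a rotate
leaves garbage in the upper bits, yet the masked result is the same.

```c
#include <stdint.h>

/* Hypothetical example: zero-extract bit 21 of a 32-bit value.  */
uint32_t
extract_bit21 (uint32_t x)
{
  return (x >> 21) & 1;
}

/* Same result without the zero fill.  The rotate right by 21 leaves
   the other bits of X in the upper part of the word, but the final
   mask throws them away.  */
uint32_t
extract_bit21_rot (uint32_t x)
{
  uint32_t rotated = (x >> 21) | (x << 11);
  return rotated & 1;
}
```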

This patch provides a consistently better sequence for such
extractions.  The general form moves the high half into the low half,
extracts the bit into C, clears the destination, then rotates C into
the destination.  A few bit positions get special cases.
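
As a rough model of that sequence, the C sketch below walks through
the instructions the new pattern emits for positions 16..31.  The
function name is hypothetical and this is not GCC code, just a host
side model.  It assumes xor.l leaves C untouched, which the emitted
sequence relies on.  The H8/S-only rotl.l #2 variant for bit 30 is
left out; bit 30 simply takes the general path here.

```c
#include <stdint.h>

/* Hypothetical C model of the emitted sequence, 16 <= pos <= 31.  */
uint32_t
model_extract_high_bit (uint32_t x, int pos)
{
  uint32_t reg;

  if (pos == 31)
    {
      /* rotl.l %0; and.l #1,%0: rotate bit 31 into bit 0, then mask.  */
      reg = (x << 1) | (x >> 31);
      return reg & 1;
    }

  /* mov.w %e1,%f0: copy the high half word into the low half word.
     The upper half is left alone; it is masked or cleared below.  */
  reg = (x & 0xffff0000u) | (x >> 16);

  if (pos == 16)
    /* and.l #1,%0: the wanted bit is already in bit 0.  */
    return reg & 1;

  /* bld #(pos % 8),<byte>: load one bit into C.  Bits 24..31 of X now
     sit in the upper byte of the low half, bits 16..23 in the lower.  */
  uint8_t byte = (pos >= 24) ? (uint8_t) (reg >> 8) : (uint8_t) reg;
  unsigned carry = (byte >> (pos % 8)) & 1;

  /* xor.l %0,%0: clear the destination (C assumed preserved).  */
  reg = 0;

  /* rotxl.l %0: rotate left through carry, so C lands in bit 0.  */
  reg = (reg << 1) | carry;
  return reg;
}
```

For any pos in 16..31 the model should agree with (x >> pos) & 1.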

This also disables the new patterns, and the recently added single bit
sign_extract patterns, for H8/SX, which has a much more capable
shifter.  It's not single cycle, but it is reasonably efficient.

This has been regression tested on the H8 without issues.  Pushing to 
the trunk momentarily.

jeff

ps.  Yes, extraction of multi-bit fields could probably be improved as
well.  But I've already spent more time on this than I can reasonably
justify.


[-- Attachment #2: P --]
[-- Type: text/plain, Size: 4720 bytes --]

commit 57dbc02d261bb833f6ef287187eb144321dd595c
Author: Jeff Law <jlaw@ventanamicro.com>
Date:   Thu Nov 9 17:34:01 2023 -0700

    [committed] Improve single bit zero extraction on H8.
    
    When zero extracting a single bit bitfield from bits 16..31 on the H8 we
    currently generate some pretty bad code.
    
    The fundamental issue is we can't shift efficiently and there's no trivial way
    to extract a single bit out of the high half word of an SImode value.
    
    What usually happens is we use a synthesized right shift to get the single bit
    into the desired position, then a bit-and to mask off everything we don't care
    about.
    
    The shifts are expensive, even using tricks like half and quarter word moves to
    implement shift-by-16 and shift-by-8.  Additionally a logical right shift must
    clear out the upper bits which is redundant since we're going to mask things
    with &1 later.
    
    This patch provides a consistently better sequence for such extractions.  The
    general form moves the high half into the low half, a bit extraction into C,
    clear the destination, then move C into the destination with a few special
    cases.
    
    This also avoids all the shenanigans for H8/SX which has a much more capable
    shifter.  It's not single cycle, but it is reasonably efficient.
    
    This has been regression tested on the H8 without issues.  Pushing to the trunk
    momentarily.
    
    jeff
    
    ps.  Yes, supporting zero extraction of multi-bit fields might be improvable as
    well.  But I've already spent more time on this than I can reasonably justify.
    
    gcc/
            * config/h8300/combiner.md (single bit sign_extract): Avoid recently
            added patterns for H8/SX.
            (single bit zero_extract): New patterns.

diff --git a/gcc/config/h8300/combiner.md b/gcc/config/h8300/combiner.md
index 2f7faf77c93..e1179b5fea6 100644
--- a/gcc/config/h8300/combiner.md
+++ b/gcc/config/h8300/combiner.md
@@ -1278,7 +1278,7 @@ (define_insn_and_split ""
 	(sign_extract:SI (match_operand:QHSI 1 "register_operand" "0")
 			 (const_int 1)
 			 (match_operand 2 "immediate_operand")))]
-  ""
+  "!TARGET_H8300SX"
   "#"
   "&& reload_completed"
   [(parallel [(set (match_dup 0)
@@ -1291,7 +1291,7 @@ (define_insn ""
 			 (const_int 1)
 			 (match_operand 2 "immediate_operand")))
    (clobber (reg:CC CC_REG))]
-  ""
+  "!TARGET_H8300SX"
 {
   int position = INTVAL (operands[2]);
 
@@ -1359,3 +1359,69 @@ (define_insn ""
   return "subx\t%s0,%s0\;exts.w %T0\;exts.l %0";
 }
   [(set_attr "length" "10")])
+
+;; For shift counts >= 16 we can always do better than the
+;; generic sequences.  Other patterns handle smaller counts.
+(define_insn_and_split ""
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+			     (match_operand 2 "immediate_operand" "n"))
+		(const_int 1)))]
+  "!TARGET_H8300SX && INTVAL (operands[2]) >= 16"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 0) (and:SI (lshiftrt:SI (match_dup 0) (match_dup 2))
+					 (const_int 1)))
+	      (clobber (reg:CC CC_REG))])])
+
+(define_insn ""
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+			     (match_operand 2 "immediate_operand" "n"))
+		(const_int 1)))
+   (clobber (reg:CC CC_REG))]
+  "!TARGET_H8300SX && INTVAL (operands[2]) >= 16"
+{
+  int position = INTVAL (operands[2]);
+
+  /* If the bit we want is the highest bit we can just rotate it into position
+     and mask off everything else.  */
+  if (position == 31)
+    {
+      output_asm_insn ("rotl.l\t%0", operands);
+      return "and.l\t#1,%0";
+    }
+
+  /* Special case for H8/S.  Similar to bit 31.  */
+  if (position == 30 && TARGET_H8300S)
+    return "rotl.l\t#2,%0\;and.l\t#1,%0";
+
+  if (position <= 30 && position >= 17)
+    {
+      /* Shift 16 bits, without worrying about extensions.  */
+      output_asm_insn ("mov.w\t%e1,%f0", operands);
+
+      /* Get the bit we want into C.  */
+      operands[2] = GEN_INT (position % 8);
+      if (position >= 24)
+	output_asm_insn ("bld\t%2,%t0", operands);
+      else
+	output_asm_insn ("bld\t%2,%s0", operands);
+
+      /* xor + rotate to clear the destination, then rotate
+	 the C into position.  */
+      return "xor.l\t%0,%0\;rotxl.l\t%0";
+    }
+
+  if (position == 16)
+    {
+      /* Shift 16 bits, without worrying about extensions.  */
+      output_asm_insn ("mov.w\t%e1,%f0", operands);
+
+      /* And finally, mask out everything we don't want.  */
+      return "and.l\t#1,%0";
+    }
+
+  gcc_unreachable ();
+}
+  [(set_attr "length" "10")])
