* [Bug rtl-optimization/99462] New: Enhance scheduling to split instructions
@ 2021-03-08 10:49 rguenth at gcc dot gnu.org
  2021-03-08 10:50 ` [Bug rtl-optimization/99462] " rguenth at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-03-08 10:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99462

            Bug ID: 99462
           Summary: Enhance scheduling to split instructions
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Maybe the scheduler(s) can already do this (I have zero knowledge here).  For
example, the x86 vec_concatv2di insn has alternatives that cause the instruction
to be split into multiple uops (vpinsrq, movhpd) when the 'insert' operand
is not an XMM register (but a GPR or MEM).  We now have a peephole2 to split
such cases:

+;; Further split pinsrq variants of vec_concatv2di to hide the latency
+;; of the GPR->XMM transition(s).
+(define_peephole2
+  [(match_scratch:DI 3 "Yv")
+   (set (match_operand:V2DI 0 "sse_reg_operand")
+       (vec_concat:V2DI (match_operand:DI 1 "sse_reg_operand")
+                        (match_operand:DI 2 "nonimmediate_gr_operand")))]
+  "TARGET_64BIT && TARGET_SSE4_1
+   && !optimize_insn_for_size_p ()"
+  [(set (match_dup 3)
+        (match_dup 2))
+   (set (match_dup 0)
+       (vec_concat:V2DI (match_dup 1)
+                        (match_dup 3)))])

but in reality this is only profitable when we can either execute
two "bad" move uops in parallel (that is, when originally composing
two GPRs or two MEMs) or schedule one "bad" move much earlier.
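
For illustration only, here is a minimal C sketch of source that could give
rise to such V2DI concatenations.  It is not taken from the report: the
function names are hypothetical, it assumes the GCC/Clang vector extensions,
and whether a given compiler version and option set actually selects
vec_concatv2di (or the peephole2 above) for it is an assumption that should be
checked in the RTL dumps or the generated assembly.

```c
/* Sketch only: relies on GCC/Clang vector extensions.
   Names are made up for illustration, not taken from GCC.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Both halves arrive in general-purpose registers on x86-64, so building
   the vector needs two GPR->XMM moves (the "two GPRs" case).  */
v2di
concat_gpr_gpr (long long lo, long long hi)
{
  return (v2di) { lo, hi };
}

/* The low half already lives in a vector register and the high half is
   loaded from memory (an "XMM + MEM" case).  */
v2di
concat_xmm_mem (v2di x, const long long *p)
{
  return (v2di) { x[0], *p };
}
```

Compiling something like this with optimization and SSE4.1 enabled and
inspecting the output is one way to see which alternative was chosen.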

Thus, can the scheduler already "split" an instruction - say split
away a load uop and issue it early when a scratch register is available?

(The reverse alternative is to not expose multi-uop insns before scheduling
and only merge them later, perhaps during scheduling?)

How does GCC deal with situations like this?

