public inbox for gcc-bugs@sourceware.org
* [Bug target/111020] New: RFE: RISC-V: ability to cherry-pick additional instructions
@ 2023-08-15  0:42 hpa at zytor dot com
  2023-08-15  0:48 ` [Bug target/111020] " pinskia at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: hpa at zytor dot com @ 2023-08-15  0:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

            Bug ID: 111020
           Summary: RFE: RISC-V: ability to cherry-pick additional
                    instructions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hpa at zytor dot com
  Target Milestone: ---

For very deeply embedded use, it is sometimes highly desirable to control the
instruction set on a very fine-grained basis. For example, the Zbb extension
contains a mixture of operations that most likely require separate functional
units. However, the ctz instruction alone is highly useful for reducing
interrupt latency in designs that do not have vectorized interrupt handling
(which is, in its most basic form, a dedicated ctz unit). It would be massive
hardware bloat to require the full Zbb set just to add this one instruction.
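
As an illustration of the use case (a minimal sketch, not code from any real
project), a non-vectored interrupt dispatcher typically has to find the lowest
set bit in a pending mask. The function names and the 32-bit pending mask are
assumptions made for the example; __builtin_ctz is the standard GCC builtin,
which the compiler can lower to a single ctz when that instruction is enabled.

```c
#include <stdint.h>

/* Return the number of the lowest-numbered pending interrupt, or -1 if
   nothing is pending. __builtin_ctz is undefined for 0, hence the check. */
static int lowest_pending_irq(uint32_t pending)
{
    if (pending == 0)
        return -1;
    return __builtin_ctz(pending);
}

/* Walk the pending mask lowest-number-first and record the service order
   in out[]. Returns how many interrupts were recorded. */
static int drain_pending(uint32_t pending, int out[32])
{
    int n = 0;

    while (pending != 0) {
        out[n++] = lowest_pending_irq(pending);
        pending &= pending - 1;   /* clear the lowest set bit */
    }
    return n;
}
```

For a pending mask of 0x112 (bits 1, 4 and 8 set), drain_pending records the
order 1, 4, 8. Without a hardware ctz, each lookup becomes a multi-instruction
sequence or a library call on the interrupt path.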

Once the instruction is added, though, one would like to be able to use it as
fully as possible.

This, obviously, creates binaries that are specifically tuned toward a single
processor implementation, but that is pretty much the essence of deeply
embedded, where in the normal case the entire software stack from the OS to
application is linked together in a single binary, or at the very least
compiled together, often from a single source tree.

As far as object code compatibility is concerned, this is very much a
"programmer beware" situation. There is no need for heroics in terms of tagging
objects with the exact instruction set, for example.

* [Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
  2023-08-15  0:42 [Bug target/111020] New: RFE: RISC-V: ability to cherry-pick additional instructions hpa at zytor dot com
@ 2023-08-15  0:48 ` pinskia at gcc dot gnu.org
  2023-08-15  2:17 ` hpa at zytor dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-08-15  0:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This sounds more like something that should be designed at the ISA level, and
since RISC-V is an open source ISA, it should be discussed at that level ...

There are already extensions designed this way, e.g. Zmmul, which is a subset
of the M extension.
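
To make the Zmmul example concrete, here is a small sketch. It assumes a
toolchain invoked with something like -march=rv32i_zmmul; under that
assumption I would expect the multiply to become an inline mul and the divide
to become a call into the runtime library, but this has not been checked
against a particular GCC release. The function names are illustrative.

```c
#include <stdint.h>

/* Multiplication is covered by both M and Zmmul, so this can be a single
   mul instruction on a Zmmul-only target. */
int32_t scale(int32_t x, int32_t factor)
{
    return x * factor;
}

/* Division is part of M but not of Zmmul, so a Zmmul-only target has to
   fall back to a software division routine for this. */
int32_t average(int32_t sum, int32_t count)
{
    return sum / count;
}
```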

* [Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
  2023-08-15  0:42 [Bug target/111020] New: RFE: RISC-V: ability to cherry-pick additional instructions hpa at zytor dot com
  2023-08-15  0:48 ` [Bug target/111020] " pinskia at gcc dot gnu.org
@ 2023-08-15  2:17 ` hpa at zytor dot com
  2023-08-15  2:37 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: hpa at zytor dot com @ 2023-08-15  2:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #2 from H. Peter Anvin <hpa at zytor dot com> ---
Named subsets are, inherently, designed to make sense for mass-produced
products where the hardware and software are designed (mostly) independently.
However, what I mean by "very deeply embedded use" is hardware and software
being co-designed.

The RISC-V ISA policy is that those are considered vendor-specific subsets and
are to be given an X* name; however, gcc obviously needs to be able to
understand the meaning of this X* name. At this point there is no way to do
that without changing the source code in nontrivial ways.

Regardless of whether it is done in source code or at runtime, implementing a
fine-grained, preferably table-driven, approach to subsets in gcc would make it
very simple for a hardware implementor to define custom X-subsets without a
lot of surgery to the code. It would also make it possible to go one step
further and allow custom (or newly defined! There have already been multiple
instances of new subsets of existing instructions being defined a posteriori)
instruction subsets to be defined in a configuration file.
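
A rough sketch of what "table-driven" could mean here. This is not how gcc is
structured today; the subset names "xexamplectz" and "xexamplebits", the
instruction bits, and the helper functions are all hypothetical and exist only
to show the shape of the idea.

```c
#include <stdint.h>
#include <string.h>

/* One bit per instruction the compiler may emit (illustrative selection). */
enum insn_bit {
    INSN_ANDN = 1u << 0,
    INSN_CLZ  = 1u << 1,
    INSN_CTZ  = 1u << 2,
    INSN_CPOP = 1u << 3,
    INSN_REV8 = 1u << 4
};

struct subset_def {
    const char *name;   /* subset name as it would appear in -march */
    uint32_t    insns;  /* OR of insn_bit values the subset provides */
};

/* Hypothetical vendor subsets; adding one is a one-line table change. */
static const struct subset_def subset_table[] = {
    { "xexamplectz",  INSN_CTZ },
    { "xexamplebits", INSN_ANDN | INSN_CLZ | INSN_CTZ },
};

/* Look up a subset by name; unknown names provide no instructions. */
static uint32_t subset_insns(const char *name)
{
    for (size_t i = 0; i < sizeof subset_table / sizeof subset_table[0]; i++)
        if (strcmp(subset_table[i].name, name) == 0)
            return subset_table[i].insns;
    return 0;
}

/* Test whether a given instruction is in the enabled set. */
static int insn_enabled(uint32_t enabled, uint32_t insn)
{
    return (enabled & insn) != 0;
}
```

With such a table, the same data could be compiled in or loaded from a
configuration file, which is the distinction between "source code" and
"runtime" above.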

* [Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
  2023-08-15  0:42 [Bug target/111020] New: RFE: RISC-V: ability to cherry-pick additional instructions hpa at zytor dot com
  2023-08-15  0:48 ` [Bug target/111020] " pinskia at gcc dot gnu.org
  2023-08-15  2:17 ` hpa at zytor dot com
@ 2023-08-15  2:37 ` pinskia at gcc dot gnu.org
  2023-08-15  2:50 ` palmer at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-08-15  2:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to H. Peter Anvin from comment #2)

I 100% disagree here, because if you do this there would be a huge explosion
of what is and is not considered a subset. THIS is why it should be defined at
the ISA level instead. Why just ctz from Zbb? What next, just bseti or bexti
from Zbs?

Defining a specific set during your own development is really a different
matter from a production compiler. GCC should aim for production compiler
quality even for highly embedded targets.

* [Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
  2023-08-15  0:42 [Bug target/111020] New: RFE: RISC-V: ability to cherry-pick additional instructions hpa at zytor dot com
                   ` (2 preceding siblings ...)
  2023-08-15  2:37 ` pinskia at gcc dot gnu.org
@ 2023-08-15  2:50 ` palmer at gcc dot gnu.org
  2023-08-15  3:26 ` hpa at zytor dot com
  2023-08-15 14:22 ` amylaar at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: palmer at gcc dot gnu.org @ 2023-08-15  2:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

palmer at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |palmer at gcc dot gnu.org

--- Comment #4 from palmer at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #3)

IMO adding some config file for custom subsets is going to make more headaches
than it fixes.  For a while we had args like "-mno-div", but that's kind of
hacky and we eventually ended up with Zmmul to handle it -- having an external
config file controlling this would expose a lot of interface surface we don't
have a sane way to test.

If vendors want a custom subset then they can make one, it'll just be called
"X${vendor}${subset}".  We've already got a few forks/subsets floating around,
look at the T-Head and Ventana stuff.  For a few instructions it's pretty
mechanical, aside from fixing whatever fallout comes from splitting off the
subset.

We do currently require (IIRC we still haven't written this down) some amount of
public commitment to hardware availability to take that code, but if that's the
problem we should try and figure something out.  It's certainly a pain for
vendors to keep in-development trees around, but we're trading that off with
upstream pain -- I've found these sorts of subsets drift around until the HW
actually ships, so we don't want to end up stuck keeping around subsets that
didn't ship.

Vendors also have the option of just implementing all the instructions (via
some trap or microcode or whatever), thus turning this into a performance
problem.  That sort of just trades one problem for another, but we've got some
examples of this as well (SiFive traps on a bunch of stuff, for example).

* [Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
  2023-08-15  0:42 [Bug target/111020] New: RFE: RISC-V: ability to cherry-pick additional instructions hpa at zytor dot com
                   ` (3 preceding siblings ...)
  2023-08-15  2:50 ` palmer at gcc dot gnu.org
@ 2023-08-15  3:26 ` hpa at zytor dot com
  2023-08-15 14:22 ` amylaar at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: hpa at zytor dot com @ 2023-08-15  3:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #5 from H. Peter Anvin <hpa at zytor dot com> ---
I don't think source code modifications are a huge problem, but at this point
they require tracking down each individual bit.

As far as trapping implementations are concerned:

1. In deeply embedded implementations, it is entirely possible that
firmware/microcode might be *more* expensive than logic. Although memory arrays
are, of course, very dense, they are still extremely general and RISC-V isn't a
very sparse instruction set.

2. It seems like it almost would require an implementation-specific performance
model. Now, one can validly argue that by setting the cost of unimplemented
instructions to a (near-)infinite value such instructions should never be
generated even if they are "enabled". That might also be a possible avenue for
achieving this.
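
A toy model of that idea, for illustration only. It is not gcc's cost
machinery; the structure, the names, and the cost numbers are made up. The
point is simply that an "enabled but unimplemented" instruction loses every
comparison once its cost is set high enough.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical marker cost for an instruction the hardware lacks. */
#define COST_UNIMPLEMENTED INT_MAX

struct expansion {
    const char *desc;   /* how the operation would be emitted */
    int         cost;   /* implementation-specific cost, lower is better */
};

/* Candidate expansions for a population count on a core without cpop.
   The costs are invented for the example. */
static const struct expansion popcount_alts[] = {
    { "cpop instruction",       COST_UNIMPLEMENTED },
    { "shift-and-add sequence", 12 },
    { "library call",           20 },
};

/* Pick the cheapest implemented alternative, or NULL if there is none. */
static const struct expansion *cheapest(const struct expansion *alts, size_t n)
{
    const struct expansion *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (alts[i].cost == COST_UNIMPLEMENTED)
            continue;
        if (best == NULL || alts[i].cost < best->cost)
            best = &alts[i];
    }
    return best;
}
```

Here cheapest(popcount_alts, 3) selects the shift-and-add sequence, and cpop
is never chosen even though it is nominally part of the enabled extension.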

As far as an explosion of subsets is concerned: yes, this is really what this
means. Bloating a tiny on-chip control processor in both area and timing to
implement instructions that never actually appear in the code is at best
painful.

That being said, I do intend to submit a proposal to the RISC-V ISA folks to
subset the Zbb subset. It is worth noting that there are overlaps between the
Zb* and Zbk* subsets, but the individual intersection sets do not have their
own names.

The Zbb instruction set is particularly noxious (and this is indeed an ISA
definition problem), because it implements multiple things that are, from an
implementation point of view, completely separate and require separate code
paths in the ALU:

§ 1.2.1 Logical with negate
        - Minimal cost; in fact, in some implementations it might have zero
          or even negative cost due to decoder simplification.
        - Extremely common in embedded operations.

§ 1.2.2 Count leading/trailing zero bits
        - Requires dedicated logic.
        - ctz and clz have very different uses.
        - Typically clz and ctz will not be able to share logic, either,
          requiring *two* dedicated units.

§ 1.2.3 Count population
        - Requires dedicated logic.
        - May be useless depending on what the processor needs.

§ 1.2.4 Integer minimum/maximum
        - May be cheap or expensive, depending on whether an existing
          comparator can be leveraged.
        - Quite possibly free or almost free if the AMO instruction set is
          already supported in its entirety, as that requires max/min already.

§ 1.2.5 Sign- and zero-extension
§ 1.2.6 Bitwise rotation
        - May be very cheap or quite expensive, depending on the
          implementation of the shift instructions.

§ 1.2.7 OR combine
        - Requires dedicated logic.
        - Virtually useless in control processors that do not process text.

§ 1.2.8 Byte-reverse
        - Requires dedicated logic.
        - This, and some other instructions, are special cases of a bit swap
          extension that was proposed in the original bitmanip proposal but
          not included, even as a separate set.
        - Virtually useless in control processors that do not need to
          interface with cross-endian data.


These 8 groups really ought to be given separate names.

Is this going to happen again? Quite likely.

It seems, as you say, unlikely that the public ISA will be chopped into pieces
to support every single use case.

It really comes down to this: out of multiple suboptimal options (forced
hardware bloat, custom subsets, extremely fine-grained public subsets,
vendor-hacked trees that lag behind and/or diverge from upstream), which one
is the least bad?

* [Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
  2023-08-15  0:42 [Bug target/111020] New: RFE: RISC-V: ability to cherry-pick additional instructions hpa at zytor dot com
                   ` (4 preceding siblings ...)
  2023-08-15  3:26 ` hpa at zytor dot com
@ 2023-08-15 14:22 ` amylaar at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: amylaar at gcc dot gnu.org @ 2023-08-15 14:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

Jorn Wolfgang Rennecke <amylaar at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amylaar at gcc dot gnu.org

--- Comment #6 from Jorn Wolfgang Rennecke <amylaar at gcc dot gnu.org> ---
(In reply to H. Peter Anvin from comment #5)

> 2. It seems like it almost would require an implementation-specific
> performance model. Now, one can validly argue that by setting the cost of
> unimplemented instructions to a (near-)infinite value such instructions
> should never be generated even if they are "enabled". That might also be a
> possible avenue for achieving this.

Yes, that makes it possible to implement the interface without actually having
a dedicated mask table.  However, you still have the headache of how to get
code generation to use this effectively.  A lot of code generation strategies
are basically canned solutions that a skilled assembly programmer has devised;
you can theoretically use the superoptimizer to find linear sequences for
arbitrary instruction sets, but the compilation time cost and the limitation to
linear sequences make this impractical.

Therefore, as you want to co-develop architecture and software, you likely also
have to hack the compiler to make effective use of your architecture.

FWIW, 'infinite' cost seems unnecessarily high, considering you could make your
assembler replace missing instructions with function calls, and these functions
can get linked from a library.  So you have a finite per-call cost for the call
site in size (static instruction count) and time (dynamic instruction count),
and a one-time size cost per object for each function used.  Such a library and
assembler modification could be prepared for specific extensions that you want
to deconstruct, and then used flexibly.
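
As a sketch of what one such library routine might look like, here is a
software replacement for a 32-bit ctz. The name soft_ctz32 is hypothetical,
and how the assembler would redirect a missing instruction to it is left
open. The zero-input result of 32 follows the Zbb definition, where ctz of 0
yields XLEN.

```c
#include <stdint.h>

/* Software count-trailing-zeros by binary search over the low bits.
   Returns 32 for x == 0, matching ctz on RV32. */
unsigned soft_ctz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    if ((x & 0x0000FFFFu) == 0) { n += 16; x >>= 16; }
    if ((x & 0x000000FFu) == 0) { n += 8;  x >>= 8;  }
    if ((x & 0x0000000Fu) == 0) { n += 4;  x >>= 4;  }
    if ((x & 0x00000003u) == 0) { n += 2;  x >>= 2;  }
    if ((x & 0x00000001u) == 0) { n += 1; }
    return n;
}
```

The per-call cost is then the call itself plus at most five test-and-shift
steps, and the routine is only linked in once if anything uses it.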
