public inbox for gcc-bugs@sourceware.org
* [Bug target/102604] New: arm v7m_extra_costs for SFmode inaccurate?
@ 2021-10-05  9:27 clyon at gcc dot gnu.org
  2021-10-05 14:33 ` [Bug target/102604] " rearnsha at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: clyon at gcc dot gnu.org @ 2021-10-05  9:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102604

            Bug ID: 102604
           Summary: arm v7m_extra_costs for SFmode inaccurate?
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

I have noticed that gcc.target/arm/vfp-1.c fails when targeting cortex-m7
(-mthumb -mfloat-abi=hard -march=armv7e-m+fp.dp):

FAIL: gcc.target/arm/vfp-1.c scan-assembler vmla.f32
FAIL: gcc.target/arm/vfp-1.c scan-assembler vmls.f32
FAIL: gcc.target/arm/vfp-1.c scan-assembler vnmla.f32
FAIL: gcc.target/arm/vfp-1.c scan-assembler vnmls.f32

The DP (f64) versions PASS.
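
For reference, a minimal sketch of the kind of single-precision expressions
that the four failing scans correspond to. The function names are hypothetical
and the bodies are not copied from vfp-1.c; the mapping to instructions
reflects my reading of the instruction semantics and is what the compiler
would be expected to pick only when the cost model allows it.

```c
#include <assert.h>

/* Hypothetical helpers, one per instruction the test scans for.  */

/* Candidate for vmla.f32: acc + a * b.  */
float sf_mla (float acc, float a, float b) { return a * b + acc; }

/* Candidate for vmls.f32: acc - a * b.  */
float sf_mls (float acc, float a, float b) { return acc - a * b; }

/* Candidate for vnmla.f32: -(a * b) - acc.  */
float sf_nmla (float acc, float a, float b) { return -a * b - acc; }

/* Candidate for vnmls.f32: a * b - acc.  */
float sf_nmls (float acc, float a, float b) { return a * b - acc; }

/* Small integer inputs are exactly representable in float, so the results
   are exact whether or not multiply and add end up in one instruction.  */
void check_sf_examples (void)
{
  assert (sf_mla (1.0f, 2.0f, 3.0f) == 7.0f);
  assert (sf_mls (1.0f, 2.0f, 3.0f) == -5.0f);
  assert (sf_nmla (1.0f, 2.0f, 3.0f) == -7.0f);
  assert (sf_nmls (1.0f, 2.0f, 3.0f) == 5.0f);
}
```

Compiling such functions with the options above and inspecting the assembly
should show whether a separate vmul.f32 plus vadd.f32/vsub.f32 pair is emitted
instead of the combined instruction.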

This is because v7m_extra_costs for SP contains:
      COSTS_N_INSNS (2),        /* mult.  */
      COSTS_N_INSNS (5),        /* mult_addsub.  */
      COSTS_N_INSNS (1),        /* addsub.  */

So when combine tries to replace mult + addsub with a single mult_addsub, the
cost of the latter is still higher than that of the two separate instructions.
I suppose these values are the result of benchmarking, but it seems to mean
that vmla is never going to be selected by the compiler.
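
The arithmetic can be sketched as follows. This is only an illustration, not
the backend's actual code: the struct and function names are hypothetical, and
it assumes that each instruction costs a baseline of one insn plus its extra
cost, and that combine accepts a replacement only when it is not more
expensive than what it replaces. COSTS_N_INSNS uses the same scaling as the
macro in GCC's rtl.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Same scaling as GCC's COSTS_N_INSNS in rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical container for the three SFmode extra costs quoted above.  */
struct sf_extra_costs
{
  int mult;
  int mult_addsub;
  int addsub;
};

const struct sf_extra_costs v7m_sf = {
  COSTS_N_INSNS (2),        /* mult.  */
  COSTS_N_INSNS (5),        /* mult_addsub.  */
  COSTS_N_INSNS (1),        /* addsub.  */
};

/* Assumption: total cost = baseline of one insn + extra cost.  */
int insn_cost (int extra)
{
  assert (extra >= 0);
  return COSTS_N_INSNS (1) + extra;
}

/* Cost of a separate multiply followed by an add/sub.  */
int separate_cost (const struct sf_extra_costs *c)
{
  return insn_cost (c->mult) + insn_cost (c->addsub);
}

/* Cost of the single multiply-accumulate instruction.  */
int combined_cost (const struct sf_extra_costs *c)
{
  return insn_cost (c->mult_addsub);
}

/* Assumption: the replacement is taken only if it is not more expensive.  */
bool combine_would_merge (const struct sf_extra_costs *c)
{
  return combined_cost (c) <= separate_cost (c);
}
```

Under these assumptions the separate pair costs 12 + 8 = 20 and the single
mult_addsub costs 24, so the merge is rejected. Comparing the extra costs
alone gives the same outcome (5 insns against 2 + 1).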

* [Bug target/102604] arm v7m_extra_costs for SFmode inaccurate?
  2021-10-05  9:27 [Bug target/102604] New: arm v7m_extra_costs for SFmode inaccurate? clyon at gcc dot gnu.org
@ 2021-10-05 14:33 ` rearnsha at gcc dot gnu.org
  2021-10-05 16:14 ` clyon at gcc dot gnu.org
  2021-10-07 11:16 ` rearnsha at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-10-05 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102604

--- Comment #1 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
I wonder if it might be better to change this test to use -Os, since then the
cost model is much more consistent as it's based on size rather than speed.

* [Bug target/102604] arm v7m_extra_costs for SFmode inaccurate?
  2021-10-05  9:27 [Bug target/102604] New: arm v7m_extra_costs for SFmode inaccurate? clyon at gcc dot gnu.org
  2021-10-05 14:33 ` [Bug target/102604] " rearnsha at gcc dot gnu.org
@ 2021-10-05 16:14 ` clyon at gcc dot gnu.org
  2021-10-07 11:16 ` rearnsha at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: clyon at gcc dot gnu.org @ 2021-10-05 16:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102604

--- Comment #2 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Right, using -Os makes these tests pass (but vsqrt.f32 and vsqrt.f64 would
fail), but I'm still wondering about the purpose of vmla?

Rather than benchmarking, the costs may come from the Architecture
documentation? But then, if vmla is so costly, when is it supposed to be used?
Only when optimizing for size?

Note that the DP/f64 version does not have this problem.

* [Bug target/102604] arm v7m_extra_costs for SFmode inaccurate?
  2021-10-05  9:27 [Bug target/102604] New: arm v7m_extra_costs for SFmode inaccurate? clyon at gcc dot gnu.org
  2021-10-05 14:33 ` [Bug target/102604] " rearnsha at gcc dot gnu.org
  2021-10-05 16:14 ` clyon at gcc dot gnu.org
@ 2021-10-07 11:16 ` rearnsha at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-10-07 11:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102604

--- Comment #3 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Christophe Lyon from comment #2)
> Right, using -Os makes these tests pass (but vsqrt.f32 and vsqrt.f64 would
> fail),

Ah, because sqrt() is expected to set errno?  Would changing the code to
__builtin_sqrt() be enough?  This test shouldn't really be about other
optimizations.

> but I'm still wondering about the purpose of vmla?

> 
> Rather than benchmarking, the costs may come from the Architecture
> documentation? But then, if vmla is so costly, when is it supposed to be
> used? Only when optimizing for size?

It's part of the ISA, so has to be implemented.  As to why it's not preferred,
I can only speculate.

> 
> Note that the DP/f64 version does not have this problem.

The DP versions simply don't exist on most m-profile CPUs (they certainly
don't exist on the cortex-m4 and cortex-m7) and this is another case of a test
forcing instructions into a compilation that make no sense at all.
Consequently, there are no costs associated with these operations because they
simply are irrelevant when the instruction doesn't exist (the opcode will just
trap).
