public inbox for gcc@gcc.gnu.org
* [arm] GCC validation: preferred way of running the testsuite?
@ 2020-05-11 16:43 Christophe Lyon
  2020-05-19 11:28 ` Richard Earnshaw
  0 siblings, 1 reply; 4+ messages in thread
From: Christophe Lyon @ 2020-05-11 16:43 UTC (permalink / raw)
  To: gcc Mailing List

Hi,


As you may know, I've been running validations of GCC trunk in many
configurations for Arm and AArch64.


I was recently trying to clean up the new Bfloat16, MVE, CDE, and
ACLE tests, because in several configurations I see 300-400 FAILs
mainly in these areas, mostly caused by “testisms”. The goal is to avoid
wasting time over the same failure reports when checking what needs
fixing. I thought this would be quick & easy, but it is tedious
because of the numerous combinations of options and configurations
available on Arm.


Sorry for the very long email; it’s hard to describe and summarize,
but I'd like to try nonetheless, hoping that we can make testing
easier/more efficient :-). Most of the time the problems I found are
with the tests rather than real compiler bugs, so quite a bit of time
is wasted.


Here is a list of problems, starting with the tricky dependencies
around -mfloat-abi=XXX:

* Some targets do not support multilibs (e.g. arm-linux-gnueabi[hf] with
glibc), or one can decide not to build both hard and soft FP
multilibs. This generally becomes a problem when including stdint.h
(used by arm_neon.h, arm_acle.h, …), leading to a compiler error for
lack of gnu/stubs-*.h for the missing float-abi. If you add -mthumb to
the picture, it becomes quite complex (e.g. -mfloat-abi=hard is not
supported on Thumb-1).


Consider mytest.c, which does not include any header and contains:
/* { dg-options "-mfloat-abi=hard" } */
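For concreteness, a minimal sketch of such a test could look like this
(a hypothetical file; the body is just a placeholder, only the
float-abi option matters):

/* { dg-do compile } */
/* { dg-options "-mfloat-abi=hard" } */

/* No #include at all, so the gnu/stubs-*.h issue described above does
   not apply here; only the -mfloat-abi=hard option is exercised.  */
void
foo (void)
{
}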

If GCC is configured for arm-linux-gnueabi --with-cpu=cortex-a9 --with-fpu=neon,
the test PASSes with plain ‘make check’.
With ‘make check’ and --target-board=-march=armv5t/-mthumb, the test FAILs:
sorry, unimplemented: Thumb-1 hard-float VFP ABI


If I add
/* { dg-require-effective-target arm_hard_ok } */
then ‘make check’ with --target-board=-march=armv5t/-mthumb is now
UNSUPPORTED (which is OK), but plain ‘make check’ also becomes
UNSUPPORTED, because arm_hard_ok detects that we lack the
-mfloat-abi=hard multilib. So we lose a PASS.

If I configure GCC for arm-linux-gnueabihf, then:
- plain ‘make check’ PASSes
- ‘make check’ with --target-board=-march=armv5t/-mthumb FAILs
and after adding
/* { dg-require-effective-target arm_hard_ok } */
- ‘make check’ with --target-board=-march=armv5t/-mthumb is now UNSUPPORTED
- plain ‘make check’ still PASSes

So it seems the best option is to add
/* { dg-require-effective-target arm_hard_ok } */
although it makes the test UNSUPPORTED by arm-linux-gnueabi even in
cases where it could PASS.
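Concretely, the test would then look something like this (sketch):

/* { dg-do compile } */
/* { dg-require-effective-target arm_hard_ok } */
/* { dg-options "-mfloat-abi=hard" } */

void
foo (void)
{
}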

Is there consensus that this is the right way?



* In the GCC DejaGnu helpers, the queries for -mfloat-abi=hard and
-march=XXX are generally independent: if you query for
-mfloat-abi=hard support, the check is done without any
-march=XXX that the testcase may also be using. So, if GCC is
configured with its default cpu/fpu, -mfloat-abi=hard will be rejected
for lack of an FPU on the default CPU, but if GCC is configured with a
suitable cpu/fpu pair, -mfloat-abi=hard will be accepted.
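As an illustration (my paraphrase, not the actual Tcl helper): the
arm_hard_ok probe is roughly equivalent to compiling a snippet like the
one below with just -mfloat-abi=hard appended, independently of the
-march=XXX flags the testcase itself uses:

/* Rough sketch of what the arm_hard_ok probe compiles, with only
   "-mfloat-abi=hard" on the command line and none of the testcase's
   own -march=XXX flags.  Including a system header is what makes a
   missing multilib (missing gnu/stubs-*.h) visible.  */
#include <stdint.h>
int dummy;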

I faced this problem when I tried to “fix” the order in which we try
options in arm_v8_2a_bf16_neon_ok (see
https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544654.html).

I faced similar problems while working on a patch of mine for a bug
with IRQ handlers whose behaviour differs depending on the FP ABI
used: I have the feeling that I spend too much time writing the
tests, to the detriment of the patch itself...

I also noticed that Richard Sandiford probably faced similar issues
with his recent fix for "no_unique_address", where he finally added
arm_arch_v8a_hard_ok to check the armv8-a architecture + neon-fp-armv8 FPU +
float-abi=hard at the same time.
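So a test needing that exact combination would use something like the
sketch below (assuming the usual pairing of the _ok effective target
with a matching dg-add-options directive):

/* { dg-do compile } */
/* { dg-require-effective-target arm_arch_v8a_hard_ok } */
/* { dg-add-options arm_arch_v8a_hard } */

void
foo (void)
{
}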

Maybe we could decide on a consistent and simpler way of checking such things?


* A metric for this complexity could be the number of arm
effective-targets; a quick and not fully accurate grep | sed | sort |
uniq -c | sort -n on target-supports.exp ends with:
     9 mips
     16 aarch64
     21 powerpc
     97 vect
    106 arm
(this does not count the effective-targets generated by Tcl code,
e.g. arm_arch_FUNC_ok)

This probably explains why it’s hard to get test directives right :-)

I’ve not thought about how we could reduce that number….



* Finally, I’m wondering about the most appropriate way of configuring
GCC and running the tests.

So far, for most of the configurations I test, I use different
--with-cpu/--with-fpu/--with-mode configure flags for each toolchain
configuration, and rarely override the flags at testing time. I also
disable multilibs to save build time and (scratch) disk space. (See
https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/0latest/report-build-info.html
for the current list; each line corresponds to a clean build + make
check job -- so there are 15 different toolchain configs for
arm-linux-gnueabihf, for instance.)

However, I think this may not be appropriate, at least for the
arm-eabi toolchains, because I suspect the vendors who support several
SoCs generally ship one binary toolchain built with the default
cpu/fpu/mode and the appropriate multilibs (aprofile or rmprofile),
and the associated IDE adds the right -mcpu/-mfpu flags (see the
Arm Embedded toolchain, or ST CubeMX for stm32). So it seems to me that
the "appropriate" way of testing such a toolchain is to build it with
the default settings and appropriate multilibs, and to add the needed
-mcpu/-mfpu variants at 'make check' time.

I would still build one toolchain per configuration I want to test and
not use runtest’s capability to iterate over several combinations:
this way I can run the tests in parallel and reduce the total time
needed to get the results.

One can compare the results of both options with the two lines with
cortex-m33 in the above table (target arm-none-eabi).

In the first one, GCC is configured for cortex-m33, and the tests are
executed via plain ‘make check’: 401 failures in gcc (duration ~2h,
disk space 14GB).

In the second line, GCC is configured with the default cpu/fpu and
multilibs enabled, and I use test flags suitable for cortex-m33: now
only 73 failures for gcc (duration ~3h15, disk space 26GB). Note that
there are more failures for g++ and libstdc++ than on the previous
line; I haven’t fully checked why -- for libstdc++ there are spurious
-march=armv8-m.main+fp flags in the logs. So this is not a magic
bullet.


Unfortunately, this means every test with the arm_hard_ok effective
target would be UNSUPPORTED (no FPU on the default CPU) whatever the
validation cflags. The increased build time (many multilibs built for
nothing) will also reduce the validation bandwidth (and I hope the
increased scratch disk space will not be a problem with my IT…).



OTOH, I have a feeling that arm-linux-gnueabi* toolchain vendors
probably prefer to tune their toolchains for their preferred default
CPU. For instance, I have an Arm board running Ubuntu with gcc-5.4
configured --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard
--with-mode=thumb.

If this is right, it would mean I should keep the configurations I
currently use for arm-linux* (no multilib, rely on default cpu/fpu).

** Regarding the flags used for testing, I’m also wondering which is
most appropriate: -mcpu or -march. Both probably have pros and cons.

In https://gcc.gnu.org/pipermail/gcc/2019-September/230258.html, I
described a problem where it seems that one expects the tests to run
with -march=XXX.

Another log of mine has an effective-target helper compiled with:
-mthumb -mcpu=cortex-m33 -mfloat-abi=hard -mfloat-abi=softfp
-mfpu=auto -march=armv8.1-m.main+mve.fp -mthumb
which produces this warning:
cc1: warning: switch '-mcpu=cortex-m33' conflicts with
'-march=armv8.1-m.main' switch
which looks suspicious. Running the tests in multiple ways surely
helps uncover bugs…


In summary, I’d like to gather opinions on:
* appropriate usage of dg-require-effective-target arm_hard_ok
* how to improve float-abi support detection in combination with
the architecture level
* hopefully a consensus on how to configure the toolchain and
run the tests. I’m suggesting default config + multilibs +
runtest flags for arm-eabi, and a selection of default cpu/fpu
configurations + fewer runtest flags for arm-linux*.


Thanks for reading that far :-)


Christophe


* Re: [arm] GCC validation: preferred way of running the testsuite?
  2020-05-11 16:43 [arm] GCC validation: preferred way of running the testsuite? Christophe Lyon
@ 2020-05-19 11:28 ` Richard Earnshaw
  2020-05-26 17:04   ` Christophe Lyon
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Earnshaw @ 2020-05-19 11:28 UTC (permalink / raw)
  To: Christophe Lyon, gcc Mailing List

On 11/05/2020 17:43, Christophe Lyon via Gcc wrote:
> [...]

I've been pondering this for some time now (well before you sent your mail).

My feeling is that trying to control this via dejagnu options is just
getting too fiddly.  Perhaps a new approach is called for.

My thoughts are along the line of reworking the tests to use

  #pragma target <option>

etc (or the attribute equivalent), to set the compilation state to
something appropriate for the test so that the output is reasonable for
that and then we can stabilize the test.

It only works for assembly tests, not for anything that requires linking
or execution: but for those tests we shouldn't be looking for a specific
output but a specific behaviour and we can tolerate more variation in
the instructions that implement that behaviour (hybrid tests would need
splitting).

It's a fair amount of work, though, since many of the required options
cannot be controlled today via the attributes.  It's also not entirely
clear whether these should be exposed to users, since in most cases such
control is unlikely to be of use in real code.
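As a very rough sketch of the kind of test I have in mind (illustrative
only -- and note that float-abi is precisely one of the options the
pragma cannot set today, so this still assumes an FP-capable ABI from
the surrounding multilib):

/* { dg-do compile } */
/* { dg-final { scan-assembler "vadd\\.f32" } } */

/* Sketch: pin the FPU via the pragma instead of dg-options, so the
   test is less sensitive to the multilib/board flags in use
   (assumes the float-abi in effect is hard or softfp).  */
#pragma GCC target ("fpu=neon")

float
add_f32 (float a, float b)
{
  return a + b;
}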

R.


* Re: [arm] GCC validation: preferred way of running the testsuite?
  2020-05-19 11:28 ` Richard Earnshaw
@ 2020-05-26 17:04   ` Christophe Lyon
  2020-05-26 17:08     ` Richard Earnshaw
  0 siblings, 1 reply; 4+ messages in thread
From: Christophe Lyon @ 2020-05-26 17:04 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc Mailing List

On Tue, 19 May 2020 at 13:28, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
>
> On 11/05/2020 17:43, Christophe Lyon via Gcc wrote:
> > [...]
>

Thanks for your answer.


> I've been pondering this for some time now (well before you sent your mail).
>
> My feeling is that trying to control this via dejagnu options is just
> getting too fiddly.  Perhaps a new approach is called for.
>
> My thoughts are along the line of reworking the tests to use
>
>   #pragma target <option>
>
> etc (or the attribute equivalent), to set the compilation state to
> something appropriate for the test so that the output is reasonable for
> that and then we can stabilize the test.
>
> It only works for assembly tests, not for anything that requires linking
> or execution: but for those tests we shouldn't be looking for a specific
> output but a specific behaviour and we can tolerate more variation in
> the instructions that implement that behaviour (hybrid tests would need
> splitting).

I'm not sure I fully understand what you mean: if we add #pragma CPU XXX
to a test, for instance, and then run the tests with -mcpu=YYY,
the test will still be compiled for XXX, right?
How would we detect that the generated code is wrong when compiling for YYY?

>
> It's a fair amount of work, though, since many of the required options
> cannot be controlled today via the attributes.  It's also not entirely
Indeed!

Not to mention that we would also have to decorate the many existing tests.

> clear whether these should be exposed to users, since in most cases such
> control is unlikely to be of use in real code.
Probably indeed.

For the record, I've changed the way I run the validations for
arm-eabi as I described in my original email:
I now use the default cpu/fpu/mode at GCC configure time, enable the
relevant multilibs then override the compilation flags when running
the tests.

For instance: -mthumb/-mcpu=cortex-m33/-mfloat-abi=hard

The number of failures is now lower than it used to be when
configuring --with-cpu=cortex-m33.

Christophe


* Re: [arm] GCC validation: preferred way of running the testsuite?
  2020-05-26 17:04   ` Christophe Lyon
@ 2020-05-26 17:08     ` Richard Earnshaw
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Earnshaw @ 2020-05-26 17:08 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc Mailing List

On 26/05/2020 18:04, Christophe Lyon via Gcc wrote:
> On Tue, 19 May 2020 at 13:28, Richard Earnshaw
> <Richard.Earnshaw@foss.arm.com> wrote:
>>
>> On 11/05/2020 17:43, Christophe Lyon via Gcc wrote:
>>> [...]
>>
> 
> Thanks for your answer.
> 
> 
>> I've been pondering this for some time now (well before you sent your mail).
>>
>> My feeling is that trying to control this via dejagnu options is just
>> getting too fiddly.  Perhaps a new approach is called for.
>>
>> My thoughts are along the line of reworking the tests to use
>>
>>   #pragma target <option>
>>
>> etc (or the attribute equivalent), to set the compilation state to
>> something appropriate for the test so that the output is reasonable for
>> that and then we can stabilize the test.
>>
>> It only works for assembly tests, not for anything that requires linking
>> or execution: but for those tests we shouldn't be looking for a specific
>> output but a specific behaviour and we can tolerate more variation in
>> the instructions that implement that behaviour (hybrid tests would need
>> splitting).
> 
> I'm not sure I fully understand what you mean: if we add #pragma CPU XXX
> to a test, for instance, and then run the tests with -mcpu=YYY,
> the test will still be compiled for XXX, right?
> How would we detect that the generated code is wrong when compiling for YYY?
> 

That's a separate test.  You either accept what's on the command line
for the multilib, or you have a test that essentially ignores the
command-line options (but is a compile-to-asm only test).  You can't
have it both ways without the mess we have now.

>>
>> It's a fair amount of work, though, since many of the required options
>> cannot be controlled today via the attributes.  It's also not entirely
> Indeed!
> 
> Not to mention that we would also have to decorate the many existing tests.
> 
>> clear whether these should be exposed to users, since in most cases such
>> control is unlikely to be of use in real code.
> Probably indeed.
> 
> For the record, I've changed the way I run the validations for
> arm-eabi as I described in my original email:
> I now use the default cpu/fpu/mode at GCC configure time, enable the
> relevant multilibs then override the compilation flags when running
> the tests.
> 
> For instance: -mthumb/-mcpu=cortex-m33/-mfloat-abi=hard
> 
> The number of failures is now lower than it used to be when
> configuring --with-cpu=cortex-m33.
> 
> Christophe
> 

R.


end of thread

Thread overview: 4+ messages
2020-05-11 16:43 [arm] GCC validation: preferred way of running the testsuite? Christophe Lyon
2020-05-19 11:28 ` Richard Earnshaw
2020-05-26 17:04   ` Christophe Lyon
2020-05-26 17:08     ` Richard Earnshaw
