public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][ARM][0/4] Fixing PR target/65932
@ 2016-01-22  9:52 Kyrill Tkachov
  2016-02-01 11:11 ` Kyrill Tkachov
  2016-02-04  9:14 ` Ramana Radhakrishnan
  0 siblings, 2 replies; 6+ messages in thread
From: Kyrill Tkachov @ 2016-01-22  9:52 UTC (permalink / raw)
  To: GCC Patches; +Cc: Ramana Radhakrishnan, Richard Earnshaw, Jim Wilson

Hi all,

PR target/65932 is a wrong-code bug affecting arm and has manifested itself
when compiling the Linux kernel, so it's something that we really
ought to fix. The problem stems from the fact that PROMOTE_MODE and
TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
PROMOTE_MODE also marks the promotion as unsigned, whereas the
TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
being wrongly zero-extended instead of sign-extended.  This also occurs
in PR target/67714.

Jim Wilson tried a few approaches and from the discussion
on the PR and on the ML (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html)
the preferred approach is to make PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE
match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be an ABI
change so we don't want to do that.  Changing PROMOTE_MODE to not zero-extend
fixes both PR 65932 and 67714.  So Jim's patch is the first patch in this
series.

It has been posted at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
and this series is based on top of the arm.h hunk of that patch.

There have been some concerns about the codegen quality fallout from Jim's
patch, which is what the remaining patches in this series address:

* 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c.
wmul-1.c and wmul-2.c are cases where we no longer generate
sign-extend+multiply (and accumulate) instructions but instead generate
the normal full-width multiplies (the operands are sign-extended from
preceeding sign-extending loads).  This is a regression for some targets
on which the sign-extending form is faster. Patches 2 and 3 address this.
gcc.target/arm/wmul-3.c is a test where we actually end up generating
better code and so the testcase just needs to be adjusted.
Patch 4 deals with that.

* Sign-extending rather than zero-extending short values means we make more
use of the ldrsb and ldrsh arm instructions rather than the zero-extending
ldrb and ldrh.  On Thumb1 targets ldrsb and ldrsh have more limited addressing
modes (only REG + REG), which could hurt code size. However, the change also
means that we can now merge sequences of load-zero-extend followed by a sign-extend
into a single load-sign-extend.
So we'd turn a (ldrh ; sxth) into an (ldrsh).

I've built a few popular embedded benchmarks for a Cortex-M0 target (Thumb1) and looked
at the generated assembly for -Os and -O2. I did see both effects. That is,
ldrh instructions with an immediate being turned into two instructions:
     ldrh    r4, [r4, #12]
->
     movs    r5, #12
     ldrsh    r4, [r4, r5]

But I also observed the beneficial effect. Various register allocation
perturbations also featured in the changes
Overall, for every codebase I've looked at both -Os and -O2 the new code was
slightly smaller. Probably not enough to call this an outright win, but
definitely not a regression overall.

As for ARM and Thumb2 states, this series doesn't have a major impact.
SPEC2006 bencharks are slightly reduced in size (but nothing to write home
about) and there are no code quality regressions. And even the regressions
caught by the testsuite in the wmul-[12].c tests don't actually manifest
in practice in SPEC, but they are fixed anyway.

The series contains one target-independent change to CSE in patch 3 that
I'll explain there.

The series has been bootstrapped and tested on arm, aarch64 and x86_64.
Is this ok for trunk together with Jim's arm.h hunk from
g ?

P.S. This also fixes PR middlen-end/67295 which is a testsuite regression on arm.
Refer to the bugzilla entry there for the analysis of how this issue affects
that PR.

Thanks,
Kyrill

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][ARM][0/4] Fixing PR target/65932
  2016-01-22  9:52 [PATCH][ARM][0/4] Fixing PR target/65932 Kyrill Tkachov
@ 2016-02-01 11:11 ` Kyrill Tkachov
  2016-02-04  9:14 ` Ramana Radhakrishnan
  1 sibling, 0 replies; 6+ messages in thread
From: Kyrill Tkachov @ 2016-02-01 11:11 UTC (permalink / raw)
  To: GCC Patches; +Cc: Ramana Radhakrishnan, Richard Earnshaw, Jim Wilson

Ping on the series:
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01717.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01716.html (already approved by Richi)
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01715.html

as well as
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
as described in the cover letter

Thanks,
Kyrill

On 22/01/16 09:52, Kyrill Tkachov wrote:
> Hi all,
>
> PR target/65932 is a wrong-code bug affecting arm and has manifested itself
> when compiling the Linux kernel, so it's something that we really
> ought to fix. The problem stems from the fact that PROMOTE_MODE and
> TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
> PROMOTE_MODE also marks the promotion as unsigned, whereas the
> TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
> being wrongly zero-extended instead of sign-extended.  This also occurs
> in PR target/67714.
>
> Jim Wilson tried a few approaches and from the discussion
> on the PR and on the ML (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html)
> the preferred approach is to make PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE
> match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be an ABI
> change so we don't want to do that.  Changing PROMOTE_MODE to not zero-extend
> fixes both PR 65932 and 67714.  So Jim's patch is the first patch in this
> series.
>
> It has been posted at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
> and this series is based on top of the arm.h hunk of that patch.
>
> There have been some concerns about the codegen quality fallout from Jim's
> patch, which is what the remaining patches in this series address:
>
> * 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c.
> wmul-1.c and wmul-2.c are cases where we no longer generate
> sign-extend+multiply (and accumulate) instructions but instead generate
> the normal full-width multiplies (the operands are sign-extended from
> preceeding sign-extending loads).  This is a regression for some targets
> on which the sign-extending form is faster. Patches 2 and 3 address this.
> gcc.target/arm/wmul-3.c is a test where we actually end up generating
> better code and so the testcase just needs to be adjusted.
> Patch 4 deals with that.
>
> * Sign-extending rather than zero-extending short values means we make more
> use of the ldrsb and ldrsh arm instructions rather than the zero-extending
> ldrb and ldrh.  On Thumb1 targets ldrsb and ldrsh have more limited addressing
> modes (only REG + REG), which could hurt code size. However, the change also
> means that we can now merge sequences of load-zero-extend followed by a sign-extend
> into a single load-sign-extend.
> So we'd turn a (ldrh ; sxth) into an (ldrsh).
>
> I've built a few popular embedded benchmarks for a Cortex-M0 target (Thumb1) and looked
> at the generated assembly for -Os and -O2. I did see both effects. That is,
> ldrh instructions with an immediate being turned into two instructions:
>     ldrh    r4, [r4, #12]
> ->
>     movs    r5, #12
>     ldrsh    r4, [r4, r5]
>
> But I also observed the beneficial effect. Various register allocation
> perturbations also featured in the changes
> Overall, for every codebase I've looked at both -Os and -O2 the new code was
> slightly smaller. Probably not enough to call this an outright win, but
> definitely not a regression overall.
>
> As for ARM and Thumb2 states, this series doesn't have a major impact.
> SPEC2006 bencharks are slightly reduced in size (but nothing to write home
> about) and there are no code quality regressions. And even the regressions
> caught by the testsuite in the wmul-[12].c tests don't actually manifest
> in practice in SPEC, but they are fixed anyway.
>
> The series contains one target-independent change to CSE in patch 3 that
> I'll explain there.
>
> The series has been bootstrapped and tested on arm, aarch64 and x86_64.
> Is this ok for trunk together with Jim's arm.h hunk from
> g ?
>
> P.S. This also fixes PR middlen-end/67295 which is a testsuite regression on arm.
> Refer to the bugzilla entry there for the analysis of how this issue affects
> that PR.
>
> Thanks,
> Kyrill
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][ARM][0/4] Fixing PR target/65932
  2016-01-22  9:52 [PATCH][ARM][0/4] Fixing PR target/65932 Kyrill Tkachov
  2016-02-01 11:11 ` Kyrill Tkachov
@ 2016-02-04  9:14 ` Ramana Radhakrishnan
  2016-02-04  9:35   ` Kyrill Tkachov
  1 sibling, 1 reply; 6+ messages in thread
From: Ramana Radhakrishnan @ 2016-02-04  9:14 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: GCC Patches, Ramana Radhakrishnan, Richard Earnshaw, Jim Wilson

On Fri, Jan 22, 2016 at 9:52 AM, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi all,
>
> PR target/65932 is a wrong-code bug affecting arm and has manifested itself
> when compiling the Linux kernel, so it's something that we really
> ought to fix. The problem stems from the fact that PROMOTE_MODE and
> TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
> PROMOTE_MODE also marks the promotion as unsigned, whereas the
> TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
> being wrongly zero-extended instead of sign-extended.  This also occurs
> in PR target/67714.
>
> Jim Wilson tried a few approaches and from the discussion
> on the PR and on the ML
> (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html)
> the preferred approach is to make PROMOTE_MODE and
> TARGET_PROMOTE_FUNCTION_MODE
> match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be an
> ABI
> change so we don't want to do that.  Changing PROMOTE_MODE to not
> zero-extend
> fixes both PR 65932 and 67714.  So Jim's patch is the first patch in this
> series.
>
> It has been posted at
> https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
> and this series is based on top of the arm.h hunk of that patch.
>
> There have been some concerns about the codegen quality fallout from Jim's
> patch, which is what the remaining patches in this series address:
>
> * 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c.
> wmul-1.c and wmul-2.c are cases where we no longer generate
> sign-extend+multiply (and accumulate) instructions but instead generate
> the normal full-width multiplies (the operands are sign-extended from
> preceeding sign-extending loads).  This is a regression for some targets
> on which the sign-extending form is faster. Patches 2 and 3 address this.
> gcc.target/arm/wmul-3.c is a test where we actually end up generating
> better code and so the testcase just needs to be adjusted.
> Patch 4 deals with that.
>
> * Sign-extending rather than zero-extending short values means we make more
> use of the ldrsb and ldrsh arm instructions rather than the zero-extending
> ldrb and ldrh.  On Thumb1 targets ldrsb and ldrsh have more limited
> addressing
> modes (only REG + REG), which could hurt code size. However, the change also
> means that we can now merge sequences of load-zero-extend followed by a
> sign-extend
> into a single load-sign-extend.
> So we'd turn a (ldrh ; sxth) into an (ldrsh).
>
> I've built a few popular embedded benchmarks for a Cortex-M0 target (Thumb1)
> and looked
> at the generated assembly for -Os and -O2. I did see both effects. That is,
> ldrh instructions with an immediate being turned into two instructions:
>     ldrh    r4, [r4, #12]
> ->
>     movs    r5, #12
>     ldrsh    r4, [r4, r5]
>
> But I also observed the beneficial effect. Various register allocation
> perturbations also featured in the changes
> Overall, for every codebase I've looked at both -Os and -O2 the new code was
> slightly smaller. Probably not enough to call this an outright win, but
> definitely not a regression overall.
>
> As for ARM and Thumb2 states, this series doesn't have a major impact.
> SPEC2006 bencharks are slightly reduced in size (but nothing to write home
> about) and there are no code quality regressions. And even the regressions
> caught by the testsuite in the wmul-[12].c tests don't actually manifest
> in practice in SPEC, but they are fixed anyway.
>
> The series contains one target-independent change to CSE in patch 3 that
> I'll explain there.
>
> The series has been bootstrapped and tested on arm, aarch64 and x86_64.
> Is this ok for trunk together with Jim's arm.h hunk from
> g ?


The whole series is OK provided the middle-end hunk has been approved.

Thanks for working through this.


Ramana

>
> P.S. This also fixes PR middlen-end/67295 which is a testsuite regression on
> arm.
> Refer to the bugzilla entry there for the analysis of how this issue affects
> that PR.
>
> Thanks,
> Kyrill
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][ARM][0/4] Fixing PR target/65932
  2016-02-04  9:14 ` Ramana Radhakrishnan
@ 2016-02-04  9:35   ` Kyrill Tkachov
  2016-02-04 19:35     ` H.J. Lu
  0 siblings, 1 reply; 6+ messages in thread
From: Kyrill Tkachov @ 2016-02-04  9:35 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: GCC Patches, Ramana Radhakrishnan, Richard Earnshaw, Jim Wilson


On 04/02/16 09:13, Ramana Radhakrishnan wrote:
> On Fri, Jan 22, 2016 at 9:52 AM, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi all,
>>
>> PR target/65932 is a wrong-code bug affecting arm and has manifested itself
>> when compiling the Linux kernel, so it's something that we really
>> ought to fix. The problem stems from the fact that PROMOTE_MODE and
>> TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
>> PROMOTE_MODE also marks the promotion as unsigned, whereas the
>> TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
>> being wrongly zero-extended instead of sign-extended.  This also occurs
>> in PR target/67714.
>>
>> Jim Wilson tried a few approaches and from the discussion
>> on the PR and on the ML
>> (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html)
>> the preferred approach is to make PROMOTE_MODE and
>> TARGET_PROMOTE_FUNCTION_MODE
>> match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be an
>> ABI
>> change so we don't want to do that.  Changing PROMOTE_MODE to not
>> zero-extend
>> fixes both PR 65932 and 67714.  So Jim's patch is the first patch in this
>> series.
>>
>> It has been posted at
>> https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
>> and this series is based on top of the arm.h hunk of that patch.
>>
>> There have been some concerns about the codegen quality fallout from Jim's
>> patch, which is what the remaining patches in this series address:
>>
>> * 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c.
>> wmul-1.c and wmul-2.c are cases where we no longer generate
>> sign-extend+multiply (and accumulate) instructions but instead generate
>> the normal full-width multiplies (the operands are sign-extended from
>> preceeding sign-extending loads).  This is a regression for some targets
>> on which the sign-extending form is faster. Patches 2 and 3 address this.
>> gcc.target/arm/wmul-3.c is a test where we actually end up generating
>> better code and so the testcase just needs to be adjusted.
>> Patch 4 deals with that.
>>
>> * Sign-extending rather than zero-extending short values means we make more
>> use of the ldrsb and ldrsh arm instructions rather than the zero-extending
>> ldrb and ldrh.  On Thumb1 targets ldrsb and ldrsh have more limited
>> addressing
>> modes (only REG + REG), which could hurt code size. However, the change also
>> means that we can now merge sequences of load-zero-extend followed by a
>> sign-extend
>> into a single load-sign-extend.
>> So we'd turn a (ldrh ; sxth) into an (ldrsh).
>>
>> I've built a few popular embedded benchmarks for a Cortex-M0 target (Thumb1)
>> and looked
>> at the generated assembly for -Os and -O2. I did see both effects. That is,
>> ldrh instructions with an immediate being turned into two instructions:
>>      ldrh    r4, [r4, #12]
>> ->
>>      movs    r5, #12
>>      ldrsh    r4, [r4, r5]
>>
>> But I also observed the beneficial effect. Various register allocation
>> perturbations also featured in the changes
>> Overall, for every codebase I've looked at both -Os and -O2 the new code was
>> slightly smaller. Probably not enough to call this an outright win, but
>> definitely not a regression overall.
>>
>> As for ARM and Thumb2 states, this series doesn't have a major impact.
>> SPEC2006 bencharks are slightly reduced in size (but nothing to write home
>> about) and there are no code quality regressions. And even the regressions
>> caught by the testsuite in the wmul-[12].c tests don't actually manifest
>> in practice in SPEC, but they are fixed anyway.
>>
>> The series contains one target-independent change to CSE in patch 3 that
>> I'll explain there.
>>
>> The series has been bootstrapped and tested on arm, aarch64 and x86_64.
>> Is this ok for trunk together with Jim's arm.h hunk from
>> g ?
>
> The whole series is OK provided the middle-end hunk has been approved.

Thanks.
Richi approved the midend hunk at:
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01733.html

I'll commit the arm.h hunk from Jim's patch as well when committing the series
so that we can keep the commits close to each other.

In the meantime, this bug (the kernel miscompile) appears on the GCC 5 branch as well.
I've bootstrapped and tested the patches on that branch.
Would it be ok to backport them there after some time on trunk?

Thanks,
Kyrill

> Thanks for working through this.
>
>
> Ramana
>
>> P.S. This also fixes PR middlen-end/67295 which is a testsuite regression on
>> arm.
>> Refer to the bugzilla entry there for the analysis of how this issue affects
>> that PR.
>>
>> Thanks,
>> Kyrill
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][ARM][0/4] Fixing PR target/65932
  2016-02-04  9:35   ` Kyrill Tkachov
@ 2016-02-04 19:35     ` H.J. Lu
  2016-02-05  9:40       ` Kyrill Tkachov
  0 siblings, 1 reply; 6+ messages in thread
From: H.J. Lu @ 2016-02-04 19:35 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Ramana Radhakrishnan, GCC Patches, Ramana Radhakrishnan,
	Richard Earnshaw, Jim Wilson

On Thu, Feb 4, 2016 at 1:34 AM, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> On 04/02/16 09:13, Ramana Radhakrishnan wrote:
>>
>> On Fri, Jan 22, 2016 at 9:52 AM, Kyrill Tkachov
>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>
>>> Hi all,
>>>
>>> PR target/65932 is a wrong-code bug affecting arm and has manifested
>>> itself
>>> when compiling the Linux kernel, so it's something that we really
>>> ought to fix. The problem stems from the fact that PROMOTE_MODE and
>>> TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
>>> PROMOTE_MODE also marks the promotion as unsigned, whereas the
>>> TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
>>> being wrongly zero-extended instead of sign-extended.  This also occurs
>>> in PR target/67714.
>>>
>>> Jim Wilson tried a few approaches and from the discussion
>>> on the PR and on the ML
>>> (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html)
>>> the preferred approach is to make PROMOTE_MODE and
>>> TARGET_PROMOTE_FUNCTION_MODE
>>> match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be
>>> an
>>> ABI
>>> change so we don't want to do that.  Changing PROMOTE_MODE to not
>>> zero-extend
>>> fixes both PR 65932 and 67714.  So Jim's patch is the first patch in this
>>> series.
>>>
>>> It has been posted at
>>> https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
>>> and this series is based on top of the arm.h hunk of that patch.
>>>
>>> There have been some concerns about the codegen quality fallout from
>>> Jim's
>>> patch, which is what the remaining patches in this series address:
>>>
>>> * 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c.
>>> wmul-1.c and wmul-2.c are cases where we no longer generate
>>> sign-extend+multiply (and accumulate) instructions but instead generate
>>> the normal full-width multiplies (the operands are sign-extended from
>>> preceeding sign-extending loads).  This is a regression for some targets
>>> on which the sign-extending form is faster. Patches 2 and 3 address this.
>>> gcc.target/arm/wmul-3.c is a test where we actually end up generating
>>> better code and so the testcase just needs to be adjusted.
>>> Patch 4 deals with that.
>>>
>>> * Sign-extending rather than zero-extending short values means we make
>>> more
>>> use of the ldrsb and ldrsh arm instructions rather than the
>>> zero-extending
>>> ldrb and ldrh.  On Thumb1 targets ldrsb and ldrsh have more limited
>>> addressing
>>> modes (only REG + REG), which could hurt code size. However, the change
>>> also
>>> means that we can now merge sequences of load-zero-extend followed by a
>>> sign-extend
>>> into a single load-sign-extend.
>>> So we'd turn a (ldrh ; sxth) into an (ldrsh).
>>>
>>> I've built a few popular embedded benchmarks for a Cortex-M0 target
>>> (Thumb1)
>>> and looked
>>> at the generated assembly for -Os and -O2. I did see both effects. That
>>> is,
>>> ldrh instructions with an immediate being turned into two instructions:
>>>      ldrh    r4, [r4, #12]
>>> ->
>>>      movs    r5, #12
>>>      ldrsh    r4, [r4, r5]
>>>
>>> But I also observed the beneficial effect. Various register allocation
>>> perturbations also featured in the changes
>>> Overall, for every codebase I've looked at both -Os and -O2 the new code
>>> was
>>> slightly smaller. Probably not enough to call this an outright win, but
>>> definitely not a regression overall.
>>>
>>> As for ARM and Thumb2 states, this series doesn't have a major impact.
>>> SPEC2006 bencharks are slightly reduced in size (but nothing to write
>>> home
>>> about) and there are no code quality regressions. And even the
>>> regressions
>>> caught by the testsuite in the wmul-[12].c tests don't actually manifest
>>> in practice in SPEC, but they are fixed anyway.
>>>
>>> The series contains one target-independent change to CSE in patch 3 that
>>> I'll explain there.
>>>
>>> The series has been bootstrapped and tested on arm, aarch64 and x86_64.
>>> Is this ok for trunk together with Jim's arm.h hunk from
>>> g ?
>>
>>
>> The whole series is OK provided the middle-end hunk has been approved.
>
>
> Thanks.
> Richi approved the midend hunk at:
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01733.html

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69676


-- 
H.J.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH][ARM][0/4] Fixing PR target/65932
  2016-02-04 19:35     ` H.J. Lu
@ 2016-02-05  9:40       ` Kyrill Tkachov
  0 siblings, 0 replies; 6+ messages in thread
From: Kyrill Tkachov @ 2016-02-05  9:40 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Ramana Radhakrishnan, GCC Patches, Ramana Radhakrishnan,
	Richard Earnshaw, Jim Wilson


On 04/02/16 19:35, H.J. Lu wrote:
> On Thu, Feb 4, 2016 at 1:34 AM, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> On 04/02/16 09:13, Ramana Radhakrishnan wrote:
>>> On Fri, Jan 22, 2016 at 9:52 AM, Kyrill Tkachov
>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>> Hi all,
>>>>
>>>> PR target/65932 is a wrong-code bug affecting arm and has manifested
>>>> itself
>>>> when compiling the Linux kernel, so it's something that we really
>>>> ought to fix. The problem stems from the fact that PROMOTE_MODE and
>>>> TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
>>>> PROMOTE_MODE also marks the promotion as unsigned, whereas the
>>>> TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
>>>> being wrongly zero-extended instead of sign-extended.  This also occurs
>>>> in PR target/67714.
>>>>
>>>> Jim Wilson tried a few approaches and from the discussion
>>>> on the PR and on the ML
>>>> (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html)
>>>> the preferred approach is to make PROMOTE_MODE and
>>>> TARGET_PROMOTE_FUNCTION_MODE
>>>> match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be
>>>> an
>>>> ABI
>>>> change so we don't want to do that.  Changing PROMOTE_MODE to not
>>>> zero-extend
>>>> fixes both PR 65932 and 67714.  So Jim's patch is the first patch in this
>>>> series.
>>>>
>>>> It has been posted at
>>>> https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
>>>> and this series is based on top of the arm.h hunk of that patch.
>>>>
>>>> There have been some concerns about the codegen quality fallout from
>>>> Jim's
>>>> patch, which is what the remaining patches in this series address:
>>>>
>>>> * 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c.
>>>> wmul-1.c and wmul-2.c are cases where we no longer generate
>>>> sign-extend+multiply (and accumulate) instructions but instead generate
>>>> the normal full-width multiplies (the operands are sign-extended from
>>>> preceeding sign-extending loads).  This is a regression for some targets
>>>> on which the sign-extending form is faster. Patches 2 and 3 address this.
>>>> gcc.target/arm/wmul-3.c is a test where we actually end up generating
>>>> better code and so the testcase just needs to be adjusted.
>>>> Patch 4 deals with that.
>>>>
>>>> * Sign-extending rather than zero-extending short values means we make
>>>> more
>>>> use of the ldrsb and ldrsh arm instructions rather than the
>>>> zero-extending
>>>> ldrb and ldrh.  On Thumb1 targets ldrsb and ldrsh have more limited
>>>> addressing
>>>> modes (only REG + REG), which could hurt code size. However, the change
>>>> also
>>>> means that we can now merge sequences of load-zero-extend followed by a
>>>> sign-extend
>>>> into a single load-sign-extend.
>>>> So we'd turn a (ldrh ; sxth) into an (ldrsh).
>>>>
>>>> I've built a few popular embedded benchmarks for a Cortex-M0 target
>>>> (Thumb1)
>>>> and looked
>>>> at the generated assembly for -Os and -O2. I did see both effects. That
>>>> is,
>>>> ldrh instructions with an immediate being turned into two instructions:
>>>>       ldrh    r4, [r4, #12]
>>>> ->
>>>>       movs    r5, #12
>>>>       ldrsh    r4, [r4, r5]
>>>>
>>>> But I also observed the beneficial effect. Various register allocation
>>>> perturbations also featured in the changes
>>>> Overall, for every codebase I've looked at both -Os and -O2 the new code
>>>> was
>>>> slightly smaller. Probably not enough to call this an outright win, but
>>>> definitely not a regression overall.
>>>>
>>>> As for ARM and Thumb2 states, this series doesn't have a major impact.
>>>> SPEC2006 bencharks are slightly reduced in size (but nothing to write
>>>> home
>>>> about) and there are no code quality regressions. And even the
>>>> regressions
>>>> caught by the testsuite in the wmul-[12].c tests don't actually manifest
>>>> in practice in SPEC, but they are fixed anyway.
>>>>
>>>> The series contains one target-independent change to CSE in patch 3 that
>>>> I'll explain there.
>>>>
>>>> The series has been bootstrapped and tested on arm, aarch64 and x86_64.
>>>> Is this ok for trunk together with Jim's arm.h hunk from
>>>> g ?
>>>
>>> The whole series is OK provided the middle-end hunk has been approved.
>>
>> Thanks.
>> Richi approved the midend hunk at:
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01733.html
> This caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69676
>

Yeah, we already have https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671.
Seems to be something with the x86 patterns. Kirill has assigned it to himself.

Thanks,
Kyrill


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-02-05  9:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-22  9:52 [PATCH][ARM][0/4] Fixing PR target/65932 Kyrill Tkachov
2016-02-01 11:11 ` Kyrill Tkachov
2016-02-04  9:14 ` Ramana Radhakrishnan
2016-02-04  9:35   ` Kyrill Tkachov
2016-02-04 19:35     ` H.J. Lu
2016-02-05  9:40       ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).