public inbox for libc-ports@sourceware.org
 help / color / mirror / Atom feed
* [PATCH ppc] memset to 0 is broken on PPC405/440/464
@ 2012-09-21 17:34 Jason Gunthorpe
  2012-09-21 18:07 ` Ryan S. Arnold
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2012-09-21 17:34 UTC (permalink / raw)
  To: libc-ports

The use_dcbz path in the hand coded assembly is assuming a 128 byte
clear size for dcbz, but dcbz uses the cache line size and 405 cores
only have a 32 byte cache line. So any clears to 0 that use the dcbz
path fail to work.

This seems to be because the memset.S file was designed for the 476
CPU, which does have a 128 byte cache line size, however 405, 440, and
464 all use 32 bytes. (see arch/powerpc/kernel/cputable.c)

This patch corrects the 405 memset.s to have the correct 32 byte cache
size, but the 476 must continue to use the old version or it will
clear too much. Unfortunately I'm not entirely sure how to do that (eg
how does Implies work?)

See http://sourceware.org/bugzilla/show_bug.cgi?id=14595

Tested in-circuit on a 405GP CPU

--- eglibc-2_13.orig/ports/sysdeps/powerpc/powerpc32/405/memset.S	2012-09-18 23:06:38.743817536 -0600
+++ eglibc-2_13/ports/sysdeps/powerpc/powerpc32/405/memset.S	2012-09-18 23:09:48.677194920 -0600
@@ -105,7 +105,7 @@
        add     r3,r3,r7
 
 L(skip_string_loop):
-       clrlwi  r8,r6,25
+       clrlwi  r8,r6,27
        srwi.   r8,r8,4
        beq     L(dcbz_pre_loop)
        mtctr   r8
@@ -120,14 +120,14 @@
        bdnz    L(word_loop)
 
 L(dcbz_pre_loop):
-       srwi    r6,r5,7
+       srwi    r6,r5,5
        mtctr   r6
        addi    r7,0,0
 
 L(dcbz_loop):
        dcbz    r3,r7
-       addi    r3,r3,0x80
-       subi    r5,r5,0x80
+       addi    r3,r3,32
+       subi    r5,r5,32
        bdnz    L(dcbz_loop)
        srwi.   r6,r5,4
        beq     L(postword2_count_loop)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH ppc] memset to 0 is broken on PPC405/440/464
  2012-09-21 17:34 [PATCH ppc] memset to 0 is broken on PPC405/440/464 Jason Gunthorpe
@ 2012-09-21 18:07 ` Ryan S. Arnold
  2012-09-21 18:16   ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Ryan S. Arnold @ 2012-09-21 18:07 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: libc-ports

On Fri, Sep 21, 2012 at 12:34 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> The use_dcbz path in the hand coded assembly is assuming a 128 byte
> clear size for dcbz, but dcbz uses the cache line size and 405 cores
> only have a 32 byte cache line. So any clears to 0 that use the dcbz
> path fail to work.
>
> This seems to be because the memset.S file was designed for the 476
> CPU, which does have a 128 byte cache line size, however 405, 440, and
> 464 all use 32 bytes. (see arch/powerpc/kernel/cputable.c)
>
> This patch corrects the 405 memset.s to have the correct 32 byte cache
> size, but the 476 must continue to use the old version or it will
> clear too much. Unfortunately I'm not entirely sure how to do that (eg
> how does Implies work?)
>
> See http://sourceware.org/bugzilla/show_bug.cgi?id=14595
>
> Tested in-circuit on a 405GP CPU

Hi Jason,

Thanks for the patch.  The quick and dirty way to fix this for both
architectures is to copy the original file to
<...>/powerpc32/476/memset.S and then provide your version as you've
shown.  The result of this is that other 4** series processors
(besides 476) will end up using the 32-byte cache-line size code.

The method used to eliminate redundant code, is to do what is found in
the non-ports 'a2' version of memcpy.S
(sysdeps/powerpc/powerpc32/a2/memcpy.S).  It attempts to detect
whether the cacheline size has been set, if not (as in the case of the
code being used in the loader prior to reading the cacheline size from
the aux vector), then it uses a code path that doesn't use the
cache-block stream operations.  If it does detect the cacheline size,
it is queried, and then used for the cache-block stream operations in
an optimized path.

Ryan S. Arnold
Power Architecture maintainer

>
> --- eglibc-2_13.orig/ports/sysdeps/powerpc/powerpc32/405/memset.S       2012-09-18 23:06:38.743817536 -0600
> +++ eglibc-2_13/ports/sysdeps/powerpc/powerpc32/405/memset.S    2012-09-18 23:09:48.677194920 -0600
> @@ -105,7 +105,7 @@
>         add     r3,r3,r7
>
>  L(skip_string_loop):
> -       clrlwi  r8,r6,25
> +       clrlwi  r8,r6,27
>         srwi.   r8,r8,4
>         beq     L(dcbz_pre_loop)
>         mtctr   r8
> @@ -120,14 +120,14 @@
>         bdnz    L(word_loop)
>
>  L(dcbz_pre_loop):
> -       srwi    r6,r5,7
> +       srwi    r6,r5,5
>         mtctr   r6
>         addi    r7,0,0
>
>  L(dcbz_loop):
>         dcbz    r3,r7
> -       addi    r3,r3,0x80
> -       subi    r5,r5,0x80
> +       addi    r3,r3,32
> +       subi    r5,r5,32
>         bdnz    L(dcbz_loop)
>         srwi.   r6,r5,4
>         beq     L(postword2_count_loop)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH ppc] memset to 0 is broken on PPC405/440/464
  2012-09-21 18:07 ` Ryan S. Arnold
@ 2012-09-21 18:16   ` Jason Gunthorpe
  2012-09-21 18:43     ` Ryan S. Arnold
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2012-09-21 18:16 UTC (permalink / raw)
  To: Ryan S. Arnold; +Cc: libc-ports

On Fri, Sep 21, 2012 at 01:07:01PM -0500, Ryan S. Arnold wrote:

> Thanks for the patch.  The quick and dirty way to fix this for both
> architectures is to copy the original file to
> <...>/powerpc32/476/memset.S and then provide your version as you've
> shown.  The result of this is that other 4** series processors
> (besides 476) will end up using the 32-byte cache-line size code.

Do you want me to send you a diff like that?
 
> The method used to eliminate redundant code, is to do what is found in
> the non-ports 'a2' version of memcpy.S
> (sysdeps/powerpc/powerpc32/a2/memcpy.S).  It attempts to detect

Yes, I saw that, though I admit I'm unclear about the purpose of the
405 specific code when the generic code already seems to be fairly
optimized?

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH ppc] memset to 0 is broken on PPC405/440/464
  2012-09-21 18:16   ` Jason Gunthorpe
@ 2012-09-21 18:43     ` Ryan S. Arnold
  2012-09-21 20:28       ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Ryan S. Arnold @ 2012-09-21 18:43 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: libc-ports

On Fri, Sep 21, 2012 at 1:15 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Fri, Sep 21, 2012 at 01:07:01PM -0500, Ryan S. Arnold wrote:
>
>> Thanks for the patch.  The quick and dirty way to fix this for both
>> architectures is to copy the original file to
>> <...>/powerpc32/476/memset.S and then provide your version as you've
>> shown.  The result of this is that other 4** series processors
>> (besides 476) will end up using the 32-byte cache-line size code.
>
> Do you want me to send you a diff like that?

In general, any patch to fix this should make sure that functionality
doesn't regress for the platforms where it currently works.

If you have FSF copyright assignment on file, please do submit the full patch.

If you don't have FSF copyright assignment then I can do the copy to
ports/sysdeps/powerpc/powerpc32/476/ for you and apply your fix on top
of it.

>> The method used to eliminate redundant code, is to do what is found in
>> the non-ports 'a2' version of memcpy.S
>> (sysdeps/powerpc/powerpc32/a2/memcpy.S).  It attempts to detect
>
> Yes, I saw that, though I admit I'm unclear about the purpose of the
> 405 specific code when the generic code already seems to be fairly
> optimized?

The optimization was originally developed for the 476 processor and we
were asked to make it available to other 4xx processors, so we put it
in the 405 directory.   It wasn't obvious that there was an issue
until some time last year, but no-one had taken the time (until you)
to find the reason.  Having no earlier 4xx hardware, we weren't able
to test the performance differences of the code.

Ryan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH ppc] memset to 0 is broken on PPC405/440/464
  2012-09-21 18:43     ` Ryan S. Arnold
@ 2012-09-21 20:28       ` Jason Gunthorpe
  2012-09-21 21:44         ` Ryan S. Arnold
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2012-09-21 20:28 UTC (permalink / raw)
  To: Ryan S. Arnold; +Cc: libc-ports

On Fri, Sep 21, 2012 at 01:43:10PM -0500, Ryan S. Arnold wrote:

> In general, any patch to fix this should make sure that functionality
> doesn't regress for the platforms where it currently works.
> 
> If you have FSF copyright assignment on file, please do submit the full patch.
> 
> If you don't have FSF copyright assignment then I can do the copy to
> ports/sysdeps/powerpc/powerpc32/476/ for you and apply your fix on top
> of it.

This would be best then, I don't have a FSF copyright assignment.

Please do check that my changes are sane, I have tested them but my
PPC assembly isn't all that great these days ;)

Thanks!
Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH ppc] memset to 0 is broken on PPC405/440/464
  2012-09-21 20:28       ` Jason Gunthorpe
@ 2012-09-21 21:44         ` Ryan S. Arnold
  2012-09-24 20:57           ` Ryan S. Arnold
  0 siblings, 1 reply; 10+ messages in thread
From: Ryan S. Arnold @ 2012-09-21 21:44 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: libc-ports

On Fri, Sep 21, 2012 at 3:27 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Fri, Sep 21, 2012 at 01:43:10PM -0500, Ryan S. Arnold wrote:
>
>> In general, any patch to fix this should make sure that functionality
>> doesn't regress for the platforms where it currently works.
>>
>> If you have FSF copyright assignment on file, please do submit the full patch.
>>
>> If you don't have FSF copyright assignment then I can do the copy to
>> ports/sysdeps/powerpc/powerpc32/476/ for you and apply your fix on top
>> of it.
>
> This would be best then, I don't have a FSF copyright assignment.
>
> Please do check that my changes are sane, I have tested them but my
> PPC assembly isn't all that great these days ;)

I don't know this code very well, but I think your changes are
correct.  I'll review more in-depth on Monday and check-in a fix.

Ryan S. Arnold

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH ppc] memset to 0 is broken on PPC405/440/464
  2012-09-21 21:44         ` Ryan S. Arnold
@ 2012-09-24 20:57           ` Ryan S. Arnold
  2012-09-25 16:19             ` Ryan S. Arnold
  0 siblings, 1 reply; 10+ messages in thread
From: Ryan S. Arnold @ 2012-09-24 20:57 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: libc-ports

On Fri, Sep 21, 2012 at 4:44 PM, Ryan S. Arnold <ryan.arnold@gmail.com> wrote:
> On Fri, Sep 21, 2012 at 3:27 PM, Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com> wrote:
>> On Fri, Sep 21, 2012 at 01:43:10PM -0500, Ryan S. Arnold wrote:
>>
>>> In general, any patch to fix this should make sure that functionality
>>> doesn't regress for the platforms where it currently works.
>>>
>>> If you have FSF copyright assignment on file, please do submit the full patch.
>>>
>>> If you don't have FSF copyright assignment then I can do the copy to
>>> ports/sysdeps/powerpc/powerpc32/476/ for you and apply your fix on top
>>> of it.
>>
>> This would be best then, I don't have a FSF copyright assignment.
>>
>> Please do check that my changes are sane, I have tested them but my
>> PPC assembly isn't all that great these days ;)

I've checked your changes and I think they're correct.

I'll create a branch for you to test and then check it in once you've
verified that it builds/runs correctly.

Thanks,

Ryan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH ppc] memset to 0 is broken on PPC405/440/464
  2012-09-24 20:57           ` Ryan S. Arnold
@ 2012-09-25 16:19             ` Ryan S. Arnold
  2012-10-01  5:28               ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Ryan S. Arnold @ 2012-09-25 16:19 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: libc-ports

On Mon, Sep 24, 2012 at 3:57 PM, Ryan S. Arnold <ryan.arnold@gmail.com> wrote:
> I've checked your changes and I think they're correct.
>
> I'll create a branch for you to test and then check it in once you've
> verified that it builds/runs correctly.

Jason,

Please check out and test my rsa/405memset branch for correctness on 405.

git clone git:://sourceware.org/git/glibc.git
cd glibc
git checkout -b local_405memset origin/rsa/405memset

You'll, of course, have to configure glibc with the correct --with-cpu
setting for your 405 cpu.

Ryan S. Arnold

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH ppc] memset to 0 is broken on PPC405/440/464
  2012-09-25 16:19             ` Ryan S. Arnold
@ 2012-10-01  5:28               ` Jason Gunthorpe
  2012-10-01 17:07                 ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2012-10-01  5:28 UTC (permalink / raw)
  To: Ryan S. Arnold; +Cc: libc-ports

On Tue, Sep 25, 2012 at 11:19:24AM -0500, Ryan S. Arnold wrote:
> Please check out and test my rsa/405memset branch for correctness on 405.

I'm going to say it looks OK, provisionally. The memset seems to be
working, but 2.19 is not running correctly for me (we are using 2.13
presently). I'm not sure if I will have time to investigate this or
not.

Symptom: All exec's segv immediately with no corefile, running
ld.so.1 directly works, running apps through ld.so.1 works:

$ /lib/ld.so.1 /app/bin/strace /app/bin/strace
execve("/app/bin/strace", ["/app/bin/strace"], [/* 3 vars */]) = -1
ENOMEM (Cannot allocate memory)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++

Special :)

Nothing stood out to me in the kernel, or in the elf construction as
an obvious 'duh' that would trigger only when loading ld.so through
the interpreter path, I'll let you know if I decide to track it down.

Regards,
Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH ppc] memset to 0 is broken on PPC405/440/464
  2012-10-01  5:28               ` Jason Gunthorpe
@ 2012-10-01 17:07                 ` Jason Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2012-10-01 17:07 UTC (permalink / raw)
  To: Ryan S. Arnold; +Cc: libc-ports

On Sun, Sep 30, 2012 at 11:28:17PM -0600, Jason Gunthorpe wrote:
> On Tue, Sep 25, 2012 at 11:19:24AM -0500, Ryan S. Arnold wrote:
> > Please check out and test my rsa/405memset branch for correctness on 405.

> Symptom: All exec's segv immediately with no corefile, running
> ld.so.1 directly works, running apps through ld.so.1 works:

Never mind about this (my own fault), 2.19 works fine, memset is
working, I see no problems.

Thanks!
Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-10-01 17:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-21 17:34 [PATCH ppc] memset to 0 is broken on PPC405/440/464 Jason Gunthorpe
2012-09-21 18:07 ` Ryan S. Arnold
2012-09-21 18:16   ` Jason Gunthorpe
2012-09-21 18:43     ` Ryan S. Arnold
2012-09-21 20:28       ` Jason Gunthorpe
2012-09-21 21:44         ` Ryan S. Arnold
2012-09-24 20:57           ` Ryan S. Arnold
2012-09-25 16:19             ` Ryan S. Arnold
2012-10-01  5:28               ` Jason Gunthorpe
2012-10-01 17:07                 ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).