GCC4.3.4 downside against GCC3.4.4 on mips?

public inbox for gcc@gcc.gnu.org

* GCC4.3.4 downside against GCC3.4.4 on mips?
@ 2010-05-25  8:53 Amker.Cheng
  2010-05-25 13:02 ` Václav Haisman
  0 siblings, 1 reply; 6+ messages in thread
From: Amker.Cheng @ 2010-05-25  8:53 UTC (permalink / raw)
  To: gcc

Hi all,
  I compared the assembly generated for one function by GCC 4.3.4 and
GCC 3.4.4. The function does array computation and contains no branches
or loops. The command line is like "-march=mips32r2 -O3", and here are
the instruction statistics:

             insn : GCC4.3.4 : GCC3.4.4
            total :     1879 :     1534
            addiu :        6 :        6
             addu :      216 :      129
               jr :        1 :        1
              lui :        5 :        5
               lw :      396 :      353
             madd :       41 :        0
             mfhi :       80 :       80
             mflo :      121 :       86
             move :        0 :       21
             mtlo :       39 :        0
              mul :       85 :        0
             mult :       18 :       80
            multu :       64 :        0
               or :       80 :       80
              sll :       80 :       80
              sra :       79 :       47
              srl :       80 :       80
             subu :       80 :       80
               sw :      408 :      406

Since the function has no branches or loops, the GCC 3.4.4 output looks
much better: it contains far fewer instructions.

On the other hand, GCC 4.3.4 does use less stack space (1224 bytes
against 1408).

So, any comments? Thanks in advance.
-- 
Best Regards.


* Re: GCC4.3.4 downside against GCC3.4.4 on mips?
  2010-05-25  8:53 GCC4.3.4 downside against GCC3.4.4 on mips? Amker.Cheng
@ 2010-05-25 13:02 ` Václav Haisman
  2010-05-27 11:37   ` Amker.Cheng
  0 siblings, 1 reply; 6+ messages in thread
From: Václav Haisman @ 2010-05-25 13:02 UTC (permalink / raw)
  To: Amker.Cheng; +Cc: gcc


On Tue, 25 May 2010 16:28:37 +0800, "Amker.Cheng" wrote:
> Hi all,
>   I compared assembly files of a function compiled by GCC4.3.4 and
>   GCC3.4.4.
> The function focuses on array computation and has no branch, or any
> loop structure,
> The command line is like "-march=mips32r2 -O3", and here is the
> instruction statics:
> 
>             total    : 1879 : 1534
>              addiu  :    6   :    6
>              addu  :  216  :  129
>               jr       :    1   :    1
>              lui      :    5    :    5
>               lw     :  396  :  353
>             madd  :   41   :    0
>             mfhi    :   80   :   80
>             mflo    :  121  :   86
>             move  :    0    :   21
>             mtlo   :   39   :    0
>              mul   :   85   :    0
>             mult   :   18   :   80
>            multu  :   64   :    0
>               or    :   80   :   80
>              sll     :   80  :   80
>              sra   :   79   :   47
>              srl    :   80   :   80
>             subu  :   80   :   80
>               sw   :  408  :  406
> 
> Considering there is no any branch or loop structure ,It seems result
> of GCC3.4.4
> is much better, since generating much less instructions.
> 
> secondly, GCC4.3.4 does consume less stack slots(1224 bytes against 1408).
> 
> So, any comments? Thanks in advance.
Posting some random numbers without a test-case and precise command line
parameters for both compilers makes the numbers useless, IMHO. You also
only mention instruction counts. Have you actually benchmarked the
resulting code? CPUs are complicated and what you might perceive as worse
code might actually be superior thanks to scheduling and internal CPU
parallelism etc.

--
VH


* Re: GCC4.3.4 downside against GCC3.4.4 on mips?
  2010-05-25 13:02 ` Václav Haisman
@ 2010-05-27 11:37   ` Amker.Cheng
  2010-05-27 11:44     ` Paolo Bonzini
  0 siblings, 1 reply; 6+ messages in thread
From: Amker.Cheng @ 2010-05-27 11:37 UTC (permalink / raw)
  To: gcc; +Cc: Václav Haisman, Martin Guy, Andrew Haley

> Posting some random numbers without a test-case and precise command line
> parameters for both compilers makes the numbers useless, IMHO. You also
> only mention instruction counts. Have you actually benchmarked the
> resulting code? CPUs are complicated and what you might perceive as worse
> code might actually be superior thanks to scheduling and internal CPU
> parallelism etc.

Thanks for the reminder.
After some investigation, I can reproduce the issue with the following
piece of code:
-------------------------------------begin here-------------------
extern int *p[5];

# define REAL_RADIX_2            24
# define REAL_MUL_2(x, y)        (((long long)(x) * (long long)(y)) >> REAL_RADIX_2)


void func(int *b1, int *b2)
{
  int c0 = p[3][0];
  int c1 = p[3][1];

  b2[0x18] = b1[0x18] + b1[0x1B];
  b2[0x1B] = REAL_MUL_2((b1[0x18] - b1[0x1B]) , c0);

  b2[0x19] = b1[0x19] + b1[0x1A];
  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);

  b2[0x1C] = b1[0x1C] + b1[0x1F];
  b2[0x1F] = REAL_MUL_2((b1[0x1F] - b1[0x1C]) , c0);

  b2[0x1D] = b1[0x1D] + b1[0x1E];
  b2[0x1E] = REAL_MUL_2((b1[0x1E] - b1[0x1D]) , c1);
}
-------------------------------------cut here-------------------

GCC 4.3.4 seems to always expand the long long multiplication into
three 32-bit multiply instructions (mult, madd and multu), for example:
-------------------------------------begin here-------------------
#  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);

	lw	$6,104($4)
	lw	$2,100($4)
	subu	$2,$2,$6
	mult	$11,$2
	sra	$6,$2,31
	madd	$6,$9
	mflo	$6
	multu	$2,$9
	mfhi	$3
	addu	$3,$6,$3
	sll	$6,$3,8
	mflo	$2
	srl	$7,$2,24
	or	$7,$6,$7
	sw	$7,104($5)
-------------------------------------cut here-------------------

while GCC 3.4.4 treats the long long multiplication like a plain 32-bit
one and generates only a single mult insn for each statement, for example:
-------------------------------------begin here-------------------
#  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);

	lw	$2,100($4)
	lw	$7,104($4)
	subu	$3,$2,$7
	mult	$3,$9
	mflo	$6
	mfhi	$25
	srl	$15,$6,24
	sll	$24,$25,8
	or	$14,$15,$24
	sw	$14,104($5)
-------------------------------------cut here-------------------
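
For readers who do not read MIPS assembly, the single-mult sequence above can be modelled in C roughly as follows. This is only a sketch: the function names real_mul_2_ref and real_mul_2_model are made up for illustration, and it assumes GCC's usual behaviour for right shifts of negative values and for narrowing conversions to int.

```c
#include <stdint.h>

#define REAL_RADIX_2 24
#define REAL_MUL_2(x, y) (((long long)(x) * (long long)(y)) >> REAL_RADIX_2)

/* Reference: the macro from the test case, truncated to int on store.
   (Hypothetical helper name, not part of the original code.) */
int32_t real_mul_2_ref(int32_t x, int32_t y)
{
    return (int32_t)REAL_MUL_2(x, y);
}

/* Hypothetical model of the single-mult instruction sequence:
   one signed 32x32 -> 64 multiply, then bits 24..55 of the product
   are assembled from the HI and LO halves. */
int32_t real_mul_2_model(int32_t x, int32_t y)
{
    int64_t  prod = (int64_t)x * (int64_t)y;          /* mult         */
    uint32_t lo   = (uint32_t)prod;                   /* mflo         */
    uint32_t hi   = (uint32_t)((uint64_t)prod >> 32); /* mfhi         */
    uint32_t bits = (lo >> 24) | (hi << 8);           /* srl, sll, or */
    return (int32_t)bits;                             /* value stored by sw */
}
```

If the values are read as fixed-point numbers with 24 fractional bits (an interpretation suggested by REAL_RADIX_2, not stated in the code), then real_mul_2_model(3 << 23, 2 << 24) corresponds to 1.5 * 2.0 and yields 3 << 24, i.e. 3.0.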

In my understanding, it is not necessary to use three mult insns to
implement this long long multiplication, since both operands are
converted from int.
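
The reasoning can be checked with a small C sketch. The function names below are hypothetical, and the three-multiply version is only my reading of the general 64x64 expansion, not code taken from GCC. When both 64-bit operands are sign-extended ints, the generic expansion and a single widening multiply give the same result:

```c
#include <stdint.h>

/* Generic 64x64 -> 64 multiply built from 32-bit pieces: one unsigned
   widening multiply of the low words (multu) plus two cross products
   (mult + madd) that only contribute to the high word. */
int64_t mul64_three(int64_t a, int64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)((uint64_t)a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)((uint64_t)b >> 32);

    uint64_t low   = (uint64_t)a_lo * b_lo;
    uint32_t cross = (uint32_t)((uint64_t)a_hi * b_lo + (uint64_t)a_lo * b_hi);

    /* Narrowing back to a signed type relies on GCC's modular conversion. */
    return (int64_t)(low + ((uint64_t)cross << 32));
}

/* Single signed 32x32 -> 64 widening multiply (one mult, then mfhi/mflo). */
int64_t mul64_one(int32_t x, int32_t y)
{
    return (int64_t)x * (int64_t)y;
}

/* Counts sample pairs for which the two versions disagree. */
int widening_mismatches(void)
{
    static const int32_t samples[] = {
        0, 1, -1, 2, -2, 12345, -98765, INT32_MAX, INT32_MIN
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    int mismatches = 0;

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (mul64_three(samples[i], samples[j]) !=
                mul64_one(samples[i], samples[j]))
                mismatches++;

    return mismatches;
}
```

widening_mismatches() should return 0, which is why the two extra multiplies look redundant whenever the compiler can see that both operands were widened from int.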

As before, the compile options are like "-march=mips32r2 -O3".

Thanks.

-- 
Best Regards.


* Re: GCC4.3.4 downside against GCC3.4.4 on mips?
  2010-05-27 11:37   ` Amker.Cheng
@ 2010-05-27 11:44     ` Paolo Bonzini
  2010-05-27 14:37       ` Richard Guenther
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2010-05-27 11:44 UTC (permalink / raw)
  To: Amker.Cheng; +Cc: gcc, Václav Haisman, Martin Guy, Andrew Haley

On 05/27/2010 12:33 PM, Amker.Cheng wrote:
> while GCC3.4.4 treats the long long multiplication just like simple
> ones, which generates only one
> mult insn for each statement, like
>
> In my understanding, It's not necessary using three mult insn to implement
> long long mult, since the operands are converted from int type.

This is more helpful.  It is a known case in which GCC 4.x generates 
worse code.

Paolo


* Re: GCC4.3.4 downside against GCC3.4.4 on mips?
  2010-05-27 11:44     ` Paolo Bonzini
@ 2010-05-27 14:37       ` Richard Guenther
  2010-07-12  1:32         ` Amker.Cheng
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Guenther @ 2010-05-27 14:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Amker.Cheng, gcc, Václav Haisman, Martin Guy, Andrew Haley

On Thu, May 27, 2010 at 1:37 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 05/27/2010 12:33 PM, Amker.Cheng wrote:
>>
>> while GCC3.4.4 treats the long long multiplication just like simple
>> ones, which generates only one
>> mult insn for each statement, like
>>
>> In my understanding, It's not necessary using three mult insn to implement
>> long long mult, since the operands are converted from int type.
>
> This is more helpful.  It is a known case in which GCC 4.x generates worse
> code.

Should be fixed with 4.6.

Richard.

> Paolo
>


* Re: GCC4.3.4 downside against GCC3.4.4 on mips?
  2010-05-27 14:37       ` Richard Guenther
@ 2010-07-12  1:32         ` Amker.Cheng
  0 siblings, 0 replies; 6+ messages in thread
From: Amker.Cheng @ 2010-07-12  1:32 UTC (permalink / raw)
  To: gcc; +Cc: Richard Guenther

>>>
>>> while GCC3.4.4 treats the long long multiplication just like simple
>>> ones, which generates only one
>>> mult insn for each statement, like
>>>
>>> In my understanding, It's not necessary using three mult insn to implement
>>> long long mult, since the operands are converted from int type.
>>
>> This is more helpful.  It is a known case in which GCC 4.x generates worse
>> code.
>
> Should be fixed with 4.6.

Hi, I tested this problem on a GCC 4.6 snapshot, and it is fixed there.
But I could not find the specific patch or a record in the bug list.
Could you help? Thanks very much.

-- 
Best Regards.
