public inbox for gcc@gcc.gnu.org
* Optimizations on long long multiply/divide on PowerPC32 don't work
@ 2001-12-07 16:08 Corey Minyard
  2001-12-07 17:08 ` Richard Henderson
  2001-12-10  9:09 ` Optimizations on long long multiply/divide on PowerPC32 don't work Christoph Hellwig
  0 siblings, 2 replies; 27+ messages in thread
From: Corey Minyard @ 2001-12-07 16:08 UTC (permalink / raw)
  To: gcc

While compiling the Linux kernel on PowerPC (32-bit) with the CVS tree,
I've run into yet another problem: the Linux kernel expects division of
a long long by a constant value to be optimized so that it does not call
__divdi3, but GCC is not doing this.  The following program:

  long long t1(long long v)
  {
      return v / 512;
  }

will call __divdi3, even though the operation could be done much faster 
with shifts (and other stuff for handling negatives).  It turns out that 
the code to call __divdi3 is being emitted on the conversion to rtl, and 
the optimizations do not see this function call and handle it as a 
division, they just see it as a function call.  So no optimizations at 
all will be done on 64-bit multiplies and divides.  I consider this to 
be sub-optimal :-).

I can see four options to solve this problem:

  1) Add __divdi3 to the linux kernel.  I don't really think this is a 
good idea, and it shouldn't be required.
  2) Move the conversion of the division to the function call to the 
very last stages of the compiler.  IMHO, this is probably the best 
option, but it's a big job to implement, I think.
  3) Make the optimizations understand the function calls.  I don't even 
want to think about this one.
  4) Modify the tree conversions to do the optimizations there.  I have 
a patch that does this (and passes all regressions), because it was 
easy, but I consider it less optimal than option 2.

Any opinions on this?  I'll post my patch if the maintainers think 
option 4 is reasonable.

Thanks,

-Corey

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-07 16:08 Optimizations on long long multiply/divide on PowerPC32 don't work Corey Minyard
@ 2001-12-07 17:08 ` Richard Henderson
  2001-12-07 17:40   ` Corey Minyard
  2001-12-09 23:29   ` Linus Torvalds
  2001-12-10  9:09 ` Optimizations on long long multiply/divide on PowerPC32 don't work Christoph Hellwig
  1 sibling, 2 replies; 27+ messages in thread
From: Richard Henderson @ 2001-12-07 17:08 UTC (permalink / raw)
  To: Corey Minyard; +Cc: gcc

On Fri, Dec 07, 2001 at 06:09:55PM -0600, Corey Minyard wrote:
>   1) Add __divdi3 to the linux kernel.  I don't really think this is a 
> good idea, and it shouldn't be required.

Yes it should.  I've long considered it a bug that Linux
didn't link against libgcc.


r~


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-07 17:08 ` Richard Henderson
@ 2001-12-07 17:40   ` Corey Minyard
  2001-12-09 23:29   ` Linus Torvalds
  1 sibling, 0 replies; 27+ messages in thread
From: Corey Minyard @ 2001-12-07 17:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc

Richard Henderson wrote:

>On Fri, Dec 07, 2001 at 06:09:55PM -0600, Corey Minyard wrote:
>
>>  1) Add __divdi3 to the linux kernel.  I don't really think this is a 
>>good idea, and it shouldn't be required.
>>
>
>Yes it should.  I've long considered it a bug that Linux
>didn't link against libgcc.
>
>
>r~
>
Although I can't argue with that (because I don't know the issues), I 
still think this is an optimization that should be done.

-Corey


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-07 17:08 ` Richard Henderson
  2001-12-07 17:40   ` Corey Minyard
@ 2001-12-09 23:29   ` Linus Torvalds
  2001-12-10  1:15     ` Richard Henderson
  2001-12-10  9:09     ` Paul Koning
  1 sibling, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2001-12-09 23:29 UTC (permalink / raw)
  To: rth, gcc

In article <20011207164904.C16375@redhat.com> you write:
>On Fri, Dec 07, 2001 at 06:09:55PM -0600, Corey Minyard wrote:
>>   1) Add __divdi3 to the linux kernel.  I don't really think this is a 
>> good idea, and it shouldn't be required.
>
>Yes it should.  I've long considered it a bug that Linux
>didn't link against libgcc.

I'm sorry you feel that way, but what the gcc team has done to libgcc
has only made me _more_ convinced that not linking against that
steenking heap of *** is a really good idea.. 

Not linking against libgcc has found several problems in gcc.  Ranging
from missing totally obvious optimizations (and yes, I consider that a
_bug_, even though I know that some gcc people think that performance is
secondary), to horrible mis-features with exception handling. 

For example, it was the absence of libgcc that made us aware that the
kernel had to use magic (and largely undocumented) gcc command line
flags to make sure that gcc didn't try to insert its totally broken
exception handling code. 

Similarly, it is the lack of libgcc that was really helpful in pointing
out code where some people started using 64-bit arithmetic without
realizing just how slow it would be.  In fact, it was this very missing
__divdi3 thing that showed that you shouldn't try to do 64-bit divides,
when you can, by thinking about the problem for five minutes, do it
about a hundred times faster by just keeping a 32-bit index and offset
pair. 

Think about it this way: "libgcc" is the place where code goes to die. 
It is, by _design_, the place where gcc developers put the code that is
so cr*p that it cannot really be inlined.  Avoiding it is a _good_
thing, as it forces the kernel to step carefully around issues where gcc
has problems. 

The fact that gcc cannot do a 64-bit divide by a constant is a gcc
deficiency.  But it is NOT cause for including crap in the kernel. 

			Linus


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-09 23:29   ` Linus Torvalds
@ 2001-12-10  1:15     ` Richard Henderson
  2001-12-10  1:24       ` Richard Henderson
  2001-12-10  9:15       ` Linus Torvalds
  2001-12-10  9:09     ` Paul Koning
  1 sibling, 2 replies; 27+ messages in thread
From: Richard Henderson @ 2001-12-10  1:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: gcc

On Sun, Dec 09, 2001 at 10:58:14PM -0800, Linus Torvalds wrote:
> I'm sorry you feel that way, but what the gcc team has done to libgcc
> has only made me _more_ convinced that not linking against that
> steenking heap of *** is a really good idea.. 

Ok, I'll bite -- what have we done to libgcc?

> The fact that gcc cannot do a 64-bit divide by a constant is a gcc
> deficiency.  But it is NOT cause for including crap in the kernel. 

Hogwash.  Look at the example again.  Signed division is not merely
a shift.  It's more like

	long long t1(long long v)
	{
	  return (v + ((v < 0) ? 0x1ffLL : 0LL)) >> 9;
	}

GCC knows how to do this.  It will do this if the target supports
all of these operations on 64 bit data types and it considers them
to be cheap enough.
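
To make the rounding difference concrete, here is a minimal sketch
comparing the three forms.  The function names are made up for
illustration, and the shift versions assume that >> on a negative
signed value is an arithmetic shift (implementation-defined in ISO C,
but what GCC does).

```c
#include <stdint.h>

/* Reference: C division truncates toward zero. */
int64_t div512_ref(int64_t v)
{
    return v / 512;
}

/* Bias-and-shift: add 511 to negative inputs first, so that the
   arithmetic shift (which rounds toward negative infinity) ends up
   rounding toward zero, matching div512_ref().
   Assumes >> on negative values is an arithmetic shift. */
int64_t div512_bias_shift(int64_t v)
{
    return (v + ((v < 0) ? 0x1ffLL : 0LL)) >> 9;
}

/* Plain shift: differs from division for negative inputs that are
   not exact multiples of 512. */
int64_t div512_plain_shift(int64_t v)
{
    return v >> 9;
}
```

For v = -1 the division and the biased shift both give 0, while the
plain shift gives -1.  For non-negative v all three agree.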


r~


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-10  1:15     ` Richard Henderson
@ 2001-12-10  1:24       ` Richard Henderson
  2001-12-10  2:40         ` Franz Sirl
  2002-01-22  8:51         ` Franz Sirl
  2001-12-10  9:15       ` Linus Torvalds
  1 sibling, 2 replies; 27+ messages in thread
From: Richard Henderson @ 2001-12-10  1:24 UTC (permalink / raw)
  To: Linus Torvalds, gcc

On Mon, Dec 10, 2001 at 12:54:44AM -0800, Richard Henderson wrote:
> GCC knows how to do this.  It will do this if the target supports
> all of these operations on 64 bit data types and it considers them
> to be cheap enough.

How irritating.  I do not like the taste of crow.

While the above is true, powerpc apparently has a rather cheap 32-bit
divide instruction which throws off the heuristics in this case.


r~



Index: expmed.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expmed.c,v
retrieving revision 1.97
diff -c -p -d -r1.97 expmed.c
*** expmed.c	2001/11/20 04:12:11	1.97
--- expmed.c	2001/12/10 09:07:20
*************** expand_divmod (rem_flag, code, mode, op0
*** 3271,3277 ****
  		      goto fail1;
  		  }
  		else if (EXACT_POWER_OF_2_OR_ZERO_P (d)
! 			 && (rem_flag ? smod_pow2_cheap : sdiv_pow2_cheap))
  		  ;
  		else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d))
  		  {
--- 3271,3286 ----
  		      goto fail1;
  		  }
  		else if (EXACT_POWER_OF_2_OR_ZERO_P (d)
! 			 && (rem_flag ? smod_pow2_cheap : sdiv_pow2_cheap)
! 			 /* ??? The cheap metric is computed only for
! 			    word_mode.  If this operation is wider, this may
! 			    not be so.  Assume true if the optab has an
! 			    expander for this mode.  */
! 			 && (((rem_flag ? smod_optab : sdiv_optab)
! 			      ->handlers[(int) compute_mode].insn_code
! 			      != CODE_FOR_nothing)
! 			     || (sdivmod_optab->handlers[(int) compute_mode]
! 				 .insn_code != CODE_FOR_nothing)))
  		  ;
  		else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d))
  		  {


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-10  1:24       ` Richard Henderson
@ 2001-12-10  2:40         ` Franz Sirl
  2001-12-10  9:21           ` Linus Torvalds
  2002-01-22  8:51         ` Franz Sirl
  1 sibling, 1 reply; 27+ messages in thread
From: Franz Sirl @ 2001-12-10  2:40 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Linus Torvalds, gcc

At 10:14 10.12.2001, Richard Henderson wrote:
>On Mon, Dec 10, 2001 at 12:54:44AM -0800, Richard Henderson wrote:
> > GCC knows how to do this.  It will do this if the target supports
> > all of these operations on 64 bit data types and it considers them
> > to be cheap enough.
>
>How irritating.  I do not like the taste of crow.
>
>While the above is true, powerpc apparently has a rather cheap 32-bit
>divide instruction which throws off the heuristics in this case.

Ah, now that is nice, I always wondered which codepath produced the 
signeddivide64-by-exactlog2constant to shift conversion on x86, but 
couldn't find anything in i386.md :-(. I'll try that one on 3.0.3pre and 
see if the FAT FS compiles again (I wonder if my ashrdi3_nopower pattern 
will get used then?).

Despite the missing GCC optimization, I always wondered if the FAT FS 
really relies on the roundup happening with a signed divide, or if we could 
simply replace the "/512" with ">>9" without any ill effects? At least I 
couldn't find anything documented in the code...

Franz.


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-09 23:29   ` Linus Torvalds
  2001-12-10  1:15     ` Richard Henderson
@ 2001-12-10  9:09     ` Paul Koning
  2001-12-10 10:23       ` Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: Paul Koning @ 2001-12-10  9:09 UTC (permalink / raw)
  To: torvalds; +Cc: gcc

>>>>> "Linus" == Linus Torvalds <torvalds@transmeta.com> writes:

 Linus> In article <20011207164904.C16375@redhat.com> you write:
 >> On Fri, Dec 07, 2001 at 06:09:55PM -0600, Corey Minyard wrote:
 >>> 1) Add __divdi3 to the linux kernel.  I don't really think this
 >>> is a good idea, and it shouldn't be required.
 >>  Yes it should.  I've long considered it a bug that Linux didn't
 >> link against libgcc.

 Linus> ...Not linking against libgcc has found several problems in gcc.
 Linus> Ranging from missing totally obvious optimizations...

Does that mean that Linux isn't expected to build if you disable
optimization?  That seems strange, considering that debugging with gdb
is often easier with -O0.

   paul


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-07 16:08 Optimizations on long long multiply/divide on PowerPC32 don't work Corey Minyard
  2001-12-07 17:08 ` Richard Henderson
@ 2001-12-10  9:09 ` Christoph Hellwig
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2001-12-10  9:09 UTC (permalink / raw)
  To: Corey Minyard; +Cc: gcc

On Fri, Dec 07, 2001 at 06:09:55PM -0600, Corey Minyard wrote:
> I can see four options to solve this problem:
> 
>   1) Add __divdi3 to the linux kernel.  I don't really think this is a 
> good idea, and it shouldn't be required.
>   2) Move the conversion of the division to the function call to the 
> very last stages of the compiler.  IMHO, this is probably the best 
> option, but it's a big job to implement, I think.
>   3) Make the optimizations understand the function calls.  I don't even 
> want to think about this one.
>   4) Modify the tree conversions to do the optimizations there.  I have 
> a patch that does this (and passes all regressions), because it was 
> easy, but I consider it less optimal than option 2.

Fix that code to use shifts instead.  As far as I can see that's general
linux kernel practice.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-10  1:15     ` Richard Henderson
  2001-12-10  1:24       ` Richard Henderson
@ 2001-12-10  9:15       ` Linus Torvalds
  2001-12-10 13:03         ` Richard Henderson
  1 sibling, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2001-12-10  9:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc


On Mon, 10 Dec 2001, Richard Henderson wrote:

> On Sun, Dec 09, 2001 at 10:58:14PM -0800, Linus Torvalds wrote:
> > I'm sorry you feel that way, but what the gcc team has done to libgcc
> > has only made me _more_ convinced that not linking against that
> > steenking heap of *** is a really good idea..
>
> Ok, I'll bite -- what have we done to libgcc?

These days libgcc has a _lot_ of "non-architecture" stuff.

It used to be that libgcc only had the arithmetic etc extensions, for
things purely like "instruction extensions". Fair enough.

But do a "nm libgcc.a" today, and you get literally _pages_ of output of
things that have absolutely _zero_ to do with machine descriptions.

It has weak symbols for stuff like the pthreads stuff.

There is no way any of that must ever make it into the kernel even by
mistake. Having the compiler emit that crap and silently link against it
would be a _major_ mistake in my opinion.

And don't tell me it doesn't happen. The exception code DID in fact
happen, and if it weren't for the lack of libgcc in the kernel, I would
never have been told by people who started using snapshot compilers.

> > The fact that gcc cannot do a 64-bit divide by a constant is a gcc
> > deficiency.  But it is NOT cause for including crap in the kernel.
>
> Hogwash.  Look at the example again.  Signed division is not merely
> a shift.  It's more like

[ deleted ]

That's fine, but that _still_ supports my point 100%.

Read my email again. Read the part about why we do not want to have the
slow crap routines, when most likely the user really _wanted_ an unsigned
shift in the first place.

		Linus


* Re: Optimizations on long long multiply/divide on PowerPC32  don't work
  2001-12-10  2:40         ` Franz Sirl
@ 2001-12-10  9:21           ` Linus Torvalds
  2001-12-10 10:59             ` Franz Sirl
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2001-12-10  9:21 UTC (permalink / raw)
  To: Franz Sirl; +Cc: Richard Henderson, gcc


On Mon, 10 Dec 2001, Franz Sirl wrote:
>
> Ah, now that is nice, I always wondered which codepath produced the
> signeddivide64-by-exactlog2constant to shift conversion on x86, but
> couldn't find anything in i386.md :-(. I'll try that one on 3.0.3pre and
> see if the FAT FS compiles again (I wonder if my ashrdi3_nopower pattern
> will get used then?).

Which part of FAT-FS actually tries to do a signed division?

I have a VERY strong suspicion that any filesystem that wants to divide by
512 is really just getting a sector number, and the division should not be
signed in the first place.

> Despite the missing GCC optimization, I always wondered if the FAT FS
> really relies on the roundup happening with a signed divide, or if we could
> simply replace the "/512" with ">>9" without any ill effects? At least I
> couldn't find anything documented in the code...

It's almost certainly a "loff_t", and we should always have generated
-EINVAL for negative offsets, so it's pretty much guaranteed to be
positive modulo any bugs.

Of course, with fatfs, who knows..

		Linus


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-10  9:09     ` Paul Koning
@ 2001-12-10 10:23       ` Linus Torvalds
  2001-12-10 11:35         ` Optimizations on long long multiply/divide on PowerPC32 don't Joe Buck
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2001-12-10 10:23 UTC (permalink / raw)
  To: Paul Koning; +Cc: gcc


On Mon, 10 Dec 2001, Paul Koning wrote:
>
>  Linus> ...Not linking against libgcc has found several problems in gcc.
>  Linus> Ranging from missing totally obvious optimizations...
>
> Does that mean that Linux isn't expected to build if you disable
> optimization?  That seems strange, considering that debugging with gdb
> is often easier with -O0.

Linux has always required optimizations to build. Since day 1, in fact.

That is partly because I just cannot ever imagine running the code that
gcc outputs without optimizations, but even more because gcc functionality
at -O0 is seriously lacking:

Gcc doesn't support inline functions without optimization (it does in C++,
and it may be that even the C side has started to honour the "inline"
keyword, but I've never been interested enough to check), and Linux has
always tended to prefer inline functions over complicated macros.

I never made "backing" static functions for things like "inb/outb" etc,
that actually expand in the end to just one assembler function.

And I don't know about you, but I debug almost exclusively on a source
level (where compiler optimizations do not matter) or, when I have to, on
an assembler level (where the code is actually _more_ readable with
optimizations). So I'd never use -O0 anyway.

			Linus


* Re: Optimizations on long long multiply/divide on PowerPC32  don't work
  2001-12-10  9:21           ` Linus Torvalds
@ 2001-12-10 10:59             ` Franz Sirl
  2001-12-10 11:11               ` Linus Torvalds
  2001-12-10 11:31               ` Gabriel Paubert
  0 siblings, 2 replies; 27+ messages in thread
From: Franz Sirl @ 2001-12-10 10:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Richard Henderson, gcc

On Monday 10 December 2001 18:08, Linus Torvalds wrote:
> On Mon, 10 Dec 2001, Franz Sirl wrote:
> > Ah, now that is nice, I always wondered which codepath produced the
> > signeddivide64-by-exactlog2constant to shift conversion on x86, but
> > couldn't find anything in i386.md :-(. I'll try that one on 3.0.3pre and
> > see if the FAT FS compiles again (I wonder if my ashrdi3_nopower pattern
> > will get used then?).
>
> Which part of FAT-FS actually tries to do a signed division?

It's this code fragment:

        inode->i_blocks = ((inode->i_size + inode->i_blksize - 1)
                           & ~(inode->i_blksize - 1)) / 512;

that is in inode.c/fat_read_root() and fat_fill_inode(). It's inode->i_size 
that is of type loff_t here.

> I have a VERY strong suspicion that any filesystem that wants to divide by
> 512 is really just getting a sector number, and the division should not be
> signed in the first place.

It calculates the number of blocks here, so the rounding up might be wanted...

> > Despite the missing GCC optimization, I always wondered if the FAT FS
> > really relies on the roundup happening with a signed divide, or if we could
> > simply replace the "/512" with ">>9" without any ill effects? At least I
> > couldn't find anything documented in the code...
>
> It's almost certainly a "loff_t", and we should always have generated
> -EINVAL for negative offsets, so it's pretty much guaranteed to be
> positive modulo any bugs.
>
> Of course, with fatfs, who knows..

Hehe, yeah, exactly :-). Anyway, Richard's patch works fine and gcc3 now is 
able to optimize the divdi3 call away.

Franz.


* Re: Optimizations on long long multiply/divide on PowerPC32  don't work
  2001-12-10 10:59             ` Franz Sirl
@ 2001-12-10 11:11               ` Linus Torvalds
  2001-12-10 11:31               ` Gabriel Paubert
  1 sibling, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2001-12-10 11:11 UTC (permalink / raw)
  To: Franz Sirl; +Cc: Richard Henderson, gcc


On Mon, 10 Dec 2001, Franz Sirl wrote:
> >
> > Which part of FAT-FS actually tries to do a signed division?
>
> It's this code fragment:
>
>         inode->i_blocks = ((inode->i_size + inode->i_blksize - 1)
>                            & ~(inode->i_blksize - 1)) / 512;
>
> that is in inode.c/fat_read_root() and fat_fill_inode(). It's inode->i_size
> that is of type loff_t here.

Hmm, ok.

I've never really liked the signedness of "loff_t" (or "off_t"), as
negative numbers are illegal anyway, and are really only used for the
"seek offset" case.

It might be a good idea to make the in-kernel sizes all be unsigned, to
avoid things like this being problematic.

For fat, the inode size is actually limited to 32 bits, so the above is
definitely _always_ an unsigned number, and a relatively small one at that.
So unsigned vs signed doesn't matter, but on the whole I'd prefer to
default to unsigned which tends to be faster anyway, and has less
surprising behaviour in case of bugs..

> It calculates the number of blocks here, so the rounding up might be wanted...

Yes, it does the rounding up by hand, but it does _not_ need or want the
"negative division" rounding part.

> Hehe, yeah, exactly :-). Anyway, Richard's patch works fine and gcc3 now is
> able to optimize the divdi3 call away.

Good. The code looks correct, even if I have this feeling that the kernel
might want changing too..

		Linus


* Re: Optimizations on long long multiply/divide on PowerPC32  don't work
  2001-12-10 10:59             ` Franz Sirl
  2001-12-10 11:11               ` Linus Torvalds
@ 2001-12-10 11:31               ` Gabriel Paubert
  2001-12-10 17:58                 ` Richard Henderson
  1 sibling, 1 reply; 27+ messages in thread
From: Gabriel Paubert @ 2001-12-10 11:31 UTC (permalink / raw)
  To: Franz Sirl, gcc

Franz Sirl wrote:

> On Monday 10 December 2001 18:08, Linus Torvalds wrote:
> 
>>On Mon, 10 Dec 2001, Franz Sirl wrote:
>>
>>>Ah, now that is nice, I always wondered which codepath produced the
>>>signeddivide64-by-exactlog2constant to shift conversion on x86, but
>>>couldn't find anything in i386.md :-(. I'll try that one on 3.0.3pre and
>>>see if the FAT FS compiles again (I wonder if my ashrdi3_nopower pattern
>>>will get used then?).
>>>
>>Which part of FAT-FS actually tries to do a signed division?
>>
> 
> It's this code fragment:
> 
>         inode->i_blocks = ((inode->i_size + inode->i_blksize - 1)
>                            & ~(inode->i_blksize - 1)) / 512;



I believe inode->i_blksize is always a multiple of 512. So the least 
significant 9 bits are guaranteed to be zero and you can safely replace 
the divide by a shift.


> It calculates the number of blocks here, so the rounding up might be wanted...


There is already a round-up to the next multiple of inode->i_blksize by 
the addition and masking.
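
A small sketch of the argument, not the actual FAT code: the helper
names are hypothetical, the size is taken as unsigned, and blksize is
assumed to be a power of two that is a multiple of 512.  Under those
assumptions the rounded-up size has its low 9 bits clear, so the
divide and the shift give the same block count.

```c
#include <stdint.h>

/* Round size up to the next multiple of blksize.
   Assumes blksize is a power of two. */
uint64_t round_up_size(uint64_t size, uint64_t blksize)
{
    return (size + blksize - 1) & ~(blksize - 1);
}

/* Block count in 512-byte units, written with a divide. */
uint64_t blocks_by_divide(uint64_t size, uint64_t blksize)
{
    return round_up_size(size, blksize) / 512;
}

/* Same count written with a shift.  Equivalent because the value is
   unsigned and already a multiple of 512. */
uint64_t blocks_by_shift(uint64_t size, uint64_t blksize)
{
    return round_up_size(size, blksize) >> 9;
}
```

With size = 1 and blksize = 4096, both versions return 8.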

> Hehe, yeah, exactly :-). Anyway, Richard's patch works fine and gcc3 now is 
> able to optimize the divdi3 call away.


Great, does it also work for unsigned (more interesting for me)?

	Gabriel.





* Re: Optimizations on long long multiply/divide on PowerPC32 don't
  2001-12-10 10:23       ` Linus Torvalds
@ 2001-12-10 11:35         ` Joe Buck
  2001-12-10 13:49           ` guerby
  0 siblings, 1 reply; 27+ messages in thread
From: Joe Buck @ 2001-12-10 11:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Paul Koning, gcc

Linus writes:
> Gcc doesn't support inline functions without optimization (it does in C++,
> and it may be that even the C side has started to honour the "inline"
> keyword, but I've never been interested enough to check), and Linux has
> always tended to prefer inline functions over complicated macros.

No, even in C++ GCC does not inline functions without optimization.


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-10  9:15       ` Linus Torvalds
@ 2001-12-10 13:03         ` Richard Henderson
  2001-12-10 13:59           ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2001-12-10 13:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: gcc

On Mon, Dec 10, 2001 at 09:03:09AM -0800, Linus Torvalds wrote:
> These days libgcc has a _lot_ of "non-architecture" stuff.

But it's still all "compiler support" routines.

> Read my email again. Read the part about why we do not want to have the
> slow crap routines, when most likely the user really _wanted_ an unsigned
> shift in the first place.

As stated, that sounds reasonable.  Just so long as it's not
automatically considered a compiler bug that you run into
these things from time to time.


r~


* Re: Optimizations on long long multiply/divide on PowerPC32 don't
  2001-12-10 11:35         ` Optimizations on long long multiply/divide on PowerPC32 don't Joe Buck
@ 2001-12-10 13:49           ` guerby
  2001-12-10 14:08             ` Arnaud Charlet
  0 siblings, 1 reply; 27+ messages in thread
From: guerby @ 2001-12-10 13:49 UTC (permalink / raw)
  To: jbuck; +Cc: torvalds, pkoning, gcc

> No, even in C++ GCC does not inline functions without optimization.

Same thing for Ada, although a GNAT specific pragma named
Inline_Always exists, my reading of gigi code implies that it still
requires optimizations to be active for the inlining to occur:

gcc/ada/trans.c
<<
static void
process_inlined_subprograms (gnat_node)
     Node_Id gnat_node;
{
  Entity_Id gnat_entity;
  Node_Id gnat_body;

  /* If we can inline, generate RTL for all the inlined subprograms.
     Define the entity first so we set DECL_EXTERNAL.  */
  if (optimize > 0 && ! flag_no_inline)
>>

gcc/ada/gnat_rm.texi
<<
@findex Inline_Always
@item pragma Inline_Always
@noindent
Syntax:

@smallexample
pragma Inline_Always (NAME [, NAME]);
@end smallexample

@noindent
Similar to pragma  @code{Inline} except that inlining is not subject to
the use of option @code{-gnatn} for inter-unit inlining.
>>

Should I provide a patch to mention explicitly that optimization
needs to be on for inlining to occur with Inline_Always?

-- 
Laurent Guerby <guerby@acm.org>


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-10 13:03         ` Richard Henderson
@ 2001-12-10 13:59           ` Linus Torvalds
  0 siblings, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2001-12-10 13:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc


On Mon, 10 Dec 2001, Richard Henderson wrote:
> > Read my email again. Read the part about why we do not want to have the
> > slow crap routines, when most likely the user really _wanted_ an unsigned
> > shift in the first place.
>
> As stated, that sounds reasonable.  Just so long as it's not
> automatically considered a compiler bug that you run into
> these things from time to time.

Oh, agreed.  The kernel makes this a conscious choice, and makes its own
routines for when they are needed.

An example of where the kernel does its own "mini-libgcc" is the fact
that the kernel actually _does_ end up needing a 64-bit divide, but the
kernel happily gets by with the (often much faster) 64:32 version.

Now, gcc itself could be smart enough to notice when we do a 64:32 divide,
but it historically hasn't, and I bet it still doesn't. So Linux has its
own per-architecture "do_div()" routines rather than letting gcc mess up a
perfectly simple 64:32 divide into a much more complicated 64:64 divide.
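
For readers who have not seen the 64:32 idea, a rough portable sketch
follows.  This is not the kernel's per-architecture code; div64_32 is a
hypothetical helper that only mirrors the do_div() convention of
replacing the dividend with the quotient and returning the remainder.
It uses plain shift-and-subtract long division, so no 64-bit divide
(and no __divdi3 or __udivdi3 call) is needed.  The divisor is assumed
to be nonzero.

```c
#include <stdint.h>

/* Divide the 64-bit value *n by the 32-bit value base.
   On return *n holds the quotient; the remainder is returned.
   Assumes base != 0. */
uint32_t div64_32(uint64_t *n, uint32_t base)
{
    uint64_t dividend = *n;
    uint64_t quotient = 0;
    uint64_t rem = 0;   /* stays below base after every step */
    int i;

    /* Bring down one dividend bit at a time, most significant first. */
    for (i = 63; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1);
        if (rem >= base) {
            rem -= base;
            quotient |= (uint64_t)1 << i;
        }
    }

    *n = quotient;
    return (uint32_t)rem;
}
```

Calling it with *n = 1000 and base = 7 leaves 142 in *n and returns 6.
A real implementation would use the hardware 32-bit divide on the two
halves instead of looping over 64 bits; the loop is used here only to
keep the sketch obviously correct.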

So it's a bit of extra work, but it's really not all that much (I do not
think we've needed to do _anything_ in this area for the last few years
except for some cleanups).

And as you've noticed, the problems really often _do_ end up being gcc
optimization issues..

(The kernel has historically also noticed some cases where gcc was silly
enough to not optimize away compile-time constant "switch" statements etc,
so I think that on the whole we've found mostly real deficiencies in the
compiler, and quite few cases where we needed to change the kernel)

			Linus


* Re: Optimizations on long long multiply/divide on PowerPC32 don't
  2001-12-10 13:49           ` guerby
@ 2001-12-10 14:08             ` Arnaud Charlet
  2001-12-10 14:31               ` guerby
  0 siblings, 1 reply; 27+ messages in thread
From: Arnaud Charlet @ 2001-12-10 14:08 UTC (permalink / raw)
  To: guerby; +Cc: jbuck, torvalds, pkoning, gcc

> Same thing for Ada, although a GNAT specific pragma named
> Inline_Always exists, my reading of gigi code implies that it still
> requires optimizations to be active for the inlining to occur:

The front end inlining mechanism is supposed to take care of this, but
there are still some subtle issues that prevent it from working properly.

Anyway, the documentation should not be fixed, since the intent is definitely
to always inline.

Arno


* Re: Optimizations on long long multiply/divide on PowerPC32 don't
  2001-12-10 14:08             ` Arnaud Charlet
@ 2001-12-10 14:31               ` guerby
  0 siblings, 0 replies; 27+ messages in thread
From: guerby @ 2001-12-10 14:31 UTC (permalink / raw)
  To: charlet; +Cc: jbuck, torvalds, pkoning, gcc

> The front end inlining mechanism is supposed to take care of this, but
> there are still some subtle issues that prevent it from working properly.
> Anyway, the documentation should not be fixed, since the intent is definitely
> to always inline.

If I understand correctly, Inline_Always not inlining at -O0 is
considered a bug. Any available test case or description for the
subtle issues you mention?

-- 
Laurent Guerby <guerby@acm.org>


* Re: Optimizations on long long multiply/divide on PowerPC32  don't work
  2001-12-10 11:31               ` Gabriel Paubert
@ 2001-12-10 17:58                 ` Richard Henderson
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2001-12-10 17:58 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Franz Sirl, gcc

On Mon, Dec 10, 2001 at 08:27:45PM +0100, Gabriel Paubert wrote:
> Great, does it also work for unsigned (more interesting for me)?

It always worked for unsigned.


r~


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2001-12-10  1:24       ` Richard Henderson
  2001-12-10  2:40         ` Franz Sirl
@ 2002-01-22  8:51         ` Franz Sirl
  2002-01-22 12:04           ` Richard Henderson
  1 sibling, 1 reply; 27+ messages in thread
From: Franz Sirl @ 2002-01-22  8:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc

At 10:14 10.12.2001, you wrote:
>On Mon, Dec 10, 2001 at 12:54:44AM -0800, Richard Henderson wrote:
> > GCC knows how to do this.  It will do this if the target supports
> > all of these operations on 64 bit data types and it considers them
> > to be cheap enough.
>
>How irritating.  I do not like the taste of crow.
>
>While the above is true, powerpc apparently has a rather cheap 32-bit
>divide instruction which throws off the heuristics in this case.
>
>
>r~
>
>
>
>Index: expmed.c
>===================================================================
>RCS file: /cvs/gcc/gcc/gcc/expmed.c,v
>retrieving revision 1.97
>diff -c -p -d -r1.97 expmed.c
>*** expmed.c    2001/11/20 04:12:11     1.97
>--- expmed.c    2001/12/10 09:07:20
>*************** expand_divmod (rem_flag, code, mode, op0
>*** 3271,3277 ****
>                       goto fail1;
>                   }
>                 else if (EXACT_POWER_OF_2_OR_ZERO_P (d)
>!                       && (rem_flag ? smod_pow2_cheap : sdiv_pow2_cheap))
>                   ;
>                 else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d))
>                   {
>--- 3271,3286 ----
>                       goto fail1;
>                   }
>                 else if (EXACT_POWER_OF_2_OR_ZERO_P (d)
>!                       && (rem_flag ? smod_pow2_cheap : sdiv_pow2_cheap)
>!                       /* ??? The cheap metric is computed only for
>!                           word_mode.  If this operation is wider, this may
>!                           not be so.  Assume true if the optab has an
>!                           expander for this mode.  */
>!                       && (((rem_flag ? smod_optab : sdiv_optab)
>!                             ->handlers[(int) compute_mode].insn_code
>!                             != CODE_FOR_nothing)
>!                            || (sdivmod_optab->handlers[(int) compute_mode]
>!                               .insn_code != CODE_FOR_nothing)))
>                   ;
>                 else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d))
>                   {


May I move that one to the branch for 3.0.4? Bootstrapped and regtested on 
powerpc-linux-gnu and x86-linux-gnu.

Franz.
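
As an illustration of the kind of sequence the power-of-two path in
expand_divmod is meant to produce instead of a call to __divdi3, here is
a minimal sketch of signed division by 512 done with shifts. It assumes
that >> on a negative signed value is an arithmetic shift (true for GCC,
but implementation-defined in ISO C). The helper names are hypothetical,
and the exact instructions GCC emits may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Signed divide by 512 (2^9), truncating toward zero as C's '/' does.
   Assumes arithmetic right shift of negative values. */
static int64_t sdiv512_by_shift(int64_t v)
{
    /* v >> 63 is -1 for negative v and 0 otherwise; the logical shift
       turns that into a bias of 511 or 0. */
    int64_t bias = (int64_t)((uint64_t)(v >> 63) >> (64 - 9));

    /* Adding the bias before shifting makes negative results round
       toward zero rather than toward minus infinity. */
    return (v + bias) >> 9;
}

/* Compare the shift sequence against the compiler's own division. */
static void check_sdiv512(int64_t v)
{
    assert(sdiv512_by_shift(v) == v / 512);
}
```

Without the bias, -1 >> 9 would give -1, whereas -1 / 512 is 0; that
correction is the "other stuff for handling negatives" mentioned at the
start of the thread.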


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2002-01-22  8:51         ` Franz Sirl
@ 2002-01-22 12:04           ` Richard Henderson
  2002-01-23  7:24             ` Gerald Pfeifer
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2002-01-22 12:04 UTC (permalink / raw)
  To: Franz Sirl; +Cc: gcc

On Tue, Jan 22, 2002 at 03:59:02PM +0100, Franz Sirl wrote:
> May I move that one to the branch for 3.0.4?

Yes.


r~


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2002-01-22 12:04           ` Richard Henderson
@ 2002-01-23  7:24             ` Gerald Pfeifer
  2002-02-03 10:44               ` Franz Sirl
  0 siblings, 1 reply; 27+ messages in thread
From: Gerald Pfeifer @ 2002-01-23  7:24 UTC (permalink / raw)
  To: Franz Sirl; +Cc: Richard Henderson, gcc

On Tue, 22 Jan 2002, Richard Henderson wrote:
> On Tue, Jan 22, 2002 at 03:59:02PM +0100, Franz Sirl wrote:
>> May I move that one to the branch for 3.0.4?
> Yes.

And please also update htdocs/gcc-3.0/features.html accordingly.

Thanks,
Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
  2002-01-23  7:24             ` Gerald Pfeifer
@ 2002-02-03 10:44               ` Franz Sirl
  0 siblings, 0 replies; 27+ messages in thread
From: Franz Sirl @ 2002-02-03 10:44 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: gcc

On Wednesday 23 January 2002 12:59, Gerald Pfeifer wrote:
> On Tue, 22 Jan 2002, Richard Henderson wrote:
> > On Tue, Jan 22, 2002 at 03:59:02PM +0100, Franz Sirl wrote:
> >> May I move that one to the branch for 3.0.4?
> >
> > Yes.
>
> And please also update htdocs/gcc-3.0/features.html accordingly.

I've committed the appended hunk.

Franz.

Index: htdocs/gcc-3.0/features.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-3.0/features.html,v
retrieving revision 1.25
diff -u -p -r1.25 features.html
--- features.html       2002/01/31 20:45:02     1.25
+++ features.html       2002/02/03 18:32:46
@@ -190,6 +190,7 @@
     <li>A fix for shared library generation under AIX 4.3.</li>
     <li>Documentation updates.</li>
     <li>Port of GCC to Tensilica's Xtensa processor contributed.</li>
+    <li>A fix for compiling the PPC Linux kernel (FAT fs wouldn't link).</li>
   </ul></li>
 
</ul>


* Re: Optimizations on long long multiply/divide on PowerPC32 don't work
@ 2001-12-07 18:28 mike stump
  0 siblings, 0 replies; 27+ messages in thread
From: mike stump @ 2001-12-07 18:28 UTC (permalink / raw)
  To: minyard, rth; +Cc: gcc

> Date: Fri, 7 Dec 2001 16:49:04 -0800
> From: Richard Henderson <rth@redhat.com>
> To: Corey Minyard <minyard@acm.org>
> Cc: gcc@gcc.gnu.org

> On Fri, Dec 07, 2001 at 06:09:55PM -0600, Corey Minyard wrote:
> >   1) Add __divdi3 to the linux kernel.  I don't really think this is a 
> > good idea, and it shouldn't be required.

> Yes it should.  I've long considered it a bug that Linux
> didn't link against libgcc.

Speaking as someone who has a kernel and gcc, let me say that of
course one wants to link against libgcc.a and libstdc++.a.  In fact,
one might even want to go out of one's way to include large swaths of
the libraries even though there are no references to them, just so
that dynamically loaded code can resolve against them.

This is the recommendation of the gcc folks, and if folks don't want
to follow it, that is their problem, and they will have to deal with
it by themselves, as they created it.

I've been witnessing these problems first hand for the last 4 years,
and I know it is the right solution.


Thread overview: 27+ messages
2001-12-07 16:08 Optimizations on long long multiply/divide on PowerPC32 don't work Corey Minyard
2001-12-07 17:08 ` Richard Henderson
2001-12-07 17:40   ` Corey Minyard
2001-12-09 23:29   ` Linus Torvalds
2001-12-10  1:15     ` Richard Henderson
2001-12-10  1:24       ` Richard Henderson
2001-12-10  2:40         ` Franz Sirl
2001-12-10  9:21           ` Linus Torvalds
2001-12-10 10:59             ` Franz Sirl
2001-12-10 11:11               ` Linus Torvalds
2001-12-10 11:31               ` Gabriel Paubert
2001-12-10 17:58                 ` Richard Henderson
2002-01-22  8:51         ` Franz Sirl
2002-01-22 12:04           ` Richard Henderson
2002-01-23  7:24             ` Gerald Pfeifer
2002-02-03 10:44               ` Franz Sirl
2001-12-10  9:15       ` Linus Torvalds
2001-12-10 13:03         ` Richard Henderson
2001-12-10 13:59           ` Linus Torvalds
2001-12-10  9:09     ` Paul Koning
2001-12-10 10:23       ` Linus Torvalds
2001-12-10 11:35         ` Optimizations on long long multiply/divide on PowerPC32 don't Joe Buck
2001-12-10 13:49           ` guerby
2001-12-10 14:08             ` Arnaud Charlet
2001-12-10 14:31               ` guerby
2001-12-10  9:09 ` Optimizations on long long multiply/divide on PowerPC32 don't work Christoph Hellwig
2001-12-07 18:28 mike stump
