public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* swap (a, b) suggestion
@ 1998-01-03 15:11 Eric Buddington
  1998-01-03 17:56 ` Gabriel Dos Reis
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Eric Buddington @ 1998-01-03 15:11 UTC (permalink / raw)
  To: egcs

Greetings.

I am just learning about STL, and have found a possible improvement
for the swap() template in include/g++/stl_algobase.h.  The template
here is straightforward:

template <class T>
inline void swap(T& a, T& b) {
  T tmp = a;
  a = b;
  b = tmp;
}

For integer values, the following works, and is significantly faster:

template <class T>
inline void swap(T& a, T& b) {
  a ^= b;
  b ^= a;
  a ^= b;
}

In my own code, this change sped up a quicksort significantly.
Explicit versions for char, short, int, and long (and their unsigned
counterparts) alongside the template would do the trick.

I'll be interested to hear any comments. Thanks for your collective
work - egcs has really saved my butt development-wise.

Eric Buddington

^ permalink raw reply	[flat|nested] 13+ messages in thread

* swap (a, b) suggestion
  1998-01-03 15:11 swap (a, b) suggestion Eric Buddington
@ 1998-01-03 17:56 ` Gabriel Dos Reis
  1998-01-03 20:12 ` Richard Henderson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Gabriel Dos Reis @ 1998-01-03 17:56 UTC (permalink / raw)
  To: ebuddington; +Cc: egcs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1147 bytes --]

>>>>> «Eric», Eric Buddington <eric@sparrow.vgernet.net> wrote:

Eric> Greetings.
Eric> I am just learning about STL, and have found a possible improvement
Eric> for the swap() template in include/g++/stl_algobase.h.  The template
Eric> here is straightforward:

Eric> template <class T>
Eric> inline void swap(T& a, T& b) {
Eric>   T tmp = a;
Eric>   a = b;
Eric>   b = tmp;
Eric> }

Eric> For integer values, the following works, and is significantly faster:

Eric> template <class T>
Eric> inline void swap(T& a, T& b) {
Eric>   a ^= b;
Eric>   b ^= a;
Eric>   a ^= b;
Eric> }

	This works fine for integral types. But about float/double and
more generaly non-integral types (complex<>, ...)


Eric> In my own code, this change sped up a quicksort significantly.
Eric> Explicit versions for char, short, int, and long (and their unsigned
Eric> counterparts) alongside the template would do the trick.

Eric> I'll be interested to hear any comments. Thanks for your collective
Eric> work - egcs has really saved my butt development-wise.

	specialisations for the template function swap<>() for
integral types may be added. Uli ?

-- Gaby

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) suggestion
  1998-01-03 15:11 swap (a, b) suggestion Eric Buddington
  1998-01-03 17:56 ` Gabriel Dos Reis
@ 1998-01-03 20:12 ` Richard Henderson
  1998-01-03 23:22   ` Benny Amorsen
                     ` (2 more replies)
  1998-01-05 14:13 ` swap (a, b) - test Michael Meissner
       [not found] ` <199801040146.CAA01125.cygnus.egcs@piano.dptmaths.ens-cachan.fr>
  3 siblings, 3 replies; 13+ messages in thread
From: Richard Henderson @ 1998-01-03 20:12 UTC (permalink / raw)
  To: Eric Buddington; +Cc: egcs

On Sat, Jan 03, 1998 at 06:11:23PM -0500, Eric Buddington wrote:
> For integer values, the following works, and is significantly faster:
> 
> template <class T>
> inline void swap(T& a, T& b) {
>   a ^= b;
>   b ^= a;
>   a ^= b;
> }

If it is faster at all I'll warrent it is only so for depressingly
register starved machines.


r~

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) suggestion
  1998-01-03 20:12 ` Richard Henderson
@ 1998-01-03 23:22   ` Benny Amorsen
  1998-01-04  5:36     ` Alan Lehotsky
  1998-01-04 20:43   ` swap (a, b) - test Eric Buddington
  1998-01-05 15:21   ` swap (a, b) suggestion Stephen Williams
  2 siblings, 1 reply; 13+ messages in thread
From: Benny Amorsen @ 1998-01-03 23:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Eric Buddington, egcs

>>>>> "RH" == Richard Henderson <rth@cygnus.com> writes:

RH> On Sat, Jan 03, 1998 at 06:11:23PM -0500, Eric Buddington wrote:
>> For integer values, the following works, and is significantly
>> faster:
>> 
>> template <class T>
>> inline void swap(T& a, T& b) {
>>   a ^= b;
>>   b ^= a;
>>   a ^= b;
>> }

RH> If it is faster at all I'll warrent it is only so for depressingly
RH> register starved machines.

Perhaps the optimization should be made in a different place -- Make
the optimizer recognize the sequence "tmp = a; a = b; b = tmp" and do
it the most efficient way. The instruction combiner or the peephole
optimizer could be made to catch this.


Benny

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) suggestion
  1998-01-03 23:22   ` Benny Amorsen
@ 1998-01-04  5:36     ` Alan Lehotsky
  1998-01-04 10:41       ` "D. Hugh Redelmeier"
  1998-01-04 22:09       ` Richard Henderson
  0 siblings, 2 replies; 13+ messages in thread
From: Alan Lehotsky @ 1998-01-04  5:36 UTC (permalink / raw)
  To: Benny Amorsen, egcs

At 8:22 AM +0100 1/4/98, Benny Amorsen wrote:

 * >>>>> "RH" == Richard Henderson <rth@cygnus.com> writes:
 *
 * RH> On Sat, Jan 03, 1998 at 06:11:23PM -0500, Eric Buddington wrote:
 * >> For integer values, the following works, and is significantly
 * >> faster:
 * >>
 * >> template <class T>
 * >> inline void swap(T& a, T& b) {
 * >>   a ^= b;
 * >>   b ^= a;
 * >>   a ^= b;
 * >> }
 *
 * RH> If it is faster at all I'll warrent it is only so for depressingly
 * RH> register starved machines.
 *
 * Perhaps the optimization should be made in a different place -- Make
 * the optimizer recognize the sequence "tmp = a; a = b; b = tmp" and do
 * it the most efficient way. The instruction combiner or the peephole
 * optimizer could be made to catch this.
 *
 *
 * Benny

That's the way I did this in my enhancements to the SHARC DSP port
of GCC.  It has an instruction that allows parallel register
assignments, so I can do

		ra=pass rb, rb=ra;

which in one instruction does the swap.  I suppose for completeness sake, I
ought to add a pattern for the three XORs and reduce them to the single
instruction too :-)

-- Al Lehotsky

------------------------------------------------------------------------

		    Quality Software Management
		http://www.tiac.net/users/lehotsky
			lehotsky@tiac.net
			(617)862-5418 Voice
			(617)674-1096 FAX

	Software Process Improvement and Management Consulting
	     Language Design and Compiler Implementation



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) suggestion
  1998-01-04  5:36     ` Alan Lehotsky
@ 1998-01-04 10:41       ` "D. Hugh Redelmeier"
  1998-01-04 22:09       ` Richard Henderson
  1 sibling, 0 replies; 13+ messages in thread
From: "D. Hugh Redelmeier" @ 1998-01-04 10:41 UTC (permalink / raw)
  To: egcs

[I'm replying to a reconstruction of the original message; sorry if I
botch it.]

On Sat, Jan 03, 1998 at 06:11:23PM -0500, Eric Buddington wrote:
| For integer values, the following works, and is significantly
| faster:
| template <class T>
| inline void swap(T& a, T& b) {
|   a ^= b;
|   b ^= a;
|   a ^= b;
|  }

If a and be are aliases, then this code won't work.  I don't actually know
the specs for swap -- are aliases legal?  I hope aliases are legal since
allowing degenerate cases simplifies programming.

Hugh Redelmeier
hugh@mimosa.com  voice: +1 416 482-8253


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) - test
  1998-01-03 20:12 ` Richard Henderson
  1998-01-03 23:22   ` Benny Amorsen
@ 1998-01-04 20:43   ` Eric Buddington
  1998-01-04 22:14     ` Richard Henderson
  1998-01-05  4:08     ` Andi Kleen
  1998-01-05 15:21   ` swap (a, b) suggestion Stephen Williams
  2 siblings, 2 replies; 13+ messages in thread
From: Eric Buddington @ 1998-01-04 20:43 UTC (permalink / raw)
  To: egcs

On Sat, Jan 03, 1998 at 08:12:51PM -0800, Richard Henderson wrote:

> If it is faster at all I'll warrent it is only so for depressingly
> register starved machines.

Well, I'm on an Intel 486 - that probably qualifies.

Now, I *did* get great advantage from this before, in more complex
code (quicksort), under gcc-2.7.2.3.

However, a quick test now is now running slower with the XOR swap. I
don't see why three XOR operations on two ints would be slower
than three assignments among three ints...

My test code (included below) ran about 35% faster with the old swap.

Maybe this isn't Nobel-prize material after all :)

-Eric

------------ test code -----------------

#include <stdlib.h>

inline void 
swap1 (int & a, int & b)
{
  int tmp = a; a = b; b = tmp;
}

inline void 
swap2 (int & a, int & b)
{
  if (&a != &b) { a ^= b; b ^= a; a ^= b; }
}

int 
main (int argc, char** argv)
{
  int data [10240];
  for (int i=0; i < 10240; i++) data[i] = rand();
  for (int j=0; j < 10240; j++)
    for (int i=0; i < 10239; i++)
#ifndef FAST_SWAP
      swap1 (data[i], data[i+1]);
#else
      swap2 (data[i], data[i+1]);
#endif
}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) suggestion
  1998-01-04  5:36     ` Alan Lehotsky
  1998-01-04 10:41       ` "D. Hugh Redelmeier"
@ 1998-01-04 22:09       ` Richard Henderson
  1 sibling, 0 replies; 13+ messages in thread
From: Richard Henderson @ 1998-01-04 22:09 UTC (permalink / raw)
  To: Alan Lehotsky; +Cc: Benny Amorsen, egcs

On Sun, Jan 04, 1998 at 08:32:58AM -0500, Alan Lehotsky wrote:
> That's the way I did this in my enhancements to the SHARC DSP port
> of GCC.  It has an instruction that allows parallel register
> assignments, so I can do
> 
> 		ra=pass rb, rb=ra;

And the way it should probably be done on x86 as well, since 
it has an xchg instruction.


r~

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) - test
  1998-01-04 20:43   ` swap (a, b) - test Eric Buddington
@ 1998-01-04 22:14     ` Richard Henderson
  1998-01-05  4:08     ` Andi Kleen
  1 sibling, 0 replies; 13+ messages in thread
From: Richard Henderson @ 1998-01-04 22:14 UTC (permalink / raw)
  To: Eric Buddington; +Cc: egcs

On Sun, Jan 04, 1998 at 01:57:49PM -0500, Eric Buddington wrote:
> However, a quick test now is now running slower with the XOR swap. I
> don't see why three XOR operations on two ints would be slower
> than three assignments among three ints...

Ideally the register allocator can elide the swap altogether by
logically swapping what register it believes a value to reside in.

Certainly it would not be possible all of the time, and I think
things are in practice hindered by the fact that pseudos are 
currently allocated (or not) a single register for their entire 
life span.  


r~

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) - test
  1998-01-04 20:43   ` swap (a, b) - test Eric Buddington
  1998-01-04 22:14     ` Richard Henderson
@ 1998-01-05  4:08     ` Andi Kleen
  1 sibling, 0 replies; 13+ messages in thread
From: Andi Kleen @ 1998-01-05  4:08 UTC (permalink / raw)
  To: ebuddington; +Cc: egcs

Eric Buddington <eric@sparrow.vgernet.net> writes:

> On Sat, Jan 03, 1998 at 08:12:51PM -0800, Richard Henderson wrote:
> 
> > If it is faster at all I'll warrent it is only so for depressingly
> > register starved machines.
> 
> Well, I'm on an Intel 486 - that probably qualifies.
> 
> Now, I *did* get great advantage from this before, in more complex
> code (quicksort), under gcc-2.7.2.3.
> 
> However, a quick test now is now running slower with the XOR swap. I
> don't see why three XOR operations on two ints would be slower
> than three assignments among three ints...

__asm__("xchg %1,%0" ...); might be faster on x86 too.

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) - test
  1998-01-03 15:11 swap (a, b) suggestion Eric Buddington
  1998-01-03 17:56 ` Gabriel Dos Reis
  1998-01-03 20:12 ` Richard Henderson
@ 1998-01-05 14:13 ` Michael Meissner
       [not found] ` <199801040146.CAA01125.cygnus.egcs@piano.dptmaths.ens-cachan.fr>
  3 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 1998-01-05 14:13 UTC (permalink / raw)
  To: egcs

ak@muc.de (Andi Kleen) writes:

> __asm__("xchg %1,%0" ...); might be faster on x86 too.

Only if both operands are in registers.  Xchg with a memory operand, sets the
LOCK prefix which can slow down the reference.

-- 
Michael Meissner, Cygnus Solutions (Massachusetts office)
4th floor, 955 Massachusetts Avenue, Cambridge, MA 02139, USA
meissner@cygnus.com,	617-354-5416 (office),	617-354-7161 (fax)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) suggestion
  1998-01-03 20:12 ` Richard Henderson
  1998-01-03 23:22   ` Benny Amorsen
  1998-01-04 20:43   ` swap (a, b) - test Eric Buddington
@ 1998-01-05 15:21   ` Stephen Williams
  2 siblings, 0 replies; 13+ messages in thread
From: Stephen Williams @ 1998-01-05 15:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Eric Buddington, egcs

On Sat, Jan 03, 1998 at 06:11:23PM -0500, Eric Buddington wrote:
> For integer values, the following works, and is significantly faster:
> 
> template <class T>
> inline void swap(T& a, T& b) {
>   a ^= b;
>   b ^= a;
>   a ^= b;
> }

rth@cygnus.com said:
> If it is faster at all I'll warrent it is only so for depressingly
> register starved machines. 

You mean like ix86 machines?

--steve



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: swap (a, b) suggestion
       [not found] ` <199801040146.CAA01125.cygnus.egcs@piano.dptmaths.ens-cachan.fr>
@ 1998-01-06 14:09   ` Ulrich Drepper
  0 siblings, 0 replies; 13+ messages in thread
From: Ulrich Drepper @ 1998-01-06 14:09 UTC (permalink / raw)
  To: egcs

Gabriel.Dos-Reis@dptmaths.ens-cachan.fr (Gabriel Dos Reis) writes:

> 	specialisations for the template function swap<>() for
> integral types may be added. Uli ?

Well, as Richard said already, I don't think it's generally a good
idea to use this since the new version will not always be faster.

But still I keep this around for some more testing since one of the
checkes to libstdc++ which will hopefully become available in egcs
throughout the year is that there will be general headers as required
by the standard plus several sets of optional headers which define
special optimizations (by specializations) for each machine.  For
those who know the structure of glibc 2.1, it'll be similar to how the
string.h and math.h headers are organized.

In such optimized header one could for (as Richard guessed)
register-poor machines add this hack for integral types but for ix86
it's still not the best solution (with some asm() hacking it should be
possible to use the xchg opcode).

-- Uli
---------------.      drepper at gnu.org  ,-.   Rubensstrasse 5
Ulrich Drepper  \    ,-------------------'   \  76149 Karlsruhe/Germany
Cygnus Solutions `--' drepper at cygnus.com   `------------------------

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~1998-01-06 14:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-01-03 15:11 swap (a, b) suggestion Eric Buddington
1998-01-03 17:56 ` Gabriel Dos Reis
1998-01-03 20:12 ` Richard Henderson
1998-01-03 23:22   ` Benny Amorsen
1998-01-04  5:36     ` Alan Lehotsky
1998-01-04 10:41       ` "D. Hugh Redelmeier"
1998-01-04 22:09       ` Richard Henderson
1998-01-04 20:43   ` swap (a, b) - test Eric Buddington
1998-01-04 22:14     ` Richard Henderson
1998-01-05  4:08     ` Andi Kleen
1998-01-05 15:21   ` swap (a, b) suggestion Stephen Williams
1998-01-05 14:13 ` swap (a, b) - test Michael Meissner
     [not found] ` <199801040146.CAA01125.cygnus.egcs@piano.dptmaths.ens-cachan.fr>
1998-01-06 14:09   ` swap (a, b) suggestion Ulrich Drepper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).