1.0 sucessfull, install params questions

public inbox for gcc@gcc.gnu.org

* 1.0 sucessfull, install params questions
@ 1997-12-12  3:55 Hermann Lauer
  1997-12-12  8:55 ` Jeffrey A Law
  0 siblings, 1 reply; 68+ messages in thread
From: Hermann Lauer @ 1997-12-12  3:55 UTC (permalink / raw)
  To: egcs

Hello,

MANY THANKS FOR YOUR WORK ON EGCS !

on an i686-pc-linux-gnulibc1 (something between Redhat 4.2 and Redhat 5.0, with
libc 5.3.12 as the development library):

In the egcs directory I used the following to build:

mkdir objekts
cd objekts

../configure --program-prefix=e
make bootstrap

make install prefix=/tmp/usr/local program-prefix=e

The resulting egcs was then packaged with rpm and installed to /usr/local.

egcs works, but the --program-prefix=e seems to be ignored: all binaries
in /tmp/usr/local/bin (i.e. /usr/local/bin) have their default names!

Any comments on this?  Should this work in principle?

If the "make install prefix=/tmp/usr/local" trick is not legal with egcs (it works
with gcc-2.7.x), please tell me the correct way to achieve the same result.

Thanks for any help.

Greetings
   Hermann


Output from tests:

                === libio Summary ===

# of expected passes            40

                === libstdc++ Summary ===

# of expected passes            30

                === gcc Summary ===

# of expected passes            4883
# of expected failures          5
# of unsupported tests          7

                === g++ Summary ===

# of expected passes            3400
# of unexpected successes       3
# of expected failures          80
# of untested testcases         6

                === g77 tests ===

FAIL: g77.f-torture/execute/dnrm2.f execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-loops
FAIL: g77.f-torture/execute/dnrm2.f execution,  -O2 -fomit-frame-pointer -finline-functions -funroll-all-loops

                === g77 Summary ===

# of expected passes            130
# of unexpected failures        2



-- 
	Hermann Lauer

Image Processing Group, Interdisciplinary Center for
Scientific Computing, Heidelberg University
INF 368; 69120 Heidelberg; Tel: (06221)548826  Fax: (06221)548850
Email: Hermann.Lauer@iwr.uni-heidelberg.de


* Re: 1.0 sucessfull, install params questions
  1997-12-12  3:55 1.0 sucessfull, install params questions Hermann Lauer
@ 1997-12-12  8:55 ` Jeffrey A Law
  1997-12-12 10:18   ` Michael Poole
       [not found]   ` <law@hurl.cygnus.com>
  0 siblings, 2 replies; 68+ messages in thread
From: Jeffrey A Law @ 1997-12-12  8:55 UTC (permalink / raw)
  To: Hermann.Lauer; +Cc: egcs

  In message < 9712121155.ZM312@giotto.iwr.uni-heidelberg.de >you write:
  > on a i686-pc-linux-gnulibc1 (something between Redhat 4.2 and Redhat 5.0,
  > with libc 5.3.12 as development lib):
Congrats.

  > make install prefix=/tmp/usr/local program-prefix=e
--prefix must be used at configure time, not at install time.  Trying
to do it at install time is just going to lead to problems later.

Test results look good.

jeff

* Re: 1.0 sucessfull, install params questions
  1997-12-12  8:55 ` Jeffrey A Law
@ 1997-12-12 10:18   ` Michael Poole
       [not found]   ` <law@hurl.cygnus.com>
  1 sibling, 0 replies; 68+ messages in thread
From: Michael Poole @ 1997-12-12 10:18 UTC (permalink / raw)
  To: egcs

On Fri, 12 Dec 1997, Jeffrey A Law wrote:

> 
>   In message < 9712121155.ZM312@giotto.iwr.uni-heidelberg.de >you write:
>   > on a i686-pc-linux-gnulibc1 (something between Redhat 4.2 and Redhat 5.0,
>   > with libc 5.3.12 as development lib):
> Congrats.
> 
>   > make install prefix=/tmp/usr/local program-prefix=e
> --prefix must be used at configure time, not at install time.  Trying
> to do it at install time is just going to lead to problems later.

	Why is this?  Tools like Stow (which manages symlink trees
for you) and Depot (which does quite a bit more) generally require
a separate prefix and install-prefix; is having $(prefix)/foo be a symlink to
some other location (for each foo in the installed egcs files) going to cause
problems with egcs in and of itself?

- Michael


* Re: 1.0 sucessfull, install params questions
       [not found]   ` <law@hurl.cygnus.com>
@ 1997-12-12 15:46     ` Hermann Lauer
  1998-07-14 14:29     ` porting EGCS to the Cray T3E Julian C. Cummings
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 68+ messages in thread
From: Hermann Lauer @ 1997-12-12 15:46 UTC (permalink / raw)
  To: egcs; +Cc: egcs

On Dec 12,  9:10am, Jeffrey A Law wrote:
...
>   > make install prefix=/tmp/usr/local program-prefix=e
> --prefix must be used at configure time, not at install time.  Trying
> to do it at install time is just going to lead to problems later.

So what is the recommended way to compile for a given location but install
first to another location?  For example, /usr/local may be exported read-only
via NFS (I have heard similar things can happen with AFS).  This should also
be an option for package builders, who don't want to overwrite another egcs
already installed at the same location...

Thanks for any advice.

  Hermann

* Re: porting EGCS to the Cray T3E
  1998-07-14 11:29 porting EGCS to the Cray T3E Julian C. Cummings
@ 1998-07-14 11:29 ` Jeffrey A Law
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey A Law @ 1998-07-14 11:29 UTC (permalink / raw)
  To: Julian C. Cummings; +Cc: egcs

  In message < 9807141122.ZM4885@vapor.acl.lanl.gov >you write:
  > I guess I don't understand your comment.  There are other Cray platforms listed
  > as possibilities in the config.sub file, such as the Cray X-MP, Y-MP, Cray  2,
  > and Cray [C,J,T]90.  Why would these options be listed if Cray is
  > not supported?
They may be supported as host or build machines, but that does not
mean they are supported as a target.

i.e., gcc does not know how to generate code for any Cray that I'm aware of.

jeff

* Re: porting EGCS to the Cray T3E
@ 1998-07-14 11:29 Julian C. Cummings
  1998-07-14 11:29 ` Jeffrey A Law
  0 siblings, 1 reply; 68+ messages in thread
From: Julian C. Cummings @ 1998-07-14 11:29 UTC (permalink / raw)
  To: law; +Cc: egcs

>> It probably failed because it thought you also wanted to *target* for
>> a cray, which isn't supported.
>>
>> jeff

I guess I don't understand your comment.  There are other Cray platforms listed
as possibilities in the config.sub file, such as the Cray X-MP, Y-MP, Cray 2,
and Cray [C,J,T]90.  Why would these options be listed if Cray is not
supported?

Julian C.

-- 
Julian C. Cummings
Advanced Computing Laboratory
Los Alamos National Laboratory
(505) 667-6064
julianc@acl.lanl.gov

* Re: porting EGCS to the Cray T3E
  1998-07-14 14:29     ` porting EGCS to the Cray T3E Julian C. Cummings
@ 1998-07-14 13:20       ` Jeffrey A Law
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey A Law @ 1998-07-14 13:20 UTC (permalink / raw)
  To: Julian C. Cummings; +Cc: egcs

  In message < 9807141146.ZM5055@vapor.acl.lanl.gov >you write:

  > OK.  As far as I can tell, the egcs build instructions do not make any sort of
  > distinction like this between "host" and "target".  The instructions imply that
  > these are the same thing.  Nevertheless, I tried the suggestion I received of
  > setting the host to "alpha-cray-unicosmk" to see if that works.  It does not.
No, look at configure.html



To configure egcs: 


     % mkdir objdir 
     % cd objdir 
     % srcdir/configure [target] [options] 

target specification 

     egcs has code to correctly determine the correct value for target for nearly all native
     systems. Therefore, we highly recommend you not provide a configure target when
     configuring a native compiler. 
     target must be specified when configuring a cross compiler; examples of valid targets
     would be i960-rtems, m68k-coff, sh-elf, etc. 


  >  I get the exact same behavior as before.  It works for a while, then says
  > 
  > Configuration alpha-cray-unicosmk not supported
This isn't going to help your problem -- there is no support for cray
targets.  As long as you continue to try and build for a cray target
this is going to fail.
jeff

* Re: porting EGCS to the Cray T3E
  1998-07-14 16:57     ` Julian C. Cummings
@ 1998-07-14 14:29       ` Jeffrey A Law
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey A Law @ 1998-07-14 14:29 UTC (permalink / raw)
  To: Julian C. Cummings; +Cc: egcs

  In message < 9807141424.ZM5539@vapor.acl.lanl.gov >you write:
  > So what would it take for Cygnus to make egcs portable to the Cray T3E?
  > (i.e., how hard is it?, what would it cost?, etc.)
  > egcs would be extremely useful on Cray machines, since Cray CC is terrible
  > and KAI is very slow to update versions of KCC on the T3E.
Note that Cygnus != egcs.

egcs is a project to help build a better free compiler.  Cygnus happens
to be donating various resources to the project (both manpower and
physical resources).

--

Now, having said that, you would need to contact sales@cygnus.com to
get information about a port to the Cray T3E.


jeff

* Re: porting EGCS to the Cray T3E
       [not found]   ` <law@hurl.cygnus.com>
  1997-12-12 15:46     ` Hermann Lauer
@ 1998-07-14 14:29     ` Julian C. Cummings
  1998-07-14 13:20       ` Jeffrey A Law
  1998-07-14 16:57     ` Julian C. Cummings
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 68+ messages in thread
From: Julian C. Cummings @ 1998-07-14 14:29 UTC (permalink / raw)
  To: law; +Cc: egcs

On Jul 14, 11:20am, Jeffrey A Law wrote:
> Subject: Re: porting EGCS to the Cray T3E
>
>   In message < 9807141122.ZM4885@vapor.acl.lanl.gov >you write:
>   > I guess I don't understand your comment.  There are other Cray platforms listed
>   > as possibilities in the config.sub file, such as the Cray X-MP, Y-MP, Cray 2,
>   > and Cray [C,J,T]90.  Why would these options be listed if Cray is
>   > not supported?
> They may be supported as host or build machines, but that does not
> mean they are supported as a target.
>
> ie, gcc does not know how to generate code for any cray that I'm aware of.

OK.  As far as I can tell, the egcs build instructions do not make any sort of
distinction like this between "host" and "target".  The instructions imply that
these are the same thing.  Nevertheless, I tried the suggestion I received of
setting the host to "alpha-cray-unicosmk" to see if that works.  It does not.
I get the exact same behavior as before.  It works for a while, then says

Configuration alpha-cray-unicosmk not supported
Configure in /usr/tmp/julianc/EGCS/egcs-objdir-alpha/gcc failed, exiting.

It sounds like you're saying that egcs cannot generate code for a Cray.  But
these are DEC Alpha processors, so I don't see why not.

Is there some sort of verbosity switch I can throw that might tell me why this
is failing?  There is nothing telltale in the config.log file, as you saw.

Julian C.

-- 
Julian C. Cummings
Advanced Computing Laboratory
Los Alamos National Laboratory
(505) 667-6064
julianc@acl.lanl.gov

* Re: porting EGCS to the Cray T3E
       [not found]   ` <law@hurl.cygnus.com>
  1997-12-12 15:46     ` Hermann Lauer
  1998-07-14 14:29     ` porting EGCS to the Cray T3E Julian C. Cummings
@ 1998-07-14 16:57     ` Julian C. Cummings
  1998-07-14 14:29       ` Jeffrey A Law
       [not found]     ` < 1845.919451010@hurl.cygnus.com >
       [not found]     ` < 13506.920599740@hurl.cygnus.com >
  4 siblings, 1 reply; 68+ messages in thread
From: Julian C. Cummings @ 1998-07-14 16:57 UTC (permalink / raw)
  To: law; +Cc: egcs

On Jul 14, 12:00pm, Jeffrey A Law wrote:
> Subject: Re: porting EGCS to the Cray T3E
>
>   >  I get the exact same behavior as before.  It works for a while, then says
>   >
>   > Configuration alpha-cray-unicosmk not supported
> This isn't going to help your problem -- there is no support for cray
> targets.  As long as you continue to try and build for a cray target
> this is going to fail.
> jeff

I see now why choosing "alpha" as a target won't work.  There is a small
assembly language program in config.guess used to distinguish different
types of Alpha processors.  But this does not compile on the T3E; the
assembly language is different.

So what would it take for Cygnus to make egcs portable to the Cray T3E?
(i.e., how hard is it?, what would it cost?, etc.)
egcs would be extremely useful on Cray machines, since Cray CC is terrible
and KAI is very slow to update versions of KCC on the T3E.

Julian C.

-- 
Julian C. Cummings
Advanced Computing Laboratory
Los Alamos National Laboratory
(505) 667-6064
julianc@acl.lanl.gov

* Re: [Q] alpha egc -> motorolla dragonball
       [not found] <19990218210259.A720@loki.midheimar>
@ 1999-02-19  0:09 ` Scott Howard
       [not found]   ` < 36CD1CD3.1FC47334@objsw.com >
  1999-02-28 22:53   ` Scott Howard
  0 siblings, 2 replies; 68+ messages in thread
From: Scott Howard @ 1999-02-19  0:09 UTC (permalink / raw)
  To: crossgcc, egcs

I haven't tried it, so I'm not really on top of the details, but the gnu
documentation warns about problems when you host a cross-compiler for a
32-bit target on a 64-bit host (which I believe applies to the Alpha).

These issues may soon be (or may have already been) resolved by the EGCS
project.  Can anyone on the EGCS list provide some insight?

Kari Davidsson wrote:
> 
> Hi
> 
> I understand that a cross-compiler hosted on Alpha (Linux Alpha) is somewhat
> difficult to build. Is this something that is absolutely impossible?
> Target would be the Motorola DragonBall CPU.
> 
> Thanks,
> 
> K.D.
> _______________________________________________
> New CrossGCC FAQ: http://www.objsw.com/CrossGCC
> _______________________________________________
> To remove yourself from the crossgcc list, send
> mail to crossgcc-request@cygnus.com with the
> text 'unsubscribe' (without the quotes) in the
> body of the message.

* Re: [Q] alpha egc -> motorolla dragonball
       [not found]   ` < 36CD1CD3.1FC47334@objsw.com >
@ 1999-02-19 11:04     ` Jeffrey A Law
  1999-02-28 22:53       ` Jeffrey A Law
  0 siblings, 1 reply; 68+ messages in thread
From: Jeffrey A Law @ 1999-02-19 11:04 UTC (permalink / raw)
  To: Scott Howard; +Cc: crossgcc, egcs

  In message < 36CD1CD3.1FC47334@objsw.com >you write:
  > I haven't tried it, so I'm not really on top of the details, but the gnu
  > documentation warns about problems when you host a cross-compiler for a
  > 32-bit target on a 64-bit host (which I believe applies to the Alpha).
  > 
  > These issues may soon be (or may have already been) resolved by the EGCS
  > project.  Can anyone on the EGCS list provide some insight?
What problems are you referring to?  As far as I know it's supposed to work.

It's had bugs in the past, and no doubt it'll have bugs in the future, but
that's no different than building a 32x32 or 32x64 cross compiler.



jeff

* Re: [Q] alpha egc -> motorola dragonball
       [not found]     ` < 1845.919451010@hurl.cygnus.com >
@ 1999-02-19 11:09       ` David Edelsohn
  1999-02-28 22:53         ` David Edelsohn
  0 siblings, 1 reply; 68+ messages in thread
From: David Edelsohn @ 1999-02-19 11:09 UTC (permalink / raw)
  To: Scott Howard; +Cc: crossgcc, egcs

	64-bit -> 32-bit cross-compiling is not inherently a problem.
Some ports are not 64-bit safe.  The PowerPC port in the egcs-1.1 release is
not 64-bit safe, although it should be fixed in the development sources.  The
ability to cross-compile from a 64-bit host to a 32-bit target is not a
fundamental limitation in EGCS, but it does depend on the particular port
in question.

David

* gcc-2.7 creates faster code than pgcc-1.1.1
@ 1999-03-04  3:40 Терехин Вячеслав
       [not found] ` < 001401be6633$fed21a60$a18330d4@main.medtech.ru >
  1999-03-31 23:46 ` Терехин Вячеслав
  0 siblings, 2 replies; 68+ messages in thread
From: Терехин Вячеслав @ 1999-03-04  3:40 UTC (permalink / raw)
  To: egcs

As I wrote previously, gcc-2.7.2.3 generates a faster gzip
than egcs-1.1.1/pgcc-1.1.1 does on the PentiumPro.
The slowdown is greater than 10% for decompression.
This is easy to check if you have RedHat 5.2:
the gzip it ships was compiled with gcc-2.7.2.3.

After several days of searching I finally found the offending
instruction that slows down gzip compiled with egcs-1.1.1/pgcc-1.1.1
on a PentiumPro 180MHz (132MB RAM), but the result seems crazy to me.

This instruction is:
andl $255, %eax
in the body of flush_window (util.c), where it is inlined from updcrc.

if you manually replace it with
movzbl %al, %eax
this will boost decompression by 20%.

All of the following was done in the gzip-1.2.4a source directory.

$ make CFLAGS="-O6 -mpentiumpro"
$ time ./gzip -cd egcs-1.1.1.tar.gz > /dev/null

real    0m8.047s
user    0m7.970s
sys     0m0.070s

$ time ./gzip -c egcs-1.1.1.tar.gz > /dev/null

real    0m12.646s
user    0m12.470s
sys     0m0.160s

$ gcc -c -DASMV -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DDIRENT=1 -O6 -mpentiumpro util.c -S
$ sed 's/andl $255,%eax/movzbl %al, %eax/g' util.s > util.S
$ gcc -c -DASMV -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DDIRENT=1 -O6 -mpentiumpro util.S
$ make CFLAGS="-O6 -mpentiumpro"

$ time ./gzip -cd egcs-1.1.1.tar.gz > /dev/null

real    0m6.658s
user    0m6.540s
sys     0m0.110s

$ time ./gzip -c egcs-1.1.1.tar.gz > /dev/null

real    0m12.688s
user    0m12.490s
sys     0m0.180s

As far as I know, none of this applies to the Pentium processor
(I tested it on a Pentium MMX 200MHz).

I do not know why this happens.
If anybody knows how to deal with it, please reply to me
as soon as possible.

And finally, if you have a Pentium Pro or Pentium II, please
run this check and report the result to me.
I wonder whether my Pentium Pro is brain-damaged.

Sincerely Yours, Eugene.

PS I am not on this mailing list.
It would also be better if you sent replies to bom@classic.iki.rssi.ru;
I cannot send from that address directly, as mail from it cannot be delivered to this list.

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found] ` < 001401be6633$fed21a60$a18330d4@main.medtech.ru >
@ 1999-03-04 13:20   ` Jamie Lokier
       [not found]     ` < 19990304222018.A21939@pcep-jamie.cern.ch >
  1999-03-31 23:46     ` Jamie Lokier
  0 siblings, 2 replies; 68+ messages in thread
From: Jamie Lokier @ 1999-03-04 13:20 UTC (permalink / raw)
  To: Терехин Вячеслав,
	egcs


Терехин Вячеслав wrote:
> After several day of search I finally find out offending
> instruction that slow down gzip compiled with egcs-1.1.1/pgcc-1.1.1
> on PentiumPro 180MHz (132MB RAM) but the result seems crazy to me.
> 
> This instruction is:
> andl $255, %eax
> in flush_window (util.c) function body (it is inlined from updcrc)
> 
> if you manually replace it with
> movzbl %al, %eax
> this will boost decompression by 20%.

In the past I have written hand-optimised assembly language, tuned for
the different x86 families, and I found movzbl to be a very effective
instruction on the Pentium Pro.  So what you describe sounds correct.

Another effective idiom is to do xorl %eax,%eax just before loading something into %al.
That is fast on the PPro too.

-- Jamie

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]     ` < 19990304222018.A21939@pcep-jamie.cern.ch >
@ 1999-03-04 17:05       ` Zack Weinberg
       [not found]         ` < 199903050104.UAA15335@octiron.phys.columbia.edu >
  1999-03-31 23:46         ` Zack Weinberg
  0 siblings, 2 replies; 68+ messages in thread
From: Zack Weinberg @ 1999-03-04 17:05 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: egcs

On Thu, 4 Mar 1999 22:20:18 +0100, Jamie Lokier wrote:
>> After several day of search I finally find out offending
>> instruction that slow down gzip compiled with egcs-1.1.1/pgcc-1.1.1
>> on PentiumPro 180MHz (132MB RAM) but the result seems crazy to me.
>> 
>> This instruction is:
>> andl $255, %eax
>> in flush_window (util.c) function body (it is inlined from updcrc)
>> 
>> if you manually replace it with
>> movzbl %al, %eax
>> this will boost decompression by 20%.
>
>In the past I have written hand-optimised assembly language, tuned for
>the different x86 families, and I found movzbl to be a very effective
>instruction on the Pentium Pro.  So what you describe sounds correct.
>
>Another is to do xorl %eax,%eax just before loading something into %al.
>That is fast on the PPro too.

A related issue:  I see us generate a lot of code like this for loops over
strings:

loop:
	xorl %eax, %eax
	movb (%esi), %al
	incl %esi
	movb %al, (%edi)
	incl %edi
	testl %eax, %eax
	jne loop

After the first iteration, the xorl is unnecessary.  We ought to be able to
hoist it out of the loop.

zw

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]         ` < 199903050104.UAA15335@octiron.phys.columbia.edu >
@ 1999-03-04 18:09           ` Jeffrey A Law
  1999-03-31 23:46             ` Jeffrey A Law
  0 siblings, 1 reply; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-04 18:09 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Jamie Lokier, egcs

  In message < 199903050104.UAA15335@octiron.phys.columbia.edu >you write:
  > A related issue:  I see us generate a lot of code like this for loops over
  > strings:
  > 
  > loop:
  > 	xorl %eax, %eax
  > 	movb (%esi), %al
  > 	incl %esi
  > 	movb %al, (%edi)
  > 	incl %edi
  > 	testl %eax
  > 	jne loop
  > 
  > After the first iteration, the xorl is unnecessary.  We ought to be able to
  > hoist it out of the loop.
True, but I don't believe the loop optimizer is prepared to do that since it
doesn't keep track of what bits are active vs what bits are inactive in a 
value.

jeff

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]     ` < 13506.920599740@hurl.cygnus.com >
@ 1999-03-04 20:04       ` David Edelsohn
       [not found]         ` < 9903050403.AA36338@marc.watson.ibm.com >
  1999-03-31 23:46         ` David Edelsohn
  0 siblings, 2 replies; 68+ messages in thread
From: David Edelsohn @ 1999-03-04 20:04 UTC (permalink / raw)
  To: law; +Cc: Zack Weinberg, Jamie Lokier, egcs

>>>>> Jeffrey A Law writes:

Jeff> True, but I don't believe the loop optimizer is prepared to do that since it
Jeff> doesn't keep track of what bits are active vs what bits are inactive in a 
Jeff> value.

	GCC is missing a general feature of value propagation which would
help with a lot of optimizations like this.  Hopefully this infrastructure
will be added or contributed someday.

David

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]         ` < 9903050403.AA36338@marc.watson.ibm.com >
@ 1999-03-04 20:31           ` Jeffrey A Law
       [not found]             ` < 13939.920608288@hurl.cygnus.com >
  1999-03-31 23:46             ` Jeffrey A Law
  1999-03-07 11:01           ` Zack Weinberg
  1 sibling, 2 replies; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-04 20:31 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Zack Weinberg, Jamie Lokier, egcs

  In message < 9903050403.AA36338@marc.watson.ibm.com >you write:
  > 	GCC is missing a general feature of value propagation which would
  > help with a lot of optimizations like this.  Hopefully this infrastructure
  > will be added or contributed someday.
Most of the papers I've read have indicated only trivial gains from value
range propagation.  I've also had discussions with folks that have implemented
this opt in a commercial compiler -- it's such a minor win that they didn't
consider it worth the effort.

Its best use appears to be for optimizing array bounds checking in languages
that require such checks.


jeff

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]             ` < 13939.920608288@hurl.cygnus.com >
@ 1999-03-05  6:53               ` craig
       [not found]                 ` < 19990305143358.4747.qmail@deer >
  1999-03-31 23:46                 ` craig
  0 siblings, 2 replies; 68+ messages in thread
From: craig @ 1999-03-05  6:53 UTC (permalink / raw)
  To: law; +Cc: craig

>Most of the papers I've read have indicated only trivial gains from value
>range propagation.  I've also had discussions with folks that have implemented
>this opt in a commercial compiler -- it's so minor of a win that they didn't
>consider it worth the effort.

If that were really true *generally*, i.e. for floating-point as well,
then we could happily make -mieee the default for egcs/gcc on Alphas,
for example.  (This would make Alpha-generated floating-point code
much slower, of course, which is why I mention it: if it was possible
to use value-range propagation to determine when the special code-
generation normally needed for full IEEE range wasn't needed after all,
then some of that performance could be regained.)

But, I have to admit I can't document any really clear performance
wins from my own hand-tuned assembly code (written over the past few
decades) deriving solely from value-range propagation, especially
for *integer* (versus floating-point) values.

Generally, I think being able to avoid even checking for exceptional
values might be one source of performance wins from this: e.g. if
we knew it was "impossible" to divide by zero here, square-root a
negative number there, and so on, we could avoid generating the extra
code, making I-cache misses theoretically less frequent.

        tq vm, (burley)

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]                 ` < 19990305143358.4747.qmail@deer >
@ 1999-03-05  9:30                   ` Jeffrey A Law
       [not found]                     ` < 15755.920655014@hurl.cygnus.com >
  1999-03-31 23:46                     ` Jeffrey A Law
  0 siblings, 2 replies; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-05  9:30 UTC (permalink / raw)
  To: craig; +Cc: dje, zack, egcs, egcs

  In message < 19990305143358.4747.qmail@deer >you write:
  > If that we really true *generally*, i.e. for floating-point as well,
  > then we could happily make -mieee the default for egcs/gcc on Alphas,
  > for example.  (This would make Alpha-generated floating-point code
  > much slower, of course, which is why I mention it: if it was possible
  > to use value-range propagation to determine when the special code-
  > generation normally needed for full IEEE range wasn't needed after all,
  > then some of that performance could be regained.)
None of the papers discussed it in a floating point context, merely from an
integer context.  The basics were you had a range [min,max] and a bit which
indicated that the value was in or out of the range which was built up on a
basic block level for each expression.

The ranges were then propagated through the flow graph like any other local
property.  A trivial example: at a flow merge point where one path had a
range of 0..4 inclusive and the other a range of 6..12 inclusive, the
resulting range would be 0..12 inclusive. [ It didn't try to track the hole
at 5. ]

As you mention it could be used to detect and eliminate things like domain
checks which depend solely on the range, not the precision.

Knowing that certain numbers can't be a NaN or Inf would make it possible
to apply more identity operations on floating point values.

It'd be a lot of work though.  I suspect there's other lower hanging fruit
we can/should go after (assignment motion, partial dead code elimination,
sparse conditional constant propagation, etc).

jeff

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]                     ` < 15755.920655014@hurl.cygnus.com >
@ 1999-03-05 10:18                       ` Joe Buck
  1999-03-31 23:46                         ` Joe Buck
  1999-03-05 10:19                       ` craig
  1 sibling, 1 reply; 68+ messages in thread
From: Joe Buck @ 1999-03-05 10:18 UTC (permalink / raw)
  To: law; +Cc: craig, dje, zack, egcs, egcs

> None of the papers discussed it in a floating point context, merely from an
> integer context.  The basics were you had a range [min,max] and a bit which
> indicated that the value was in or out of the range which was built up on a
> basic block level for each expression.
> ...
> As you mention it could be used to detect and eliminate things like domain
> checks which depend solely on the range, not the precision.

There's been work on fixed point optimization for embedded DSP that also
propagates precision information.  e.g. we have [min,max,prec] which means
that the value is known to be a multiple of pow(2,prec), as well as being
in [min,max].  The idea is that we want to eliminate both the
overflow-check operations and the rounding operations in generated code.
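One way such a [min,max,prec] triple could be represented and used is sketched below. This is an illustration under our own assumptions (the names and the merge rule are ours), not a description of any published system:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fixed-point fact: the value lies in [min, max] and is
   known to be a multiple of 2^prec. */
struct fxinfo {
    long long min, max;
    int prec;
};

/* At a merge point, widen the interval and keep the weaker guarantee
   (the smaller power of two). */
static struct fxinfo fx_join(struct fxinfo a, struct fxinfo b)
{
    struct fxinfo r;

    r.min = a.min < b.min ? a.min : b.min;
    r.max = a.max > b.max ? a.max : b.max;
    r.prec = a.prec < b.prec ? a.prec : b.prec;
    return r;
}

/* Rounding to a multiple of 2^k is a no-op if the value is already a
   multiple of 2^prec with prec >= k. */
static bool rounding_is_redundant(struct fxinfo v, int k)
{
    return v.prec >= k;
}

/* An overflow check for a two's-complement type of 'bits' bits is
   redundant if the whole interval fits in that type. */
static bool overflow_check_is_redundant(struct fxinfo v, int bits)
{
    assert(bits >= 2 && bits <= 62);
    long long lo = -(1LL << (bits - 1));
    long long hi = (1LL << (bits - 1)) - 1;
    return lo <= v.min && v.max <= hi;
}
```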

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]                     ` < 15755.920655014@hurl.cygnus.com >
  1999-03-05 10:18                       ` Joe Buck
@ 1999-03-05 10:19                       ` craig
  1999-03-31 23:46                         ` craig
  1 sibling, 1 reply; 68+ messages in thread
From: craig @ 1999-03-05 10:19 UTC (permalink / raw)
  To: law; +Cc: craig

>It'd be a lot of work though.  I suspect there's other lower hanging fruit
>we can/should go after (assignment motion, partial dead code elimination,
>sparse conditional constant propagation, etc).

No analysis to offer, but: my instincts say you're right.

        tq vm, (burley)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]         ` < 9903050403.AA36338@marc.watson.ibm.com >
  1999-03-04 20:31           ` Jeffrey A Law
@ 1999-03-07 11:01           ` Zack Weinberg
  1999-03-31 23:46             ` Zack Weinberg
  1 sibling, 1 reply; 68+ messages in thread
From: Zack Weinberg @ 1999-03-07 11:01 UTC (permalink / raw)
  To: David Edelsohn; +Cc: law, egcs

On Thu, 04 Mar 1999 23:03:58 -0500, David Edelsohn wrote:
>>>>>> Jeffrey A Law writes:
>
>Jeff> True, but I don't believe the loop optimizer is prepared to do that since it
>Jeff> doesn't keep track of what bits are active vs what bits are inactive in a
>Jeff> value.
>
>	GCC is missing a general feature of value propagation which would
>help with a lot of optimizations like this.  Hopefully this infrastructure
>will be added or contributed someday.

For the case I'm interested in, we don't need general value propagation,
only to recognize that we are loading QImode values into a register without
sign extension.  Mode-based range analysis ought to be simpler than full
value propagation.

If we knew how to generate QImode compares, we wouldn't need to clear the
register at all.  That may be something fixable in i386.md, I'm not sure.

zw

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 20:04       ` gcc-2.7 creates faster code than pgcc-1.1.1 David Edelsohn
       [not found]         ` < 9903050403.AA36338@marc.watson.ibm.com >
@ 1999-03-31 23:46         ` David Edelsohn
  1 sibling, 0 replies; 68+ messages in thread
From: David Edelsohn @ 1999-03-31 23:46 UTC (permalink / raw)
  To: law; +Cc: Zack Weinberg, Jamie Lokier, egcs

>>>>> Jeffrey A Law writes:

Jeff> True, but I don't believe the loop optimizer is prepared to do that since it
Jeff> doesn't keep track of what bits are active vs what bits are inactive in a 
Jeff> value.

	GCC is missing a general feature of value propagation which would
help with a lot of optimizations like this.  Hopefully this infrastructure
will be added or contributed someday.

David

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 13:20   ` Jamie Lokier
       [not found]     ` < 19990304222018.A21939@pcep-jamie.cern.ch >
@ 1999-03-31 23:46     ` Jamie Lokier
  1 sibling, 0 replies; 68+ messages in thread
From: Jamie Lokier @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Терехин
	Вячеслав,
	egcs

[-- Attachment #1: Type: text/plain, Size: 801 bytes --]

Терехин Вячеслав wrote:
> After several days of searching I finally found the offending
> instruction that slows down gzip compiled with egcs-1.1.1/pgcc-1.1.1
> on a PentiumPro 180MHz (132MB RAM), but the result seems crazy to me.
> 
> The instruction is:
> andl $255, %eax
> in the body of flush_window() (util.c); it is inlined from updcrc().
> 
> If you manually replace it with
> movzbl %al, %eax
> decompression gets 20% faster.

In the past I have written hand-optimised assembly language, tuned for
the different x86 families, and I found movzbl to be a very effective
instruction on the Pentium Pro.  So what you describe sounds correct.

Another is to do xorl %eax,%eax just before loading something into %al.
That is fast on the PPro too.

-- Jamie

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 20:31           ` Jeffrey A Law
       [not found]             ` < 13939.920608288@hurl.cygnus.com >
@ 1999-03-31 23:46             ` Jeffrey A Law
  1 sibling, 0 replies; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Zack Weinberg, Jamie Lokier, egcs

  In message < 9903050403.AA36338@marc.watson.ibm.com >you write:
  > 	GCC is missing a general feature of value propagation which would
  > help with a lot of optimizations like this.  Hopefully this infrastructure
  > will be added or contributed someday.
Most of the papers I've read have indicated only trivial gains from value
range propagation.  I've also had discussions with folks that have implemented
this opt in a commercial compiler -- it's so minor of a win that they didn't
consider it worth the effort.

Its best use appears to be optimizing array bounds checking in languages
that require such checks.


jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04  3:40 gcc-2.7 creates faster code than pgcc-1.1.1 Терехин Вячеслав
       [not found] ` < 001401be6633$fed21a60$a18330d4@main.medtech.ru >
@ 1999-03-31 23:46 ` Терехин Вячеслав
  1 sibling, 0 replies; 68+ messages in thread
From: Терехин Вячеслав @ 1999-03-31 23:46 UTC (permalink / raw)
  To: egcs

As I wrote previously, gcc-2.7.2.3 generates a faster gzip
than egcs-1.1.1/pgcc-1.1.1 does on the PentiumPro.
The slowdown is greater than 10% for decompression.
This is easy to check if you have RedHat 5.2, whose
shipped gzip was compiled with gcc-2.7.2.3.

After several days of searching I finally found the offending
instruction that slows down gzip compiled with egcs-1.1.1/pgcc-1.1.1
on a PentiumPro 180MHz (132MB RAM), but the result seems crazy to me.

The instruction is:
andl $255, %eax
in the body of flush_window() (util.c); it is inlined from updcrc().

If you manually replace it with
movzbl %al, %eax
decompression gets 20% faster.
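For context, the hot loop in flush_window() is the CRC-32 update. The sketch below is a table-driven CRC-32 written in the same style as gzip's updcrc() (for illustration only, not copied from util.c); the `& 0xff` table index is the kind of expression that becomes an `andl $255`, or equivalently a `movzbl`, zero extension:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Build the table for the reflected CRC-32 polynomial 0xEDB88320. */
static void crc_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
}

/* One table lookup per byte; "& 0xff" keeps only the low byte of the
   running CRC xor the input byte. */
static uint32_t crc32_buf(const unsigned char *s, size_t n)
{
    uint32_t c = 0xffffffffu;

    while (n--)
        c = crc_table[(c ^ *s++) & 0xff] ^ (c >> 8);
    return c ^ 0xffffffffu;
}
```

Call `crc_init()` once before `crc32_buf()`; the standard check value for the string "123456789" is 0xCBF43926.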

Everything below was done in the gzip-1.2.4a source directory.

$ make CFLAGS="-O6 -mpentiumpro"
$ time ./gzip -cd egcs-1.1.1.tar.gz > /dev/null

real    0m8.047s
user    0m7.970s
sys     0m0.070s

$ time ./gzip -c egcs-1.1.1.tar.gz > /dev/null

real    0m12.646s
user    0m12.470s
sys     0m0.160s

$ gcc -c -DASMV -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DDIRENT=1 -O6 -mpentiumpro util.c -S
$ sed 's/andl $255,%eax/movzbl %al, %eax/g' util.s > util.S
$ gcc -c -DASMV -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DDIRENT=1 -O6 -mpentiumpro util.S
$ make CFLAGS="-O6 -mpentiumpro"

$ time ./gzip -cd egcs-1.1.1.tar.gz > /dev/null

real    0m6.658s
user    0m6.540s
sys     0m0.110s

$ time ./gzip -c egcs-1.1.1.tar.gz > /dev/null

real    0m12.688s
user    0m12.490s
sys     0m0.180s
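The timings above can be turned into percentages with a little arithmetic. A small helper (hypothetical names) keeps the speedup ratio and the fraction of time saved apart:

```c
#include <assert.h>

/* Speedup ratio old/new: 1.2 means the new code runs 1.2x as fast. */
static double speedup(double old_s, double new_s)
{
    assert(old_s > 0.0 && new_s > 0.0);
    return old_s / new_s;
}

/* Fraction of the original running time that was saved. */
static double time_saved(double old_s, double new_s)
{
    assert(old_s > 0.0);
    return (old_s - new_s) / old_s;
}
```

With the decompression times, `speedup(8.047, 6.658)` is about 1.21 and `time_saved(8.047, 6.658)` about 0.17, so the "20%" corresponds to the speedup ratio, while the elapsed time fell by roughly 17%. The compression times give a ratio just under 1, i.e. essentially no change.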

As far as I know, none of this applies to the Pentium processor
(I tested it on a Pentium MMX 200MHz).

I do not know why this happens.
If anybody knows how to deal with it, please reply to me
as soon as possible.

Finally, if you have a Pentium Pro or Pentium II, please
run this check and report the result to me.
I wonder whether I have a brain-damaged Pentium Pro.

Sincerely Yours, Eugene.

PS I am not on this mailing list.
It would also be better if you sent replies to bom@classic.iki.rssi.ru;
I cannot post from that address directly, because mail from it cannot be
delivered to this list.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 18:09           ` Jeffrey A Law
@ 1999-03-31 23:46             ` Jeffrey A Law
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Jamie Lokier, egcs

  In message < 199903050104.UAA15335@octiron.phys.columbia.edu >you write:
  > A related issue:  I see us generate a lot of code like this for loops over
  > strings:
  > 
  > loop:
  > 	xorl %eax, %eax
  > 	movb (%esi), %al
  > 	incl %esi
  > 	movb %al, (%edi)
  > 	incl %edi
  > 	testl %eax, %eax
  > 	jne loop
  > 
  > After the first iteration, the xorl is unnecessary.  We ought to be able to
  > hoist it out of the loop.
True, but I don't believe the loop optimizer is prepared to do that since it
doesn't keep track of what bits are active vs what bits are inactive in a 
value.
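A minimal sketch of the bookkeeping described as missing here: track, per register, which bits may be nonzero. The 32-bit mask and all names are hypothetical assumptions, not GCC internals:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit i of the mask is 1 if bit i of the register may be nonzero. */
typedef uint32_t nzmask;

/* xorl %eax,%eax: every bit is known to be zero. */
static nzmask nz_after_clear(void)
{
    return 0;
}

/* movb mem,%al: only the low byte is written; the upper 24 bits keep
   whatever was known about them before the load. */
static nzmask nz_after_byte_load(nzmask before)
{
    return (before & 0xffffff00u) | 0xffu;
}

/* Where two paths meet (e.g. a loop header), a bit may be nonzero if it
   may be nonzero on either path. */
static nzmask nz_join(nzmask a, nzmask b)
{
    return a | b;
}

/* "andl $imm,%eax" changes nothing if no bit outside imm can be set. */
static bool nz_and_is_redundant(nzmask m, uint32_t imm)
{
    return (m & ~imm) == 0;
}
```

If the clear is done once before the loop, the loop header sees the join of 0 (entry) and 0xff (back edge after a byte load), which is still 0xff: the upper bits stay zero on every iteration, which is the fact needed to hoist the xorl.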

jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 17:05       ` Zack Weinberg
       [not found]         ` < 199903050104.UAA15335@octiron.phys.columbia.edu >
@ 1999-03-31 23:46         ` Zack Weinberg
  1 sibling, 0 replies; 68+ messages in thread
From: Zack Weinberg @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: egcs

On Thu, 4 Mar 1999 22:20:18 +0100, Jamie Lokier wrote:
>> After several days of searching I finally found the offending
>> instruction that slows down gzip compiled with egcs-1.1.1/pgcc-1.1.1
>> on a PentiumPro 180MHz (132MB RAM), but the result seems crazy to me.
>> 
>> The instruction is:
>> andl $255, %eax
>> in the body of flush_window() (util.c); it is inlined from updcrc().
>> 
>> If you manually replace it with
>> movzbl %al, %eax
>> decompression gets 20% faster.
>
>In the past I have written hand-optimised assembly language, tuned for
>the different x86 families, and I found movzbl to be a very effective
>instruction on the Pentium Pro.  So what you describe sounds correct.
>
>Another is to do xorl %eax,%eax just before loading something into %al.
>That is fast on the PPro too.

A related issue:  I see us generate a lot of code like this for loops over
strings:

loop:
	xorl %eax, %eax
	movb (%esi), %al
	incl %esi
	movb %al, (%edi)
	incl %edi
	testl %eax, %eax
	jne loop

After the first iteration, the xorl is unnecessary.  We ought to be able to
hoist it out of the loop.
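A C loop of roughly this shape is the kind of source that can produce the sequence above; whether a given compiler emits exactly `xorl`/`movb` for it is an assumption, not something verified here. The function name is hypothetical:

```c
#include <assert.h>

/* Copy a NUL-terminated string one byte at a time.  Each byte is
   zero-extended into an int (the xorl + movb pair), stored, and then
   tested for the terminating NUL. */
static char *copy_string(char *dst, const char *src)
{
    char *d = dst;
    int c;

    do {
        c = (unsigned char)*src++;   /* zero-extending byte load */
        *d++ = (char)c;
    } while (c != 0);
    return dst;
}
```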

zw

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-05  6:53               ` craig
       [not found]                 ` < 19990305143358.4747.qmail@deer >
@ 1999-03-31 23:46                 ` craig
  1 sibling, 0 replies; 68+ messages in thread
From: craig @ 1999-03-31 23:46 UTC (permalink / raw)
  To: law; +Cc: craig

>Most of the papers I've read have indicated only trivial gains from value
>range propagation.  I've also had discussions with folks that have implemented
>this opt in a commercial compiler -- it's so minor of a win that they didn't
>consider it worth the effort.

If that were really true *generally*, i.e. for floating-point as well,
then we could happily make -mieee the default for egcs/gcc on Alphas,
for example.  (This would make Alpha-generated floating-point code
much slower, of course, which is why I mention it: if it was possible
to use value-range propagation to determine when the special code-
generation normally needed for full IEEE range wasn't needed after all,
then some of that performance could be regained.)

But, I have to admit I can't document any really clear performance
wins from my own hand-tuned assembly code (written over the past few
decades) deriving solely from value-range propagation, especially
for *integer* (versus floating-point) values.

Generally, I think being able to avoid even checking for exceptional
values might be one source of performance wins from this: e.g. if
we knew it was "impossible" to divide by zero here, square-root a
negative number there, and so on, we could avoid generating the extra
code, thus making I-cache misses (theoretically) less frequent.

        tq vm, (burley)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-05  9:18               ` Jeffrey A Law
@ 1999-03-31 23:46                 ` Jeffrey A Law
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: craig; +Cc: egcs

  In message < 19990305145306.4796.qmail@deer >you write:
  > >It is easier to back it out if we decide it is not a good idea after
  > >all.
  > 
  > Just for my own sanity: we're not in any way seriously considering
  > this *performance* patch for 1.1.2, are we??!
Absolutely not.

Once we can get 1.1.2 finished I'll dig out my work to improve the zero
extension code and submit it.  It's far more complete than HJ's hack.

jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 16:04   ` Martin v. Loewis
       [not found]     ` < 199903050001.BAA00973@mira.isdn.cs.tu-berlin.de >
@ 1999-03-31 23:46     ` Martin v. Loewis
  1 sibling, 0 replies; 68+ messages in thread
From: Martin v. Loewis @ 1999-03-31 23:46 UTC (permalink / raw)
  To: hjl; +Cc: medtekh, egcs

> -  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
> +  if ((TARGET_ZERO_EXTEND_WITH_AND || (0 & REGNO (operands[0]) == 0))

It's late, so I'm probably going to say stupid things, but ...

Isn't (0 & REGNO (operands[0]) == 0) always 0? Why isn't the condition
just deleted?
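Martin's reading is right: in C, `==` binds more tightly than `&`. The sketch below (hypothetical function names) shows how the patched expression parses and a more conventional way to switch a term off. `gcc -Wall` warns about the first form (-Wparentheses):

```c
#include <assert.h>

/* Parsed as 0 & (regno == 0); ANDing with 0 gives 0 whatever the
   comparison yields, so the condition is always false. */
static int bitand_form(unsigned regno)
{
    return 0 & regno == 0;
}

/* The same operands with the precedence written out explicitly. */
static int parenthesized_form(unsigned regno)
{
    return 0 & (regno == 0);
}

/* '&&' short-circuits, so regno is never even compared. */
static int logical_and_form(unsigned regno)
{
    return 0 && regno == 0;
}
```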

Curious,
Martin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 14:31 H.J. Lu
       [not found] ` < m10Igdx-000AUaC@shanghai.varesearch.com >
@ 1999-03-31 23:46 ` H.J. Lu
  1 sibling, 0 replies; 68+ messages in thread
From: H.J. Lu @ 1999-03-31 23:46 UTC (permalink / raw)
  To: medtekh; +Cc: egcs

Hi,

It seems that "movzb? %al,%?ax" may be faster than "and? $255,%?ax".
This patch for egcs 1.1.2 seems to make gzip faster.
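At the C level both instructions compute the same value; the sketch below models each on a 32-bit `%eax`. Any difference between them lies in encoding size and execution (for example, `and` updates the flags while `movzbl` leaves them alone), not in the result. The function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "andl $255,%eax": keep the low 8 bits of %eax. */
static uint32_t and_255(uint32_t eax)
{
    return eax & 255u;
}

/* "movzbl %al,%eax": zero-extend the low byte %al into %eax. */
static uint32_t movzbl_al(uint32_t eax)
{
    return (uint32_t)(uint8_t)eax;
}

/* Check the equivalence on a few representative values. */
static void check_and_vs_movzbl(void)
{
    static const uint32_t samples[] = {0u, 0xffu, 0x100u, 0x12345678u, 0xffffffffu};

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(and_255(samples[i]) == movzbl_al(samples[i]));
}
```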

Thanks.


-- 
H.J. Lu (hjl@gnu.org)
---
Thu Mar  4 14:04:49 1999  H.J. Lu  (hjl@gnu.org)

	* config/i386/i386.md (zero_extendqihi2): Use "and" when target
	and source are both "ax" only if TARGET_ZERO_EXTEND_WITH_AND is
	true.
	(zero_extendqisi2): Likewise.

--- ../../../import/egcs-1.1.x/egcs/gcc/config/i386/i386.md	Sun Feb 14 08:30:40 1999
+++ config/i386/i386.md	Thu Mar  4 13:46:07 1999
@@ -1738,7 +1741,7 @@
   {
   rtx xops[2];
 
-  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
+  if ((TARGET_ZERO_EXTEND_WITH_AND || (0 & REGNO (operands[0]) == 0))
       && REG_P (operands[1]) 
       && REGNO (operands[0]) == REGNO (operands[1]))
     {
@@ -1819,7 +1822,7 @@
   {
   rtx xops[2];
 
-  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
+  if ((TARGET_ZERO_EXTEND_WITH_AND || (0 & REGNO (operands[0]) == 0))
       && REG_P (operands[1]) 
       && REGNO (operands[0]) == REGNO (operands[1]))
     {

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-05  9:22   ` Alfred Perlstein
       [not found]     ` < Pine.BSF.3.96.990305121935.7355C-100000@cygnus.rush.net >
@ 1999-03-31 23:46     ` Alfred Perlstein
  1 sibling, 0 replies; 68+ messages in thread
From: Alfred Perlstein @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Терехин
	Вячеслав
  Cc: H.J. Lu, egcs

[-- Attachment #1: Type: text/plain, Size: 1548 bytes --]

On Fri, 5 Mar 1999, Терехин Вячеслав wrote:

> >Hi,
> >
> >It seems that "movzb? %al,%?ax" may be faster than "and? $255,%?ax".
> >This patch for egcs 1.1.2 seems to make gzip faster.
> >
> >Thanks.
> >
> >
> 
> 
> Yes, it may be, but not always; as you can see from my
> message, this is not the case here:
> decompression becomes faster, while compression becomes slower.
> 
> Moreover, this generally slows code down. I have my own patch to egcs
> that does the same thing as yours. To turn on suppression of andl in favor
> of movz, use -mextendz-with-movz.
> Compiling several programs shows a general slowdown.
> 
> In any case, "movzb? %al,%?ax" and "and? $255,%?ax" both take 1 tick.
> So these instructions are something of a mystery.

I think the magic lies in the register renaming, instruction
caches and all the 'behind the scenes' optimizations PPro and later
versions of x86 chips can do.  It really should be investigated more.

Stalling any one of these features can kill performance.  Any ideas
on which feature could be hit by this in gzip, but at the same time
cause bad behavior in other applications?

-Alfred

> 
> As you can see from my message, this change in the decompression code
> yields a 20% performance boost. At the same time, the whole loop dealing
> with the CRC is 0x15 bytes long and takes 50% of the time. Changing that
> one instruction (5 bytes vs. 3 bytes long) saves 20% of the total time, or
> 40% of the loop time. This should be impossible. But it is so.
> 
> Sincerely Yours, Eugene.
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 18:08       ` Jeffrey A Law
       [not found]         ` < 13494.920599668@hurl.cygnus.com >
@ 1999-03-31 23:46         ` Jeffrey A Law
  1 sibling, 0 replies; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: hjl, medtekh, egcs

  In message < 199903050001.BAA00973@mira.isdn.cs.tu-berlin.de >you write:
  > > -  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
  > > +  if ((TARGET_ZERO_EXTEND_WITH_AND || (0 & REGNO (operands[0]) == 0))
  > 
  > It's late, so I'm probably going to say stupid things, but ...
  > 
  > Isn't (0 & REGNO (operands[0]) == 0) always 0? Why isn't the condition
  > just deleted?
Disabling the code like that is actually the wrong thing to do for certain
processor variants.

I need to dust off my changes to this code which do the right thing when
optimizing for size, PPro, Pent and older x86 variants.

It's not as simple as just deleting the test like that.
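A sketch of what a per-target choice might look like. This is NOT Jeff's patch: the policy is assembled only from facts stated in this thread (movzbl is 3 bytes and andl $255,%eax is 5; the egcs-1.1 TARGET_ZERO_EXTEND_WITH_AND test selects "and" on the 486 and Pentium among the CPUs listed), and the enum and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU list mirroring the processors discussed here. */
enum cpu { CPU_I386, CPU_I486, CPU_PENTIUM, CPU_PENTIUMPRO };

enum zext_insn { ZEXT_AND, ZEXT_MOVZ };

/* Choose how to zero-extend %al into %eax. */
static enum zext_insn choose_zero_extend(enum cpu cpu, bool optimize_size)
{
    if (optimize_size)
        return ZEXT_MOVZ;          /* 3 bytes instead of 5 */
    if (cpu == CPU_I486 || cpu == CPU_PENTIUM)
        return ZEXT_AND;           /* the CPUs that prefer "and" */
    return ZEXT_MOVZ;              /* 386 and PPro */
}
```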

jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 20:14               ` Jeffrey A Law
@ 1999-03-31 23:46                 ` Jeffrey A Law
  0 siblings, 0 replies; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: H.J. Lu; +Cc: egcs

  In message < m10Ilq3-00000YC@ocean.lucon.org >you write:

  > Just keep in mind that this small change speeds up gzip by 10-20% on
  > PPro, P/II and maybe P/III. It may be true also for 386. According
  > to TARGET_ZERO_EXTEND_WITH_AND, only 486 and Pentium seem to prefer
  > "and".
But that does not make the patch correct.  Period.


  > BTW, FWIW, "movzbl" has 3 bytes and "andl" has 5 bytes. So "movzbl" is
  > smaller and faster on 386, PPro, P/II and P/III.
Correct.  That is precisely one of the changes in my patch.

jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-05  6:52           ` craig
       [not found]             ` < 19990305145306.4796.qmail@deer >
@ 1999-03-31 23:46             ` craig
  1 sibling, 0 replies; 68+ messages in thread
From: craig @ 1999-03-31 23:46 UTC (permalink / raw)
  To: egcs; +Cc: craig

>It is easier to back it out if we decide it is not a good idea after
>all.

Just for my own sanity: we're not in any way seriously considering
this *performance* patch for 1.1.2, are we??!

It'd be great to see it, or something like it, in 1.2, of course.

        tq vm, (burley)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 23:11 Терехин Вячеслав
       [not found] ` < 005601be66d7$ae033480$288230d4@main.medtech.ru >
@ 1999-03-31 23:46 ` Терехин Вячеслав
  1 sibling, 0 replies; 68+ messages in thread
From: Терехин Вячеслав @ 1999-03-31 23:46 UTC (permalink / raw)
  To: H.J. Lu; +Cc: egcs

[-- Attachment #1: Type: text/plain, Size: 988 bytes --]

>Hi,
>
>It seems that "movzb? %al,%?ax" may be faster than "and? $255,%?ax".
>This patch for egcs 1.1.2 seems to make gzip faster.
>
>Thanks.
>
>


Yes, it may be, but not always; as you can see from my
message, this is not the case here:
decompression becomes faster, while compression becomes slower.

Moreover, this generally slows code down. I have my own patch to egcs
that does the same thing as yours. To turn on suppression of andl in favor
of movz, use -mextendz-with-movz.
Compiling several programs shows a general slowdown.

In any case, "movzb? %al,%?ax" and "and? $255,%?ax" both take 1 tick.
So these instructions are something of a mystery.

As you can see from my message, this change in the decompression code
yields a 20% performance boost. At the same time, the whole loop dealing
with the CRC is 0x15 bytes long and takes 50% of the time. Changing that
one instruction (5 bytes vs. 3 bytes long) saves 20% of the total time, or
40% of the loop time. This should be impossible. But it is so.
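The 40% figure follows from the 50% and 20% figures, assuming all of the saving happens inside the loop (an Amdahl's-law style calculation; the function name is hypothetical):

```c
#include <assert.h>

/* If a loop takes fraction 'loop_share' of the run time and the whole
   program saves fraction 'total_saved' of its time, the loop itself
   must have saved total_saved / loop_share of its own time, provided
   nothing outside the loop changed. */
static double loop_time_saved(double total_saved, double loop_share)
{
    assert(loop_share > 0.0 && total_saved >= 0.0 && total_saved <= loop_share);
    return total_saved / loop_share;
}
```

`loop_time_saved(0.20, 0.50)` gives 0.40, i.e. a single instruction change would have to remove 40% of the loop's time.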

Sincerely Yours, Eugene.

[-- Attachment #2: egcs.patch --]
[-- Type: text/x-diff, Size: 4005 bytes --]

--- egcs-1.1.1/gcc/config/i386/i386.h.orig	Thu Mar  4 10:59:56 1999
+++ egcs-1.1.1/gcc/config/i386/i386.h	Thu Mar  4 13:23:32 1999
@@ -80,7 +80,7 @@
 /* Masks for the -m switches */
 #define MASK_80387		000000000001	/* Hardware floating point */
 #define MASK_NOTUSED1		000000000002	/* bit not currently used */
-#define MASK_NOTUSED2		000000000004	/* bit not currently used */
+#define MASK_EXTENDZ_MOVZ	000000000004	/* Generate movzbl %al, %eax instead of andl $255, %eax */
 #define MASK_RTD		000000000010	/* Use ret that pops args */
 #define MASK_ALIGN_DOUBLE	000000000020	/* align doubles to 2 word boundary */
 #define MASK_SVR3_SHLIB		000000000040	/* Uninit locals into bss */
@@ -157,8 +157,9 @@
 #define TARGET_PENTIUMPRO (ix86_cpu == PROCESSOR_PENTIUMPRO)
 #define TARGET_USE_LEAVE (ix86_cpu == PROCESSOR_I386)
 #define TARGET_PUSH_MEMORY (ix86_cpu == PROCESSOR_I386)
-#define TARGET_ZERO_EXTEND_WITH_AND (ix86_cpu != PROCESSOR_I386 \
-				     && ix86_cpu != PROCESSOR_PENTIUMPRO)
+#define TARGET_EXTENDZ_MOVZ	((target_flags & MASK_EXTENDZ_MOVZ) != 0)	/* Generate movzbl %al, %eax instead of andl $255, %eax */
+#define TARGET_ZERO_EXTEND_WITH_AND (!TARGET_EXTENDZ_MOVZ || (ix86_cpu != PROCESSOR_I386 \
+				     && ix86_cpu != PROCESSOR_PENTIUMPRO))
 #define TARGET_DOUBLE_WITH_ADD (ix86_cpu != PROCESSOR_I386)
 #define TARGET_USE_BIT_TEST (ix86_cpu == PROCESSOR_I386)
 #define TARGET_UNROLL_STRLEN (ix86_cpu != PROCESSOR_I386)
@@ -207,6 +208,8 @@
   { "no-debug-arg",		-MASK_DEBUG_ARG },			\
   { "stack-arg-probe",		 MASK_STACK_PROBE },			\
   { "no-stack-arg-probe",	-MASK_STACK_PROBE },			\
+  { "extendz-with-movz",       	 MASK_EXTENDZ_MOVZ },			\
+  { "no-extendz-with-movz",     -MASK_EXTENDZ_MOVZ },			\
   { "windows",			0 },					\
   { "dll",			0 },					\
   SUBTARGET_SWITCHES							\
--- egcs-1.1.1/gcc/config/i386/i386.md.orig	Thu Mar  4 11:00:01 1999
+++ egcs-1.1.1/gcc/config/i386/i386.md	Thu Mar  4 13:09:21 1999
@@ -1674,7 +1674,7 @@
   {
   rtx xops[2];
 
-  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0) 
+  if (!TARGET_EXTENDZ_MOVZ && (TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
       && REG_P (operands[1]) && REGNO (operands[0]) == REGNO (operands[1]))
     {
       xops[0] = operands[0];
@@ -1738,7 +1738,7 @@
   {
   rtx xops[2];
 
-  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
+  if (!TARGET_EXTENDZ_MOVZ && (TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
       && REG_P (operands[1]) 
       && REGNO (operands[0]) == REGNO (operands[1]))
     {
@@ -1819,7 +1819,7 @@
   {
   rtx xops[2];
 
-  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
+  if (!TARGET_EXTENDZ_MOVZ && (TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
       && REG_P (operands[1]) 
       && REGNO (operands[0]) == REGNO (operands[1]))
     {
@@ -3553,13 +3553,13 @@
   switch (GET_CODE (operands[2]))
     {
     case CONST_INT:
-      if (GET_CODE (operands[0]) == MEM && MEM_VOLATILE_P (operands[0]))
+    if (GET_CODE (operands[0]) == MEM && MEM_VOLATILE_P (operands[0]))
 	break;
       intval = INTVAL (operands[2]);
       /* zero-extend 16->32? */
       if (intval == 0xffff && REG_P (operands[0])
 	  && (! REG_P (operands[1])
-	      || REGNO (operands[0]) != 0 || REGNO (operands[1]) != 0)
+	      || TARGET_EXTENDZ_MOVZ || REGNO (operands[0]) != 0 || REGNO (operands[1]) != 0)
 	  && (!TARGET_ZERO_EXTEND_WITH_AND || ! rtx_equal_p (operands[0], operands[1])))
 	{
 	  /* ??? tege: Should forget CC_STATUS only if we clobber a
@@ -3576,7 +3576,7 @@
       if (intval == 0xff && REG_P (operands[0])
 	  && !(REG_P (operands[1]) && NON_QI_REG_P (operands[1]))
 	  && (! REG_P (operands[1])
-	      || REGNO (operands[0]) != 0 || REGNO (operands[1]) != 0)
+	      || TARGET_EXTENDZ_MOVZ || REGNO (operands[0]) != 0 || REGNO (operands[1]) != 0)
 	  && (!TARGET_ZERO_EXTEND_WITH_AND || ! rtx_equal_p (operands[0], operands[1])))
 	{
 	  /* ??? tege: Should forget CC_STATUS only if we clobber a

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-04 17:03           ` Joe Buck
       [not found]             ` < 199903050102.RAA06944@atrus.synopsys.com >
@ 1999-03-31 23:46             ` Joe Buck
  1 sibling, 0 replies; 68+ messages in thread
From: Joe Buck @ 1999-03-31 23:46 UTC (permalink / raw)
  To: H.J. Lu; +Cc: martin, medtekh, egcs

> > Isn't (0 & REGNO (operands[0]) == 0) always 0? Why isn't the condition
> > just deleted?
> > 
> 
> It is easier to back it out if we decide it is not a good idea after
> all.

If so, isn't the usual gcc convention to do that
#if 0
#endif

?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
  1999-03-09  1:19 Терехин Вячеслав
@ 1999-03-31 23:46 ` Терехин Вячеслав
  0 siblings, 0 replies; 68+ messages in thread
From: Терехин Вячеслав @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Richard Henderson, Alfred Perlstein; +Cc: H.J. Lu, egcs

>On Fri, Mar 05, 1999 at 12:23:46PM -0500, Alfred Perlstein wrote:
>> > I any way "movzb? %al,%?ax" and "and? $255,%?ax" takes 1 tick both.
>> > So this is a kind of mistery with this instructions.
>>
>> I think the magic lies in that with register renaming, instruction
>> caches and all the 'behind the scenes' optimizations PPro and later
>> versions of x86 chips can do.  It really should be investigated more.
>
>It has nothing to do with register renaming.
>
>It is most likely to be related to instruction alignment -- some
>important insn in the loop is straddling a 16-byte boundary, which
>requires an extra cycle to decode.
>
>I've seen such create up to a 20% difference in runtime on a small loop.
>


It has nothing to deal with para boundary. In movz case xorb insn crosses
para boundary
while with andl no insn crosses para boundary.

Sincerely Yours, Eugene.

P.S. For H.J.Lu -- I do not state that things go slower with movz. Slow down
I get were 1% (this can be statistical error). Nevertheless there is no
speed up in
most cases too (or such a huge speed up as with decompression).
We should try to find out more why and how this happens.
BTW I have PPro 180MHz.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
@ 1999-03-09  1:19 Терехин Вячеслав
  1999-03-31 23:46 ` Терехин Вячеслав
  0 siblings, 1 reply; 68+ messages in thread
From: Терехин Вячеслав @ 1999-03-09  1:19 UTC (permalink / raw)
  To: Richard Henderson, Alfred Perlstein; +Cc: H.J. Lu, egcs

>On Fri, Mar 05, 1999 at 12:23:46PM -0500, Alfred Perlstein wrote:
>> > Anyway, "movzb? %al,%?ax" and "and? $255,%?ax" both take 1 tick.
>> > So these instructions are something of a mystery.
>>
>> I think the magic lies in that with register renaming, instruction
>> caches and all the 'behind the scenes' optimizations PPro and later
>> versions of x86 chips can do.  It really should be investigated more.
>
>It has nothing to do with register renaming.
>
>It is most likely to be related to instruction alignment -- some
>important insn in the loop is straddling a 16-byte boundary, which
>requires an extra cycle to decode.
>
>I've seen such create up to a 20% difference in runtime on a small loop.
>


It has nothing to do with paragraph (16-byte) boundaries. In the movz case
the xorb insn crosses a paragraph boundary, while with andl no insn
crosses one, so alignment does not explain the movz speedup.

Sincerely Yours, Eugene.

P.S. For H.J. Lu: I am not claiming that things get slower with movz. The
slowdown I saw was about 1% (this could be statistical error). Nevertheless,
in most cases there is no speedup either (certainly nothing like the large
speedup with decompression). We should try to find out more about why and
how this happens.
BTW, I have a PPro 180MHz.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found] ` < 005601be66d7$ae033480$288230d4@main.medtech.ru >
  1999-03-05  9:22   ` Alfred Perlstein
@ 1999-03-05 15:52   ` H.J. Lu
  1999-03-31 23:46     ` H.J. Lu
  1 sibling, 1 reply; 68+ messages in thread
From: H.J. Lu @ 1999-03-05 15:52 UTC (permalink / raw)
  To: Терехин
	Вячеслав
  Cc: hjl, egcs

> 
> 
> Yes, it may be, but not always. As you can see from my message, it is not
> the case here:
> Decompression becomes faster, while compression becomes slower.
> 

That is not the case for me. With my patch, I got:

# time ./gzip.new    -cd egcs-1.1.1.tar.gz  >! /dev/null
./gzip.new -cd egcs-1.1.1.tar.gz >| /dev/null  2.71s user 0.05s system 99% cpu 2.778 total
# time ./gzip.new    -cd egcs-1.1.1.tar.gz  >! /dev/null
./gzip.new -cd egcs-1.1.1.tar.gz >| /dev/null  2.73s user 0.04s system 99% cpu 2.776 total
# time ./gzip.new    -cd egcs-1.1.1.tar.gz  >! /dev/null
./gzip.new -c egcs-1.1.1.tar.gz >| /dev/null  5.15s user 0.08s system 101% cpu 5.167 total
# time ./gzip.new    -c egcs-1.1.1.tar.gz  >! /dev/null
./gzip.new -c egcs-1.1.1.tar.gz >| /dev/null  5.14s user 0.06s system 99% cpu 5.211 total
# time ./gzip.new    -c egcs-1.1.1.tar.gz  >! /dev/null
./gzip.new -c egcs-1.1.1.tar.gz >| /dev/null  5.11s user 0.08s system 100% cpu 5.179 total

Without my patch, I got

# time ./gzip.1.1.2     -cd egcs-1.1.1.tar.gz  >! /dev/null
./gzip.1.1.2 -cd egcs-1.1.1.tar.gz >| /dev/null  3.34s user 0.03s system 100% cpu 3.367 total
# time ./gzip.1.1.2     -cd egcs-1.1.1.tar.gz  >! /dev/null
./gzip.1.1.2 -cd egcs-1.1.1.tar.gz >| /dev/null  3.36s user 0.03s system 99% cpu 3.406 total
# time ./gzip.1.1.2     -cd egcs-1.1.1.tar.gz  >! /dev/null
./gzip.1.1.2 -cd egcs-1.1.1.tar.gz >| /dev/null  3.32s user 0.08s system 100% cpu 3.399 total
# time ./gzip.1.1.2     -c egcs-1.1.1.tar.gz  >! /dev/null 
./gzip.1.1.2 -c egcs-1.1.1.tar.gz >| /dev/null  5.29s user 0.04s system 102% cpu 5.214 total
# time ./gzip.1.1.2     -c egcs-1.1.1.tar.gz  >! /dev/null 
./gzip.1.1.2 -c egcs-1.1.1.tar.gz >| /dev/null  5.30s user 0.03s system 100% cpu 5.327 total
# time ./gzip.1.1.2     -c egcs-1.1.1.tar.gz  >! /dev/null 
./gzip.1.1.2 -c egcs-1.1.1.tar.gz >| /dev/null  5.29s user 0.06s system 100% cpu 5.328 total

As you can see, my patch makes both decompression and compression
faster. BTW, this was measured on a dual Xeon 450MHz.


H.J.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]     ` < Pine.BSF.3.96.990305121935.7355C-100000@cygnus.rush.net >
@ 1999-03-05 13:02       ` Richard Henderson
  1999-03-31 23:46         ` Richard Henderson
  0 siblings, 1 reply; 68+ messages in thread
From: Richard Henderson @ 1999-03-05 13:02 UTC (permalink / raw)
  To: Alfred Perlstein,
	Терехин
	Вячеслав
  Cc: H.J. Lu, egcs

On Fri, Mar 05, 1999 at 12:23:46PM -0500, Alfred Perlstein wrote:
> > Anyway, "movzb? %al,%?ax" and "and? $255,%?ax" both take 1 tick.
> > So these instructions are something of a mystery.
> 
> I think the magic lies in that with register renaming, instruction
> caches and all the 'behind the scenes' optimizations PPro and later
> versions of x86 chips can do.  It really should be investigated more.

It has nothing to do with register renaming. 

It is most likely to be related to instruction alignment -- some
important insn in the loop is straddling a 16-byte boundary, which
requires an extra cycle to decode.

I've seen such create up to a 20% difference in runtime on a small loop.


r~

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found] ` < 005601be66d7$ae033480$288230d4@main.medtech.ru >
@ 1999-03-05  9:22   ` Alfred Perlstein
       [not found]     ` < Pine.BSF.3.96.990305121935.7355C-100000@cygnus.rush.net >
  1999-03-31 23:46     ` Alfred Perlstein
  1999-03-05 15:52   ` H.J. Lu
  1 sibling, 2 replies; 68+ messages in thread
From: Alfred Perlstein @ 1999-03-05  9:22 UTC (permalink / raw)
  To: Терехин
	Вячеслав
  Cc: H.J. Lu, egcs

On Fri, 5 Mar 1999, Терехин Вячеслав wrote:

> >Hi,
> >
> >It seems that "movzb? %al,%?ax" may be faster than "and? $255,%?ax".
> >This patch for egcs 1.1.2 seems to make gzip faster.
> >
> >Thanks.
> >
> >
> 
> 
> Yes, it may be, but not always. As you can see from my message, it is not
> the case here:
> Decompression becomes faster, while compression becomes slower.
> 
> Moreover, this generally slows code down. I have my own patch to egcs
> that does the same thing as yours. To suppress andl in favor of movz,
> use -mextendz-with-movz.
> Compiling several programs shows a general slowdown.
> 
> Anyway, "movzb? %al,%?ax" and "and? $255,%?ax" both take 1 tick.
> So these instructions are something of a mystery.

I think the magic lies in that with register renaming, instruction
caches and all the 'behind the scenes' optimizations PPro and later
versions of x86 chips can do.  It really should be investigated more.

Stalling any one of these features can kill performance. Any ideas
on which feature could be hit by this in gzip, while at the same time
causing bad behavior in other applications?

-Alfred

> 
> As you can see from my message, this change in the decompression code
> yields a 20% performance boost. At the same time, the whole loop dealing
> with the CRC is 0x15 bytes long and takes 50% of the time. Shortening one
> instruction in it from 5 bytes to 3 saves 20% of total time, or 40% of
> loop time. That should not be possible at all, but it is.
> 
> Sincerely Yours, Eugene.
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]             ` < 19990305145306.4796.qmail@deer >
@ 1999-03-05  9:18               ` Jeffrey A Law
  1999-03-31 23:46                 ` Jeffrey A Law
  0 siblings, 1 reply; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-05  9:18 UTC (permalink / raw)
  To: craig; +Cc: egcs

  In message < 19990305145306.4796.qmail@deer >you write:
  > >It is easier to back it out if we decide it is not a good idea after
  > >all.
  > 
  > Just for my own sanity: we're not in any way seriously considering
  > this *performance* patch for 1.1.2, are we??!
Absolutely not.

Once we can get 1.1.2 finished I'll dig out my work to improve the zero
extension code and submit it.  It's far more complete than HJ's hack.

jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]         ` < m10Iil3-000393C@ocean.lucon.org >
  1999-03-04 17:03           ` Joe Buck
@ 1999-03-05  6:52           ` craig
       [not found]             ` < 19990305145306.4796.qmail@deer >
  1999-03-31 23:46             ` craig
  1 sibling, 2 replies; 68+ messages in thread
From: craig @ 1999-03-05  6:52 UTC (permalink / raw)
  To: egcs; +Cc: craig

>It is easier to back it out if we decide it is not a good idea after
>all.

Just for my own sanity: we're not in any way seriously considering
this *performance* patch for 1.1.2, are we??!

It'd be great to see it, or something like it, in 1.2, of course.

        tq vm, (burley)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
@ 1999-03-04 23:11 Терехин Вячеслав
       [not found] ` < 005601be66d7$ae033480$288230d4@main.medtech.ru >
  1999-03-31 23:46 ` Терехин Вячеслав
  0 siblings, 2 replies; 68+ messages in thread
From: Терехин Вячеслав @ 1999-03-04 23:11 UTC (permalink / raw)
  To: H.J. Lu; +Cc: egcs

>Hi,
>
>It seems that "movzb? %al,%?ax" may be faster than "and? $255,%?ax".
>This patch for egcs 1.1.2 seems to make gzip faster.
>
>Thanks.
>
>

Yes, it may be, but not always. As you can see from my message, it is not
the case here:
Decompression becomes faster, while compression becomes slower.

Moreover, this generally slows code down. I have my own patch to egcs
that does the same thing as yours. To suppress andl in favor of movz,
use -mextendz-with-movz.
Compiling several programs shows a general slowdown.

Anyway, "movzb? %al,%?ax" and "and? $255,%?ax" both take 1 tick.
So these instructions are something of a mystery.

As you can see from my message, this change in the decompression code
yields a 20% performance boost. At the same time, the whole loop dealing
with the CRC is 0x15 bytes long and takes 50% of the time. Shortening one
instruction in it from 5 bytes to 3 saves 20% of total time, or 40% of
loop time. That should not be possible at all, but it is.

Sincerely Yours, Eugene.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]             ` < m10Ilq3-00000YC@ocean.lucon.org >
@ 1999-03-04 20:14               ` Jeffrey A Law
  1999-03-31 23:46                 ` Jeffrey A Law
  0 siblings, 1 reply; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-04 20:14 UTC (permalink / raw)
  To: H.J. Lu; +Cc: egcs

  In message < m10Ilq3-00000YC@ocean.lucon.org >you write:

  > Just keep in mind that this small change speeds up gzip by 10-20% on
  > PPro, P/II and maybe P/III. It may be true also for 386. According
  > to TARGET_ZERO_EXTEND_WITH_AND, only 486 and Pentium seem to prefer
  > "and".
But that does not make the patch correct.  Period.


  > BTW, FWIW, "movzbl" has 3 bytes and "andl" has 5 bytes. So "movzbl" is
  > smaller and faster on 386, PPro, P/II and P/III.
Correct.  That is precisely one of the changes in my patch.

jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]         ` < 13494.920599668@hurl.cygnus.com >
@ 1999-03-04 20:03           ` H.J. Lu
       [not found]             ` < m10Ilq3-00000YC@ocean.lucon.org >
  1999-03-31 23:46             ` H.J. Lu
  0 siblings, 2 replies; 68+ messages in thread
From: H.J. Lu @ 1999-03-04 20:03 UTC (permalink / raw)
  To: law; +Cc: egcs

> 
> 
>   In message < 199903050001.BAA00973@mira.isdn.cs.tu-berlin.de >you write:
>   > > -  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
>   > > +  if ((TARGET_ZERO_EXTEND_WITH_AND || (0 & REGNO (operands[0]) == 0))
>   > 
>   > It's late, so I'm probably going to say stupid things, but ...
>   > 
>   > Isn't (0 & REGNO (operands[0]) == 0) always 0? Why isn't the condition
>   > just deleted?
> Disabling the code like that is actually the wrong thing to do for certain
> processor variants.
> 
> I need to dust off my changes to this code which do the right thing when
> optimizing for size, PPro, Pent and older x86 variants.
> 
> It's not as simple as just deleting the test like that.
> 

Just keep in mind that this small change speeds up gzip by 10-20% on
PPro, P/II and maybe P/III. It may be true also for 386. According
to TARGET_ZERO_EXTEND_WITH_AND, only 486 and Pentium seem to prefer
"and".

BTW, FWIW, "movzbl" has 3 bytes and "andl" has 5 bytes. So "movzbl" is
smaller and faster on 386, PPro, P/II and P/III.


-- 
H.J. Lu (hjl@gnu.org)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]     ` < 199903050001.BAA00973@mira.isdn.cs.tu-berlin.de >
  1999-03-04 16:46       ` H.J. Lu
@ 1999-03-04 18:08       ` Jeffrey A Law
       [not found]         ` < 13494.920599668@hurl.cygnus.com >
  1999-03-31 23:46         ` Jeffrey A Law
  1 sibling, 2 replies; 68+ messages in thread
From: Jeffrey A Law @ 1999-03-04 18:08 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: hjl, medtekh, egcs

  In message < 199903050001.BAA00973@mira.isdn.cs.tu-berlin.de >you write:
  > > -  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
  > > +  if ((TARGET_ZERO_EXTEND_WITH_AND || (0 & REGNO (operands[0]) == 0))
  > 
  > It's late, so I'm probably going to say stupid things, but ...
  > 
  > Isn't (0 & REGNO (operands[0]) == 0) always 0? Why isn't the condition
  > just deleted?
Disabling the code like that is actually the wrong thing to do for certain
processor variants.

I need to dust off my changes to this code which do the right thing when
optimizing for size, PPro, Pent and older x86 variants.

It's not as simple as just deleting the test like that.

jeff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]             ` < 199903050102.RAA06944@atrus.synopsys.com >
@ 1999-03-04 17:06               ` H.J. Lu
  1999-03-31 23:46                 ` H.J. Lu
  0 siblings, 1 reply; 68+ messages in thread
From: H.J. Lu @ 1999-03-04 17:06 UTC (permalink / raw)
  To: Joe Buck; +Cc: egcs

> 
> 
> > > Isn't (0 & REGNO (operands[0]) == 0) always 0? Why isn't the condition
> > > just deleted?
> > > 
> > 
> > It is easier to back it out if we decide it is not a good idea after
> > all.
> 
> If so, isn't the usual gcc convention to do that
> #if 0
> #endif
> 

I thought it was the convention for i386. At least, that is what
was done to TARGET_CMOVE/CC_FCOMI.

-- 
H.J. Lu (hjl@gnu.org)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]         ` < m10Iil3-000393C@ocean.lucon.org >
@ 1999-03-04 17:03           ` Joe Buck
       [not found]             ` < 199903050102.RAA06944@atrus.synopsys.com >
  1999-03-31 23:46             ` Joe Buck
  1999-03-05  6:52           ` craig
  1 sibling, 2 replies; 68+ messages in thread
From: Joe Buck @ 1999-03-04 17:03 UTC (permalink / raw)
  To: H.J. Lu; +Cc: martin, medtekh, egcs

> > Isn't (0 & REGNO (operands[0]) == 0) always 0? Why isn't the condition
> > just deleted?
> > 
> 
> It is easier to back it out if we decide it is not a good idea after
> all.

If so, isn't the usual gcc convention to do that
#if 0
#endif

?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found]     ` < 199903050001.BAA00973@mira.isdn.cs.tu-berlin.de >
@ 1999-03-04 16:46       ` H.J. Lu
       [not found]         ` < m10Iil3-000393C@ocean.lucon.org >
  1999-03-31 23:46         ` H.J. Lu
  1999-03-04 18:08       ` Jeffrey A Law
  1 sibling, 2 replies; 68+ messages in thread
From: H.J. Lu @ 1999-03-04 16:46 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: medtekh, egcs

> 
> > -  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
> > +  if ((TARGET_ZERO_EXTEND_WITH_AND || (0 & REGNO (operands[0]) == 0))
> 
> It's late, so I'm probably going to say stupid things, but ...
> 
> Isn't (0 & REGNO (operands[0]) == 0) always 0? Why isn't the condition
> just deleted?
> 

It is easier to back it out if we decide it is not a good idea after
all.

-- 
H.J. Lu (hjl@gnu.org)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
       [not found] ` < m10Igdx-000AUaC@shanghai.varesearch.com >
@ 1999-03-04 16:04   ` Martin v. Loewis
       [not found]     ` < 199903050001.BAA00973@mira.isdn.cs.tu-berlin.de >
  1999-03-31 23:46     ` Martin v. Loewis
  0 siblings, 2 replies; 68+ messages in thread
From: Martin v. Loewis @ 1999-03-04 16:04 UTC (permalink / raw)
  To: hjl; +Cc: medtekh, egcs

> -  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
> +  if ((TARGET_ZERO_EXTEND_WITH_AND || (0 & REGNO (operands[0]) == 0))

It's late, so I'm probably going to say stupid things, but ...

Isn't (0 & REGNO (operands[0]) == 0) always 0? Why isn't the condition
just deleted?

Curious,
Martin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: gcc-2.7 creates faster code than pgcc-1.1.1
@ 1999-03-04 14:31 H.J. Lu
       [not found] ` < m10Igdx-000AUaC@shanghai.varesearch.com >
  1999-03-31 23:46 ` H.J. Lu
  0 siblings, 2 replies; 68+ messages in thread
From: H.J. Lu @ 1999-03-04 14:31 UTC (permalink / raw)
  To: medtekh; +Cc: egcs

Hi,

It seems that "movzb? %al,%?ax" may be faster than "and? $255,%?ax".
This patch for egcs 1.1.2 seems to make gzip faster.

Thanks.


-- 
H.J. Lu (hjl@gnu.org)
---
Thu Mar  4 14:04:49 1999  H.J. Lu  (hjl@gnu.org)

	* config/i386/i386.md (zero_extendqihi2): Use "and" when target
	and source are both "ax" only if TARGET_ZERO_EXTEND_WITH_AND is
	true.
	(zero_extendqisi2): Likewise.

--- ../../../import/egcs-1.1.x/egcs/gcc/config/i386/i386.md	Sun Feb 14 08:30:40 1999
+++ config/i386/i386.md	Thu Mar  4 13:46:07 1999
@@ -1738,7 +1741,7 @@
   {
   rtx xops[2];
 
-  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
+  if ((TARGET_ZERO_EXTEND_WITH_AND || (0 & REGNO (operands[0]) == 0))
       && REG_P (operands[1]) 
       && REGNO (operands[0]) == REGNO (operands[1]))
     {
@@ -1819,7 +1822,7 @@
   {
   rtx xops[2];
 
-  if ((TARGET_ZERO_EXTEND_WITH_AND || REGNO (operands[0]) == 0)
+  if ((TARGET_ZERO_EXTEND_WITH_AND || (0 & REGNO (operands[0]) == 0))
       && REG_P (operands[1]) 
       && REGNO (operands[0]) == REGNO (operands[1]))
     {

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~1999-03-31 23:46 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-03-04  3:40 gcc-2.7 creates faster code than pgcc-1.1.1 Терехин Вячеслав
     [not found] ` < 001401be6633$fed21a60$a18330d4@main.medtech.ru >
1999-03-04 13:20   ` Jamie Lokier
     [not found]     ` < 19990304222018.A21939@pcep-jamie.cern.ch >
1999-03-04 17:05       ` Zack Weinberg
     [not found]         ` < 199903050104.UAA15335@octiron.phys.columbia.edu >
1999-03-04 18:09           ` Jeffrey A Law
1999-03-31 23:46             ` Jeffrey A Law
1999-03-31 23:46         ` Zack Weinberg
1999-03-31 23:46     ` Jamie Lokier
1999-03-31 23:46 ` Терехин Вячеслав
  -- strict thread matches above, loose matches on Subject: below --
1999-03-09  1:19 Терехин Вячеслав
1999-03-31 23:46 ` Терехин Вячеслав
1999-03-04 23:11 Терехин Вячеслав
     [not found] ` < 005601be66d7$ae033480$288230d4@main.medtech.ru >
1999-03-05  9:22   ` Alfred Perlstein
     [not found]     ` < Pine.BSF.3.96.990305121935.7355C-100000@cygnus.rush.net >
1999-03-05 13:02       ` Richard Henderson
1999-03-31 23:46         ` Richard Henderson
1999-03-31 23:46     ` Alfred Perlstein
1999-03-05 15:52   ` H.J. Lu
1999-03-31 23:46     ` H.J. Lu
1999-03-31 23:46 ` Терехин Вячеслав
1999-03-04 14:31 H.J. Lu
     [not found] ` < m10Igdx-000AUaC@shanghai.varesearch.com >
1999-03-04 16:04   ` Martin v. Loewis
     [not found]     ` < 199903050001.BAA00973@mira.isdn.cs.tu-berlin.de >
1999-03-04 16:46       ` H.J. Lu
     [not found]         ` < m10Iil3-000393C@ocean.lucon.org >
1999-03-04 17:03           ` Joe Buck
     [not found]             ` < 199903050102.RAA06944@atrus.synopsys.com >
1999-03-04 17:06               ` H.J. Lu
1999-03-31 23:46                 ` H.J. Lu
1999-03-31 23:46             ` Joe Buck
1999-03-05  6:52           ` craig
     [not found]             ` < 19990305145306.4796.qmail@deer >
1999-03-05  9:18               ` Jeffrey A Law
1999-03-31 23:46                 ` Jeffrey A Law
1999-03-31 23:46             ` craig
1999-03-31 23:46         ` H.J. Lu
1999-03-04 18:08       ` Jeffrey A Law
     [not found]         ` < 13494.920599668@hurl.cygnus.com >
1999-03-04 20:03           ` H.J. Lu
     [not found]             ` < m10Ilq3-00000YC@ocean.lucon.org >
1999-03-04 20:14               ` Jeffrey A Law
1999-03-31 23:46                 ` Jeffrey A Law
1999-03-31 23:46             ` H.J. Lu
1999-03-31 23:46         ` Jeffrey A Law
1999-03-31 23:46     ` Martin v. Loewis
1999-03-31 23:46 ` H.J. Lu
     [not found] <19990218210259.A720@loki.midheimar>
1999-02-19  0:09 ` [Q] alpha egc -> motorolla dragonball Scott Howard
     [not found]   ` < 36CD1CD3.1FC47334@objsw.com >
1999-02-19 11:04     ` Jeffrey A Law
1999-02-28 22:53       ` Jeffrey A Law
1999-02-28 22:53   ` Scott Howard
1998-07-14 11:29 porting EGCS to the Cray T3E Julian C. Cummings
1998-07-14 11:29 ` Jeffrey A Law
1997-12-12  3:55 1.0 sucessfull, install params questions Hermann Lauer
1997-12-12  8:55 ` Jeffrey A Law
1997-12-12 10:18   ` Michael Poole
     [not found]   ` <law@hurl.cygnus.com>
1997-12-12 15:46     ` Hermann Lauer
1998-07-14 14:29     ` porting EGCS to the Cray T3E Julian C. Cummings
1998-07-14 13:20       ` Jeffrey A Law
1998-07-14 16:57     ` Julian C. Cummings
1998-07-14 14:29       ` Jeffrey A Law
     [not found]     ` < 1845.919451010@hurl.cygnus.com >
1999-02-19 11:09       ` [Q] alpha egc -> motorola dragonball David Edelsohn
1999-02-28 22:53         ` David Edelsohn
     [not found]     ` < 13506.920599740@hurl.cygnus.com >
1999-03-04 20:04       ` gcc-2.7 creates faster code than pgcc-1.1.1 David Edelsohn
     [not found]         ` < 9903050403.AA36338@marc.watson.ibm.com >
1999-03-04 20:31           ` Jeffrey A Law
     [not found]             ` < 13939.920608288@hurl.cygnus.com >
1999-03-05  6:53               ` craig
     [not found]                 ` < 19990305143358.4747.qmail@deer >
1999-03-05  9:30                   ` Jeffrey A Law
     [not found]                     ` < 15755.920655014@hurl.cygnus.com >
1999-03-05 10:18                       ` Joe Buck
1999-03-31 23:46                         ` Joe Buck
1999-03-05 10:19                       ` craig
1999-03-31 23:46                         ` craig
1999-03-31 23:46                     ` Jeffrey A Law
1999-03-31 23:46                 ` craig
1999-03-31 23:46             ` Jeffrey A Law
1999-03-07 11:01           ` Zack Weinberg
1999-03-31 23:46             ` Zack Weinberg
1999-03-31 23:46         ` David Edelsohn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).