public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Using a DFA for insn-recog.c
@ 2008-05-02  9:12 Richard Sandiford
  2008-05-02 14:08 ` Michael Matz
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-02  9:12 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 12025 bytes --]

There has been talk in the past of making automatic extensions
easier to express in .md files.  See for example:

    http://gcc.gnu.org/ml/gcc/2004-07/msg01075.html

I'm currently trying to do something along these lines.  Unfortunately,
the current tree-based implementation of genrecog makes it difficult to
do this without exploding the size of recog.  I'd therefore like to move
to a DFA-based implementation instead.

The beginners' project page says:

----------------------------------------------------------------------
# Make insn-recog.c use a byte-coded DFA.

  Richard Henderson and I started this back in 1999 but never
  finished. I may still be able to find the code. It produces an order
  of magnitude size reduction in insn-recog.o, which is huge (432KB on
  i386).
----------------------------------------------------------------------

However, I'm reluctant to go for a byte-coded implementation
at this stage because I'm not sure how it would work out speedwise.
(I suppose a byte-coded, interpreter-based implementation is likely
to have better code cache locality but worse data cache locality.
It would obviously also have more indirection.)

I instead went for an open-coded DFA that I hope is strictly better
than what we have now.  If we do still want to go for a byte-coded
implementation in future, the new DFA-based code ought to be a better
starting point than the old tree-based code.

The point of using a DFA is that we can try to share states between
instructions.  I've tried to cut down on the number of states and
transitions by:

  - Recording that certain predicates imply other predicates.
  - Recording that certain predicates are mutually exclusive.

    See my earlier post for the infrastructure behind this.

  - Storing the result of recog in a local variable.

    This means that if two insns end with the same requirements -- such
    as operands 1 and 2 being SImode register_operands -- they can use
    the same states to check this.  The new code still uses "return
    constant" statements if no state sharing is possible.

  - Introducing local "mode", "code" and "intval" variables for
    modes, rtx codes, and UNSPEC{,_VOLATILE} numbers respectively.

    As in the previous point, this allows us to share some states
    between instructions.  For example, SFmode patterns often have the
    same form as DFmode patterns, with only the mode varying between
    them.  The same is often true for operations on different vector or
    fixed-point modes.  LSHIFTRT patterns often have a similar form to
    ASHIFTRT patterns.  Many patterns for built-in functions differ only
    in modes and unspec numbers.

    Again, the new code still uses constant modes, codes and intvals
    when no state sharing is possible.

  - Doing a better job of skipping tests whose outcome is already known.

    As a random example of this -- it wasn't the motivating example --
    the original version of ARM's insn-recog.c has:

     L19315: ATTRIBUTE_UNUSED_LABEL
      if (GET_CODE (x4) == NOT)
        goto L986;
      if (s_register_operand (x4, SImode))
        {
          operands[1] = x4;
          goto L800;
        }
     L19314: ATTRIBUTE_UNUSED_LABEL
      if (s_register_operand (x4, SImode))
        {
          operands[0] = x4;
          goto L813;
        }
      goto ret0;
    ....
     L800: ATTRIBUTE_UNUSED_LABEL
      x4 = XEXP (x3, 1);
      if (arm_not_operand (x4, SImode))
        {
          operands[2] = x4;
          goto L801;
        }
      x4 = XEXP (x3, 0);
      goto L19314;
    ....
     L802: ATTRIBUTE_UNUSED_LABEL
      x1 = XVECEXP (x0, 0, 1);
      if (GET_CODE (x1) == SET)
        goto L803;
      x1 = XVECEXP (x0, 0, 0);
      x2 = XEXP (x1, 1);
      x3 = XEXP (x2, 0);
      x4 = XEXP (x3, 0);
      goto L19314;
    ....
     L813: ATTRIBUTE_UNUSED_LABEL
      x4 = XEXP (x3, 1);
      if (arm_not_operand (x4, SImode))
        {
          operands[1] = x4;
          goto L814;
        }
      goto ret0;
    ....
     L815: ATTRIBUTE_UNUSED_LABEL
      x1 = XVECEXP (x0, 0, 1);
      if (GET_CODE (x1) == CLOBBER)
        goto L816;
      goto ret0;

    where the trees could not be merged because of the different operand
    numbering.  The code therefore tests the same predicates twice.

    The new code looks like:

     L5059:
      if (GET_CODE (x4) == NOT)
        goto L5085;
      result = 69;  /* *andsi3_compare0 */
      if (s_register_operand (x4, SImode))
        goto L5062;
      goto L4793;
    ....
     L5062:
      operands[1] = x4;
      x4 = XEXP (x3, 1);
      if (arm_not_operand (x4, SImode))
        goto L5065;
      goto failed;
    ....
     L5068:
      x1 = XVECEXP (x0, 0, 1);
      if (GET_CODE (x1) == SET)
        goto L5083;
      x1 = XVECEXP (x0, 0, 0);
      x2 = XEXP (x1, 1);
      x3 = XEXP (x2, 0);
      x4 = XEXP (x3, 0);
      result = 70;  /* *andsi3_compare0_scratch */
      operands[0] = x4;
      x4 = XEXP (x3, 1);
      operands[1] = x4;
      x1 = XVECEXP (x0, 0, 1);
      if (GET_CODE (x1) == CLOBBER)
        goto L5080;
      goto failed;

The main disadvantage of using open-coded DFAs vs. open-coded trees
is that, because of state sharing, it's harder to use subroutines
for DFAs.  Everything has to be in one big function.  This means
that (a) we're more likely to hit size-based limit parameters within
the compiler and that (b) it might take longer to compile insn-recog.o.

The port with the biggest insn-recog.c -- both before and after the patch --
is x86_64.  Taking the lowest of three runs, the insn-recog.o build times
are as follows:

             -g (stage 1)             -O2 -g (stages 2 and 3)
    before   12.3s                    46.6s
    after    10.2s                    71.2s

So I'm afraid the patch might add a minute to bootstrap times.

The new genrecog runs quickly when compiled with -O2 -g; it takes my
machine three seconds to produce insn-recog.c for x86_64-linux-gnu.
It takes ten seconds when compiled with -g alone. (I've used a lot
of inline function accessors, which probably doesn't help the -g time.)

I haven't been able to measure any difference in the speed of the
resulting compiler.

I tried compiling insn-recog.o before and after the patch for the
targets given in the table below.  This was supposed to cover every
cpu/cpu.md file for every non-deprecated target; please let me know
if I've missed one.  I used gcc 4.3 on x86_64-linux-gnu with "-O2 -g"
as the build compiler.

As you can see, the patch gives consistently smaller text sections.
The biggest improvement is for ARM, where the new text section is
about half the size of the original.  As expected, there's nothing
as dramatic as the order-of-magnitude improvement reported for the
byte-coded version.

(By the way, given the big initial size of insn-recog.o quoted above,
I wonder if that was for the whole of insn-recog.o?  I imagine the debug
info would be much smaller for an interpreter-based implementation.)

--------------------------------------------------------------------------
                                                   Text sizes
                                                Before    After
--------------------------------------------------------------------------
alpha-linux-gnu                                  92032    70224 :   76.30%
arc-elf                                          13294     9195 :   69.17%
arm-eabi                                        456977   236490 :   51.75%
avr-rtems                                        24525    19526 :   79.62%
bfin-elf                                         49788    38809 :   77.95%
cris-elf                                         60014    47951 :   79.90%
fr30-elf                                          7843     5795 :   73.89%
frv-elf                                          53216    42150 :   79.21%
h8300-elf                                        67673    56980 :   84.20%
hppa2.0-linux-gnu                                59668    45340 :   75.99%
i686-linux-gnu                                  395132   301417 :   76.28%
iq2000-elf                                       11138     7897 :   70.90%
m32c-elf                                         51041    36591 :   71.69%
m32r-elf                                         17334    13663 :   78.82%
m68hc11-elf                                      54123    42887 :   79.24%
m68k-elf                                         53619    41835 :   78.02%
mcore-elf                                        17592    13796 :   78.42%
mipsisa64-elf                                   150752   103896 :   68.92%
mmix-knuth-mmixware                              10387     7466 :   71.88%
pdp11-bsd                                         9827     7827 :   79.65%
powerpc64-linux-gnu                             290853   225571 :   77.55%
powerpc-linux-gnu                               245706   189601 :   77.17%
s390-linux-gnu                                  148570   105682 :   71.13%
score-elf                                        38684    27547 :   71.21%
sh64-elf                                        151811   128755 :   84.81%
sh-elf                                          151811   128755 :   84.81%
sparc-sun-solaris2.8                             89949    67474 :   75.01%
spu-elf                                          78980    49162 :   62.25%
v850-elf                                         13601    11097 :   81.59%
vax-netbsdelf                                    17936    11807 :   65.83%
x86_64-linux-gnu                                480483   373214 :   77.67%
xstormy16-elf                                    14059    11135 :   79.20%
xtensa-elf                                        6703     5927 :   88.42%
--------------------------------------------------------------------------

Very little of the old genrecog remains, so I've posted the new version
rather than a diff.

Tested by compiling cc1 and cc1plus for the targets in the table above
and making sure that there were no changes in the assembly output for
gcc.c-torture, gcc.dg and g++.dg (compiled once with -g and once with
-O2).[*]  Also bootstrapped & regression-tested on x86_64-linux-gnu.
OK to install?

Richard

[*] Well, almost.  There are changes in the m68hc11-elf -O2 output for
    gcc.c-torture/execute/ieee/pr30704.c and gcc.dg/torture/pr32897.c,
    but those changes are not related to this patch.  The differences
    come from folding a VIEW_CONVERT_EXPR in which the source value is
    a long long and the target value is a double.  A bogus offset
    calculation causes native_interpret_real to read uninitialized data.

gcc/
	* Makefile.in (build/genrecog.o): Update dependencies.
	(build/genrecog$(build_exeext)): Depend on build/alloc-pool.o.
	* genrecog.c: Include vec.h and alloc-pool.h.  Rewrite to use DFAs.

Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	2008-05-01 19:33:15.000000000 +0100
+++ gcc/Makefile.in	2008-05-01 19:33:16.000000000 +0100
@@ -3253,8 +3253,8 @@ build/genpeep.o : genpeep.c $(RTL_BASE_H
   coretypes.h $(GTM_H) errors.h gensupport.h toplev.h
 build/genpreds.o : genpreds.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)	\
   coretypes.h $(GTM_H) errors.h gensupport.h $(OBSTACK_H)
-build/genrecog.o : genrecog.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)	\
-  coretypes.h $(GTM_H) errors.h gensupport.h
+build/genrecog.o : genrecog.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h	\
+  $(GTM_H) $(RTL_BASE_H) errors.h gensupport.h vec.h alloc-pool.h
 
 # Compile the programs that generate insn-* from the machine description.
 # They are compiled with $(CC_FOR_BUILD), and associated libraries,
@@ -3279,6 +3279,7 @@ build/gengenrtl$(build_exeext) : $(BUILD
 build/genmodes$(build_exeext) : $(BUILD_ERRORS)
 build/gengtype$(build_exeext) : build/gengtype-lex.o build/gengtype-parse.o \
 				$(BUILD_ERRORS)
+build/genrecog$(build_exeext) : build/alloc-pool.o
 
 # Generated source files for gengtype.
 gengtype-lex.c : gengtype-lex.l


[-- Attachment #2: genrecog.c.bz2 --]
[-- Type: application/octet-stream, Size: 34636 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-02  9:12 Using a DFA for insn-recog.c Richard Sandiford
@ 2008-05-02 14:08 ` Michael Matz
  2008-05-02 14:56   ` Richard Sandiford
  2008-05-02 18:09 ` Andi Kleen
  2008-05-07 17:25 ` Mark Mitchell
  2 siblings, 1 reply; 16+ messages in thread
From: Michael Matz @ 2008-05-02 14:08 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

Hi Richard,

On Fri, 2 May 2008, Richard Sandiford wrote:

> do this without exploding the size of recog.  I'd therefore like to move
> to a DFA-based implementation instead.

Very interesting.  From the description in the new genrecog.c of how the 
DFA is constructed it reads like so, but let me ask anyway: have you made 
sure that an insn that is matched by multiple patterns in the .md files, 
is matched against the first one?


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-02 14:08 ` Michael Matz
@ 2008-05-02 14:56   ` Richard Sandiford
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-02 14:56 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

Michael Matz <matz@suse.de> writes:
>> to a DFA-based implementation instead.
>
> Very interesting.  From the description in the new genrecog.c of how the 
> DFA is constructed it reads like so, but let me ask anyway: have you made 
> sure that an insn that is matched by multiple patterns in the .md files, 
> is matched against the first one?

Yeah.  During the merging process, the fallback is always to do things
in .md-file order.  We only swap things around if we can prove that they
are mutually exclusive.  Once merging is complete, we just rewrite the
DFA into equivalent forms.

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-02  9:12 Using a DFA for insn-recog.c Richard Sandiford
  2008-05-02 14:08 ` Michael Matz
@ 2008-05-02 18:09 ` Andi Kleen
  2008-05-02 19:43   ` Andrew Pinski
  2008-05-03  8:46   ` Richard Sandiford
  2008-05-07 17:25 ` Mark Mitchell
  2 siblings, 2 replies; 16+ messages in thread
From: Andi Kleen @ 2008-05-02 18:09 UTC (permalink / raw)
  To: gcc-patches; +Cc: rsandifo

Richard Sandiford <rsandifo@nildram.co.uk> writes:
> at this stage because I'm not sure how it would work out speedwise.
> (I suppose a byte-coded, interpreter-based implementation is likely
> to have better code cache locality but worse data cache locality.

On the other hand the byte code will likely be much more compact than 
the code, so you should use much less dcache. And usually for the 
larger lower level caches d and i is shared anyways.

-Andi

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-02 18:09 ` Andi Kleen
@ 2008-05-02 19:43   ` Andrew Pinski
  2008-05-02 20:26     ` Andi Kleen
  2008-05-03  8:46   ` Richard Sandiford
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Pinski @ 2008-05-02 19:43 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches, rsandifo

On Fri, May 2, 2008 at 11:08 AM, Andi Kleen <andi@firstfloor.org> wrote:
> And usually for the larger lower level caches d and i is shared anyways.

Which lower level cache are you talking about? L1? L2?
For all the platforms (PPC) I works on, L1 caches are split between D
and I caches while the L2 are shared.
Even the SPU has an icache (well kinda).

Thanks,
Andrew Pinski

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-02 19:43   ` Andrew Pinski
@ 2008-05-02 20:26     ` Andi Kleen
  0 siblings, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2008-05-02 20:26 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc-patches, rsandifo

Andrew Pinski wrote:
> On Fri, May 2, 2008 at 11:08 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> And usually for the larger lower level caches d and i is shared anyways.
> 
> Which lower level cache are you talking about? L1? L2?

L2 and lower.

> For all the platforms (PPC) I works on, L1 caches are split between D
> and I caches while the L2 are shared.

L1 (both d and i) tends to be so small that you'll thrash it always on
compiler execution. The interesting area are the lower level caches.

-Andi

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-02 18:09 ` Andi Kleen
  2008-05-02 19:43   ` Andrew Pinski
@ 2008-05-03  8:46   ` Richard Sandiford
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-03  8:46 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches

Andi Kleen <andi@firstfloor.org> writes:
> Richard Sandiford <rsandifo@nildram.co.uk> writes:
>> at this stage because I'm not sure how it would work out speedwise.
>> (I suppose a byte-coded, interpreter-based implementation is likely
>> to have better code cache locality but worse data cache locality.
>
> On the other hand the byte code will likely be much more compact than 
> the code, so you should use much less dcache.

Much less dcache than the open-coded version would use icache,
I assume you mean?  Sure.  But the dcache also has things like
the rtxes themselves (which ISTR someone finding had little locality
these days), the operands[] array, the const_int_rtx[] array,
peep2_current_count, the stack, etc.  The more independent things
you're trying to access at the same time, the more chance you have
of cache line contention.

> And usually for the larger lower level caches d and i is shared
> anyways.

Sure.

If I understand correctly, you're simply saying that the byte-coded
implementation might well be faster.  If so, I agree.  To be clear,
"I'm not sure how it would work out speedwise" was supposed to mean
"it might well be faster, or it might be slower".  My point was that
it is hard to predict.  The only way of finding out would be to try it,
and since rewriting genrecog was only a side-issue for the problem I was
trying to solve, I didn't want to commit to both changes (tree->DFA,
open-coded->byte-coded) at once.

In both the message and the comments, I was trying to leave open the
possibility of moving to a byte-coded implementation in future.

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-02  9:12 Using a DFA for insn-recog.c Richard Sandiford
  2008-05-02 14:08 ` Michael Matz
  2008-05-02 18:09 ` Andi Kleen
@ 2008-05-07 17:25 ` Mark Mitchell
  2008-05-07 18:21   ` Richard Sandiford
  2 siblings, 1 reply; 16+ messages in thread
From: Mark Mitchell @ 2008-05-07 17:25 UTC (permalink / raw)
  To: gcc-patches, rsandifo

Richard Sandiford wrote:
> There has been talk in the past of making automatic extensions
> easier to express in .md files.  See for example:
> 
>     http://gcc.gnu.org/ml/gcc/2004-07/msg01075.html

Would you please summarize the benefits of the patch?  (That's not meant 
to be a wise-guy question; I'm just getting lost in the details and 
trying to understand.)

 From your email, it seems that:

* Compilation time is approximately unchanged, despite what appears to 
be more efficient code in recognizer which avoids checking predicates 
multiple times

* Text side in resulting compiler is smaller

* Bootstrap time may have increased

Is that right?

Also, based on the preface to your email above, I'm guessing that this 
patch might be providing leverage for some other improvement?  Perhaps 
that this paves the way for a future byte-coded implementation that will 
deliver faster compile times?  Or, do you think that the text size 
improvements alone justify this patch?  (I'm not saying they do not; 
just asking.)

Or, did I misunderstand the benefits/costs entirely?

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-07 17:25 ` Mark Mitchell
@ 2008-05-07 18:21   ` Richard Sandiford
  2008-05-07 18:22     ` Richard Sandiford
  2008-05-08  5:42     ` Mark Mitchell
  0 siblings, 2 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-07 18:21 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc-patches

Mark Mitchell <mark@codesourcery.com> writes:
> Richard Sandiford wrote:
>> There has been talk in the past of making automatic extensions
>> easier to express in .md files.  See for example:
>> 
>>     http://gcc.gnu.org/ml/gcc/2004-07/msg01075.html
>
> Would you please summarize the benefits of the patch?  (That's not meant 
> to be a wise-guy question; I'm just getting lost in the details and 
> trying to understand.)
>
>  From your email, it seems that:
>
> * Compilation time is approximately unchanged, despite what appears to 
> be more efficient code in recognizer which avoids checking predicates 
> multiple times
>
> * Text side in resulting compiler is smaller
>
> * Bootstrap time may have increased
>
> Is that right?

Yup.

> Also, based on the preface to your email above, I'm guessing that this 
> patch might be providing leverage for some other improvement?  Perhaps 
> that this paves the way for a future byte-coded implementation that will 
> deliver faster compile times?  Or, do you think that the text size 
> improvements alone justify this patch?  (I'm not saying they do not; 
> just asking.)

No, you're right.

Originally, the main motivation was to make it easier to handle things
like those discussed in the message I linked above.  In other words,
I wanted to allow most MIPS operands of the form:

   (match_operand:DI 0 "register_operand" "d")

to also match:

   (aign_extend:DI (match_operand:SI 0 "register_operand" "d"))

(with appropriate conditions and changes elsewhere).  The sort of
state sharing that we get with the DFA approach can significantly
reduce the size increase in recog.

However, although that was the main motivation for doing this in
the first place, I did hope that the change would be justified
for its own sake, on the grounds you give.

I suppose the argument would be more compelling if I could measure a
significant difference in execution speed.  Unfortunately, results on
my machine seem to be quite noisy, even when it's more-or-less idle.
I'm also not sure what would be a good test.  I don't know whether
there's anything whose gcc execution time is significantly affected
by recog.

(I wondered about using detailed profiling tools, but if that's the
only way to measure the result, would any improvement really count
for much?)

The argument would also be more compelling if there were no increase
in bootstrap time.  (As I think you realise, the time increase is due
to slowness in gcc, rather than in the new genrecog, and I deliberately
picked the most extreme target.)  For example, the argument would be
more compelling if I went straight to the byte-coded version and
measured no difference in execution speed (despite my reservations
about being able to measure this properly on my machine).  The insn
table would obviously compile much faster than the open-coded version.

But like I say, I don't want to do both steps (tree->dfa, open-coded->
byte-coded) at once.  I suppose a lot of the problem is that I don't
think it would be a good idea to pick just one byte encoding, try it,
and send it out.  I'd need to investigate several different possibilities
and see which works out best.  (A stack-based approach to handling the
rtx position?  Separate jump actions?  Fixed-width or variable-width
"instructions"?  And plenty more besides.)  It would be a lot of work,
and I'm only doing this in my spare time. ;)  I can also imagine that
whatever encoding I pick would be controversial with someone.

If the change isn't acceptable on the grounds you give alone[*],
then I'd rather try to find a different approach to the MIPS problem.
I don't want any future MIPS-related change to tip the scales one way
or the other.

  [*] And if it isn't, that's fine.  I knew that was a risk all along.

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-07 18:21   ` Richard Sandiford
@ 2008-05-07 18:22     ` Richard Sandiford
  2008-05-08  5:42     ` Mark Mitchell
  1 sibling, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-07 18:22 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc-patches

Richard Sandiford <rsandifo@nildram.co.uk> writes:
> Mark Mitchell <mark@codesourcery.com> writes:
>> Also, based on the preface to your email above, I'm guessing that this 
>> patch might be providing leverage for some other improvement?  Perhaps 
>> that this paves the way for a future byte-coded implementation that will 
>> deliver faster compile times?  Or, do you think that the text size 
>> improvements alone justify this patch?  (I'm not saying they do not; 
>> just asking.)
>
> No, you're right.

Er, sorry for the ambiguousness.  What I meant here, and what I meant
when I said "the reasons you give" later, was that the justifications
for the patch were:

  - smaller text sizes
  - fewer predicate tests and shorter paths through the generated functions
  - paves the way for a byte-coded DFA implementation

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-07 18:21   ` Richard Sandiford
  2008-05-07 18:22     ` Richard Sandiford
@ 2008-05-08  5:42     ` Mark Mitchell
  2008-05-09 16:45       ` Paolo Bonzini
  1 sibling, 1 reply; 16+ messages in thread
From: Mark Mitchell @ 2008-05-08  5:42 UTC (permalink / raw)
  To: Mark Mitchell, gcc-patches, rsandifo

Richard Sandiford wrote:

>> * Compilation time is approximately unchanged, despite what appears to 
>> be more efficient code in recognizer which avoids checking predicates 
>> multiple times
>>
>> * Text side in resulting compiler is smaller
>>
>> * Bootstrap time may have increased
>>
>> Is that right?
> 
> Yup.

OK, good; at least I understand the objectives.

> I suppose the argument would be more compelling if I could measure a
> significant difference in execution speed. 

Yes; I was hoping you were going to have some measurable improvement in 
speed.  However, I'm also not terribly worried about bootstrap times. 
(As a fraction of test time, bootstrap time is small.)

> I don't want any future MIPS-related change to tip the scales one way
> or the other.

I don't think it would be a problem if that was the scale-tipper; making 
infrastructure changes that allow something to be done better for some 
architecture is fine, as long as it's not adversely impacting other 
architectures.

I'm on the fence because the whole thing seems a bit of a wash at 
present; not much improvement, not much harm, some risk just because 
change is risky.  Does anyone else have an opinion as to whether or not 
this patch is sufficiently compelling to include?

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-08  5:42     ` Mark Mitchell
@ 2008-05-09 16:45       ` Paolo Bonzini
  2008-05-12 19:43         ` Richard Sandiford
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2008-05-09 16:45 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: gcc-patches, rsandifo


> I'm on the fence because the whole thing seems a bit of a wash at 
> present; not much improvement, not much harm, some risk just because 
> change is risky.  Does anyone else have an opinion as to whether or not 
> this patch is sufficiently compelling to include?

Personally, I think the patch is very interesting and it would be a pity 
to not get it in.  What happens if you compile insn-recog.o at -O1? 
Since time spent in recog is almost never measurable, it should improve 
bootstrap time and not make the compiler much slower.

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-09 16:45       ` Paolo Bonzini
@ 2008-05-12 19:43         ` Richard Sandiford
  2008-05-12 20:14           ` Mark Mitchell
  2008-05-13 10:25           ` Andi Kleen
  0 siblings, 2 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-12 19:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Mark Mitchell, gcc-patches

Hi Paolo,

Paolo Bonzini <bonzini@gnu.org> writes:
>> I'm on the fence because the whole thing seems a bit of a wash at 
>> present; not much improvement, not much harm, some risk just because 
>> change is risky.  Does anyone else have an opinion as to whether or not 
>> this patch is sufficiently compelling to include?
>
> Personally, I think the patch is very interesting and it would be a pity 
> to not get it in.  What happens if you compile insn-recog.o at -O1? 
> Since time spent in recog is almost never measurable, it should improve 
> bootstrap time and not make the compiler much slower.

Thanks for the support.  I'd rather not compile at -O though.
My understanding from Mark's response is that a one-minute increase
in bootstrap time is not itself a problem, so it wouldn't worth
compiling at a lower optimisation level for that reason alone.

I think Mark's point was that the size reductions are not in themselves
particularly exciting.  The loadable size of an x86_64-linux-gnu cc1 is
about 10M on my machine, so the patch represents a saving of about 1%.
I can see that that's not much to write home about.  The saving on ARM
(the biggest winner) is about 2.9%.

Removing recog entirely on x86_64-linux-gnu would only save about 4%.
Unless Mark's scales are different from mine, I imagine a 4% reduction
in loadable size wouldn't be significantly more exciting than a 2.9%
reduction, especially given the amount of memory we need at run time,
and given the risk involved in switching the implementation wholesale.
On that basis, I think changing recog is a bit of a dead end.

I'll set this patch aside and do the MIPS changes in some other way.
It was a fun patch to work on, so nothing's lost ;)

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-12 19:43         ` Richard Sandiford
@ 2008-05-12 20:14           ` Mark Mitchell
  2008-05-13 10:25           ` Andi Kleen
  1 sibling, 0 replies; 16+ messages in thread
From: Mark Mitchell @ 2008-05-12 20:14 UTC (permalink / raw)
  To: Paolo Bonzini, Mark Mitchell, gcc-patches, rdsandiford

Richard Sandiford wrote:

> I think Mark's point was that the size reductions are not in themselves
> particularly exciting.  The loadable size of an x86_64-linux-gnu cc1 is
> about 10M on my machine, so the patch represents a saving of about 1%.
> I can see that that's not much to write home about.  The saving on ARM
> (the biggest winner) is about 2.9%.

I hope I didn't come across as too negative.  I just felt ambivalent; it 
wasn't a big win, and it was a relatively big change, and so I didn't 
get too excited.  It's not like I would argue if someone else approved 
the change, if people feel it's the right direction.

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-12 19:43         ` Richard Sandiford
  2008-05-12 20:14           ` Mark Mitchell
@ 2008-05-13 10:25           ` Andi Kleen
  2008-05-13 10:45             ` Richard Sandiford
  1 sibling, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2008-05-13 10:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Mark Mitchell, gcc-patches, rdsandiford

Richard Sandiford <rdsandiford@googlemail.com> writes:
>
> I think Mark's point was that the size reductions are not in themselves
> particularly exciting.  The loadable size of an x86_64-linux-gnu cc1 is
> about 10M on my machine, so the patch represents a saving of about 1%.

Looking at the total executable size is not very interesting because
it won't have a direct impact on performance . Some of that code might 
be never loaded.

What would be more interesting is the working set size of the compiler
that has to be loaded into the CPU's caches.

You could probably determine this by comparing d/icache miss rate
between your patched and unpatched version for the same
program. If the miss rate is better with your version it's a win.

-Andi

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Using a DFA for insn-recog.c
  2008-05-13 10:25           ` Andi Kleen
@ 2008-05-13 10:45             ` Richard Sandiford
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-13 10:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Paolo Bonzini, Mark Mitchell, gcc-patches

Andi Kleen <andi@firstfloor.org> writes:
> Richard Sandiford <rdsandiford@googlemail.com> writes:
>> I think Mark's point was that the size reductions are not in themselves
>> particularly exciting.  The loadable size of an x86_64-linux-gnu cc1 is
>> about 10M on my machine, so the patch represents a saving of about 1%.
>
> Looking at the total executable size is not very interesting because
> it won't have a direct impact on performance . Some of that code might 
> be never loaded.

But we seem to be talking past each other.  I wasn't suggesting
that reducing the loadable size of cc1 would directly improve
performance (as in, compiler speed).  At least not in cases
where no swap space is needed.

> What would be more interesting is the working set size of the compiler
> that has to be loaded into the CPU's caches.
>
> You could probably determine this by comparing d/icache miss rate
> between your patched and unpatched version for the same
> program. If the miss rate is better with your version it's a win.

But this goes back to what I said upthread:

    (I wondered about using detailed profiling tools, but if that's the
    only way to measure the result, would any improvement really count
    for much?)

In other words, if the only way to measure the impact of the patch
is by pointing at miss rates, the patch does not justify itself from
a performance perspective.  A necessary (but not sufficient) condition
for it to be justified from a performance perspective is that a user
would notice the difference for a particular testcase.

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2008-05-13 10:33 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-05-02  9:12 Using a DFA for insn-recog.c Richard Sandiford
2008-05-02 14:08 ` Michael Matz
2008-05-02 14:56   ` Richard Sandiford
2008-05-02 18:09 ` Andi Kleen
2008-05-02 19:43   ` Andrew Pinski
2008-05-02 20:26     ` Andi Kleen
2008-05-03  8:46   ` Richard Sandiford
2008-05-07 17:25 ` Mark Mitchell
2008-05-07 18:21   ` Richard Sandiford
2008-05-07 18:22     ` Richard Sandiford
2008-05-08  5:42     ` Mark Mitchell
2008-05-09 16:45       ` Paolo Bonzini
2008-05-12 19:43         ` Richard Sandiford
2008-05-12 20:14           ` Mark Mitchell
2008-05-13 10:25           ` Andi Kleen
2008-05-13 10:45             ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).