* Using a DFA for insn-recog.c
@ 2008-05-02 9:12 Richard Sandiford
2008-05-02 14:08 ` Michael Matz
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-02 9:12 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 12025 bytes --]
There has been talk in the past of making automatic extensions
easier to express in .md files. See for example:
http://gcc.gnu.org/ml/gcc/2004-07/msg01075.html
I'm currently trying to do something along these lines. Unfortunately,
the current tree-based implementation of genrecog makes it difficult to
do this without exploding the size of recog. I'd therefore like to move
to a DFA-based implementation instead.
The beginners' project page says:
----------------------------------------------------------------------
# Make insn-recog.c use a byte-coded DFA.
Richard Henderson and I started this back in 1999 but never
finished. I may still be able to find the code. It produces an order
of magnitude size reduction in insn-recog.o, which is huge (432KB on
i386).
----------------------------------------------------------------------
However, I'm reluctant to go for a byte-coded implementation
at this stage because I'm not sure how it would work out speedwise.
(I suppose a byte-coded, interpreter-based implementation is likely
to have better code cache locality but worse data cache locality.
It would obviously also have more indirection.)
I instead went for an open-coded DFA that I hope is strictly better
than what we have now. If we do still want to go for a byte-coded
implementation in future, the new DFA-based code ought to be a better
starting point than the old tree-based code.
The point of using a DFA is that we can try to share states between
instructions. I've tried to cut down on the number of states and
transitions by:
- Recording that certain predicates imply other predicates.
- Recording that certain predicates are mutually exclusive.
See my earlier post for the infrastructure behind this.
- Storing the result of recog in a local variable.
This means that if two insns end with the same requirements -- such
as operands 1 and 2 being SImode register_operands -- they can use
the same states to check this. The new code still uses "return
constant" statements if no state sharing is possible.
- Introducing local "mode", "code" and "intval" variables for
modes, rtx codes, and UNSPEC{,_VOLATILE} numbers respectively.
As in the previous point, this allows us to share some states
between instructions. For example, SFmode patterns often have the
same form as DFmode patterns, with only the mode varying between
them. The same is often true for operations on different vector or
fixed-point modes. LSHIFTRT patterns often have a similar form to
ASHIFTRT patterns. Many patterns for built-in functions differ only
in modes and unspec numbers.
Again, the new code still uses constant modes, codes and intvals
when no state sharing is possible.
- Doing a better job of skipping tests whose outcome is already known.
As a random example of this -- it wasn't the motivating example --
the original version of ARM's insn-recog.c has:
L19315: ATTRIBUTE_UNUSED_LABEL
if (GET_CODE (x4) == NOT)
goto L986;
if (s_register_operand (x4, SImode))
{
operands[1] = x4;
goto L800;
}
L19314: ATTRIBUTE_UNUSED_LABEL
if (s_register_operand (x4, SImode))
{
operands[0] = x4;
goto L813;
}
goto ret0;
....
L800: ATTRIBUTE_UNUSED_LABEL
x4 = XEXP (x3, 1);
if (arm_not_operand (x4, SImode))
{
operands[2] = x4;
goto L801;
}
x4 = XEXP (x3, 0);
goto L19314;
....
L802: ATTRIBUTE_UNUSED_LABEL
x1 = XVECEXP (x0, 0, 1);
if (GET_CODE (x1) == SET)
goto L803;
x1 = XVECEXP (x0, 0, 0);
x2 = XEXP (x1, 1);
x3 = XEXP (x2, 0);
x4 = XEXP (x3, 0);
goto L19314;
....
L813: ATTRIBUTE_UNUSED_LABEL
x4 = XEXP (x3, 1);
if (arm_not_operand (x4, SImode))
{
operands[1] = x4;
goto L814;
}
goto ret0;
....
L815: ATTRIBUTE_UNUSED_LABEL
x1 = XVECEXP (x0, 0, 1);
if (GET_CODE (x1) == CLOBBER)
goto L816;
goto ret0;
where the trees could not be merged because of the different operand
numbering. The code therefore tests the same predicates twice.
The new code looks like:
L5059:
if (GET_CODE (x4) == NOT)
goto L5085;
result = 69; /* *andsi3_compare0 */
if (s_register_operand (x4, SImode))
goto L5062;
goto L4793;
....
L5062:
operands[1] = x4;
x4 = XEXP (x3, 1);
if (arm_not_operand (x4, SImode))
goto L5065;
goto failed;
....
L5068:
x1 = XVECEXP (x0, 0, 1);
if (GET_CODE (x1) == SET)
goto L5083;
x1 = XVECEXP (x0, 0, 0);
x2 = XEXP (x1, 1);
x3 = XEXP (x2, 0);
x4 = XEXP (x3, 0);
result = 70; /* *andsi3_compare0_scratch */
operands[0] = x4;
x4 = XEXP (x3, 1);
operands[1] = x4;
x1 = XVECEXP (x0, 0, 1);
if (GET_CODE (x1) == CLOBBER)
goto L5080;
goto failed;
The main disadvantage of using open-coded DFAs vs. open-coded trees
is that, because of state sharing, it's harder to use subroutines
for DFAs. Everything has to be in one big function. This means
that (a) we're more likely to hit size-based limit parameters within
the compiler and that (b) it might take longer to compile insn-recog.o.
The port with the biggest insn-recog.c -- both before and after the patch --
is x86_64. Taking the lowest of three runs, the insn-recog.o build times
are as follows:
-g (stage 1) -O2 -g (stages 2 and 3)
before 12.3s 46.6s
after 10.2s 71.2s
So I'm afraid the patch might add a minute to bootstrap times.
The new genrecog runs quickly when compiled with -O2 -g; it takes my
machine three seconds to produce insn-recog.c for x86_64-linux-gnu.
It takes ten seconds when compiled with -g alone. (I've used a lot
of inline function accessors, which probably doesn't help the -g time.)
I haven't been able to measure any difference in the speed of the
resulting compiler.
I tried compiling insn-recog.o before and after the patch for the
targets given in the table below. This was supposed to cover every
cpu/cpu.md file for every non-deprecated target; please let me know
if I've missed one. I used gcc 4.3 on x86_64-linux-gnu with "-O2 -g"
as the build compiler.
As you can see, the patch gives consistently smaller text sections.
The biggest improvement is for ARM, where the new text section is
about half the size of the original. As expected, there's nothing
as dramatic as the order-of-magnitude improvement reported for the
byte-coded version.
(By the way, given the big initial size of insn-recog.o quoted above,
I wonder if that was for the whole of insn-recog.o? I imagine the debug
info would be much smaller for an interpreter-based implementation.)
--------------------------------------------------------------------------
Text sizes
Before After
--------------------------------------------------------------------------
alpha-linux-gnu 92032 70224 : 76.30%
arc-elf 13294 9195 : 69.17%
arm-eabi 456977 236490 : 51.75%
avr-rtems 24525 19526 : 79.62%
bfin-elf 49788 38809 : 77.95%
cris-elf 60014 47951 : 79.90%
fr30-elf 7843 5795 : 73.89%
frv-elf 53216 42150 : 79.21%
h8300-elf 67673 56980 : 84.20%
hppa2.0-linux-gnu 59668 45340 : 75.99%
i686-linux-gnu 395132 301417 : 76.28%
iq2000-elf 11138 7897 : 70.90%
m32c-elf 51041 36591 : 71.69%
m32r-elf 17334 13663 : 78.82%
m68hc11-elf 54123 42887 : 79.24%
m68k-elf 53619 41835 : 78.02%
mcore-elf 17592 13796 : 78.42%
mipsisa64-elf 150752 103896 : 68.92%
mmix-knuth-mmixware 10387 7466 : 71.88%
pdp11-bsd 9827 7827 : 79.65%
powerpc64-linux-gnu 290853 225571 : 77.55%
powerpc-linux-gnu 245706 189601 : 77.17%
s390-linux-gnu 148570 105682 : 71.13%
score-elf 38684 27547 : 71.21%
sh64-elf 151811 128755 : 84.81%
sh-elf 151811 128755 : 84.81%
sparc-sun-solaris2.8 89949 67474 : 75.01%
spu-elf 78980 49162 : 62.25%
v850-elf 13601 11097 : 81.59%
vax-netbsdelf 17936 11807 : 65.83%
x86_64-linux-gnu 480483 373214 : 77.67%
xstormy16-elf 14059 11135 : 79.20%
xtensa-elf 6703 5927 : 88.42%
--------------------------------------------------------------------------
Very little of the old genrecog remains, so I've posted the new version
rather than a diff.
Tested by compiling cc1 and cc1plus for the targets in the table above
and making sure that there were no changes in the assembly output for
gcc.c-torture, gcc.dg and g++.dg (compiled once with -g and once with
-O2).[*] Also bootstrapped & regression-tested on x86_64-linux-gnu.
OK to install?
Richard
[*] Well, almost. There are changes in the m68hc11-elf -O2 output for
gcc.c-torture/execute/ieee/pr30704.c and gcc.dg/torture/pr32897.c,
but those changes are not related to this patch. The differences
come from folding a VIEW_CONVERT_EXPR in which the source value is
a long long and the target value is a double. A bogus offset
calculation causes native_interpret_real to read uninitialized data.
gcc/
* Makefile.in (build/genrecog.o): Update dependencies.
(build/genrecog$(build_exeext)): Depend on build/alloc-pool.o.
* genrecog.c: Include vec.h and alloc-pool.h. Rewrite to use DFAs.
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in 2008-05-01 19:33:15.000000000 +0100
+++ gcc/Makefile.in 2008-05-01 19:33:16.000000000 +0100
@@ -3253,8 +3253,8 @@ build/genpeep.o : genpeep.c $(RTL_BASE_H
coretypes.h $(GTM_H) errors.h gensupport.h toplev.h
build/genpreds.o : genpreds.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \
coretypes.h $(GTM_H) errors.h gensupport.h $(OBSTACK_H)
-build/genrecog.o : genrecog.c $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H) \
- coretypes.h $(GTM_H) errors.h gensupport.h
+build/genrecog.o : genrecog.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h \
+ $(GTM_H) $(RTL_BASE_H) errors.h gensupport.h vec.h alloc-pool.h
# Compile the programs that generate insn-* from the machine description.
# They are compiled with $(CC_FOR_BUILD), and associated libraries,
@@ -3279,6 +3279,7 @@ build/gengenrtl$(build_exeext) : $(BUILD
build/genmodes$(build_exeext) : $(BUILD_ERRORS)
build/gengtype$(build_exeext) : build/gengtype-lex.o build/gengtype-parse.o \
$(BUILD_ERRORS)
+build/genrecog$(build_exeext) : build/alloc-pool.o
# Generated source files for gengtype.
gengtype-lex.c : gengtype-lex.l
[-- Attachment #2: genrecog.c.bz2 --]
[-- Type: application/octet-stream, Size: 34636 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-02 9:12 Using a DFA for insn-recog.c Richard Sandiford
@ 2008-05-02 14:08 ` Michael Matz
2008-05-02 14:56 ` Richard Sandiford
2008-05-02 18:09 ` Andi Kleen
2008-05-07 17:25 ` Mark Mitchell
2 siblings, 1 reply; 16+ messages in thread
From: Michael Matz @ 2008-05-02 14:08 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches
Hi Richard,
On Fri, 2 May 2008, Richard Sandiford wrote:
> do this without exploding the size of recog. I'd therefore like to move
> to a DFA-based implementation instead.
Very interesting. From the description in the new genrecog.c of how the
DFA is constructed it reads like so, but let me ask anyway: have you made
sure that an insn that is matched by multiple patterns in the .md files,
is matched against the first one?
Ciao,
Michael.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-02 14:08 ` Michael Matz
@ 2008-05-02 14:56 ` Richard Sandiford
0 siblings, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-02 14:56 UTC (permalink / raw)
To: Michael Matz; +Cc: gcc-patches
Michael Matz <matz@suse.de> writes:
>> to a DFA-based implementation instead.
>
> Very interesting. From the description in the new genrecog.c of how the
> DFA is constructed it reads like so, but let me ask anyway: have you made
> sure that an insn that is matched by multiple patterns in the .md files,
> is matched against the first one?
Yeah. During the merging process, the fallback is always to do things
in .md-file order. We only swap things around if we can prove that they
are mutually exclusive. Once merging is complete, we just rewrite the
DFA into equivalent forms.
Richard
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-02 9:12 Using a DFA for insn-recog.c Richard Sandiford
2008-05-02 14:08 ` Michael Matz
@ 2008-05-02 18:09 ` Andi Kleen
2008-05-02 19:43 ` Andrew Pinski
2008-05-03 8:46 ` Richard Sandiford
2008-05-07 17:25 ` Mark Mitchell
2 siblings, 2 replies; 16+ messages in thread
From: Andi Kleen @ 2008-05-02 18:09 UTC (permalink / raw)
To: gcc-patches; +Cc: rsandifo
Richard Sandiford <rsandifo@nildram.co.uk> writes:
> at this stage because I'm not sure how it would work out speedwise.
> (I suppose a byte-coded, interpreter-based implementation is likely
> to have better code cache locality but worse data cache locality.
On the other hand the byte code will likely be much more compact than
the code, so you should use much less dcache. And usually for the
larger lower level caches d and i is shared anyways.
-Andi
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-02 18:09 ` Andi Kleen
@ 2008-05-02 19:43 ` Andrew Pinski
2008-05-02 20:26 ` Andi Kleen
2008-05-03 8:46 ` Richard Sandiford
1 sibling, 1 reply; 16+ messages in thread
From: Andrew Pinski @ 2008-05-02 19:43 UTC (permalink / raw)
To: Andi Kleen; +Cc: gcc-patches, rsandifo
On Fri, May 2, 2008 at 11:08 AM, Andi Kleen <andi@firstfloor.org> wrote:
> And usually for the larger lower level caches d and i is shared anyways.
Which lower level cache are you talking about? L1? L2?
For all the platforms (PPC) I works on, L1 caches are split between D
and I caches while the L2 are shared.
Even the SPU has an icache (well kinda).
Thanks,
Andrew Pinski
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-02 19:43 ` Andrew Pinski
@ 2008-05-02 20:26 ` Andi Kleen
0 siblings, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2008-05-02 20:26 UTC (permalink / raw)
To: Andrew Pinski; +Cc: gcc-patches, rsandifo
Andrew Pinski wrote:
> On Fri, May 2, 2008 at 11:08 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> And usually for the larger lower level caches d and i is shared anyways.
>
> Which lower level cache are you talking about? L1? L2?
L2 and lower.
> For all the platforms (PPC) I works on, L1 caches are split between D
> and I caches while the L2 are shared.
L1 (both d and i) tends to be so small that you'll thrash it always on
compiler execution. The interesting area are the lower level caches.
-Andi
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-02 18:09 ` Andi Kleen
2008-05-02 19:43 ` Andrew Pinski
@ 2008-05-03 8:46 ` Richard Sandiford
1 sibling, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-03 8:46 UTC (permalink / raw)
To: Andi Kleen; +Cc: gcc-patches
Andi Kleen <andi@firstfloor.org> writes:
> Richard Sandiford <rsandifo@nildram.co.uk> writes:
>> at this stage because I'm not sure how it would work out speedwise.
>> (I suppose a byte-coded, interpreter-based implementation is likely
>> to have better code cache locality but worse data cache locality.
>
> On the other hand the byte code will likely be much more compact than
> the code, so you should use much less dcache.
Much less dcache than the open-coded version would use icache,
I assume you mean? Sure. But the dcache also has things like
the rtxes themselves (which ISTR someone finding had little locality
these days), the operands[] array, the const_int_rtx[] array,
peep2_current_count, the stack, etc. The more independent things
you're trying to access at the same time, the more chance you have
of cache line contention.
> And usually for the larger lower level caches d and i is shared
> anyways.
Sure.
If I understand correctly, you're simply saying that the byte-coded
implementation might well be faster. If so, I agree. To be clear,
"I'm not sure how it would work out speedwise" was supposed to mean
"it might well be faster, or it might be slower". My point was that
it is hard to predict. The only way of finding out would be to try it,
and since rewriting genrecog was only a side-issue for the problem I was
trying to solve, I didn't want to commit to both changes (tree->DFA,
open-coded->byte-coded) at once.
In both the message and the comments, I was trying to leave open the
possibility of moving to a byte-coded implementation in future.
Richard
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-02 9:12 Using a DFA for insn-recog.c Richard Sandiford
2008-05-02 14:08 ` Michael Matz
2008-05-02 18:09 ` Andi Kleen
@ 2008-05-07 17:25 ` Mark Mitchell
2008-05-07 18:21 ` Richard Sandiford
2 siblings, 1 reply; 16+ messages in thread
From: Mark Mitchell @ 2008-05-07 17:25 UTC (permalink / raw)
To: gcc-patches, rsandifo
Richard Sandiford wrote:
> There has been talk in the past of making automatic extensions
> easier to express in .md files. See for example:
>
> http://gcc.gnu.org/ml/gcc/2004-07/msg01075.html
Would you please summarize the benefits of the patch? (That's not meant
to be a wise-guy question; I'm just getting lost in the details and
trying to understand.)
From your email, it seems that:
* Compilation time is approximately unchanged, despite what appears to
be more efficient code in recognizer which avoids checking predicates
multiple times
* Text side in resulting compiler is smaller
* Bootstrap time may have increased
Is that right?
Also, based on the preface to your email above, I'm guessing that this
patch might be providing leverage for some other improvement? Perhaps
that this paves the way for a future byte-coded implementation that will
deliver faster compile times? Or, do you think that the text size
improvements alone justify this patch? (I'm not saying they do not;
just asking.)
Or, did I misunderstand the benefits/costs entirely?
Thanks,
--
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-07 17:25 ` Mark Mitchell
@ 2008-05-07 18:21 ` Richard Sandiford
2008-05-07 18:22 ` Richard Sandiford
2008-05-08 5:42 ` Mark Mitchell
0 siblings, 2 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-07 18:21 UTC (permalink / raw)
To: Mark Mitchell; +Cc: gcc-patches
Mark Mitchell <mark@codesourcery.com> writes:
> Richard Sandiford wrote:
>> There has been talk in the past of making automatic extensions
>> easier to express in .md files. See for example:
>>
>> http://gcc.gnu.org/ml/gcc/2004-07/msg01075.html
>
> Would you please summarize the benefits of the patch? (That's not meant
> to be a wise-guy question; I'm just getting lost in the details and
> trying to understand.)
>
> From your email, it seems that:
>
> * Compilation time is approximately unchanged, despite what appears to
> be more efficient code in recognizer which avoids checking predicates
> multiple times
>
> * Text side in resulting compiler is smaller
>
> * Bootstrap time may have increased
>
> Is that right?
Yup.
> Also, based on the preface to your email above, I'm guessing that this
> patch might be providing leverage for some other improvement? Perhaps
> that this paves the way for a future byte-coded implementation that will
> deliver faster compile times? Or, do you think that the text size
> improvements alone justify this patch? (I'm not saying they do not;
> just asking.)
No, you're right.
Originally, the main motivation was to make it easier to handle things
like those discussed in the message I linked above. In other words,
I wanted to allow most MIPS operands of the form:
(match_operand:DI 0 "register_operand" "d")
to also match:
(aign_extend:DI (match_operand:SI 0 "register_operand" "d"))
(with appropriate conditions and changes elsewhere). The sort of
state sharing that we get with the DFA approach can significantly
reduce the size increase in recog.
However, although that was the main motivation for doing this in
the first place, I did hope that the change would be justified
for its own sake, on the grounds you give.
I suppose the argument would be more compelling if I could measure a
significant difference in execution speed. Unfortunately, results on
my machine seem to be quite noisy, even when it's more-or-less idle.
I'm also not sure what would be a good test. I don't know whether
there's anything whose gcc execution time is significantly affected
by recog.
(I wondered about using detailed profiling tools, but if that's the
only way to measure the result, would any improvement really count
for much?)
The argument would also be more compelling if there were no increase
in bootstrap time. (As I think you realise, the time increase is due
to slowness in gcc, rather than in the new genrecog, and I deliberately
picked the most extreme target.) For example, the argument would be
more compelling if I went straight to the byte-coded version and
measured no difference in execution speed (despite my reservations
about being able to measure this properly on my machine). The insn
table would obviously compile much faster than the open-coded version.
But like I say, I don't want to do both steps (tree->dfa, open-coded->
byte-coded) at once. I suppose a lot of the problem is that I don't
think it would be a good idea to pick just one byte encoding, try it,
and send it out. I'd need to investigate several different possibilities
and see which works out best. (A stack-based approach to handling the
rtx position? Separate jump actions? Fixed-width or variable-width
"instructions"? And plenty more besides.) It would be a lot of work,
and I'm only doing this in my spare time. ;) I can also imagine that
whatever encoding I pick would be controversial with someone.
If the change isn't acceptable on the grounds you give alone[*],
then I'd rather try to find a different approach to the MIPS problem.
I don't want any future MIPS-related change to tip the scales one way
or the other.
[*] And if it isn't, that's fine. I knew that was a risk all along.
Richard
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-07 18:21 ` Richard Sandiford
@ 2008-05-07 18:22 ` Richard Sandiford
2008-05-08 5:42 ` Mark Mitchell
1 sibling, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-07 18:22 UTC (permalink / raw)
To: Mark Mitchell; +Cc: gcc-patches
Richard Sandiford <rsandifo@nildram.co.uk> writes:
> Mark Mitchell <mark@codesourcery.com> writes:
>> Also, based on the preface to your email above, I'm guessing that this
>> patch might be providing leverage for some other improvement? Perhaps
>> that this paves the way for a future byte-coded implementation that will
>> deliver faster compile times? Or, do you think that the text size
>> improvements alone justify this patch? (I'm not saying they do not;
>> just asking.)
>
> No, you're right.
Er, sorry for the ambiguousness. What I meant here, and what I meant
when I said "the reasons you give" later, was that the justifications
for the patch were:
- smaller text sizes
- fewer predicate tests and shorter paths through the generated functions
- paves the way for a byte-coded DFA implementation
Richard
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-07 18:21 ` Richard Sandiford
2008-05-07 18:22 ` Richard Sandiford
@ 2008-05-08 5:42 ` Mark Mitchell
2008-05-09 16:45 ` Paolo Bonzini
1 sibling, 1 reply; 16+ messages in thread
From: Mark Mitchell @ 2008-05-08 5:42 UTC (permalink / raw)
To: Mark Mitchell, gcc-patches, rsandifo
Richard Sandiford wrote:
>> * Compilation time is approximately unchanged, despite what appears to
>> be more efficient code in recognizer which avoids checking predicates
>> multiple times
>>
>> * Text side in resulting compiler is smaller
>>
>> * Bootstrap time may have increased
>>
>> Is that right?
>
> Yup.
OK, good; at least I understand the objectives.
> I suppose the argument would be more compelling if I could measure a
> significant difference in execution speed.
Yes; I was hoping you were going to have some measurable improvement in
speed. However, I'm also not terribly worried about bootstrap times.
(As a fraction of test time, bootstrap time is small.)
> I don't want any future MIPS-related change to tip the scales one way
> or the other.
I don't think it would be a problem if that was the scale-tipper; making
infrastructure changes that allow something to be done better for some
architecture is fine, as long as it's not adversely impacting other
architectures.
I'm on the fence because the whole thing seems a bit of a wash at
present; not much improvement, not much harm, some risk just because
change is risky. Does anyone else have an opinion as to whether or not
this patch is sufficiently compelling to include?
Thanks,
--
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-08 5:42 ` Mark Mitchell
@ 2008-05-09 16:45 ` Paolo Bonzini
2008-05-12 19:43 ` Richard Sandiford
0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2008-05-09 16:45 UTC (permalink / raw)
To: Mark Mitchell; +Cc: gcc-patches, rsandifo
> I'm on the fence because the whole thing seems a bit of a wash at
> present; not much improvement, not much harm, some risk just because
> change is risky. Does anyone else have an opinion as to whether or not
> this patch is sufficiently compelling to include?
Personally, I think the patch is very interesting and it would be a pity
to not get it in. What happens if you compile insn-recog.o at -O1?
Since time spent in recog is almost never measurable, it should improve
bootstrap time and not make the compiler much slower.
Paolo
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-09 16:45 ` Paolo Bonzini
@ 2008-05-12 19:43 ` Richard Sandiford
2008-05-12 20:14 ` Mark Mitchell
2008-05-13 10:25 ` Andi Kleen
0 siblings, 2 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-12 19:43 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Mark Mitchell, gcc-patches
Hi Paolo,
Paolo Bonzini <bonzini@gnu.org> writes:
>> I'm on the fence because the whole thing seems a bit of a wash at
>> present; not much improvement, not much harm, some risk just because
>> change is risky. Does anyone else have an opinion as to whether or not
>> this patch is sufficiently compelling to include?
>
> Personally, I think the patch is very interesting and it would be a pity
> to not get it in. What happens if you compile insn-recog.o at -O1?
> Since time spent in recog is almost never measurable, it should improve
> bootstrap time and not make the compiler much slower.
Thanks for the support. I'd rather not compile at -O though.
My understanding from Mark's response is that a one-minute increase
in bootstrap time is not itself a problem, so it wouldn't worth
compiling at a lower optimisation level for that reason alone.
I think Mark's point was that the size reductions are not in themselves
particularly exciting. The loadable size of an x86_64-linux-gnu cc1 is
about 10M on my machine, so the patch represents a saving of about 1%.
I can see that that's not much to write home about. The saving on ARM
(the biggest winner) is about 2.9%.
Removing recog entirely on x86_64-linux-gnu would only save about 4%.
Unless Mark's scales are different from mine, I imagine a 4% reduction
in loadable size wouldn't be significantly more exciting than a 2.9%
reduction, especially given the amount of memory we need at run time,
and given the risk involved in switching the implementation wholesale.
On that basis, I think changing recog is a bit of a dead end.
I'll set this patch aside and do the MIPS changes in some other way.
It was a fun patch to work on, so nothing's lost ;)
Richard
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-12 19:43 ` Richard Sandiford
@ 2008-05-12 20:14 ` Mark Mitchell
2008-05-13 10:25 ` Andi Kleen
1 sibling, 0 replies; 16+ messages in thread
From: Mark Mitchell @ 2008-05-12 20:14 UTC (permalink / raw)
To: Paolo Bonzini, Mark Mitchell, gcc-patches, rdsandiford
Richard Sandiford wrote:
> I think Mark's point was that the size reductions are not in themselves
> particularly exciting. The loadable size of an x86_64-linux-gnu cc1 is
> about 10M on my machine, so the patch represents a saving of about 1%.
> I can see that that's not much to write home about. The saving on ARM
> (the biggest winner) is about 2.9%.
I hope I didn't come across as too negative. I just felt ambivalent; it
wasn't a big win, and it was a relatively big change, and so I didn't
get too excited. It's not like I would argue if someone else approved
the change, if people feel it's the right direction.
--
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-12 19:43 ` Richard Sandiford
2008-05-12 20:14 ` Mark Mitchell
@ 2008-05-13 10:25 ` Andi Kleen
2008-05-13 10:45 ` Richard Sandiford
1 sibling, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2008-05-13 10:25 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Mark Mitchell, gcc-patches, rdsandiford
Richard Sandiford <rdsandiford@googlemail.com> writes:
>
> I think Mark's point was that the size reductions are not in themselves
> particularly exciting. The loadable size of an x86_64-linux-gnu cc1 is
> about 10M on my machine, so the patch represents a saving of about 1%.
Looking at the total executable size is not very interesting because
it won't have a direct impact on performance . Some of that code might
be never loaded.
What would be more interesting is the working set size of the compiler
that has to be loaded into the CPU's caches.
You could probably determine this by comparing d/icache miss rate
between your patched and unpatched version for the same
program. If the miss rate is better with your version it's a win.
-Andi
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Using a DFA for insn-recog.c
2008-05-13 10:25 ` Andi Kleen
@ 2008-05-13 10:45 ` Richard Sandiford
0 siblings, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2008-05-13 10:45 UTC (permalink / raw)
To: Andi Kleen; +Cc: Paolo Bonzini, Mark Mitchell, gcc-patches
Andi Kleen <andi@firstfloor.org> writes:
> Richard Sandiford <rdsandiford@googlemail.com> writes:
>> I think Mark's point was that the size reductions are not in themselves
>> particularly exciting. The loadable size of an x86_64-linux-gnu cc1 is
>> about 10M on my machine, so the patch represents a saving of about 1%.
>
> Looking at the total executable size is not very interesting because
> it won't have a direct impact on performance . Some of that code might
> be never loaded.
But we seem to be talking past each other. I wasn't suggesting
that reducing the loadable size of cc1 would directly improve
performance (as in, compiler speed). At least not in cases
where no swap space is needed.
> What would be more interesting is the working set size of the compiler
> that has to be loaded into the CPU's caches.
>
> You could probably determine this by comparing d/icache miss rate
> between your patched and unpatched version for the same
> program. If the miss rate is better with your version it's a win.
But this goes back to what I said upthread:
(I wondered about using detailed profiling tools, but if that's the
only way to measure the result, would any improvement really count
for much?)
In other words, if the only way to measure the impact of the patch
is by pointing at miss rates, the patch does not justify itself from
a performance perspective. A necessary (but not sufficient) condition
for it to be justified from a performance perspective is that a user
would notice the difference for a particular testcase.
Richard
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2008-05-13 10:33 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-05-02 9:12 Using a DFA for insn-recog.c Richard Sandiford
2008-05-02 14:08 ` Michael Matz
2008-05-02 14:56 ` Richard Sandiford
2008-05-02 18:09 ` Andi Kleen
2008-05-02 19:43 ` Andrew Pinski
2008-05-02 20:26 ` Andi Kleen
2008-05-03 8:46 ` Richard Sandiford
2008-05-07 17:25 ` Mark Mitchell
2008-05-07 18:21 ` Richard Sandiford
2008-05-07 18:22 ` Richard Sandiford
2008-05-08 5:42 ` Mark Mitchell
2008-05-09 16:45 ` Paolo Bonzini
2008-05-12 19:43 ` Richard Sandiford
2008-05-12 20:14 ` Mark Mitchell
2008-05-13 10:25 ` Andi Kleen
2008-05-13 10:45 ` Richard Sandiford
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).