public inbox for gcc@gcc.gnu.org
* Re: Can we speed up the gcc_target structure?
@ 2004-01-19 23:42 Richard Kenner
  2004-01-19 23:46 ` Zack Weinberg
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Kenner @ 2004-01-19 23:42 UTC
  To: zack; +Cc: gcc

    Whether by accident or intention you picked a lot of stuff having to
    do with register classes, 

Accident: I was going sequentially through alpha.h.

    Wasn't Michael Matz just saying that regclass.c needed a major rework
    anyway, or the new register allocator would never be able to replace
    the old?  I don't know what his design looks like, or even if he has
    one yet, but surely there is a simpler way to structure this, that
    doesn't involve lots of little tiny macros.

I don't see what that has to do with the specification of classes, which
are attributes of the machine.  Those macros are used by far more than
just regclass.c and the register allocator ...

* Re: Can we speed up the gcc_target structure?
@ 2004-01-19 23:48 Richard Kenner
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Kenner @ 2004-01-19 23:48 UTC
  To: zack; +Cc: gcc

    I'm speculating that a simpler specification of classes could be
    informed by the needs of the new register allocator.  That's all.

I don't see it.  The complexity of register class specifications is
dictated by the complexity of architectures, not anything we put into GCC:
it has to be expressive enough to describe all architectures.

Sure, it's somewhat redundant, in that we have maps both ways, and a few
macros could be eliminated in that process, but it doesn't seem worth the
effort to me.

* Re: Can we speed up the gcc_target structure?
@ 2004-01-19 21:25 Richard Kenner
  2004-01-19 23:36 ` Zack Weinberg
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Kenner @ 2004-01-19 21:25 UTC
  To: zack; +Cc: gcc

    > True for some, but not others.  Yes, we have a lot of macros which are
    > actually functions, but we also have a lot of macros that are just a
    > half dozen tokens which would have to be converted into a function.

    Do you have any in particular in mind?

If the idea is to eventually convert *all* target macros, then most are
in that category (though "half dozen" should probably have been "few dozen").

From alpha.h:

#define WORD_SWITCH_TAKES_ARG(STR)		\
 (!strcmp (STR, "rpath") || DEFAULT_WORD_SWITCH_TAKES_ARG(STR))
#define TARGET_FLOAT_FORMAT \
  (TARGET_FLOAT_VAX ? VAX_FLOAT_FORMAT : IEEE_FLOAT_FORMAT)
#define PROMOTE_MODE(MODE,UNSIGNEDP,TYPE)  \
  if (GET_MODE_CLASS (MODE) == MODE_INT		\
      && GET_MODE_SIZE (MODE) < UNITS_PER_WORD)	\
    {						\
      if ((MODE) == SImode)			\
	(UNSIGNEDP) = 0;			\
      (MODE) = DImode;				\
    }
#define HARD_REGNO_NREGS(REGNO, MODE)   \
  ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
#define HARD_REGNO_MODE_OK(REGNO, MODE) 				\
  ((REGNO) >= 32 && (REGNO) <= 62 					\
   ? (MODE) == SFmode || (MODE) == DFmode || (MODE) == DImode		\
   : 1)
#define VECTOR_MODE_SUPPORTED_P(MODE) \
  (TARGET_MAX \
   && ((MODE) == V8QImode || (MODE) == V4HImode || (MODE) == V2SImode))
#define MODES_TIEABLE_P(MODE1, MODE2) 				\
  (HARD_REGNO_MODE_OK (32, (MODE1))				\
   ? HARD_REGNO_MODE_OK (32, (MODE2))				\
   : 1)
#define SECONDARY_MEMORY_NEEDED_MODE(MODE)		\
  (GET_MODE_CLASS (MODE) == MODE_FLOAT ? (MODE)		\
   : GET_MODE_SIZE (MODE) >= 4 ? (MODE)			\
   : mode_for_size (BITS_PER_WORD, GET_MODE_CLASS (MODE), 0))
#define CLASS_MAX_NREGS(CLASS, MODE)				\
 ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)			\
   ? reg_classes_intersect_p (FLOAT_REGS, CLASS) : 0)
#define REGISTER_MOVE_COST(MODE, CLASS1, CLASS2)	\
  (((CLASS1) == FLOAT_REGS) == ((CLASS2) == FLOAT_REGS)	\
   ? 2							\
   : TARGET_FIX ? 3 : 4+2*alpha_memory_latency)
#define MEMORY_MOVE_COST(MODE,CLASS,IN)  (2*alpha_memory_latency)
#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) \
  ((OFFSET) = alpha_initial_elimination_offset(FROM, TO))
#define FUNCTION_VALUE_REGNO_P(N)  \
  ((N) == 0 || (N) == 1 || (N) == 32 || (N) == 33)
#define FUNCTION_ARG_REGNO_P(N) \
  (((N) >= 16 && (N) <= 21) || ((N) >= 16 + 32 && (N) <= 21 + 32))
#define INIT_CUMULATIVE_ARGS(CUM,FNTYPE,LIBNAME,INDIRECT)  (CUM) = 0
#define FUNCTION_ARG_PASS_BY_REFERENCE(CUM, MODE, TYPE, NAMED) \
  ((MODE) == TFmode || (MODE) == TCmode)


... and so on ...

* Re: Can we speed up the gcc_target structure?
@ 2004-01-19 19:05 Richard Kenner
  2004-01-19 21:15 ` Zack Weinberg
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Kenner @ 2004-01-19 19:05 UTC
  To: zack; +Cc: gcc

    If done right, it ought to be simpler than target macros.  If you look
    at individual target macros in isolation, conversion from macro to
    hook invariably makes the back-end interface simpler, just because it
    forces you not to do the horrible define-here-redefine-there mess that
    is the current state of a lot of the macros.  

True for some, but not others.  Yes, we have a lot of macros which are actually
functions, but we also have a lot of macros that are just a half dozen
tokens which would have to be converted into a function.
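
To make the trade-off concrete, here is a minimal sketch of what converting
one such short macro into a hook might look like.  The macro is
FUNCTION_VALUE_REGNO_P as it appears in alpha.h; the structure
example_target, the variable example_targetm, and every function name
starting with example_ or count_ are hypothetical stand-ins made up for
illustration, not GCC's actual target vector.

```c
#include <stdbool.h>

/* Macro form, as it appears in alpha.h: a handful of tokens that fold
   to a constant whenever N is a compile-time constant.  */
#define FUNCTION_VALUE_REGNO_P(N)  \
  ((N) == 0 || (N) == 1 || (N) == 32 || (N) == 33)

/* Hypothetical hook form: the same test wrapped in a function so that
   it can be stored in a target structure.  */
static bool
example_alpha_function_value_regno_p (unsigned int regno)
{
  return regno == 0 || regno == 1 || regno == 32 || regno == 33;
}

/* Hypothetical one-field target vector (not GCC's struct gcc_target).  */
struct example_target
{
  bool (*function_value_regno_p) (unsigned int regno);
};

struct example_target example_targetm =
{
  example_alpha_function_value_regno_p
};

/* Count value-return registers in [first, last] using the macro.  */
int
count_value_regs_macro (unsigned int first, unsigned int last)
{
  int count = 0;
  for (unsigned int r = first; r <= last; r++)
    if (FUNCTION_VALUE_REGNO_P (r))
      count++;
  return count;
}

/* Same count, but every test is an indirect call through the vector.  */
int
count_value_regs_hook (unsigned int first, unsigned int last)
{
  int count = 0;
  for (unsigned int r = first; r <= last; r++)
    if (example_targetm.function_value_regno_p (r))
      count++;
  return count;
}
```

Both versions compute the same answer; the difference is that the macro
needs one #define while the hook needs a function, a structure field, and
an initializer.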

* Re: Can we speed up the gcc_target structure?
@ 2004-01-19 18:18 Richard Kenner
  2004-01-19 18:26 ` Zack Weinberg
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Kenner @ 2004-01-19 18:18 UTC
  To: zack; +Cc: gcc

    Does your opinion change if the target parameters are properly
    redesigned, as I suggested in another message to this thread?

Somewhat, but I still wonder whether the complexity of such a scheme
is worth the ability to have common .o files.

* Re: Can we speed up the gcc_target structure?
@ 2004-01-19 11:51 Richard Kenner
  2004-01-19 12:01 ` Richard Guenther
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Richard Kenner @ 2004-01-19 11:51 UTC
  To: ian; +Cc: gcc

I have to say that I was never happy with the move to the target structure,
but couldn't completely put my finger on why.

One reason I didn't like it was a feeling that it made references to these
parameters lexically more complex and hence made the code harder to read, but
that's not a strong reason.

However, *this* is the reason I was trying to express: a significant fraction
of these parameters are constants on most targets and we lose that with a
move to the target structure.

I've seen the argument that it would be good to be able to have the binaries
for many of the compiler files be target-independent, but I don't see that as
a major argument given compilation speeds at the moment.  I think the loss in
compile-time performance is significant.

My sense would be to revert these changes and eliminate the target structure
in favor of the simpler macro approach.

Am I the only one who feels this way?  If not, this may be an issue for the
SC to address.

* Re: Can we speed up the gcc_target structure?
@ 2004-01-18 22:18 Chris Lattner
  2004-01-18 22:33 ` Jan Hubicka
  2004-01-18 22:36 ` Richard Henderson
  0 siblings, 2 replies; 45+ messages in thread
From: Chris Lattner @ 2004-01-18 22:18 UTC
  To: Richard Henderson; +Cc: Kaveh R. Ghazi, ian, gcc, Joseph S. Myers


Richard Henderson wrote:
> On Sun, Jan 18, 2004 at 09:14:14PM +0000, Joseph S. Myers wrote:
> > When --enable-intermodule is used, does (or should) the compiler
> > manage to detect which parts of the target structure are in fact
> > constant (even without constifying)?

LLVM is very good at this kind of stuff.

> However, that sort of optimization requires that you see the *entire*
> program, not just large parts of it, as with the current intermodule
> code.  So I expect this sort of thing is relatively far away.

This is not really true.  At some point, the structure needs to be marked
as having internal linkage.  In LLVM, this is accomplished with the
"internalize" pass, which by default marks all symbols internal if the
linked program contains a main (i.e., this does not happen for libraries).
This change enables a _lot_ of interprocedural optimizations that would
not be safe to perform otherwise.  Of course the internalize pass can be
completely disabled, or enabled for a list of symbols as needed.

Note that it is quite possible that the user would like to run the
internalize pass _before_ the whole program is available, for example, to
prune the public symbols exposed by a library.
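
As a rough C-level analogy (not LLVM's internalize pass itself, which works
on LLVM's own representation), the effect of internal linkage can be
sketched with the static keyword.  All names below are hypothetical.
Whether a given compiler actually folds the load is an assumption about
its optimizer, not a guarantee.

```c
/* Hypothetical parameter block, standing in for a target structure.  */
struct example_params
{
  int word_size;
};

/* External linkage: another translation unit might write to this, so
   the compiler has to load word_size from memory at each use.  */
struct example_params example_public = { 8 };

/* Internal linkage: the compiler can see every access in this file.
   Nothing here writes to it or takes its address, so an optimizer is
   free to treat word_size as the constant 8.  */
static struct example_params example_internal = { 8 };

/* Round BYTES up to words using the externally visible structure.  */
int
example_words_public (int bytes)
{
  return (bytes + example_public.word_size - 1) / example_public.word_size;
}

/* Same computation against the internal structure; this one can in
   principle be reduced to (bytes + 7) / 8 at compile time.  */
int
example_words_internal (int bytes)
{
  return (bytes + example_internal.word_size - 1)
	 / example_internal.word_size;
}
```

The two functions return the same values; only the amount of information
available to the optimizer differs.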

> One possibility is a switch that says "except for main, nothing
> outside these files reference any of the symbols herein defined."
> That might get you the same effect as whole-program optimization
> without having to have extra info about external runtime libraries.

This is _extremely_ dangerous, and in practice, cannot be done.  A
compiler _very rarely_ has the entire program to analyze, and must
therefore be able to handle the fact that there is external code that can
access program structures (e.g., precompiled libraries such as libc and
libm, or dynamically loaded libraries such as plugins).

The nice thing about LLVM using the internalize pass is that if it is used
incorrectly, a program will not link.  If you use a "whole program"
compiler incorrectly, it will be silently misoptimized, which IMHO is
_much_ worse.

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/

* Can we speed up the gcc_target structure?
@ 2004-01-18  8:37 Ian Lance Taylor
  2004-01-18  9:03 ` Zack Weinberg
                   ` (3 more replies)
  0 siblings, 4 replies; 45+ messages in thread
From: Ian Lance Taylor @ 2004-01-18  8:37 UTC
  To: gcc

Back in the old days, gcc had a lot of code which was conditionally
compiled with #ifdef.  That was ugly, but the resulting code was fast.
Over time, a lot of the parameters checked with #ifdef were converted
into macros which were checked at runtime using if.  That was less
ugly, and, since the macros normally had constant values, when gcc was
compiled with an optimizing compiler, the code was just as fast in the
normal case.  When it was slower, it was generally because the
compiler was doing something it couldn't do before.

More recently, some of those parameters have moved into the gcc_target
structure.  They are still checked at run time, but now the if
condition never has a constant value.  It always requires fetching a
value from memory in the target vector, and often requires calling a
function.  This results in cleaner, more comprehensible code.

However, it also slows the compiler down.
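
A minimal sketch of the difference, assuming a one-field stand-in for the
target vector.  The names example_target, example_targetm,
EXAMPLE_PROMOTE_FUNCTION_ARGS, and the count_ functions are hypothetical
and only illustrate the shape of the problem; they are not GCC's real
definitions.

```c
#include <stdbool.h>

/* Hypothetical one-field stand-in for the target vector.  */
struct example_target
{
  bool (*promote_function_args) (int fntype);
};

/* A default hook that always answers "no".  */
static bool
example_hook_false (int fntype)
{
  (void) fntype;
  return false;
}

/* Non-const and externally visible, so each use must fetch the pointer
   from memory and make an indirect call.  */
struct example_target example_targetm = { example_hook_false };

/* Macro style: the condition is a compile-time constant, so an
   optimizing compiler can delete the test and the dead branch.  */
#define EXAMPLE_PROMOTE_FUNCTION_ARGS(FNTYPE) false

/* Count promoted function types using the constant macro.  */
int
count_promotions_macro (const int *fntypes, int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    if (EXAMPLE_PROMOTE_FUNCTION_ARGS (fntypes[i]))
      count++;
  return count;
}

/* Same loop, but the condition goes through the target vector.  */
int
count_promotions_vector (const int *fntypes, int n)
{
  int count = 0;
  for (int i = 0; i < n; i++)
    if (example_targetm.promote_function_args (fntypes[i]))
      count++;
  return count;
}
```

Both functions return 0 for this target, but only the macro version lets
the compiler see that at compile time.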

Just for fun, I converted every instance of
    targetm.calls.xxxx
to be
    TARGETM_CALLS_XXXX
instead.  Then I added stuff like this to the end of target.h:

#ifndef TARGETM_CALLS_PROMOTE_FUNCTION_ARGS
#define TARGETM_CALLS_PROMOTE_FUNCTION_ARGS(FNTYPE) \
  targetm.calls.promote_function_args ((FNTYPE))
#endif

Then I added stuff like this to i386.h:

#define TARGETM_CALLS_PROMOTE_FUNCTION_ARGS(FNTYPE) false

Then I rebuilt the compiler and tried it on some reasonably small C++
example (with a native i386 GNU/Linux compiler).  I saw compilation
speedups of up to 3% when compiling without optimization.  The
resulting assembler output was, as expected, identical.

These tests were far from rigorous.  However, compilation speed is a
concern these days, and this suggests that the target vector is a
measurable speed problem.

Somebody must have noticed this before, but I couldn't find anything
in the gcc mailing list.

It seems to me that we should try to find a way to regain the speed
which was lost when we switched to the target vector, without losing
the comprehensibility which was gained.

Here is a sketch of a possible approach which would require fairly
minimal changes in the way the target vector works today:

1) Turn hooks.c and targhooks.c into .h files which define inline
   functions (with appropriate fallbacks to support older non-gcc
   compilers for bootstrapping, of course).

2) Move all definitions of target initializer macros from tm.c files
   into new CPU-target.h files.

3) Include CPU-target.h at the end of target-def.h, where it will
   redefine and undefine target initializer macros.  For cases in
   which targetm is changed at run time, CPU-target.h must #undef the
   corresponding initializer macro (and CPU.c must #define it before
   initializing targetm) (alternatively, force targetm to be const,
   and adjust the relatively few cases in which it is changed at run
   time).

4) Change all uses of targetm.xxxx into code which uses TARGETM_XXXX
   macros, as above.

5) Define the TARGETM macros as either using the target vector or
   using the initializer macro from target-def.h.  The choice would be
   made based on whether the initializer macro was defined and
   probably based on some other control.

6) Now code which uses the new inline versions of hooks.c and
   targhooks.c, and which includes target-def.h and CPU-target.h, will
   automatically use the inlined versions of the functions when
   possible, and will see constant variable definitions when possible.
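
The steps above can be sketched in a single file as follows.  This is only
an illustration under stated assumptions: every identifier beginning with
example_ or EXAMPLE_ is hypothetical, the contents of a CPU-target.h are
inlined here rather than included, and the control macro
EXAMPLE_USE_INITIALIZER_MACROS stands in for the unspecified "other
control" in step 5.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced target vector.  */
struct example_calls
{
  bool (*promote_function_args) (int fntype);
};

struct example_target
{
  struct example_calls calls;
};

/* Step 1: a default hook defined as an inline function in a header.  */
static inline bool
example_hook_bool_false (int fntype)
{
  (void) fntype;
  return false;
}

/* Steps 2-3: what a hypothetical CPU-target.h would contribute, namely
   the initializer macro for this field.  */
#define EXAMPLE_TARGET_PROMOTE_FUNCTION_ARGS example_hook_bool_false

/* Hypothetical control: set to 0 when several target vectors must be
   supported and every use has to go through the vector.  */
#define EXAMPLE_USE_INITIALIZER_MACROS 1

/* The target vector is still built from the initializer macro.  */
struct example_target example_targetm =
{
  { EXAMPLE_TARGET_PROMOTE_FUNCTION_ARGS }
};

/* Step 5: pick the direct (inlinable) form or the target vector.  */
#if defined (EXAMPLE_TARGET_PROMOTE_FUNCTION_ARGS) \
    && EXAMPLE_USE_INITIALIZER_MACROS
#define EXAMPLE_TARGETM_CALLS_PROMOTE_FUNCTION_ARGS(FNTYPE) \
  EXAMPLE_TARGET_PROMOTE_FUNCTION_ARGS (FNTYPE)
#else
#define EXAMPLE_TARGETM_CALLS_PROMOTE_FUNCTION_ARGS(FNTYPE) \
  example_targetm.calls.promote_function_args (FNTYPE)
#endif

/* Steps 4 and 6: callers use the macro and get the inlined hook
   whenever the direct form was selected.  */
bool
example_caller (int fntype)
{
  return EXAMPLE_TARGETM_CALLS_PROMOTE_FUNCTION_ARGS (fntype);
}
```

With EXAMPLE_USE_INITIALIZER_MACROS set to 0, the same caller compiles to
an indirect call through example_targetm without any source change.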

The main problem that I see with this approach is the requirement to
#undef an initializer macro which is changed at run time.  That's why
I suggest the alternative of making targetm const.

We can convert to this approach over time if we require a particular
macro to be defined in order to define the TARGETM macros as using the
initializer macros rather than the target vector.  Then a backend
which has been converted to use CPU-target.h would define that macro.

If we eventually want to configure gcc to support multiple target
vectors, that would still be possible.  When more than one target
vector was to be supported, the code would force the TARGETM macros to
always use the target vector.  This would be determined at configure
time.

I considered more complex approaches, such as creating a target.def
file which defined the target vector, but the basic problem boils down
to detecting when the target does not use the default version of a
target vector field.  Inventing CPU-target.h seems as effective an
approach as any to solving this particular problem.

Any thoughts?  Does anybody think this would be a waste of time?  Does
anybody have a better approach to solving the general problem?

Ian


end of thread, other threads:[~2004-01-19 23:48 UTC | newest]

Thread overview: 45+ messages
2004-01-19 23:42 Can we speed up the gcc_target structure? Richard Kenner
2004-01-19 23:46 ` Zack Weinberg
  -- strict thread matches above, loose matches on Subject: below --
2004-01-19 23:48 Richard Kenner
2004-01-19 21:25 Richard Kenner
2004-01-19 23:36 ` Zack Weinberg
2004-01-19 19:05 Richard Kenner
2004-01-19 21:15 ` Zack Weinberg
2004-01-19 18:18 Richard Kenner
2004-01-19 18:26 ` Zack Weinberg
2004-01-19 11:51 Richard Kenner
2004-01-19 12:01 ` Richard Guenther
2004-01-19 20:02   ` Richard Henderson
2004-01-19 14:16 ` Robert Dewar
2004-01-19 18:03 ` Zack Weinberg
2004-01-18 22:18 Chris Lattner
2004-01-18 22:33 ` Jan Hubicka
2004-01-18 22:40   ` Chris Lattner
2004-01-18 22:48     ` Jan Hubicka
2004-01-18 22:50       ` Chris Lattner
2004-01-18 23:27         ` Jan Hubicka
2004-01-18 23:34           ` Jakub Jelinek
2004-01-19  1:36           ` Chris Lattner
2004-01-18 22:42   ` Joseph S. Myers
2004-01-18 22:44     ` Chris Lattner
2004-01-18 22:36 ` Richard Henderson
2004-01-18 22:42   ` Chris Lattner
2004-01-18  8:37 Ian Lance Taylor
2004-01-18  9:03 ` Zack Weinberg
2004-01-18 14:09   ` Ian Lance Taylor
2004-01-18 22:25     ` Zack Weinberg
2004-01-19  0:53       ` Ian Lance Taylor
2004-01-19  1:18       ` Geoff Keating
2004-01-18 11:30 ` Joseph S. Myers
2004-01-18 13:58 ` Kaveh R. Ghazi
2004-01-18 19:54   ` Ian Lance Taylor
2004-01-18 20:10     ` Richard Henderson
2004-01-18 20:17       ` Ian Lance Taylor
2004-01-18 21:14   ` Joseph S. Myers
2004-01-18 22:05     ` Richard Henderson
2004-01-18 22:22       ` Jan Hubicka
2004-01-18 22:37         ` Richard Henderson
2004-01-19 19:33           ` DJ Delorie
2004-01-19 20:41             ` Richard Henderson
2004-01-19  1:12 ` Geoff Keating
2004-01-19 13:51   ` Ian Lance Taylor
