* How do I stop gcc from loading data into registers when that's not needed?
@ 2018-05-18 18:03 Paul Koning
  2018-05-18 18:07 ` Richard Biener
From: Paul Koning @ 2018-05-18 18:03 UTC
To: GCC

Gents,

In some targets, like pdp11 and vax, most instructions can reference
data in memory directly.

So when I have "if (x < y) ..." I would expect something like this:

    cmpw    x, y
    bgeq    1f
    ...

What I actually see, with -O2 and/or -Os, is:

    movw    x, r0
    movw    y, r1
    cmpw    r0, r1
    bgeq    1f
    ...

which is both longer and slower.  I can't tell why this happens, or how
to stop it.  The machine description has "general_operand" so it
doesn't seem to be the place that forces things into registers.

    paul

* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-18 18:03 How do I stop gcc from loading data into registers when that's not needed? Paul Koning
@ 2018-05-18 18:07 ` Richard Biener
       [not found]   ` <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net>
From: Richard Biener @ 2018-05-18 18:07 UTC
To: gcc, Paul Koning, GCC

On May 18, 2018 8:03:05 PM GMT+02:00, Paul Koning <paulkoning@comcast.net> wrote:
>Gents,
>
>In some targets, like pdp11 and vax, most instructions can reference
>data in memory directly.
>
>So when I have "if (x < y) ..." I would expect something like this:
>
>    cmpw    x, y
>    bgeq    1f
>    ...
>
>What I actually see, with -O2 and/or -Os, is:
>
>    movw    x, r0
>    movw    y, r1
>    cmpw    r0, r1
>    bgeq    1f
>    ...
>
>which is both longer and slower.  I can't tell why this happens, or how
>to stop it.  The machine description has "general_operand" so it
>doesn't seem to be the place that forces things into registers.

I would expect combine to merge the load and arithmetic and thus it is
eventually the target costing that makes that not succeed.

Richard.

>    paul
[parent not found: <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net>]
* Re: How do I stop gcc from loading data into registers when that's not needed?
       [not found]   ` <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net>
@ 2018-05-22  8:49 ` Richard Biener
  2018-05-22 19:26   ` Segher Boessenkool
From: Richard Biener @ 2018-05-22 8:49 UTC
To: paulkoning, GCC Development

On Tue, May 22, 2018 at 2:19 AM Paul Koning <paulkoning@comcast.net> wrote:
>
> > On May 18, 2018, at 2:07 PM, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On May 18, 2018 8:03:05 PM GMT+02:00, Paul Koning <paulkoning@comcast.net> wrote:
> >> Gents,
> >>
> >> In some targets, like pdp11 and vax, most instructions can reference
> >> data in memory directly.
> >>
> >> So when I have "if (x < y) ..." I would expect something like this:
> >>
> >>     cmpw    x, y
> >>     bgeq    1f
> >>     ...
> >>
> >> What I actually see, with -O2 and/or -Os, is:
> >>
> >>     movw    x, r0
> >>     movw    y, r1
> >>     cmpw    r0, r1
> >>     bgeq    1f
> >>     ...
> >>
> >> which is both longer and slower.  I can't tell why this happens, or how
> >> to stop it.  The machine description has "general_operand" so it
> >> doesn't seem to be the place that forces things into registers.
> >
> > I would expect combine to merge the load and arithmetic and thus it is
> > eventually the target costing that makes that not succeed.
> >
> > Richard.

> Thanks Richard.  I am not having a whole lot of luck figuring out where
> precisely I need to adjust or how to make the adjustment.  I'm doing the
> adjusting on the pdp11 port right now.  That has a TARGET_RTX_COSTS hook
> which looks fairly plausible.  It doesn't currently have a
> TARGET_MEMORY_MOVE_COST, or TARGET_ADDRESS_COST, or TARGET_INSN_COST.  It
> is likely that I need some or all of those to get this working better?  If
> yes, any hints you can offer where to start?

Sorry, I'm not very familiar with this area of GCC either.  Did you confirm
that combine at least tries to merge the memory ops into the instruction?
Maybe it is only RA / reload that will try.  You can look at how it works
for x86, for example whether there's already memory ops in the stmts during
RTL expansion which can happen I think when the load has a single use (and
if that works for pdp11).

Richard.

>    paul

* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-22  8:49 ` Richard Biener
@ 2018-05-22 19:26   ` Segher Boessenkool
  2018-05-23  0:50     ` Paul Koning
From: Segher Boessenkool @ 2018-05-22 19:26 UTC
To: Richard Biener; +Cc: paulkoning, GCC Development

On Tue, May 22, 2018 at 10:49:35AM +0200, Richard Biener wrote:
> On Tue, May 22, 2018 at 2:19 AM Paul Koning <paulkoning@comcast.net> wrote:
> > > On May 18, 2018, at 2:07 PM, Richard Biener <richard.guenther@gmail.com> wrote:
> > > On May 18, 2018 8:03:05 PM GMT+02:00, Paul Koning <paulkoning@comcast.net> wrote:
> > >> In some targets, like pdp11 and vax, most instructions can reference
> > >> data in memory directly.
> > >>
> > >> So when I have "if (x < y) ..." I would expect something like this:
> > >>
> > >>     cmpw    x, y
> > >>     bgeq    1f
> > >>     ...
> > >>
> > >> What I actually see, with -O2 and/or -Os, is:
> > >>
> > >>     movw    x, r0
> > >>     movw    y, r1
> > >>     cmpw    r0, r1
> > >>     bgeq    1f
> > >>     ...
> > >>
> > >> which is both longer and slower.  I can't tell why this happens, or how
> > >> to stop it.  The machine description has "general_operand" so it
> > >> doesn't seem to be the place that forces things into registers.
> > >
> > > I would expect combine to merge the load and arithmetic and thus it is
> > > eventually the target costing that makes that not succeed.
>
> > Thanks Richard.  I am not having a whole lot of luck figuring out where
> > precisely I need to adjust or how to make the adjustment.  I'm doing the
> > adjusting on the pdp11 port right now.  That has a TARGET_RTX_COSTS hook
> > which looks fairly plausible.  It doesn't currently have a
> > TARGET_MEMORY_MOVE_COST, or TARGET_ADDRESS_COST, or TARGET_INSN_COST.  It
> > is likely that I need some or all of those to get this working better?  If
> > yes, any hints you can offer where to start?

-fdump-rtl-combine-all (or just -da or -dap), and then look at the dump
file.  Does combine try this combination?  If so, it will tell you what
the resulting costs are.  If not, why does it not try it?

> Sorry, I'm not very familiar with this area of GCC either.  Did you confirm
> that combine at least tries to merge the memory ops into the instruction?

It should, it's a simple reg dependency.  In many cases it will even do
it if it is not single-use (via a 3->2 combination).

Segher

* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-22 19:26   ` Segher Boessenkool
@ 2018-05-23  0:50     ` Paul Koning
  2018-05-23  9:47       ` Richard Biener
From: Paul Koning @ 2018-05-23 0:50 UTC
To: Segher Boessenkool; +Cc: GCC Development

> On May 22, 2018, at 3:26 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>
> -fdump-rtl-combine-all (or just -da or -dap), and then look at the dump
> file.  Does combine try this combination?  If so, it will tell you what
> the resulting costs are.  If not, why does it not try it?
>
>> Sorry, I'm not very familiar with this area of GCC either.  Did you confirm
>> that combine at least tries to merge the memory ops into the instruction?
>
> It should, it's a simple reg dependency.  In many cases it will even do
> it if it is not single-use (via a 3->2 combination).

I examined what gcc does with two simple functions:

    void c2(void)
    {
        if (x < y)
            z = 1;
        else if (x != y)
            z = 42;
        else
            z = 9;
    }

    void c3(void)
    {
        if (x < y)
            z = 1;
        else
            z = 9;
    }

Two things popped out.

1. The original RTL (from the expand phase) has a memory->register move
   for x and y in c2, but it doesn't for c3 (it simply generates a
   memory/memory compare there).  What triggers the different choice in
   that phase?

2. The reported costs for the various insns are

       r22:HI=['x']          6
       cmp(r22:HI,r23:HI)    4
       cmp(['x'],['y'])     16

   so the added cost for the memory argument in the cmp is 6 -- the same
   as the whole cost for the mov.  That certainly explains the behavior.
   It isn't what I want it to be.  Which target hook(s) are involved in
   these numbers?  I don't see them in my rtx_costs hook.

    paul
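
For anyone wanting to reproduce the experiment, a self-contained version of
the two test functions might look like the sketch below.  The declarations
of x, y and z are an assumption: the message does not show them, and plain
int globals are used here only because that would be consistent with the HI
mode seen in the dump on pdp11.  The dump flag in the comment is the one
suggested earlier in the thread; the compiler name is a placeholder.

```c
/* Possible way to inspect what combine does with these functions:
 *     <pdp11-cross-gcc> -O2 -S -fdump-rtl-combine-all test.c
 * The compiler name is a placeholder; the flag comes from this thread. */

/* Assumed declarations; not shown in the original message. */
int x, y, z;

/* Three-way comparison: expand reportedly loads x and y into registers. */
void c2(void)
{
    if (x < y)
        z = 1;
    else if (x != y)
        z = 42;
    else
        z = 9;
}

/* Two-way comparison: expand reportedly emits a memory/memory compare. */
void c3(void)
{
    if (x < y)
        z = 1;
    else
        z = 9;
}
```

The only difference between the two functions is the second test of x and y
in c2, which is where the question about the expand phase comes from.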

* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-23  0:50     ` Paul Koning
@ 2018-05-23  9:47       ` Richard Biener
  2018-05-24  0:33         ` Paul Koning
From: Richard Biener @ 2018-05-23 9:47 UTC
To: paulkoning; +Cc: Segher Boessenkool, GCC Development

On Wed, May 23, 2018 at 2:50 AM Paul Koning <paulkoning@comcast.net> wrote:
> > On May 22, 2018, at 3:26 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >
> > -fdump-rtl-combine-all (or just -da or -dap), and then look at the dump
> > file.  Does combine try this combination?  If so, it will tell you what
> > the resulting costs are.  If not, why does it not try it?
> >
> >> Sorry, I'm not very familiar with this area of GCC either.  Did you confirm
> >> that combine at least tries to merge the memory ops into the instruction?
> >
> > It should, it's a simple reg dependency.  In many cases it will even do
> > it if it is not single-use (via a 3->2 combination).

> I examined what gcc does with two simple functions:

> void c2(void)
> {
>     if (x < y)
>         z = 1;
>     else if (x != y)
>         z = 42;
>     else
>         z = 9;
> }

> void c3(void)
> {
>     if (x < y)
>         z = 1;
>     else
>         z = 9;
> }

> Two things popped out.

> 1. The original RTL (from the expand phase) has a memory->register move
> for x and y in c2, but it doesn't for c3 (it simply generates a
> memory/memory compare there).  What triggers the different choice in
> that phase?

> 2. The reported costs for the various insns are
>     r22:HI=['x']          6
>     cmp(r22:HI,r23:HI)    4
>     cmp(['x'],['y'])     16
> so the added cost for the memory argument in the cmp is 6 -- the same
> as the whole cost for the mov.  That certainly explains the behavior.  It
> isn't what I want it to be.  Which target hook(s) are involved in these
> numbers?  I don't see them in my rtx_costs hook.

The rtx_cost hook.  I think the costs above make sense.  There's also a
new insn_cost hook but you have to dig whether combine uses that.
Otherwise address_cost might be involved.

Richard.

>    paul

* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-23  9:47       ` Richard Biener
@ 2018-05-24  0:33         ` Paul Koning
  2018-05-24 18:25           ` Segher Boessenkool
From: Paul Koning @ 2018-05-24 0:33 UTC
To: Richard Biener; +Cc: Segher Boessenkool, GCC Development

> On May 23, 2018, at 5:46 AM, Richard Biener <richard.guenther@gmail.com> wrote:
>
> ...
>
>> 2. The reported costs for the various insns are
>>     r22:HI=['x']          6
>>     cmp(r22:HI,r23:HI)    4
>>     cmp(['x'],['y'])     16
>> so the added cost for the memory argument in the cmp is 6 -- the same
>> as the whole cost for the mov.  That certainly explains the behavior.  It
>> isn't what I want it to be.  Which target hook(s) are involved in these
>> numbers?  I don't see them in my rtx_costs hook.
>
> The rtx_cost hook.  I think the costs above make sense.  There's also a
> new insn_cost hook but you have to dig whether combine uses that.
> Otherwise address_cost might be involved.

Thanks.  For a pdp11, those costs aren't right because mov and cmp and
a memory reference each have about the same cost.  So 8, 4, 12 would be
closer.  But the real question for me at this point is where to find
the knobs that adjust these choices.

The various cost hooks have me confused, and the GCCINT manual is not
really enlightening.  There is rtx_costs, insn_cost, and addr_cost.
It sort of feels like insn_cost and addr_cost together would provide
roughly the same information that rtx_costs gives.  In the existing
platforms, I see rtx_costs everywhere, addr_cost in a fair number of
targets, and insn_cost in just one (rs6000).  Can someone explain the
interaction of these various cost hooks, and what happens if you define
various combinations of the three?

    paul
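
To make the two sets of numbers easier to compare, the arithmetic can be
written out as a small sketch.  Everything below is illustrative: the
struct and function names are made up for this note, and the code only adds
up the per-insn costs quoted in the thread.  It does not model how combine
itself decides.

```c
/* Per-insn costs in the order they appear in the thread. */
struct cost_model {
    int mov_mem_to_reg;   /* r22:HI=['x']       */
    int cmp_reg_reg;      /* cmp(r22:HI,r23:HI) */
    int cmp_mem_mem;      /* cmp(['x'],['y'])   */
};

/* Costs reported in the combine dump. */
static const struct cost_model reported = { 6, 4, 16 };

/* Costs suggested in the message above as closer to a real pdp11. */
static const struct cost_model proposed = { 8, 4, 12 };

/* Total for the sequence: mov x,r0 ; mov y,r1 ; cmp r0,r1 */
int cost_via_registers(const struct cost_model *m)
{
    return 2 * m->mov_mem_to_reg + m->cmp_reg_reg;
}

/* Total for the single insn: cmp x,y */
int cost_direct(const struct cost_model *m)
{
    return m->cmp_mem_mem;
}
```

With the reported costs the register sequence totals 6 + 6 + 4 = 16, the
same as the direct compare, so the cost numbers give no reason to prefer
the memory form.  With the proposed costs the register sequence totals
8 + 8 + 4 = 20 against 12 for the direct compare.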

* Re: How do I stop gcc from loading data into registers when that's not needed?
  2018-05-24  0:33         ` Paul Koning
@ 2018-05-24 18:25           ` Segher Boessenkool
From: Segher Boessenkool @ 2018-05-24 18:25 UTC
To: Paul Koning; +Cc: Richard Biener, GCC Development

On Wed, May 23, 2018 at 08:33:13PM -0400, Paul Koning wrote:
> > On May 23, 2018, at 5:46 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> >> 2. The reported costs for the various insns are
> >>     r22:HI=['x']          6
> >>     cmp(r22:HI,r23:HI)    4
> >>     cmp(['x'],['y'])     16
> >> so the added cost for the memory argument in the cmp is 6 -- the same
> >> as the whole cost for the mov.  That certainly explains the behavior.  It
> >> isn't what I want it to be.  Which target hook(s) are involved in these
> >> numbers?  I don't see them in my rtx_costs hook.
> >
> > The rtx_cost hook.  I think the costs above make sense.  There's also a
> > new insn_cost hook but you have to dig whether combine uses that.
> > Otherwise address_cost might be involved.
>
> Thanks.  For a pdp11, those costs aren't right because mov and cmp and
> a memory reference each have about the same cost.  So 8, 4, 12 would be
> closer.  But the real question for me at this point is where to find
> the knobs that adjust these choices.
>
> The various cost hooks have me confused, and the GCCINT manual is not
> really enlightening.  There is rtx_costs, insn_cost, and addr_cost.
> It sort of feels like insn_cost and addr_cost together would provide
> roughly the same information that rtx_costs gives.  In the existing
> platforms, I see rtx_costs everywhere, addr_cost in a fair number of
> targets, and insn_cost in just one (rs6000).  Can someone explain the
> interaction of these various cost hooks, and what happens if you define
> various combinations of the three?

rtx_costs computes the cost for any rtx (an insn, a set, a set source,
any random piece of one).  set_src_cost, set_rtx_cost, etc. are helper
functions that use that.

Those functions do not work for parallels.  Also, costs are not additive
like this simplified model assumes.  Also, more complex backends tend to
miss many cases in their rtx_costs function.

Many passes that want costs want to know the cost of a full insn.  Like
combine.  That's why I created insn_cost: it solves all of the above
problems.

I'll hopefully make most passes use insn_cost for GCC 9.  All of the very
easy ones already do.

Segher
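
The idea of a cost that can be asked about any piece of an insn, under an
additive assumption, can be pictured with a toy model.  The sketch below is
not GCC code and does not use GCC's rtx type or hooks: the node type,
toy_cost, and the per-node numbers are all hypothetical, chosen only so
that the totals happen to reproduce the 6 / 4 / 16 figures quoted in the
thread.

```c
#include <stddef.h>

/* Toy expression node; NOT GCC's rtx. */
enum kind { K_REG, K_MEM, K_CMP, K_SET };

struct node {
    enum kind kind;
    const struct node *op0;   /* first operand, NULL for leaves  */
    const struct node *op1;   /* second operand, NULL for leaves */
};

/* Hypothetical per-node costs (an assumption made for this sketch). */
static int node_cost(enum kind k)
{
    switch (k) {
    case K_MEM: return 6;
    case K_CMP: return 4;
    default:    return 0;     /* K_REG and K_SET cost nothing here */
    }
}

/* Additive cost of any (sub)expression: the node's own cost plus the
 * cost of its operands.  It can be called on a whole insn or any piece. */
int toy_cost(const struct node *n)
{
    if (n == NULL)
        return 0;
    return node_cost(n->kind) + toy_cost(n->op0) + toy_cost(n->op1);
}
```

In this toy model a load costs 6, a register compare 4, and a
memory/memory compare 4 + 6 + 6 = 16.  A whole-insn hook such as insn_cost
sees the complete instruction instead, so it is free to return a figure
that is not the sum of its parts.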

end of thread, newest message 2018-05-24 18:25 UTC

Thread overview: 8+ messages
2018-05-18 18:03 How do I stop gcc from loading data into registers when that's not needed? Paul Koning
2018-05-18 18:07 ` Richard Biener
     [not found]   ` <018C29D6-6245-4D31-B43B-623E080A6F87@comcast.net>
2018-05-22  8:49     ` Richard Biener
2018-05-22 19:26       ` Segher Boessenkool
2018-05-23  0:50         ` Paul Koning
2018-05-23  9:47           ` Richard Biener
2018-05-24  0:33             ` Paul Koning
2018-05-24 18:25               ` Segher Boessenkool