I see, that refreshes my understanding on that. I have no concern now.

Thx,
Haochen

From: H.J. Lu
Sent: Wednesday, June 19, 2024 4:38 AM
To: Jiang, Haochen
Cc: Beulich, Jan; Cui, Lili; Binutils
Subject: Re: [PATCH 6/6] x86: optimize {,V}PEXTR{D,Q} with immediate of 0

-Os should optimize for code size. Other optimizations should take performance into account.

On Tue, Jun 18, 2024, 2:23 PM Jiang, Haochen wrote:

> >> Wait. While the compiler may use PSRLDQ here, based on knowing assumptions
> >> made elsewhere, the assembler can't: The replacement insn must generate the
> >> exact same result in the destination register. PSRLDQ with an immediate of
> >> 0 (which effectively you're suggesting to use here) doesn't alter the
> >> destination register at all, though. When really we want the upper bits of
> >> the register cleared.
> >
> > pextrd/q also doesn't clear them at all. For vpextrd/q and vpsrldq, they will
> > both clear higher bits. So they will be the same.
>
> Wait - your suggestion is even more confusing: The destination of PSRLDQ is
> an XMM register, whereas the destination of PEXTR* is a GPR or memory. This
> is properly expressed in the constraints in the compiler, but clearly we
> can't replace insns like this in the assembler.

Yes, I realized that I am wrong here, there are no constraints. vmovd/q would be
definitely better and doable here if we would like to do something.

> >>> Also, I suppose the optimization related to latency should not be done in
> >>> assembler.
> >>
> >> Why? We have -O, -O1, and -O2 alongside -Os for a reason.
> >
> > I am quite conservative on the optimization in assembler. If we are also going to
> > optimize hand-written code, the optimization could work.
> >
> > However, when they hand write some code, are we supposed to change it?
>
> Well, if we aren't to, people simply don't pass -O.
>
> > For -Os, we could give them all the optimizations we have, but for -O, I am not
> > that sure.
> >
> > And I suppose we might add too much burden for the assembler if we are going
> > to add too many optimizations related to latency. It will become another compiler.
> > Are we supposed to copy all the optimizations from compiler?
>
> Probably not all (and many aren't at the insn level anyway, nor do we - so
> far at least - optimize for latency/throughput at the expense of code size).
> But yes - this specific aspect is why I keep raising questions on what
> optimizations are worth it vs where we'd better leave code alone.

H.J., what is your opinion on that?

Thx,
Haochen

>
> Jan
>
> > IMO, optimization to
> > codesize is ok, but for latency, I am a little concerned.
> >
> > Thx,
> > Haochen
> >
> >>
> >> Jan
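
For reference, the equivalence discussed in this thread can be modelled in a few lines of Python. This is only a sketch of the architectural semantics, not code from the patch: the helper names (pextrd, pextrq, movd, movq, psrldq) are hypothetical, and an XMM register is assumed to be represented as a plain 128-bit integer. It shows that extracting element 0 gives the same GPR value as a MOVD/MOVQ-style move, whereas PSRLDQ with an immediate of 0 simply returns the full XMM value and so is not a drop-in replacement.

```python
# Toy model: an XMM register is a Python int holding 128 bits.
# All function names are illustrative, not part of any real tool.
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def pextrd(xmm: int, imm8: int) -> int:
    """PEXTRD-like: return the dword selected by imm8[1:0] (GPR/memory result)."""
    return (xmm >> (32 * (imm8 & 3))) & MASK32


def pextrq(xmm: int, imm8: int) -> int:
    """PEXTRQ-like: return the qword selected by imm8[0] (GPR/memory result)."""
    return (xmm >> (64 * (imm8 & 1))) & MASK64


def movd(xmm: int) -> int:
    """MOVD-like (XMM to GPR/memory): return the low dword."""
    return xmm & MASK32


def movq(xmm: int) -> int:
    """MOVQ-like (XMM to GPR/memory): return the low qword."""
    return xmm & MASK64


def psrldq(xmm: int, imm8: int) -> int:
    """PSRLDQ-like: byte-wise right shift; the result is still a 128-bit XMM value."""
    count = imm8 & 0xFF
    if count > 15:
        return 0
    return (xmm >> (8 * count)) & MASK128


if __name__ == "__main__":
    x = 0x0123456789ABCDEFFEDCBA9876543210
    # With an immediate of 0, the extract and the plain move agree.
    assert pextrd(x, 0) == movd(x)
    assert pextrq(x, 0) == movq(x)
    # PSRLDQ with an immediate of 0 leaves the whole XMM value as it was.
    assert psrldq(x, 0) == x
```

The model ignores encodings, operand kinds, and upper YMM bits; it only illustrates why a move is a candidate replacement while a byte shift within the XMM register is not.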