From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 391FC385840E; Fri, 24 Nov 2023 16:35:36 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 391FC385840E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1700843736;
	bh=JZhnNOLvRNow5ozJuZ9MpDTZ5eC2Fp3kHnv3ROhvLo8=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=dhaqZHrLOUpXEUlz/vZTc6roIjdQ0qIbVauJfPxJu+tPeVrzYDeHgqHGm4+c9tKDe
	 WO9fHeuWrAN2y0LTBODcbdmGBV296mEYTpfWyRSOCk13l702BdZWOTI2VsD+eLTqE8
	 3neT7NbwE7M32i3NkYo3VKa8faPjxncn3j43T6rA=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/109849] suboptimal code for vector walking loop
Date: Fri, 24 Nov 2023 16:35:15 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #23 from CVS Commits ---
The master branch has been updated by Martin Jambor:

https://gcc.gnu.org/g:aae723d360ca26cd9fd0b039fb0a616bd0eae363

commit r14-5831-gaae723d360ca26cd9fd0b039fb0a616bd0eae363
Author: Martin Jambor
Date:   Fri Nov 24 17:32:35 2023 +0100

sra: SRA of non-escaped aggregates passed by reference to calls

PR109849 shows that a loop that heavily pushes and pops from a stack
implemented by a C++ std::vector results in slow code, mainly because
the vector structure is not split by SRA and so we end up with many
loads and stores into it. This is because it is passed by reference
to (re)allocation methods and so needs to live in memory, even though
it does not escape from them and so we could SRA it if we
re-constructed it before the call and then separated it into distinct
replacements afterwards.

This patch does exactly that, first relaxing the selection of
candidates to also include those which are addressable but do not
escape and then adding code to deal with the calls. The
micro-benchmark that is also the (scan-dump) testcase in this patch
runs twice as fast with it as with current trunk. Honza measured
its effect on the libjxl benchmark and it almost closes the
performance gap between Clang and GCC while not requiring excessive
inlining and thus code growth.

The patch disallows creation of replacements for such aggregates
which are also accessed with a precision smaller than their size
because I have observed that this led to excessive zero-extending of
data leading to slow-downs of perlbench (on some CPUs). Apart from
this case I have not noticed any regressions, at least not so far.

Gimple call argument flags can tell if an argument is unused (and then
we do not need to generate any statements for it) or if it is not
written to, in which case we do not need to generate statements
loading replacements from the original aggregate after the call
statement. Unfortunately, we cannot symmetrically use flags saying
that an aggregate is not read to avoid re-constructing the aggregate
before the call, because the flags don't tell which parts of the
aggregate were not written to. So we load all replacements after the
call, and therefore all of them need to have the correct value before
the call.

This version of the patch also takes care to avoid attempts to modify
abnormal edges, something which was missing in the previous version.

gcc/ChangeLog:

2023-11-23  Martin Jambor

	PR middle-end/109849
	* tree-sra.cc (passed_by_ref_in_call): New.
	(sra_initialize): Allocate passed_by_ref_in_call.
	(sra_deinitialize): Free passed_by_ref_in_call.
	(create_access): Add decl pool candidates only if they are not
	already candidates.
	(build_access_from_expr_1): Bail out on ADDR_EXPRs.
	(build_access_from_call_arg): New function.
	(asm_visit_addr): Rename to scan_visit_addr, change the
	disqualification dump message.
	(scan_function): Check taken addresses for all non-call statements,
	including phi nodes.  Process all call arguments, including the
	static chain, using build_access_from_call_arg.
	(maybe_add_sra_candidate): Relax need_to_live_in_memory check to
	allow non-escaped local variables.
	(sort_and_splice_var_accesses): Disallow smaller-than-precision
	replacements for aggregates passed by reference to functions.
	(sra_modify_expr): Use a separate stmt iterator for adding
	statements before the processed statement and after it.
	(enum out_edge_check): New type.
	(abnormal_edge_after_stmt_p): New function.
	(sra_modify_call_arg): New function.
	(sra_modify_assign): Adjust calls to sra_modify_expr.
	(sra_modify_function_body): Likewise, use sra_modify_call_arg to
	process call arguments, including the static chain.

gcc/testsuite/ChangeLog:

2023-11-23  Martin Jambor

	PR middle-end/109849
	* g++.dg/tree-ssa/pr109849.C: New test.
	* g++.dg/tree-ssa/sra-eh-1.C: Likewise.
	* gcc.dg/tree-ssa/pr109849.c: Likewise.
	* gcc.dg/tree-ssa/sra-longjmp-1.c: Likewise.
	* gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
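
For readers who want a concrete picture of the transformation the commit
message describes, here is a minimal hand-written sketch. It is not the
testcase from the patch and not output from GCC. The names IntStack, grow,
sum_source_form and sum_scalarized_by_hand are hypothetical, and the
three-pointer struct is only assumed to resemble how a vector is laid out.
The first function keeps the aggregate in memory because it is passed by
reference to an out-of-line call. The second shows, written out by hand,
roughly the shape of code one would expect once the fields live in scalar
replacements: the aggregate is re-constructed before the call and the
replacements are reloaded after it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a vector: three pointers into one allocation.
struct IntStack {
    int *begin = nullptr;
    int *end = nullptr;
    int *cap = nullptr;
};

// Out-of-line (re)allocation. It receives the aggregate by reference but
// never stores its address anywhere, so the aggregate does not escape.
[[gnu::noinline]] void grow(IntStack &s) {
    std::size_t size = static_cast<std::size_t>(s.end - s.begin);
    std::size_t new_cap = size ? 2 * size : 4;
    int *p = static_cast<int *>(std::realloc(s.begin, new_cap * sizeof(int)));
    if (!p) std::abort();
    s.begin = p;
    s.end = p + size;
    s.cap = p + new_cap;
}

// Source form: every push and pop reads and writes fields of `s`.
long sum_source_form(int n) {
    IntStack s;
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        if (s.end == s.cap) grow(s);
        *s.end++ = i;                              // push
    }
    while (s.end != s.begin) sum += *--s.end;      // pop
    std::free(s.begin);
    return sum;
}

// The same loop with the fields split into scalars by hand (illustration
// only; the real pass works on GIMPLE, not on source code).
long sum_scalarized_by_hand(int n) {
    IntStack s;
    int *begin = nullptr, *end = nullptr, *cap = nullptr;  // replacements
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        if (end == cap) {
            // Re-construct the whole aggregate before the call ...
            s.begin = begin; s.end = end; s.cap = cap;
            grow(s);
            // ... and load every replacement back after it.
            begin = s.begin; end = s.end; cap = s.cap;
        }
        *end++ = i;                                // push uses scalars only
    }
    while (end != begin) sum += *--end;            // pop uses scalars only
    std::free(begin);
    return sum;
}
```

Both functions compute the same sum. In the second one the hot push and
pop paths touch only local scalars, and the stores into and loads from the
aggregate are confined to the rare reallocation path.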