From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 126283858D39; Tue, 26 Oct 2021 14:53:11 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 126283858D39
From: "hubicka at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/102948] New: 60% build time regression on gamess in range
 2fc2e3917f9c8fd94f5d101477971d16c483ef88...c16f21c7cf97ce48967e42d3b5d22ea169a9c2c8
Date: Tue, 26 Oct 2021 14:53:10 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: hubicka at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-102948-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 26 Oct 2021 14:53:11 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102948

            Bug ID: 102948
           Summary: 60% build time regression on gamess in range
                    2fc2e3917f9c8fd94f5d101477971d16c483ef88...c16f21c7cf9
                    7ce48967e42d3b5d22ea169a9c2c8
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=322.50.8&plot.1=307.50.8&plot.2=343.50.8&plot.3=266.50.8&plot.4=395.50.8&plot.5=412.50.8&plot.6=289.50.8&
this is with -Ofast -march=native -flto on zen

commit c16f21c7cf97ce48967e42d3b5d22ea169a9c2c8
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Aug 4 18:43:22 2021 +0800

    Support cond_{xor,ior,and} for vector integer mode under AVX512.

    gcc/ChangeLog:

            * config/i386/sse.md (cond_<code><mode>): New expander.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_anylogic_d-1.c: New test.
            * gcc.target/i386/cond_op_anylogic_d-2.c: New test.
            * gcc.target/i386/cond_op_anylogic_q-1.c: New test.
            * gcc.target/i386/cond_op_anylogic_q-2.c: New test.

commit f7aa81892eb54bc040ee6f7fd6134d800a5ee89c
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Aug 4 18:15:43 2021 +0800

    Support cond_{smax,smin} for vector float/double modes under AVX512.

    gcc/ChangeLog:

            * config/i386/sse.md (cond_<code><mode>): New expander.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_maxmin_double-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_double-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_float-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_float-2.c: New test.

commit 9a8c3fc2b2cc6d73b2e3006625fca2b588ebc1b0
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Aug 4 16:03:58 2021 +0800

    Support cond_{smax,smin,umax,umin} for vector integer modes under AVX512.

    gcc/ChangeLog:

            * config/i386/sse.md (cond_<code><mode>): New expander.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_maxmin_b-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_b-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_d-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_d-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_q-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_q-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_ub-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_ub-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_ud-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_ud-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_uq-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_uq-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_uw-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_uw-2.c: New test.
            * gcc.target/i386/cond_op_maxmin_w-1.c: New test.
            * gcc.target/i386/cond_op_maxmin_w-2.c: New test.

commit 2697f8324fbb09b0d92036ba6a6b8a2b8d256b23
Author: GCC Administrator <gccadmin@gcc.gnu.org>
Date:   Thu Aug 5 00:17:03 2021 +0000

    Daily bump.

commit ded2c2c068f6f2825474758cb03a05070a5837e8
Author: David Malcolm <dmalcolm@redhat.com>
Date:   Wed Aug 4 18:21:21 2021 -0400

    analyzer: initial implementation of asm support [PR101570]

    gcc/ChangeLog:
            PR analyzer/101570
            * Makefile.in (ANALYZER_OBJS): Add analyzer/region-model-asm.o.

    gcc/analyzer/ChangeLog:
            PR analyzer/101570
            * analyzer.cc (maybe_reconstruct_from_def_stmt): Add GIMPLE_ASM
            case.
            * analyzer.h (class asm_output_svalue): New forward decl.
            (class reachable_regions): New forward decl.
            * complexity.cc (complexity::from_vec_svalue): New.
            * complexity.h (complexity::from_vec_svalue): New decl.
            * engine.cc (feasibility_state::maybe_update_for_edge): Handle
            asm stmts by calling on_asm_stmt.
            * region-model-asm.cc: New file.
            * region-model-manager.cc
            (region_model_manager::maybe_fold_asm_output_svalue): New.
            (region_model_manager::get_or_create_asm_output_svalue): New.
            (region_model_manager::log_stats): Log m_asm_output_values_map.
            * region-model.cc (region_model::on_stmt_pre): Handle
            GIMPLE_ASM.
            * region-model.h (visitor::visit_asm_output_svalue): New.
            (region_model_manager::get_or_create_asm_output_svalue): New
            decl.
            (region_model_manager::maybe_fold_asm_output_svalue): New decl.
            (region_model_manager::asm_output_values_map_t): New typedef.
            (region_model_manager::m_asm_output_values_map): New field.
            (region_model::on_asm_stmt): New.
            * store.cc (binding_cluster::on_asm): New.
            * store.h (binding_cluster::on_asm): New decl.
            * svalue.cc (svalue::cmp_ptr): Handle SK_ASM_OUTPUT.
            (asm_output_svalue::dump_to_pp): New.
            (asm_output_svalue::dump_input): New.
            (asm_output_svalue::input_idx_to_asm_idx): New.
            (asm_output_svalue::accept): New.
            * svalue.h (enum svalue_kind): Add SK_ASM_OUTPUT.
            (svalue::dyn_cast_asm_output_svalue): New.
            (class asm_output_svalue): New.
            (is_a_helper <const asm_output_svalue *>::test): New.
            (struct default_hash_traits<asm_output_svalue::key_t>): New.

    gcc/testsuite/ChangeLog:
            PR analyzer/101570
            * gcc.dg/analyzer/asm-x86-1.c: New test.
            * gcc.dg/analyzer/asm-x86-lp64-1.c: New test.
            * gcc.dg/analyzer/asm-x86-lp64-2.c: New test.
            * gcc.dg/analyzer/pr101570.c: New test.
            * gcc.dg/analyzer/torture/asm-x86-linux-array_index_mask_nospec.c:
            New test.
            * gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c: New
            test.
            * gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-2.c: New
            test.
            * gcc.dg/analyzer/torture/asm-x86-linux-cpuid.c: New test.
            * gcc.dg/analyzer/torture/asm-x86-linux-rdmsr-paravirt.c: New
            test.
            * gcc.dg/analyzer/torture/asm-x86-linux-rdmsr.c: New test.
            * gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c:
            New test.
            * gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-reduced.c:
            New test.

    Signed-off-by: David Malcolm <dmalcolm@redhat.com>

commit 5738a64f8b3cf132b88b39af84b9f5f5a9a1554c
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 3 06:17:22 2021 -0700

    x86: Update STORE_MAX_PIECES

    Update STORE_MAX_PIECES to allow 16/32/64 bytes only if inter-unit move
    is enabled since vec_duplicate enabled by inter-unit move is used to
    implement store_by_pieces of 16/32/64 bytes.

    gcc/

            PR target/101742
            * config/i386/i386.h (STORE_MAX_PIECES): Allow 16/32/64 bytes
            only if TARGET_INTER_UNIT_MOVES_TO_VEC is true.

    gcc/testsuite/

            PR target/101742
            * gcc.target/i386/pr101742a.c: New test.
            * gcc.target/i386/pr101742b.c: Likewise.

commit 09dba016db937e61be21ef1e9581065a9ed2847d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Aug 4 06:15:04 2021 -0700

    x86: Avoid stack realignment when copying data with SSE register

    To avoid stack realignment, call ix86_gen_scratch_sse_rtx to get a
    scratch SSE register to copy data with an SSE register from one
    memory location to another.

    gcc/

            PR target/101772
            * config/i386/i386-expand.c (ix86_expand_vector_move): Call
            ix86_gen_scratch_sse_rtx to get a scratch SSE register to copy
            data with SSE register from one memory location to another.

    gcc/testsuite/

            PR target/101772
            * gcc.target/i386/eh_return-2.c: New test.

commit 361da782a25031c6ae3967bf8c10a8119845255c
Author: Andreas Krebbel <krebbel@linux.ibm.com>
Date:   Wed Aug 4 18:40:11 2021 +0200

    IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vpdi

    This patch makes use of the vector permute double immediate
    instruction for constant permute vectors.

    gcc/ChangeLog:

            * config/s390/s390.c (expand_perm_with_vpdi): New function.
            (vectorize_vec_perm_const_1): Call expand_perm_with_vpdi.
            * config/s390/vector.md (*vpdi1<mode>, @vpdi1<mode>): Enable a
            parameterized expander.
            (*vpdi4<mode>, @vpdi4<mode>): Likewise.

    gcc/testsuite/ChangeLog:

            * gcc.target/s390/vector/perm-vpdi.c: New test.

commit 6dc8c4656444153c9e2f98d382de39728a849672
Author: Andreas Krebbel <krebbel@linux.ibm.com>
Date:   Wed Aug 4 18:40:10 2021 +0200

    IBM Z: Implement TARGET_VECTORIZE_VEC_PERM_CONST for vector merge

    This patch implements the TARGET_VECTORIZE_VEC_PERM_CONST in the IBM Z
    backend. The initial implementation only exploits the vector merge
    instruction but there is more to come.

    gcc/ChangeLog:

            * config/s390/s390.c (MAX_VECT_LEN): Define macro.
            (struct expand_vec_perm_d): Define struct.
            (expand_perm_with_merge): New function.
            (vectorize_vec_perm_const_1): New function.
            (s390_vectorize_vec_perm_const): New function.
            (TARGET_VECTORIZE_VEC_PERM_CONST): Define target macro.

    gcc/testsuite/ChangeLog:

            * gcc.target/s390/vector/perm-merge.c: New test.
            * gcc.target/s390/vector/vec-types.h: New test.

commit 4e34925ef1aeab73e022d80149be8cec92c48667
Author: Andreas Krebbel <krebbel@linux.ibm.com>
Date:   Wed Aug 4 18:40:10 2021 +0200

    IBM Z: Remove redundant V_HW_64 mode iterator.

    gcc/ChangeLog:

            * config/s390/vector.md (V_HW_64): Remove mode iterator.
            (*vec_load_pair<mode>): Use V_HW_2 instead of V_HW_64.
            * config/s390/vx-builtins.md
            (vec_scatter_element<V_HW_2:mode>_SI): Use V_HW_2 instead of
            V_HW_64.

commit 0aa7091befa9fdb67f7013dbd454d336a31ef71d
Author: Andreas Krebbel <krebbel@linux.ibm.com>
Date:   Wed Aug 4 18:40:09 2021 +0200

    IBM Z: Get rid of vpdi unspec

    The patch gets rid of the unspec used for the vector permute double
    immediate instruction and replaces it with generic rtx.

    gcc/ChangeLog:

            * config/s390/s390.md (UNSPEC_VEC_PERMI): Remove constant
            definition.
            * config/s390/vector.md (*vpdi1<mode>, *vpdi4<mode>): New pattern
            definitions.
            * config/s390/vx-builtins.md (*vec_permi<mode>): Emit generic rtx
            instead of an unspec.

    gcc/testsuite/ChangeLog:

            * gcc.target/s390/zvector/vec-permi.c: Removed.
            * gcc.target/s390/zvector/vec_permi.c: New test.

commit 5391688acc997e26375e42340cea885fa6ad0d7d
Author: Andreas Krebbel <krebbel@linux.ibm.com>
Date:   Wed Aug 4 18:40:09 2021 +0200

    IBM Z: Get rid of vec merge unspec

    This patch gets rid of the unspecs we were using for the vector merge
    instruction and replaces them with generic rtx.

    gcc/ChangeLog:

            * config/s390/s390-modes.def: Add more vector modes to support
            concatenation of two vectors.
            * config/s390/s390-protos.h (s390_expand_merge_perm_const): Add
            prototype.
            (s390_expand_merge): Likewise.
            * config/s390/s390.c (s390_expand_merge_perm_const): New function.
            (s390_expand_merge): New function.
            * config/s390/s390.md (UNSPEC_VEC_MERGEH, UNSPEC_VEC_MERGEL):
            Remove constant definitions.
            * config/s390/vector.md (V_HW_2): Add mode iterators.
            (VI_HW_4, V_HW_4): Rename VI_HW_4 to V_HW_4.
            (vec_2x_nelts, vec_2x_wide): New mode attributes.
            (*vmrhb, *vmrlb, *vmrhh, *vmrlh, *vmrhf, *vmrlf, *vmrhg, *vmrlg):
            New pattern definitions.
            (vec_widen_umult_lo_<mode>, vec_widen_umult_hi_<mode>)
            (vec_widen_smult_lo_<mode>, vec_widen_smult_hi_<mode>)
            (vec_unpacks_lo_v4sf, vec_unpacks_hi_v4sf, vec_unpacks_lo_v2df)
            (vec_unpacks_hi_v2df): Adjust expanders to emit non-unspec RTX for
            vec merge.
            * config/s390/vx-builtins.md (V_HW_4): Remove mode iterator. Now
            in vector.md.
            (vec_mergeh<mode>, vec_mergel<mode>): Use s390_expand_merge to
            emit vec merge pattern.

    gcc/testsuite/ChangeLog:

            * gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c:
            Instead of vpdi with 0 and 5, vmrlg and vmrhg are used now.
            * gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c:
            Likewise.
            * gcc.target/s390/zvector/vec-types.h: New test.
            * gcc.target/s390/zvector/vec_merge.c: New test.

commit 63834c84d43fc2eeeaa054c5e24d1e468e9eddab
Author: Jonathan Wright <jonathan.wright@arm.com>
Date:   Mon Jul 19 10:19:30 2021 +0100

    aarch64: Don't include vec_select high-half in SIMD multiply cost

    The Neon multiply/multiply-accumulate/multiply-subtract instructions
    can select the top or bottom half of the operand registers. This
    selection does not change the cost of the underlying instruction and
    this should be reflected by the RTL cost function.

    This patch adds RTL tree traversal in the Neon multiply cost function
    to match vec_select high-half of its operands. This traversal
    prevents the cost of the vec_select from being added into the cost of
    the multiply - meaning that these instructions can now be emitted in
    the combine pass as they are no longer deemed prohibitively
    expensive.

    gcc/ChangeLog:

    2021-07-19  Jonathan Wright  <jonathan.wright@arm.com>

            * config/aarch64/aarch64.c (aarch64_strip_extend_vec_half):
            Define.
            (aarch64_rtx_mult_cost): Traverse RTL tree to prevent cost of
            vec_select high-half from being added into Neon multiply
            cost.
            * rtlanal.c (vec_series_highpart_p): Define.
            * rtlanal.h (vec_series_highpart_p): Declare.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/vmul_high_cost.c: New test.

commit 1d65c9d25199264bc8909018df1b0dca71c0b32d
Author: Jonathan Wright <jonathan.wright@arm.com>
Date:   Mon Jul 19 14:01:52 2021 +0100

    aarch64: Don't include vec_select element in SIMD multiply cost

    The Neon multiply/multiply-accumulate/multiply-subtract instructions
    can take various forms - multiplying full vector registers of values
    or multiplying one vector by a single element of another. Regardless
    of the form used, these instructions have the same cost, and this
    should be reflected by the RTL cost function.

    This patch adds RTL tree traversal in the Neon multiply cost function
    to match the vec_select used by the lane-referencing forms of the
    instructions already mentioned. This traversal prevents the cost of
    the vec_select from being added into the cost of the multiply -
    meaning that these instructions can now be emitted in the combine
    pass as they are no longer deemed prohibitively expensive.

    gcc/ChangeLog:

    2021-07-19  Jonathan Wright  <jonathan.wright@arm.com>

            * config/aarch64/aarch64.c (aarch64_strip_duplicate_vec_elt):
            Define.
            (aarch64_rtx_mult_cost): Traverse RTL tree to prevent
            vec_select cost from being added into Neon multiply cost.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/vmul_element_cost.c: New test.

commit 5a1017dc305c49c59129d45536630d02dbc01c45
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Wed Aug 4 16:52:09 2021 +0100

    vect: Tweak comparisons with existing epilogue loops

    This patch uses a more accurate scalar iteration estimate when
    comparing the epilogue of a constant-iteration loop with a candidate
    replacement epilogue.

    In the testcase, the patch prevents a 1-to-3-element SVE epilogue
    from seeming better than a 64-bit Advanced SIMD epilogue.

    gcc/
            * tree-vect-loop.c (vect_better_loop_vinfo_p): Detect cases in
            which old_loop_vinfo is an epilogue loop that handles a constant
            number of iterations.

    gcc/testsuite/
            * gcc.target/aarch64/sve/cost_model_12.c: New test.

commit 315a1c3756cbc751c4af0ce0da2157a88d7c3b09
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Wed Aug 4 16:52:08 2021 +0100

    vect: Tweak dump messages for vector mode choice

    After vect_analyze_loop has successfully analysed a loop for
    one base vector mode B1, it considers using following base vector
    modes to vectorise an epilogue.  However, for VECT_COMPARE_COSTS,
    a later mode B2 might turn out to be better than B1 was.  Initially
    this comparison will be between an epilogue loop (for B2) and a main
    loop (for B1).  However, in r11-6458 I'd added code to reanalyse the
    B2 epilogue loop as a main loop, partly for correctness and partly
    for better costing.

    This can lead to a situation in which we think that the B2 epilogue
    loop was better than the B1 main loop, but that the B2 main loop is
    not better than the B1 main loop.  There was no dump message to say
    that this had happened, which made it look like B2 had still won.

    gcc/
            * tree-vect-loop.c (vect_analyze_loop): Print a dump message
            when a reanalyzed loop fails to be cheaper than the current
            main loop.

commit eb55b5b0df26e95c98ab59d34e69189d4f61bc0c
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Wed Aug 4 16:52:07 2021 +0100

    aarch64: Fix a typo

    gcc/
            * config/aarch64/aarch64.c: Fix a typo.

commit 929f2cf4105ccf12d0684c6d5838f58f0ee5e7c7
Author: Vincent Lefèvre <vincent-gcc@vinc17.net>
Date:   Wed Aug 4 17:25:52 2021 +0200

    gcov: check return code of a fclose

    gcc/ChangeLog:

            PR gcov-profile/101773
            * gcov-io.c (gcov_close): Check return code of a fclose.
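
The point of the gcov change above can be illustrated with a minimal sketch. This is not the gcov-io.c code: `write_and_close` is a hypothetical helper introduced only for illustration. Because stdio buffers output, a write error may only surface when `fclose` flushes the stream, so its return value has to be checked as well.

```c
#include <stdio.h>

/* Hypothetical helper (not from gcov-io.c): write TEXT to PATH and
   return 0 on success, -1 if opening, writing, or closing failed.  */
int write_and_close(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int err = 0;
    if (fputs(text, f) == EOF)
        err = -1;

    /* Buffered data may only reach the file here, so a failing
       fclose means the output cannot be trusted.  */
    if (fclose(f) != 0)
        err = -1;

    return err;
}
```

A caller would treat any nonzero result as "output file is incomplete" rather than assuming a successful write implies a successful close.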

commit 96c82a16b2076891a9974d0f0e96a0b85fbc2df4
Author: Bernd Edlinger <bernd.edlinger@hotmail.de>
Date:   Sat Jul 24 12:53:39 2021 +0200

    Fix debug info for ignored decls at start of assembly

    Ignored function decls that are compiled at the start of
    the assembly have bogus line numbers until the first .file
    directive, as reported in PR101575.

    The corresponding binutils bug report is
    https://sourceware.org/bugzilla/show_bug.cgi?id=28149

    The workaround for this issue is to emit a dummy .file
    directive before the first function is compiled, unless
    another .file directive was already emitted previously.

    2021-08-04  Bernd Edlinger  <bernd.edlinger@hotmail.de>

            PR ada/101575
            * dwarf2out.c (dwarf2out_assembly_start): Emit a dummy
            .file statement when needed.

commit 9fcb8ec60302f5f110f94a885b618993c28d18d3
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Wed Aug 4 14:36:15 2021 +0100

    [testsuite] Fix trapping access in test PR101750

    I believe PR101750 to be a testism. Fix it by giving the class a name.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/101750
            * g++.dg/vect/pr99149.cc: Name class.

commit 31855ba6b16cd138d7484076a08cd40d609654b8
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Jul 29 14:14:48 2021 +0200

    Add emulated gather capability to the vectorizer

    This adds a gather vectorization capability to the vectorizer
    without target support by decomposing the offset vector, doing
    scalar loads and then building a vector from the result.  This
    is aimed mainly at cases where vectorizing the rest of the loop
    offsets the cost of vectorizing the gather.

    Note it's difficult to avoid vectorizing the offset load, but in
    some cases later passes can turn the vector load + extract into
    scalar loads, see the followup patch.

    On SPEC CPU 2017 510.parest_r this improves runtime from 250s
    to 219s on a Zen2 CPU which has its native gather instructions
    disabled (using those the runtime instead increases to 254s)
    using -Ofast -march=znver2 [-flto].  It turns out the critical
    loops in this benchmark all perform gather operations.

    2021-07-30  Richard Biener  <rguenther@suse.de>

            * tree-vect-data-refs.c (vect_check_gather_scatter):
            Include widening conversions only when the result is
            still handled by native gather or the current offset
            size does not already match the data size.
            Also succeed analysis in case there's no native support,
            noted by an IFN_LAST ifn and a NULL decl.
            (vect_analyze_data_refs): Always consider gathers.
            * tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
            Test for no IFN gather rather than decl gather.
            * tree-vect-stmts.c (vect_model_load_cost): Pass in the
            gather-scatter info and cost emulated gathers accordingly.
            (vect_truncate_gather_scatter_offset): Properly test for
            no IFN gather.
            (vect_use_strided_gather_scatters_p): Likewise.
            (get_load_store_type): Handle emulated gathers and their
            restrictions.
            (vectorizable_load): Likewise.  Emulate them by extracting
            scalar offsets, doing scalar loads and a vector construct.

            * gcc.target/i386/vect-gather-1.c: New testcase.
            * gfortran.dg/vect/vect-8.f90: Adjust.
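
As a rough illustration of what the emulated gather in the commit above amounts to, here is a hand-written C sketch. It is not the vectorizer's output: the function name `emulated_gather` and the lane count `LANES` are assumptions made for this example. It shows the three steps the commit message names: take a block of offsets, do one scalar load per lane, and assemble the loaded values into a vector-sized group.

```c
#include <stddef.h>

#define LANES 4  /* hypothetical vector width, chosen for illustration */

/* Computes out[i] = base[idx[i]] for i in [0, n), working on groups of
   LANES elements the way an emulated gather would.  */
void emulated_gather(double *out, const double *base,
                     const int *idx, size_t n)
{
    size_t i = 0;

    for (; i + LANES <= n; i += LANES) {
        int off[LANES];      /* the offset vector, loaded as one block */
        double lane[LANES];  /* the vector being constructed */

        for (int l = 0; l < LANES; l++)
            off[l] = idx[i + l];

        /* Decompose the offset vector: one scalar load per lane.  */
        for (int l = 0; l < LANES; l++)
            lane[l] = base[off[l]];

        /* Build the result vector and store it.  */
        for (int l = 0; l < LANES; l++)
            out[i + l] = lane[l];
    }

    /* Scalar tail for the remaining elements.  */
    for (; i < n; i++)
        out[i] = base[idx[i]];
}
```

The gather itself gains nothing from this form; as the commit message says, the benefit comes when the rest of the loop body can be vectorized around it.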

commit f2e5d2717d9e249edc5e0d45e49e4f9ef81fc694
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Aug 3 06:17:22 2021 -0700

    by_pieces: Pass MAX_PIECES to op_by_pieces_d

    Pass MAX_PIECES to op_by_pieces_d::op_by_pieces_d for move, store and
    compare.

            PR target/101742
            * expr.c (op_by_pieces_d::op_by_pieces_d): Add a max_pieces
            argument to set m_max_size.
            (move_by_pieces_d): Pass MOVE_MAX_PIECES to op_by_pieces_d.
            (store_by_pieces_d): Pass STORE_MAX_PIECES to op_by_pieces_d.
            (compare_by_pieces_d): Pass COMPARE_MAX_PIECES to
            op_by_pieces_d.

commit 96146e61cd7aee62c21c2845916ec42152918ab7
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Wed Aug 4 14:19:14 2021 +0100

    Fold (X<<C1)^(X<<C2) to a multiplication when possible.

    The easiest way to motivate these additions to match.pd is with the
    following example:

    unsigned int foo(unsigned char i) {
      return i | (i<<8) | (i<<16) | (i<<24);
    }

    which mainline with -O2 on x86_64 currently generates:
    foo:    movzbl  %dil, %edi
            movl    %edi, %eax
            movl    %edi, %edx
            sall    $8, %eax
            sall    $16, %edx
            orl     %edx, %eax
            orl     %edi, %eax
            sall    $24, %edi
            orl     %edi, %eax
            ret

    but with this patch now becomes:
    foo:    movzbl  %dil, %eax
            imull   $16843009, %eax, %eax
            ret

    Interestingly, this transformation is already applied when using
    addition, allowing synth_mult to select an optimal sequence, but
    not when using the equivalent bit-wise ior or xor operators.

    The solution is to use tree_nonzero_bits to check that the
    potentially non-zero bits of each operand don't overlap, which
    ensures that BIT_IOR_EXPR and BIT_XOR_EXPR produce the same
    results as PLUS_EXPR, which effectively generalizes the old
    fold_plusminus_mult_expr.  Technically, the transformation
    is to canonicalize (X*C1)|(X*C2) and (X*C1)^(X*C2) to
    X*(C1+C2) where X and X<<C are considered special cases.

    2021-08-04  Roger Sayle  <roger@nextmovesoftware.com>
                Marc Glisse  <marc.glisse@inria.fr>

    gcc/ChangeLog
            * match.pd (bit_ior, bit_xor): Canonicalize (X*C1)|(X*C2) and
            (X*C1)^(X*C2) as X*(C1+C2), and related variants, using
            tree_nonzero_bits to ensure that operands are bit-wise disjoint.

    gcc/testsuite/ChangeLog
            * gcc.dg/fold-ior-4.c: New test.
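
The equivalence behind the match.pd change above can be checked with a small C sketch. The function names are made up for this example, and `uint32_t` is used instead of the `unsigned char`/`int` promotion in the commit's `foo` so that the shift by 24 stays well defined. Since the four shifted copies of the byte occupy disjoint bits, OR-ing them gives the same result as adding them, which is a multiplication by 1 + (1<<8) + (1<<16) + (1<<24) = 16843009.

```c
#include <stdint.h>

/* Bitwise-or form, as in the commit message's example.  */
uint32_t splat_ior(uint8_t i)
{
    uint32_t x = i;
    return x | (x << 8) | (x << 16) | (x << 24);
}

/* Multiplication form: 16843009 == 0x01010101.  */
uint32_t splat_mul(uint8_t i)
{
    return (uint32_t)i * UINT32_C(16843009);
}

/* Returns 1 if both forms agree for every 8-bit input, else 0.  */
int splat_forms_agree(void)
{
    for (unsigned v = 0; v < 256; v++)
        if (splat_ior((uint8_t)v) != splat_mul((uint8_t)v))
            return 0;
    return 1;
}
```

The disjointness condition matters: if the shifted operands could overlap, OR and XOR would no longer match addition, which is why the patch consults tree_nonzero_bits before applying the fold.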

commit 0d04fe49239d91787850036599164788f1c87785
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Aug 3 20:50:52 2021 +0100

    libstdc++: Add [[nodiscard]] to sequence containers

    ... and container adaptors.

    This adds the [[nodiscard]] attribute to functions with no side-effects
    for the sequence containers and their iterators, and the debug versions
    of those containers, and the container adaptors.

    Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

    libstdc++-v3/ChangeLog:

            * include/bits/forward_list.h: Add [[nodiscard]] to functions
            with no side-effects.
            * include/bits/stl_bvector.h: Likewise.
            * include/bits/stl_deque.h: Likewise.
            * include/bits/stl_list.h: Likewise.
            * include/bits/stl_queue.h: Likewise.
            * include/bits/stl_stack.h: Likewise.
            * include/bits/stl_vector.h: Likewise.
            * include/debug/deque: Likewise.
            * include/debug/forward_list: Likewise.
            * include/debug/list: Likewise.
            * include/debug/safe_iterator.h: Likewise.
            * include/debug/vector: Likewise.
            * include/std/array: Likewise.
            * testsuite/23_containers/array/creation/3_neg.cc: Use
            -Wno-unused-result.
            * testsuite/23_containers/array/debug/back1_neg.cc: Cast result
            to void.
            * testsuite/23_containers/array/debug/back2_neg.cc: Likewise.
            * testsuite/23_containers/array/debug/front1_neg.cc: Likewise.
            * testsuite/23_containers/array/debug/front2_neg.cc: Likewise.
            * testsuite/23_containers/array/debug/square_brackets_operator1_neg.cc:
            Likewise.
            * testsuite/23_containers/array/debug/square_brackets_operator2_neg.cc:
            Likewise.
            * testsuite/23_containers/array/tuple_interface/get_neg.cc:
            Adjust dg-error line numbers.
            * testsuite/23_containers/deque/cons/clear_allocator.cc: Cast
            result to void.
            * testsuite/23_containers/deque/debug/invalidation/4.cc:
            Likewise.
            * testsuite/23_containers/deque/types/1.cc: Use
            -Wno-unused-result.
            * testsuite/23_containers/list/types/1.cc: Cast result to void.
            * testsuite/23_containers/priority_queue/members/7161.cc:
            Likewise.
            * testsuite/23_containers/queue/members/7157.cc: Likewise.
            * testsuite/23_containers/vector/59829.cc: Likewise.
            * testsuite/23_containers/vector/ext_pointer/types/1.cc:
            Likewise.
            * testsuite/23_containers/vector/ext_pointer/types/2.cc:
            Likewise.
            * testsuite/23_containers/vector/types/1.cc: Use
            -Wno-unused-result.

commit 240b01b0215f9e46ecf04267c8a3faeb19d4fe3c
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Tue Aug 3 18:06:27 2021 +0100

    libstdc++: Add [[nodiscard]] to iterators and related utilities

    This adds [[nodiscard]] throughout <iterator>, as proposed by P2377R0
    (with some minor corrections).

    The attribute is added for all modes from C++11 up, using
    [[__nodiscard__]] or _GLIBCXX_NODISCARD where C++17 [[nodiscard]] can't
    be used directly.

    Signed-off-by: Jonathan Wakely <jwakely@redhat.com>

    libstdc++-v3/ChangeLog:

            * include/bits/iterator_concepts.h (iter_move): Add
            [[nodiscard]].
            * include/bits/range_access.h (begin, end, cbegin, cend)
            (rbegin, rend, crbegin, crend, size, data, ssize): Likewise.
            * include/bits/ranges_base.h (ranges::begin, ranges::end)
            (ranges::cbegin, ranges::cend, ranges::rbegin, ranges::rend)
            (ranges::crbegin, ranges::crend, ranges::size, ranges::ssize)
            (ranges::empty, ranges::data, ranges::cdata): Likewise.
            * include/bits/stl_iterator.h (reverse_iterator, __normal_iterator)
            (back_insert_iterator, front_insert_iterator, insert_iterator)
            (move_iterator, move_sentinel, common_iterator)
            (counted_iterator): Likewise.
            * include/bits/stl_iterator_base_funcs.h (distance, next, prev):
            Likewise.
            * include/bits/stream_iterator.h (istream_iterator)
            (ostream_iterator): Likewise.
            * include/bits/streambuf_iterator.h (istreambuf_iterator)
            (ostreambuf_iterator): Likewise.
            * include/std/ranges (views::single, views::iota, views::all)
            (views::filter, views::transform, views::take, views::take_while)
            (views::drop, views::drop_while, views::join, views::lazy_split)
            (views::split, views::counted, views::common, views::reverse)
            (views::elements): Likewise.
            * testsuite/20_util/rel_ops.cc: Use -Wno-unused-result.
            * testsuite/24_iterators/move_iterator/greedy_ops.cc: Likewise.
            * testsuite/24_iterators/normal_iterator/greedy_ops.cc:
            Likewise.
            * testsuite/24_iterators/reverse_iterator/2.cc: Likewise.
            * testsuite/24_iterators/reverse_iterator/greedy_ops.cc:
            Likewise.
            * testsuite/21_strings/basic_string/range_access/char/1.cc:
            Cast result to void.
            * testsuite/21_strings/basic_string/range_access/wchar_t/1.cc:
            Likewise.
            * testsuite/21_strings/basic_string_view/range_access/char/1.cc:
            Likewise.
            * testsuite/21_strings/basic_string_view/range_access/wchar_t/1.cc:
            Likewise.
            * testsuite/23_containers/array/range_access.cc: Likewise.
            * testsuite/23_containers/deque/range_access.cc: Likewise.
            * testsuite/23_containers/forward_list/range_access.cc:
            Likewise.
            * testsuite/23_containers/list/range_access.cc: Likewise.
            * testsuite/23_containers/map/range_access.cc: Likewise.
            * testsuite/23_containers/multimap/range_access.cc: Likewise.
            * testsuite/23_containers/multiset/range_access.cc: Likewise.
            * testsuite/23_containers/set/range_access.cc: Likewise.
            * testsuite/23_containers/unordered_map/range_access.cc:
            Likewise.
            * testsuite/23_containers/unordered_multimap/range_access.cc:
            Likewise.
            * testsuite/23_containers/unordered_multiset/range_access.cc:
            Likewise.
            * testsuite/23_containers/unordered_set/range_access.cc:
            Likewise.
            * testsuite/23_containers/vector/range_access.cc: Likewise.
            * testsuite/24_iterators/customization_points/iter_move.cc:
            Likewise.
            * testsuite/24_iterators/istream_iterator/sentinel.cc:
            Likewise.
            * testsuite/24_iterators/istreambuf_iterator/sentinel.cc:
            Likewise.
            * testsuite/24_iterators/move_iterator/dr2061.cc: Likewise.
            * testsuite/24_iterators/operations/prev_neg.cc: Likewise.
            * testsuite/24_iterators/ostreambuf_iterator/2.cc: Likewise.
            * testsuite/24_iterators/range_access/range_access.cc:
            Likewise.
            * testsuite/24_iterators/range_operations/100768.cc: Likewise.
            * testsuite/26_numerics/valarray/range_access2.cc: Likewise.
            * testsuite/28_regex/range_access.cc: Likewise.
            * testsuite/experimental/string_view/range_access/char/1.cc:
            Likewise.
            * testsuite/experimental/string_view/range_access/wchar_t/1.cc:
            Likewise.
            * testsuite/ext/vstring/range_access.cc: Likewise.
            * testsuite/std/ranges/adaptors/take.cc: Likewise.
            * testsuite/std/ranges/p2259.cc: Likewise.
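
For illustration only, and not part of the commit above: a minimal sketch of why the listed tests either use -Wno-unused-result or cast the result to void once the range access functions carry [[nodiscard]]. The helper first_or and the function demo are hypothetical names introduced for this sketch; only std::begin comes from the library.

```cpp
#include <iterator>
#include <vector>

// Hypothetical helper, not part of libstdc++: the attribute asks the
// compiler to warn when a caller ignores the returned value.
[[nodiscard]] inline int first_or(const std::vector<int>& v, int fallback)
{
  return v.empty() ? fallback : *std::begin(v);
}

// Hypothetical driver showing the "cast result to void" idiom.
inline int demo()
{
  std::vector<int> v{3, 1, 4};
  (void) std::begin(v);    // deliberate discard; the cast avoids a warning
  (void) first_or(v, 0);   // same idiom for the hypothetical helper
  return first_or(v, -1);  // result is used, so nothing to warn about
}
```

Writing the calls without the cast would still compile, but a compiler that honours [[nodiscard]] is expected to diagnose the unused result.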

commit 2724d1bba6b36451404811fba3244f8897717ef3
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jul 30 11:06:50 2021 +0200

    Rewrite more vector loads to scalar loads

    This teaches forwprop to rewrite more vector loads that are only
    used in BIT_FIELD_REFs as scalar loads.  This provides the
    remaining uplift to SPEC CPU 2017 510.parest_r on Zen 2 which
    has CPU gathers disabled.

    In particular vector load + vec_unpack + bit-field-ref is turned
    into (extending) scalar loads which avoids costly XMM/GPR
    transitions.  To not conflict with vector load + bit-field-ref
    + vector constructor matching to vector load + shuffle the
    extended transform is only done after vector lowering.

    2021-07-30  Richard Biener  <rguenther@suse.de>

            * tree-ssa-forwprop.c (pass_forwprop::execute): Split
            out code to decompose vector loads ...
            (optimize_vector_load): ... here.  Generalize it to
            handle intermediate widening and TARGET_MEM_REF loads
            and apply it to loads with a supported vector mode as well.

commit 87a0b607e40f8122c7fc45d496ef48799fe11550
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Aug 4 11:42:41 2021 +0200

    tree-optimization/101756 - avoid vectorizing boolean MAX reductions

    The following avoids vectorizing MIN/MAX reductions on bools which,
    when ending up as vector(2) <signed-boolean:64> would need to be
    adjusted because of the sign change.  The fix instead avoids any
    reduction vectorization where the result isn't compatible
    to the original scalar type since we don't compensate for that
    either.

    2021-08-04  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/101756
            * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Make sure
            the result of the reduction epilogue is compatible to the original
            scalar result.

            * gcc.dg/vect/bb-slp-pr101756.c: New testcase.

commit af31cab04770f7a1a1da069415ab62ca2ef54fc4
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Aug 4 11:53:48 2021 +0200

    c++: Fix up #pragma omp declare {simd,variant} and acc routine parsing

    When parsing default arguments, we need to temporarily clear
    parser->omp_declare_simd and parser->oacc_routine, otherwise it can
    clash with further declarations inside of e.g. lambdas inside of those
    default arguments.

    2021-08-04  Jakub Jelinek  <jakub@redhat.com>

            PR c++/101759
            * parser.c (cp_parser_default_argument): Temporarily override
            parser->omp_declare_simd and parser->oacc_routine to NULL.

            * g++.dg/gomp/pr101759.C: New test.
            * g++.dg/goacc/pr101759.C: New test.

commit 8aa14fa7d98b4d641de9c3ea8d0fa094e0a0ec76
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Aug 4 11:42:59 2021 +0200

    testsuite: Fix duplicated content of gcc.c-torture/execute/ieee/pr29302-1.x

    The file has two identical halves, seems like twice applied patch.

    2021-08-04  Jakub Jelinek  <jakub@redhat.com>

            * gcc.c-torture/execute/ieee/pr29302-1.x: Undo doubly applied
            patch.

commit 9f26640f7b89c771b0ebffd7e7f5019d0709a955
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Aug 4 10:50:28 2021 +0800

    Refine predicate of peephole2 to general_reg_operand. [PR target/101743]

    The define_peephole2 added by r12-2640-gf7bf03cf69ccb7dc should only
    work on general registers.  Since x86 also supports mov instructions
    between GPRs, SSE registers and mask registers, limit the peephole2
    predicate to general_reg_operand.

    gcc/ChangeLog:

            PR target/101743
            * config/i386/i386.md (peephole2): Refine predicate from
            register_operand to general_reg_operand.

commit 7195fa03e7b8dfaff85d122da3b75f0a30ce95f8
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Aug 4 11:40:52 2021 +0200

    libgcc: Fix duplicated content of config/t-slibgcc-fuchsia

    The file has two identical halves, seems like twice applied patch.

    2021-08-04  Jakub Jelinek  <jakub@redhat.com>

            * config/t-slibgcc-fuchsia: Undo doubly applied patch.

commit 9db0bcd9fdc2e3a659d56435cb18d553f4292edb
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Wed Aug 4 10:55:12 2021 +0200

    Mark path_range_query::dump as override.

    gcc/ChangeLog:

            * gimple-range-path.h (path_range_query::dump): Mark override.

commit 4d562591018a51f155a2e5d8b9f3e5860111a327
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Aug 4 09:22:51 2021 +0200

    tree-optimization/101769 - tail recursion creates possibly infinite loop

    This makes tail recursion optimization produce a loop structure
    manually rather than relying on loop fixup.  That also allows the
    loop to be marked as finite (it would eventually blow the stack
    if it were not).

    2021-08-04  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/101769
            * tree-tailcall.c (eliminate_tail_call): Add the created loop
            for the first recursion and return it via the new output parameter.
            (optimize_tail_call): Pass through new output param.
            (tree_optimize_tail_calls_1): After creating all latches,
            add the created loop to the loop tree.  Do not mark loops for
            fixup.

            * g++.dg/tree-ssa/pr101769.C: New testcase.
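
For illustration only, and not taken from the commit or its testcase: a rough source-level sketch of what turning a tail call into a loop amounts to. Both functions are hypothetical examples; the actual pass works on GIMPLE, not on C++ source.

```cpp
// Hypothetical example: the recursive call is in tail position, so the
// optimizer may replace it with a jump back to the function start.
unsigned gcd_rec(unsigned a, unsigned b)
{
  if (b == 0)
    return a;
  return gcd_rec(b, a % b);
}

// Hand-written loop form of the same function, roughly the shape that
// tail recursion elimination produces.
unsigned gcd_loop(unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;  // arguments of the would-be recursive call
      a = b;
      b = t;
    }
  return a;
}
```

The commit message's remark about finiteness corresponds to the observation that the recursive form could not run forever without exhausting the stack.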

commit 5c73b94fdc46f03c761ee5c66e30e00a2bf9ee91
Author: Martin Liska <mliska@suse.cz>
Date:   Wed Aug 4 09:48:05 2021 +0200

    docs: document threader-mode param

    gcc/ChangeLog:

            * doc/invoke.texi: Document threader-mode param.

commit 3ae1468e260bf1f8e8c8637133263010213b6ac9
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Aug 4 13:20:56 2021 +0800

    Add dg-require-effective-target for testcases.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_addsubmul_d-2.c: Add
            dg-require-effective-target for avx512.
            * gcc.target/i386/cond_op_addsubmul_q-2.c: Ditto.
            * gcc.target/i386/cond_op_addsubmul_w-2.c: Ditto.
            * gcc.target/i386/cond_op_addsubmuldiv_double-2.c: Ditto.
            * gcc.target/i386/cond_op_addsubmuldiv_float-2.c: Ditto.
            * gcc.target/i386/cond_op_fma_double-2.c: Ditto.
            * gcc.target/i386/cond_op_fma_float-2.c: Ditto.

commit 2fc2e3917f9c8fd94f5d101477971d16c483ef88
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Aug 4 11:41:37 2021 +0800

    Support cond_{fma,fms,fnma,fnms} for vector float/double under AVX512.

    gcc/ChangeLog:

            * config/i386/sse.md (cond_fma<mode>): New expander.
            (cond_fms<mode>): Ditto.
            (cond_fnma<mode>): Ditto.
            (cond_fnms<mode>): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/cond_op_fma_double-1.c: New test.
            * gcc.target/i386/cond_op_fma_double-2.c: New test.
            * gcc.target/i386/cond_op_fma_float-1.c: New test.
            * gcc.target/i386/cond_op_fma_float-2.c: New test.