Date: Wed, 14 Jun 2023 11:32:25 +0200
Subject: Re: [PATCH] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F
From: Jan Beulich
To: Hongtao Liu
Cc: "gcc-patches@gcc.gnu.org", Kirill Yukhin, Hongtao Liu
References: <68c1aa7d-0a7b-1427-55f8-edc6302f00dc@suse.com>

On 14.06.2023 10:10, Hongtao Liu wrote:
> On Wed, Jun 14, 2023 at 1:59 PM Jan Beulich via Gcc-patches
> wrote:
>>
>> There's no reason to constrain this to AVX512VL, as the wider operation
>> is not usable for more narrow operands only when the possible memory
> But this may require more resources (on AMD znver4 processor a zmm
> instruction will also be split into 2 uops, right?) And on some intel
> processors(SKX/CLX) there will be frequency reduction.

I'm afraid I don't follow: Largely the same AVX512 code would be generated
when passing -mavx512vl, so how can power/performance considerations matter
here? All I'm doing here (and in a few more patches I'm still in the
process of testing) is relax when AVX512 insns can actually be used
(reducing the copying between registers and/or the number of insns needed).
My understanding on the Intel side is that it only matters whether AVX512
insns are used, not what vector length they are. You may be right about
znver4, though.

Nevertheless I agree ...

> If it needs to be done, it is better guarded with
> !TARGET_PREFER_AVX256, at least when micro-architecture AVX256_OPTIMAL
> or users explicitly uses -mprefer-vector-width=256, we don't want to
> produce any zmm instruction for surprise.(Although
> -mprefer-vector-width=256 is supposed for auto-vectorizer, but backend
> codegen also use it under such cases, i.e. in *movsf_internal
> alternative 5 use zmm only TARGET_AVX512F && !TARGET_PREFER_AVX256.)

... that respecting such overrides is probably desirable, so I'll adjust.

Jan

>> source is a non-broadcast one. This way even the scalar copysign3
>> can benefit from the operation being a single-insn one (leaving aside
>> moves which the compiler decides to insert for unclear reasons, and
>> leaving aside the fact that bcst_mem_operand() is too restrictive for
>> broadcast to be embedded right into VPTERNLOG*).
>>
>> Along with this also request value duplication in
>> ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
>> excess space allocation in .rodata.*, filled with zeros which are never
>> read.
>>
>> gcc/
>>
>>	* config/i386/i386-expand.cc (ix86_expand_copysign): Request
>>	value duplication by ix86_build_signbit_mask() when AVX512F and
>>	not HFmode.
>>	* config/i386/sse.md (*_vternlog_all): Convert to
>>	2-alternative form. Adjust "mode" attribute. Add "enabled"
>>	attribute.
>>	(*_vpternlog_1): Relax to just TARGET_AVX512F.
>>	(*_vpternlog_2): Likewise.
>>	(*_vpternlog_3): Likewise.
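
For readers less familiar with why copysign maps onto a single VPTERNLOG*,
here is a minimal sketch in portable C. It is not the code GCC emits and not
part of the patch: it only emulates one 32-bit lane of the instruction's
bitwise truth-table lookup. The helper names (ternlog32, copysign_ternlog)
and the operand order (magnitude, sign source, sign-bit mask) are assumptions
made for illustration; with that order the immediate works out to 0xD8,
i.e. "take the bit from b where the mask is set, else from a".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulate one 32-bit lane of a three-input bitwise truth-table lookup:
   for each bit position the input bits (a, b, c) form a 3-bit index,
   with a as the most significant bit, into the 8-bit table imm8.  */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        unsigned idx = (((a >> i) & 1u) << 2)
                     | (((b >> i) & 1u) << 1)
                     | ((c >> i) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << i;
    }
    return r;
}

/* copysign as a single bitwise select: (mag & ~mask) | (sgn & mask).
   Evaluating that expression on the canonical patterns a=0xF0, b=0xCC,
   c=0xAA gives the truth table 0xD8 used below.  */
static float copysign_ternlog(float mag, float sgn)
{
    const uint32_t signbit_mask = UINT32_C(0x80000000);
    uint32_t a, b, r;
    float out;

    memcpy(&a, &mag, sizeof a);   /* reinterpret the floats as raw bits */
    memcpy(&b, &sgn, sizeof b);
    r = ternlog32(a, b, signbit_mask, 0xD8);
    memcpy(&out, &r, sizeof out);
    return out;
}

/* Hypothetical self-check, only to show the intended behaviour.  */
static void copysign_ternlog_selfcheck(void)
{
    assert(copysign_ternlog(1.5f, -2.0f) == -1.5f);
    assert(copysign_ternlog(-3.0f, 4.0f) == 3.0f);
}
```

The point of the sketch is just that the mask, the magnitude and the sign
source are combined in one operation, which is why the scalar case can be a
single insn once the sign-bit mask is available in a register.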