Subject: Re: [patch, libfortran] Speed up cshift for dim > 1
From: Thomas Koenig
To: Dominique d'Humières
Cc: gfortran, gcc-patches
Date: Mon, 19 Jun 2017 18:01:00 -0000

Hi Dominique,

> For the record, the following CSHIFT is still 4 times slower than the DO loop

I have looked into this a bit. The main reason is that, unlike
cshift0 (the version without an array as shift), we do not generate
individual functions to call for the usual data types. Instead, we use
memcpy with a size determined at run time by looking at the array
descriptor. This is, of course, quite slow.

So, the solution should probably be to generate functions like
cshift1_4_i4 and then call them. This would generate a bit of
bloat, but if people use this in a serious way, I think this is OK.

This was already done for cshift0 a few years ago. What we have
there looks like this (intrinsics/cshift0.c):

  type_size = GFC_DTYPE_TYPE_SIZE (array);

  switch(type_size)
    {
    case GFC_DTYPE_LOGICAL_1:
    case GFC_DTYPE_INTEGER_1:
    case GFC_DTYPE_DERIVED_1:
      cshift0_i1 ((gfc_array_i1 *)ret, (gfc_array_i1 *) array, shift, which);
      return;

    case GFC_DTYPE_LOGICAL_2:
    case GFC_DTYPE_INTEGER_2:
      cshift0_i2 ((gfc_array_i2 *)ret, (gfc_array_i2 *) array, shift, which);
      return;

so this is something that we could also emulate.

A bit of work, but nothing that looks un-doable.

Regards

	Thomas
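
For illustration only, the difference between the two approaches discussed above can be sketched in standalone C. This is not libgfortran code: the names shift_generic, shift_i4, shift_dispatch and normalize_shift are hypothetical, the arrays are plain contiguous 1-D buffers rather than array descriptors, and the dispatch is on the element size alone (the real code switches on a type code, and a size-only dispatch assumes the buffers are suitably aligned for the chosen type).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduce a possibly negative shift to the range [0, len). len must be > 0. */
static size_t
normalize_shift (ptrdiff_t shift, size_t len)
{
  ptrdiff_t m = shift % (ptrdiff_t) len;
  if (m < 0)
    m += (ptrdiff_t) len;
  return (size_t) m;
}

/* Generic path: the element size is only known at run time, so every
   element is copied by memcpy with a variable length.  */
static void
shift_generic (void *ret, const void *array, size_t len, size_t size,
               ptrdiff_t shift)
{
  char *dest = ret;
  const char *src = array;
  size_t s, n;

  if (len == 0)
    return;
  s = normalize_shift (shift, len);
  for (n = 0; n < len; n++)
    memcpy (dest + n * size, src + ((n + s) % len) * size, size);
}

/* Specialized path: the element type is fixed at compile time, so the
   compiler can emit plain loads and stores.  */
static void
shift_i4 (int32_t *ret, const int32_t *array, size_t len, ptrdiff_t shift)
{
  size_t s, n;

  if (len == 0)
    return;
  s = normalize_shift (shift, len);
  for (n = 0; n < len; n++)
    ret[n] = array[(n + s) % len];
}

/* Dispatcher: pick a specialized routine where one exists and fall
   back to the generic memcpy loop otherwise.  */
static void
shift_dispatch (void *ret, const void *array, size_t len, size_t size,
                ptrdiff_t shift)
{
  switch (size)
    {
    case sizeof (int32_t):
      shift_i4 ((int32_t *) ret, (const int32_t *) array, len, shift);
      return;

    default:
      shift_generic (ret, array, len, size, shift);
      return;
    }
}
```

Both paths follow the CSHIFT convention that a positive shift moves elements towards lower indices, so shifting {1, 2, 3, 4, 5} by 2 gives {3, 4, 5, 1, 2}. Whether the specialized loop is actually faster in a given case would have to be measured.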