From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices
To: Jerry DeLisle, "fortran@gcc.gnu.org"
Cc: GCC Patches
From: Thomas Koenig
References: <2aad89ce-02e1-45bf-0bdc-d318e7995595@charter.net> <49101da5-e1cc-0817-7cb6-64cfe5778e60@netcologne.de> <63f1ffa4-02b5-11c9-b64d-86336733d4b5@charter.net>
Message-ID:
<3f3beeb0-21a9-3587-cb84-e22749ef691f@netcologne.de>
Date: Tue, 15 Nov 2016 07:23:00 -0000
In-Reply-To: <63f1ffa4-02b5-11c9-b64d-86336733d4b5@charter.net>

Hi Jerry,

> With these changes, OK for trunk?

Just going over this with a fine comb...

One thing just struck me:  The loop variables should be index_type, so

  const index_type m = xcount, n = ycount, k = count;

[...]

  index_type a_dim1, a_offset, b_dim1, b_offset, c_dim1, c_offset,
    i1, i2, i3, i4, i5, i6;

  /* Local variables */
  GFC_REAL_4 t1[65536], /* was [256][256] */
    f11, f12, f21, f22, f31, f32, f41, f42,
    f13, f14, f23, f24, f33, f34, f43, f44;
  index_type i, j, l, ii, jj, ll;
  index_type isec, jsec, lsec, uisec, ujsec, ulsec;

I agree that we should do the tuning of the inline limit separately.

When we do that, we should think about -Os.  With the buffering, we
have much more memory usage in the library function.  If -Os is in
force, we should also consider raising the limit for inlining.

Since I was involved in the development, I would like to give others a
few days to raise more comments.  If there are none, OK to commit with
the above change within a few days.  Of course, somebody else might
also OK this patch :-)

Regards

	Thomas
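
P.S. To illustrate the index_type point, here is a minimal sketch of a
plain (unblocked) column-major matrix product whose bounds, loop
variables and offsets are all index_type.  It is not the code from the
patch: matmul_naive is a hypothetical name, the arrays are assumed
contiguous, and the typedef below is only a stand-in that assumes
index_type is a signed, pointer-sized integer.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for the library's index_type (signed and
   pointer-sized), so that offsets such as i + j * m cannot outgrow
   the type on arrays that fit in memory.  */
typedef ptrdiff_t index_type;

/* Hypothetical helper, not part of the patch.
   Computes C (m x n) = A (m x k) * B (k x n); all three arrays are
   contiguous and stored column-major.  */
static void
matmul_naive (float *c, const float *a, const float *b,
              index_type m, index_type n, index_type k)
{
  index_type i, j, l;   /* loop variables share the type of the bounds */

  assert (m >= 0 && n >= 0 && k >= 0);

  for (j = 0; j < n; j++)
    {
      /* Clear column j of C.  */
      for (i = 0; i < m; i++)
        c[i + j * m] = 0.0f;

      /* Accumulate column l of A, scaled by B(l,j), into column j of C.  */
      for (l = 0; l < k; l++)
        {
          const float t = b[l + j * k];
          for (i = 0; i < m; i++)
            c[i + j * m] += a[i + l * m] * t;
        }
    }
}
```

With m = n = k = 2, a = {1, 2, 3, 4} and b = {5, 6, 7, 8} (both
column-major), this fills c with {23, 34, 31, 46}.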