
Sat Sep 12 09:43:36 EDT 2015
numactl --interleave=all ../testing/testing_cgetrf -N 123 -N 1234 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000 --lapack
% MAGMA 1.7.0  compiled for CUDA capability >= 3.5, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.2, MKL threads 16. 
% device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
% device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
% device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
% Sat Sep 12 09:43:43 2015
% Usage: ../testing/testing_cgetrf [options] [-h|--help]

% ngpu 1
%   M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
%========================================================================
  123   123      3.49 (   0.00)      3.04 (   0.00)     ---   
 1234  1234    181.68 (   0.03)    266.37 (   0.02)     ---   
   10    10      0.33 (   0.00)      0.52 (   0.00)     ---   
   20    20      0.84 (   0.00)      1.32 (   0.00)     ---   
   30    30      1.98 (   0.00)      2.96 (   0.00)     ---   
   40    40      3.68 (   0.00)      4.96 (   0.00)     ---   
   50    50      5.09 (   0.00)      6.49 (   0.00)     ---   
   60    60      5.73 (   0.00)      7.06 (   0.00)     ---   
   70    70      6.50 (   0.00)      1.19 (   0.00)     ---   
   80    80      9.32 (   0.00)      1.78 (   0.00)     ---   
   90    90     10.70 (   0.00)      2.39 (   0.00)     ---   
  100   100     12.90 (   0.00)      3.06 (   0.00)     ---   
  200   200     36.53 (   0.00)     12.83 (   0.00)     ---   
  300   300     64.79 (   0.00)     29.22 (   0.00)     ---   
  400   400     87.99 (   0.00)     47.08 (   0.00)     ---   
  500   500    113.26 (   0.00)     69.03 (   0.00)     ---   
  600   600    127.67 (   0.00)     92.11 (   0.01)     ---   
  700   700    139.67 (   0.01)    116.17 (   0.01)     ---   
  800   800    155.26 (   0.01)    143.91 (   0.01)     ---   
  900   900    151.48 (   0.01)    170.46 (   0.01)     ---   
 1000  1000    165.10 (   0.02)    198.58 (   0.01)     ---   
 2000  2000    215.26 (   0.10)    514.58 (   0.04)     ---   
 3000  3000    272.83 (   0.26)    866.05 (   0.08)     ---   
 4000  4000    480.88 (   0.35)   1122.86 (   0.15)     ---   
 5000  5000    505.32 (   0.66)   1348.73 (   0.25)     ---   
 6000  6000    433.04 (   1.33)   1580.14 (   0.36)     ---   
 7000  7000    528.25 (   1.73)   1744.78 (   0.52)     ---   
 8000  8000    503.81 (   2.71)   1852.65 (   0.74)     ---   
 9000  9000    378.47 (   5.14)   1984.11 (   0.98)     ---   
10000 10000    540.14 (   4.94)   2113.90 (   1.26)     ---   
12000 12000    460.35 (  10.01)   2265.73 (   2.03)     ---   
14000 14000    463.89 (  15.77)   2349.43 (   3.11)     ---   
16000 16000    524.40 (  20.83)   2422.50 (   4.51)     ---   
18000 18000    538.98 (  28.85)   2489.29 (   6.25)     ---   
20000 20000    577.74 (  36.92)   2541.17 (   8.39)     ---   
Sat Sep 12 09:48:11 EDT 2015

Sat Sep 12 09:48:11 EDT 2015
numactl --interleave=all ../testing/testing_cgetrf_gpu -N 123 -N 1234 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
% MAGMA 1.7.0  compiled for CUDA capability >= 3.5, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.2, MKL threads 16. 
% device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
% device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
% device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
% Sat Sep 12 09:48:17 2015
% Usage: ../testing/testing_cgetrf_gpu [options] [-h|--help]

%   M     N   CPU GFlop/s (sec)   GPU GFlop/s (sec)   |PA-LU|/(N*|A|)
%========================================================================
  123   123     ---   (  ---  )      1.81 (   0.00)     ---  
 1234  1234     ---   (  ---  )    246.34 (   0.02)     ---  
   10    10     ---   (  ---  )      0.05 (   0.00)     ---  
   20    20     ---   (  ---  )      0.29 (   0.00)     ---  
   30    30     ---   (  ---  )      0.77 (   0.00)     ---  
   40    40     ---   (  ---  )      1.48 (   0.00)     ---  
   50    50     ---   (  ---  )      2.21 (   0.00)     ---  
   60    60     ---   (  ---  )      2.89 (   0.00)     ---  
   70    70     ---   (  ---  )      0.56 (   0.00)     ---  
   80    80     ---   (  ---  )      0.85 (   0.00)     ---  
   90    90     ---   (  ---  )      1.17 (   0.00)     ---  
  100   100     ---   (  ---  )      1.57 (   0.00)     ---  
  200   200     ---   (  ---  )      7.36 (   0.00)     ---  
  300   300     ---   (  ---  )     19.19 (   0.00)     ---  
  400   400     ---   (  ---  )     34.66 (   0.00)     ---  
  500   500     ---   (  ---  )     55.00 (   0.01)     ---  
  600   600     ---   (  ---  )     74.26 (   0.01)     ---  
  700   700     ---   (  ---  )     98.66 (   0.01)     ---  
  800   800     ---   (  ---  )    124.94 (   0.01)     ---  
  900   900     ---   (  ---  )    150.72 (   0.01)     ---  
 1000  1000     ---   (  ---  )    185.45 (   0.01)     ---  
 2000  2000     ---   (  ---  )    487.35 (   0.04)     ---  
 3000  3000     ---   (  ---  )    833.69 (   0.09)     ---  
 4000  4000     ---   (  ---  )   1174.27 (   0.15)     ---  
 5000  5000     ---   (  ---  )   1393.16 (   0.24)     ---  
 6000  6000     ---   (  ---  )   1694.11 (   0.34)     ---  
 7000  7000     ---   (  ---  )   1911.63 (   0.48)     ---  
 8000  8000     ---   (  ---  )   2078.50 (   0.66)     ---  
 9000  9000     ---   (  ---  )   2093.74 (   0.93)     ---  
10000 10000     ---   (  ---  )   2236.40 (   1.19)     ---  
12000 12000     ---   (  ---  )   2404.34 (   1.92)     ---  
14000 14000     ---   (  ---  )   2539.97 (   2.88)     ---  
16000 16000     ---   (  ---  )   2621.75 (   4.17)     ---  
18000 18000     ---   (  ---  )   2674.37 (   5.82)     ---  
20000 20000     ---   (  ---  )   2725.37 (   7.83)     ---  
Sat Sep 12 09:49:45 EDT 2015
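
The GFlop/s columns above can be sanity-checked from N and the printed time. The sketch below assumes the conventional operation count for complex LU factorization, keeping only the leading term of about (8/3)·N^3 flops for a square matrix; the exact count the tester uses includes lower-order terms and is not shown in this log. The function names `cgetrf_gflops` and `speedup` are hypothetical helpers, not part of MAGMA. Because times are printed to only two decimals, recomputed rates match the logged values only approximately, and rows showing 0.00 seconds cannot be checked this way.

```python
def cgetrf_gflops(n: int, seconds: float) -> float:
    """Approximate GFlop/s for an n-by-n complex LU factorization.

    Assumption: only the leading term (8/3) * n**3 of the flop count
    is used, so the result is slightly low for small n.
    """
    if seconds <= 0.0:
        raise ValueError("time too small to resolve at the log's precision")
    flops = (8.0 / 3.0) * n**3
    return flops / seconds / 1e9


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Ratio of CPU (LAPACK) time to GPU (MAGMA) time for the same size."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    # N = 20000 rows copied from the two tables above.
    print(f"cgetrf_gpu, N=20000: {cgetrf_gflops(20000, 7.83):.1f} GFlop/s")
    print(f"cgetrf,     N=20000: {cgetrf_gflops(20000, 8.39):.1f} GFlop/s")
    print(f"CPU/GPU time ratio:  {speedup(36.92, 8.39):.2f}x")
```

For the N = 20000 rows, the recomputed rates land within a few GFlop/s of the logged 2725.37 and 2541.17; the gap comes from the rounded times and the dropped lower-order terms.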
