articles:numpy_vs_geo [RoboLab.Suehiro]

====== Speed comparison of numpy and geo.py ======
===== Is numpy faster than lists? =====


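===== Purpose =====

I have written my own module, [[upload_files:geo.py|geo.py]], which implements 3D geometric operations such as vectors, rotation matrices, and coordinate transformation matrices on top of Python lists. Since numpy may well be faster than list-based processing, this page compares the two. \\
geo.py was originally developed for python2, but its structure is simple, so it also runs on python3 without problems.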

The test was run under the following conditions.
  * ProBook 474s
  * Memory: 8 GB
  * CPU: Core™ i5-3230M
  * OS: Ubuntu 20.04
  * jupyter notebook, python3

The notebook itself is available as [[./numpy_vs_geo.ipynb|numpy_vs_geo.ipynb]].

===== Loading the modules =====

geo.py is my own module.

<code python>
from geo import *
import time
import numpy as np
import pandas as pd
</code>
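===== Generating the data =====

geo.py defines two classes: VECTOR for 3D vectors and MATRIX for 3D rotation matrices. The arguments a and b of MATRIX specify rotations about the x axis and the y axis, respectively.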

<code python>
g_v1=VECTOR(1,2,3)
g_v2=VECTOR(3,4,5)
g_R1=MATRIX(a=pi/3)
g_R2=MATRIX(b=pi/6)
</code>
Their contents are as follows.

<code python>
print('g_v1 =', g_v1)
print('g_v2 =', g_v2)
print('g_R1 =', g_R1)
print('g_R2 =', g_R2)
</code>
<code>
g_v1 = v:[1.0, 2.0, 3.0]
g_v2 = v:[3.0, 4.0, 5.0]
g_R1 = m:[[1.0, 0.0, 0.0], [0.0, 0.5000000000000001, -0.8660254037844386], [0.0, 0.8660254037844386, 0.5000000000000001]]
g_R2 = m:[[0.8660254037844387, 0.0, 0.49999999999999994], [0.0, 1.0, 0.0], [-0.49999999999999994, 0.0, 0.8660254037844387]]
</code>
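The printed values are consistent with the standard rotation matrices about the x and y axes. The following is a minimal list-based sketch under that assumption. The helper names rot_x and rot_y are hypothetical and are not taken from geo.py, whose source is not shown on this page.

<code python>
from math import cos, sin, pi

def rot_x(a):
    """Rotation by angle a (radians) about the x axis, as a nested list."""
    c, s = cos(a), sin(a)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def rot_y(b):
    """Rotation by angle b (radians) about the y axis, as a nested list."""
    c, s = cos(b), sin(b)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

# Same angles as g_R1 and g_R2 above.
print(rot_x(pi / 3))
print(rot_y(pi / 6))
</code>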
In the same way, create the numpy data as ndarrays.

<code python>
np_v1=np.array(g_v1)
np_v2=np.array(g_v2)
np_R1=np.array(g_R1)
np_R2=np.array(g_R2)
</code>
Their contents are as follows.

<code python>
print('np_v1 =', np_v1)
print('np_v2 =', np_v2)
print('np_R1 =', np_R1)
print('np_R2 =', np_R2)
</code>
<code>
np_v1 = [1. 2. 3.]
np_v2 = [3. 4. 5.]
np_R1 = [[ 1.         0.         0.       ]
 [ 0.         0.5       -0.8660254]
 [ 0.         0.8660254  0.5      ]]
np_R2 = [[ 0.8660254  0.         0.5      ]
 [ 0.         1.         0.       ]
 [-0.5        0.         0.8660254]]
</code>
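===== Timing function and result storage =====

The function test calls fn n times and returns the total elapsed time in seconds. The measurement results are collected in the list data, one tuple per test, built by judge.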

<code python>
def test(n,fn):
    # Call fn() n times and return the total elapsed wall-clock time in seconds.
    i=0
    start=time.time()
    while i< n :
        fn()
        i += 1
    end = time.time()
    rslt=end-start
    return rslt
</code>
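The same kind of measurement can also be written with the standard library's timeit module, which runs the loop internally and disables garbage collection while timing by default. This is only a sketch for comparison and was not used for the numbers on this page. The name test_timeit and the stand-in workload are assumptions for illustration.

<code python>
import timeit

def test_timeit(n, fn):
    """Return the total time in seconds for n calls of fn, measured by timeit."""
    return timeit.timeit(fn, number=n)

# Stand-in workload (hypothetical); replace with e.g. lambda: g_v1 + g_v2.
elapsed = test_timeit(1000, lambda: sum((1.0, 2.0, 3.0)))
print(elapsed)
</code>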
<code python>
data = []
</code>
<code python>
def judge(test_name, g_time, np_time) :
    # Compare the two timings; the smaller (faster) one wins.
    if g_time < np_time :
        judgment = "geo.py wins"
    elif g_time > np_time :
        judgment = "np wins"
    else :
        judgment = "tie"
    return test_name, g_time, np_time, judgment
</code>
===== Choosing the loop count =====

<code python>
test(100, lambda : g_v1+g_v2)
</code>
<code>
0.00030612945556640625
</code>
<code python>
test(1000, lambda : g_v1+g_v2)
</code>
<code>
0.0015869140625
</code>
<code python>
test(10000, lambda : g_v1+g_v2)
</code>
<code>
0.024413347244262695
</code>
<code python>
test(100000, lambda : g_v1+g_v2)
</code>
<code>
0.1378471851348877
</code>
<code python>
test(1000000, lambda : g_v1+g_v2)
</code>
<code>
1.200444221496582
</code>
<code python>
test(10000000, lambda : g_v1+g_v2)
</code>
<code>
11.367036819458008
</code>
<code python>
test(100000000, lambda : g_v1+g_v2)
</code>
<code>
112.89321899414062
</code>
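At around one million iterations, the overhead before and after the loop starts to matter less. It still has some effect, but ten million and one hundred million iterations take too long, and the cost of the processing inside the loop itself cannot be removed anyway, so I settle on one million.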
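Dividing each total by its loop count makes the trend easier to see. The sketch below only reuses the totals measured above; the variable names are chosen for illustration.

<code python>
# Loop counts and total times (seconds) copied from the measurements above.
measurements = [
    (100, 0.00030612945556640625),
    (1000, 0.0015869140625),
    (10000, 0.024413347244262695),
    (100000, 0.1378471851348877),
    (1000000, 1.200444221496582),
    (10000000, 11.367036819458008),
    (100000000, 112.89321899414062),
]

# Convert each total to microseconds per call.
per_call_us = [(n, total / n * 1e6) for n, total in measurements]

for n, us in per_call_us:
    print(f"{n:>9d} loops: {us:.3f} microseconds per call")
</code>

The per-call time fluctuates for small loop counts and levels off at a little over one microsecond from about one million loops onward.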

<code python>
N=1000000
</code>
===== Vector sum: np wins =====

<code python>
g_v1+g_v2
</code>
<code>
v:[4.0, 6.0, 8.0]
</code>
<code python>
np_v1+np_v2
</code>
<code>
array([4., 6., 8.])
</code>
<code python>
g_time = test(N, lambda : g_v1+g_v2)
print(g_time)
</code>
<code>
1.143357515335083
</code>
<code python>
np_time = test(N, lambda : np_v1+np_v2)
print(np_time)
</code>
<code>
0.6900453567504883
</code>
<code python>
data.append(judge('Vector sum', g_time, np_time))
</code>
===== Vector dot product: geo.py wins =====

<code python>
g_v1.dot(g_v2)
</code>
<code>
26.0
</code>
<code python>
np.dot(np_v1,np_v2)
</code>
<code>
26.0
</code>
<code python>
g_time = test(N, lambda : g_v1.dot(g_v2))
print(g_time)
</code>
<code>
0.6457569599151611
</code>
<code python>
np_time = test(N, lambda : np.dot(np_v1,np_v2))
print(np_time)
</code>
<code>
1.7959060668945312
</code>
<code python>
data.append(judge('Vector dot product', g_time, np_time))
</code>
===== Vector cross product: geo.py wins by a landslide =====

Or rather, np is strikingly slow here.

<code python>
g_v1*g_v2
</code>
<code>
v:[-2.0, 4.0, -2.0]
</code>
<code python>
np.cross(np_v1,np_v2)
</code>
<code>
array([-2.,  4., -2.])
</code>
<code python>
g_time = test(N, lambda : g_v1*g_v2)
print(g_time)
</code>
<code>
1.6717863082885742
</code>
<code python>
np_time = test(N, lambda : np.cross(np_v1,np_v2))
print(np_time)
</code>
<code>
54.95365524291992
</code>
<code python>
data.append(judge('Vector cross product', g_time, np_time))
</code>
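For a single pair of 3-element vectors, the cross product needs only six multiplications and three subtractions. The sketch below writes it out on plain lists. The name cross3 is hypothetical, and whether the * operator of geo.py is implemented this way is an assumption, since its source is not shown on this page. One plausible reason for the gap is that np.cross is a general routine that handles arbitrary array shapes and axes, so its per-call overhead dominates for such small inputs, but this was not verified in this test.

<code python>
def cross3(u, v):
    """Cross product of two 3-element sequences, returned as a list."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

# Same values as g_v1 and g_v2 above.
result = cross3([1.0, 2.0, 3.0], [3.0, 4.0, 5.0])
print(result)  # [-2.0, 4.0, -2.0]
</code>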
===== Matrix-vector product: np wins =====

<code python>
g_R1*g_v1
</code>
<code>
v:[1.0, -1.5980762113533158, 3.2320508075688776]
</code>
<code python>
np.dot(np_R1, np_v1)
</code>
<code>
array([ 1.        , -1.59807621,  3.23205081])
</code>
<code python>
g_time = test(N, lambda : g_R1*g_v1)
print(g_time)
</code>
<code>
2.362830638885498
</code>
<code python>
np_time = test(N, lambda : np.dot(np_R1,np_v1))
print(np_time)
</code>
<code>
1.848921537399292
</code>
<code python>
data.append(judge('Matrix-vector product', g_time, np_time))
</code>
===== Matrix-matrix product: np wins =====

<code python>
g_R1*g_R2
</code>
<code>
m:[[0.8660254037844387, 0.0, 0.49999999999999994], [0.43301270189221924, 0.5000000000000001, -0.75], [-0.25, 0.8660254037844386, 0.43301270189221946]]
</code>
<code python>
np.dot(np_R1,np_R2)
</code>
<code>
array([[ 0.8660254,  0.       ,  0.5      ],
       [ 0.4330127,  0.5      , -0.75     ],
       [-0.25     ,  0.8660254,  0.4330127]])
</code>
<code python>
g_time = test(N, lambda : g_R1*g_R2)
print(g_time)
</code>
<code>
5.8199920654296875
</code>
<code python>
np_time = test(N, lambda : np.dot(np_R1,np_R2))
print(np_time)
</code>
<code>
2.266876697540283
</code>
<code python>
data.append(judge('Matrix-matrix product', g_time, np_time))
</code>
===== Conclusion =====

Summary table (total time in seconds for one million calls)

<code python>
df = pd.DataFrame(data, columns=["Item", "geo.py", "np", "Result"])
df
</code>


^   ^ Item ^ geo.py ^ np ^ Result ^
| 0 | Vector sum | 1.143358 | 0.690045 | np wins |
| 1 | Vector dot product | 0.645757 | 1.795906 | geo.py wins |
| 2 | Vector cross product | 1.671786 | 54.953655 | geo.py wins |
| 3 | Matrix-vector product | 2.362831 | 1.848922 | np wins |
| 4 | Matrix-matrix product | 5.819992 | 2.266877 | np wins |

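In short, the difference is surprisingly small. numpy is about 1.3 to 2.6 times faster for the vector sum, the matrix-vector product, and the matrix-matrix product, while geo.py is about 2.8 times faster for the dot product. numpy is probably the better choice for large data, but for 3D vectors and matrices no large difference appears.

The bigger surprise was how slow numpy's vector cross product is: it took about 33 times as long as geo.py.

Robot programs also access individual elements frequently, which makes any difference even less likely to show, so the current geo.py is considered sufficient.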
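The size of each difference can be checked by dividing the numpy time by the geo.py time for every test. The sketch below only reuses the totals from the summary table; the English test names and variable names are chosen for illustration.

<code python>
# (test name, geo.py time [s], numpy time [s]) copied from the summary table.
results = [
    ("vector sum", 1.143358, 0.690045),
    ("vector dot product", 0.645757, 1.795906),
    ("vector cross product", 1.671786, 54.953655),
    ("matrix-vector product", 2.362831, 1.848922),
    ("matrix-matrix product", 5.819992, 2.266877),
]

# A ratio above 1 means numpy took longer than geo.py; below 1, numpy was faster.
ratios = {name: np_time / g_time for name, g_time, np_time in results}

for name, ratio in ratios.items():
    print(f"{name}: np / geo.py = {ratio:.2f}")
</code>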
