This article is shared from the Tianyi Cloud Developer Community: "Exploring the Choice Between ngx.re and Lua string.re in OpenResty". Author: 王****淋
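0. Background
OpenResty offers two pattern-matching APIs: ngx.re, which uses PCRE regular expressions, and the Lua string library, which uses Lua patterns. Both can match and search text. So how do the two differ, and which one should you choose?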
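To make the difference concrete, here is a minimal sketch that parses the same Range header with both APIs. It uses only the standard `string.match` and, inside OpenResty, `ngx.re.match`. The `ngx` guard is an addition of this sketch so that the snippet also runs under a plain Lua interpreter.

```lua
local header = "bytes=10-65535"

-- Lua pattern: %d is the digit class and %- escapes the literal hyphen.
-- string.match returns the captures directly as multiple values.
local first, last = string.match(header, "^bytes=(%d*)%-(%d*)$")
print(first, last)

-- PCRE via ngx.re: only available inside OpenResty, hence the guard.
if ngx and ngx.re then
    -- Captures arrive in a table: m[0] is the whole match,
    -- m[1] and m[2] are the two groups.
    local m, err = ngx.re.match(header, [[^bytes=(\d*)-(\d*)$]], "jo")
    if m then
        print(m[1], m[2])
    elseif err then
        print("regex error: ", err)
    end
end
```

Note the two syntaxes: Lua patterns use `%` as the escape character and have no alternation, while ngx.re accepts full PCRE syntax.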
1. Performance tests
1.1 Simple loop test
a) Short string and short pattern
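-- Aliases and timer used by every snippet in this article. The original does
-- not show them; get_t is ASSUMED here to be a wall-clock timer on ngx.now().
local string_match = string.match
local ngx_re_match = ngx.re.match
local function get_t()
    ngx.update_time()  -- refresh the cached Nginx time before reading it
    return ngx.now()
end

local http_range = 'bytes=10-65535'
local string_re_p = '^bytes=([%d]*)%-([%d]*)$'    -- Lua pattern
local ngx_re_p = '^bytes=([\\d]*)?\\-([\\d]*)$'   -- PCRE regex
local loop = 1000000

local t0 = get_t()
for i = 1, loop do
    local _, _ = string_match(http_range, string_re_p)
end
local t1 = get_t()  -- t1 - t0: time spent in string.match
for i = 1, loop do
    local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
end
local t2 = get_t()  -- t2 - t1: time spent in ngx.re.match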
Result: 0.247 s (string.match) vs. 0.32 s (ngx.re.match)
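Single-run timings like the one above can be noisy. The sketch below is one way to repeat a measurement and keep the fastest batch. `best_of` is a hypothetical helper, not part of the original test, and it uses `os.clock` (CPU time) so that it also runs outside OpenResty.

```lua
-- Hypothetical helper: runs `rounds` batches of `n` calls to `fn`
-- and returns the fastest batch time in CPU seconds.
local function best_of(rounds, n, fn)
    local best = math.huge
    for _ = 1, rounds do
        local start = os.clock()
        for _ = 1, n do
            fn()
        end
        local elapsed = os.clock() - start
        if elapsed < best then
            best = elapsed
        end
    end
    return best
end

local http_range = "bytes=10-65535"
local string_re_p = "^bytes=([%d]*)%-([%d]*)$"
local string_match = string.match

local t = best_of(3, 100000, function()
    return string_match(http_range, string_re_p)
end)
print(string.format("string.match: %.3f s per 100000 calls", t))
```

Wrapping the call in a closure adds a small fixed overhead per iteration, so figures from this harness are not directly comparable with the inlined loops used in the article.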
b) Long string and complex pattern
local http_range = 'dsfds65465fwef bytes=12345757860-4465458586465 ewfsd65sd4fg65fsd'
local string_re_p = '.*bytes=([%d]*)%-([%d]*) .+'
local ngx_re_p = '.*bytes=([\\d]*)?\\-([\\d]*) .+'
local loop = 1000000
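Result: 1.16 s (string.match) vs. 0.526 s (ngx.re.match)
The results show that ngx.re gains a performance advantage as the subject string and the pattern become more complex.
1.2. Adding JIT interference
a) Control group: ipairs does not break JIT (short string and pattern)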
local http_range = 'bytes=10-65535'
local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
local ngx_re_p = '^bytes=([\\d]*)?\\-([\\d]*)$'
local loop = 1000000
local t0 = get_t()
for i = 1, loop do
    for k, v in ipairs({1,2}) do end
    local _, _ = string_match(http_range, string_re_p)
end
local t1 = get_t()
for i = 1, loop do
    for k, v in ipairs({1,2}) do end
    local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
end
local t2 = get_t()
jit-on: 0.369 s (string.match) vs. 0.326 s (ngx.re.match)
jit-off: 0.38 s (string.match) vs. 3.265 s (ngx.re.match)
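The article does not show how the jit-on and jit-off runs were switched. One plausible way, assuming the switch was LuaJIT's own compiler rather than the PCRE "j" option, is the standard `jit` library. The sketch below uses `jit.status`, `jit.off` and `jit.on`. `jit_enabled` is a hypothetical helper, and the `type(jit)` guard is there only so that the snippet also runs on plain Lua.

```lua
-- Hypothetical helper. Returns true when running on LuaJIT with the
-- compiler enabled, false when it is disabled, and nil on plain Lua.
local function jit_enabled()
    if type(jit) ~= "table" then
        return nil
    end
    -- jit.status() returns the on/off flag followed by CPU/optimization
    -- flags; the parentheses keep only the first value.
    return (jit.status())
end

local was_on = jit_enabled()
print("JIT at start:", was_on)

if was_on then
    jit.off()  -- disable the JIT compiler for the whole VM
    print("JIT after jit.off():", jit_enabled())
    -- ... run the benchmark loops here to get the jit-off figures ...
    jit.on()   -- restore the original state
    print("JIT after jit.on():", jit_enabled())
end
```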
b) pairs breaks JIT (short string and pattern)
local http_range = 'bytes=10-65535'
local string_re_p = '^bytes=([%d]*)%-([%d]*)$'
local ngx_re_p = '^bytes=([\\d]*)?\\-([\\d]*)$'
local loop = 1000000
local t0 = get_t()
for i = 1, loop do
    for k, v in pairs({a=1,b=2}) do end
    local _, _ = string_match(http_range, string_re_p)
end
local t1 = get_t()
for i = 1, loop do
    for k, v in pairs({a=1,b=2}) do end
    local m, err = ngx_re_match(http_range, ngx_re_p, "jo")
end
local t2 = get_t()
jit-off: 0.395 s (string.match) vs. 3.216 s (ngx.re.match)
jit-on: 0.394 s (string.match) vs. 1.04 s (ngx.re.match)
c) pairs + long, complex string
local http_range = 'dsfds65465fwef bytes=12345757860-4465458586465 ewfsd65sd4fg65fsd'
local string_re_p = '.*bytes=([%d]*)%-([%d]*) .+'
local ngx_re_p = '.*bytes=([\\d]*)?\\-([\\d]*) .+'
local loop = 1000000
jit-on: 1.31 s (string.match) vs. 1.30 s (ngx.re.match)
jit-off: 1.307 s (string.match) vs. 2.94 s (ngx.re.match)
Very long string + jit-on:
local http_range = 'dsfds6546vsdvsdfdsfsdfsdfwaasdasdasdas5fwef bytes=12354345345345757860-4465453453453453453453453458586465 ewfsd65safdknsalk;nlkasdnflksdajfhkldashjnfkl;ashfgjklahfg;jlsasd4fg65fsd'
Result: 2.775 s (string.match) vs. 1.739 s (ngx.re.match)
1.3 Summary of test results
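| Test case | string.match (s) | ngx.re.match (s) | Notes |
|---|---|---|---|
| Short string and pattern | 0.247 | 0.32 | jit-hit |
| Short string and pattern, with ipairs | 0.369 | 0.326 | jit-hit |
| Short string and pattern, with pairs | 0.394 | 1.04 | |
| Very long string and pattern, with pairs | 2.775 | 1.739 | |
| Short string and pattern, with pairs, jit-off | 0.395 | 3.216 | jit-off |
| Short string and pattern, with ipairs, jit-off | 0.38 | 3.265 | jit-off |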
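2. Conclusions
The test results show:
1) In general, the ngx.re regex library copes better with complex strings and complex patterns, so it is the recommended default.
2) For very simple strings the two differ little, with string patterns slightly ahead, so use whichever is more convenient to write.
3) ngx.re is affected more by JIT: with JIT turned off, or with JIT-breaking constructs such as pairs, it can become a drag on performance.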
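The conclusions above can be sketched as two small helpers. Both names (`parse_range`, `find_range`) are hypothetical, and the unanchored pattern in `find_range` is a simplification of the `.*bytes=... .+` pattern used in the tests. The fallback branch exists only so that the sketch also runs outside OpenResty.

```lua
-- Conclusion 2: short, anchored input, so a Lua pattern is enough.
local function parse_range(header)
    local first, last = string.match(header, "^bytes=(%d*)%-(%d*)$")
    if not first then
        return nil
    end
    -- tonumber("") is nil, so an open-ended side such as "bytes=-500"
    -- yields first = nil, last = 500.
    return { first = tonumber(first), last = tonumber(last) }
end

-- Conclusion 1: long or noisy input, so prefer ngx.re when available.
local function find_range(text)
    local first, last
    if ngx and ngx.re then
        local m = ngx.re.match(text, [[bytes=(\d*)-(\d*)]], "jo")
        if m then
            first, last = m[1], m[2]
        end
    else
        -- Fallback (assumption of this sketch): plain Lua pattern.
        first, last = string.match(text, "bytes=(%d*)%-(%d*)")
    end
    if not first then
        return nil
    end
    return { first = tonumber(first), last = tonumber(last) }
end

local r = parse_range("bytes=10-65535")
print(r.first, r.last)
```

Per conclusion 3, code paths that run with JIT disabled or inside loops that abort JIT traces may be better served by `string.match` even for longer input. Measuring in the target environment is the safer guide.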