If you need really fast IO, you have to resort to a few more tricks than usual.
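-module(g).
-export([s/0]).

%% Entry point: read stdin through a port, print the average, then stop the VM.
s() ->
    %% fd 0 is stdin; {line, 256} delivers the input line by line as binaries.
    P = open_port({fd, 0, 1}, [in, binary, {line, 256}]),
    r(P, 0, 0),
    halt().

%% C counts header lines, L sums the lengths of all other lines.
r(P, C, L) ->
    receive
        %% Header line (starts with $>): one more sequence.
        {P, {data, {eol, <<$>:8, _/binary>>}}} ->
            r(P, C+1, L);
        %% Sequence line: add its length in bytes.
        {P, {data, {eol, Line}}} ->
            r(P, C, L + size(Line));
        %% Port closed at end of input: print the average length.
        {'EXIT', P, normal} ->
            io:format("~p~n",[L/C])
    end.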
As far as I know this is the fastest IO available, but note that it has to be run with -noshell -noinput.
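The counting logic of r/3 can be checked on its own, without a port. Below is a minimal sketch, assuming the lines are already in memory as binaries without trailing newlines; the module name fasta_avg and the function avg_len/1 are hypothetical and not part of the g module above.

```erlang
-module(fasta_avg).
-export([avg_len/1]).

%% Hypothetical helper, not part of the g module.
%% Lines is a list of binaries without trailing newlines, i.e. the same
%% shape the port delivers inside {eol, Line} tuples.
%% Like r/3, this fails with badarith if there is no header line at all.
avg_len(Lines) ->
    {Count, Total} = lists:foldl(fun step/2, {0, 0}, Lines),
    Total / Count.

%% A line starting with $> is a FASTA header: count one more sequence.
step(<<$>, _/binary>>, {Count, Total}) ->
    {Count + 1, Total};
%% Any other line is sequence data: add its length in bytes.
step(Line, {Count, Total}) ->
    {Count, Total + byte_size(Line)}.
```

For example, fasta_avg:avg_len([<<">a">>, <<"MKV">>, <<"LA">>, <<">b">>, <<"MKVLA">>]) sees two headers and 10 bytes of sequence data, so it returns 5.0.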
Compile as you would with erlc +native +"{hipe, [o3]}" g.erl, but with -smp disable:
erl -smp disable -noinput -mode minimal -boot start_clean -s erl_compile compile_cmdline @cwd /home/hynek/Download @option native @option '{hipe, [o3]}' @files g.erl
and run:
time erl -smp disable -noshell -mode minimal -boot start_clean -noinput -s g s < uniprot_sprot.fasta
352.6697028442464
real 0m3.241s
user 0m3.060s
sys 0m0.124s
With -smp enable, still native, it takes:
$ erlc +native +"{hipe, [o3]}" g.erl
$ time erl -noshell -mode minimal -boot start_clean -noinput -s g s<uniprot_sprot.fasta
352.6697028442464
real 0m5.103s
user 0m4.944s
sys 0m0.112s
Byte code, but with -smp disable (almost on par with native, because most of the work is done in the port!):
$ erlc g.erl
$ time erl -smp disable -noshell -mode minimal -boot start_clean -noinput -s g s<uniprot_sprot.fasta
352.6697028442464
real 0m3.565s
user 0m3.436s
sys 0m0.104s
Just for completeness, byte code with SMP:
$ time erl -noshell -mode minimal -boot start_clean -noinput -s g s<uniprot_sprot.fasta
352.6697028442464
real 0m5.433s
user 0m5.236s
sys 0m0.128s
For comparison, the version by sarnold (https://stackoverflow.com/users/377270/sarnold) at https://stackoverflow.com/questions/3296855/average-length-of-the-sequences-in-a-fasta-file-can-you-improve-this-erlang-co/3296952#3296952 gives me a wrong answer and takes considerably longer on the same hardware:
$ erl -smp disable -noinput -mode minimal -boot start_clean -s erl_compile compile_cmdline @cwd /home/hynek/Download @option native @option '{hipe, [o3]}' @files golf.erl
./golf.erl:5: Warning: variable 'Rest' is unused
$ time erl -smp disable -noshell -mode minimal -s golf test
359.04679841439776
real 0m17.569s
user 0m16.749s
sys 0m0.664s
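EDIT: I have looked at the characteristics of uniprot_sprot.fasta and I am a little surprised. It has 3824397 lines and is 232 MB. That means the native -smp disable version (3.241 s) can process 1.18 million text lines per second (71 MB/s with line-oriented IO).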
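The throughput figures follow directly from the file size and the wall-clock time of the native -smp disable run. A small sketch of that arithmetic; the module name throughput and the function rates/3 are hypothetical, introduced only for illustration:

```erlang
-module(throughput).
-export([rates/3]).

%% Hypothetical helper: given the number of lines, the size in MB and the
%% elapsed wall-clock seconds of one run, return
%% {LinesPerSecond, MegaBytesPerSecond}.
rates(Lines, MegaBytes, Seconds) ->
    {Lines / Seconds, MegaBytes / Seconds}.
```

Calling throughput:rates(3824397, 232, 3.241) gives roughly 1.18 million lines per second and roughly 71.6 MB/s, matching the figures quoted above.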