IF:
- you have GNU grep
- AND the hex bytes you search for NEVER contain newlines (0xa) [1]
- If they contain NUL (0x0), you must pass the search string to grep via a file (-f) rather than as a direct argument.
The following command will get you there, using the example of searching for 0e 8b 02:
LC_ALL=C find . -type f -not -name "*.png" -exec grep -FHoab $'\x0e\x8b\x02' {} + |
LC_ALL=C cut -d: -f1-2
The grep command produces output lines of the form
<filename>:<byte-offset>:<matched-bytes>
which LC_ALL=C cut -d: -f1-2 then reduces to <filename>:<byte-offset>.
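As a quick sanity check, the pipeline above can be tried on a throwaway file whose contents are known. The following is a minimal sketch, assuming GNU grep; the scratch directory, the file name sample.bin, and the variable names are made up for illustration, and the pattern is built with printf and octal escapes (016 213 002 are the bytes 0e 8b 02) so that the snippet also runs in shells without ANSI C quoting.

```shell
# Scratch directory with one small binary file (hypothetical names).
tmpdir=$(mktemp -d)
# Bytes: 'A' 'B' 0e 8b 02 'C' 'D' -> the sequence starts at byte offset 2.
printf 'AB\016\213\002CD' > "$tmpdir/sample.bin"

# The three search bytes 0e 8b 02, written as octal escapes.
pattern=$(printf '\016\213\002')

# Same pipeline as above, run inside the scratch directory.
result=$(cd "$tmpdir" && LC_ALL=C find . -type f -not -name "*.png" \
  -exec grep -FHoab "$pattern" {} + | LC_ALL=C cut -d: -f1-2)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With GNU grep this should print ./sample.bin:2, that is, the file name followed by the zero-based offset of the first matched byte.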
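The command almost works with BSD grep, except that the reported byte offset is always that of the start of the line on which the pattern matched.
In other words: the byte offset is only correct if no newlines precede the match in the file.
Also, BSD grep doesn't support NUL (0x0) bytes as part of the search string, not even when the string is supplied via a file with -f.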
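The difference between the two kinds of offset can be observed with GNU grep alone: dropping -o makes -b report the start of the matching line, which is the kind of value BSD grep is described as always reporting. A minimal sketch with a made-up temporary file that has a newline before the match (variable names are illustrative only):

```shell
tmpfile=$(mktemp)
# Bytes: 'A' 'B' newline 'C' 'D' 0e 8b 02
# -> the second line starts at offset 3, the match itself at offset 5.
printf 'AB\nCD\016\213\002' > "$tmpfile"
pattern=$(printf '\016\213\002')

# Without -o: -b reports the offset of the start of the matching line.
line_offset=$(LC_ALL=C grep -Fab "$pattern" "$tmpfile" | LC_ALL=C cut -d: -f1)
# With -o: -b reports the offset of the match itself.
match_offset=$(LC_ALL=C grep -Foab "$pattern" "$tmpfile" | LC_ALL=C cut -d: -f1)

printf 'line start: %s, match: %s\n' "$line_offset" "$match_offset"
rm -f "$tmpfile"
```

The two numbers only coincide when the match sits at the very start of its line, which is why the line-start offset is rarely useful for binary files.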
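- Note that there is no parallel processing, but there are only a few grep invocations, thanks to find's -exec ... +, which, like xargs, passes as many filenames as will fit on a command line to grep at once.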
- By letting grep search for the byte sequence directly, there is no need for xxd:
- The sequence is specified as an ANSI C-quoted string (http://www.gnu.org/software/bash/manual/bash.html#ANSI_002dC-Quoting), which means that the shell expands the escape sequences to literal bytes, so that grep can then search for the resulting string as a literal (via -F), which is faster.
The linked documentation is from the bash manual, but ANSI C-quoted strings work in zsh (and ksh) too.
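- A GNU grep alternative is to use -P (support for PCRE, Perl-compatible regular expressions) with escape sequences that are not pre-expanded by the shell, but this will be slower: grep -PHoab '\x0e\x8b\x02'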
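- LC_ALL=C ensures that grep treats each byte as its own character, without applying any encoding rules.
- -F treats the search string as a literal (rather than as a regex)
- -H prepends the relevant input filename to each output line; note that grep does this implicitly when given more than one filename argument
- -o reports only the matched strings (byte sequences), not the whole line (the concept of a line has no meaning in binary files anyway) [2]
- -a treats binary files as if they were text files (without this, grep would only print the text Binary file <filename> matches for binary input files that contain matches)
- -b reports the byte offset of matches
If finding at most one match per input file is sufficient, add -m 1.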
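For search sequences that contain NUL, as mentioned in the prerequisites, the search string has to come from a file. A minimal sketch, assuming GNU grep; the file names data.bin and pattern.bin and the variable names are made up for illustration, and the pattern file holds the raw bytes 00 01 02 followed by a newline:

```shell
tmpdir=$(mktemp -d)
# Bytes: 'A' 'B' 00 01 02 'C' 'D' -> the sequence starts at byte offset 2.
printf 'AB\000\001\002CD' > "$tmpdir/data.bin"
# Pattern file: the raw search bytes, terminated by a newline.
printf '\000\001\002\n' > "$tmpdir/pattern.bin"

# -f reads the search string from the file; cut drops the matched bytes,
# so no NUL ends up in the captured output.
nul_offset=$(LC_ALL=C grep -Foab -f "$tmpdir/pattern.bin" "$tmpdir/data.bin" |
  LC_ALL=C cut -d: -f1)

printf '%s\n' "$nul_offset"
rm -rf "$tmpdir"
```

A file is needed here because a command-line argument cannot carry a NUL byte: the shell passes arguments as NUL-terminated strings, so anything after the first NUL would be lost.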
[1] Newlines cannot be used, because grep invariably treats newlines in a search-pattern string as separating multiple search patterns. Also, grep is line-based, so you can't match across lines; GNU grep's --null-data option to split the input by NUL bytes could help, but only if your search byte sequence doesn't also contain NUL bytes; you'd also have to represent your byte values as escape sequences in a regex combined with -P, because you'll need to use the escape sequence \n in lieu of actual newlines.
[2] -o is needed to make -b report the byte offset of the match as opposed to that of the beginning of the line (as stated, BSD grep always does the latter, unfortunately); additionally, it is beneficial to report only the matches themselves here, as an attempt to print the entire line would result in unpredictably long output lines, given that there's no concept of lines in binary files; either way, however, outputting bytes from a binary file may cause strange rendering behavior in the terminal.