如何使用 PHP 查找字符串中字符的序列模式?

2024-05-14

假设我有随机的文本块:

EAMoAAQAABwEBAAAAAAAAAAAAAAABAgMFBgcIBAkBAQABBQEBAAAAAAAAAAAAAAAGAgMEBQcBCBAAAQMDAgMEBQcIBQgGCwEAAQACAxEEBSEGMRIHQVFhE3GBIhQIkaGxwTJCI9FScoKSojMV8GLCUxbhstKDo7M0ZHOTJEQlF/HiQ2PDVHSExEUmGBEBAAIBAgMDCAgCCgMBAQEAAAECAxEEITEFQRIGUWFxgZGhIhPwscHRMlIUB0Jy4fGCkqLCI1MVFrLSQ2IzF//aAAwDAQACEQMRAD8A7+QEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEEDwXkzpxHgusxi7NrnXF3G0NBLhzAkAeAqVH934r6bt57uTPSJ8ne1n2Rqycezy35VlRttwYu5DXNlLOcczOdpHM3hUUqtLs/wBxulZonXJ8vjp8caa+eOa5k6flrPLVcIbm3n/gytf4NcCVKtj1XbbqNcOSuT+W0W+pi3x2rzjRWWxUCAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAggV5It2Uy8GNYAWmW6kr5MDftO8T3BRXxR4s2/SccTb48lvw0jnPnn8tfP6o1Ze02ds08OERzlid+/P5Orp5BHEeFuxxa0Dxpx9a+fOu+Iup9Tmfm30p+Ss92vr/N6bat/t67fDyjWfLLG79pt45YpAA8NdUAg9ngolTFNbedtqWi0avVicv5bLKFr2kSRltHaahrXCnylZcd6k208rDy4ItxlkUr5+XnZE1zxq0h3KfUQqv1GWsxeI0tHKY1rPtjRgVivKZU7HebrS491ybX+TWnO7V7PEn7w+f0rpPhb9zdxt7Rj3szkx/n/AI6+n88f4vTyebno8Wr3qTGvun7mawSxzsbNC4Pje0Oa9pqCD2grv+3z0zUi9Ji1bRrEx2wjtqzWdJ5wqq8pEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQU

规格:

patternABC >= 2 characters = groupABC IF groupABC occurs more than once
groupABC + (groupABC)n = sequence WHERE n >= 1 AND sequence > 6 characters

** 序列需要大于 6 个字符才能进行评估

分解:

如何找到按顺序出现的重复模式?

QEBAQEBAQEBAQEBAQEBAQEBA

我还想计算每组重复的次数:

QEBA QEBA QEBA QEBA QEBA QEBA = 6

此外,序列必须大于 6 个字符才能进行计算:

NO GOOD: AA AA AA
GOOD: AA AA AA AA

如果输出可以存储在关联数组中并删除重复条目,那就太理想了:

QEBA => 6, AA => 4, QEBA => 3, AA => 8, (QEBA => 6)<- REMOVE

有人有时间和意愿来解决这个问题吗? 如果你这样做,你就会摇滚!


$str = 'EAMoAAQAABwEBAAAAAAAAAAAAAAABAgMFBgcIBAkBAQABBQEBAAAAAAAAAAAAAAAGAgMEBQcBCBAAAQMDAgMEBQcIBQgGCwEAAQACAxEEBSEGMRIHQVFhE3GBIhQIkaGxwTJCI9FScoKSojMV8GLCUxbhstKDo7M0ZHOTJEQlF/HiQ2PDVHSExEUmGBEBAAIBAgMDCAgCCgMBAQEAAAECAxEEITEFQRIGUWFxgZGhIhPwscHRMlIUB0Jy4fGCkqLCI1MVFrLSQ2IzF//aAAwDAQACEQMRAD8A7+QEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEEDwXkzpxHgusxi7NrnXF3G0NBLhzAkAeAqVH934r6bt57uTPSJ8ne1n2Rqycezy35VlRttwYu5DXNlLOcczOdpHM3hUUqtLs/wBxulZonXJ8vjp8caa+eOa5k6flrPLVcIbm3n/gytf4NcCVKtj1XbbqNcOSuT+W0W+pi3x2rzjRWWxUCAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAggV5It2Uy8GNYAWmW6kr5MDftO8T3BRXxR4s2/SccTb48lvw0jnPnn8tfP6o1Ze02ds08OERzlid+/P5Orp5BHEeFuxxa0Dxpx9a+fOu+Iup9Tmfm30p+Ss92vr/N6bat/t67fDyjWfLLG79pt45YpAA8NdUAg9ngolTFNbedtqWi0avVicv5bLKFr2kSRltHaahrXCnylZcd6k208rDy4ItxlkUr5+XnZE1zxq0h3KfUQqv1GWsxeI0tHKY1rPtjRgVivKZU7HebrS491ybX+TWnO7V7PEn7w+f0rpPhb9zdxt7Rj3szkx/n/AI6+n88f4vTyebno8Wr3qTGvun7mawSxzsbNC4Pje0Oa9pqCD2grv+3z0zUi9Ji1bRrEx2wjtqzWdJ5wqq8pEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQU';

preg_match_all( '/(\S{2,}?)\1+/', $str, $matches );

// Remove duplicates
$matches[0] = array_unique( $matches[0] ); 

foreach ( $matches[0] as $key => $value ) {
    if ( strlen( $value ) > 6 ) {
        $repeated = $matches[1][$key];
        $results[] = array( $repeated => count( explode( $repeated, $value ) ) - 1 );
    }    
}

print_r($results); 

/*
[AA] => 7
[QEBA] => 93
[CAgI] => 18
[EBAQ] => 18
*/

上面假设序列由非空格字符组成。

本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

如何使用 PHP 查找字符串中字符的序列模式? 的相关文章

随机推荐