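Hoping you can help me with something. Thanks to @mklement0, I've got a great script that matches the most basic, initial pattern for words in alphabetical order. What's missing, however, is a full-text search and selection, i.e. matching a pattern anywhere in a word, not just at its start.
Here is an example of the current script applied to a small sample of words from the Words.txt file: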
App
Apple
Apply
Sword
Swords
Word
Words
Becomes:
App
Sword
Word
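This is great, because it does narrow the list down to one base pattern per line! However, the result of the line-by-line run still contains a pattern that can be narrowed down further, namely "Word" (capitalization doesn't matter), so ideally the output should be: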
App
Word
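"Sword" is removed because it falls under the more basic pattern "Word", which it contains.
Do you have any suggestions on how to achieve this? Keep in mind that this will be a dictionary list of about 250k words, so I won't know in advance what I'm looking for.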
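The rule described here boils down to a case-insensitive substring test: a word is dropped when a shorter word that has already been kept occurs anywhere inside it, not only at its start. A minimal sketch of that check, using `String.IndexOf` with `StringComparison.OrdinalIgnoreCase`; the helper name `Test-ContainsWord` is hypothetical and not part of the original script:

```powershell
# Hypothetical helper: returns $true if $Word contains $Candidate anywhere,
# ignoring case (ordinal comparison, so no culture-specific rules apply).
function Test-ContainsWord {
    param([string] $Word, [string] $Candidate)
    $Word.IndexOf($Candidate, [StringComparison]::OrdinalIgnoreCase) -ge 0
}

Test-ContainsWord -Word 'Sword' -Candidate 'Word'   # True: "Sword" is covered by "Word"
Test-ContainsWord -Word 'App'   -Candidate 'Word'   # False: "App" is kept
```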
Code (from the related post https://stackoverflow.com/a/61305240/45375, handles prefix-only matching):
$outFile = [IO.File]::CreateText("C:\Temp\Results.txt") # Output File Location
try {
  $prefix = '' # initialize the prefix pattern
  foreach ($line in [IO.File]::ReadLines('C:\Temp\Words.txt')) # Input File name.
  {
    if ($line -like $prefix)
    {
      continue # same prefix, skip
    }
    $line # Visual output of new unique prefix
    $prefix = "$line*" # Saves new prefix pattern
    $outFile.WriteLine($line) # Output file write to configured location
  }
}
finally {
  $outFile.Dispose() # Flush buffered output and close the file, even on error
}
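To try the prefix step without touching the file system, the same loop can be wrapped in a function that takes an array of already sorted words. A minimal sketch, assuming the input is sorted alphabetically; the function name `Get-UniquePrefixWord` is hypothetical, and `[WildcardPattern]::Escape()` is an addition so that words containing `*`, `?`, `[` or `]` are treated literally by `-like` (which, like the original, compares case-insensitively):

```powershell
# Hypothetical helper: keeps only words that do not start with the most
# recently kept word. Assumes $Words is sorted alphabetically.
function Get-UniquePrefixWord {
    param([string[]] $Words)
    $prefix = ''   # wildcard pattern for the most recently kept word
    foreach ($word in $Words) {
        if ($word -like $prefix) { continue }   # same prefix (case-insensitive): skip
        $word                                   # new unique prefix: output it
        # Escape wildcard metacharacters so the word is matched literally.
        $prefix = [WildcardPattern]::Escape($word) + '*'
    }
}

Get-UniquePrefixWord -Words 'App', 'Apple', 'Apply', 'Sword', 'Swords', 'Word', 'Words'
# → App, Sword, Word (one per line)
```

This reproduces the "Becomes:" output from the question, and shows why "Sword" survives: it does not start with any earlier kept word.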
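You could try a two-step approach: first reduce the sorted list to its unique prefix words as before, then sort those by length and drop every word that already contains a shorter, previously kept word as a substring.
You'll have to see whether the performance is good enough; for best performance, .NET types are used directly wherever possible.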
# Read the input file and build the list of unique prefixes, assuming
# alphabetical sorting.
$inFilePath = 'C:\Temp\Words.txt' # Be sure to use a full path.
$prefix = '' # Initialize the prefix pattern.
# @(...) guarantees an array, even if only one word is collected.
$uniquePrefixWords = @(
  foreach ($word in [IO.File]::ReadLines($inFilePath)) {
    if ($word -like $prefix) { continue }
    $word
    # Escape wildcard metacharacters (*, ?, [, ]) so the word is matched literally.
    $prefix = [WildcardPattern]::Escape($word) + '*'
  }
)
# Sort the prefixes by length in ascending order (shorter ones first).
# Note: This is a more time- and space-efficient alternative to:
#   $uniquePrefixWords = $uniquePrefixWords | Sort-Object -Property Length
[Array]::Sort([int[]] ($uniquePrefixWords.ForEach('Length')), $uniquePrefixWords)
# Build the result list of unique shortest words with the help of a regex.
# Skip later - and therefore longer - words if they already contain a word
# from the result list as a substring.
$resultWords = [System.Collections.Generic.List[string]]::new()
$regexUniqueWords = ''; $first = $true
foreach ($word in $uniquePrefixWords) {
  if ($first -or $word -notmatch $regexUniqueWords) {
    # New unique word found: keep it and add it to the regex as an
    # alternation (|), escaped so that regex metacharacters match literally.
    $resultWords.Add($word)
    if (-not $first) { $regexUniqueWords += '|' }
    $regexUniqueWords += [regex]::Escape($word)
    $first = $false
  }
}
# Sort the result words alphabetically again...
$resultWords.Sort()
# ... and write them to the output file.
$outFilePath = 'C:\Temp\Results.txt' # Be sure to use a full path.
[IO.File]::WriteAllLines($outFilePath, $resultWords)
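For a quick check against the sample from the question, both steps can be combined in memory. This sketch is a variation, not the answer's exact code: instead of growing one regex alternation, it keeps the retained words in a `List[string]` and tests each candidate with an ordinal, case-insensitive `IndexOf`, which sidesteps regex escaping entirely. The function name `Get-BaseWord` is hypothetical, `::new()` assumes PowerShell 5.0 or later, and the cost is still roughly (number of prefix words) × (number of kept words), so timing on the full 250k-word list would need to be measured.

```powershell
# Hypothetical helper: reduces an alphabetically sorted word list to its
# "base" words, i.e. words that contain no other kept word (case-insensitive).
function Get-BaseWord {
    param([string[]] $Words)

    # Step 1: drop words that start with the previously kept word.
    $prefix = ''
    $prefixWords = @(
        foreach ($word in $Words) {
            if ($word -like $prefix) { continue }
            $word
            $prefix = [WildcardPattern]::Escape($word) + '*'
        }
    )

    # Step 2: process shortest words first, so a word that could cover
    # another one is always kept before the longer word is examined.
    $kept = [System.Collections.Generic.List[string]]::new()
    foreach ($word in @($prefixWords | Sort-Object -Property Length)) {
        $covered = $false
        foreach ($base in $kept) {
            if ($word.IndexOf($base, [StringComparison]::OrdinalIgnoreCase) -ge 0) {
                $covered = $true
                break   # leaves only the inner loop
            }
        }
        if (-not $covered) { $kept.Add($word) }
    }

    # Restore alphabetical order for the output.
    $kept.Sort([StringComparer]::OrdinalIgnoreCase)
    $kept
}

Get-BaseWord -Words 'App', 'Apple', 'Apply', 'Sword', 'Swords', 'Word', 'Words'
# → App, Word (one per line)
```

To run it on the real file, the word array could come from `[IO.File]::ReadAllLines('C:\Temp\Words.txt')`, at the cost of holding the whole list in memory.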