Before you do a string replacement without considering word boundaries, have a look at this "clbuttic" (or cl[Censored]ic, in your case) article:
http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredible-intercoursing-bad-idea.html
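To make the clbuttic problem concrete, here is a minimal sketch (the sample sentence is my own) contrasting a plain `string.Replace` with a `Regex.Replace` anchored on `\b` word boundaries:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "the neighbour is playing classical music";

// Naive substring replacement: the "ass" inside "classical" is replaced too.
string naive = input.Replace("ass", "[Censored]");

// Word-boundary match: \b only matches between a word character and a
// non-word character, so "ass" embedded in "classical" is left alone.
string bounded = Regex.Replace(input, @"\bass\b", "[Censored]");

Console.WriteLine(naive);   // the neighbour is playing cl[Censored]ical music
Console.WriteLine(bounded); // the neighbour is playing classical music
```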
Update
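Obviously this isn't foolproof (see the article above: this approach is easy to get around, and it can still produce false positives...) or optimised (the regexes should be cached and compiled), but the following will filter out whole words (no "clbuttics") and simple plurals of words: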
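using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

const string CensoredText = "[Censored]";
// Match the whole word plus an optional trailing "s" (simple plural), only at word boundaries.
const string PatternTemplate = @"\b({0})(s?)\b";
const RegexOptions Options = RegexOptions.IgnoreCase;
string[] badWords = new[] { "cranberrying", "chuffing", "ass" };
// One Regex per bad word (built on each enumeration, neither cached nor compiled).
IEnumerable<Regex> badWordMatchers = badWords.
    Select(x => new Regex(string.Format(PatternTemplate, x), Options));
string input = "I've had no cranberrying sleep for chuffing chuffings days - the next door neighbour is playing classical music at full tilt!";
// Feed the result of each matcher into the next one.
string output = badWordMatchers.
    Aggregate(input, (current, matcher) => matcher.Replace(current, CensoredText));
Console.WriteLine(output);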
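Gives the output:
I've had no [Censored] sleep for [Censored] [Censored] days - the next door neighbour is playing classical music at full tilt!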
Note that "classical" doesn't become "cl[Censored]ical", because the regex only matches whole words.
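On the optimisation point: rather than one `Regex` per word, a sketch along these lines builds a single alternation pattern once, escapes each word with `Regex.Escape` (in case a word contains regex metacharacters), and passes `RegexOptions.Compiled`. Treat it as one possible shape, assuming the word list is known up front; for this input it should produce the same output as above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] badWords = { "cranberrying", "chuffing", "ass" };

// Build ONE pattern for all words: \b(?:cranberrying|chuffing|ass)s?\b
string alternation = string.Join("|", badWords.Select(w => Regex.Escape(w)));

// Construct once and reuse; Compiled trades start-up cost for faster matching.
Regex badWordMatcher = new Regex(
    $@"\b(?:{alternation})s?\b",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);

string input = "I've had no cranberrying sleep for chuffing chuffings days - the next door neighbour is playing classical music at full tilt!";
string output = badWordMatcher.Replace(input, "[Censored]");
Console.WriteLine(output);
```

Compiling only pays off if the matcher is built once and reused (for example, held in a static field) across many inputs.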
Update 2
To demonstrate how easily this (and basic string/pattern-matching techniques in general) can be broken, take the following string:
"I've had no cranberryıng sleep for chuffıng chuffıngs days - the next door neighbour is playing classical music at full tilt!"
I've replaced the "i"s with the Turkish lowercase dotless "ı". Still looks pretty offensive!
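A minimal sketch of the bypass, plus a hypothetical mitigation: `Fold` and its `lookalikes` table are illustrative names of my own, not a library API, and the table covers only a handful of characters. Folding look-alikes before matching narrows the gap but cannot close it, since Unicode has far more confusable characters (see the Unicode confusables data in UTS #39) than any hand-rolled table will cover.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

Regex matcher = new Regex(@"\bchuffing(s?)\b",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

string dotted = "chuffing days";
string dotless = "chuff\u0131ng days"; // U+0131 LATIN SMALL LETTER DOTLESS I

Console.WriteLine(matcher.IsMatch(dotted));  // True
Console.WriteLine(matcher.IsMatch(dotless)); // False: case-insensitive matching does not treat 'ı' as 'i'

// Hypothetical mitigation: map a few known look-alikes to ASCII before matching.
// Illustrative only; this table is nowhere near complete.
var lookalikes = new Dictionary<char, char>
{
    ['\u0131'] = 'i', // Latin small letter dotless i
    ['\u0456'] = 'i', // Cyrillic small letter Byelorussian-Ukrainian i
    ['\u0430'] = 'a', // Cyrillic small letter a
};

string Fold(string s)
{
    var sb = new StringBuilder(s.Length);
    foreach (char c in s)
        sb.Append(lookalikes.TryGetValue(c, out char plain) ? plain : c);
    return sb.ToString();
}

Console.WriteLine(matcher.IsMatch(Fold(dotless))); // True
```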