How do I remove duplicate/repeated words (both consecutive and non-consecutive) in Java using regex?
How do I convert this:
Hello to everyone hello in this world world
into this:
Hello to everyone in this world
I did find a regex that finds non-consecutive repeated words:
regex: (?s)(\b\w+\b)(?=.*\b\1\b)
So how can I use this regex to remove the repeated words (keeping only the first occurrence of each)?
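For reference, here is a minimal sketch of what happens when this lookahead pattern is passed straight to `replaceAll`. The class and method names are hypothetical, and the added `(?i)` flag and trailing `\s*` are assumptions made for this example. Because the lookahead only matches a word when another copy follows it, every occurrence except the last is removed, so the last occurrence survives rather than the first.

```java
class LookaheadDemo {
    // The question's pattern, plus (?i) and a trailing \s* (both assumptions):
    // match a word only if the same word appears again later in the text.
    static String removeEarlierCopies(String text) {
        return text.replaceAll("(?is)\\b(\\w+)\\b\\s*(?=.*\\b\\1\\b)", "");
    }

    public static void main(String[] args) {
        String text = "Hello to everyone hello in this world world";
        // prints: to everyone hello in this world
        System.out.println(removeEarlierCopies(text));
    }
}
```

Keeping the first occurrence therefore needs a pattern that matches the later copy instead, which is what the answers below do.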
Posted on 2020-11-19 16:07:20
Try:
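import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "Hello to everyone hello in this world world \\ how do I convert this into";
// Group 1 is a word; group 2 is the text up to a later, case-insensitive repeat of that word.
Pattern p = Pattern.compile("(?i)(\\b\\w+\\b)(.*?) \\b\\1\\b");
Matcher m = p.matcher(text);
// A single pass can leave repeats behind, so run again until nothing matches.
while (m.find()) {
    text = m.replaceAll("$1$2"); // drop the repeat and the space before it
    m = p.matcher(text);
}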
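To see why the loop is needed, here is a small self-contained sketch that wraps the same pattern in two helper methods (the class and method names are hypothetical). With the input `a b a b`, the first pass consumes `a b a`, so the second `b` is only compared against the text that follows it and survives until the next pass.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoopDedup {
    private static final Pattern DUP =
            Pattern.compile("(?i)(\\b\\w+\\b)(.*?) \\b\\1\\b");

    // One replaceAll pass: may leave repeats that were inside an earlier match.
    static String singlePass(String text) {
        return DUP.matcher(text).replaceAll("$1$2");
    }

    // Repeat the pass until the pattern no longer matches.
    static String removeDuplicates(String text) {
        Matcher m = DUP.matcher(text);
        while (m.find()) {
            text = m.replaceAll("$1$2");
            m = DUP.matcher(text);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(singlePass("a b a b"));       // prints: a b b
        System.out.println(removeDuplicates("a b a b")); // prints: a b
    }
}
```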
Posted on 2020-11-19 16:48:36
Here is a non-regex approach using streams, assuming the words are separated by single spaces:
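// Needs java.util.Arrays, java.util.HashSet, java.util.Set and java.util.stream.Collectors.
String original = "Hello to everyone hello in this world world";
Set<String> set = new HashSet<>();
// set.add returns false for a word already seen (ignoring case), so only first occurrences pass the filter.
String modified = Arrays.stream(original.split(" ")).filter(s -> set.add(s.toLowerCase())).collect(Collectors.joining(" "));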
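If the input may contain tabs or runs of spaces, the same idea can be sketched with a `\s+` split. The class and method names here are hypothetical, and normalizing all whitespace to single spaces in the output is an assumption of this variant, not something the original answer does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class StreamDedup {
    // Keeps the first occurrence of each word, compared case-insensitively.
    static String dedupe(String input) {
        Set<String> seen = new HashSet<>();
        return Arrays.stream(input.trim().split("\\s+"))
                // add() returns false when the lower-cased word was seen before
                .filter(w -> seen.add(w.toLowerCase(Locale.ROOT)))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // prints: Hello to everyone in this world
        System.out.println(dedupe("Hello  to everyone\thello in this world world"));
    }
}
```

The filter relies on a stateful lambda, so this sketch is only meant for sequential streams.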
Posted on 2020-11-19 16:58:27
Here is another option: apply replaceAll twice, using two different patterns.
I may be missing some subtleties, but this works for the string provided.
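String str = "how do do I remove how repeated words from this words sentence.";
// nonc: a token, any text, then a later copy of that token followed by whitespace or end of input.
String nonc = "(?i)(\\S+)(.*)(\\1(\\s|$))";
// conc: a token plus one whitespace character, immediately repeated.
String conc = "(?i)(\\S+\\s)(\\1)";
// Caveat: \S+ is not anchored to token boundaries, so other inputs can match on word fragments.
str = str.replaceAll(nonc,"$1$2").replaceAll(conc, "$1");
System.out.println(str);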
Prints:
how do I remove repeated words from this sentence.
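One subtlety worth knowing about: because `\S+` can match part of a word, the two patterns above turn `this is a test` into `this a tes`. The sketch below is one possible way to guard against that, using `(?<!\S)` lookbehinds so that only whole whitespace-delimited tokens are compared. The class name, method name and modified patterns are assumptions for illustration, not the answer's original code.

```java
class TwoPassDedup {
    // Non-consecutive repeats: a whole token, some text, then the same whole token.
    static final String NONC = "(?i)(?<!\\S)(\\S+)(\\s.*)(?<!\\S)\\1(\\s|$)";
    // Consecutive repeats: a whole token plus whitespace, immediately repeated.
    static final String CONC = "(?i)(?<!\\S)(\\S+\\s)\\1";

    static String dedupe(String s) {
        return s.replaceAll(NONC, "$1$2").replaceAll(CONC, "$1");
    }

    public static void main(String[] args) {
        // prints: how do I remove repeated words from this sentence.
        System.out.println(dedupe(
                "how do do I remove how repeated words from this words sentence."));
        // prints: this is a test
        System.out.println(dedupe("this is a test"));
    }
}
```

This still makes only one pass per pattern, so a word that occurs three or more times may survive more than once.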
https://stackoverflow.com/questions/64915185