How Can I Efficiently Remove Diacritical Marks from Unicode Strings in Java?
Diacritical marks such as tildes, circumflexes, umlauts, and carons modify the base letter they are attached to. For search and comparison it is often useful to strip these marks and work with the "simple" base characters instead.
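A common approach combines Unicode normalization with a regular expression. Normalizing to form NFD splits each precomposed character into a base character followed by its combining marks, and the regular expression then deletes the marks. The pattern below removes the Combining Diacritical Marks block (U+0300 to U+036F), modifier letters (Lm), modifier symbols (Sk), and the Hebrew accent and point range U+0591 to U+05C7: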
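import java.text.Normalizer;
import java.util.regex.Pattern;

public class DiacriticStripper {
    // Combining marks, modifier letters (Lm), modifier symbols (Sk), Hebrew accents and points.
    // Backslashes are doubled so that the regex engine, not the Java compiler, sees \p and \u.
    private static final Pattern DIACRITICS_PATTERN = Pattern.compile(
            "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}\\u0591-\\u05C7]+");

    public static String stripDiacritics(String input) {
        // NFD splits precomposed characters into a base character plus combining marks.
        String normalizedInput = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Delete every matched mark, leaving only the base characters.
        return DIACRITICS_PATTERN.matcher(normalizedInput).replaceAll("");
    }
}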
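For instance, given the input string "ńǹňñṅņṇṋṉ̈ɲƞᶇɳȵ", each accented n (ń, ǹ, ň, ñ, and so on) is reduced to a plain "n". The letters ɲ, ƞ, ᶇ, ɳ, and ȵ pass through unchanged, because they are distinct letters with no canonical decomposition into "n" plus a mark. Note also that the Sk category includes standalone ASCII characters such as ^ and `, so these are removed from the input as well.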
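To make the two steps visible, here is a minimal sketch that prints the code points of a string before and after NFD normalization and then strips the marks. The class name NfdDemo and the helper codePoints are illustrative names, not part of the original solution; the pattern is the same one used above.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class NfdDemo {
    // Same character class as in the article, with backslashes escaped for Java.
    private static final Pattern DIACRITICS = Pattern.compile(
            "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}\\u0591-\\u05C7]+");

    static String strip(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return DIACRITICS.matcher(decomposed).replaceAll("");
    }

    // Illustrative helper: renders a string as "U+XXXX U+XXXX ..." for inspection.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String word = "caf\u00E9"; // "café" with a precomposed é
        String nfd = Normalizer.normalize(word, Normalizer.Form.NFD);

        System.out.println(codePoints(word)); // U+0063 U+0061 U+0066 U+00E9
        System.out.println(codePoints(nfd));  // U+0063 U+0061 U+0066 U+0065 U+0301
        System.out.println(strip(word));      // cafe

        // U+00F8 (ø) has no canonical decomposition, so it survives stripping.
        System.out.println(codePoints(strip("\u00F8"))); // U+00F8
    }
}
```

The last line shows the limit of this technique: a letter whose stroke or hook is part of the letter itself, rather than a separate combining mark, is left as it is.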
This solution only handles characters that decompose into a base letter plus marks. Characters such as ø or ß have no such decomposition, so handling them requires an additional mapping step, as in this more general string simplification method:
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

public class StringSimplifier {
    private static final Pattern DIACRITICS_PATTERN = Pattern.compile(
            "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}\\u0591-\\u05C7]+");

    public static String simplify(String input) {
        String normalizedInput = Normalizer.normalize(input, Normalizer.Form.NFD);
        String diacriticStripped = DIACRITICS_PATTERN.matcher(normalizedInput).replaceAll("");
        // Replace additional non-diacritic special characters using a custom mapping
        // ...
        // Placeholder: until the mapping above is filled in, pass the stripped string through.
        String simplifiedString = diacriticStripped;
        // Locale.ROOT keeps lowercasing independent of the default locale.
        return simplifiedString.toLowerCase(Locale.ROOT);
    }
}
Because this method can also map characters that have no decomposition and lowercases the result, it simplifies strings more aggressively than stripping diacritics alone.
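The custom mapping step is left open above. One possible way to fill it in, shown here only as a sketch, is a lookup table applied after the marks have been stripped. The class name MappedSimplifier, the EXTRA table, and its handful of entries are assumptions chosen for illustration; a real application would need its own table for the characters it cares about.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

final class MappedSimplifier {
    private static final Pattern DIACRITICS = Pattern.compile(
            "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}\\u0591-\\u05C7]+");

    // Hypothetical mapping for letters that NFD cannot decompose.
    // Only BMP characters are covered, and the entries are examples, not a complete list.
    private static final Map<Character, String> EXTRA = new HashMap<>();
    static {
        EXTRA.put('\u00F8', "o");  // ø
        EXTRA.put('\u00D8', "O");  // Ø
        EXTRA.put('\u0142', "l");  // ł
        EXTRA.put('\u0141', "L");  // Ł
        EXTRA.put('\u0111', "d");  // đ
        EXTRA.put('\u00DF', "ss"); // ß
    }

    static String simplify(String input) {
        // Step 1: decompose and strip combining marks, as before.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        String stripped = DIACRITICS.matcher(decomposed).replaceAll("");

        // Step 2: replace the remaining special letters via the lookup table.
        StringBuilder out = new StringBuilder(stripped.length());
        for (int i = 0; i < stripped.length(); i++) {
            char c = stripped.charAt(i);
            String replacement = EXTRA.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }

        // Step 3: lowercase without depending on the default locale.
        return out.toString().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(simplify("Stra\u00DFe"));             // strasse
        System.out.println(simplify("\u0141\u00F3d\u017A"));     // lodz
        System.out.println(simplify("S\u00F8ren"));              // soren
    }
}
```

A plain map keeps the policy explicit: whether ß should become "ss" or ø should become "o" is a choice about the target language and use case, not something Unicode normalization decides.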