
java: splitting strings with java.util.StringTokenizer

news 2025/8/25 14:36:05


1 Introduction

The java.util package provides StringTokenizer, a utility class for splitting strings. The string utilities of common frameworks, such as Spring's StringUtils, often use it.

For example, here is a method from Spring's StringUtils:

public static String[] tokenizeToStringArray(@Nullable String str, String delimiters, boolean trimTokens, boolean ignoreEmptyTokens) {
    if (str == null) {
        return EMPTY_STRING_ARRAY;
    }
    StringTokenizer st = new StringTokenizer(str, delimiters);
    List<String> tokens = new ArrayList<>();
    while (st.hasMoreTokens()) {
        String token = st.nextToken();
        if (trimTokens) {
            token = token.trim();
        }
        if (!ignoreEmptyTokens || token.length() > 0) {
            tokens.add(token);
        }
    }
    return toStringArray(tokens);
}
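The following standalone sketch mirrors the method above without depending on Spring, so you can see the effect of the trimTokens and ignoreEmptyTokens flags. The class name TokenizeDemo is hypothetical, and an empty array stands in for Spring's EMPTY_STRING_ARRAY. StringTokenizer never returns empty tokens itself, so ignoreEmptyTokens only matters for tokens that become empty after trimming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the logic mirrors Spring's tokenizeToStringArray
public class TokenizeDemo {

    static String[] tokenizeToStringArray(String str, String delimiters,
                                          boolean trimTokens, boolean ignoreEmptyTokens) {
        if (str == null) {
            return new String[0]; // stands in for Spring's EMPTY_STRING_ARRAY
        }
        StringTokenizer st = new StringTokenizer(str, delimiters);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (trimTokens) {
                token = token.trim();
            }
            // A token such as " " becomes "" after trim(); drop it if requested
            if (!ignoreEmptyTokens || token.length() > 0) {
                tokens.add(token);
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // ",," collapses into one separator, so the raw tokens are "a", " ", "b", "c"
        String input = "a, ,b,,c";
        System.out.println(Arrays.toString(tokenizeToStringArray(input, ",", true, true)));   // [a, b, c]
        System.out.println(Arrays.toString(tokenizeToStringArray(input, ",", true, false)));  // [a, , b, c]
        System.out.println(Arrays.toString(tokenizeToStringArray(input, ",", false, false))); // [a,  , b, c]
    }
}
```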

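Another example comes from the Quartz scheduling framework. Its cron expression class, CronExpression, has a buildExpression method that parses cron expressions. A Quartz cron expression has six or seven whitespace-separated fields, and the seventh field (year) is optional. This method also uses StringTokenizer to split the expression string: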

protected void buildExpression(String expression) throws ParseException {
    this.expressionParsed = true;
    try {
        if (this.seconds == null) {
            this.seconds = new TreeSet();
        }
        if (this.minutes == null) {
            this.minutes = new TreeSet();
        }
        if (this.hours == null) {
            this.hours = new TreeSet();
        }
        if (this.daysOfMonth == null) {
            this.daysOfMonth = new TreeSet();
        }
        if (this.months == null) {
            this.months = new TreeSet();
        }
        if (this.daysOfWeek == null) {
            this.daysOfWeek = new TreeSet();
        }
        if (this.years == null) {
            this.years = new TreeSet();
        }
        int exprOn = 0;
        // Split the whole expression on spaces and tabs; at most 7 fields are read
        for (StringTokenizer exprsTok = new StringTokenizer(expression, " \t", false);
                exprsTok.hasMoreTokens() && exprOn <= 6; ++exprOn) {
            String expr = exprsTok.nextToken().trim();
            // indexOf(76) is indexOf('L'), indexOf(35) is indexOf('#')
            if (exprOn == 3 && expr.indexOf(76) != -1 && expr.length() > 1 && expr.contains(",")) {
                throw new ParseException("Support for specifying 'L' and 'LW' with other days of the month is not implemented", -1);
            }
            if (exprOn == 5 && expr.indexOf(76) != -1 && expr.length() > 1 && expr.contains(",")) {
                throw new ParseException("Support for specifying 'L' with other days of the week is not implemented", -1);
            }
            if (exprOn == 5 && expr.indexOf(35) != -1 && expr.indexOf(35, expr.indexOf(35) + 1) != -1) {
                throw new ParseException("Support for specifying multiple \"nth\" days is not implemented.", -1);
            }
            // Split each field on commas and store every value
            StringTokenizer vTok = new StringTokenizer(expr, ",");
            while (vTok.hasMoreTokens()) {
                String v = vTok.nextToken();
                this.storeExpressionVals(0, v, exprOn);
            }
        }
        if (exprOn <= 5) {
            // Fewer than 6 fields
            throw new ParseException("Unexpected end of expression.", expression.length());
        } else {
            if (exprOn <= 6) {
                // The optional year field was omitted: default it to "*"
                this.storeExpressionVals(0, "*", 6);
            }
            TreeSet<Integer> dow = this.getSet(5);
            TreeSet<Integer> dom = this.getSet(3);
            boolean dayOfMSpec = !dom.contains(NO_SPEC);
            boolean dayOfWSpec = !dow.contains(NO_SPEC);
            if ((!dayOfMSpec || dayOfWSpec) && (!dayOfWSpec || dayOfMSpec)) {
                throw new ParseException("Support for specifying both a day-of-week AND a day-of-month parameter is not implemented.", 0);
            }
        }
    } catch (ParseException var8) {
        throw var8;
    } catch (Exception var9) {
        throw new ParseException("Illegal cron expression format (" + var9.toString() + ")", 0);
    }
}
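The core of buildExpression is a two-level split. The outer StringTokenizer splits on spaces and tabs to get the fields, and an inner StringTokenizer splits each field on commas. The minimal sketch below shows only that splitting. The class name CronSplitDemo is hypothetical, and unlike Quartz it does no validation or value parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo: only the two-level splitting, no Quartz validation
public class CronSplitDemo {

    static List<List<String>> split(String expression) {
        List<List<String>> fields = new ArrayList<>();
        // Outer level: fields separated by any run of spaces/tabs
        StringTokenizer exprsTok = new StringTokenizer(expression, " \t", false);
        while (exprsTok.hasMoreTokens()) {
            String expr = exprsTok.nextToken().trim();
            // Inner level: comma-separated values within one field
            List<String> values = new ArrayList<>();
            StringTokenizer vTok = new StringTokenizer(expr, ",");
            while (vTok.hasMoreTokens()) {
                values.add(vTok.nextToken());
            }
            fields.add(values);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Six fields: seconds minutes hours day-of-month month day-of-week
        System.out.println(split("0 0,30 9-17 ? * MON-FRI"));
        // [[0], [0, 30], [9-17], [?], [*], [MON-FRI]]
    }
}
```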

2 Usage

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/**
 * @author xiaoxu
 * @date 2023-10-18
 * spring_boot:com.xiaoxu.boot.tokenizer.TestStringTokenizer
 */
public class TestStringTokenizer {

    public static void main(String[] args) {
        // Delimiters: space and tab; delimiters are not returned as tokens
        print("你 好 吗\t我是 \t你的\t 朋友 \t", " \t", false);
    }

    public static void print(String str, String delimiter, boolean isReturnDelims) {
        System.out.println("Input: [" + str + "]; delimiters: [" + delimiter + "].");
        List<String> strs = new ArrayList<>();
        StringTokenizer strToken = new StringTokenizer(str, delimiter, isReturnDelims);
        while (strToken.hasMoreTokens()) {
            String s = strToken.nextToken();
            System.out.println("Token: [" + s + "]");
            // Every token is printed, but "吗" is deliberately kept out of the result list
            if (!s.equals("吗")) {
                strs.add(s);
            }
        }
        System.out.println("Tokens: " + strs);
    }
}

Output:

Input: [你 好 吗	我是 	你的	 朋友 	]; delimiters: [ 	].
Token: [你]
Token: [好]
Token: [吗]
Token: [我是]
Token: [你的]
Token: [朋友]
Tokens: [你, 好, 我是, 你的, 朋友]

Source code analysis:

public StringTokenizer(String str, String delim, boolean returnDelims) {
    currentPosition = 0;
    newPosition = -1;
    delimsChanged = false;
    this.str = str;
    maxPosition = str.length();
    delimiters = delim;
    retDelims = returnDelims;
    setMaxDelimCodePoint();
}

private void setMaxDelimCodePoint() {
    if (delimiters == null) {
        maxDelimCodePoint = 0;
        return;
    }

    int m = 0;
    int c;
    int count = 0;
    for (int i = 0; i < delimiters.length(); i += Character.charCount(c)) {
        c = delimiters.charAt(i);
        if (c >= Character.MIN_HIGH_SURROGATE && c <= Character.MAX_LOW_SURROGATE) {
            c = delimiters.codePointAt(i);
            hasSurrogates = true;
        }
        if (m < c)
            m = c;
        count++;
    }
    maxDelimCodePoint = m;

    if (hasSurrogates) {
        delimiterCodePoints = new int[count];
        for (int i = 0, j = 0; i < count; i++, j += Character.charCount(c)) {
            c = delimiters.codePointAt(j);
            delimiterCodePoints[i] = c;
        }
    }
}
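setMaxDelimCodePoint() also checks whether the delimiter string contains surrogate pairs, which encode characters outside the Basic Multilingual Plane such as emoji. If it does, hasSurrogates is set and delimiters are compared as whole code points. The small sketch below shows this with the public API. The class name SurrogateDelimDemo is hypothetical, and the emoji are written as \u escapes so the source stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo of delimiters that are surrogate pairs
public class SurrogateDelimDemo {

    static List<String> tokens(String str, String delimiters) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(str, delimiters);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String grin = "\uD83D\uDE00"; // U+1F600, stored as two chars (a surrogate pair)
        String beam = "\uD83D\uDE01"; // U+1F601, same high surrogate, different low surrogate
        // The delimiter contains a surrogate pair, so hasSurrogates becomes true
        System.out.println(tokens("a" + grin + "b" + grin + "c", grin).size()); // 3
        // U+1F601 is compared as a whole code point, so it is not treated as a delimiter
        System.out.println(tokens("a" + grin + "b" + beam + "c", grin).size()); // 2
    }
}
```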

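The constructor calls setMaxDelimCodePoint(). The source shows that this method stores the largest code point among the delimiter characters in maxDelimCodePoint, and it does so only to speed up delimiter detection. The value is the character's Unicode code point, not just an ASCII value. In scanToken(int startPos), the check is (c <= maxDelimCodePoint) && (delimiters.indexOf(c) >= 0):

- If a character's code point is no greater than maxDelimCodePoint, it might be a delimiter, so the delimiters string is searched for it.
- If its code point is greater, it cannot be a delimiter. Because && short-circuits, the delimiters.indexOf call is skipped entirely.

That cheap comparison is the whole optimization.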
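The sketch below counts how often the short-circuit lets a character skip the indexOf lookup. The class MaxDelimDemo and both of its methods are hypothetical illustrations, not JDK code. maxDelimCodePoint(String) returns the same value the JDK method stores in its field. The pre-check is only a filter: with delimiters ",;" the maximum is ';' (59), so digits (48 to 57) still need a lookup even though they are not delimiters.

```java
// Hypothetical illustration of the maxDelimCodePoint pre-check
public class MaxDelimDemo {

    // Largest code point among the delimiter characters (0 for an empty string)
    static int maxDelimCodePoint(String delimiters) {
        return delimiters.codePoints().max().orElse(0);
    }

    // Number of characters in text that pass the cheap "c <= max" test and would
    // therefore go on to the slower delimiters.indexOf(c) lookup (BMP text assumed)
    static int lookupsNeeded(String text, String delimiters) {
        int max = maxDelimCodePoint(delimiters);
        int lookups = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) <= max) {
                lookups++;
            }
        }
        return lookups;
    }

    public static void main(String[] args) {
        System.out.println(maxDelimCodePoint(" \t"));           // 32, the code point of ' '
        System.out.println(lookupsNeeded("how are\tyou", " \t")); // 2: only the space and the tab
        System.out.println(maxDelimCodePoint(",;"));            // 59, the code point of ';'
        System.out.println(lookupsNeeded("a1,b2;c3", ",;"));    // 5: letters are skipped, digits are not
    }
}
```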

private int scanToken(int startPos) {
    int position = startPos;
    while (position < maxPosition) {
        if (!hasSurrogates) {
            char c = str.charAt(position);
            if ((c <= maxDelimCodePoint) && (delimiters.indexOf(c) >= 0))
                break;
            position++;
        } else {
            int c = str.codePointAt(position);
            if ((c <= maxDelimCodePoint) && isDelimiter(c))
                break;
            position += Character.charCount(c);
        }
    }
    if (retDelims && (startPos == position)) {
        if (!hasSurrogates) {
            char c = str.charAt(position);
            if ((c <= maxDelimCodePoint) && (delimiters.indexOf(c) >= 0))
                position++;
        } else {
            int c = str.codePointAt(position);
            if ((c <= maxDelimCodePoint) && isDelimiter(c))
                position += Character.charCount(c);
        }
    }
    return position;
}

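scanToken() scans forward from the start of a token. As long as the current character is not in the delimiter set, position keeps advancing. When it reaches a delimiter, the loop stops, and that position becomes the token's end index. substring() is half-open: the end index is exclusive. So this value is one past the token's last character, and the extracted piece is the complete token with no delimiter attached.

There is one special case. If returnDelims is true and the scan stops immediately at startPos (the token starts on a delimiter), position is moved past that single delimiter, so the delimiter itself is returned as a token of its own.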
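The retDelims branch of scanToken() is easiest to see by running the tokenizer both ways on the same input. The class name ReturnDelimsDemo is hypothetical. With returnDelims set to true, each delimiter comes back as a separate token, and consecutive delimiters are not merged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo of the returnDelims constructor flag
public class ReturnDelimsDemo {

    static List<String> tokens(String str, String delimiters, boolean returnDelims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(str, delimiters, returnDelims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a,,b;c", ",;", false)); // [a, b, c]
        System.out.println(tokens("a,,b;c", ",;", true));  // [a, ,, ,, b, ;, c]
    }
}
```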

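skipDelimiters() works the other way round. It skips over a run of consecutive characters that all belong to the delimiter set and returns the start index of the next token. When returnDelims is true, the loop condition !retDelims is false, so nothing is skipped; this is what lets scanToken() return delimiters as tokens.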

private int skipDelimiters(int startPos) {
    if (delimiters == null)
        throw new NullPointerException();

    int position = startPos;
    while (!retDelims && position < maxPosition) {
        if (!hasSurrogates) {
            char c = str.charAt(position);
            if ((c > maxDelimCodePoint) || (delimiters.indexOf(c) < 0))
                break;
            position++;
        } else {
            int c = str.codePointAt(position);
            if ((c > maxDelimCodePoint) || !isDelimiter(c)) {
                break;
            }
            position += Character.charCount(c);
        }
    }
    return position;
}
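Putting skipDelimiters() and scanToken() together gives the basic flow of reading one token: skip the delimiter run to find the start, scan to find the exclusive end, then call substring(start, end). The sketch below is a hypothetical, simplified MiniTokenizer modeled on that flow. It handles BMP characters only, never returns delimiters, and omits the maxDelimCodePoint shortcut. It is not the JDK source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical simplified tokenizer: BMP only, delimiters never returned
public class MiniTokenizer {
    private final String str;
    private final String delimiters;
    private int currentPosition = 0;

    MiniTokenizer(String str, String delimiters) {
        this.str = str;
        this.delimiters = delimiters;
    }

    // Skip a run of delimiter characters; returns the start of the next token
    private int skipDelimiters(int pos) {
        while (pos < str.length() && delimiters.indexOf(str.charAt(pos)) >= 0) {
            pos++;
        }
        return pos;
    }

    // Advance over non-delimiter characters; returns the exclusive end of the token
    private int scanToken(int pos) {
        while (pos < str.length() && delimiters.indexOf(str.charAt(pos)) < 0) {
            pos++;
        }
        return pos;
    }

    boolean hasMoreTokens() {
        return skipDelimiters(currentPosition) < str.length();
    }

    String nextToken() {
        int start = skipDelimiters(currentPosition);
        if (start >= str.length()) {
            throw new NoSuchElementException();
        }
        currentPosition = scanToken(start);
        // substring is half-open: characters in [start, currentPosition)
        return str.substring(start, currentPosition);
    }

    static List<String> tokenize(String str, String delimiters) {
        MiniTokenizer t = new MiniTokenizer(str, delimiters);
        List<String> out = new ArrayList<>();
        while (t.hasMoreTokens()) {
            out.add(t.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(",,a,b;;c,", ",;")); // [a, b, c]
    }
}
```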

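The analysis above shows that a split happens wherever a character of the input appears in the delimiter string. The delimiter string is treated as a set of individual characters, not as a sequence. Any run of characters that all belong to that set acts as a single separator, no matter what order the characters appear in.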
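To make the "set, not sequence" point concrete, the sketch below compares StringTokenizer with String.split on the same input. The class name SetVsSequenceDemo is hypothetical. String.split takes a regular expression, so "abc" there matches only the exact sequence "abc", while the character class "[abc]+" behaves like StringTokenizer for this input. They differ when the input begins with a delimiter: split then yields a leading empty string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical comparison of StringTokenizer and String.split
public class SetVsSequenceDemo {

    static List<String> tokenize(String str, String delimiters) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(str, delimiters);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "1abc2cab3";
        // StringTokenizer: "abc" means the character set {a, b, c}
        System.out.println(tokenize(s, "abc"));                 // [1, 2, 3]
        // String.split: "abc" is a regex matching the exact sequence "abc"
        System.out.println(Arrays.toString(s.split("abc")));    // [1, 2cab3]
        // A character class with + gives the same result as StringTokenizer here
        System.out.println(Arrays.toString(s.split("[abc]+"))); // [1, 2, 3]
    }
}
```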

The following demonstrates this. Note: do not use countTokens() as the loop bound while calling nextToken() in the same loop.

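// Add to TestStringTokenizer; this method also requires import java.util.Arrays;
public static void print2(String str, String delimiter, boolean isReturnDelims) {
    StringTokenizer strTokenizer = new StringTokenizer(str, delimiter, isReturnDelims);
    System.out.println("Total tokens: " + strTokenizer.countTokens());
    int count = strTokenizer.countTokens();
    String[] strs = new String[count];
    // Note: do not write "for (int i = 0; i < strTokenizer.countTokens(); i++)" here.
    // countTokens() counts from currentPosition, and every nextToken() call moves
    // currentPosition forward, so strTokenizer.countTokens() keeps shrinking while
    // the loop runs. The loop bound must be the fixed total captured above.
    for (int i = 0; i < count; i++) {
        String s = strTokenizer.nextToken();
        strs[i] = s;
    }
    System.out.println(Arrays.toString(strs));
}

The source of countTokens() is shown below. It counts on a local copy, currpos, so calling it does not consume any tokens. The count it returns is still measured from the current position, though, so it shrinks as nextToken() advances.

public int countTokens() {
    int count = 0;
    int currpos = currentPosition;
    while (currpos < maxPosition) {
        currpos = skipDelimiters(currpos);
        if (currpos >= maxPosition)
            break;
        currpos = scanToken(currpos);
        count++;
    }
    return count;
}

Run:

print2("1a2b3c4ca5bc6ba7abc8acbbaba9", "abc", false);

Output:

Total tokens: 9
[1, 2, 3, 4, 5, 6, 7, 8, 9]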
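Here is the pitfall the comment in print2 warns about. The hypothetical class CountTokensPitfall puts countTokens() directly in the loop condition. On the same 9-token input, the bound is re-evaluated after each nextToken(), and the loop stops after 5 tokens: before iteration i only 9 - i tokens remain, so the condition i < 9 - i fails once i reaches 5. Looping on hasMoreTokens() avoids the problem and needs no count at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo of the countTokens() loop-bound pitfall
public class CountTokensPitfall {

    // Wrong: countTokens() is re-evaluated each iteration and shrinks as tokens are consumed
    static List<String> buggy(String str, String delimiters) {
        StringTokenizer st = new StringTokenizer(str, delimiters);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < st.countTokens(); i++) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Right: iterate until hasMoreTokens() reports the end of the input
    static List<String> correct(String str, String delimiters) {
        StringTokenizer st = new StringTokenizer(str, delimiters);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "1a2b3c4ca5bc6ba7abc8acbbaba9";
        System.out.println(buggy(s, "abc"));   // [1, 2, 3, 4, 5]
        System.out.println(correct(s, "abc")); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```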

Source: http://www.lryc.cn/news/197146.html
