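This chapter covers String, the type we use most often day to day. It mainly discusses the following:
String looks ordinary, but the simplest things are often the ones closest to the lower layers. A lot of hidden complexity never surfaces.
Open the String source in the JDK and look at the first few lines: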
public final class String implements java.io.Serializable, Comparable<String>, CharSequence {
/** A String is essentially a char array. */
private final char value[];
...
...
}
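This tells us that a String is, at bottom, a char[] array. (This is the layout in JDK 8 and earlier; since JDK 9 the backing array is a byte[].)
As the figure above shows, a String consists of two parts: the char[] that holds the actual data, and the metadata that manages that char[]. Take a String of 8 characters on a 32-bit system as an example: each char is 16 bits, so the raw data in the char[] is 128 bits. The array's own bookkeeping metadata takes another 128 bits, so the whole char[] occupies 256 bits. On top of that, String's metadata for managing the char[] takes 224 bits. Altogether, 480 bits (60 bytes) are used to represent 128 bits of data (16 bytes, or 8 characters).
The three main pieces of String metadata are:
As a programmer, you should at least know what it is you are holding when you use a piece of data.
There are two basic properties of String to remember. First, String is immutable. Second, String is thread-safe.
String immutability is simple. As the figure below shows, assigning a second value to an existing String variable does not modify the data at the original memory address; the variable is pointed at a new object at a new address.
First, the String class is declared final, which means String cannot be subclassed. Next, its main field, value, is a char[] array that is also declared final, and a final field cannot be changed once it has been set. Some people think the story ends there. It doesn't. value is final, but that only fixes the reference held in value. It does nothing about the fact that an array is mutable. The figure below shows how an array is laid out:
In other words, an array variable is only a reference (on the stack, for a local variable); the array itself lives on the heap. Declaring value final in the String class only means that the reference called value cannot be pointed somewhere else. It does not say that the array data on the heap cannot change. Look at this example: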
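final int[] value={1,2,3};
int[] another={4,5,6};
value=another; //compile error: a final variable cannot be reassigned
Because value is final, the compiler will not let me point it at a different array on the heap. But if I go after the array elements directly, it is done in no time.
final int[] value={1,2,3};
value[2]=100; //the array now holds {1,2,100}
So the real reason String is immutable is that the engineers at Sun were careful, in every String method, never to touch the elements of the array and never to expose the internal field. They were also careful to make the whole String class final so that nobody could break it through inheritance. The immutability of String therefore comes from the underlying implementation as a whole, not from a single final keyword. It is a test of how well an engineer can design a data type and encapsulate its data.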
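The same discipline can be sketched on a small class of our own. ImmutableChars below is a hypothetical illustration class, not JDK code: it keeps a private final char[] and copies the array on the way in and on the way out, so no caller ever holds a reference to the internal data.

```java
import java.util.Arrays;

/** Hypothetical illustration class; not part of the JDK. */
final class ImmutableChars {
    // final only fixes the reference; immutability comes from never exposing the array
    private final char[] value;

    ImmutableChars(char[] source) {
        // Defensive copy: later changes to the caller's array cannot reach us.
        this.value = Arrays.copyOf(source, source.length);
    }

    char charAt(int index) {
        return value[index];
    }

    char[] toCharArray() {
        // Hand out a copy, never the internal array itself.
        return Arrays.copyOf(value, value.length);
    }

    @Override
    public String toString() {
        return new String(value);
    }

    public static void main(String[] args) {
        char[] raw = {'a', 'b', 'c'};
        ImmutableChars text = new ImmutableChars(raw);
        raw[0] = 'X';                 // mutate the caller's array
        text.toCharArray()[1] = 'Y';  // mutate a returned copy
        System.out.println(text);     // the internal data is unaffected: abc
    }
}
```

Dropping either copy would let outside code change the contents, even though the field is still final.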
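The simplest reason for making String immutable is safety. Consider the following scenario (a comment said the example was not clear enough, so here it is written out in full): a function appendStr() appends "bbb" to an immutable String parameter and returns the result, while appendSb() appends "bbb" to a mutable StringBuilder.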
class Test{
//immutable String
public static String appendStr(String s){
s+="bbb";
return s;
}
//mutable StringBuilder
public static StringBuilder appendSb(StringBuilder sb){
return sb.append("bbb");
}
public static void main(String[] args){
//String as the argument
String s=new String("aaa");
String ns=Test.appendStr(s);
System.out.println("String aaa >>> "+s.toString());
//StringBuilder as the argument
StringBuilder sb=new StringBuilder("aaa");
StringBuilder nsb=Test.appendSb(sb);
System.out.println("StringBuilder aaa >>> "+sb.toString());
}
}
//Output:
//String aaa >>> aaa
//StringBuilder aaa >>> aaabbb
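If a programmer carelessly appends "bbb" directly to the incoming parameter, as in the example above, the mutable StringBuilder argument gets changed, because Java passes object arguments as references. You can see that after Test.appendSb(sb) the variable sb has become "aaabbb". Sometimes that is not what the programmer intended. This is where the safety of an immutable String shows.
Now look at the following scenario, where StringBuilder is used as the element type of a HashSet. The problem is more serious and harder to spot.
import java.util.HashSet;
class Test{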
public static void main(String[] args){
HashSet<StringBuilder> hs=new HashSet<StringBuilder>();
StringBuilder sb1=new StringBuilder("aaa");
StringBuilder sb2=new StringBuilder("aaabbb");
hs.add(sb1);
hs.add(sb2); //the HashSet now holds {"aaa","aaabbb"}
StringBuilder sb3=sb1;
sb3.append("bbb"); //the HashSet now holds {"aaabbb","aaabbb"}
System.out.println(hs);
}
}
//Output:
//[aaabbb, aaabbb]
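The StringBuilder variables sb1 and sb2 point to "aaa" and "aaabbb" on the heap, and both are inserted into a HashSet. So far so good. But if I later point sb3 at the same object as sb1 and then change it through sb3, StringBuilder has no immutability to protect it, so the change is made in place on the original "aaa" and sb1 changes too. The HashSet now contains two elements with the same content, "aaabbb", which defeats the uniqueness a HashSet is supposed to guarantee. (StringBuilder does not override equals() and hashCode(), so the set still tells the two objects apart; with a mutable type that does override them, lookups break outright.) So never use a mutable type as a key in a HashMap or as an element in a HashSet.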
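To make the danger concrete, here is a minimal sketch that uses an ArrayList as the set element instead of a StringBuilder. That swap is my own choice for the illustration: unlike StringBuilder, ArrayList computes hashCode() from its contents, so mutating it after insertion makes the set unable to find it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MutableKeyDemo {
    /** Returns whether the set can still find the element after it was mutated. */
    static boolean foundAfterMutation() {
        Set<List<String>> set = new HashSet<>();
        List<String> key = new ArrayList<>();
        key.add("aaa");
        set.add(key);     // stored under the hash of ["aaa"]
        key.add("bbb");   // hashCode() changes, but the stored entry keeps the old hash
        return set.contains(key);
    }

    /** The same steps with an immutable String element. */
    static boolean foundWithString() {
        Set<String> set = new HashSet<>();
        String key = "aaa";
        set.add(key);
        String longer = key + "bbb"; // a new String; key itself is untouched
        return set.contains(key) && !set.contains(longer);
    }

    public static void main(String[] args) {
        System.out.println("mutable element still found:   " + foundAfterMutation());
        System.out.println("immutable element still found: " + foundWithString());
    }
}
```

With the mutated list the element is still inside the set, yet contains() no longer locates it, which is exactly the kind of silent corruption an immutable String rules out.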
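Another point everyone knows: in a concurrent setting, several threads reading the same resource at the same time cannot cause a race condition. Only writes are dangerous. An immutable object cannot be written to, so it is thread-safe.
Finally, don't forget another property of String, the string constant pool, which is the next topic. Immutability is the most basic precondition for Java to offer this feature. If string contents in memory could be changed at will, pooling would make no sense at all.
String is one of the most heavily used data types in Java, so memory tends to fill up with Strings, many of them duplicates, and the redundancy is high. To keep memory from bloating and to improve efficiency, Java implements an optimization based on the string constant pool.
A String Table holds a reference to each interned String object. Every time a String literal is needed, the table is checked first. A new object is created only if none is found; otherwise the interned reference is simply copied. Where is the String Table? You can think of it as living in what used to be called the Perm Generation, outside the method area (since JDK 7 it has been moved into the heap). I won't go into the mechanism here; for details see the dedicated article "How many objects does String str=new String("Hello") actually create?".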
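The effect of the pool can be observed directly with reference comparison. The sketch below (the class name StringPoolDemo is just for illustration) compares two identical literals, an object created with new, and the result of intern().

```java
import java.util.Arrays;

class StringPoolDemo {
    /** Returns the results of four comparisons against the literal "Hello". */
    static boolean[] compare() {
        String literal1 = "Hello";
        String literal2 = "Hello";
        String created = new String("Hello");
        return new boolean[] {
            literal1 == literal2,          // same pooled object
            literal1 == created,           // new always creates a separate object
            literal1 == created.intern(),  // intern() returns the pooled reference
            literal1.equals(created)       // the contents are equal regardless
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [true, false, true, true]
    }
}
```

This is also why String contents should be compared with equals() rather than ==.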
Here I'll just paste a simple conclusion for later reference. More detailed examples and explanations are in the article mentioned above.
class Test {
private static String staticStr="Hello";
private String memberStr="Hello";
public void sayHello(){
String methodStr="Hello";
System.out.println(methodStr);
}
public static void main (String[] args) {
Test t=new Test();
t.sayHello();
String threadStr="Hello";
}
}
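In the code, staticStr is a static field of the Test class and memberStr is an ordinary instance field. methodStr is a local variable of the sayHello method. In the main method, run by the main thread, there is also threadStr.
As the figure above shows, all four variables point to the same "Hello" instance in the young generation of the heap. But over the course of the program, six references to this object actually existed in memory. The details are as follows:
String's immutability costs it some efficiency. Take concatenation with "+": because a String cannot be changed, nothing can be appended to the original string in place, and every concatenation has to build a new object to hold the longer result. After several concatenations the intermediate strings become garbage that has to be collected, and garbage collection in Java is not cheap. Take the simplest possible operation: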
String str="a"+"b"+"c"+"d"+"e";
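Concatenated the naive way, getting to "abcde" would leave the intermediate objects "a", "b", "c", "d", "e", "ab", "abc" and "abcd" on the Java heap. Add the garbage-collection time and it really would be slow.
To improve performance, Java provides the StringBuilder class, which is mutable. StringBuilder.append() concatenates directly onto the existing character data, produces no intermediate strings, and is therefore more efficient.
So how does StringBuilder.append() avoid producing intermediate strings as a by-product?
Because StringBuilder is mutable. Looking straight at the source, the char array that serves as the data container in AbstractStringBuilder, the base class of StringBuilder, is not final: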
char[] value;
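The AbstractStringBuilder#append() overloads delegate straight to getChars(). The StringBuffer overload is shown here; the String overload calls String#getChars() in the same way.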
// Documentation in subclasses because of synchro difference
public AbstractStringBuilder append(StringBuffer sb) {
if (sb == null)
return append("null");
int len = sb.length();
ensureCapacityInternal(count + len);
sb.getChars(0, len, value, count);
count += len;
return this;
}
And String#getChars() in turn calls System.arraycopy() directly:
public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
if (srcBegin < 0) {
throw new StringIndexOutOfBoundsException(srcBegin);
}
if (srcEnd > value.length) {
throw new StringIndexOutOfBoundsException(srcEnd);
}
if (srcBegin > srcEnd) {
throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
}
System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}
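System.arraycopy() is a native method, declared with the native keyword. That means its implementation calls into platform code written in C or C++ to copy the array contents, for the sake of efficiency.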
public static native void arraycopy(Object src, int srcPos,
Object dest, int destPos,
int length);
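We can leave the low-level arraycopy alone for now. It is enough to know that it is an efficient function for copying one array directly into another.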
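Putting the pieces together, the append mechanism can be sketched in a few lines. SimpleCharBuffer below is a hypothetical, stripped-down stand-in for AbstractStringBuilder, not the JDK code; the growth policy (double the capacity plus two) is an assumption made for the illustration.

```java
import java.util.Arrays;

/** Hypothetical miniature of a mutable string buffer; not the JDK implementation. */
class SimpleCharBuffer {
    private char[] value = new char[16]; // not final: it is replaced when the buffer grows
    private int count = 0;               // number of chars actually in use

    SimpleCharBuffer append(String s) {
        int len = s.length();
        ensureCapacity(count + len);
        // Copy the characters of s into the free space at the end of value.
        s.getChars(0, len, value, count);
        count += len;
        return this;
    }

    private void ensureCapacity(int minCapacity) {
        if (minCapacity > value.length) {
            // Assumed growth policy for this sketch.
            int newCapacity = Math.max(value.length * 2 + 2, minCapacity);
            value = Arrays.copyOf(value, newCapacity);
        }
    }

    int length() {
        return count;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }

    public static void main(String[] args) {
        SimpleCharBuffer buf = new SimpleCharBuffer();
        buf.append("abc").append("de");
        System.out.println(buf + " (" + buf.length() + " chars)");
    }
}
```

The only new String is the one created by the final toString(); every append works on the same array until it runs out of room.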
Back to String and the simple example from the start, "a"+"b"+"c"+"d". Is executing the statements below really that slow?
//String literal concatenation: 1308 ns
String result="a"+"b"+"c"+"d";
//StringBuilder append: 42920 ns
StringBuilder sb=new StringBuilder();
sb.append("a");
sb.append("b");
sb.append("c");
sb.append("d");
//String object concatenation: 13774 ns
String a=new String("a");
String b=new String("b");
String c=new String("c");
String d=new String("d");
String str=a+b+c+d;
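A quick run of the test above gives 1308 ns for direct concatenation of String literals, 42920 ns for StringBuilder, and 13774 ns for concatenation of String objects.
So the truth is that simple literal concatenation like this has already been optimized: String result="a"+"b"+"c"+"d" is essentially the same as String result="abcd". On top of that, creating a StringBuilder instance has a cost of its own. Concatenating a handful of characters is not enough to show its value, or even to pay back the cost of instantiating it.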
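One way to see that the compiler folds literal concatenation is to compare references. In the sketch below (the class name is illustrative), a concatenation made only of literals is a compile-time constant and ends up as the same pooled object as "abcd", while a concatenation involving a non-final variable is built at run time and is a different object.

```java
class ConstantFoldingDemo {
    /** Only literals: folded by the compiler into the constant "abcd". */
    static boolean literalsAreFolded() {
        String result = "a" + "b" + "c" + "d";
        return result == "abcd";
    }

    /** A non-final variable takes part, so the concatenation happens at run time. */
    static boolean variablesAreFolded() {
        String a = "a";
        String result = a + "b" + "c" + "d";
        return result == "abcd";
    }

    public static void main(String[] args) {
        System.out.println("literals folded:  " + literalsAreFolded());   // true
        System.out.println("variables folded: " + variablesAreFolded());  // false
    }
}
```

The reference comparison is used here only as a probe; ordinary code should still compare strings with equals().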
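At what scale does StringBuilder's advantage show, then? Below, each of the operations above is repeated ten thousand times.
//String concatenation with + in a loop: 105334355 ns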
String a="a";
String b="";
for(int i=0;i<10000;i++){
b+=a;
}
//StringBuilder append: 415360 ns
StringBuilder sb=new StringBuilder();
for(int i=0;i<10000;i++){
sb.append("a");
}
//String object concatenation: 198404867 ns
String s=new String("");
for(int i=0;i<10000;i++){
String x=new String("a");
s+=x;
}
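Now look at the results in the comments above:
Does StringBuilder's advantage show now? By these numbers it is roughly 250 to 480 times faster. Why? Because of the for loop, String concatenation has no choice but to create a new object on every iteration. The benefit of the compiler's optimization of the "+" operator is gone.
So remember this conclusion:
For a few isolated concatenations, plain String concatenation is still faster. When a large number of strings have to be concatenated, and a loop is involved, StringBuilder is the more efficient choice.
The experiment above also showed that instantiating a StringBuilder costs far more than creating a single String object. So when concatenating in a loop, create the StringBuilder once, outside the loop; otherwise every iteration creates a new one.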
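Code like this,
String s=new String();
for(int i=0;i<10;i++){
s+=i;
}
is roughly equivalent to the following (this is how javac before Java 9 compiles it; newer compilers use invokedynamic, but a new String is still built on every iteration),
String s=new String();
for(int i=0;i<10;i++){
StringBuilder sb=new StringBuilder();
sb.append(s).append(i);
s=sb.toString();
}
When concatenating in a loop like this, it is better to create one StringBuilder yourself, outside the loop,
StringBuilder sb=new StringBuilder();
for(int i=0;i<10;i++){
sb.append(i);
}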
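For anyone who wants to repeat the comparison, here is a minimal timing sketch. The class and method names are hypothetical, and a single System.nanoTime() measurement like this is only a rough indication (no warm-up, no repetition), so the absolute numbers will differ from the ones quoted in this post.

```java
class ConcatTiming {
    /** Builds a string of n 'a' characters with += in a loop. */
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "a"; // a new String is created on every iteration
        }
        return s;
    }

    /** Builds the same string with one StringBuilder created outside the loop. */
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("a");
        }
        return sb.toString();
    }

    /** Runs the task once and returns the elapsed time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long begin = System.nanoTime();
        task.run();
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) {
        int n = 10000;
        System.out.println("+= in a loop:  " + timeNanos(() -> withPlus(n)) + " ns");
        System.out.println("StringBuilder: " + timeNanos(() -> withBuilder(n)) + " ns");
    }
}
```

Both methods return the same string; only the amount of intermediate garbage differs.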
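With String and StringBuilder available, why is there also a StringBuffer?
Actually, StringBuffer came first. StringBuilder was only added in Java 1.5.
String immutable (thread-safe) since JDK1.0 java.lang.String public final class String
StringBuffer mutable (thread-safe) since JDK1.0 java.lang.StringBuffer public final class StringBuffer
StringBuilder mutable (not thread-safe) since JDK1.5 java.lang.StringBuilder public final class StringBuilder
As for StringBuffer, I won't paste the source here. The points to remember are:
In one sentence: StringBuffer and StringBuilder are both helper classes for String, but StringBuilder is the lightweight version for single-threaded use, while StringBuffer is the heavyweight version for concurrent use.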
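The difference can be observed with a small experiment: several threads append to one shared buffer. The helper below is a hypothetical sketch. Because StringBuffer's append is synchronized, its final length is always threads × perThread; for StringBuilder no such guarantee exists, so the sketch only prints whatever length it ends up with.

```java
class ConcurrentAppendDemo {
    /** Lets several threads append 'a' to the same target and returns its final length. */
    static int appendFromThreads(Appendable target, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    try {
                        target.append('a');
                    } catch (Exception e) {
                        // An unsynchronized builder may fail mid-append; ignored in this sketch.
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Both StringBuffer and StringBuilder are also CharSequences.
        return ((CharSequence) target).length();
    }

    public static void main(String[] args) {
        System.out.println("StringBuffer:  " + appendFromThreads(new StringBuffer(), 4, 10000));
        System.out.println("StringBuilder: " + appendFromThreads(new StringBuilder(), 4, 10000));
    }
}
```

The StringBuffer line always reports 40000. The StringBuilder line may or may not, depending on how the threads happen to interleave.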
Finally, a head-to-head between StringBuilder and StringBuffer.
//StringBuilder append: 415360 ns
StringBuilder sb=new StringBuilder();
for(int i=0;i<10000;i++){
sb.append("a");
}
//StringBuffer append: 810470 ns
StringBuffer sbf=new StringBuffer();
for(int i=0;i<10000;i++){
sbf.append("a");
}
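Test result:
StringBuilder is nearly twice as fast as StringBuffer.
Formatted output is a genuinely practical feature. Memory is always limited, and in real work large amounts of data are recorded in external files. Back at school, a single experiment would often use hundreds of GB, or even several TB, of sample data. So the point of formatted output is not whether the printout looks pretty, but making data stored in external files easier to process, read and write in bulk.
Three classes in Java provide formatted output, that is, a format() method. They all accept the same arguments to format their output.
For a simple case such as printing something to the console, you would typically write something like this: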
System.out.println("Row 1: ["+x+" "+y+" "+z+"]\n");
System.out.printf("Row 1: [%d %f %s]\n", x, y, z);
System.out.format("Row 1: [%d %f %s]\n", x, y, z);
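Here %d, %f and %s are placeholders: x is inserted at the position of %d, y at %f, and z at %s. %d formats the value as an integer, %f as a floating-point number, and %s as a String.
But these are only very simple cases. They work, but they don't do much. The full format syntax looks like this:
("%[argument_index$][flags][width][.precision]conversion", arg1, arg2, arg3, …)
Each position means the following:
The figure below lists the commonly used flags.
The figure below lists the commonly used conversion types.
Here is an example that shows the format syntax in actual use. The code below prints the same little receipt as in the book.
The code uses Formatter#format(), but System.out.format(), System.out.printf() or String#format() would all produce the same result. Here is the code: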
import java.util.Formatter;
class Receipt {
/**
* PUBLIC METHODS
*/
public void printTitle(){
f.format("%1$-20.15s %2$3.3s %3$10.10s\n","Item","Qty","Price");
f.format("%1$-20.15s %2$3.3s %3$10.10s\n","----","---","-----");
}
public void print(String name, int qty, double price){
f.format("%1$-20.15s %2$3d %3$10.2f\n",name,qty,price);
total+=price;
}
public void printTotal(){
f.format("%1$-20.15s %2$3s %3$10.2f\n","Tax", "", total*0.15);
f.format("%1$-20.15s %2$3s %3$10.10s\n","", "","-----");
f.format("%1$-20.15s %2$3s %3$10.2f\n","Total", "",total*1.15);
}
/**
* PRIVATE FIELDS
*/
private double total=0;
private Formatter f=new Formatter(System.out);
/**
* MAIN
*/
public static void main(String[] args){
Receipt r=new Receipt();
r.printTitle();
r.print("Jack’s Magic Beans", 4, 4.25);
r.print("Princess Peas", 3, 5.1);
r.print("Three Bears Porridge", 1, 14.29);
r.printTotal();
}
}
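As a brief example, here is what the parts of "%1$-20.15s" mean:
The printed result looks exactly like the little receipt pictured in the book above.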
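As a small worked illustration of the pieces of a format specifier, the sketch below formats one receipt row with String.format. The class name and the "|" column separators are my own additions so that the padding is visible; Locale.ROOT is passed so that the decimal separator does not depend on the machine's locale.

```java
import java.util.Locale;

class FormatSpecDemo {
    /**
     * %1$-20.15s : argument 1, left-justified (-), width 20, at most 15 characters
     * %2$3d      : argument 2, decimal integer, width 3, right-justified by default
     * %3$10.2f   : argument 3, floating point, width 10, 2 digits after the point
     */
    static String row(String name, int qty, double price) {
        return String.format(Locale.ROOT, "%1$-20.15s|%2$3d|%3$10.2f", name, qty, price);
    }

    /** argument_index lets the format string pick arguments in any order. */
    static String swapped(String first, String second) {
        return String.format(Locale.ROOT, "%2$s %1$s", first, second);
    }

    public static void main(String[] args) {
        System.out.println(row("Three Bears Porridge", 1, 14.29));
        // prints: Three Bears Por     |  1|     14.29
        System.out.println(swapped("world", "hello"));
        // prints: hello world
    }
}
```

The 20-character name is first cut to 15 characters by the precision and then padded back out to the width of 20 with spaces on the right.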
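Last comes one of the big topics of this chapter: regular expressions. How important are they? Without a firm grasp of them, it is hard to call yourself a programmer. The same meaning can be written as a regular expression in many ways, so keep one principle in mind: don't write the fanciest expression, write the simplest and most readable one that gets the job done.
The syntax is fairly complex and cannot be copied here in full, so only the more important parts are listed. For the complete definition, see the official documentation of the "java.util.regex.Pattern" class, which has the most authoritative description. To go deeper, read Friedl's "Mastering Regular Expressions, 3rd Edition".
The most frequently used symbols in regular expressions are the three quantifiers:
These quantifiers are used together with symbols that stand for characters. The regular expression syntax defines many of them; here are a few of the most common:
\w is a word character, [a-zA-Z_0-9]. Note that the underscore _ counts as well.
There are some other format symbols too: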
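Note that the backslash "\" here is important: it is an escape character. The letters that follow it each have their ordinary meaning, and putting a backslash in front says: stop reading this with its ordinary meaning and use its special meaning instead. In front of a dot "." it works the other way around. The dot by itself does not mean a period; it means any character, so it already has a special meaning. Putting an escape in front turns it back into its literal meaning, a period.
Square brackets "[]" denote an "or" set: any one of the characters inside.
Parentheses "()" work as in a mathematical formula and bind things together first. Everything enclosed in parentheses is a "group". The Matcher#group() method uses this; it returns the group with a given number in the regular expression.
Escaping looks simple. There are two cases:
But it still deserves a special mention. I used to get bitten by it all the time.
From a pure regular-expression point of view, "\." means a literal period, and two backslashes "\\" mean one ordinary backslash. Both are easy to understand.
But in Java, to express a period as a string you really have to write "\\.", and for a backslash you have to write "\\\\". I have been burned by this many times.
Why is that? In short: the backslash "\" happens to be the escape character in the Java language as well, and a String in a .java file goes through two translations before it is understood as a regular expression, one by the Java compiler and one by the regex compiler.
To the Java compiler, a single backslash "\" is a special character. Just as in regular expressions, it is the escape character, used to escape all other special characters. That is where the trouble starts.
So for the Java compiler to recognize an ordinary backslash, the backslash has to escape itself: two backslashes "\\" mean one backslash.
What the Java compiler has translated is then handed to the regex compiler. Unfortunately the backslash is an escape character there too, so for the regex compiler to read one backslash, you have to give it "\\".
So how do we get two backslashes "\\" into the hands of the regex compiler?
Right: we give the Java compiler four backslashes "\\\\". They are translated into two backslashes "\\" and handed to the regex compiler, which translates them a second time into one backslash "\".
What do you call this? It is embezzlement. Think of backslashes as gold bars. If I want to deliver one gold bar to the regex matching function, I have to hand four to the Java compiler. It skims off half and passes two on to the regex compiler, which skims off half again, and only the last remaining bar reaches the matching function.
The same goes for a period ".": give "\\." to the Java compiler, and after it skims off one backslash, what reaches the regex compiler has one backslash left: "\.". The regex compiler then skims again and translates it into ".".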
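The two layers of escaping are easy to check in code. The sketch below uses a hypothetical class name and shows what each string literal looks like after the Java compiler is done with it, and how the regex engine then treats it.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    /** "\\." is the two-character regex \. which matches a literal period. */
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    /** "\\\\" is the two-character regex \\ which matches one literal backslash. */
    static boolean containsBackslash(String s) {
        return Pattern.compile("\\\\").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println("\\\\".length());               // 2: four typed, two left after javac
        System.out.println(splitOnDot("a.b.c").length);     // 3: split on literal periods
        System.out.println("a.b.c".split(".").length);      // 0: unescaped dot matches every character
        System.out.println(containsBackslash("C:\\temp"));  // true
        System.out.println(containsBackslash("C:/temp"));   // false
    }
}
```

When the text to match is not known in advance, Pattern.quote() produces a pattern that matches it literally and avoids hand-written escaping altogether.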
Regular expressions are used in Java mainly in two places:
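The String methods that take regular expressions are the commonly used matches(String regex), the replacement methods replaceFirst(String regex, String replacement) and replaceAll(String regex, String replacement), and the splitting method split(String regex). The regex parameter means that the method accepts a regular expression in String form. As in the example below, a string can be matched against a regular expression directly.
"hello world".matches("(?i)((^[aeiou])|(\\s+[aeiou]))\\w+?[aeiou]\\b")
//output: false
Put into plain words, the regular expression in the example above means: a word that begins with a vowel (aeiou) and ends with a vowel. (?i) selects the matching mode CASE_INSENSITIVE, which ignores case; the Pattern and Matcher classes offer the same feature. matches() also requires the whole string to match, and clearly neither hello nor world begins with a vowel, so the result is false.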
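Beyond that, the Pattern and Matcher classes in the java.util.regex package are the ones built specifically for regular expressions.
They are easy to use. Pattern.compile(String regex) compiles a regular expression in String form into a Pattern object. Pattern#matcher(CharSequence input) then takes the string to be matched and returns a Matcher object, on which the various matching methods can be called, such as Matcher#replaceAll(). Splitting is done with Pattern#split().
public boolean Matcher#find() attempts to find the next subsequence of the input that matches the pattern. It starts at the beginning of the matcher's region or, if a previous call succeeded and the matcher has not been reset since, at the first character not matched by the previous match. If the match succeeds, more information can be obtained through the start, end and group methods.
It returns true if and only if a subsequence of the input matches this matcher's pattern.
public boolean Matcher#matches() attempts to match the entire target string against the pattern, and returns true only when the whole string matches.
The Pattern class also has a static method with the same function, Pattern.matches(). It needs no explicit Pattern or Matcher object and suits small, occasional matches. If a pattern is to be used many times, compiling it once into a Pattern object and reusing it is more efficient than calling this method each time.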
public static boolean matches(String regex, CharSequence input)
//usage
Pattern.matches(regex, input);
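lookingAt() is similar to find() and matches(): it also matches the target string against the regular expression. The difference is that lookingAt() checks whether a substring matching the regular expression begins at the very start of the target string. Take this example: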
Pattern p=Pattern.compile("reg");
Matcher m=p.matcher("regular");
System.out.println(m.lookingAt());
//output: true
The word regular begins with the substring reg, so the match succeeds.
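The three methods are easiest to tell apart side by side. The sketch below (hypothetical class and method names) applies the same pattern "reg" to two inputs and reports what each method returns.

```java
import java.util.regex.Pattern;

class MatchModesDemo {
    private static final Pattern REG = Pattern.compile("reg");

    /** matches(): the whole input must match the pattern. */
    static boolean wholeMatch(String input) {
        return REG.matcher(input).matches();
    }

    /** lookingAt(): a match must start at the beginning of the input. */
    static boolean prefixMatch(String input) {
        return REG.matcher(input).lookingAt();
    }

    /** find(): a match may start anywhere in the input. */
    static boolean anywhereMatch(String input) {
        return REG.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"regular", "a regular"}) {
            System.out.println(input + ": matches=" + wholeMatch(input)
                    + " lookingAt=" + prefixMatch(input)
                    + " find=" + anywhereMatch(input));
        }
        // regular:   matches=false lookingAt=true  find=true
        // a regular: matches=false lookingAt=false find=true
    }
}
```

Each call creates a fresh Matcher from the shared Pattern, so the three checks do not influence one another.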
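As mentioned earlier, every part of a regular expression enclosed in parentheses is a "group", and every group has its own number within the expression. The numbering works like this: given the regular expression "A(B(C))D", there are three groups: group 0 is ABCD, group 1 is BC, and group 2 is C.
If find(), matches() or lookingAt() above succeeds, calling Matcher#group() with no arguments returns group 0, that is, the whole match. Matcher#group(int i) returns the group with the given number.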
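The numbering can be verified with the same expression. This is a minimal sketch with a hypothetical class name; the input "xABCDx" is made up for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    /** Finds the first match of regex in input and returns group 0 plus every capturing group. */
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        // groupCount() counts only the capturing groups, not group 0.
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = groupsOf("A(B(C))D", "xABCDx");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("group " + i + ": " + groups[i]);
        }
        // group 0: ABCD
        // group 1: BC
        // group 2: C
    }
}
```

Groups are numbered by the position of their opening parenthesis, counted from left to right.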
Normally, if the replacement is the same every time, appendReplacement() gives the same result as replaceAll(). The advantage of appendReplacement() is that it replaces step by step:
while(m.find()){
m.appendReplacement(sbuf, m.group().toUpperCase());
}
m.appendTail(sbuf);
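The code above replaces each match with its upper-case form, m.group().toUpperCase(). But since the replacement can be produced by a function call, there is a lot of room to work with here, and the replacement can be designed to be different every time.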
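Written out in full, the loop looks roughly like the sketch below. The class and method names are hypothetical, and Matcher.quoteReplacement() is an addition of mine: it keeps any "$" or "\" in the computed replacement from being read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    /** Replaces every match of regex in input with its upper-case form. */
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sbuf = new StringBuffer();
        while (m.find()) {
            // Copies the text since the previous match, then the computed replacement.
            m.appendReplacement(sbuf, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        // Copies whatever is left after the last match.
        m.appendTail(sbuf);
        return sbuf.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("[aeiou]", "hello world")); // hEllO wOrld
    }
}
```

A StringBuffer is used because that overload of appendReplacement() has existed since the regex package was introduced; newer JDKs also accept a StringBuilder.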
The reset() method applies an existing Matcher to a new string. That way a pattern that has already been built can be reused, which is more efficient. For example:
Pattern p=Pattern.compile("reg");
Matcher m=p.matcher("regular");
System.out.println(m.lookingAt());
//Keep the expression "reg" and just switch to a new target string.
m.reset("expression");
System.out.println(m.lookingAt());
//output:
//true
//false
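In the code above, m.reset("expression") switches to a new target string while keeping the regular expression "reg".
Regular expressions are important and need to become second nature through practice at work.
String sits close to the bottom of the stack and is deeply tied to the virtual machine. There is a lot to learn in it. The more common and basic something is, the more easily it gets overlooked, yet that is often where the depth is.
The second method, toStringBuilder(), does not reduce the number of StringBuilders. The reason is that the first method, toString(), has already been optimized and produces only one StringBuilder object. Because no String is concatenated inside a loop, no intermediate StringBuilders pile up.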
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
public class Exercise1{
private final String valve1="Monday";
private final String valve2="Tuesday";
private final String valve3="Wednesday";
private final String valve4="Thursday";
private WaterSource source = new WaterSource();
private int i;
private float f;
//use String
//The whole "+" chain below is compiled into one StringBuilder, not one per "+".
public String toString() {
long begin=System.nanoTime();
String result=
"valve1 = " + valve1 + " " +
"valve2 = " + valve2 + " " +
"valve3 = " + valve3 + " " +
"valve4 = " + valve4 + "\n" +
"i = " + i + " " + "f = " + f + " " +
"source = " + source;
long end=System.nanoTime();
System.out.println((end-begin)/1e6); //elapsed time in milliseconds
return result;
}
//use StringBuilder
//creates only one StringBuilder object
public String toStringBuilder() {
long begin=System.nanoTime();
StringBuilder result=new StringBuilder();
result.append("valve1 = ").append(valve1).append(" ");
result.append("valve2 = ").append(valve2).append(" ");
result.append("valve3 = ").append(valve3).append(" ");
result.append("valve4 = ").append(valve4).append("\n");
result.append("i = ").append(i).append(" ").append("f = ").append(f).append(" ").append("source = ").append(source);
long end=System.nanoTime();
System.out.println((end-begin)/1e6); //elapsed time in milliseconds
return result.toString();
}
public static void main(String[] args) {
Exercise1 sprinklers = new Exercise1();
System.out.println(sprinklers);
System.out.println(sprinklers.toStringBuilder());
}
}
package com.ciaoshen.thinkinjava.chapter13;
public class WaterSource {
private String s;
public WaterSource() {
System.out.println("WaterSource()");
s = "Constructed";
}
public String toString() { return s; }
}
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
public class Exercise2{
public String toString() {
return " InfiniteRecursion address: " + super.toString() + "\n";
}
public static void main(String[] args) {
List<Exercise2> v = new ArrayList<Exercise2>();
for(int i = 0; i < 10; i++){
v.add(new Exercise2());
}
System.out.println(v);
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.io.*;
import java.util.*;
public class Exercise3{
private String name;
private Formatter f;
public Exercise3(String name, Formatter f) {
this.name = name;
this.f = f;
}
public void move(int x, int y) {
f.format("%s The Turtle is at (%d,%d)\n", name, x, y);
}
public static void main(String[] args) {
PrintStream errAlias= System.err;
Exercise3 tommy = new Exercise3("Tommy",new Formatter(System.err));
Exercise3 terry = new Exercise3("Terry",new Formatter(errAlias));
tommy.move(0,0);
terry.move(4,8);
tommy.move(3,4);
terry.move(2,5);
tommy.move(3,3);
terry.move(3,3);
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
public class Exercise4 {
private double total = 0;
private Formatter f = new Formatter(System.out);
private int[] columnsWidth={15,5,10};
public Exercise4(){}
public Exercise4(int[] width){columnsWidth=width;}
public void printTitle() {
f.format("%-"+columnsWidth[0]+"s %"+columnsWidth[1]+"s %"+columnsWidth[2]+"s\n", "Item", "Qty", "Price");
f.format("%-"+columnsWidth[0]+"s %"+columnsWidth[1]+"s %"+columnsWidth[2]+"s\n", "----", "---", "-----");
}
public void print(String name, int qty, double price) {
f.format("%-"+columnsWidth[0]+".15s %"+columnsWidth[1]+"d %"+columnsWidth[2]+".2f\n", name, qty, price);
total += price;
}
public void printTotal() {
f.format("%-"+columnsWidth[0]+"s %"+columnsWidth[1]+"s %"+columnsWidth[2]+".2f\n", "Tax", "", total*0.06);
f.format("%-"+columnsWidth[0]+"s %"+columnsWidth[1]+"s %"+columnsWidth[2]+"s\n", "", "", "-----");
f.format("%-"+columnsWidth[0]+"s %"+columnsWidth[1]+"s %"+columnsWidth[2]+".2f\n", "Total", "", total * 1.06);
}
public static void main(String[] args) {
int[] width={30,10,20};
Exercise4 receipt = new Exercise4(width);
receipt.printTitle();
receipt.print("Jack’s Magic Beans", 4, 4.25);
receipt.print("Princess Peas", 3, 5.1);
receipt.print("Three Bears Porridge", 1, 14.29);
receipt.printTotal();
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.math.*;
import java.util.*;
public class Exercise5{
public static void main(String[] args) {
Formatter f = new Formatter(System.out);
char u = 'a';
System.out.println("u = ‘a’");
f.format("s: %1$-15.15s\n", u);
// f.format("d: %d\n", u);
f.format("c: %1$-15c\n", u);
f.format("b: %1$-15.5b\n", u);
// f.format("f: %f\n", u);
// f.format("e: %e\n", u);
// f.format("x: %x\n", u);
f.format("h: %1$-15.5h\n", u);
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
public class Exercise6 {
private static final int i=100;
private static final long l=10000L;
private static final float f=10000.00f;
private static final double d=100000.00;
public String toString(){
return String.format("Int: %1$-15d Long: %2$-15d Float: %3$-15.1f Double: %4$-15.7e", i, l, f, d);
}
public static void main(String[] args){
Exercise6 ex=new Exercise6();
System.out.println(ex);
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
import java.util.regex.*;
public class Exercise7 {
private List<String> list=new ArrayList<String>();
public Exercise7(){}
public Exercise7(List<String> l){list=l;}
public void setList(List<String> l){list=l;}
public void parse(String regex){
Pattern p=Pattern.compile(regex);
Matcher m;
Formatter f=new Formatter(System.out);
for(String str:list){
m=p.matcher(str);
f.format("%1$-15.15s %2$-8.8s\n", str, m.find());
}
}
public static void main(String[] args){
Exercise7 test=new Exercise7(Arrays.asList("hello world!","Hello world!","Hello World!","Hello world.","Hello World.","HELLO WORLD."));
String regex="^[A-Z].*\\.";
test.parse(regex);
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
public class Exercise8 {
private static final String knight =
"Then, when you have found the shrubbery, you must cut " +
"down the mightiest tree in the forest... " +
"with... a herring!";
public static void split(String regex) {
Formatter f = new Formatter(System.out);
List<String> list = Arrays.asList(knight.split(regex));
for(String str : list){
f.format("%50.50s\n", str);
}
}
public static void main(String[] args){
Exercise8.split("the | you");
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
public class Exercise9{
private static final String knight =
"Then, when you have found the shrubbery, you must cut " +
"down the mightiest tree in the forest... " +
"with... a herring!";
public static void replace(String regex, String replacement){
System.out.println(knight.replaceAll(regex,replacement));
}
public static void main(String[] args){
Exercise9.replace("[aeiouAEIOU]","_");
}
}
^Java, \Breg.*, n.w\s+h(a|i)s, s?, s*, s+, s{4}, S{1}., s{0,3}.
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
import java.util.regex.*;
public class Exercise11 {
private static final String phrase = "Arline ate eight apples and one orange while Anita hadn't any";
public static boolean finding(String regex){
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(phrase);
return m.find();
}
public static void main(String[] args){
String regex = "(?i)((^[aeiou])|(\\s+[aeiou]))\\w+?[aeiou]\\b";
System.out.println(Exercise11.finding(regex));
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
import java.util.regex.*;
public class Exercise12 {
protected static final String POEM =
"Twas brillig, and the slithy toves\n" +
"Did gyre and gimble in the wabe.\n" +
"All mimsy were the borogoves,\n" +
"And the mome raths outgrabe.\n\n" +
"Beware the Jabberwock, my son,\n" +
"The jaws that bite, the claws that catch.\n" +
"Beware the Jubjub bird, and shun\n" +
"The frumious Bandersnatch.";
public static Set<String> scan(String regex){
Set<String> set = new HashSet<String>();
Matcher m = Pattern.compile(regex).matcher(POEM);
while(m.find()){
set.add(m.group(2));
}
return set;
}
public static void display(Set<String> set){
System.out.println("Word count: " + set.size());
System.out.println(set);
}
public static void main(String[] args) {
String regex = "(?m)(^|\\W)([a-z]\\w*)(\\W|$)";
Exercise12.display(Exercise12.scan(regex));
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
import java.util.regex.*;
public class Exercise13 {
public static String input = Exercise12.POEM;
private static class Display {
private boolean regexPrinted = false;
private String regex;
Display(String regex) { this.regex = regex; }
void display(String message) {
if(!regexPrinted) {
System.out.println(regex);
regexPrinted = true;
}
System.out.println(message);
}
}
static void examine(String s, String regex) {
Display d = new Display(regex);
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while(m.find()){
d.display("find() ‘" + m.group() + "‘ start = "+ m.start() + " end = " + m.end());
}
if(m.lookingAt()){ // No reset() necessary
d.display("lookingAt() start = " + m.start() + " end = " + m.end());
}
if(m.matches()){ // No reset() necessary
d.display("matches() start = " + m.start() + " end = " + m.end());
}
}
public static void main(String[] args) {
for(String in : input.split("\n")) {
System.out.println("input : " + in);
for(String regex : new String[]{"\\w*ere\\w*","\\w*are", "T\\w+", "Twas.*?"}){
examine(in, regex);
}
}
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.util.regex.*;
import java.util.*;
public class Exercise14 {
private static final String phrase = "This!!unusual use!!of exclamation!!points";
public static void split(String regex){
System.out.println(Arrays.toString(phrase.split(regex)));
}
public static void splitThreePieces(String regex){
System.out.println(Arrays.toString(phrase.split(regex,3)));
}
public static void main(String[] args) {
String regex = "!!";
Exercise14.split(regex);
Exercise14.splitThreePieces(regex);
}
}
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
import java.util.regex.*;
import java.io.*;
public class Exercise15 {
private static final String SPLITER = "\n";
private static String readFile(String path){ //return the file content as a string, otherwise return null
StringBuilder sb = new StringBuilder();
File f = new File(path);
try{
BufferedReader br = new BufferedReader(new FileReader(new File(path)));
try{
String line = null;
while(true){
line = br.readLine();
if(line == null){break;}
sb.append(line+SPLITER);
}
return sb.toString();
}finally{
br.close();
}
}catch(IOException ioe){
System.out.println("IOException when opening file " + path);
return null;
}
}
public static void grep(String regex, String path, int flag){
String content = readFile(path);
if(content == null){System.out.println("Check your file path: " + path);return;}
Pattern p = Pattern.compile(regex, flag);
Matcher m = p.matcher("");
Formatter f = new Formatter(System.out);
int index=0;
f.format("%1$5.5s %2$-15.15s %3$5.5s \n", "INDEX", "WORD", "POS");
for(String line : content.split(SPLITER)){
m.reset(line);
while(m.find()){
f.format("%1$5d: %2$-15.15s %3$5d \n",++index, m.group(), m.start());
}
}
}
private static void unitTestReadFile(){ //test readFile()
String rightPath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise15.java";
String wrongPath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise.java";
System.out.println(readFile(rightPath));
System.out.println(readFile(wrongPath));
}
private static void unitTestGrep(String regex){ //test grep()
String rightPath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise15.java";
String wrongPath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise.java";
int flag = Pattern.CASE_INSENSITIVE;
grep(regex, rightPath, flag);
grep(regex, wrongPath, flag);
}
public static void main(String[] args){
//unit tests
//Exercise15.unitTestReadFile();
//Exercise15.unitTestGrep("s\\w*");
Exercise15.unitTestGrep("(?)(^|\\W)(s\\w*)(\\W|$)");
}
}
File[] files = new File(".").listFiles();
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
import java.util.regex.*;
import java.io.*;
public class Exercise16 {
private static final String SPLITER = "\n";
private static String readFile(String path){ //return the file content as a string, otherwise return null
StringBuilder sb = new StringBuilder();
File f = new File(path);
try{
BufferedReader br = new BufferedReader(new FileReader(new File(path)));
try{
String line = null;
while(true){
line = br.readLine();
if(line == null){break;}
sb.append(line+SPLITER);
}
return sb.toString();
}finally{
br.close();
}
}catch(IOException ioe){
System.out.println("IOException when opening file " + path);
return null;
}
}
private static List<File> extracDir(String path){
File f = new File(path);
if(!f.exists()){System.out.println(path + " doesn't exist!");}
if(f.isFile()){return Arrays.asList(f);}
if(f.isDirectory()){
List<File> list = new ArrayList<File>();
File[] files = f.listFiles();
for(File file : files){
list.addAll(extracDir(file.getAbsolutePath()));
}
return list;
}
return new ArrayList<File>();
}
public static void grep(String regex, String path, int flag){
int index=0;
List<File> list = extracDir(path);
for(File file : list){
System.out.println("\n" + ">>> " + file.getAbsolutePath());
String content = readFile(file.getAbsolutePath());
if(content == null){System.out.println("Check your file path: " + path);return;}
Pattern p = Pattern.compile(regex, flag);
Matcher m = p.matcher("");
Formatter f = new Formatter(System.out);
f.format("%1$5.5s %2$-15.15s %3$5.5s \n", "INDEX", "WORD", "POS");
for(String line : content.split(SPLITER)){
m.reset(line);
while(m.find()){
f.format("%1$5d: %2$-15.15s %3$5d \n",++index, m.group(2), m.start());
}
}
}
}
private static void unitTestExtracDir(){
String wrongPath = "/Users/Wei/java/helloKitty.java";
String filePath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise16.java";
String dirPath= "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13";
System.out.println(extracDir(dirPath));
System.out.println("=================================");
System.out.println(extracDir(filePath));
System.out.println("=================================");
System.out.println(extracDir(wrongPath));
}
private static void unitTestReadFile(){ //test readFile()
String rightPath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise15.java";
String wrongPath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise.java";
System.out.println(readFile(rightPath));
System.out.println(readFile(wrongPath));
}
private static void unitTestGrep(String regex){ //test grep()
String dirPath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13";
String filePath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise15.java";
String wrongPath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise.java";
int flag = Pattern.CASE_INSENSITIVE;
grep(regex, dirPath, flag);
System.out.println("=================================");
grep(regex, filePath, flag);
System.out.println("=================================");
grep(regex, wrongPath, flag);
}
public static void main(String[] args){
//unit tests
//Exercise16.unitTestReadFile();
//Exercise16.unitTestExtracDir();
//Exercise16.unitTestGrep("s\\w*");
Exercise16.unitTestGrep("(?)(^|\\W)(s\\w*)(\\W|$)");
}
}
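It is impossible to write a perfect regular expression that finds every comment. I can only do my best and methodically cover the most common cases. This works only for code that follows the Google recommended coding style.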
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
import java.util.regex.*;
import java.io.*;
public class Exercise17 {
private static final String SPLITER = "\n";
private static boolean isInFormalAnnotation = false;
// open a file, return a string, return null otherwise
public static String readFile(String path) {
File inFile = new File(path);
if( !inFile.exists() || !inFile.isFile() ) {
System.out.println("Path ERROR! Check your path " + path);
return null;
}
StringBuilder resultString = new StringBuilder();
try {
BufferedReader buffReader = new BufferedReader( new FileReader( inFile) );
try {
String textLine = new String("");
while (true) {
textLine = buffReader.readLine();
if (textLine == null) {
break;
}
resultString.append(textLine + SPLITER);
}
} finally {
buffReader.close();
}
} catch (IOException ioe) {
System.out.println( "ERROR when reading the file " + inFile.getName() );
}
return resultString.toString();
}
// print all annotation
public static void scanAnnotation(String path){
String content = readFile(path);
if (content == null || content.isEmpty()) {
System.out.println("Method scan() cannot read file " + path);
return;
}
String simpleAnnotation = new String("");
String formalAnnotation = new String("");
for (String line : content.split(SPLITER)) {
//main procedure
simpleAnnotation = getSimpleAnnotation(line);
if(simpleAnnotation != null) {
System.out.println(simpleAnnotation);
}
formalAnnotation = getFormalAnnotation(line);
if(formalAnnotation != null) {
System.out.println(formalAnnotation);
}
}
}
// box-1.
// input: string line,
// output: return simple annotation. otherwise return null
private static String getSimpleAnnotation(String line) {
String simpleAnnotationRegex = "\\s//";
Matcher simpleAnnotationMatcher = Pattern.compile(simpleAnnotationRegex).matcher(line);
while (simpleAnnotationMatcher.find()) {
if (!isInStringQuote(line.substring(0, simpleAnnotationMatcher.start()))) {
return line.substring(simpleAnnotationMatcher.start());
}
}
return null;
}
// box-2.
// input: string line,
// output: return formal annotation. otherwise return null
private static String getFormalAnnotation(String line) {
String formalBeginRegex = "/\\*";
String formalEndRegex = "\\*/";
if (!isInFormalAnnotation) {
Matcher formalBeginMatcher = Pattern.compile(formalBeginRegex).matcher(line);
while (formalBeginMatcher.find()) {
if (!isInStringQuote(line.substring(0,formalBeginMatcher.start()))) {
isInFormalAnnotation = true;
String subLine = line.substring(formalBeginMatcher.start());
Matcher formalEndMatcher = Pattern.compile(formalEndRegex).matcher(subLine);
if (formalEndMatcher.find()) {
isInFormalAnnotation = false;
return subLine.substring(0,formalEndMatcher.end());
}
return subLine;
}
}
} else {
Matcher formalEndMatcher = Pattern.compile(formalEndRegex).matcher(line);
if (formalEndMatcher.find()) {
isInFormalAnnotation = false;
}
return line;
}
return null;
}
/*
* box-3
* input: prefix of annotation line
* output: boolean. if this prefix text is in string quote.
*/
private static boolean isInStringQuote(String prefix) {
String doubleQuoteRegex = "[^\\\\]\".*?[^\\\\]\"|\"\"";
Matcher doubleQuoteMatcher = Pattern.compile(doubleQuoteRegex).matcher(prefix);
if(! doubleQuoteMatcher.find()) {
String singleQuoteRegex = "[^\\\\]\"";
Matcher singleQuoteMatcher = Pattern.compile(singleQuoteRegex).matcher(prefix);
return singleQuoteMatcher.find();
} else {
return isInStringQuote(prefix.replaceFirst(doubleQuoteRegex, ""));
}
}
private static void testUnitScanAnnotation() {
String wrongPath = "/Users/helloKitty/java/com/ciaoshen/thinkinjava/chapter13/Exercise17.java";
String rightPath= "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise17.java";
Exercise17.scanAnnotation(rightPath);
Exercise17.scanAnnotation(wrongPath);
}
private static void testIsInStringQuote(String prefix) {
System.out.println(isInStringQuote(prefix));
}
public static void main(String[] args) {
String testPatternString = "a" + "b //fake comment" + "c" + "d"; /* a deliberately tricky case for the comment-matching pattern */
String testPatternString2 = "\"a\" + \"b //fake comment";
Exercise17.testUnitScanAnnotation();
//Exercise17.testIsInStringQuote(testPatternString);
//Exercise17.testIsInStringQuote(testPatternString2);
}
}
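The `isInStringQuote()` check above decides whether a comment marker is really inside a string by repeatedly deleting complete quoted pairs with a regex. As a rough alternative sketch, the same question can be answered with one left-to-right scan that toggles a flag on every unescaped double quote. The class and method names here (`QuoteState`, `endsInsideString`) are made up for illustration and are not part of Exercise17; the sketch also assumes the prefix contains no char literal such as `'"'`.

```java
// Hypothetical helper, not part of Exercise17.
// Assumption: the prefix contains no char literals like '"'.
final class QuoteState {
    private QuoteState() {}

    // Returns true if the end of `prefix` lies inside an unterminated
    // double-quoted string literal.
    static boolean endsInsideString(String prefix) {
        boolean inside = false;
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            if (c == '\\' && inside) {
                i++; // skip the escaped character, e.g. \" or \\
            } else if (c == '"') {
                inside = !inside; // an unescaped quote opens or closes a literal
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // The "//" that would follow this prefix sits inside a string.
        System.out.println(endsInsideString("String s = \"a\" + \"b "));
        // Here every literal is closed, so a following "//" starts a real comment.
        System.out.println(endsInsideString("String s = \"a\" + \"b\"; "));
    }
}
```

A scan like this needs no recursion and does not care whether the quote is the first character of the line, which is a case the regex approach has to handle explicitly.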
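This exercise reuses part of the code from Exercise 17. Quoted text that sits inside a comment does not count as a string literal, so the comments are filtered out first (with the Exercise 17 code), and only then are string literals matched. The code is therefore built as two independent filter layers: one removes comments, the other removes string literals. Exercise 19 below reuses both filters, from Exercises 17 and 18, to clear away anything that could interfere.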
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
import java.util.regex.*;
import java.io.*;
public class Exercise18 {
private static final String SPLITER = "\n";
private static boolean isInFormalAnnotation = false;
// read the whole file into one string; return null if the path is not an existing regular file
public static String readFile(String path) {
File inFile = new File(path);
if( !inFile.exists() || !inFile.isFile() ) {
System.out.println("Path ERROR! Check your path " + path);
return null;
}
StringBuilder resultString = new StringBuilder();
try {
BufferedReader buffReader = new BufferedReader( new FileReader( inFile) );
try {
String textLine = new String("");
while (true) {
textLine = buffReader.readLine();
if (textLine == null) {
break;
}
resultString.append(textLine + SPLITER);
}
} finally {
buffReader.close();
}
} catch (IOException ioe) {
System.out.println( "ERROR when reading the file " + inFile.getName() );
}
return resultString.toString();
}
// erase all comments (line comments and block comments) from the file; return null if it cannot be read
public static String eraseAnnotation(String path){
StringBuilder resultBuilder = new StringBuilder();
String content = readFile(path);
if (content == null || content.isEmpty()) {
System.out.println("Method eraseAnnotation() cannot read file " + path);
return null;
}
String noSimpleAnnotation = new String("");
String noFormalAnnotation = new String("");
for (String line : content.split(SPLITER)) {
//main procedure
noSimpleAnnotation = eraseSimpleAnnotation(line);
noFormalAnnotation= eraseFormalAnnotation(noSimpleAnnotation);
if (noFormalAnnotation != null) {
System.out.println(noFormalAnnotation);
resultBuilder.append(noFormalAnnotation + SPLITER);
}
}
return resultBuilder.toString();
}
// box-1.
// input: one line of source text
// output: the line with its trailing line comment (//) removed
private static String eraseSimpleAnnotation(String line) {
String simpleAnnotationRegex = "\\s//";
Matcher simpleAnnotationMatcher = Pattern.compile(simpleAnnotationRegex).matcher(line);
while (simpleAnnotationMatcher.find()) {
if (!isInStringQuote(line.substring(0, simpleAnnotationMatcher.start()))) {
return line.substring(0, simpleAnnotationMatcher.start());
}
}
return line;
}
// box-2.
// input: one line of source text
// output: the line with its block comment (/* ... */) removed; null if the whole line is inside a block comment
private static String eraseFormalAnnotation(String line) {
String formalBeginRegex = "/\\*";
String formalEndRegex = "\\*/";
if (!isInFormalAnnotation) {
Matcher formalBeginMatcher = Pattern.compile(formalBeginRegex).matcher(line);
while (formalBeginMatcher.find()) {
if (!isInStringQuote(line.substring(0,formalBeginMatcher.start()))) {
isInFormalAnnotation = true;
String subLine = line.substring(formalBeginMatcher.start());
Matcher formalEndMatcher = Pattern.compile(formalEndRegex).matcher(subLine);
if (formalEndMatcher.find()) {
isInFormalAnnotation = false;
return line.replace(subLine.substring(0,formalEndMatcher.end()),"");
}
return line.replace(subLine,"");
}
}
} else {
Matcher formalEndMatcher = Pattern.compile(formalEndRegex).matcher(line);
if (formalEndMatcher.find()) {
isInFormalAnnotation = false;
}
return null;
}
return line;
}
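/*
* box-3
* input: the text that precedes a comment marker on the same line
* output: true if that prefix ends inside an unterminated string literal,
*         i.e. the comment marker is really part of a string.
*/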
private static boolean isInStringQuote(String prefix) {
String doubleQuoteRegex = "[^\\\\]\".*?[^\\\\]\"";
Matcher doubleQuoteMatcher = Pattern.compile(doubleQuoteRegex).matcher(prefix);
if(! doubleQuoteMatcher.find()) {
String singleQuoteRegex = "(^|[^\\\\])\""; // also catches an unescaped quote at the very start of the prefix
Matcher singleQuoteMatcher = Pattern.compile(singleQuoteRegex).matcher(prefix);
return singleQuoteMatcher.find();
} else {
return isInStringQuote(prefix.replaceFirst(doubleQuoteRegex, ""));
}
}
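// box-4
// input: the whole file content with comments already removed; may be null
// output: the content with every double-quoted string literal erased, or null if there is no content
private static String filterLiteralString(String content) {
if (content == null || content.isEmpty()) { // eraseAnnotation() returns null for an unreadable path
System.out.println("filterLiteralString() got no content!");
return null;
}
StringBuilder resultBuilder = new StringBuilder();
for (String line : content.split(SPLITER)) {
resultBuilder.append(eraseLiteralString(line) + SPLITER);
}
return resultBuilder.toString();
}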
// box-5
// input: one line of source text
// output: the line with every double-quoted string literal erased
private static String eraseLiteralString(String line) {
String doubleQuoteRegex = "[^\\\\]\".*?[^\\\\]\"|\"\"";
return line.replaceAll(doubleQuoteRegex,"");
}
private static void start(String path) {
String noAnnotation = eraseAnnotation(path);
String noLiteral = filterLiteralString(noAnnotation);
System.out.println(noLiteral);
}
private static void testUnitEraseAnnotation() {
String wrongPath = "/Users/helloKitty/java/com/ciaoshen/thinkinjava/chapter13/Exercise18.java";
String rightPath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise18.java";
Exercise18.eraseAnnotation(rightPath);
Exercise18.eraseAnnotation(wrongPath);
}
private static void testIsInStringQuote(String prefix) {
System.out.println(isInStringQuote(prefix));
}
private static void testUnitEraseLiteralString(String phrase) {
System.out.println(eraseLiteralString(phrase));
}
private static void testUnitFilterLiteralString(String content) {
System.out.println(filterLiteralString(content));
}
private static void testUnitStart() {
String wrongPath = "/Users/helloKitty/java/com/ciaoshen/thinkinjava/chapter13/Exercise18.java";
String rightPath= "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise18.java";
Exercise18.start(rightPath);
Exercise18.start(wrongPath);
}
public static void main(String[] args) {
String testPatternString = "a" + "b //fake comment" + "c" + "d"; /* a deliberately tricky case for the comment-matching pattern */
String testPatternString2 = "\"a\" + \"b //fake comment";
//Exercise18.testUnitEraseAnnotation();
//Exercise18.testIsInStringQuote(testPatternString);
//Exercise18.testIsInStringQuote(testPatternString2);
//Exercise18.testUnitEraseLiteralString(testPatternString);
//Exercise18.testUnitEraseLiteralString(testPatternString2);
Exercise18.testUnitStart();
}
}
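The second filter layer depends entirely on how well the string-literal regex copes with escaped quotes. As a hedged sketch (not the pattern used in Exercise18), a commonly used form reads a literal as an opening quote, then any run of "backslash plus one character" or "anything that is neither a quote nor a backslash", then the closing quote. The class name `LiteralScanner` and its two methods are hypothetical, and the sketch assumes comments have already been removed from the line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Exercise18.
// Assumption: the input line no longer contains comments.
final class LiteralScanner {
    // Regex text: "(?:\\.|[^"\\])*"
    // An escaped character (\", \\, \n ...) is consumed as a pair, so an
    // escaped quote can never be mistaken for the end of the literal.
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");

    private LiteralScanner() {}

    // Collects every string literal on the line, quotes included.
    static List<String> findLiterals(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Returns the line with every string literal removed.
    static String eraseLiterals(String line) {
        return STRING_LITERAL.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        String line = "x = \"a\" + \"b \\\"q\\\" //c\" + \"\";";
        System.out.println(findLiterals(line));
        System.out.println(eraseLiterals(line));
    }
}
```

Because the pattern never looks at the character before the opening quote, it behaves the same whether the literal starts the line, follows whitespace, or follows an operator.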
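Building on the results of Exercises 17 and 18, the first step is to strip out the noise: comments and string literals. For what remains, the Google Java Style Guide says class names are written in UpperCamelCase (like ClassName), so it is enough to look for words that start with an uppercase letter.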
package com.ciaoshen.thinkinjava.chapter13;
import java.util.*;
import java.util.regex.*;
import java.io.*;
public class Exercise19 {
private static final String SPLITER = "\n";
private static boolean isInFormalAnnotation = false;
// read the whole file into one string; return null if the path is not an existing regular file
public static String readFile(String path) {
File inFile = new File(path);
if( !inFile.exists() || !inFile.isFile() ) {
System.out.println("Path ERROR! Check your path " + path);
return null;
}
StringBuilder resultString = new StringBuilder();
try {
BufferedReader buffReader = new BufferedReader( new FileReader( inFile) );
try {
String textLine = new String("");
while (true) {
textLine = buffReader.readLine();
if (textLine == null) {
break;
}
resultString.append(textLine + SPLITER);
}
} finally {
buffReader.close();
}
} catch (IOException ioe) {
System.out.println( "ERROR when reading the file " + inFile.getName() );
}
return resultString.toString();
}
// erase all comments (line comments and block comments) from the file; return null if it cannot be read
public static String filterAnnotation(String path){
StringBuilder resultBuilder = new StringBuilder();
String content = readFile(path);
if (content == null || content.isEmpty()) {
System.out.println("Method filterAnnotation() cannot read file " + path);
return null;
}
String noSimpleAnnotation = new String("");
String noFormalAnnotation = new String("");
for (String line : content.split(SPLITER)) {
//main procedure
noSimpleAnnotation = eraseSimpleAnnotation(line);
noFormalAnnotation= eraseFormalAnnotation(noSimpleAnnotation);
if (noFormalAnnotation != null) {
//System.out.println(noFormalAnnotation);
resultBuilder.append(noFormalAnnotation + SPLITER);
}
}
return resultBuilder.toString();
}
// box-1.
// input: one line of source text
// output: the line with its trailing line comment (//) removed
private static String eraseSimpleAnnotation(String line) {
String simpleAnnotationRegex = "\\s//";
Matcher simpleAnnotationMatcher = Pattern.compile(simpleAnnotationRegex).matcher(line);
while (simpleAnnotationMatcher.find()) {
if (!isInStringQuote(line.substring(0, simpleAnnotationMatcher.start()))) {
return line.substring(0, simpleAnnotationMatcher.start());
}
}
return line;
}
// box-2.
// input: one line of source text
// output: the line with its block comment (/* ... */) removed; null if the whole line is inside a block comment
private static String eraseFormalAnnotation(String line) {
String formalBeginRegex = "/\\*";
String formalEndRegex = "\\*/";
if (!isInFormalAnnotation) {
Matcher formalBeginMatcher = Pattern.compile(formalBeginRegex).matcher(line);
while (formalBeginMatcher.find()) {
if (!isInStringQuote(line.substring(0,formalBeginMatcher.start()))) {
isInFormalAnnotation = true;
String subLine = line.substring(formalBeginMatcher.start());
Matcher formalEndMatcher = Pattern.compile(formalEndRegex).matcher(subLine);
if (formalEndMatcher.find()) {
isInFormalAnnotation = false;
return line.replace(subLine.substring(0,formalEndMatcher.end()),"");
}
return line.replace(subLine,"");
}
}
} else {
Matcher formalEndMatcher = Pattern.compile(formalEndRegex).matcher(line);
if (formalEndMatcher.find()) {
isInFormalAnnotation = false;
}
return null;
}
return line;
}
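/*
* box-3
* input: the text that precedes a comment marker on the same line
* output: true if that prefix ends inside an unterminated string literal,
*         i.e. the comment marker is really part of a string.
*/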
private static boolean isInStringQuote(String prefix) {
String doubleQuoteRegex = "(^|\\s*)\".*?[^\\\\]\"|\"\"";
Matcher doubleQuoteMatcher = Pattern.compile(doubleQuoteRegex).matcher(prefix);
if(! doubleQuoteMatcher.find()) {
String singleQuoteRegex = "(^|[^\\\\])\""; // also catches an unescaped quote at the very start of the prefix
Matcher singleQuoteMatcher = Pattern.compile(singleQuoteRegex).matcher(prefix);
return singleQuoteMatcher.find();
} else {
return isInStringQuote(prefix.replaceFirst(doubleQuoteRegex, ""));
}
}
// box-4
// input: the whole file content with comments already removed; may be null
// output: the content with every double-quoted string literal erased, or null if there is no content
private static String filterLiteralString(String content) {
if (content == null || content.isEmpty()) {
System.out.println("filterLiteralString() got no content!");
return null;
}
StringBuilder resultBuilder = new StringBuilder();
for (String line : content.split(SPLITER)) {
resultBuilder.append(eraseLiteralString(line) + SPLITER);
}
return resultBuilder.toString();
}
// box-5
// input: one line of source text
// output: the line with every double-quoted string literal erased
private static String eraseLiteralString(String line) {
String doubleQuoteRegex = "(^|\\s*)\".*?[^\\\\]\"|\"\"";
Matcher doubleQuoteMatcher = Pattern.compile(doubleQuoteRegex).matcher(line);
while (doubleQuoteMatcher.find()) {
line = line.replace(doubleQuoteMatcher.group(),"");
}
return line;
}
private static String filterEmptyLine(String content) { // remove blank lines
if (content == null || content.isEmpty()) {
System.out.println("filterEmptyLine() got no content!");
return null;
}
String emptyRegex = "(?m)^\\s*$";
StringBuilder resultBuilder = new StringBuilder();
Matcher emptyMatcher = Pattern.compile(emptyRegex).matcher("");
for (String line : content.split(SPLITER)) {
emptyMatcher = emptyMatcher.reset(line);
if (!emptyMatcher.find()) {
resultBuilder.append(line + SPLITER);
}
}
return resultBuilder.toString();
}
private static Set<String> segmentWords(String content) {
if (content == null || content.isEmpty()) {
System.out.println("segmentWords() got no content!");
return null;
}
Set<String> wordsFound = new HashSet<String>();
String wordsRegex = "\\W([A-Z]\\w*)";
Matcher wordsMatcher = Pattern.compile(wordsRegex).matcher("");
for (String line : content.split(SPLITER)) {
wordsMatcher = wordsMatcher.reset(line);
while (wordsMatcher.find()) {
wordsFound.add(wordsMatcher.group(1));
}
}
return wordsFound;
}
private static void start(String path) {
String noAnnotation = filterAnnotation(path);
String noLiteral = filterLiteralString(noAnnotation);
String meaningfulCode = filterEmptyLine(noLiteral);
Set<String> className = segmentWords(meaningfulCode);
System.out.println(className);
}
private static void testUnitFilterAnnotation() {
String wrongPath = "/Users/helloKitty/java/com/ciaoshen/thinkinjava/chapter13/Exercise19.java";
String rightPath = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise19.java";
Exercise19.filterAnnotation(rightPath);
Exercise19.filterAnnotation(wrongPath);
}
private static void testIsInStringQuote(String prefix) {
System.out.println(isInStringQuote(prefix));
}
private static void testUnitEraseLiteralString(String phrase) {
System.out.println(eraseLiteralString(phrase));
}
private static void testUnitFilterLiteralString(String content) {
System.out.println(filterLiteralString(content));
}
private static void testUnitStart() {
String wrongPath = "/Users/helloKitty/java/com/ciaoshen/thinkinjava/chapter13/Exercise19.java";
String rightPath= "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/Exercise19.java";
Exercise19.start(rightPath);
Exercise19.start(wrongPath);
}
private static void testUnitDoubleQuotePattern(String regex) {
String path = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/quote.txt";
Matcher doubleQuoteMatcher = Pattern.compile(regex).matcher("");
String content = readFile(path);
for(String line : content.split(SPLITER)) {
doubleQuoteMatcher = doubleQuoteMatcher.reset(line);
System.out.println("FOUND");
while (doubleQuoteMatcher.find()) {
System.out.println(" >>>" + doubleQuoteMatcher.group());
}
}
}
private static void testUnitDoubleQuoteReplace(String regex) {
String path = "/Users/Wei/java/com/ciaoshen/thinkinjava/chapter13/quote.txt";
Matcher doubleQuoteMatcher = Pattern.compile(regex).matcher("");
String content = readFile(path);
for(String line : content.split(SPLITER)) {
doubleQuoteMatcher = doubleQuoteMatcher.reset(line);
System.out.println("FOUND");
while (doubleQuoteMatcher.find()) {
System.out.println(" >>>" + doubleQuoteMatcher.group());
line = line.replace(doubleQuoteMatcher.group(),"");
}
System.out.println(line);
}
}
public static void main(String[] args) {
String testPatternString = "a" + "b //fake comment" + "c" + "d"; /* a deliberately tricky case for the comment-matching pattern */
String testPatternString2 = "\"a\" + \"b //fake comment";
//Exercise19.testUnitFilterAnnotation();
//Exercise19.testIsInStringQuote(testPatternString);
//Exercise19.testIsInStringQuote(testPatternString2);
//Exercise19.testUnitEraseLiteralString(testPatternString);
//Exercise19.testUnitEraseLiteralString(testPatternString2);
Exercise19.testUnitStart();
String doubleQuoteRegex1 = "\".*\"";
String doubleQuoteRegex2 = "\".*?\"";
String doubleQuoteRegex3 = "[^\\\\]\".*?[^\\\\]\"";
String doubleQuoteRegex4 = "[^\\\\]\".*?[^\\\\]\"|^\".*?[^\\\\]\"|\"\"";
String doubleQuoteRegex5 = "(^|\\s*)\".*?[^\\\\]\"|\"\""; // this one works best: it is not thrown off by escape characters
//Exercise19.testUnitDoubleQuotePattern(doubleQuoteRegex4);
//Exercise19.testUnitDoubleQuoteReplace(doubleQuoteRegex5);
}
}
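The last step, picking out words that start with an uppercase letter, can be sketched on its own. The version below is a minimal illustration, not the code from Exercise19: the class name `CapitalizedWords` is hypothetical, and it uses a word boundary `\b` rather than a preceding non-word character, so a name that is the very first thing on a line is found too. It shares the heuristic's known weakness: anything capitalized, including an all-caps constant like `SPLITER`, is reported as a class-name candidate.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Exercise19.
// Assumption: comments and string literals were already removed from `code`.
final class CapitalizedWords {
    // \b ensures the uppercase letter starts a word, so the "S" inside
    // "toString" is not reported as the start of a name.
    private static final Pattern CAPITALIZED = Pattern.compile("\\b[A-Z]\\w*");

    private CapitalizedWords() {}

    // Returns every distinct capitalized word, sorted for stable output.
    static Set<String> find(String code) {
        Set<String> found = new TreeSet<>();
        Matcher m = CAPITALIZED.matcher(code);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String code = "String s = new StringBuilder().toString();\nSet<String> x;";
        System.out.println(find(code));
    }
}
```

A `TreeSet` is used here only to make the printed result deterministic; a `HashSet`, as in the exercise, works just as well when order does not matter.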