I am trying to remove all HTML tags from a document using the substitute() method of the Perl5Util class.
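import org.apache.oro.text.perl.Perl5Util;

// doc is my HTML document (a String)
Perl5Util perlUtil = new Perl5Util();
// "new" is a reserved word in Java, so the result needs another name
String stripped = perlUtil.substitute("s/<.*>//", doc);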
The problem, and I can't get around it, is that this regular expression removes everything from the document. The match starts at the first '<' in the document and ends at the last '>', so everything between those two characters is deleted. I want it to match each tag individually instead. Can this be done?
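For comparison, here is a minimal sketch of per-tag matching. It uses the JDK's java.util.regex instead of Perl5Util, swapped in only so the example runs without the Jakarta ORO jar. The class name StripTags and the method stripTags are hypothetical. The key change is the pattern `<[^>]*>`: `[^>]*` cannot run past a '>', so each match ends at the close of a single tag. In Perl5Util syntax this would presumably be `s/<[^>]*>//g`, assuming ORO honours the `g` modifier the way Perl does (without it, only the first match is replaced).

```java
import java.util.regex.Pattern;

public class StripTags {
    // '<', then any run of characters other than '>', then '>'.
    // [^>] also matches newlines, so tags split across lines are still removed.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // replaceAll removes every match, like the /g modifier in Perl.
    static String stripTags(String doc) {
        return TAG.matcher(doc).replaceAll("");
    }

    public static void main(String[] args) {
        String doc = "<p>Hello <b>world</b></p>";
        // Greedy: a single match runs from the first '<' to the last '>'.
        System.out.println(doc.replaceAll("<.*>", ""));  // prints an empty line
        // Reluctant .*? stops at the first '>' it can.
        System.out.println(doc.replaceAll("<.*?>", "")); // prints "Hello world"
        System.out.println(stripTags(doc));              // prints "Hello world"
    }
}
```

The negated class `[^>]*` is used in stripTags rather than `.*?` because `.` does not match line terminators by default in java.util.regex, so `.*?` would miss a tag whose attributes span several lines. Either way, a regex is only an approximation: a '>' inside an attribute value, a comment, or a script block will still confuse it, and an HTML parser handles those cases more robustly.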