Here is my latest regular expression, as a Perl one-liner. I think it matches the spirit of the request in the original post.
Note: this isn't as easy as you think. You need to code up the complete set of HTML rules in your implementation, and you need to allow for a certain amount of malformed HTML.
perl -0777 -pe 's^<[sS][pP][aA][nN]\s+class="B01-K-ITAL"\s*>(.*?)</[sS][pP][aA][nN]>^<em>$1</em>^gs' i.html >| o.html
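For readability, here is the same substitution written out as a small Perl script. It uses the /x modifier so each piece of the pattern can carry a comment. This is a sketch: the sub name span_to_em, the script name span_to_em.pl and the demo strings are my own and are not part of the one-liner. Run with a file name (perl span_to_em.pl i.html > o.html), it reads each file whole, as the slurp switch does in the one-liner. Run with no arguments, it converts two of the test cases below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace <span class="B01-K-ITAL">...</span> with <em>...</em>.
# Same pattern as the one-liner, spread out with /x so it can be commented.
sub span_to_em {
    my ($html) = @_;
    $html =~ s{
        < [sS][pP][aA][nN]      # opening tag name in any letter case
        \s+                     # one or more whitespace chars, newlines included
        class="B01-K-ITAL"      # exact class value, matched case-sensitively
        \s*                     # optional whitespace before the closing bracket (case #5)
        >
        (.*?)                   # content, non-greedy: stop at the first closing tag
        </ [sS][pP][aA][nN] >   # closing tag in any letter case
    }{<em>$1</em>}gsx;          # g: every match, s: dot also matches newline, x: allow comments
    return $html;
}

if (@ARGV) {
    # Filter mode: read each named file whole and print the converted text.
    local $/;                   # undefined record separator = slurp mode
    while (my $doc = <>) {
        print span_to_em($doc);
    }
} else {
    # Demo mode: two of the test cases from the input text.
    print span_to_em(qq{<span\nclass="B01-K-ITAL"\n>#5\nsplit after the class tag\n</span>\n});
    print span_to_em(qq{<sPan class="B01-K-ITAL">#6 mixed case tag</Span>\n});
}
```

Spelling the tag name as [sS][pP][aA][nN] instead of using /i keeps the class value itself case-sensitive.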
Input text:
<html>
<head>...</head>
<body>
I'd like to be able to change something like this:
<span class="B01-K-ITAL">#1 one line</span>
I want to replace the open and close tags without changing or interfering with the text between the tags. I'm pretty sure I should use wildcards, but I can't figure out how to use them properly.
<p>note, this isn't as easy as you think. You need to code up the complete set of html rules in you implementation. You need to allow for a certain amount of mal-formed html.</p>
<span class="B01-K-ITAL">#2 don't be greedy</span>
<span class="B01-K-ITAL">$3
multiline text
</span>
<span
class="B01-K-ITAL">#4
multiline tag. I believe html allow a carriage return in white space of tags
</span>
<span
class="B01-K-ITAL"
>#5
split after the class tag. optional white space
</span>
<sPan class="B01-K-ITAL">#6 mixed case tag</Span>
<p>no text #7</p>
<span class="B01-K-ITAL"></span>
<!-- Apparently, this is valid
http://www.positioniseverything.net/articles/cc-plus.html -->
<!--[if IE]>
<div id="IEroot">
<![endif]-->
<p id="IE">This browser is IE.</p>
<p id="notIE">This browser is not IE.</p>
<!--[if IE]>
</div>
<![endif]-->
</body> </html>
Output text:
<html>
<head>...</head>
<body>
I'd like to be able to change something like this:
<em>#1 one line</em>
I want to replace the open and close tags without changing or interfering with the text between the tags. I'm pretty sure I should use wildcards, but I can't figure out how to use them properly.
<p>note, this isn't as easy as you think. You need to code up the complete set of html rules in you implementation. You need to allow for a certain amount of mal-formed html.</p>
<em>#2 don't be greedy</em>
<em>$3
multiline text
</em>
<em>#4
multiline tag. I believe html allow a carriage return in white space of tags
</em>
<em>#5
split after the class tag. optional white space
</em>
<em>#6 mixed case tag</em>
<p>no text #7</p>
<em></em>
<!-- Apparently, this is valid
http://www.positioniseverything.net/articles/cc-plus.html -->
<!--[if IE]>
<div id="IEroot">
<![endif]-->
<p id="IE">This browser is IE.</p>
<p id="notIE">This browser is not IE.</p>
<!--[if IE]>
</div>
<![endif]-->
</body> </html>
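The note at the top warns that a regex only knows the HTML rules you build into it. One case the test file above does not exercise is a nested span. Because (.*?) is non-greedy, it stops at the first closing tag it finds, and that tag belongs to the inner span. Here is a minimal check, using a made-up input string that is not part of the original test file:

```perl
use strict;
use warnings;

# Made-up input: a B01-K-ITAL span that contains another span.
my $nested = '<span class="B01-K-ITAL">outer <span class="note">inner</span> tail</span>';

# Same pattern as the one-liner, applied to a copy of the string.
(my $out = $nested) =~
    s{<[sS][pP][aA][nN]\s+class="B01-K-ITAL"\s*>(.*?)</[sS][pP][aA][nN]>}{<em>$1</em>}gs;

# Prints: <em>outer <span class="note">inner</em> tail</span>
print "$out\n";
```

The inner span ends up closed by </em>, and the original closing tag is left dangling. Fixing this would mean tracking nesting depth, which is exactly the kind of HTML rule the note at the top says you would have to code up.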