If I have a string like
<p> </p>
<p></p>
<p class="a"><br /></p>
<p class="b"> </p>
<p>blah blah blah this is some real content</p>
<p> </p>
<p></p>
<p class="a"><br /></p>
how can I turn it into just
<p>blah blah blah this is some real content</p>
It needs to pick up nbsps as well as regular spaces.
-
This regex will work against your example:
<p[^>]*>(?:\s+|(?:&nbsp;)+|(?:<br\s*/?>)+)*</p>
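To make that concrete, here is a minimal sketch of the pattern applied with preg_replace. It assumes the blank paragraphs in the question contain the &nbsp; entity or a plain space; the variable names and the final trim() are illustrative additions, not part of the answer.

```php
<?php
// Pattern from the answer above, wrapped in # delimiters so the "/" in
// "</p>" needs no escaping.
$pattern = '#<p[^>]*>(?:\s+|(?:&nbsp;)+|(?:<br\s*/?>)+)*</p>#';

// Sample input modelled on the question (assumed to use the &nbsp; entity).
$input = <<<'HTML'
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
<p class="b"> </p>
<p>blah blah blah this is some real content</p>
<p>&nbsp;</p>
<p></p>
<p class="a"><br /></p>
HTML;

// Remove the empty paragraphs, then trim the newlines they leave behind.
$result = trim(preg_replace($pattern, '', $input));

echo $result, PHP_EOL;
```

Note that the regex only deletes the tags themselves; the line breaks between them stay in the string, which is why the sketch trims the result.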
-
$result = preg_replace('#<p[^>]*>(\s|&nbsp;?)*</p>#', '', $input);
This doesn't catch literal nbsp characters (as opposed to the &nbsp; entity) in the markup, but those are very rare to see.
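If literal no-break spaces do turn up, one possible variant is sketched below. It assumes the input is valid UTF-8 and PHP 7 or later (for the "\u{...}" escape); the pattern is an illustrative adaptation, not the one from the answer.

```php
<?php
// "\u{00A0}" (PHP 7+) is a literal no-break space, as opposed to the
// six-character entity "&nbsp;".
$nbsp = "\u{00A0}";
$input = "<p>{$nbsp}</p>\n<p>&nbsp;</p>\n<p>real content</p>";

// The u modifier makes PCRE treat pattern and subject as UTF-8, so
// \x{00A0} matches the no-break space as a single character.
$pattern = '#<p[^>]*>(?:\s|&nbsp;|\x{00A0})*</p>#u';

// Strip the empty paragraphs, then trim the leftover newlines.
$result = trim(preg_replace($pattern, '', $input));

echo $result, PHP_EOL;
```

With the u modifier, preg_replace returns null when the subject is not valid UTF-8, so real code would want to check for that before using the result.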
Since you're dealing with HTML, if this is user input I'd suggest using HTML Purifier, which also deals with XSS vulnerabilities. The configuration setting that removes empty p tags is %AutoFormat.RemoveEmpty.
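A rough sketch of that HTML Purifier setup follows. It assumes the library is installed and reachable via its bundled HTMLPurifier.auto.php loader (the path is a placeholder), and that $input holds the untrusted markup; AutoFormat.RemoveEmpty.RemoveNbsp is an extra setting not mentioned above, included on the assumption that paragraphs containing only nbsps should go too.

```php
<?php
// Placeholder path: adjust to wherever HTML Purifier lives in your project.
require_once 'HTMLPurifier.auto.php';

// Hypothetical sample input for illustration.
$input = '<p>&nbsp;</p><p></p><p>blah blah blah this is some real content</p>';

$config = HTMLPurifier_Config::createDefault();
// The setting named in the answer: drop elements with no content.
$config->set('AutoFormat.RemoveEmpty', true);
// Assumption: also treat elements holding only no-break spaces as empty.
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);

$purifier = new HTMLPurifier($config);
$clean = $purifier->purify($input);

echo $clean, PHP_EOL;
```

Whether a paragraph that holds only a br tag counts as empty depends on the library's own rules, so it is worth checking the output against your real data.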
-
As the original replier stated, regex isn't the best solution here; what you want is some sort of HTML stripper.
The function described on this site should help you out: http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page
You'll just need a bit of string manipulation afterwards to get the newlines and so on back into the format you want.