What does this Regular Expression do#1
$pee = preg_replace( '|<p>|', "$1<p>", $pee );

This regular expression is from the WordPress source code (formatting.php, wpautop function). I'm not sure what it does; can anyone help?

Actually, I'm trying to port this function to Python. If anyone knows of an existing port, that would be much better, as I'm really bad with regex.

posted date: 2008-12-08 08:08:00

Re: What does this Regular Expression do#2
posted date: 2008-12-08 08:08:01

Re: What does this Regular Expression do#3
...? Actually, it looks like this takes the first <p> tag and prepends the previous regular expression's first match to it (since there's no match in this one). However, it seems that this behavior is bad to say the least, as there's no guarantee that preg_* functions won't clobber $1 with their own values.

Edit: Judging from Jay's comment, this regex actually does nothing.

posted date: 2008-12-08 08:15:00

Re: What does this Regular Expression do#4
It replaces each match of the pattern "|<p>|" with the string "$1<p>". The | in the pattern causes the regex engine to match either the part on the left side or the part on the right side. I don't get why it's used that way, because usually it's for something like "ta(b|p)e"... As for the $1, I guess the variable $1 is set in the PHP code and is substituted during the preg_replace, so if $1 = "test"; the replacement will change "<p>" to "test<p>". But I am not sure about the $1.

posted date: 2008-12-08 08:16:00

Re: What does this Regular Expression do#5
The preg_replace() function - somewhat confusingly - allows you to use other delimiters besides the standard "/" for regular expressions, so "|<p>|" would be a regular expression just matching "<p>" in the text. However, I'm not clear on what the replacement parameter of "$1<p>" would be doing, since there's no grouping to map to $1. It would seem that, as given, this is just replacing a paragraph tag with an empty string followed by a paragraph tag, and in effect doing nothing. Anyone with more in-depth knowledge of PHP quirks have a better analysis?
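For the Python port, here is a minimal sketch of what that reading implies. It assumes, as suggested above, that PHP substitutes an empty string for the nonexistent group $1, so the line is a no-op. The function names php_style_noop and literal_translation are made up for illustration. One porting caveat: Python's re module does not silently accept a reference to a missing group, so a character-for-character translation of the replacement string fails.

```python
import re


def php_style_noop(text):
    # The PHP pattern '|<p>|' uses | as delimiters, so the regex itself is just <p>.
    # Assumption: PHP replaces the missing group $1 with an empty string,
    # which makes the replacement "$1<p>" equivalent to plain "<p>".
    return re.sub(r"<p>", "<p>", text)


def literal_translation(text):
    # Translating "$1<p>" literally to r"\1<p>" is rejected by Python,
    # because the pattern defines no group 1.
    return re.sub(r"<p>", r"\1<p>", text)


sample = "<p>first</p>\n<p>second</p>"
unchanged = php_style_noop(sample)

try:
    literal_translation(sample)
    literal_failed = False
except re.error:
    # Python reports an invalid group reference instead of using an empty string.
    literal_failed = True
```

If the no-op reading is right, the simplest port of this particular line is to leave it out.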

posted date: 2008-12-08 08:31:00

Re: What does this Regular Expression do#6
posted date: 2008-12-08 08:34:00

Re: What does this Regular Expression do#7
They do indeed. In fact, there's a comment in the code: // don't pee all over a tag

posted date: 2008-12-08 08:37:00

Re: What does this Regular Expression do#8
That probably won't help with this particular question, since the code in question isn't a standard regex issue. The delimiters are non-standard and the backreference doesn't actually come from within the pattern, so RegexBuddy probably won't be able to decipher this either.

posted date: 2008-12-08 08:39:00

Re: What does this Regular Expression do#9
I don't think preg_replace will carry the backreference for $1 forward into the next invocation of preg_replace(). I tried a quick test and it doesn't seem to work that way. You could still be right, but if so it certainly would be terrible practice!

posted date: 2008-12-08 08:48:00

Re: What does this Regular Expression do#10
$1 would be an illegal variable name, so it can't be set anywhere in the code. It has to be a backreference from the regular expression in preg_replace(), except there aren't any groups in the regex, so it should just be an empty string.

posted date: 2008-12-08 08:55:00

Re: What does this Regular Expression do#11
Although I agree with you, RegexBuddy has options that show the differences between regex implementations in several languages, which might be handy for him, since he is trying to port it from PHP to Python.

posted date: 2008-12-08 09:02:00

Re: What does this Regular Expression do#12
The pipe symbols | in this case do not have the default meaning of "match this or that" but are used as alternative delimiters for the pattern instead of the more common slashes /. This can make sense if you want to match / without having to escape each occurrence (e.g. /(.*)\/(.*)\// is not as readable as #/(.*)/(.*)/#). It seems quite counterproductive to use | instead, though, since it is just another reserved character in patterns.

Normally $1 in the replacement pattern refers to the first group denoted by parentheses. E.g. if you've got a pattern like "(.*)<p>", $0 would contain the whole match and $1 the part before the <p>.

As the given regex does not declare any groups and $1 is not a valid name for a variable (in PHP4) defined elsewhere, this call seems to replace any occurrence of <p> with <p>? To be honest, now I'm also quite confused. Just a guess: is another pattern-matching function (preg_match and the like) called before the given line, so that the $1 is "leaked" from there?
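Since the goal is a Python port, here is a rough sketch of how the "(.*)<p>" example maps onto Python's re module. Python patterns take no delimiters, and the replacement string uses \1 and \g<0> where PHP uses $1 and $0. The sample text and the variable names are made up for illustration.

```python
import re

text = "intro<p>body"

# PHP would wrap this pattern in delimiters, e.g. "/(.*)<p>/" or "|(.*)<p>|".
# In Python the pattern is written without any delimiters.
pattern = re.compile(r"(.*)<p>")

match = pattern.search(text)
whole = match.group(0)   # counterpart of PHP's $0: the whole match
before = match.group(1)  # counterpart of PHP's $1: the part before <p>

# In a replacement string, \1 plays the role of $1 and \g<0> the role of $0.
swapped = pattern.sub(r"<p>\1", text)
wrapped = pattern.sub(r"[\g<0>]", text)
```

Here the pattern does declare a group, so the backreference has something to refer to, unlike the wpautop line in question.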

posted date: 2008-12-08 09:05:00

