Getting innertext of HTML tags using Regular Expressions#1
I'm having trouble capturing this data:

<td><span class="bodytext"><b>Contact:</b><b></b></span><span style='font-size:10.0pt;font-family:Verdana;
mso-bidi-font-family:Arial'><b> </b>
<span class="bodytext">John Doe</span>
<td><span class="bodytext">PO Box 2112</span></td>
<td><span class="bodytext"></span></td>


<td><span class="bodytext"></span></td>

<td><span class="bodytext">JOHAN</span> NSW 9700</td>
02 9999 9999

Basically, I want to grab everything after "Contact:" and before "Phone:", minus the HTML. However, these two labels may not always exist, so I really need to grab everything between the two colons (:) that isn't inside an HTML tag.
The number of <span class="bodytext">***data***</span> elements may vary, so I need some sort of loop to match them.

I'd prefer to use regular expressions, although I could probably do this using loops and string matching.

Also, I'd like to know the syntax for non-matching groups in PHP regex.

Any help would be greatly appreciated!

posted date: 2008-12-17 18:28:00

Re: Getting innertext of HTML tags using Regular Expressions#2
I have worked out a solution to this problem. Click to view my topic...

Hope that helps.

posted date: 2008-12-17 18:28:01

Re: Getting innertext of HTML tags using Regular Expressions#3
If I understand you correctly, you're only interested in the text between the HTML tags. To ignore the HTML tags, simply strip them first:

    $text = preg_replace('/<[^<>]+>/', '', $html);

To grab everything between "Contact:" and "Phone:", use:

    if (preg_match('/Contact:(.*?)Phone:/s', $text, $regs)) {
        $result = $regs[1];
    } else {
        $result = "";
    }

To grab everything between two colons, use:

    if (preg_match('/:([^:]*):/', $text, $regs)) {
        $result = $regs[1];
    } else {
        $result = "";
    }
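
The two steps above (strip the tags, then capture between the labels) can be combined into one short script. This is only a sketch: the `$html` sample is a trimmed, hypothetical version of the markup in the question, and the "Phone:" label in it is an assumption, since the posted snippet does not show one. The whitespace cleanup at the end is also an addition, not part of the original answer.

```php
<?php
// Hypothetical sample input, trimmed from the question.
// The "Phone:" label is assumed; the posted snippet does not include it.
$html = <<<'HTML'
<td><span class="bodytext"><b>Contact:</b></span>
<span class="bodytext">John Doe</span></td>
<td><span class="bodytext">PO Box 2112</span></td>
<td><span class="bodytext"><b>Phone:</b></span> 02 9999 9999</td>
HTML;

// Step 1: remove anything that looks like a tag.
$text = preg_replace('/<[^<>]+>/', '', $html);

// Step 2: capture the text between the two labels.
// The /s modifier lets . match the newlines left in the text.
$result = '';
if (preg_match('/Contact:(.*?)Phone:/s', $text, $regs)) {
    // Collapse the whitespace left behind by the removed tags.
    $result = trim(preg_replace('/\s+/', ' ', $regs[1]));
}

echo $result, "\n"; // prints "John Doe PO Box 2112"
```

If the individual fields are needed separately, splitting `$regs[1]` on newlines before collapsing whitespace is one option.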

posted date: 2008-12-17 18:38:00

Re: Getting innertext of HTML tags using Regular Expressions#4
The seemingly arbitrary Stack Overflow response to these sorts of questions seems to be "omg don't use regexes! Use Beautiful Soup instead!!". Personally, I prefer not having to use external libraries for small tasks like this, and regexes are a good alternative.

A simple way to strip out all the HTML tags, which is one way to tackle this, is to use this regex:

    $text = preg_replace("/<.*?>/", "", $text);

Then you can use whatever method you like to grab the appropriate text content.

Non-matching groups are like this: (?:this won't match)
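
One caveat worth checking against the markup in the question: one of its span tags wraps onto a second line, and in PCRE the dot does not match a newline unless the /s modifier is set. The sketch below compares three variants on a hypothetical one-tag input (the `$html` string and the variable names are made up for illustration).

```php
<?php
// Hypothetical input: one tag whose attribute value wraps onto a second line.
$html = "<span style='font-size:10.0pt;\nfont-family:Arial'>John Doe</span>";

// Without /s, . stops at the newline, so the opening tag is left in place
// and only the closing </span> is removed.
$withoutS = preg_replace('/<.*?>/', '', $html);

// With /s, . also matches newlines, so the whole opening tag is removed.
$withS = preg_replace('/<.*?>/s', '', $html);

// A negated character class matches newlines without any modifier.
$negated = preg_replace('/<[^<>]+>/', '', $html);

var_dump(strpos($withoutS, '<span') === 0); // bool(true): opening tag survived
var_dump($withS);                           // string(8) "John Doe"
var_dump($negated);                         // string(8) "John Doe"
```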

posted date: 2008-12-17 18:39:00

Re: Getting innertext of HTML tags using Regular Expressions#5
(?this won't match) is a syntax error

posted date: 2008-12-17 18:48:00

Re: Getting innertext of HTML tags using Regular Expressions#6
So what is it? RegexBuddy gave me (?:this won't match) as a Perl regex, but there was no PHP option, so I couldn't be sure...

posted date: 2008-12-17 18:58:00

Re: Getting innertext of HTML tags using Regular Expressions#7
PHP's preg functions use the PCRE flavor, which is an option in RegexBuddy. nickf's answer missed the : before he edited it.

posted date: 2008-12-18 00:09:00

Re: Getting innertext of HTML tags using Regular Expressions#8
I believe you (and the OP) mean "non-capturing groups" instead of "non-matching groups". A non-matching group would be something like this: "(X(?<!X))". ;-)
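
To make the distinction concrete, here is a small sketch with PHP's preg functions. The `$text` string and the "Name" alternative are hypothetical, chosen only to show how `(?:...)` changes what ends up in the match array.

```php
<?php
// Hypothetical, already tag-free input.
$text = 'Contact: John Doe Phone: 02 9999 9999';

// Capturing group: the alternation gets its own slot in the result.
preg_match('/(Contact|Name):\s*(.*?)\s*Phone:/', $text, $capturing);
// $capturing[1] is 'Contact', $capturing[2] is 'John Doe'

// Non-capturing group: (?:...) groups the alternation without a slot,
// so the wanted text moves up to index 1.
preg_match('/(?:Contact|Name):\s*(.*?)\s*Phone:/', $text, $nonCapturing);
// $nonCapturing[1] is 'John Doe'

// The tongue-in-cheek "non-matching" group: after consuming X, the
// lookbehind demands that the previous character is not X, so it never matches.
$neverMatches = preg_match('/(X(?<!X))/', 'XXX');

var_dump(count($capturing));    // int(3)
var_dump(count($nonCapturing)); // int(2)
var_dump($neverMatches);        // int(0)
```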

posted date: 2008-12-18 00:17:00

Re: Getting innertext of HTML tags using Regular Expressions#9
Sounds like screen scraping. You could also use strip_tags() after finding the info you wanted.
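
A rough sketch of that order of operations: locate the block in the raw HTML first, then hand only the captured part to strip_tags(). The `$html` string is a hypothetical one-line stand-in for the markup in the question, and the "Phone:" label is assumed.

```php
<?php
// Hypothetical one-line input; the "Phone:" label is assumed.
$html = '<b>Contact:</b> <span class="bodytext">John Doe</span> <b>Phone:</b> 02 9999 9999';

$result = '';
// Find the wanted block in the raw HTML first...
if (preg_match('/Contact:(.*?)Phone:/s', $html, $regs)) {
    // ...then remove the tags from just that block.
    $result = trim(strip_tags($regs[1]));
}

echo $result, "\n"; // prints "John Doe"
```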

posted date: 2009-10-05 05:33:00
