Going where PHP parse_url() doesn't - Parsing only the domain#1
PHP's parse_url() has a host field, which includes the full host. I'm looking for the most reliable (and least costly) way to only return the domain and TLD.

Given the examples:


http://www.google.com/foo, parse_url() returns www.google.com for host
http://www.google.co.uk/foo, parse_url() returns www.google.co.uk for host


I am looking for only google.com or google.co.uk. I have contemplated keeping a table of valid TLDs/suffixes and allowing only those plus one more label. Would you do it any other way? Does anyone know of a ready-made, valid regex for this sort of thing?

posted date: 2008-12-29 16:51:00


Re: Going where PHP parse_url() doesn't - Parsing only the domain#2
I worked out a solution to this problem. Click to view my topic...

Hope that helps.

posted date: 2008-12-29 16:51:01


Re: Going where PHP parse_url() doesn't - Parsing only the domain#3
Dug this up from a related post, for the idea of keeping a table: http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/src/effective_tld_names.dat?raw=1
I'd rather not do that though.

posted date: 2008-12-29 17:03:00


Re: Going where PHP parse_url() doesn't - Parsing only the domain#4
Of course it depends on your specific use case, but generally speaking I would not use a table lookup for TLDs. New TLDs come out and you usually don't want to maintain them anywhere. Just ask me how often my firstname@lastname.name has been rejected because of shortsightedness.
I guess I could help better if I knew why you don't want the www. Do you need it for emails? You can query for MX records in such cases to verify that the domain (eventually) accepts mail.
PHP's DNS functions may also help you find out more about a host; see http://php.net/dns_get_record for example.
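A rough sketch of the MX check mentioned above, in case it is useful. The function name acceptsMail and the injectable $lookup parameter are made up for illustration (they let the check run without network access); only dns_get_record() and DNS_MX come from PHP itself.

```php
<?php
// Hypothetical helper: returns true when the domain publishes at least one MX record.
// $lookup is an assumption added for testability; by default it calls dns_get_record().
function acceptsMail(string $domain, ?callable $lookup = null): bool
{
    $lookup = $lookup ?? function (string $host) {
        // dns_get_record() returns an array of records, or false on failure.
        return @dns_get_record($host, DNS_MX);
    };
    $records = $lookup($domain);
    return is_array($records) && count($records) > 0;
}
```

Calling acceptsMail('lastname.name') would do a live DNS query, so the result depends on the network and on the zone at the time of the call.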

posted date: 2008-12-29 17:06:00


Re: Going where PHP parse_url() doesn't - Parsing only the domain#5
I'm looking to use it to build a blacklist of spammer domains and to prevent people from using wildcard DNS to get around it. It's more for blog or comment spam than email.
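For the lookup side of such a blacklist, a minimal sketch: check the host and every parent of it against the list, so that any wildcard subdomain of a blocked name is caught. The function name isBlacklisted and the domain spammer.example are placeholders, not anything from this thread. Note this only covers checking; deciding which name to add to the list from a spam URL still needs the domain extraction asked about here.

```php
<?php
// $blacklist is a hypothetical set of blocked domains, keyed by domain name.
function isBlacklisted(string $host, array $blacklist): bool
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    // Test the full host first, then drop the leftmost label each round:
    // a.b.spammer.example, b.spammer.example, spammer.example, example
    while (count($labels) > 0) {
        if (isset($blacklist[implode('.', $labels)])) {
            return true;
        }
        array_shift($labels);
    }
    return false;
}

$blacklist = ['spammer.example' => true];
var_dump(isBlacklisted('x7f3.spammer.example', $blacklist)); // bool(true)
var_dump(isBlacklisted('notspammer.example', $blacklist));   // bool(false)
```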

posted date: 2008-12-29 17:08:00


Re: Going where PHP parse_url() doesn't - Parsing only the domain#6
You've made a judgement up front that I'm not sure will hold well enough: that you can tell which portion of a host is the domain of interest. Is it really just the label in front of the TLD?

posted date: 2008-12-29 17:29:00


Re: Going where PHP parse_url() doesn't - Parsing only the domain#7
For instance, just about any dyndns domain name would seem to be blocked if you only look at the standard domain name. To stop spam from www.mysite.isa-geek.org, or just mysite.isa-geek.org, would you care if you blocked all of isa-geek.org?

posted date: 2008-12-29 17:30:00


Re: Going where PHP parse_url() doesn't - Parsing only the domain#8
Just a proof of concept, assuming the allowed TLD suffixes are stored in a hash ($tldSuffixesAllowed, keyed by suffix) and the URL is in $theUrl. The code can be shortened a lot.

<?php
$urlComponents = parse_url($theUrl);
$chunk = explode('.', $urlComponents['host']);
$maxTldLen = 2; // assume a suffix has at most two labels: .com or .co.uk
$suffix = null;
// Try the longest candidate first, so co.uk wins over uk if both are listed.
for ($cursor = $maxTldLen; $cursor >= 1; $cursor--) {
    $candidate = implode('.', array_slice($chunk, -$cursor));
    if (isset($tldSuffixesAllowed[$candidate])) {
        $suffix = $candidate;
        break;
    }
}
if ($suffix !== null) {
    // domain = the matched suffix plus the one label in front of it
    $domain = implode('.', array_slice($chunk, -($cursor + 1)));
} else {
    // domain not recognized, do whatever you want
}
?>

posted date: 2008-12-29 17:39:00


Re: Going where PHP parse_url() doesn't - Parsing only the domain#9
Yes, I would be fine blocking isa-geek.org in this case. I'm most concerned with foo.[suffix], where [suffix] is either the TLD alone or a standard second-level suffix plus the TLD (co.uk).

posted date: 2008-12-29 17:43:00


Re: Going where PHP parse_url() doesn't - Parsing only the domain#10
How about something like this?

function getDomain($url) {
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : '';
    if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
        return $regs['domain'];
    }
    return false;
}

It extracts the host using the classic parse_url() and then looks for a valid domain without any subdomain (www being a subdomain). It won't work on things like 'localhost', and it returns false if nothing matched.

// Edit:
Try it out with:

echo getDomain('http://www.google.com/test.html') . '<br/>';
echo getDomain('https://news.google.co.uk/?id=12345') . '<br/>';
echo getDomain('http://my.subdomain.google.com/directory1/page.php?id=abc') . '<br/>';
echo getDomain('https://testing.multiple.subdomain.google.co.uk/') . '<br/>';
echo getDomain('http://nothingelsethan.com') . '<br/>';

And it should return:

google.com
google.co.uk
google.com
google.co.uk
nothingelsethan.com

Of course, it won't return anything if the input doesn't get through parse_url(), so make sure it's a well-formed URL.

// Addendum:
Alnitak is right. The solution presented above will work in most cases but not necessarily all, and it needs to be maintained to make sure, for example, that there aren't new TLDs with .morethan6characters and so on. The only reliable way of extracting the domain is to use a maintained list such as http://publicsuffix.org/. It's more painful at first but easier and more robust in the long term. You need to make sure you understand the pros and cons of each method and how it fits with your project.
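To make the "most cases but not all" caveat concrete, here are a few inputs where the regex above goes wrong. The function is repeated unchanged so the snippet runs on its own; the test URLs are made-up examples, and .technology stands in for any TLD longer than six characters.

```php
<?php
// Same function as in the post above, repeated so this runs standalone.
function getDomain($url) {
    $pieces = parse_url($url);
    $domain = isset($pieces['host']) ? $pieces['host'] : '';
    if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
        return $regs['domain'];
    }
    return false;
}

// Short name: "ab.com" is six characters and fits the suffix slot,
// so the subdomain "www" is kept as if it were the domain.
var_dump(getDomain('http://www.ab.com/'));             // string(10) "www.ab.com"

// One-character name: the pattern needs at least two characters before the dot.
var_dump(getDomain('http://t.co/'));                   // bool(false)

// Suffix longer than six characters never matches.
var_dump(getDomain('http://www.example.technology/')); // bool(false)
```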

posted date: 2008-12-29 17:51:00


Re: Going where PHP parse_url() doesn't - Parsing only the domain#11
I'm afraid using that list is the only way. There's too much variety in ccTLDs to write a solution that handles them all.

posted date: 2008-12-29 18:03:00


Re: Going where PHP parse_url() doesn't - Parsing only the domain#12
Currently the only "right" way to do this is to use a list such as that maintained at http://publicsuffix.org/

BTW, this question is also pretty much a duplicate of:
http://www.momige.com/399932/can-this-domain-name-regular-expression-be-refactored-further
http://www.momige.com/288810/get-the-subdomain-from-a-url

There are standardisation efforts at IETF looking at DNS methods of declaring whether a particular node in the DNS tree is used for "public" registrations, but they're in their early stages of development. All of the popular non-IE browsers use the publicsuffix.org list.
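A minimal sketch of how a list-based lookup could work, assuming the rules have already been loaded into an array keyed by suffix. The four rules below are a hand-picked subset for illustration, and registrableDomain is a made-up name. The real list also has wildcard (*) and exception (!) rules, which this sketch deliberately ignores.

```php
<?php
// Illustrative subset only; a real deployment would load the full published list.
$rules = ['com' => true, 'org' => true, 'uk' => true, 'co.uk' => true];

// Returns the public suffix plus one label, or null if none can be determined.
function registrableDomain(string $host, array $rules): ?string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $n = count($labels);
    // Walk from the longest candidate suffix to the shortest,
    // so "co.uk" is preferred over "uk".
    for ($i = 0; $i < $n; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (isset($rules[$suffix])) {
            if ($i === 0) {
                return null; // the host itself is a public suffix
            }
            return implode('.', array_slice($labels, $i - 1));
        }
    }
    return null; // no known suffix, e.g. "localhost"
}

var_dump(registrableDomain('www.google.com', $rules));    // string(10) "google.com"
var_dump(registrableDomain('news.google.co.uk', $rules)); // string(12) "google.co.uk"
var_dump(registrableDomain('localhost', $rules));         // NULL
```

The host would typically come from parse_url($url)['host'], as in the earlier posts.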

posted date: 2008-12-30 12:47:00

