|Better support for CURL with PHP and Linux||#1|
I'm the developer of twittertrend.net, and I was wondering if there is a faster way to get the headers of a URL than curl_multi. I process over 250 URLs a minute, and I need a really fast way to do this from a PHP standpoint. Alternatively, a bash script or a C application could fetch and output the headers, anything that could be faster. I have primarily only programmed in PHP, but I can learn. Currently curl_multi (with 6 URLs provided at once) does an OK job, but I would prefer something faster.
Ultimately I would like to stick with PHP for any MySQL storing and processing.
posted date: 2008-12-11 14:15:00
|Re: Better support for CURL with PHP and Linux||#3|
The easiest way to get the headers of a URL is with get_headers(). Performance-wise I don't think you can beat curl_multi, but try benchmarking it and see; it's hard to tell.
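A minimal sketch of that approach (the URL is a placeholder, and the small status-line parser is a hypothetical helper, not part of get_headers() itself):

```php
<?php
// Hypothetical helper: extract the numeric status code from an
// HTTP status line, e.g. "HTTP/1.1 200 OK" -> 200.
function status_code_from_line($line)
{
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
        return (int) $m[1];
    }
    return null; // not a status line
}

// get_headers() returns the raw header lines for a URL as an array
// (or false on failure); $headers[0] is the status line.
// The URL below is a placeholder; @ suppresses the warning on failure.
$headers = @get_headers('http://example.com/');
if ($headers !== false) {
    echo status_code_from_line($headers[0]), "\n";
}
```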
posted date: 2008-12-11 14:24:00
|Re: Better support for CURL with PHP and Linux||#4|
If you don't mind going into really low-level stuff, you could send pipelined raw HTTP 1.1 requests using the socket functions. It'd help to know where the bottleneck is in what you're currently using: network, CPU, etc.
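A rough sketch of that pipelining idea, assuming HEAD requests are enough for header-gathering (the host and paths are placeholders; the network side is left as a comment):

```php
<?php
// Sketch: build several pipelined HTTP/1.1 HEAD requests to send
// over a single TCP connection via the socket functions.
function build_pipelined_heads($host, array $paths)
{
    $out = '';
    foreach ($paths as $i => $path) {
        $last = ($i === count($paths) - 1);
        $out .= "HEAD $path HTTP/1.1\r\n"
              . "Host: $host\r\n"
              // Ask the server to close after the final request.
              . "Connection: " . ($last ? "close" : "keep-alive") . "\r\n"
              . "\r\n";
    }
    return $out;
}

// Network side (untested here; host is a placeholder):
// $fp = fsockopen('example.com', 80, $errno, $errstr, 5);
// fwrite($fp, build_pipelined_heads('example.com', array('/a', '/b')));
// $raw = stream_get_contents($fp); // responses arrive back-to-back
// fclose($fp);
```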
posted date: 2008-12-11 14:29:00
|Re: Better support for CURL with PHP and Linux||#5|
Re: threading via a bash script: it's possible, but unlikely to help; process-creation overhead for such a script will probably kill the speed. If it's that important to you, start up a daemon that does nothing but such resolution, then connect to the daemon locally. Then you can work on making that daemon as fast as possible, in C or C++ or whatever.
posted date: 2008-12-11 14:32:00
|Re: Better support for CURL with PHP and Linux||#6|
curl_multi + these options are probably your best bet:
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
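Put together, a single-handle version of those options might look like this (the URL is a placeholder, and the extra RETURNTRANSFER/timeout options are my additions; CURLOPT_NOBODY makes cURL issue a HEAD-style request, so only the headers come back):

```php
<?php
// Header-only fetch with a single cURL handle. URL is a placeholder.
$ch = curl_init('http://example.com/');

curl_setopt($ch, CURLOPT_HEADER, 1);         // include headers in output
curl_setopt($ch, CURLOPT_NOBODY, 1);         // skip the response body
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return instead of printing
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3); // don't hang on dead hosts
curl_setopt($ch, CURLOPT_TIMEOUT, 5);        // overall cap per transfer

$headers = curl_exec($ch); // string of headers, or false on failure
curl_close($ch);
```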
The only other option may be to use wget with
and then multi-thread it using C/C++, Java, etc. I'm not convinced that this would be a faster option in the end.
posted date: 2008-12-11 14:36:00
|Re: Better support for CURL with PHP and Linux||#7|
Ok, I will look into the daemon idea first to see if it gets me anywhere faster. I might have to end up staying with curl_multi. Another problem is that I cannot get the connect time or anything else out of curl_multi, only the response.
posted date: 2008-12-11 14:38:00
|Re: Better support for CURL with PHP and Linux||#8|
Alright, I figured out the following:
get_headers() = .0606 sec per URL
cURL = .01235 sec per URL
gethostbynamel() = .001025 sec per URL
What I'm going to do is run gethostbynamel() first and then cURL. This should decrease time, because the host is resolved ahead of time, so cURL will never get stuck loading a URL whose host won't resolve. Any objections?
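A sketch of that pre-resolve step (the helper name is hypothetical; gethostbynamel() returns an array of IPv4 addresses, or false when the name does not resolve):

```php
<?php
// Pre-resolve hosts so cURL never stalls on DNS for a dead name.
// Hypothetical helper: returns only the URLs whose hosts resolve.
function filter_resolvable(array $urls)
{
    $ok = array();
    foreach ($urls as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        // gethostbynamel() gives an array of IPv4 addresses,
        // or false on failure; @ suppresses the lookup warning.
        if (is_string($host) && @gethostbynamel($host) !== false) {
            $ok[] = $url;
        }
    }
    return $ok;
}
```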
posted date: 2008-12-11 17:54:00
|Re: Better support for CURL with PHP and Linux||#9|
I think you need a multi-process batch URL fetching daemon. PHP does not support multithreading, but there's nothing stopping you from spawning multiple PHP daemon processes. Having said that, PHP's lack of a proper garbage collector means that long-running processes can leak memory.
Run a daemon which spawns lots of instances (a configurable, but controlled, number) of the PHP program, which will of course have to be capable of reading a work queue, fetching the URLs, and writing the results away in a manner which is multi-process safe; multiple processes shouldn't end up trying to do the same work.
You'll want all of this to run autonomously as a daemon rather than from a web server. Really.
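One way to make the work queue multi-process safe is a shared file guarded by flock(), so two workers never claim the same URL. A minimal sketch (the function name and queue-file layout, one URL per line, are my assumptions, not a standard API):

```php
<?php
// Pop one URL from a shared file-based queue under an exclusive
// lock, so concurrent worker processes never claim the same URL.
function pop_url($queue_file)
{
    $fp = fopen($queue_file, 'c+');
    if ($fp === false || !flock($fp, LOCK_EX)) {
        return null;
    }
    $lines = array();
    while (($line = fgets($fp)) !== false) {
        $line = trim($line);
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    $url = array_shift($lines); // null when the queue is empty
    // Rewrite the file without the line we just claimed.
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, implode("\n", $lines) . ($lines ? "\n" : ''));
    flock($fp, LOCK_UN);
    fclose($fp);
    return $url;
}
```

Each worker process loops on pop_url() until it returns null, fetches the URL, and writes its result wherever is convenient.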
posted date: 2008-12-11 21:27:00
|Re: Better support for CURL with PHP and Linux||#10|
alright I will work on this! :) thanks so much
posted date: 2008-12-12 18:11:00
|Re: Better support for CURL with PHP and Linux||#11|
I recently wrote a blog post on how to speed up curl_multi. Basically, I process each request as soon as it finishes and use a queue to keep a large number of requests going at once. I've had good success with this technique and am using it to process ~6000 RSS feeds a minute. I hope this helps!
http://onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
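The rolling-queue idea can be sketched roughly like this (a simplified sketch, not the linked post's exact code; the function name and window size are my assumptions, and CURLOPT_PRIVATE is just used to remember which URL each handle belongs to):

```php
<?php
// Sketch of a "rolling" curl_multi queue: keep at most $window
// transfers in flight, and start the next queued URL as soon as
// any transfer finishes, instead of waiting for a whole batch.
function rolling_fetch(array $urls, $window = 6)
{
    $mh = curl_multi_init();
    $results = array();

    $add = function ($url) use ($mh) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_PRIVATE, $url); // remember the URL
        curl_multi_add_handle($mh, $ch);
    };

    // Prime the window with the first $window URLs.
    foreach (array_splice($urls, 0, $window) as $url) {
        $add($url);
    }

    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh, 0.1); // wait briefly for activity

        // Collect finished transfers and refill from the queue.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            if ($urls) {
                $add(array_shift($urls));
                $running = 1; // force another loop iteration
            }
        }
    } while ($running > 0);

    curl_multi_close($mh);
    return $results;
}
```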
posted date: 2009-01-26 17:42:00