StructureCMS

February 9, 2011

Retrieving HTTP URLs in PHP

Filed under: PHP — joel.cass @ 9:32 am

It’s strange how many different ways there are to do the same thing in PHP. For example, if you want to retrieve a URL, it can be as easy as calling file_get_contents($url), or you can use the PECL libraries, or you can dig up an open source project such as this one.
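For comparison, file_get_contents can also be given a timeout and extra request headers through a stream context. Here is a minimal sketch of that; build_http_context is a hypothetical helper name of my own, and the option values are only examples:

```php
<?php
// Hypothetical helper: builds a stream context that file_get_contents()
// can use for an HTTP GET with a timeout and custom headers.
function build_http_context($timeout = 3, $headers = array()) {
	// turn array("Name" => "value") into "Name: value" lines
	$lines = array();
	foreach ($headers as $name => $value) {
		$lines[] = "{$name}: {$value}";
	}
	return stream_context_create(array(
		"http" => array(
			"method" => "GET",
			"timeout" => $timeout,
			"header" => implode("\r\n", $lines),
			// return the body for 4xx/5xx responses instead of failing
			"ignore_errors" => true,
		),
	));
}

// Usage (needs network access, so it is left commented out):
// $context = build_http_context(3, array("Accept" => "text/html"));
// $body = file_get_contents("http://example.com/", false, $context);
```

This gets you the timeout and the request headers, but you give up the low-level view of the exchange that a raw socket provides.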

I was messing around one night and figured it would be possible to just run an HTTP request over a socket. As it turns out, it’s not so difficult: there is plenty of information out there on how to do it, and it wasn’t long before I had a method figured out.

The advantage of this approach is that it is lightweight and gives you some control over the headers that you send and receive. It has only been tested on text-only requests.

function get_http_content ($url, $timeout = 3, $headers = array()) {
	// initialise return variable
	$stcReturn = array("headers"=>array(), "content"=>"");

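	// get scheme, server name, port, path from URL
	$strRegex = "/^(http[s]?)\:\/\/([^\:\/]+)[\:]?([0-9]*)(.*)$/";
	$strScheme = preg_replace($strRegex,"$1",$url);
	$strServer = preg_replace($strRegex,"$2",$url);
	$numPort = preg_replace($strRegex,"$3",$url);
	$strPath = preg_replace($strRegex,"$4",$url);
	if ($strScheme == "https") {
		// a plain socket cannot speak TLS, so stop before connecting
		$stcReturn["headers"]["Status-Code"] = "0";
		$stcReturn["headers"]["Status"] = "HTTPS is not supported";
		$stcReturn["content"] = "Error: HTTPS is not supported";
		return $stcReturn;
	}
	if ($numPort == "") {
		// no port in the URL - use the default HTTP port
		$numPort = 80;
	}
	if ($strPath == "" || $strPath[0] != "/") {
		// the request line needs a path that starts with "/"
		$strPath = "/" . $strPath;
	}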

	// connect to server, run request
	$objSocket = fsockopen($strServer, $numPort, $numError, $strError, $timeout);
	if (!$objSocket) {
		// connection not possible
		$stcReturn["headers"]["Status-Code"] = $numError;
		$stcReturn["headers"]["Status"] = $strError;
		$stcReturn["content"] = "Error: {$strError} ({$numError})";
	} else {
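		// connection made - send headers
		// note: HTTP/1.1 servers may reply with chunked transfer encoding,
		// which is not decoded here
		$strOut = "GET {$strPath} HTTP/1.1\r\n";
		$strOut .= "Host: {$strServer}\r\n";
		$strOut .= "Connection: Close\r\n";
		foreach ($headers as $strName=>$strValue) {
			$strOut .= "$strName: $strValue\r\n";
		}
		$strOut .= "\r\n";
		// get data
		fwrite($objSocket, $strOut);
		$strIn = "";
		while (!feof($objSocket)) {
			$strChunk = fgets($objSocket, 128);
			if ($strChunk === false) {
				// read failed or timed out - stop rather than loop forever
				break;
			}
			$strIn .= $strChunk;
		}
		fclose($objSocket);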

		// split data into lines
		$aryIn = explode("\r\n", $strIn);

		// headers and content are separated by an empty line (double CRLF)
		$bHeader = true;
		foreach ($aryIn as $i=>$strLine) {
			if ($i == 0) {
				// first line is [protocol] [status code] [status]
				// (the status text may be missing, so pad to three parts)
				$aryStatus = explode(" ", $strLine, 3) + array("", "", "");
				$stcReturn["headers"]["Protocol"] = $aryStatus[0];
				$stcReturn["headers"]["Status-Code"] = $aryStatus[1];
				$stcReturn["headers"]["Status"] = $aryStatus[2];
			} elseif ($bHeader && $strLine == "") {
				// if line is empty headers have ended
				$bHeader = false;
			} elseif ($bHeader) {
				// set header (split on the first colon only, value may be empty)
				$aryHeader = explode(":", $strLine, 2) + array("", "");
				$stcReturn["headers"][trim($aryHeader[0])] = trim($aryHeader[1]);
			} else {
				// set content
				$stcReturn["content"] .= $strLine;
				if ($i < count($aryIn)-1) {
					$stcReturn["content"] .= "\r\n";
				}
			}
		}
	}
	return $stcReturn;
}
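
One thing the function above does not deal with: because the request is sent as HTTP/1.1, a server is allowed to reply with "Transfer-Encoding: chunked", in which case the returned content has chunk-size lines mixed in with the body. Below is a minimal sketch of a decoder, assuming the body has already been separated from the headers. decode_chunked_body is a hypothetical helper name, not part of the function above, and it ignores trailers and does little validation:

```php
<?php
// Hypothetical helper: decodes a body sent with "Transfer-Encoding: chunked".
function decode_chunked_body($body) {
	$decoded = "";
	$offset = 0;
	$length = strlen($body);
	while ($offset < $length) {
		// each chunk starts with its size in hex, terminated by CRLF
		$lineEnd = strpos($body, "\r\n", $offset);
		if ($lineEnd === false) {
			break; // truncated or malformed input
		}
		$sizeLine = substr($body, $offset, $lineEnd - $offset);
		// drop optional chunk extensions after ";"
		$sizeParts = explode(";", $sizeLine);
		$size = (int) hexdec(trim($sizeParts[0]));
		if ($size === 0) {
			break; // a zero-length chunk marks the end of the body
		}
		$decoded .= substr($body, $lineEnd + 2, $size);
		// move past the chunk data and its trailing CRLF
		$offset = $lineEnd + 2 + $size + 2;
	}
	return $decoded;
}
```

You could run $stcReturn["content"] through this whenever the response headers report chunked encoding, for example when $stcReturn["headers"]["Transfer-Encoding"] is set to "chunked". Bear in mind that servers may capitalise the header name differently.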