February 9, 2011

Retrieving HTTP URLs in PHP

Filed under: PHP — joel.cass @ 9:32 am

It’s strange how many different ways there are to do the same thing in PHP. For example, if you want to retrieve a URL, it can be as easy as calling file_get_contents($url), or you can use the PECL libraries, or you can go dig up an open source project such as this one.

I was messing around one night and figured it would be possible to just run an HTTP request over a socket. As it turns out, it’s not so difficult: there is tons of information out there on how to do it, and it wasn’t long before I had a method figured out.

The advantage of this approach is that it is lightweight and gives you some control over the headers (etc.) that you send and receive. It has only been tested on text-only requests, and it does not support HTTPS.

function get_http_content ($url, $timeout = 3, $headers = array()) {
	// initialise return variable
	$stcReturn = array("headers"=>array(), "content"=>"");

	// get server name, port, path from URL
	$strRegex = "/^(http[s]?)\:\/\/([^\:\/]+)[\:]?([0-9]*)(.*)$/";
	$strServer = preg_replace($strRegex,"$2",$url);
	$strPath = preg_replace($strRegex,"$4",$url);
	$numPort = preg_replace($strRegex,"$3",$url);
	if ($strPath == "") {
		// the request line needs a path, even for a bare host name
		$strPath = "/";
	}
	if ($numPort == "") {
		if (preg_replace($strRegex,"$1",$url) == "https") {
			// a plain socket cannot speak TLS, so stop here
			$stcReturn["headers"]["Status-Code"] = "0";
			$stcReturn["headers"]["Status"] = "HTTPS is not supported";
			$stcReturn["content"] = "Error: HTTPS is not supported";
			return $stcReturn;
		} else {
			$numPort = 80;
		}
	}

	// connect to server, run request
	$objSocket = fsockopen($strServer, $numPort, $numError, $strError, $timeout);
	if (!$objSocket) {
		// connection not possible
		$stcReturn["headers"]["Status-Code"] = $numError;
		$stcReturn["headers"]["Status"] = $strError;
		$stcReturn["content"] = "Error: {$strError} ({$numError})";
	} else {
		// connection made - send headers
		// (an HTTP/1.1 server may answer with chunked encoding, which is not decoded here)
		$strOut = "GET {$strPath} HTTP/1.1\r\n";
		$strOut .= "Host: {$strServer}\r\n";
		$strOut .= "Connection: Close\r\n";
		foreach ($headers as $strName=>$strValue) {
			$strOut .= "$strName: $strValue\r\n";
		}
		$strOut .= "\r\n";

		// get data
		fwrite($objSocket, $strOut);
		$strIn = "";
		while (!feof($objSocket)) {
			$strIn .= fgets($objSocket, 128);
		}
		fclose($objSocket);

		// split data into lines
		$aryIn = explode("\r\n", $strIn);

		// data is split into headers/content by an empty line (double CRLF)
		$bHeader = true;
		foreach ($aryIn as $i=>$strLine) {
			if ($i == 0) {
				// first line is [protocol] [status code] [status]
				$stcReturn["headers"]["Protocol"] = preg_replace("/^([^ ]+) ([^ ]+) (.+)$/", "$1", $strLine);
				$stcReturn["headers"]["Status-Code"] = preg_replace("/^([^ ]+) ([^ ]+) (.+)$/", "$2", $strLine);
				$stcReturn["headers"]["Status"] = preg_replace("/^([^ ]+) ([^ ]+) (.+)$/", "$3", $strLine);
			} elseif ($bHeader && $strLine == "") {
				// if line is empty headers have ended
				$bHeader = false;
			} elseif ($bHeader) {
				// set header
				$stcReturn["headers"][preg_replace("/^([^\:]+)\:[ ]*(.+)$/", "$1", $strLine)] = preg_replace("/^([^\:]+)\:[ ]*(.+)$/", "$2", $strLine);
			} else {
				// set content, restoring the line breaks removed by explode()
				$stcReturn["content"] .= $strLine;
				if ($i < count($aryIn)-1) {
					$stcReturn["content"] .= "\r\n";
				}
			}
		}
	}
	return $stcReturn;
}
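
The header/content split at the end of the function can be tried out without a network connection. What follows is only a sketch of the same idea, not the code from the post: parse_http_response is a hypothetical helper name, and it uses explode() on the first empty line instead of walking the lines one by one.

```php
<?php
// Hypothetical helper (not part of the original function): turns a raw
// HTTP response string into array("headers" => ..., "content" => ...).
function parse_http_response($strRaw) {
	$stcReturn = array("headers" => array(), "content" => "");

	// headers and content are separated by the first empty line (CRLF CRLF)
	$aryParts = explode("\r\n\r\n", $strRaw, 2);
	$stcReturn["content"] = isset($aryParts[1]) ? $aryParts[1] : "";

	$aryLines = explode("\r\n", $aryParts[0]);

	// first line is [protocol] [status code] [status]
	$aryStatus = explode(" ", array_shift($aryLines), 3);
	$stcReturn["headers"]["Protocol"] = $aryStatus[0];
	$stcReturn["headers"]["Status-Code"] = isset($aryStatus[1]) ? $aryStatus[1] : "";
	$stcReturn["headers"]["Status"] = isset($aryStatus[2]) ? $aryStatus[2] : "";

	// every remaining line is a "Name: value" pair
	foreach ($aryLines as $strLine) {
		$aryHeader = explode(":", $strLine, 2);
		if (count($aryHeader) == 2) {
			$stcReturn["headers"][trim($aryHeader[0])] = trim($aryHeader[1]);
		}
	}
	return $stcReturn;
}

// made-up response, just to show the shape of the result
$stcResult = parse_http_response("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nHello\r\nWorld");
print_r($stcResult["headers"]);
```

Splitting on the first empty line only means that blank lines inside the content are left alone.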

February 19, 2010

Another way to stop WordPress spammers

Filed under: PHP — joel.cass @ 9:22 am

It seems that getting a good ranking on a search engine can be a double-edged sword. Whilst it helps people find you, it also helps nasty spammers find a way into your site so they can post their comments. Sure, WordPress does have some spam filtering ability thanks to the Akismet plugin, but it would be so much better if they could be stopped at the source.

One way I have stopped spammers on my site is by implementing a text field inside a hidden block, as follows:

<!-- would recommend that this actually goes into a CSS file -->
<style type="text/css">
	.spam-check { display:none; }
</style>
<!-- spam detection -->
<p class="spam-check">
	<input type="text" name="spamcheck" id="fldSpamCheck" value="">
	<label for="fldSpamCheck">Please leave this field blank</label>
</p>
<!-- /spam detection -->

…and then on the processing side I would do something like this:

// jnet spam detection
if (!isset($_POST['spamcheck']) || $_POST['spamcheck'] != "") {
	die('Error: please do not fill in the field that tells you not to fill it in.');
}
// end jnet spam detection

The beauty of it is that there are no CAPTCHAs involved, and no thinking on the user’s side. The common trap that these spambots fall into is that they fill out all the fields in the form with useless garbage. Because this field is meant to be blank, the submission fails.

Furthermore, any user with a CSS-enabled browser does not see the field. A user who can see the field can tell from the label that it is not meant to be filled in. So everyone wins! (Except the nasty spammers.)

In WordPress, you will need to add the first code block to these files:

  • /wp-content/themes/[your_theme]/comments.php
  • /wp-content/themes/[your_theme]/comments-popup.php

…and you will need to add the second code block to the following file:

  • /wp-comments-post.php

…and you’ll be done! At least until the spammers figure out how to get around it.
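
If you would rather test the check before wiring it into a live form, the condition can be pulled out into a small function. This is only a sketch: is_spam_submission is a hypothetical name that does not appear in WordPress or in the code above, and it takes the posted fields as an array so it can be called with made-up data.

```php
<?php
// Hypothetical helper: true when the hidden "spamcheck" field is missing
// (the form was not used at all) or has been filled in (probably a bot).
function is_spam_submission($aryPost) {
	return !isset($aryPost['spamcheck']) || $aryPost['spamcheck'] !== "";
}

// a real browser submits the hidden field empty
var_dump(is_spam_submission(array('spamcheck' => '')));
// a bot that fills in every field trips the check
var_dump(is_spam_submission(array('spamcheck' => 'cheap pills')));
```

In the comment handler the call would be is_spam_submission($_POST), followed by the same die() as before.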

December 18, 2009

Bypass Friendly Internet Explorer 404 Messages

Filed under: ColdFusion, PHP — joel.cass @ 9:22 am

I had to build a custom 404 page today and was bashing my head against a wall regarding Internet Explorer’s friendly “The webpage cannot be found” error message page.

Well, according to this link, the solution is simple – the page has to be at least 512 bytes. Too easy.

Of course, you can simply pad the page with whitespace: either add a full 512 bytes, or add just enough to make up the difference between 512 and the original size of the page. If your page is already larger than 512 bytes you have nothing to worry about.

The ColdFusion repeatString and PHP str_repeat functions would be useful for this.
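
On the PHP side, the padding could look something like the sketch below. pad_to_min_bytes is a hypothetical helper name and the sample markup is made up; the only figure taken from above is the 512 byte threshold.

```php
<?php
// Hypothetical helper: appends spaces until the page is at least
// $numMinBytes long; pages that are already long enough are returned as-is.
function pad_to_min_bytes($strHtml, $numMinBytes = 512) {
	// strlen() counts bytes, which is the unit that matters here
	$numMissing = $numMinBytes - strlen($strHtml);
	if ($numMissing <= 0) {
		return $strHtml;
	}
	return $strHtml . str_repeat(" ", $numMissing);
}

// made-up 404 page that is well under the threshold
$strPage = "<html><body><h1>Page not found</h1></body></html>";
echo strlen(pad_to_min_bytes($strPage)); // prints 512
```

The 404 page itself would echo the padded string rather than its length.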

December 11, 2009

Automatic image resizing in PHP

Filed under: PHP, StructureCMS — joel.cass @ 9:54 am

One big issue in content management systems is that images are usually handled poorly. Even with the best file management capabilities, you can’t stop users from uploading their 12 megapixel, 15 megabyte images from their recent Christmas party, chucking them into the content and resizing them down to fit the screen. The problem is that even though the image appears ‘small’ in the browser, the full-size image is downloaded and displayed, wasting bandwidth and slowing down the user experience.

A concept I really admired in Sitecore and have copied to StructureCMS is that resized images can be created automatically on the server side and returned to the client simply by adding the parameters ‘w’ or ‘h’ to the URL. Take this image, for example: [image: ×120.gif]

The attributes “w” and “h” can be added to the URL to resize the image: [images: ×120.gif?w=60; test120x120, h=200]

(If both parameters are defined, the image is resized to ‘fit’ within the dimensions.)
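
The arithmetic behind ‘w’, ‘h’ and the ‘fit’ case can be sketched in a few lines. This is not the code from image.php: fit_dimensions is a hypothetical function, and it assumes that the aspect ratio is always kept and that scaling up is allowed (as in the h=200 example on a 120 pixel image).

```php
<?php
// Hypothetical helper: works out the output size for a source image of
// $numSrcW x $numSrcH given an optional target width and/or height.
function fit_dimensions($numSrcW, $numSrcH, $numW = null, $numH = null) {
	if ($numW === null && $numH === null) {
		// nothing requested, keep the original size
		return array($numSrcW, $numSrcH);
	}
	if ($numH === null) {
		$numScale = $numW / $numSrcW;
	} elseif ($numW === null) {
		$numScale = $numH / $numSrcH;
	} else {
		// both given: the smaller factor makes the image fit inside the box
		$numScale = min($numW / $numSrcW, $numH / $numSrcH);
	}
	return array(
		max(1, (int) round($numSrcW * $numScale)),
		max(1, (int) round($numSrcH * $numScale))
	);
}

print_r(fit_dimensions(120, 120, 60));      // w only
print_r(fit_dimensions(200, 100, 50, 50));  // both: fits inside 50 x 50
```

Taking the smaller of the two scale factors is what keeps the image inside both limits without distorting it.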

If you look at the images on their own, you will notice that they are resized by the server, and you can change the dimensions. The crunched-down, resized version is sent back to the user, saving bandwidth and improving the user experience. Images are then cached so that future requests for the same size do not have to repeat the resizing work.
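
One way to key such a cache is sketched below. It is an assumption rather than what StructureCMS actually does: resized_cache_path, the "cache" directory and the use of the file’s modification time are all hypothetical.

```php
<?php
// Hypothetical helper: builds a cache file name from the source path, the
// requested size and the source file's modification time, so that a changed
// original or a different size ends up in a different cache file.
function resized_cache_path($strSource, $numW, $numH, $numModified, $strCacheDir = "cache") {
	$strExt = strtolower(pathinfo($strSource, PATHINFO_EXTENSION));
	$strKey = md5($strSource . "|" . (int) $numW . "|" . (int) $numH . "|" . (int) $numModified);
	return $strCacheDir . "/" . $strKey . "." . $strExt;
}

// made-up values; in practice the timestamp would come from filemtime()
$strCached = resized_cache_path("images/photo.jpg", 60, 0, 1260500000);
echo $strCached, "\n";
// if the cached file exists, send it as-is; otherwise resize and save it there
```

Including the modification time means a re-uploaded image is not served from a stale cache entry.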

How did I do this?

  1. I created image.php in the website root – this gets the URL params and creates a new resized image (I’m not going to explain the code – it’s pretty simple)
  2. I installed the Apache mod_rewrite module – this is as simple as opening your httpd.conf file and uncommenting (removing the leading # from) the line ‘LoadModule rewrite_module modules/’ (and then restarting Apache)
  3. I created an .htaccess file that rewrites URLs
  4. I then modified the TinyMCE image.js file to add the size attributes to the URL

The beauty of this modification is that it won’t break anything if neither Apache nor mod_rewrite is installed. And, with some modification, this code could be used on any website.