• Posted on October 23, 2013

TagCloud in PHP

Tag clouds are used on many websites, but most of the time badly. Even though I am not their biggest fan, I will show you how to generate one in PHP from three different input formats. I am mostly doing this because I needed a tag cloud to quickly show me the most-used search terms that lead people to my websites, and I figured tag clouds do this fairly well.

This isn’t really a tutorial, but an example of how to code a tag cloud. Below is the full code I created (plus a few alias functions). Under the code I explain how things work, in case you wish to know more about it.

<?php
	/*
	tagcloud(
		array(
			array('word one',1),
			array('word two',1),
			array('word 3',3),...
		),[min font size,[max font size]]
	);
	*/
	function tagcloud($data,$minsize=12,$maxsize=32) {
		$highestval = 0;
		$lowestval = false;
		$numinc = 0;
		$output = '';
		$s = 0;
		$items = count($data);

		for($i = 0; $i < $items; $i++) {
			if($data[$i][1] > $highestval) {
				$highestval = $data[$i][1];
			}
			if($data[$i][1] < $lowestval || $lowestval === false) {
				$lowestval = $data[$i][1];
			}
		}

		$numinc = ($highestval - $lowestval);
		$sizedif = ($maxsize-$minsize);

		for($i = 0; $i < $items; $i++) {
			$s = $data[$i][1] - $lowestval;
			// Avoid dividing by zero when every word has the same count.
			$s = ($numinc > 0) ? $s / $numinc : 0;
			$s = $s * $sizedif;
			$s = $s + $minsize;
			// Escape the word so user-supplied text cannot inject markup.
			$output .= '<span style="font-size:'.$s.'px">'.htmlspecialchars($data[$i][0]).'</span> ';
		}
		return $output;
	}

	/* tagcloud_wordarray(array('a','b','c',...),[min font size,[max font size]]) */
	function tagcloud_wordarray($words,$minsize=12,$maxsize=32) {
		$array_counts = array_count_values($words);
		$tagarray = array();
		foreach($array_counts as $k=>$v) {
			$tagarray[] = array($k,$v);
		}
		return tagcloud($tagarray,$minsize,$maxsize);
	}

	/* tagcloud_string("a b c ...",[min font size,[max font size]]) */
	function tagcloud_string($words,$minsize=12,$maxsize=32) {
		$words = strtolower($words);
		$words = str_replace(array('.',',','"'),'',$words);
		$words = strip_tags($words);
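		$words = explode(' ',$words);
		// Drop empty entries left behind by consecutive spaces.
		$words = array_values(array_filter($words,'strlen'));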
		return tagcloud_wordarray($words,$minsize,$maxsize);
	}
?>

The functions tagcloud_wordarray and tagcloud_string accept the data in different formats and generate the tag cloud by way of the tagcloud function. Both still take the minsize and maxsize font settings (optional, defaulting to 12 and 32) and simply pass them along to the final function.

In tagcloud_string, we first lowercase the string with strtolower and then have to strip it of punctuation: “lorem.” and “lorem” should be counted as the same word. This is accomplished by running str_replace, which here removes periods, commas, and double quotes. After that, assuming some users may have HTML in their strings, we strip that out using the PHP function strip_tags. Now that the paragraph or string of words is cleaned up, we can split it into an array by using explode with a space as the delimiter. Finally, we return the value of the function tagcloud_wordarray with the new word array we just generated.
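To make the cleanup steps concrete, here is a small sketch that runs the same four calls on a sample string. The sample sentence and the variable names are made up for illustration.

```php
<?php
// Hypothetical input: mixed case, punctuation, and a bit of HTML.
$text = 'Lorem ipsum, <b>lorem</b> dolor. "Ipsum" lorem';

$cleaned = strtolower($text);                               // lowercase everything
$cleaned = str_replace(array('.', ',', '"'), '', $cleaned); // remove punctuation
$cleaned = strip_tags($cleaned);                            // remove HTML tags
$words   = explode(' ', $cleaned);                          // split on single spaces

print_r($words);
```

The resulting array holds lorem, ipsum, lorem, dolor, ipsum, lorem, so the three spellings of “lorem” and the two of “ipsum” now count as the same words.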

The function tagcloud_wordarray takes an array of words, counts the instances of each word, and passes those values on to tagcloud for the final calculations. It doesn’t clean up punctuation or strip tags: if the words are already in an array, we assume they are already properly formatted, including their case. The first call is array_count_values, which counts the occurrences of each word and returns an associative array of word => count. After that, we have to turn the associative array into a numeric array, so a foreach statement grabs each key and value and inserts them as a pair into the final array, tagarray. Once that’s done, we return the result of tagcloud, called with our new array and the minsize and maxsize font settings passed straight through.
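As a rough illustration of that conversion, the sketch below counts a made-up word list and re-packs the counts into word/count pairs. The word list is hypothetical; the calls are the same ones used in tagcloud_wordarray.

```php
<?php
// Hypothetical word list; any array of strings works.
$words = array('lorem', 'ipsum', 'lorem', 'dolor', 'ipsum', 'lorem');

// array_count_values returns word => count, e.g. array('lorem' => 3, ...).
$counts = array_count_values($words);

// Re-pack into the numeric array(word, count) pairs that tagcloud expects.
$tagarray = array();
foreach ($counts as $word => $count) {
	$tagarray[] = array($word, $count);
}

print_r($tagarray);
```

This gives three pairs: lorem with 3, ipsum with 2, and dolor with 1, in order of first appearance.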

The final and most important function is tagcloud. This was the original function I coded for my needs. The data argument is a numeric array in which each value is another array holding the word at index 0 and its appearance rate (or just word count) at index 1. Below is an example of the data layout required.

Array(
	[0] => Array
		(
			[0] => lorem
			[1] => 3
		)
	[1] => Array
		(
			[0] => ipsum
			[1] => 8
		)
	[2] => Array
		(
			[0] => foo
			[1] => 2
		)
	[3] => Array
		(
			[0] => bar
			[1] => 4
		)
)

We loop through each word and read the numeric value assigned to it. We check whether it’s higher than our highest value or lower than our lowest value; if either is true, that value is updated. The highest and lowest values are used in the size equation later on.
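For comparison, the same highest/lowest scan can be sketched with PHP’s built-in array_column, max, and min functions. This is an alternative way to arrive at the same two numbers, not what the code above does, and array_column needs PHP 5.5 or later. The data is the example layout shown earlier.

```php
<?php
$data = array(
	array('lorem', 3),
	array('ipsum', 8),
	array('foo', 2),
	array('bar', 4),
);

// Pull index 1 (the count) out of every word/count pair.
$counts = array_column($data, 1);

// Note: max() and min() complain about an empty array, so this
// sketch assumes $data holds at least one entry.
$highestval = max($counts);
$lowestval  = min($counts);

echo $highestval . ' ' . $lowestval; // prints "8 2"
```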

Next we calculate the difference between the highest and lowest occurrence counts, along with the difference between the two font sizes passed in the function arguments. We then loop through the array once more, this time calculating each word’s font size and appending its span to the output variable. The size is calculated using the equation below.

((((x - j) / k) * d) + m)
x = occurrence of word
j = lowest value
k = difference between highest and lowest value
d = difference between font size values
m = minimum font size
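
To see the equation with real numbers, here is a small sketch that applies it to the example data from earlier with the default sizes of 12 and 32. The font_size helper is hypothetical and exists only for this example; its guard for k = 0 is an addition of this sketch.

```php
<?php
// Hypothetical helper that applies the equation to a single word count.
function font_size($x, $j, $k, $d, $m) {
	// If every word has the same count, k is 0; fall back to the minimum size.
	if ($k == 0) {
		return $m;
	}
	return ((($x - $j) / $k) * $d) + $m;
}

$counts = array('lorem' => 3, 'ipsum' => 8, 'foo' => 2, 'bar' => 4);

$j = min($counts);      // lowest value: 2
$k = max($counts) - $j; // difference between highest and lowest value: 6
$d = 32 - 12;           // difference between font size values: 20
$m = 12;                // minimum font size

$sizes = array();
foreach ($counts as $word => $x) {
	$sizes[$word] = font_size($x, $j, $k, $d, $m);
}

print_r($sizes);
```

With these numbers, foo (the lowest count) comes out at 12px and ipsum (the highest) at 32px, while lorem and bar land in between at roughly 15.3px and 18.7px.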

We then finish up by returning the output variable with all the strings appended to it.