Amid South Carolina Gov. Mark Sanford's newsmaking affair with an Argentine woman last week, we decided to run an online poll asking our readers whether Sanford should resign. We used a widget developed by Vizu to run our unscientific poll. I like their service because it's very fast, fairly customizable and produces a cool Google map of voters' locations.
This was working great as is, until the top editor of the paper said we needed to show the vote totals on the site. Well, that info doesn't come as part of the Vizu widget directly. It is visible on the page where the poll was created, but that's on Vizu's site. On a fast-moving news day (or any other day), the last thing you want to do is update anything every fifteen minutes by hand when you don't have to.
Enter Simple_HTML_Dom - a PHP parser that does what it says - it simply parses the HTML DOM of a given Web page. In plainer English - it scrapes a Web page and lets you extract the information you need.
Here is the end result of what I was trying to achieve:
Poll: Should Sanford resign?
Click here to see map of votes
That vote total above is generated by a JavaScript file, which in turn was generated using PHP. A cron job hits the PHP script every 15 minutes, and the script writes the result to a JavaScript file. The reason for that is to keep server hits to Vizu to a minimum. It also lightens the load on your own server because you're only serving a few lines of code and, since we're using Drupal to write the script, it doesn't hit the MySQL database in the process.
In any event, notice the Vizu page where this poll was created. It's here:
http://www.vizu.com//poll-results.html?n=170479
Notice just above the bar chart you'll see the vote totals. That's the information we want to show dynamically.
There's really not much to Simple_HTML_Dom. Just download the script and save it to your server, include() it in a PHP script and rock and roll.
Here you go:
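<?php
// Load the Simple_HTML_Dom library (downloaded separately).
include('simple_html_dom.php');
// Fetch and parse the Vizu results page.
$html = file_get_html('http://www.vizu.com/poll-results.html?n=170479') or die("can't fetch page");
// Collect every right-aligned table cell; the first one holds the vote total.
$es = $html->find('td[align=right]') or die("can't find vote total");
// json_encode() quotes and escapes the matched markup so the JavaScript stays valid.
$votes = "document.write(" . json_encode((string) $es[0]) . ");";
// Write the one-line script to a file the page can load.
$FileName2 = "votecount.js";
$FileHandle2 = fopen($FileName2, 'w') or die("can't open file");
fwrite($FileHandle2, $votes);
fclose($FileHandle2);
?>
So a few things here. The line that starts
$es = $html->find('td[align=right]')
is just looking at the source code of the Vizu page and collecting the elements that match. In this case, the first match is the vote total. We use it in the next line, where we define the one line of JavaScript that will write the first matched element:
$votes = "document.write(" . json_encode((string) $es[0]) . ");";
The json_encode() call matters: the matched element is a chunk of HTML, and dropping it into document.write() without quotes would produce broken JavaScript.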
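If you want to see what that selector is doing without hitting Vizu at all, here is a rough sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than Simple_HTML_Dom. The function name first_right_aligned_cell() and the sample markup are made up for illustration; the real Vizu page will look different.

```php
<?php
// Hypothetical helper: returns the text of the first <td align="right">
// in an HTML string, or null if there is none.
function first_right_aligned_cell($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // XPath equivalent of the CSS-style selector td[align=right].
    $cells = $xpath->query('//td[@align="right"]');
    if ($cells === false || $cells->length === 0) {
        return null;
    }
    return trim($cells->item(0)->textContent);
}

// Made-up sample markup, not copied from Vizu.
$sample = '<table><tr><td align="left">Votes</td>'
        . '<td align="right"> 1,234 </td></tr>'
        . '<tr><td align="right">56%</td></tr></table>';

echo first_right_aligned_cell($sample), "\n";
```

Two cells match in the sample, and just like $es[0] in the script, only the first one is used. If Vizu ever rearranges its page, the first match could be something else entirely, so it's worth eyeballing the output now and then.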
Finally, we write it to a file (don't forget to first create the blank file and save it with the correct read/write/execute permissions).
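Here is a minimal sketch of the write step on its own, assuming you want the scraped value safely quoted and the file swapped in all at once. The helper names build_vote_js() and write_js_file(), and the temp-directory path, are my own inventions for the demo and not part of Simple_HTML_Dom.

```php
<?php
// Hypothetical helper: wrap a scraped value in a document.write() call.
// json_encode() adds the quotes and escapes anything that would break the string.
function build_vote_js($text)
{
    return 'document.write(' . json_encode((string) $text) . ');';
}

// Hypothetical helper: write to a temp file, then rename it into place,
// so a visitor is less likely to load a half-written votecount.js.
function write_js_file($path, $js)
{
    $tmp = $path . '.tmp';
    if (file_put_contents($tmp, $js, LOCK_EX) === false) {
        return false;
    }
    return rename($tmp, $path);
}

// Assumed demo path; on a real site this would be wherever votecount.js lives.
$target = sys_get_temp_dir() . '/votecount.js';
write_js_file($target, build_vote_js('1,234 votes'));
echo file_get_contents($target), "\n";
```

The rename trick is optional. For a one-line file it's mostly belt and suspenders, but it costs nothing.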
The last step is to set up a cron job to hit the URL of the PHP script every so often - in our case, every 15 minutes. There is plenty of documentation on this step, and I'll leave you to Google to sort that out.
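Since the script lives at a public URL, anyone (or an overeager cron entry) can trigger it more often than you planned. One possible guard, sketched below, is to check the age of the output file before going back to Vizu. The function name needs_refresh() and the demo file path are assumptions for illustration, and this is not something the original script does.

```php
<?php
// Hypothetical helper: true if $path is missing or at least $maxAge seconds old.
function needs_refresh($path, $maxAge)
{
    // filemtime() results are cached within a request, so clear them first.
    clearstatcache(true, $path);
    if (!file_exists($path)) {
        return true;
    }
    return (time() - filemtime($path)) >= $maxAge;
}

// Assumed demo path, not the real votecount.js.
$file = sys_get_temp_dir() . '/votecount-demo.js';

touch($file, time() - 3600);              // pretend the last write was an hour ago
var_dump(needs_refresh($file, 15 * 60));  // stale, so a fetch would go ahead

touch($file);                             // pretend it was just rewritten
var_dump(needs_refresh($file, 15 * 60));  // fresh, so the fetch would be skipped
```

In the scraper you would call this at the top and exit early when it returns false, which keeps the hits to Vizu bounded no matter how often the URL gets requested.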
Now, in the HTML of the page where you want the dynamic info to show up, it's simply:
<script src="http://blogs.islandpacket.com/sites/default/files/votecount.js?1" type="text/javascript"></script>
There you have it. Once you've done this once or twice, it doesn't take long to do something like this on the fly under pressure.
