
Jeff Lee's blog

Screen Scraping Made Pretty Easy

Many public libraries offer a valuable but often overlooked resource: free or discounted museum passes. Here in Western Massachusetts, residents have access to the combined materials of the entire C/W MARS regional system of more than 140 libraries. Courtesy of our library cards, my family and I have made free trips to the Holyoke Children’s Museum, Amherst History Museum, Historic Deerfield, Magic Wings, Brattleboro Art Museum, Smith College Art Museum, Springfield Quadrangle and Boston Science Museum.

I became curious to know the full range of museum passes our library system had to offer.  This information is available via the C/W MARS search engine, but the more than 50 results are spread out over 5 rather verbose pages.  What I was really looking for was a concise, browsable list of available museum passes with links to detailed information. 

A powerful, elegant PHP library called Simple HTML DOM Parser makes constructing such a page relatively easy. Contributed as open source by S. C. Chen, it features a find() method that takes arguments modeled after jQuery’s easy-to-use selector syntax. For instance, this small PHP snippet is virtually all that is needed to dump all the link destinations in a web page:

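include 'simple_html_dom.php';  // the library file, assumed to sit alongside this script
$html = file_get_html('http://some-site.com/some-page.html');
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';  // print each link destination
}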

I wrote a program that used Simple HTML DOM Parser to loop through each page of search results from a C/W MARS Library Catalog query on the term "museum pass." The program parsed each page and extracted all HTML nodes with a class of "briefcitTitle" (find('.briefcitTitle')). Each node contained a link to an individual museum pass page, which the program extracted and wrote back out as a table row. With a couple dozen lines of code I was able to put together the list of museum passes I was after:

Museum Passes Available for Loan at Western Mass. Libraries
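
For readers who want to try the idea, here is a rough sketch of the extract-and-rewrite step described above. It is not the original program: to stay runnable without any extra files it swaps in PHP's built-in DOMDocument and DOMXPath in place of Simple HTML DOM Parser, and it works on a made-up sample page, not on live catalog results. The sample markup, the link paths and the function names extractPassLinks and renderTable are all assumptions for illustration; only the "briefcitTitle" class comes from the real pages.

```php
<?php
// Hypothetical sample of one results page; the real catalog markup may differ.
$samplePage = <<<HTML
<table>
  <tr><td class="briefcitTitle"><a href="/pass/example-a">Museum pass: Example Museum A</a></td></tr>
  <tr><td class="briefcitTitle"><a href="/pass/example-b">Museum pass: Example Museum B</a></td></tr>
  <tr><td class="other"><a href="/help">Help</a></td></tr>
</table>
HTML;

// Return title/href pairs for every link inside a node whose class
// list contains "briefcitTitle".
function extractPassLinks(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);  // keep quiet about imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // XPath equivalent of the CSS selector ".briefcitTitle a".
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " briefcitTitle ")]//a[@href]';

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'title' => trim($a->textContent),
            'href'  => $a->getAttribute('href'),
        ];
    }
    return $links;
}

// Write the extracted links back out, one table row per museum pass.
function renderTable(array $links): string
{
    $rows = '';
    foreach ($links as $link) {
        $rows .= sprintf(
            "<tr><td><a href=\"%s\">%s</a></td></tr>\n",
            htmlspecialchars($link['href']),
            htmlspecialchars($link['title'])
        );
    }
    return "<table>\n" . $rows . "</table>\n";
}

// The real program looped over every page of search results;
// here a single sample page stands in for them.
$pages = [$samplePage];
$allLinks = [];
foreach ($pages as $page) {
    $allLinks = array_merge($allLinks, extractPassLinks($page));
}
echo renderTable($allLinks);
```

With Simple HTML DOM Parser itself, the XPath query would presumably shrink to a find('.briefcitTitle') call like the one mentioned above, and each sample page would be replaced by a fetch of one page of search results.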

OK, Use the Map Now

My recent blog post described an inaccuracy in the Google Map generated by entering the Studio4 Technologies office address.

Whether by coincidence or (could it be?) through the amazing power of the blogosphere, I find the map of "592 Main St., Amherst, MA" is now correct. Our map marker is at the corner of Main and North Whitney, right where it should be.

Thanks, Google technician, wherever you are!

Google map showing correct location of Studio4 Technologies office

Don't Use This Map!

Google Maps is the application that first clued me in to the power of Ajax, and it has been my mapping service of choice for quite some time. Therefore it is with some dismay that I must point out a glaring inaccuracy in one of its maps, namely the one generated from the Studio4 Technologies street address!


The pushpin marking the supposed location of 592 Main St. is nearly a half mile off. The office is located in the Nacul Center, a former Methodist church that has been remodeled into studios and an art gallery, at the corner of Main and North Whitney Streets in Amherst, Mass. That is more than 3 blocks west of where Google Maps would have us!

I have attempted to correct the location using Google's process for moving a marker, but the error exceeds the 200-meter limit that Google allows without human review. Several weeks later, I am still waiting for their human to respond to my change request.

Pay us a visit any time, but, at least in this case, you will be better off using a map from Bing or MapQuest to find your way.
