Scrape IMDB MyMovies with PHP

June 08, 2006

I have been wanting to mention the movies I have recently seen, and what I thought of them, on this blog, but was not sure how best to do it.

For a while I was running my own custom code to add movie data and ratings and to display them, but this was too cumbersome.

Recently I discovered that the movie rating system at IMDB.com keeps track of all the movies you have ever rated (IMDB.com is by the way an amazing resource for movie lovers and I hope that you have used it before).

Since IMDB is the oracle of movies, a better approach seemed to be to keep reviewing and discussing movies there and to mash the data into my blog.

A few regular expressions later and a Pick IMDB MyMovies with PHP (PIMP) script was born.

The script scrapes a given IMDB MyMovies list and returns the results in a handy two-dimensional array. It is then up to the user to write the display code that makes it fit in with a web page (blog).
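To give an idea of what that display code could look like, here is a minimal sketch. It assumes each row of the array carries `id`, `title`, `year` and `rating` keys; those key names and the `renderMovieList()` helper are made up for illustration and are not necessarily what PIMP returns, so adjust them to the real array.

```php
<?php
// Hypothetical helper: turns a two-dimensional movie array into an HTML list.
// The keys 'id', 'title', 'year' and 'rating' are assumed for illustration.
function renderMovieList(array $movies): string
{
    $html = "<ul class=\"movies\">\n";
    foreach ($movies as $movie) {
        // Escape anything that came from a scraped page before printing it.
        $title = htmlspecialchars($movie['title'], ENT_QUOTES, 'UTF-8');
        $year  = htmlspecialchars($movie['year'], ENT_QUOTES, 'UTF-8');
        $url   = 'http://www.imdb.com/title/' . rawurlencode($movie['id']) . '/';
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a> (%s): %d/10</li>\n",
            $url,
            $title,
            $year,
            (int) $movie['rating']
        );
    }
    return $html . "</ul>\n";
}

// Sample rows standing in for the array the scraper would produce.
$movies = [
    ['id' => 'tt0111161', 'title' => 'The Shawshank Redemption', 'year' => '1994', 'rating' => '10'],
    ['id' => 'tt0068646', 'title' => 'The Godfather', 'year' => '1972', 'rating' => '9'],
];

echo renderMovieList($movies);
```

Escaping the scraped values before output is a deliberate choice here: the data comes from someone else's page, so it should not be trusted as safe HTML.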

A basic file-based cache has been implemented to reduce the number of hits on IMDB.
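A file-based cache of this kind can be sketched roughly as follows. This is not PIMP's actual code: the `cachedFetch()` function, its parameters and the cache file name are all invented for illustration. The idea is simply to reuse the saved copy while it is younger than the timeout and to go back to the source only when it is stale. The -1 case mimics the "skip caching" behaviour mentioned in the update note of this post.

```php
<?php
// Hypothetical sketch of a file-based cache; names are illustrative only.
// $fetch is any callable that returns the fresh page content as a string.
function cachedFetch(string $cacheFile, int $timeoutSeconds, callable $fetch): string
{
    // A timeout of -1 bypasses the cache entirely.
    if ($timeoutSeconds === -1) {
        return $fetch();
    }

    // Reuse the cached copy while it is still fresh.
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $timeoutSeconds) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Otherwise fetch again and save the result for next time.
    $content = $fetch();
    file_put_contents($cacheFile, $content, LOCK_EX);
    return $content;
}

// Demo with a stand-in fetcher instead of a real request to IMDB.
$calls = 0;
$fetch = function () use (&$calls): string {
    $calls++;
    return '<html>fake page</html>';
};

$cacheFile = sys_get_temp_dir() . '/pimp_cache_demo_' . getmypid() . '.html';
if (is_file($cacheFile)) {
    unlink($cacheFile);
}

$first  = cachedFetch($cacheFile, 3600, $fetch); // cache miss: calls $fetch
$second = cachedFetch($cacheFile, 3600, $fetch); // cache hit: reads the file
$third  = cachedFetch($cacheFile, -1, $fetch);   // caching skipped: calls $fetch

unlink($cacheFile);
```

In the demo the stand-in fetcher runs twice across the three calls: once for the initial miss and once for the uncached -1 call.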

At the core of the script is the following regular expression:

/<a href=\"\/title\/([^\/]*)\/([^>]*)>([^<]*)<\/a> \(([0-9]*[\/I]*)\)( \(.\))?<\/td>([^<])<td align=\"center\" bgcolor=\"\#ffffff\">([0-9]{1,2})<\/td><td align=\"center\" bgcolor=\"\#ffffff\"> ([0-9]\.[0-9])?/i
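As a rough illustration of how a pattern like this can be applied, the sketch below runs it over a hand-made HTML fragment with `preg_match_all()`. The fragment is invented to fit the pattern and is not copied from IMDB, and the meaning given to each capture group (title id, title, year, own rating, IMDB rating) is only a reading of the expression, not something the script is confirmed to do. The array keys are likewise assumptions.

```php
<?php
// The pattern from the post, kept verbatim inside a single-quoted string
// so that PHP passes every backslash through to the regex engine.
$pattern = '/<a href=\"\/title\/([^\/]*)\/([^>]*)>([^<]*)<\/a> \(([0-9]*[\/I]*)\)( \(.\))?<\/td>([^<])<td align=\"center\" bgcolor=\"\#ffffff\">([0-9]{1,2})<\/td><td align=\"center\" bgcolor=\"\#ffffff\"> ([0-9]\.[0-9])?/i';

// Hand-made sample row (an assumption, shaped to match the pattern).
$html = '<td><a href="/title/tt0111161/">The Shawshank Redemption</a> (1994)</td>' . "\n"
      . '<td align="center" bgcolor="#ffffff">10</td><td align="center" bgcolor="#ffffff"> 9.2</td>';

$movies = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // One row per matched movie; key names are illustrative.
        $movies[] = [
            'id'          => $m[1],
            'title'       => $m[3],
            'year'        => $m[4],
            'my_rating'   => $m[7],
            // The last group is optional, so it may be absent from the match.
            'imdb_rating' => $m[8] ?? '',
        ];
    }
}

print_r($movies);
```

`PREG_SET_ORDER` groups the captures per match, which maps naturally onto one array row per movie.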

Be aware that if the layout of the IMDB page changes, PIMP will fail. Of course, I will be quick to update the regular expression since I am using PIMP myself.

There are some rumours that IMDB will introduce an open API that will allow developers to retrieve all kinds of movie information. Until that is done, crude HTML scraping techniques will have to do.

Feel free to suggest regular expression or code optimisations.

Download PIMP v1.2 and let me know what you think!

Usage:
------
0. Create an account with IMDB and mark a MyMovies list as public
1. To be able to use a cache file, create a directory that is writable by the script process (may need chmod 777; a directory needs the execute bit as well as write permission, so 666 is not enough). For security reasons this directory should be OUTSIDE of the area accessible from the web.
2. Configure the script with your details (list id, cache directory, etc.)
3. Upload the myImdbMovies.php file to your web server
4. Access the script directly or better yet use it as an include

Update 26/06/2006
Added a raw HTTP method in case file_get_contents doesn't work for you (for example, because of a hosting provider restriction).
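For readers curious about what a raw HTTP fetch involves, one common way to do it in PHP is `fsockopen()`, which is not affected by the URL file-access setting that blocks `file_get_contents()` on URLs. The sketch below is a simplified illustration under that assumption and is not the code used in PIMP; all three function names are hypothetical.

```php
<?php
// Hypothetical sketch of fetching a page over a raw socket instead of
// file_get_contents(); function names here are illustrative only.

// Build a minimal HTTP/1.0 GET request (1.0 avoids chunked responses).
function buildGetRequest(string $host, string $path): string
{
    return "GET {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

// Split a raw response into its header block and body.
function splitHttpResponse(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    return ['headers' => $parts[0], 'body' => $parts[1] ?? ''];
}

// Open a socket, send the request and read until the server closes.
function rawHttpGet(string $host, string $path, int $timeout = 10): string
{
    $socket = fsockopen($host, 80, $errno, $errstr, $timeout);
    if ($socket === false) {
        throw new RuntimeException("Connection failed: {$errstr} ({$errno})");
    }
    fwrite($socket, buildGetRequest($host, $path));
    $raw = stream_get_contents($socket);
    fclose($socket);
    return splitHttpResponse($raw === false ? '' : $raw)['body'];
}

// The parsing step can be tried without touching the network:
$sample = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>";
$parsed = splitHttpResponse($sample);
echo $parsed['body'], "\n";
```

A real call would look like `rawHttpGet('www.example.com', '/some/page')`, with the host and path being placeholders. The sketch ignores redirects, HTTPS and status codes, all of which a production fetcher would need to handle.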

A cache timeout of -1 skips caching altogether and avoids possible cache file permission problems.

A value of -1 for listItems returns all items.
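The -1 convention for the item limit can be pictured with a small sketch like the one below. The `limitItems()` helper is hypothetical and only illustrates the idea; it is not taken from the script.

```php
<?php
// Hypothetical helper showing the -1 convention for an item limit.
function limitItems(array $items, int $listItems): array
{
    // -1 means "no limit": hand back the whole list untouched.
    if ($listItems === -1) {
        return $items;
    }
    // Otherwise keep only the first $listItems entries.
    return array_slice($items, 0, max(0, $listItems));
}

$all        = ['a', 'b', 'c', 'd'];
$two        = limitItems($all, 2);  // first two entries
$everything = limitItems($all, -1); // all four entries
```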



Comments

  1. John Says:

    This was exactly what I was looking for.

    But my server has apparently disabled file_get_contents().

    Warning: file_get_contents(): URL file-access is disabled in the server configuration line 88

    Do you have an alternative way to make the script work?

    I appreciate it.

    David says:

    Glad that it seems helpful, shame your system is locked down. file_get_contents() is just one of several ways to get the content, let me look into an alternative...

    Sorted, try the new '$use_raw_http_mode = TRUE' option.

  2. John Says:

    The '$use_raw_http_mode = TRUE' option did the trick. Thanks David.

    Is it much trouble to code an option for displaying ALL the movies on my list?

    Wonderful, thanks again.

    David says:

    It is there already: the $listItems option. Make sure to select "Change display options" and "Show All titles" in IMDB MyMovies.

  3. John Says:

    I see the link (VIEW ALL) that takes the user to the actual IMDB page. But I don't think there are any other options for $listItems to fetch and display ALL ITEMS.

  4. John Says:

    The variable $listItems stores an int value. But there's no option to make $listItems list/fetch ALL items. The View All link simply directs the user to the IMDB page, doesn't grab the items.

    David says:

    A really high value like 999 should do it (unless you are a movie addict).

  5. PF Says:

    Thanks a lot for creating this code!
    I've made your idea into a facebook application, http://www.facebook.com/apps/application.php?api_key=89f359599ea0017a182c504827e50a74 I don't know if you are on facebook but you might like it.

    David says:

    Hey, your Facebook App is a great way of realizing the proof of concept (which PIMP is).

    Just remember that IMDB does not endorse screen scraping; in fact, they have a tendency to ban hosts if the scraping becomes too intense... but the more people scrape their screens, the more pressure we put on them to create or release an open API!

    Thanks for the credit.

