Scrape IMDB MyMovies with PHP

I have been wanting to mention the movies I have recently seen, and what I thought of them, on this blog but was not sure how best to do it.
For a while I was running my own custom code to add and display movie data and ratings, but this was too cumbersome.
Recently I discovered that the movie rating system at IMDB.com keeps track of all the movies you have ever rated (IMDB.com is, by the way, an amazing resource for movie lovers and I hope that you have used it before).
Since IMDB is the oracle of movies, a better approach seemed to be to continue reviewing and discussing movies there and to mash the data into my blog.
A few regular expressions later and a Pick IMDB MyMovies with PHP (PIMP) script was born.
The script scrapes a given IMDB MyMovies list and provides the results in a handy (two-dimensional) array. It is then up to the user to write the display code that makes it fit in with a web page (blog).
A basic file-based cache has been implemented to reduce the number of hits on IMDB.
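PIMP's own cache is written in PHP and its code is not reproduced in this post. As a rough illustration of the idea only, here is a minimal Python sketch of a file-based cache with a timeout. It assumes the cache file is named after a hash of the URL and that `fetch` is whatever function actually downloads the page; both are my assumptions, not PIMP's design. It also mirrors the option from the update below where a timeout of -1 skips caching altogether.

```python
import hashlib
import os
import time


def cached_fetch(url, fetch, cache_dir, timeout):
    """Return the page at `url`, served from a file cache if it is fresh enough.

    `fetch` is any callable that downloads `url` and returns a str (assumed).
    `timeout` is the maximum cache age in seconds; -1 bypasses the cache.
    """
    if timeout == -1:
        return fetch(url)  # no cache file is read or written at all

    # Hypothetical naming scheme: one cache file per URL, named by its SHA-1.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(cache_dir, name)

    try:
        if time.time() - os.path.getmtime(path) < timeout:
            with open(path, encoding="utf-8") as f:
                return f.read()
    except OSError:
        pass  # no cache file yet (or unreadable): fall through and fetch

    html = fetch(url)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)  # refresh the cache for the next request
    return html
```

Calling it twice within the timeout downloads the page only once; the second call is answered from the cache directory.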
At the core of the script is the following regular expression:
/<a href=\"\/title\/([^\/]*)\/([^>]*)>([^<]*)<\/a> \(([0-9]*[\/I]*)\)( \(.\))?<\/td>([^<])<td align=\"center\" bgcolor=\"\#ffffff\">([0-9]{1,2})<\/td><td align=\"center\" bgcolor=\"\#ffffff\"> ([0-9]\.[0-9])?/i
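PIMP applies this expression in PHP, but the same pattern can be tried out with Python's `re` module (once the PHP string escaping is removed) to see what each capture group holds. The meaning I give groups 7 and 8 (your own vote and the IMDB average) is my reading of the pattern, the HTML row is made up, and the `list_items` parameter is a hypothetical stand-in for PIMP's listItems setting, including its -1 = all behaviour.

```python
import re

# The PIMP expression, with the PHP escaping (\" \/ \#) removed.
PIMP_RE = re.compile(
    r'<a href="/title/([^/]*)/([^>]*)>([^<]*)</a> \(([0-9]*[/I]*)\)( \(.\))?</td>'
    r'([^<])<td align="center" bgcolor="#ffffff">([0-9]{1,2})</td>'
    r'<td align="center" bgcolor="#ffffff"> ([0-9]\.[0-9])?',
    re.IGNORECASE,
)


def parse_my_movies(html, list_items=-1):
    """Return a two-dimensional list with one row per rated movie.

    Each row is [title_id, title, year, my_vote, imdb_average].
    list_items == -1 returns every row, otherwise at most list_items rows.
    """
    rows = []
    for m in PIMP_RE.finditer(html):
        rows.append([
            m.group(1),          # IMDB title id, e.g. "tt1234567"
            m.group(3),          # movie title as shown in the link text
            m.group(4),          # year, possibly with a "/I"-style suffix
            int(m.group(7)),     # your own vote (1-10)
            m.group(8),          # IMDB average as a string, or None if missing
        ])
    return rows if list_items == -1 else rows[:list_items]


# A made-up row in the layout the expression expects.
SAMPLE = ('<a href="/title/tt1234567/">Example Movie</a> (2005)</td>\n'
          '<td align="center" bgcolor="#ffffff">8</td>'
          '<td align="center" bgcolor="#ffffff"> 7.8</td>')

print(parse_my_movies(SAMPLE))  # [['tt1234567', 'Example Movie', '2005', 8, '7.8']]
```

Any change to the attributes or spacing of the table cells makes `finditer` silently return nothing, which is exactly the fragility described in the next paragraph.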
Be aware that if the layout of the IMDB page changes, PIMP will fail. Of course I will be quick to update the regular expression since I am using PIMP myself.
There are some rumours that IMDB will introduce an open API that will allow developers to retrieve all kinds of movie information. Until that is done, crude HTML scraping techniques will have to do.
Feel free to suggest regular expression or code optimisations.
Download PIMP v1.2 and let me know what you think!
Usage:
——
0. Create an account with IMDB and mark a MyMovies list as public
1. To be able to use a cache file, create a directory which is writable by the script process (this may need chmod 777; 666 is not enough, because a directory also needs the execute bit before files can be created in it). For security reasons this directory should be OUTSIDE of the area accessible from the web.
2. Configure the script with your details (list ID, cache directory, etc.)
3. Upload the myImdbMovies.php file to your web server
4. Access the script directly or better yet use it as an include
Update 26/06/2006
Added a raw HTTP method for when file_get_contents doesn’t work for you (e.g. due to a hosting provider restriction).
A cache timeout of -1 skips caching altogether and avoids possible cache file permission problems.
Setting listItems to -1 returns all items.
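PIMP's raw HTTP method is PHP and is not shown in this post. For readers wondering what "raw HTTP" means, here is a minimal Python sketch that speaks HTTP directly over a TCP socket instead of relying on a high-level helper. The function names are hypothetical; it deliberately uses HTTP/1.0 with `Connection: close` so that the response ends when the server closes the connection, and it does not handle HTTPS, redirects or chunked encoding.

```python
import socket


def build_get_request(host, path):
    """Build a minimal HTTP/1.0 GET request (HTTP/1.0 avoids chunked responses)."""
    return (
        "GET %s HTTP/1.0\r\n"
        "Host: %s\r\n"
        "Connection: close\r\n"
        "\r\n" % (path, host)
    ).encode("ascii")


def split_response(raw):
    """Split a raw HTTP response into (status_code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(lines[0].split()[1])  # "HTTP/1.0 200 OK" -> 200
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def raw_http_get(host, path, port=80, timeout=10):
    """Fetch a page over a plain TCP socket."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

For example, `status, headers, body = raw_http_get("example.com", "/")` returns the status code, the headers and the page body as bytes.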

Wedding congratulations

Isadora and Albert, thank you for inviting us to this special moment of your life. We wish that your wedded life will be as sweet as the dessert that was served during the fabulous lunch.
Oscar and Asa, sorry we could not join you in person but be sure that we were with you in our minds.
To all of you, have a wonderful honeymoon and make sure to love and respect each other for the next 100 years.
On the topic of weddings, thank you Amor for the last 5 years.

Web >2.0 Conference, London, UK

This is an early notice that later this summer I will be organising a Web >2.0 Conference in London, UK (exact location TBA).
The attendance fee will be $2.795 ($2 and 795 cents) and all proceeds will be donated to an open source project TBA.
At the conference, several speakers will present why Web 2.0 is so yesterday and what Web 2.1 can offer.
There will also be brainstorming sessions to come up with new and tantalising buzzwords – ideally trademark-able after general adoption.

On a more serious note: I am an IT professional and have always favoured O’Reilly technology books when making new purchases. I have often recommended them to friends and colleagues.
The latest attempt by O’Reilly to trademark Web 2.0, and going after the little man with a Cease and Desist letter, has made me think twice.
It could be a sign that O’Reilly have lost touch with their audience and also that they don’t fully understand the term that they helped popularise.
How a company that so recently and so strongly advocated against the Amazon one-click purchase patent can make such a U-turn, I cannot comprehend.
O’Reilly should have been clear from the beginning (2003) and stated their intentions by adding a (TM) every time they used Web 2.0. Had the general public been aware of the pending TM, I doubt that the term would have gained such popularity.
One decent way out of this mess for O’Reilly would be to accept that the term is by now in the public domain.
As long as O’Reilly remains knowledgeable about upcoming web technologies and a leading force in advocating Web 2.0, no imitators will be able to steal their thunder!

SkyHD installation: first impressions

Finally May 22nd arrived, and with it the first official installations of one of the first HDTV services in the UK: SkyHD.
Installation
In my case the actual installation was just a matter of unplugging the previous Sky+ box and replacing it with the new SkyHD box – easy peasy.
Currently I have the SkyHD box connected to an older Pioneer Plasma 433 (43″). The TV supports 1080i over component and that is how it is connected at the moment.
An HDMI-to-DVI cable has been ordered and I am looking forward to comparing the picture quality of HD over component vs. over HDMI.
The SkyHD Box
SkyHD has a 300GB internal hard drive, but only 160GB is allocated to the customer; the rest is “reserved” for Sky. Recorded HD content obviously requires more disk space than standard definition (SD) content, so there is a risk of running out of space quickly. Star Wars III in HD takes up around 10% of the space available to the customer!
SkyHD comes with a new remote, but my multi-remote that was programmed for the previous Sky+ box is working fine. This seems to suggest that Sky+ and SkyHD share remote control codes, but I have heard reports of the SkyHD remote not being able to control a Sky+ box.
The box has internal fans, but sitting 6 meters away from it I was not able to hear them over the TV sound.
You can choose to output in 1080i, 720p, 576 or leave it on Automatic where the SkyHD box will switch according to the source.
The Automatic option may seem best but unfortunately the switching between HD and SD channels creates a flicker.
Picture Quality
The first channel any new HD viewer should browse to is BBC HD Preview (145). It is transmitted in very good quality and shows just how good HDTV can be.
Overall, HD content is pretty scarce and often mixed with upscaled video. Upscaled means that Sky has converted an SD source and broadcasts it as HD.
Currently the only true HD programmes on Sky One HD seem to be 24, Rescue Me, Enterprise and some episodes of Malcolm In The Middle. The upscaled content is easily spotted as it is a bit narrower than the standard 16:9 aspect ratio and has black bands on the sides (it is more like 14:9).
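To put "a bit narrower" into numbers: a 14:9 picture shown at full height inside a 16:9 frame fills 14/16 of the width. Assuming a 1920-pixel-wide frame purely for illustration:

```latex
\frac{14/9}{16/9} = \frac{14}{16} = 87.5\%,\qquad
0.875 \times 1920 = 1680\ \text{px},\qquad
\frac{1920 - 1680}{2} = 120\ \text{px per side bar}
```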
National Geographic, Discovery and Artsworld have dedicated HD channels but again there is mixed true HD and upscaled content.
Star Wars III: Revenge of the Sith is running on Sky Box Office in HD and is reported to be of much better quality than the DVD version.
When 1080i is selected, the SkyHD box upscales everything in SD to 1080i. This removes the annoying flickering, but I find the upscaled picture a bit soft, most likely because the TV then has to downscale the signal one extra time, and additional conversions are never good.
Conclusions
Do I like SkyHD? Yes. Am I blown away? No.
This may be because for the last year I have been spoiled with HD content sourced on the Internet. A recent HDV camcorder purchase (HDR-HC3) has also allowed me to create my own HD content.
The fact that there is so little true HD content on Sky at the moment does not help either.
In my opinion Sky+ was a more radical improvement because it added PVR (personal video recording) and all the goodness that comes with it: pause live TV, watch one channel while recording another, fast forward past commercials and similar. It also added Dolby Digital 5.1 surround sound, which makes for a great TV experience.
It seems as if Sky has oversold the SkyHD service, as many customers have been told their installation has been delayed. New customers asking about SkyHD are being told that August is the first available date (BBC News has more about this).
PS: My previous 80GB Sky+ box is for sale.

Is your TV ready for High Definition (HDTV)?

With the looming introduction of the first HDTV content in the UK (SkyHD and BBC HD) and release dates announced for HD media players (HD-DVD and Blu-ray Disc), you might be wondering whether your 2-3 year old plasma or LCD TV will support it.
HDTV will mainly come in two resolutions: 1280 pixels horizontally by 720 pixels vertically, progressive (aka 720p), and 1920 by 1080 pixels, interlaced (aka 1080i). Your TV should have at least 1280 by 720 pixels to take advantage of all the detail in the signal, although panels with at least 720 lines, such as 1024×768, 1024×1024 and 1366×768, will be fine too.
If you connect an HD source to a traditional TV, you will just end up watching the SD (standard definition) version and miss out on the detail. You may still get a better picture thanks to fewer compression artifacts, especially with satellite TV.
In addition to resolution, you have to check the technical specification of your TV for the supported frequencies. It is quite common for older TVs to accept 1080i at both 50 and 60Hz but to accept 720p only at 60Hz. In the UK, SkyHD has confirmed it will broadcast at 50Hz only, which means a 720p signal would not be accessible on such older TVs.
There are various ways to feed an HD signal into an HDTV, and your TV should accept one or more of the following inputs: Component (analog), DVI (digital) and/or HDMI (digital).
I believe that analog component outputs will be removed from the second generation of HD units, but at the moment SkyHD boxes and Sony Blu-ray Disc players will feature component out.
HDTV introduces protection of the content on the wire. This is achieved with an encryption scheme named HDCP (High-bandwidth Digital Content Protection). Ideally your TV should support HDCP on its digital inputs (DVI or HDMI), but again the first generation of HDTV content will be unencrypted, so for a limited time not supporting HDCP is OK.
Some TVs support extension boards that can add recent HDTV requirements like 720p at 50Hz and HDCP. Due to the high price of extension boards, falling HDTV prices and the limited lifetime of plasmas and DLP rear projection TVs, you may be better off investing in a new HDTV.
Checklist:

  1. At least 1280 by 720 pixels
  2. Accept 720p and/or 1080i signal at 50Hz
  3. Component and/or DVI and/or HDMI inputs
  4. Ideally HDCP on DVI or HDMI

The best picture is always achieved when the input signal can be shown without any scaling. This means that a 1080i signal will look best on a screen with a 1920×1080 native resolution.
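The checklist and the scaling rule above can be written down as a small Python sketch. The `TvSpec` fields and the mode labels are my own hypothetical representation, and the resolution check only looks at the vertical line count, on the assumption (taken from the text above) that panels such as 1024×768 also qualify.

```python
from dataclasses import dataclass, field


@dataclass
class TvSpec:
    width: int                                  # native panel pixels (horizontal)
    height: int                                 # native panel pixels (vertical)
    modes: set = field(default_factory=set)     # accepted signals, e.g. {("1080i", 50)}
    inputs: set = field(default_factory=set)    # e.g. {"component", "dvi", "hdmi"}
    hdcp: bool = False                          # HDCP on a digital input?


def hd_checklist(tv):
    """Map each checklist item to True/False for the given TV."""
    return {
        # 1. Enough lines for 720p (1024x768 or 1366x768 panels also pass).
        "resolution": tv.height >= 720,
        # 2. At least one HD format accepted at 50Hz (what SkyHD broadcasts).
        "hd_at_50hz": ("720p", 50) in tv.modes or ("1080i", 50) in tv.modes,
        # 3. Component, DVI and/or HDMI input.
        "hd_input": bool(tv.inputs & {"component", "dvi", "hdmi"}),
        # 4. Ideally HDCP, which only makes sense on a digital input.
        "hdcp": tv.hdcp and bool(tv.inputs & {"dvi", "hdmi"}),
    }


def scale_factors(signal, panel):
    """(horizontal, vertical) scaling from signal to panel; (1.0, 1.0) = no scaling."""
    return panel[0] / signal[0], panel[1] / signal[1]


# The 1024x768 plasma described in this post.
my_tv = TvSpec(1024, 768, {("1080i", 50)}, {"component", "dvi"})
print(hd_checklist(my_tv))
print(scale_factors((1920, 1080), (my_tv.width, my_tv.height)))
```

For that screen a 1080i signal has to be scaled to roughly 53% horizontally and 71% vertically, whereas a native 1920×1080 panel gives (1.0, 1.0), i.e. no scaling at all.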
The “holy grail” of HDTV is 1080p screens that will be able to display a true progressive 1920 by 1080 pixel signal whenever that becomes a broadcast standard. Some manufacturers know this and market their top-end displays as 1080p, but buy with caution as there is some debate about whether these first-generation 1080p displays are true 1080p.
So what do I have? An almost 4-year-old Pioneer plasma with a 1024×768 display. No HDMI or HDCP, but I have confirmed it accepts 1080i at 50Hz and has component and DVI inputs. Due to the age of the screen I am not considering the available extension boards. Instead I will get a decent 1080i screen/TV later this year.
My SkyHD installation is scheduled for May 22nd, so I will let you know shortly thereafter what HDTV looks like on an old beast like that.

Linksys WAG54G (v1) as a repeater using WDS

The flaky WAG54G finally gave up one fine morning. It had been struggling for a few months, but that morning the final flat line was there.
Actually, it wasn’t a total death but more of a coma. While the ADSL port and the 4 network ports were dead, the wireless functionality was still present and allowed me to connect and configure it.
Obviously the device could no longer be used for connecting to the internet or for any LAN routing, and I was about to bin it. Meanwhile it collected dust for a few weeks while I was playing with the new Billion BiPAC 7402VGP VOIP router.
What a relief it was to finally use a stable router that did not overheat frequently and did not require restarts every few hours.
I have a spare internet web camera and I was hoping to set it up at the front of the house. Unfortunately there is no WiFi coverage there, which meant a WiFi repeater would be required, and I was not willing to spend more money just to get that spare camera up and running.
Then it hit me that the router, now only capable of WiFi, might just be able to serve as a repeater, especially since the newest 1.03.0_beta4 firmware supports WDS (Wireless Distribution System).
WDS is used to link multiple WiFi routers or access points together so that they serve as a single WiFi network. In theory a device will connect to the strongest available point. Expect to get only half the throughput, since all traffic needs to be forwarded to the main router. This should not matter at all for internet traffic, which is usually slower than the 802.11g connection speed (54Mbps).
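A back-of-the-envelope version of that argument, using the nominal 802.11g rate and assuming, purely for illustration, an 8Mbps ADSL line (real-world WiFi throughput is lower than the nominal figure, so treat the left-hand side as an upper bound):

```latex
\text{WDS repeater} \approx \frac{54\ \text{Mbps}}{2} = 27\ \text{Mbps} \;>\; 8\ \text{Mbps (assumed ADSL line)}
```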
Configuring the old Linksys router to the new WiFi settings was a hassle because every time a wifi parameter was changed on the router, the wifi settings had to be changed on the laptop as well. The WAG54G also kept clashing with the Billion router until I got the WDS settings right.
One important thing to remember is that a router often has several MAC addresses for the different components. One for the WAN adapter, one for the LAN adapter and one for the WLAN adapter.
You have to use the WLAN MAC address when configuring WDS peers. This took me a while to realize. On the WAG54G, you’ll find the correct MAC address in Status > Wireless > MAC Address.
oh, joy

NEWSFLASH: Dog eats Sky+ remote control – again

So a second Sky+ remote control bites the dust. It lasted only 2 weeks and was still in *mint* condition.
I never saw the carcass, as the wife hid it knowing it would make my poor heart race.
No wonder, given all the sweat that went into purchasing it at a local Maplin’s shop: a sweaty shop with sweaty carpets and even sweatier customers.
The first remote control at least lasted 1 year which I was quite content with. 2 weeks is however not acceptable so this family will simply have to do without.
The family is however allowed to use the amplifier multi-control which was conveniently programmed for Sky+ before the destruction.
A small call-out to Sky to make their remote controls less appetizing. I guess tastiness was something that was overlooked in the design process. The black rubbery part of the remote seems to be simply irresistible to dogs.
Maybe I should have a taste next time!

Transforming in Wales

I was in a bit of a pickle: a planned training session in Wales (a 5-hour drive away) but a heavily pregnant wife at home who would not let me stay away overnight.
So I squeezed it all into a single day. Up at 4am, in Bangor 8:30am, training session 9 to 5 and back in London after 10pm with some 540 miles more on the old odometer.
ETL Solutions was demoing their Transformation Manager (TM) application, which performs data transformations using generated Java code.
It turned out to be a very powerful tool both for the initial mapping between a source model and a destination model and for running the transformations.
Supported models are XML schemas (xsd), Document Type Definitions (DTD), partial XML data, Java objects, RDBMS and a multitude of flat files.
Data transformations are a two step process:
1. Visually create a mapping between a source model and a destination model and let TM generate Java code to do the transformations
2. Deploy the generated code and integrate it into your application with 5-6 lines of code.
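TM's generated Java and its SML language were not shown in detail, so the following is only a conceptual Python sketch of what a mapping between a source model and a destination model boils down to: a table of destination fields, each computed from the source record, plus a small runner that applies it. The field names and rules are hypothetical and this is not TM's API.

```python
# Step 1 (hypothetical): destination field -> rule computed from a source record.
MAPPING = {
    "full_name": lambda src: src["first"] + " " + src["last"],
    "amount_pence": lambda src: round(src["amount"] * 100),   # pounds -> pence
    "country": lambda src: src.get("country", "UK").upper(),  # default if absent
}


def transform(record, mapping=MAPPING):
    """Step 2: apply every rule in the mapping to one source record."""
    return {dest: rule(record) for dest, rule in mapping.items()}


print(transform({"first": "Ada", "last": "Lovelace", "amount": 12.5}))
```

Keeping the mapping as data, separate from the code that runs it, mirrors the two-step split described above.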
The visual mapping supports dragging and dropping elements between the source and destination. In addition, TM uses a powerful modeling language (SML) for more advanced mappings.
The tool originated from a requirement of migrating data between Oracle databases which is evident from the strong DB support. Included are operations for transaction management and batch operations for optimisation tweaking.
The tool then evolved by supporting additional data models and by the addition of various transformation functions (financial, mathematical and similar).
The only thing I was missing was the support for XSL output when converting between two XML schemas.
I find XSL to be more portable than Java classes but when asked to specify a scenario where our company would not be able to use Java classes for a transformation, I was not able to.

Movie hat-trick

Three evenings in a row watching three movies that I did not know much about but that proved to be very watchable. Now that is a movie hat-trick to write home about!
The Family Stone is a charming holiday story about relationships.
The movie has Sarah Jessica Parker starring as Meredith Morton who is visiting her boyfriend’s family for Christmas.
Meredith is an uptight career woman and she clashes with the liberal, close knit family of her boyfriend. His family wants only the best for their little Everett (Dermot Mulroney) and they don’t see Meredith as being the one.
The movie has a lot of funny moments and a few surprises. The overall feeling is very genuine, and there is some great acting.
In Shopgirl, Mirabelle (Claire Danes) works as a shop assistant. She has a large student loan, is getting older each day and starts asking herself when, and if, her big break will come.
She has a brief, fairly embarrassing sexual encounter with Jeremy (Jason Schwartzman) but almost forgets about it when a wealthy businessman (Ray Porter, played by Steve Martin) shows interest in her.
Breakfast on Pluto is about Patrick (Cillian Murphy), a boy who is deserted by his mother and left on the doorstep of a church to be raised by a foster mother.
Quickly it becomes obvious that Patrick is actually Patricia “Kitten”; a girl trapped in a boy’s body who is “Looking for love in all the wrong places”.
Patricia spends the rest of the movie looking for her real mother while experiencing colorful adventures (show biz, circus, IRA attacks, prostitution, friendship and much more).
Breakfast on Pluto is a pleasure to watch and no wonder since it is directed by Neil Jordan (Interview with the Vampire).
Æon Flux was a fourth decent movie, but only thanks to its stunning visual effects and its lead character (Charlize Theron), and it is not quite on par with the three movies mentioned above.