Thursday, September 10, 2009

Overview of the WordPress Video plugin 0.9

Well, it's not a plugin per se, but really a set of plugins, some background tasks, and an overall suggested architecture.

The basic flow so far is:
(Image checked in by Hailin @ Automattic)

  1. Register an action (remote_transcode_one_video) so that when the user attaches a file, it fires off an exec() to run video-upload.php.
  2. The transcoder (most likely on another server by now) checks the auth, saves metadata such as size, and uses ffmpeg to transcode.
  3. The transcoder then sends the file to the file server (send_to_fileserver), which moves the files into their final resting place and has a placeholder for sending them off to a CDN or other replica.
Basically, the video bounces off the user's blog (WPMU) to the transcoder via a POST, and the transcoder in turn bounces it off to the file server. I presume all these scripts need access to the exact same database (or a replica of it) in order to keep the processing steps in sync, and to update the post metadata with the video's final resting place and its info (size, etc.).

It seems to me that even in the best case, this will require a lot of (real-time HTTP) chatter and not very efficient file transfer. Some folks out there are posting about getting everything working on one box, which should be very possible, but frankly I'd be more interested in getting it off the blog server. I think that was the right decision architecturally, even if it means more complexity for the admin. But I wonder if moving the files around serially is the best way. Maybe some sort of queue (Amazon SQS for the messages) plus an S3 input bucket for the transcoding workload. Then when the transcoding process finishes a file, it moves it into the fileserver "master" bucket, and the fileserver caches pull from that bucket? I know it will be fairly complex with load balancing and geographic distribution, so it probably needs more thought than 30 seconds :)

The key point is to make the HTTP chatter asynchronous. Also, by putting the messages into a queue and having the LRPs just pull from the queue, you get more robustness against errors. I'm sure there are all sorts of race conditions in relying on the simple status messages alone, i.e. update_video_info( $blog_id, $post_id, $format, $status ). I think it would be nice to have a workflow that can pick up again after an error, and that at least shows the admin which videos are in which error states.
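To make the queue idea a bit more concrete, here is a minimal sketch in Python rather than the plugin's PHP. It uses an in-process queue.Queue as a stand-in for something like SQS, and the VideoJob fields, the state names, and the run_worker function are all my own assumptions for illustration, not anything in the plugin. The point is only that a worker pulling from a queue can retry a failed job and park it in a visible "failed" state instead of losing it.

```python
import queue
from dataclasses import dataclass

# Hypothetical job states; the plugin itself only passes a free-form $status.
PENDING, TRANSCODING, STORED, FAILED = "pending", "transcoding", "stored", "failed"


@dataclass
class VideoJob:
    blog_id: int
    post_id: int
    fmt: str
    status: str = PENDING
    attempts: int = 0
    error: str = ""


def run_worker(jobs, transcode, max_attempts=3):
    """Drain the queue, retrying failures; return jobs that exhausted their retries."""
    failed = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return failed
        job.status = TRANSCODING
        job.attempts += 1
        try:
            transcode(job)  # stand-in for the real ffmpeg + file server steps
            job.status = STORED
        except Exception as exc:
            job.error = str(exc)
            if job.attempts < max_attempts:
                job.status = PENDING
                jobs.put(job)  # put it back so a later pass can retry
            else:
                job.status = FAILED
                failed.append(job)  # this is the list the admin would see


def demo_transcode(job):
    # Pretend one format always fails so the retry path is exercised.
    if job.fmt == "broken":
        raise RuntimeError("ffmpeg exited non-zero")


jobs = queue.Queue()
good = VideoJob(blog_id=1, post_id=10, fmt="flv")
bad = VideoJob(blog_id=1, post_id=11, fmt="broken")
jobs.put(good)
jobs.put(bad)
failed_jobs = run_worker(jobs, demo_transcode)
```

With a real message queue the worker would block waiting for messages instead of returning when the queue is empty, and the job state would live in the shared database so the admin screen can list everything stuck in the failed state.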

The other option, assuming a smaller site without geographic distribution, is to put it all on NFS servers behind the main server. That way the file transfers are not HTTP based. If you're not on EC2 I can see where this might be a viable alternative. We do this in the colo now. But if you're on EC2, then simply writing to S3 seems to be the way to go.

I think straight away we should implement an S3 option for the file server, use or similar for the encoding such that it writes to your S3 bucket, and then use CloudFront or other CDN for distribution.

If you really want to duplicate the features of, and hack all the ffmpeg options, then let me know and I can set up a high performance transcoder AMI on EC2 with the scripts.

Some other observations:
  • the upload script and the transcode script check whether you have a DATACENTER defined, presumably for load and redundancy; this is a stub, so you would have to add some logic to round robin, randomize, or otherwise pick a "transcoder" server and a "fileserver"
  • there is a simple authentication mechanism that uses a simple md5 salted string to auth to the transcoder -- it would be nice to add per-user auth to prevent spam or abuse, or to block users who post illegal content -- CERTAINLY users should change this string in production! Maybe we can make that more explicit in the comments. Security by obscurity is never the right way to go
  • we are using some standard formats, but there are ffmpeg templates with many more examples, lots for mobile video, etc. We should import these and just have a list in the options of all template files, or maybe even create the template inside a post, perhaps with some sort of GUI
  • for the fileserver: one nice thing might be to combine some of the logic from the S3 or CloudFront plugins
  • for both file server and transcoder: it might be nice to incorporate some sort of stats database on usage, RTT, etc.
  • for both: it might also be nice to check the country of origin from GeoIP/MaxMind and pick a mirror based on that
  • for the transcoder: How does this differ from Are these complementary/orthogonal? Are they substitutes?
  • for the ffmpeg calls in the transcoder, a lot of the variables are hardcoded -- we should have some way to specify these for different video types in some option set
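
As a rough idea of what an "option set" for the ffmpeg calls could look like, here is a small Python sketch (again, not the plugin's PHP). The PROFILES table, its keys, and its values are made up for illustration; the ffmpeg flags themselves (-y, -i, -s, -b:v, -ar) are standard ones in current ffmpeg builds, though older builds spell the video bitrate flag differently. The function only builds the argument list, it does not run ffmpeg.

```python
# Hypothetical per-format profiles; names and values are illustrative only.
PROFILES = {
    "flv": {"size": "400x300", "video_bitrate": "500k", "audio_rate": "22050"},
    "mp4_mobile": {"size": "320x240", "video_bitrate": "300k", "audio_rate": "44100"},
}


def build_ffmpeg_args(src, dst, profile_name, profiles=PROFILES):
    """Return an ffmpeg argument list for the named profile (does not execute it)."""
    p = profiles[profile_name]
    return [
        "ffmpeg",
        "-y",                        # overwrite the output file if it exists
        "-i", src,                   # input file
        "-s", p["size"],             # output frame size
        "-b:v", p["video_bitrate"],  # video bitrate
        "-ar", p["audio_rate"],      # audio sample rate
        dst,
    ]


args = build_ffmpeg_args("upload.mov", "upload.flv", "flv")
```

Keeping the values in one table like this means adding a mobile format is a new entry rather than another hardcoded command line, and the same table could later be loaded from template files or an options page.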

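The DATACENTER stub and the auth string from the observations above could be fleshed out along these lines. This is a sketch under my own assumptions: the TRANSCODERS pool and its hostnames are placeholders, and I have swapped the plugin's salted md5 string for an HMAC-SHA256 token that is bound to a user and a video, which is one way to get per-user auth. It is not what the plugin does today.

```python
import hashlib
import hmac
import itertools

# Hypothetical server pools keyed by datacenter; hostnames are placeholders.
TRANSCODERS = {"dc1": ["t1.example.com", "t2.example.com"]}


def round_robin(servers):
    """Return an iterator that cycles through the servers forever."""
    return itertools.cycle(servers)


def auth_token(secret, user_id, video_id):
    """Per-user, per-video token (HMAC-SHA256 instead of a shared md5 string)."""
    msg = f"{user_id}:{video_id}".encode()
    return hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()


def check_token(secret, user_id, video_id, token):
    """Constant-time comparison, done on the transcoder side."""
    return hmac.compare_digest(auth_token(secret, user_id, video_id), token)


picker = round_robin(TRANSCODERS["dc1"])
first_three = [next(picker) for _ in range(3)]

SECRET = "change-me-in-production"  # placeholder; never ship a default value
token = auth_token(SECRET, user_id=42, video_id=1001)
```

Because the token covers the user id, the transcoder can reject or block a specific user without changing the shared secret for everyone.
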

andy said...

I always enjoy learning how other people employ Amazon S3 online storage. I am wondering if you can check out my very own tool, CloudBerry Explorer, which helps to manage S3 on Windows. It is freeware. With CloudBerry Explorer PRO you can even connect to FTP accounts.

Robert A. Ficcaglia said...

Andy --- isn't the S3 plugin for firefox pretty much all you need? That seems to fit all my needs ok, but maybe you can shed some light on how C.B. could help with the video project?

andy said...

MS Internet Explorer is all you need to browse the internet, however there are Firefox, Chrome, Safari and others... People who tried CloudBerry Explorer like it and use it

Robert A. Ficcaglia said...

I would be interested in seeing how we can use C.B. in the Wordpress admin GUI to help users manage their video assets. For example, can C.B. allow bulk operations, like rename all files in a bucket? If it could enable easy admin from the Wordpress GUI, that would be interesting indeed. I would be happy to code up a plugin if you have an API and a test sandbox.

andy said...

Well, CloudBerry Explorer is just a desktop client. You might be interested in this WP plug-in for Amazon S3. If you want to protect your contents you may also want to check out another plug-in.

Robert A. Ficcaglia said...

I do use the Tan Tan project! It's good. I had to modify it to support Signed URI access.

As for the other one, we have our own access control for S3 links. It's more secure since it's a pay-per-view system. The plugin you mention basically just puts a time limit on the link, just like Amazon's Query String Signed URI. We, however, authenticate the user cryptographically.

I guess I could wrap that in a plugin if people wanted it. It's only really needed if you're SELLING videos.