Wednesday, December 30, 2009

Augmented Video on iPhone

http://www.morethantechnical.com/2009/08/09/near-realtime-face-detection-on-the-iphone-w-opencv-port-wcodevideo/
http://www.morethantechnical.com/2009/05/06/iphone-camera-frame-grabbing-and-a-real-time-meanshift-tracker/

video feed grabber:
http://github.com/norio-nomura/iphonetest/tree/master/CameraTest/Classes/

UIGetScreenImage post:
https://devforums.apple.com/message/149553#149553
http://www.tuaw.com/2009/12/15/apple-relents-and-is-now-allowing-uigetscreenimage-for-app-st/

http://pastie.org/235372.txt

Government Standards for Cloud Computing Security

http://csrc.nist.gov/groups/SNS/cloud-computing/index.html

Thursday, December 24, 2009

Useful iPhone SDK articles

http://www.edwardbenson.com/2009/01/10/creating-a-blocking-nag-screen-with-nsuserdefaults/
http://iphoneinaction.manning.com/iphone_in_action/2009/09/core-data-part-4-inserts-updates-deletes.html
http://developer.apple.com/iphone/library/documentation/UIKit/Reference/UITextField_Class/Reference/UITextField.html#//apple_ref/occ/instp/UITextField/leftView
http://regexkit.sourceforge.net/RegexKitLite/
http://forums.macrumors.com/showthread.php?t=473359
http://blogs.oreilly.com/iphone/2009/01/defining-legal-input-character.html
http://stackoverflow.com/questions/433337/iphone-sdk-set-max-character-length-textfield
http://groups.google.com/group/three20/browse_thread/thread/1dc38ac9d7176b77
http://github.com/klazuka/TTRemoteExamples

Another useful tip: if you are running on the simulator, you can open the preferences file in the plist editor. On the Mac, at least, you will find the file at:

~/Library/Application Support/iPhone Simulator/Users/Applications//Library/Preferences/.plist
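If you would rather inspect the file from a script than from the plist editor, here is a minimal Python sketch. The function name `read_preferences` is made up for illustration, and you have to pass in the full path to your own app's .plist yourself; `plistlib.load` reads both the XML and binary flavours.

```python
import plistlib
from pathlib import Path


def read_preferences(plist_path):
    """Load a preferences .plist (XML or binary) into a Python dict.

    plist_path is whatever path your simulator app actually uses;
    nothing about the directory layout is assumed here.
    """
    with Path(plist_path).expanduser().open("rb") as fp:
        return plistlib.load(fp)
```

Calling `read_preferences(path)` returns the NSUserDefaults keys and values as an ordinary dict, which is handy for checking whether a nag-screen flag was actually written.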

Decompiling:
http://tungchingkai.blogspot.com/2009/02/how-to-decrypt-iphone-ipa-file.html
http://dvlabs.tippingpoint.com/blog/2009/03/06/reverse-engineering-iphone-appstore-binaries

Using zlib:
http://stackoverflow.com/questions/289274/error-when-import-zlib-in-iphone-sdk
http://samuraiblog.com/wordpress/2009/11/02/compressing-nsdata-objects-using-zlib/

Wednesday, November 04, 2009

PayPal Mass Pay Disbursements

How is the Mass Pay Disbursements product different from Adaptive Chained Payments?
Their examples of disbursements include several overlapping scenarios:
  • marketplaces
  • rebates
  • commissions
  • affiliates
  • rewards
  • games/gambling
These are the same ones they discussed in the Adaptive Payments session.

Answer: volume. MassPay handles up to 5K in one batch and is async. The Mass Payment API handles 250 per API call and is also async. The Pay API (adaptive) handles up to 6 but is real-time and synchronous.
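To make the volume limits concrete, here is a rough Python sketch of splitting a payout list into chunks that fit the 250-per-call limit noted above. The `batches` helper is hypothetical and does not call any PayPal API; it only shows the chunking.

```python
def batches(payments, max_per_call=250):
    """Yield consecutive slices of payments, each at most max_per_call long.

    250 is the per-call limit from the session notes; pass another
    value if the real limit turns out to be different.
    """
    if max_per_call < 1:
        raise ValueError("max_per_call must be at least 1")
    for start in range(0, len(payments), max_per_call):
        yield payments[start:start + max_per_call]
```

With 600 payouts this yields three chunks (250, 250, 100), each of which would go out as its own async call.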

Hmmm seems that they pull each txn from the bank SEPARATELY. How much will the bank charge you!?

2% per payment with $1.00 max. Sender pays. Might be the new pricing model soon.
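A quick sketch of that pricing in Python, assuming USD amounts and half-up rounding to the cent (the rounding rule is my assumption, not something from the session):

```python
from decimal import Decimal, ROUND_HALF_UP


def mass_pay_fee(amount, rate=Decimal("0.02"), cap=Decimal("1.00")):
    """Sender-side fee: rate * amount, rounded to cents, never above cap."""
    fee = (Decimal(amount) * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return min(fee, cap)
```

So a $20.00 payout costs the sender $0.40, while anything from $50.00 up hits the $1.00 ceiling.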

Friday, October 30, 2009

x264 encoding options

One of H.264's most useful features is the ability to choose among many combinations of inter and intra partitions. Pictures are typically processed in units called macroblocks. A macroblock is typically 16x16 pixels and can have different types. If the macroblock type is Intra (I), that part of the decoded image is completely replaced by a new texture, while if the macroblock type is Inter (P) the decoded macroblock data is added to what was previously decoded in that macroblock area. P-macroblocks can be subdivided into 16x8, 8x16, 8x8, 4x8, 8x4, and 4x4 partitions. B-macroblocks can be divided into 16x8, 8x16, and 8x8 partitions. I-macroblocks can be divided into 4x4 or 8x8 partitions. Analyzing more partition options improves quality at the cost of speed. The default is to analyze all partitions except p4x4 (p8x8, i8x8, i4x4, b8x8), since p4x4 is not particularly useful except at high bitrates and lower resolutions. Note that i8x8 requires 8x8dct, and is therefore a High Profile-only partition. p8x8 is the most costly, speed-wise, of the partitions, but also gives the most benefit. Generally, whenever possible, all partition types except p4x4 should be used.
When encoding with H.264, the screen is divided into 16x16 macroblocks (MBs). These blocks can be subdivided further into partitions of various sizes (16x8, 8x16, 8x8, 4x4). Motion estimation is better with smaller subpartitions, but the overhead for such information is similarly increased.
DCT block size is separate from motion block ("partition") size. The DCT has a choice of 8x8 or 4x4. Motion partitions have a choice of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, or 4x4. Each (inter-) macroblock has both a DCT block size and some motion partition size(s). The only restriction on combining the two is that partitions can't be smaller than their DCT blocks.
Small partitions provide better prediction but cost more bits. There is no overhead one way or the other for DCT block size, but also one isn't always better compression than the other, so that decision is based just on which one is more appropriate to the given image region.
That said, high profile also introduced a new intra partition, "i8x8". Intra partition sizes now have a choice between 16x16, 8x8, and 4x4, which is decided based on prediction quality vs bit cost just like motion partitions are.
In x264, --8x8dct enables high profile, and x264 allows i8x8 partitions if high profile is enabled, but they should not be confused as being the same feature.
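The one restriction mentioned above (partitions can't be smaller than their DCT blocks) is easy to tabulate. This is only an illustrative Python sketch of that rule; the names are mine and it is not how x264 represents things internally.

```python
# Motion partition sizes (width, height) and DCT block sizes from the notes above.
MOTION_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]
DCT_SIZES = [8, 4]


def allowed_dct_sizes(partition):
    """Return the DCT sizes usable with a motion partition.

    A DCT block of size n fits only if the partition is at least n
    pixels in both dimensions.
    """
    width, height = partition
    return [n for n in DCT_SIZES if width >= n and height >= n]
```

In other words, everything down to 8x8 can pick either transform, while the sub-8x8 partitions (8x4, 4x8, 4x4) are left with the 4x4 DCT only.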
Here's where I put most of my emphasis in the tests for mobile:
    -refs (FFmpeg)
    One of H.264's most useful features is the ability to reference frames other than the one immediately prior to the current frame. This parameter lets one specify how many references can be used, up to a maximum of 16. Increasing the number of refs increases the DPB (Decoded Picture Buffer) requirement, which means hardware playback devices will often have strict limits on the number of refs they can handle. In live-action sources, more references have limited use beyond 4-8, but in cartoon sources up to the maximum value of 16 is often useful. More reference frames require more processing power because every frame is searched by the motion search (except when an early skip decision is made). The slowdown is especially apparent with slower motion estimation methods. Recommended default: -refs 6
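For mobile targets the DPB limit matters more than the recommended default. Here is a rough Python sketch of how the ceiling on refs falls out of the level and the frame size. The MaxDpbMbs numbers are what I remember from the level table in the H.264 spec (Annex A), so treat them as assumptions and check the spec before relying on them; the function name is also just for illustration.

```python
import math

# MaxDpbMbs per level, recalled from the H.264 level limits table.
# ASSUMPTION: verify against the spec; only a few levels are listed.
MAX_DPB_MBS = {"3.0": 8100, "3.1": 18000, "4.0": 32768, "4.1": 32768}


def max_refs(width, height, level):
    """Largest number of reference frames the DPB can hold at this level."""
    frame_mbs = math.ceil(width / 16) * math.ceil(height / 16)
    return min(16, MAX_DPB_MBS[level] // frame_mbs)
```

Under those assumed table values, a 480x320 stream at Level 3.0 has room for 13 refs, while 720p at Level 3.1 only has room for 5, which is below the "-refs 6" default.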
      -deblockalpha (FFmpeg)
      -deblockbeta (FFmpeg)
      One of H.264's main features is the in-loop deblocker, which avoids the problem of blocking artifacts disrupting motion estimation. This requires a small amount of decoding CPU, but considerably increases quality in nearly all cases. Its strength may be raised or lowered in order to avoid more artifacts or keep more detail, respectively. Deblock has two parameters: alpha (strength) and beta (threshold). Recommended defaults: -deblockalpha 0 -deblockbeta 0 (Must have '-flags +loop')

      me_method (FFmpeg)

      dia (x264) / epzs (FFmpeg) is the simplest search, consisting of starting at the best predictor, checking the motion vectors at one pixel upwards, left, down, and to the right, picking the best, and repeating the process until it no longer finds any better motion vector.

      hex (x264) / hex (FFmpeg) consists of a similar strategy, except it uses a range-2 search of 6 surrounding points, thus the name. It is considerably more efficient than DIA and hardly any slower, and therefore makes a good choice for general-use encoding.

      umh (x264) / umh (FFmpeg) is considerably slower than HEX, but searches a complex multi-hexagon pattern in order to avoid missing harder-to-find motion vectors. Unlike HEX and DIA, the merange parameter directly controls UMH's search radius, allowing one to increase or decrease the size of the wide search.

      esa (x264) / full (FFmpeg) is a highly optimized intelligent search of the entire motion search space within merange of the best predictor. It is mathematically equivalent to the bruteforce method of searching every single motion vector in that area, though faster. However, it is still considerably slower than UMH, with not too much benefit, so is not particularly useful for everyday encoding.

      One of the most important settings for x264, both speed and quality-wise. Looking at full vs. hex vs. umh.
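To make the simplest of these concrete, here is a toy Python sketch of the diamond (dia) search as described above: start at a predictor, test the four one-pixel neighbours, move to the best, and stop when nothing improves. It works on plain lists of luma values with a SAD cost; real encoders add sub-pixel refinement and motion-vector bit costs, which are left out. All names are mine.

```python
def sad(cur, ref, bx, by, mx, my, size):
    """Sum of absolute differences between the block at (bx, by) in cur
    and the block displaced by (mx, my) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + my][bx + x + mx])
        for y in range(size)
        for x in range(size)
    )


def diamond_search(cur, ref, bx, by, size=16, start=(0, 0)):
    """Greedy one-pixel diamond search; returns (best_vector, best_cost)."""
    height, width = len(ref), len(ref[0])

    def cost(mv):
        mx, my = mv
        inside = (0 <= bx + mx and bx + mx + size <= width
                  and 0 <= by + my and by + my + size <= height)
        return sad(cur, ref, bx, by, mx, my, size) if inside else None

    best, best_cost = start, cost(start)
    if best_cost is None:
        raise ValueError("start vector points outside the reference frame")
    while True:
        cx, cy = best
        improved = False
        # Up, left, down, right of the current centre.
        for cand in ((cx, cy - 1), (cx - 1, cy), (cx, cy + 1), (cx + 1, cy)):
            c = cost(cand)
            if c is not None and c < best_cost:
                best, best_cost, improved = cand, c, True
        if not improved:
            return best, best_cost
```

Because it only ever steps downhill one pixel at a time, this search can stall in a local minimum on noisy content, which is exactly the weakness the wider hex and umh patterns are meant to address.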

      -subq 6 (FFmpeg)

      1: Fastest, but extremely low quality. Should be avoided except on first pass encoding.

      2-5: Progressively better and slower, 5 serves as a good medium for higher speed encoding.

      6-7: 6 is the default. Activates rate-distortion optimization for partition decision. This can considerably improve efficiency, though it has a notable speed cost. 6 activates it in I/P frames, and 7 activates it in B frames.

      8-9: Activates rate-distortion refinement, which uses RDO to refine both motion vectors and intra prediction modes. Slower than subme 6, but again, more efficient.

      An extremely important encoding parameter which determines what algorithms are used for both subpixel motion searching and partition decision. Checking 6 or 8.
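Since the tests vary only a handful of knobs, a small Python helper that builds the command line keeps the runs consistent. This is a sketch: the option names are the ones from the notes above plus `-i` and `-vcodec libx264`, they belong to the 2009-era FFmpeg builds, and newer builds may have renamed or dropped some of them. The function name and the allowed-value checks are my own.

```python
ME_METHODS = ("epzs", "hex", "umh", "full")


def x264_mobile_args(src, dst, refs=6, subq=6, me_method="hex",
                     deblock_alpha=0, deblock_beta=0):
    """Build an ffmpeg argument list from the settings discussed above."""
    if me_method not in ME_METHODS:
        raise ValueError("me_method must be one of %s" % (ME_METHODS,))
    if not 1 <= refs <= 16:
        raise ValueError("refs must be between 1 and 16")
    if not 1 <= subq <= 9:
        raise ValueError("subq must be between 1 and 9")
    return [
        "ffmpeg", "-i", src, "-vcodec", "libx264",
        "-flags", "+loop",  # needed for the deblock options to apply
        "-deblockalpha", str(deblock_alpha),
        "-deblockbeta", str(deblock_beta),
        "-refs", str(refs),
        "-me_method", me_method,
        "-subq", str(subq),
        dst,
    ]
```

The returned list can be handed straight to `subprocess.run`, so comparing hex vs. umh or subq 6 vs. 8 is a one-argument change per run.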


See also:
http://www.gdargaud.net/Hack/HtcHero.html (Android notes)
http://multimedia.cx/eggs/video-coding-concepts-quantization/ (and see the Wikipedia reference for more math)
http://sites.google.com/site/linuxencoding/x264-ffmpeg-mapping

BlackBerry Video specs

For Wordpress Video, right now each format is hard coded, so we'll want to make this more intelligent. Rather than build this from scratch, I'll try this on encoding.com or similar to see if they handle it well:

Why Google Android 1.6 will FAIL

I think that Google has flown the coop if they think they can drop support for such an important technology:

Other Notes and Resolved Issues

  • This SDK release adds support for Eclipse 3.5 (Galileo) and deprecates support for Eclipse 3.3 (Europa).
  • We regret to inform developers that Android 1.6 will not include support for RFC 2549
  • The issue preventing adb from recognizing Samsung Galaxy devices (linux SDK only) has been fixed.

Google Android Video Codecs

http://blogs.zdnet.com/Burnette/?p=1133
and to see the gory details:

Typical streams look like this:

  • 3GPP - lower quality, H.263, AMR-NB audio, bit rates up to 192Kbps
  • MPEG-4 - higher quality, H.264, AAC audio, Bit rates up to 500Kbps
Also from the SDK 1.6:

Emulator Skins, Android 1.6 Platform

The Android 1.6 platform included in the SDK provides a new set of emulator skins, including:

  • QVGA — 240 x 320, low density (120 dpi)
  • HVGA — 320 x 480, medium density (160 dpi)
  • WVGA800 — 480 x 800, high density (240 dpi)
  • WVGA854 — 480 x 854, high density (240 dpi)

Besides these defaults, you can also create an AVD that overrides the default density for each skin, to create any combination of resolution/density (WVGA with medium density, for instance). To do so, use the android tool command line to create a new AVD that uses a custom hardware configuration. See Creating an AVD for more information.
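The density numbers matter because Android scales density-independent pixels (dp) by dpi / 160. A small Python sketch of that conversion for the skins listed above; the table and the function name are mine, and the add-0.5-and-truncate rounding mirrors the usual Android idiom.

```python
# Density per emulator skin, from the SDK 1.6 list above.
SKIN_DENSITY_DPI = {"QVGA": 120, "HVGA": 160, "WVGA800": 240, "WVGA854": 240}


def dp_to_px(dp, skin):
    """Convert density-independent pixels to physical pixels for a skin."""
    scale = SKIN_DENSITY_DPI[skin] / 160
    return int(dp * scale + 0.5)
```

A 48dp control therefore comes out at 36px on QVGA, 48px on HVGA and 72px on either WVGA skin, which is worth keeping in mind when sizing video thumbnails.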

Thursday, October 29, 2009

Characterization of Parts SoxR and SoxS

BioBrick Parts
BBa_K223000 SoxR
BBa_K223001 SoxS promoter

SoxR belongs to the MerR family of transcriptional regulators and is a sensor for the superoxide anion (O2-) that is generated, for example, by redox-cycling compounds such as methyl viologen (paraquat [PQ]). SoxR is activated by superoxide-dependent reversible one-electron oxidation of its iron-sulfur center (10, 12, 20). Nitric oxide (NO) also activates SoxR, however, by a different mechanism than that observed for superoxide. The NO-dependent activation takes place via nitrosylation of iron-sulfur centers with displacement of sulfide to form dinitrosyl-iron-dithiol(cysteine) complexes (9).
Both the active and the inactive forms of SoxR bind the target promoters, but only the active form stimulates transcription, probably by promoting an allosteric rearrangement of the SoxR-DNA complex that favors transcription initiation (22, 18, 21). Notably, SoxR activates the expression of a single gene, soxS, which encodes the global transcriptional activator SoxS (18).
SoxS has homology to the AraC/XylS family of regulators and activates expression of target genes by binding to their promoters and recruiting RNA polymerase (26) and/or by binding to their promoters after forming a complex with the polymerase (15, 28).
Thus the question is whether we can characterize the enzyme/DNA complex change via some microfluidic flow properties, electrical properties, or other chip-based signal.

References
http://en.wikipedia.org/wiki/Allosteric_regulation

Wednesday, October 28, 2009

Sprint's Ortiva Video Optimization Gateway

* for mobile
* monitors each individual's conditions
* reconstitutes content --- adjust frame speed, type for conditions
* rebuffer, modify content as needed
* smooth the curve

Is there an open solution to this?

* Link location to progressive vs. streaming video - download vs. stream based on link quality predicted vs. location, time

Android UI Tricks and Tips for Performance

  1. Pre-scale your views
  2. Use Adapters
  3. don't use invalidate() - use a specific Rect or region. Use Developer Settings, show screen redraw tools on the device/emulator
  4. look at the view hierarchy in the emulator --- avoid deep trees --- wide trees are preferred.
  5. emulator shows the rendering time, layout time, etc
  6. Compound drawables - collapse views/levels in the view hierarchy (e.g. images adjacent to text)
  7. ViewStub -- hidden portions- keeps UI views to a minimum (review the Android source code for examples)
  8. use the tag instead of a dummy frame or layout node/tag --- again, always avoid hierarchy levels, use the tag to be the root in xml
  9. Use RelativeLayouts instead of linear layouts
  10. Try using Custom View and Layout children from the Base classes --- rather than expand the hierarchy by nesting standard layouts
  11. Standard java performance stuff --- watch GC, allocations (Allocation Tracker tool) String, understand Soft,Weak References

Wednesday, October 07, 2009

Nice Use of CSS Animation

http://www.widgetpad.com/61/

Cross domain communication

http://shouldersofgiants.co.uk/Blog/post/2009/08/17/Another-Cross-Domain-iFrame-Communication-Technique.aspx

Sunday, September 27, 2009

Can Someone Explain to Me When to Use jQuery live() vs. liveQuery() plugin?

http://docs.jquery.com/Plugins/livequery
vs.
http://docs.jquery.com/Events/live

Seems the 1.3 live() feature is more limited than the plugin? But will the plugin become deprecated eventually?

This seems to be the best thread so far, but I'll have to head over to IRC perhaps:

Maybe a benchmark on large DOMs comparing the 2 solutions? Anyone interested? If so I'll whip something up.

In any case it seems that the current recommendation is to use live() until you need liveQuery() for some event not supported by live()

Generative shapes and Processing

This might be completely incompatible, but maybe Processing is a good language for "generative bio assay" design.

Some interesting Processing (http://processing.org/) work:
http://www.michael-hansmeyer.com/projects/project4.html

Wednesday, September 16, 2009

Wordpress Video Issues to Fix

  1. Right now the short code doesn't have a way to specify only a thumbnail. So if you want a preview image, but NOT the player, you can't do that. Useful for some theme layouts that want to overlay text or something else on top of the preview (like another image, like NEW!)
  • EDIT: Actually there does seem to be a way to do this: get_the_excerpt(). Unfortunately, it inserts the entire HTML element, and some breaks, which is not what I need. I need ONLY the URL. I have temporarily added a new function that basically does the same thing, only returns JUST the URL. Should I make this a template tag? Or should we add an attribute to the short code?
  • 10/25/09 EDIT: OK finally figured this out (mostly). wp_theme_videolink() is something I added to the video.php file of the WP Video project. I will review the code with Hailin and see if we can get it committed. It is basically a version of the wp_add_videolink() that I modified to be for themes, not RSS feeds. Below is a version I used for the eVid theme.
$thumb = wp_theme_videolink();
//Was this : $thumb = get_post_meta($post->ID, 'Thumbnail', $single = true);


  2. There is no way to render the same video in 2 different sizes. For example the eVid theme has a top "Featured Videos" panel which it wants to size large (and widescreen, more later). But the same post can/will appear in the main page. Well, since the theme just dumps the_content(), it uses the same dimensions in both places. So you'd have to hack in a custom meta value to support "featured" vs. "normal", etc. Not elegant. Nicer if the shortcode allowed you to specify a css class and then you could do something like [wpvideo Ay1SIRji class="letterbox"]
  3. Need an option for the shortcode to remove the player elements. Might want to disable some of the controls for various views.
  4. How can you remove/change the Wordpress watermark that the player renders? If it's not already supported, add support for making that dynamic in the shortcode. Not only would someone want to remove it, but they might want their own site logo on it, OR they may want different images depending on category, tag, page, etc.
  5. Need to review/grok how caching works. Should cache all thumbnails so it loads fast.
  6. Sitemaps --- how to load to Google, Bing, Yahoo

Tuesday, September 15, 2009

SynBio Licensing and SimTK

Met Chris of SimTK.org tonight at Hacker Dojo. Chris is helping on their web site. Very cool site.
Especially interested in : https://simtk.org/xml/rna-folding.xml
Might be very useful to carry out simulation after BioBrick part assembly in silico.

Also of interest to the synbiolib discussions on licensing:

Home brew Microfluidics

Home brew Photolithography - large structures - high resolution imagesetter with plastic transparency for a mask
  1. Lateral positioning of the mask and substrate, resist material, and distance between mask and substrate
  2. exposure process of how to transfer patterns to the photoresist layer: contact printing or projection printing
  3. development: dissolution or etching of the resist pattern
Are there any home brew or small scale/cost variants of:
  • interferometric lithography
  • chemical vapor deposition
  • thermal oxidation
  • Sol-Gel deposition
  • Spin coating
  • wet etching
  • chemical or chemical-physical dry etching
What about pattern transfer processes? Steps in general (additive):
  1. deposit functional layer over a substrate
  2. spin coat with photoresist (positive or negative photoresist)
  3. select mask type (dark or clear field)
  4. transfer pattern to the resist layer
  5. etch the structure into the functional layer
  6. wash away photoresist
For lift-off you coat the photoresist directly on the substrate, put on the mask, and develop, but then deposit the functional layer OVER the photoresist. Subsequent washing then removes the areas with photoresist + functional layer, while the areas with only substrate + functional layer stay in place.

Sources: Fundamentals and Applications of Microfluidics, Second Ed. Nam-Trung Nguyen, Steven T. Wereley

BioBrick Assay Plan

* in vitro protein expression and interaction analysis
* platform based on a highly parallel and sensitive microfluidic affinity assay
* exhaustively measured the protein-protein interactions of parts (promoter and gene to express) etc.

Monday, September 14, 2009

Microfluidics Notes

Tabeling, Patrick. Circa 2003-2005
Market size for gene chips, lab on chips, sensors, flow controllers, nozzles, valves, pressure measurement, actuators, relays, data storage, strain sensors: $4-8Bln in 2003 USD.
(from DARPA estimates, so how good can those be!!?)

Look at Burns, Johnson, et al. Science 282, 484 (1998) for (first?) bio lab on a chip paper.

Issues: data acquisition from chip to computer. diagnoses times/latencies

Review Capillary electro-chromatography

Overall architectural considerations:
  • kinematic properties, transport properties, thermodynamic properties, surface tension, vapor pressure, surface accommodation
  • Fluidic interconnects - standards? universal interconnects?
  • pumps and valves - fabrication, integration, elastomers
  • fluid injection = volume vs. sensitivity (if you only have one cell worth of protein, and add fluid, you need sensitive sensors)
Use cases:
  • Confining molecules for detection
  • kinetics of chemical reactions
  • molecule activity
  • intensifying and analyzing role of surfaces in catalytic reactions
  • applying magnetic fields to control reactions
  • labelling molecules
  • biomolecules with NEMS
  • amplification of other than DNA biological structures
  • non-newtonian fluids
Viable marketable services with demand/practical application today
  • sequenced low-nanogram scale bacterial and mammalian DNA
  • support of more efficient PCR prep, see example overview of the issues here:
http://www.genomeweb.com/sequencing/qpcr-method-improves-454-sample-prep-may-also-help-illumina-solid-users
"Accurate quantification of the sequencing library is essential to achieve high yield and high quality sequencing. Inaccuracy in quantification is addressed by the manufacturers through ‘titration’ runs of the sequencer, which are used to empirically divine the concentration of productive DNA fragments in the sequencing library."

Overview of modern sequencing:
In vitro clonal amplification
As molecular detection methods are often not sensitive enough for single molecule sequencing, most approaches use
an in vitro cloning step to generate many copies of each individual molecule. Emulsion PCR is one method, isolating
individual DNA molecules along with primer-coated beads in aqueous bubbles within an oil phase. A polymerase chain
reaction (PCR) then coats each bead with clonal copies of the isolated library molecule and these beads are
subsequently immobilized for later sequencing. Emulsion PCR is used in the methods published by Margulies et al.
(commercialized by 454 Life Sciences, acquired by Roche), Shendure and Porreca et al. (also known as "polony
sequencing") and SOLiD sequencing, (developed by Agencourt and acquired by Applied Biosystems).[17][18][19]
Another method for in vitro clonal amplification is "bridge PCR", where fragments are amplified upon primers
attached to a solid surface, developed and used by Solexa (now owned by Illumina). These methods both produce
many physically isolated locations which each contain many copies of a single fragment. The single-molecule method
developed by Stephen Quake's laboratory (later commercialized by Helicos) skips this amplification step, directly
fixing DNA molecules to a surface.[20]
Parallelized sequencing
Once clonal DNA sequences are physically localized to separate positions on a surface, various sequencing approaches
may be used to determine the DNA sequences of all locations, in parallel. "Sequencing by synthesis", like the popular
dye-termination electrophoretic sequencing, uses the process of DNA synthesis by DNA polymerase to identify the
bases present in the complementary DNA molecule. Reversible terminator methods (used by Illumina and Helicos) use
reversible versions of dye-terminators, adding one nucleotide at a time, detecting fluorescence corresponding to that
position, then removing the blocking group to allow the polymerization of another nucleotide. Pyrosequencing (used
by 454) also uses DNA polymerization to add nucleotides, adding one type of nucleotide at a time, then detecting and
quantifying the number of nucleotides added to a given location through the light emitted by the release of attached
pyrophosphates.[17][21]
"Sequencing by ligation" is another enzymatic method of sequencing, using a DNA ligase enzyme rather than
polymerase to identify the target sequence.[22][18][19] Used in the polony method and in the SOLiD technology offered
by Applied Biosystems, this method uses a pool of all possible oligonucleotides of a fixed length, labeled according to
the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation by DNA ligase for
matching sequences results in a signal corresponding to the complementary sequence at that position.

Saturday, September 12, 2009

Vote on Wordpress Video features

Here's a list of features I already want to add for my own selfish needs, so vote early and vote often:
*Support L10N strings
*more examples of CDN usage (maybe a step by step for S3 integration)
*implement support for Zend JobQueue or Amazon SQS, or some open source Q --- any recommendation? I typically use ApacheMQ.
*support for players other than Flash (I do a lot of mobile video stuff).
*support for live video casting

If you have others, please add to the comments. I am gearing up for a hack fest on this starting Monday, so let me know what you want!

Friday, September 11, 2009

Wordpress Video 1.0 Released

One issue I hit is that the server for ffmpeg2theora seems to have gone down? Maybe due to traffic from this project? Anyway, I was able to get the source code from:
and from source.

UPDATE: it finally came back up! Got the binary and used it.

To get a fully functional FFmpeg from source: make sure you have libtheora, ogg, vorbis, x264 already built/installed. (TODO: need to see what configure opts vorbis and theora need, just used defaults)

Note: I used --enable-shared on all builds where the README.txt from the plugin did not explicitly give a command.

Also needed http://liba52.sourceforge.net/downloads.html, and http://ffmpeg.arrozcru.org/wiki/index.php?title=Libgsm (NOTE: libgsm has a crappy makefile, so you need to build it shared manually AFTER the initial build. That cost me 30 minutes in the wee hours of the night.)

NOTE: Be sure to use the EXACT snapshot of x264 as documented, else there is a missing symbol and it won't be used. I wonder why such an old version? TODO: look into updating to the latest if some benefit.

Also DON'T USE the latest libdc1394 (libdc1394-2.1.2.tar.gz). With it I was getting a config error:
libdc1394/dc1394_control.h
This header seems to be needed but is not available in the latest src. It must have been deprecated.
Building with ./configure --enable-shared required libraw1394 (I used libraw1394-1.2.0.tar.gz to be consistent with the older libdc1394).
Also need:
yum install xorg-x11-devel
(I actually gave up on dc1394 for the moment since my image didn't have X and getting it built from source was a nightmare. I just gave up for lack of time; 36 hours of no sleep is not good for the body. Give me 2 hours a day and I'm good!)

(later that AM...)

For FFMPEG, I had to apply this patch:
http://replay.origo.ethz.ch/node/126

For setup, do make sure to set the video-config.php AND the video-lib.php constants. If you get an error email about ffmpeg, it is probably because your ffmpeg is wrong. If you get an error like "doesn't exist" then it is probably because it cannot find the database (I had to grant additional access to a new server, for example.)

When I created a new video (WMV from http://www.jhepple.com/support/sample_movies1.htm) the "capture thumbnail" button mentioned on the new media shows up once you start playing the video --- nice touch, but to a sleep deprived person like myself, it was a brain teaser...need my 4th Peets today :)

Also, immediately after upload, I tried to create a post with the short code and it failed. This was once; I tried but could not reproduce it...probably a race condition in the processing vs. posting.

Some nice to haves: thumbnail in the media library view, add a "copy shortcut to clipboard" button, or multi-select in the media library view.








Thursday, September 10, 2009

Overview of the Wordpress Video plugin 0.9


Well, it's not a plugin per se, but really a set of plugins, some background tasks, and an overall suggested architecture.

The basic flow so far is:
(Image checked in by Hailin @ Automattic)

  1. Register an action (remote_transcode_one_video) so that when the user attaches a file, it fires off an exec() to run video-upload.php
  2. in the transcoder (now we are on another server, most likely) it checks the auth, saves metadata about size, etc. and uses FFMPEG to transcode
  3. now it sends the file to the file server (send_to_fileserver) where it moves the files into the final resting place, and has a placeholder for sending it off to a CDN or other replicant
Basically, it bounces off the user's blog (WPMU) to the transcoder via a post, which in turn bounces it off to the file server. I presume all these scripts should have access to the exact same database (or slave replica) in order to keep the processing steps in sync, and to update the post metadata about where the final resting place of the video is, and the video info on size, etc.

It seems to me that even in the best case, it will require a lot of (real-time HTTP) chatter and not very efficient file transfer through all this. Some folks out there are posting about getting everything working on one box --- which should be very possible, but frankly I'd be more interested in getting it off the blog server. I think that was the right decision architecturally, even if it means more complexity for the admin. But I wonder if moving the files around serially is the best way. Maybe some sort of queue (Amazon SQS for the message) + an S3 input bucket for the transcoding workload. Then when the transcoding process finishes a file, it moves it into the fileserver "master" bucket, and the fileserver caches pull from that bucket?? I know it will be fairly complex with load balancing and geographic distribution, so it probably needs more thought than 30 seconds :)


The key point is to make the HTTP chatter asynchronous. Also, by putting the messages into a queue, and having the LRPs just pull from the queue, you get more robustness against error. I'm sure there are all sorts of race conditions from relying on the simple status messages, i.e. update_video_info( $blog_id, $post_id, $format, $status ). I think having a workflow that can pick up in the case of error, and at least show the admin which ones are in various error states, would be nice.
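Roughly what I have in mind, sketched in Python with an in-memory deque standing in for SQS (or whatever queue ends up being used). Everything here is hypothetical: `run_jobs`, the status strings, and the retry count are mine, and none of it is the plugin's actual code.

```python
from collections import deque


def run_jobs(jobs, transcode, max_attempts=3):
    """Pull jobs off a queue, retry failures, and report a status per job.

    transcode is any callable that raises on failure. Failed jobs are
    re-queued until max_attempts is reached, then left in an error
    state that an admin screen could list.
    """
    queue = deque((job, 1) for job in jobs)
    status = {}
    while queue:
        job, attempt = queue.popleft()
        try:
            transcode(job)
            status[job] = "done"
        except Exception as exc:
            if attempt < max_attempts:
                queue.append((job, attempt + 1))  # try again later
                status[job] = "retrying"
            else:
                status[job] = f"failed: {exc}"
    return status
```

The point of the design is that a worker crash or a transient error leaves the message in (or back on) the queue instead of leaving a post stuck in a half-updated status.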

The other option, assuming a smaller site without geographic distribution, is to put it all on NFS servers behind the main server. That way the file transfers are not HTTP based. If you're not on EC2 I can see where this might be a viable alternative. We do this in the colo now. But if you're on EC2, then simply writing to S3 seems to be the way to go.

I think straight away we should implement an S3 option for the file server, use encoding.com or similar for the encoding such that it writes to your S3 bucket, and then use CloudFront or other CDN for distribution.

If you really want to duplicate the features of encoding.com, and hack all the ffmpeg options, then let me know and I can set up a high performance transcoder AMI on EC2 with the scripts.

Some other observations:
  • the upload script and the transcode script check if you have a DATACENTER defined, presumably for load and redundancy; this is a stub, so you would have to add some logic to round robin or randomize or in other ways pick a "transcoder" server and a "fileserver"
  • there is a simple authentication mechanism using a simple md5 salty string to auth to the transcoder -- would be nice to add per user auth to prevent spam or abuse, or to block users who post illegal content -- CERTAINLY users should change this string in production! maybe in the comments we can make that more explicit. Security by obscurity is never the right way to go
  • we are using some standard formats, but there are ffmpeg templates with many more examples, lots for mobile video, etc. we should import these, and just have a list in the options of all template files, or maybe even create the template inside a post, perhaps with some sort of GUI
  • for the fileserver: one nice thing might be to combine some of the logic from the S3 or CloudFront plugins
  • for both file server and transcoder: might be nice to incorporate some sort of stats database on usage, RTT, etc.
  • for both: might also be nice to check the country of origin from GeoIP/maxmind and pick a mirror based on that
  • for the transcoder: how does this differ from encoding.com? Are these complementary/orthogonal? Are they substitutes?
  • for the ffmpeg calls in the transcoder, a lot of the variables are hardcoded --- we should have some way to specify these for different video types in some option set
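For the DATACENTER stub mentioned in the first observation, the simplest logic would be a round robin per datacenter. A minimal Python sketch, assuming a hand-maintained map of datacenter name to host list; the class name and the host names in the usage note are placeholders, not anything from the plugin.

```python
import itertools


class ServerPicker:
    """Round-robin choice of a transcoder or fileserver per datacenter."""

    def __init__(self, servers_by_datacenter):
        # One endless cycle per datacenter that has at least one host.
        self._cycles = {
            dc: itertools.cycle(hosts)
            for dc, hosts in servers_by_datacenter.items()
            if hosts
        }

    def pick(self, datacenter):
        """Return the next host for the datacenter (KeyError if unknown)."""
        return next(self._cycles[datacenter])
```

For example, `ServerPicker({"west": ["t1", "t2"]})` hands out t1, t2, t1, and so on for "west". Swapping the cycle for `random.choice` would give the randomized variant instead.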

Monday, September 07, 2009

Some Bio Toolkits

http://emboss.sourceforge.net/apps/cvs/emboss/apps/remap.html

Also to look at:
http://www.cs.ucdavis.edu/~gusfield/strmat.html
http://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/

Saturday, September 05, 2009

Famiglia

Of course we came in steerage!

record id: 219405
date arrived: 06-02-1888
name: Antonio Ficcaglia
aka names:
gender, age: Male, 12
relation:
occupation: Farmer
destination: USA
native country: Italy
city from: U
embarkation port: Naples
purpose: Staying in the USA
travel compartment: Steerage
ship manifest id: 81199
ship name: Cachemire

record id: 219406
date arrived: 06-02-1888
name: Michele Ficcaglia
aka names:
gender, age: Male, 41
relation:
occupation: Farmer
destination: USA
native country: Italy
city from: U
embarkation port: Naples
purpose: Staying in the USA
travel compartment: Steerage
ship manifest id: 81199
ship name: Cachemire