Saturday, December 02, 2006
Java Architecture Exercise
---Vitruvius, Roman architect circa 40BC (requoted from Wikipedia)
Whenever anyone asks me to review or define an architecture, the first thing I am reminded of is something Grady Booch said in a lecture I once attended: "What is an architect? An architect is someone who thinks up really cool ideas and gets someone else to create them!"
Hmmm...very often the "genius" of the architect gets lost in translation, and I end up reviewing code that leaves me thinking that an architect is someone who causes trouble...well, at least in software engineering.
To me an architect is someone who can create, critique, compare, contrast, measure, quantify, qualify, rarefy, etc. etc., an "architecture". So what is an architecture (in software)? Again, to cite Grady: "Every system has an architecture. Some are not so pretty." In other words, there is no system without an architecture, and conversely, no architecture without a system. One cannot exist without the other.
As an aside, let's consider the most well-known architectural discipline --- that of building construction. Here, architects are legends and architectures are obvious! No one would ask: What is the architecture of the White House, or the Empire State Building, or even my house? There are categories, and periods, and patterns, and styles that define an architecture for buildings in telescoping detail. As much or as little detail as is needed for the discussion. If you are a construction engineer, the architect drops detailed and standardized (digitized) drawings. If you are an interior decorator, you need only define the style and period perhaps. But everyone knows what it is and why, and what makes something a good architecture, and what makes it bad (though taste and preference come into play, anyone can articulate WHY they prefer one to another).
None of this is true today in software. So back to the question: what is a software architecture? The short answer: the set of constraints on a system's design, and therefore its code. The long answer: it depends.
Today's exercise will be one of those "depends" cases. Someone dropped a huge pile of Java code on me and asked me to 1) define its architecture, and 2) redefine a better architecture. Note that there was no definition of what "better" meant. So join me if you will on this excursion through time and space...(check back soon)
Friday, November 03, 2006
Is OpenID Snake Oil?
So, if I register as Bill Gates on OpenID, post a blog with a picture I scraped from Wikipedia, and then start blogging on how much I love BSD and how I was forced to work at Microsoft all those years by aliens...then users of your system could detect that this is phony, and really NOT Bill Gates?
I now control billgates.pip.verisignlabs.com. I opened an account on LiveJournal. Now, I can post whatever I want and as long as the users don't explicitly apply a healthy amount of scrutiny, what's to stop me from influencing world events? And how often do the consumers of mass media scrutinize what they're reading?
So, here comes a very reasonable, and laudable project - OpenID - that wishes to make it "easier" to trust an id through Diffie-Hellman shared secrets. Interesting? Yes. Open and therefore fully trustworthy? NO!
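For anyone who hasn't looked at Diffie-Hellman lately, here is a toy sketch in Python of how two parties end up holding the same secret without ever sending it. The textbook-sized prime (23), the base (5), and the names `keypair` and `shared_secret` are my own illustrative assumptions; this is not OpenID's wire format, and real deployments use vastly larger primes.

```python
import secrets

# Toy public parameters (assumed for illustration; far too small for real use).
P = 23  # public prime modulus
G = 5   # public base

def keypair():
    """Return (private, public) for one party."""
    private = secrets.randbelow(P - 3) + 2   # random value in [2, P-2]
    public = pow(G, private, P)              # G**private mod P
    return private, public

def shared_secret(my_private, their_public):
    """Each side raises the other's public value to its own private value."""
    return pow(their_public, my_private, P)

consumer_priv, consumer_pub = keypair()   # e.g. the site relying on the id
provider_priv, provider_pub = keypair()   # e.g. the identity provider

s1 = shared_secret(consumer_priv, provider_pub)
s2 = shared_secret(provider_priv, consumer_pub)
assert s1 == s2   # both ends derived the same secret
```

Note what the math does and does not buy you: two machines agree on a secret, but nothing in it says that the human behind a given URL is who the URL claims.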
People trust authority. People want to trust governments and companies. They can't, but they want to. People trust what people see other people doing. Therefore, if OpenID is successful, and governments and companies do rely on them, then abuse of this scheme could cause trouble via mistaken or deliberately stolen urls/ids/etc:
* lost jobs
* lost business
* lost credibility
Employers are now searching candidates on Google and blogs. Imagine "John Smith"...what if there are 5 of them in your town? What if one of them is a Neo-Nazi? What if the employer HR person doesn't have 30 minutes per candidate to scrutinize all these blogs and verify true human identity? What if their HR system AUTOMATICALLY trusts OpenIDs!!
Paranoia? Of course. Should we be concerned? I think so. Is it better than what's out there? Probably, but I reserve judgment.
Reprise:
Kim Cameron has a concise, brilliant synopsis of a valid and useable id system:
"Whatever it is, a real identity system needs us to do a lot better. In particular, the identity system must extend to and integrate the human user.
The Law of Human Integration
The universal identity system MUST define the human user to be a component of the distributed system, integrated through unambiguous human-machine communications mechanisms offering protection against identity attacks.
One of the people who has thought long and hard about these issues is Carl Ellison. He has coined the term Ceremony for interactions that span a mixed network of human and cybernetic system components. Carl worked on this idea when he was at Intel and I interview him about his work here." -- KC's blog
Thursday, November 02, 2006
A RoR-ing Good Time
8:00 PM
Dinner's done; Top Chef on Tivo skimmed. Ready to dive in.
8:03
Laptop finally loads. I hate XP. Where's my old Fedora laptop anyway?
8:04
http://www.rubyonrails.org/docs
Seems like a good place to start.
Tutorials...hmm, ok. http://www.onlamp.com/pub/a/onlamp/2005/01/20/rails.html
OK, download stuff. No problem.
8:12
http://rubyforge.org/frs/?group_id=167
58.1Kb/sec! Comcast is blowie.
Ruby One Click still downloading...
Sure, it's about time I get a good UI for MySQL...why not
http://www.heidisql.com/
8:20
Ruby *finally* installing. Select all options. Disks are cheap.
meanwhile...reading on...http://www.onlamp.com/pub/a/onlamp/2005/01/20/rails.html?page=2
Hmmm...cooking metaphors...well, coincidentally I love to cook, but as with food, I never follow recipes to the letter...so let's change this one as we go.
8:25
Ah! Ruby install finished. Pop the stack and install the MySql GUI.
8:26
Cannot watch installer any longer...back to tutorial...
gem install rails --remote
8:28
OK, got me a Heidi GUI....create connection to MySQL
8:31
Heidi looks good. Nothing fancy, nothing too complex. Brilliant. OK, back to the main thread...rails still installing...Y, Y, Y all dependencies...sheesh! Note to RoR maintainers...add an All option to gem.
...sigh...still installing rails...
8:40
Yikes, I thought I had apache on this machine. Hmmm...Tomcat....JBoss....hmmm, Java anyone? Well, time to download...Apache or something else. Let's see...
http://people.apache.org/~rbowen/presentations/apacheconEU2005/hate_apache.pdf
Too funny!
http://trac.lighttpd.net/trac/wiki/PoweredByLighttpd
OK, well good enuf for YouTube billions, good 'nuf for me.
8:55
Hey, I cannot find a valid link to a Windows lighttpd! What gives. OK OK, right, so I shouldn't be using Windows ;) Well, back to Apache for now.
8:57
Apache installing. Sheesh, this is why software development is in the stone ages...it takes an hour to just get your tools ready. Imagine if it took an hour for your doctor to get productive and ready to work...but I digress!
9:01
Dammit Jim! Skype uses port 80, too. Fine fine fine.
9:03
Hmmm...the tutorial assumes WEBrick...need the HOWTO for Apache integration.
9:07
Yikes. http://blog.duncandavidson.com/2006/01/deploying_rails.html
Hmmm...so not only do I need to venture out from my happy Java (ok and PHP) world, now I've gotta ditch my ol' feathery friend?! Sigh..times they are a changin'.
Well, then I see http://blog.codahale.com/2006/06/19/time-for-a-grown-up-server-rails-mongrel-apache-capistrano-and-you/
OK, so now 2 people mention a "proxy rails requests" approach. Hmm...gotta say, so far, installing Tomcat and deploying WARs seems like fun compared to this stuff...or better yet, PHP...still, remain calm and take a deep cleansing breath.
9:11
Let's try Mongrel. Why? Well I like the name better.
http://mongrel.rubyforge.org/
Hey, and it uses that nifty gem thingy:
gem install mongrel
I dig it! Now I remember why I like consoles and not GUIs...well until my fingers cramp.
9:14
C:\temp\rails>gem install mongrel
Bulk updating Gem source index for: http://gems.rubyforge.org
Select which gem to install for your platform (i386-mswin32)
1. mongrel 0.3.13.4 (ruby)
2. mongrel 0.3.13.3 (mswin32)
3. mongrel 0.3.13.3 (ruby)
4. mongrel 0.3.13.2 (ruby)
5. mongrel 0.3.13.2 (mswin32)
6. mongrel 0.3.13.1 (ruby)
etc...
Huh? I guess what's behind door #1?
9:18
Uh, guess I chose poorly:
Building native extensions. This could take a while...
ERROR: While executing gem ... (RuntimeError)
ERROR: Failed to build gem native extension.
Gem files will remain installed in c:/ruby/lib/ruby/gems/1.8/gems/mongrel-0.3.13.4 for inspection.
Let's try door #2...
OK, now we're cooking! Install as service.
Done! Beautiful.
9:21
Uh oh. Lots of mongrel errors.
Hmm..where did I go wrong....Aha! a little further down the page...http://mongrel.rubyforge.org/docs/win32.html
Well, ok this then points back to where I was. Does anyone ever proofread these!? C'mon guys. Well, I'll figure it out.
9:26
OK, mongrel_service installed.
Now where was I...oh yes, the tutorial. I should have been a doctor. Over an hour and still no progress.
9:27
C:\temp\rails>mongrel_rails service::install -N activation -c C:\temp\rails -p 4444 -e development
!!! The path you specified isn't a valid Rails application.
service::install reported an error. Use mongrel_rails service::install -h to get help.
Well, I'll admit that installing an empty project seems silly. But that's what the tutorial said to do (well ok, but not using mongrel, so I'm on my own at this point.) Fine. Deep breath. Think happy thoughts.
Let's just carry on and assume that the empty project installed and ran. After all, what is the sound of one hand clapping?
9:31
"Rails knows where to find things it needs within this structure, so you don't have to tell it. Remember, no configuration files!"
Laudable...until of course you need to change something...but let's not go there for now.
Back to the tutorial...
9:33
"First, it's important to understand how controllers work in Rails and how URLs map into (and execute) controller methods."
Uh, ok. First --- I *do* understand what these frameworks should do, shouldn't do, can do, cannot do, MVC, MVC2, AJAX (when it was just called Javascript and XML!), patterns, etc. etc....
BUT...WHY should I NEED to know? To me, the entire point of a framework is to get up and running and focus on the problem domain and solution space. Period. If I *need* to understand the evolution of flyweights, adapters, and mix-in classes, then the barrier to entry is high. Forget sending this off to junior programmers...god forbid that an MBA with an idea tries to prototype it on their own!
Of course, I suspect that to follow recipes, you don't really need to know this, well, insofar as you can shoot yourself in the foot. Still, I dislike already that I need to know so much about software to make this all work...but I digress.
9:48
Found my happy spot. Trouble running script\generate, but after some contemplation, realized I was off by one dir. Now I remember why I hate console UIs :)
Continuing with tutorial....
Oh wait! That's probably the reason Mongrel couldn't install the app before! Aha! Neurons are still firing and axons continue to myelinate! Neurotransmitters all 'round!
9:57
Recognition errors....what!? What's this underscore nonsense??? Uh, ok. Well, now I get some syntax error...oh yes, a typo. How surprising. Gee, and we wonder why static typing and a compiler are useful...but hey, as some article said, just write unit tests. Uh, sure. Unit test Hello World. Whatever.
Now I am getting really cranky. 2 hours, I FINALLY have Hello World. So far, I cannot understand why anyone likes this RoR stuff unless they never bothered to learn real programming. OK, well, let's go get some tea and calm down. After all, all these kids can't be completely wrong...there must be SOMETHING useful here. Remember, you also hate PHP for the same reason and you use that for some things. Relax. Deep breath....tea.
BACK IN 15
10:15
OK, so I checked on something tangential:
http://superhappydevhouse.org/
By complete coincidence, it's tomorrow night! Spooky. Maybe fate?
And forgot my tea.
10:24
OK, back online. Create the database and configure rails:
"Rails lets you run in development mode, test mode, or production mode, using different databases. This application uses the same database for each."
Well, now that IS useful. So my persnickety diatribe was perhaps a bit premature.
OK, well since I am deviating from the recipe and actually trying to create a useful service (much as I enjoy cookbooks), I need to pause and sketch out the data model. Ignoring all sorts of analysis and architecture, so it seems. Hmmm...well I don't presume that ANY framework that exists today is an ARCHITECTURAL framework. But I digress...
Today, in a StartupCamp session, the problem of viral user acquisition and "critical mass" was discussed. So I figure one nifty service would be to connect service providers with service consumers who can help them reach the necessary "activation" energy. So, why not create a site that tracks the viral "energy" for a site - both in terms of individual energy and group kinematics. I guess this would be like trackbacks, but rather than links, the "energy" is determined by ratings. The site allows services to embed a tool on their sites that let users anonymously or (for fun and profit) identifiably rate the service across one or more dimensions (some standard, some created by the service provider, some created by the users). Then the site also connects the users by allowing them to refer a friend in the embedded tool.
The idea is meant as a means, not an end, so don't tell me why it's a bad idea. I don't care. I just want something potentially interesting and more importantly, real enough and complex enough to put RoR through its paces. Mmmmm, tea.
10:42
OK, that will suffice for "business modeling" and "use case analysis". Skip ahead to data design.
I'll need a SQL database of the following (NOTE: I AM NOT A DBA OR DB DEVELOPER OF ANY SKILL):
users : id(primary key), uid(string) - ignoring security for now
assets: id(pk), content(string)
targets: id(pk), name(string)
ratings: id(pk), score(integer)
...now here's where it gets tricky...
graph: id(pk), edgetype(choice:directed, undirected)
nodes: id(pk) , color(string), data(string)
edges: id(pk), weight(float), source(int), target(int)
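Scripted rather than clicked, the sketch above might look something like this. It is a minimal Python version under assumptions of my own: SQLite (in memory) standing in for MySQL, `edgetype` enforced with a CHECK constraint, and `source`/`target` treated as references to rows in `nodes`. None of those details are pinned down in the list above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users   (id INTEGER PRIMARY KEY, uid TEXT NOT NULL);
CREATE TABLE assets  (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE targets (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ratings (id INTEGER PRIMARY KEY, score INTEGER);
-- edgetype is the "choice" column: directed or undirected
CREATE TABLE graph   (id INTEGER PRIMARY KEY,
                      edgetype TEXT CHECK (edgetype IN ('directed', 'undirected')));
CREATE TABLE nodes   (id INTEGER PRIMARY KEY, color TEXT, data TEXT);
-- assumption: source and target point at rows in nodes
CREATE TABLE edges   (id INTEGER PRIMARY KEY, weight REAL,
                      source INTEGER REFERENCES nodes(id),
                      target INTEGER REFERENCES nodes(id));
"""

def create_db(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

conn = create_db()
# Two nodes joined by one weighted edge, just to prove the tables work.
conn.execute("INSERT INTO nodes (color, data) VALUES ('red', 'a')")
conn.execute("INSERT INTO nodes (color, data) VALUES ('blue', 'b')")
conn.execute("INSERT INTO edges (weight, source, target) VALUES (0.5, 1, 2)")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```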
11:13
OK, try my new Heidi GUI...
11:16
enough GUI nonsense; I'll script it later! Taking too long. Let's get back to the tutorial with our new database.
11:38
Well
http://localhost:4444/user/new
Now I have a db, a model class, a controller class...and hey...no HTML written, and I get a form for all the basic CRUD! Pretty cool. Still, it's pretty brain dead simple at this point. Let's see how we continue.
"We now have an amazing amount of functionality, by merely building a database table and typing in a single line of code. It may not be pretty yet, but we'll fix that soon enough."
Well, I'm a bit more prosaic, but yes, I always like a free lunch.
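To make the free lunch a little less magical, here is a rough Python sketch of the convention at work in a URL like /user/new: the first path segment picks a controller, the second picks a method on it, and anything after that is passed as arguments. The names `UserController` and `dispatch` are hypothetical, and this is not how Rails itself is implemented; it only illustrates mapping URLs to controller methods by naming convention.

```python
class UserController:
    """Hypothetical controller; each public method is an action."""

    def new(self):
        return "form for a new user"

    def show(self, user_id):
        return f"user {user_id}"

# Convention: the first URL segment names the controller.
CONTROLLERS = {"user": UserController}

def dispatch(path):
    """Map '/controller/action/id' onto controller.action(id)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"no action in {path!r}")
    name, action, args = parts[0], parts[1], parts[2:]
    controller_cls = CONTROLLERS.get(name)
    # Refuse unknown controllers and private-looking method names.
    if controller_cls is None or action.startswith("_"):
        raise LookupError(f"no route for {path!r}")
    handler = getattr(controller_cls(), action, None)
    if not callable(handler):
        raise LookupError(f"no route for {path!r}")
    return handler(*args)

print(dispatch("/user/new"))
print(dispatch("/user/show/42"))
```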
11:47
I always like to look under the covers...
http://rails.rubyonrails.com/classes/ActionController/Scaffolding/ClassMethods.html
11:48
"Rails tries very hard to present the user with pretty URLs. Rails URLs are simple and straightforward, not long and cryptic."
Hmmm, laudable perhaps, but not really important. I think users are trained to pretty much cut and paste or use the browser or tools to pass around links.
11:51
"Let's create our own view template for the list action that only shows each recipe's title and date."
Oh, boo! Now I am crotchety again.
" If you have worked with JSP or ASP pages, this will look familiar."
Yeah! But that's what I hoped to avoid! Sigh.
12:15
Have finished the tutorial. Skipped all that template code for now; I don't need a fancy LnF for now. Reading:
http://www.slash7.com/articles/2005/1/24/really-getting-started-in-rails
Hmmm, I wonder what ActiveRecord does with table Cacti, or Fungi?
12:29
Well, I finished skimming part II of the tutorial, looked at some supplementary sites and have my very simple scaffold-y rails app. Overall, it shows promise. Where to go from here? Ruby syntax? Code recipes? HOWTOs? Hmmmm....let's get some sleep and see what the StartupCamp session tomorrow...in 9.5 hours...reveals! Can't wait to get some feedback from real RoR users tomorrow. G'night!
Startup Camp: Anti-Mashup
Build a new web service from scratch in days...for fun and profit?
(a.k.a. Startups for the hyper-impatient)
(a.k.a. "Stealth mode" is for weenies!)
Topics include:
* Which framework(s)
* Which language(s)
* What skills required
The goal is not to lecture but to listen to real-world examples, hear tales from the trenches, and provide starting points for those interested in bringing a new service to the web...NOW.
Some data.
I did a simple search on HotJobs (keyword, 94043), first as a job seeker:
Ruby - 80 jobs
(Ruby Rails - 26 jobs)
PHP - 419
Python - 387
Java - 2342
(Java Spring - 100)
(J2EE - 886)
Web 2.0 - 150
Is Web 2.0 a good financial bet for engineers? Hmmmm....ok, now the demand side. (I have an employer account.) Search for keyword, Mountain View, CA and metro area, resumes from last 30 days:
Ruby - 36
(Ruby Rails - 7)
PHP - 353
Python - 93
Java - 1000 (hmmm, this seems like too round a number...perhaps their search only returns 1000 max?)
(Java Spring - 868)
(J2EE - 1000, ok now I *know* that something's funny. Fine, assume Java >>1000)
Web 2.0 - 1000 (again)
So, unfortunately HotJobs won't cooperate. If I get a chance, I'll try Dice. But still, to me, as an employer OR an engineer, the financial incentive to move from Java to Web 2.0 seems murky at best. Let's see what the group thinks.
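For what it's worth, here is a quick Python sketch that turns the counts above into a crude jobs-per-resume ratio. The 1000-result cap is my guess from the suspiciously round numbers, and the two searches covered slightly different areas and time windows, so treat the output as a rough indicator at best.

```python
# Counts copied from the HotJobs searches above.
jobs = {"Ruby": 80, "PHP": 419, "Python": 387, "Java": 2342}
resumes = {"Ruby": 36, "PHP": 353, "Python": 93, "Java": 1000}
CAP = 1000  # assumed maximum the resume search will report

def jobs_per_resume(keyword):
    """Return (ratio, capped). When capped is True the resume count hit CAP,
    so the true ratio can only be lower than the one returned."""
    capped = resumes[keyword] >= CAP
    return jobs[keyword] / resumes[keyword], capped

for kw in jobs:
    ratio, capped = jobs_per_resume(kw)
    note = " (upper bound, resume count capped)" if capped else ""
    print(f"{kw:7s} {ratio:5.2f} jobs per resume{note}")
```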
If the short answer is "use Rails, stupid," then perhaps we can take this forward through an exercise. To facilitate this, I propose the following... We take a hypothetical service idea, sketch the design, and then debate the architecture. Here are the rules I have in mind:
1. Must require data (what service doesn't)
2. Must require mass market UI appeal for web browsers
3. Must scale (in architecture and design, not business model) to >10M users within 3 months
4. Must allow for user content creation
5. Must have a "reasonable" chance of implementation within 48-96 hours
6. There are no other rules beyond these.
To start, I will propose the following hypothetical idea, but others can be presented and majority rules.
Idea 1: A site for startups to post projects to developers to work on for free in exchange for royalty payments if and when they are successful. Think of it as a site for "Venture Technologists" --- engineers who have knowledge capital, but no money. Startups who have ideas, but no engineers and no money, promise to reward participants for their contributions when/if they generate *Revenue*. (I think we all know profits are so pre-Web 2.0!)
Let's not get into legal details, business plans, etc.; let's focus on the architecture and tools in this session.
Here's a bootstrap list of frameworks I know of and have looked at...some briefly...some painfully over months of labor...please add more!
- Java - Spring, ServiceMix, Sails, Tapestry
- PHP - CakePHP, PRADO, Zend, Code Igniter
- Ruby - Rails
- Perl - Catalyst
- Python - Pylons
- Content Management Systems (in some apps, I consider these frameworks) - Drupal
Relevant links:
http://javaboutique.internet.com/reviews/ruby/
http://www.onjava.com/pub/a/onjava/2005/11/16/ruby-the-rival.html?page=1
NOTE: for the above idea, I hereby declare that any and all participants who contribute code will share equally in any proceeds if the service is actually developed and coded. Who knows!? I reserve final judgment to adjust allocations for those who contribute materially in non-code ways, or people I find amusing or entertaining. And myself of course :)
Saturday, October 14, 2006
A cursory look at ftp://ftp.ncbi.nih.gov/genbank/genomes/H_sapiens/ shows that the latest data weighs in at just over a Gig, compressed! I don't know if that contains annotations, SNPs, STSs, Cytogenetic Bands, etc. (Anyone care to give me some real world figures?)
The new X Prize requires the complete sequencing of 100 humans. So that's roughly 100GB compressed. So for decompression, add another 2X for temp working space. That's a total of 300GB of data. A quick trip to Fry's gets you that for under $300 these days, and falling.
But now consider that for quality control, archival purposes, analysis, transmission, sharing, etc. you will almost certainly need multiple copies. OK, so round up and be very conservative. Assume for each batch of 100 human genomes, you will need 1TB of storage. Now we're getting a bit pricey, but still well within the reach of any individual with a few K dollars to burn.
As of today, there are ~300 million (documented) US residents. At 1TB per 100 genomes, that's 3M TB == 3 exabytes (3 x 10^18 bytes). Today, major financial datacenters top out in the petabyte range. Still, ignoring the technology advances needed, because they will exist soon enough, it will still be a massive outlay. If disk costs halve every 2 years, in 5 years (the anticipated completion of the X Prize), the cost per TB for off the shelf storage could be as low as $100 USD. That's still $300M USD. Certainly possible given the coffers of a large Fortune company, a wealthy institution, or the US Government. Note that this is just for the raw storage; no labor, no electricity, no cooling, no repairs, computing power, etc. OK, round up to $1B and it is still a plausible US Government effort.
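The back-of-envelope numbers are easy to check in a few lines of Python. Every constant below is one of the rough assumptions from the text, except the $1000/TB starting price, which I am inferring from "300GB for under $300". Strict halving every 2 years from that price gives roughly $177/TB after 5 years, so $100 sits at the optimistic end.

```python
GB_PER_GENOME = 1            # "just over a Gig, compressed", rounded down
WORKSPACE_FACTOR = 3         # compressed copy plus 2X temp working space
TB_PER_100_GENOMES = 1       # conservative round-up for multiple copies
US_RESIDENTS = 300_000_000
COST_PER_TB_USD = 100        # optimistic projected price used in the text

def working_set_gb(genomes):
    """Disk needed to hold and decompress a batch of genomes."""
    return genomes * GB_PER_GENOME * WORKSPACE_FACTOR

def national_storage_tb(population):
    """Storage for a whole population at 1 TB per 100 genomes."""
    return population // 100 * TB_PER_100_GENOMES

def projected_cost_per_tb(cost_today, years, halving_period=2):
    """Price after `years` if it halves every `halving_period` years."""
    return cost_today * 0.5 ** (years / halving_period)

tb = national_storage_tb(US_RESIDENTS)
print("X Prize batch:", working_set_gb(100), "GB")
print("US population:", tb, "TB, i.e.", tb / 1_000_000, "EB")
print("Raw storage cost: $", tb * COST_PER_TB_USD)
print("Strict halving from $1000/TB over 5 years: $",
      round(projected_cost_per_tb(1000, 5), 2))
```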
Now, for those of you who (like me) foresee no end of medical advances when we have such a trove of genetic data readily available...wait! readily available?...how are you going to disseminate all this data to the large number of academic and corporate scientists who could benefit from the data? Ever try downloading 1GB of data, or 1TB? Surely we can burn it to optical media and transport whole copies to various local repositories (at $1B a pop, give or take); but even then, you will need database tools to analyze it, software to crunch the data, graphics to visualize the data, etc. etc.
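To put a number on "ever try downloading 1TB?", here is a small Python sketch of transfer times. The link speeds are illustrative assumptions rather than measurements, and it ignores protocol overhead, so real transfers would be slower.

```python
def transfer_days(size_bytes, bits_per_second):
    """Days needed to move size_bytes over a link at a sustained rate."""
    seconds = size_bytes * 8 / bits_per_second
    return seconds / 86_400

GB = 10**9
TB = 10**12

# Hypothetical sustained link speeds.
LINKS = [("1 Mbit/s", 1e6), ("100 Mbit/s", 1e8), ("1 Gbit/s", 1e9)]

for label, bps in LINKS:
    hours_per_gb = transfer_days(GB, bps) * 24
    days_per_tb = transfer_days(TB, bps)
    print(f"{label:>10}: 1 GB in {hours_per_gb:6.2f} h, 1 TB in {days_per_tb:7.2f} days")
```

At a sustained 1 Mbit/s a single terabyte takes about three months, which is why shipping physical media starts to look attractive.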
What's the point? There needs to be an effort to match the X Prize effort --- a "D" prize for storing, transporting, and using the data that results from the X Prize.
Now, what happens when the technology is available to instantly (in relative terms) sequence every Homo sapiens in the US? Who polices the use of this data? In California, they routinely (by law) take a blood sample of every child born, which is maintained in a bank under somewhat vague authority. I looked into it briefly when my daughter was born. There wasn't much publicly available info. I'm no conspiracy theorist, neither am I paranoid, but to me, the threat of losing one's private genetic data is of the utmost concern. Not panic, but genuine consideration.
If one thing is certain, this data will be gathered, and there will be infinitely wonderful changes in human existence because of it. However, there will also be criminal abuses of the technology. What laws and enforcement policies do we need in place to deal with the inevitable evils? Or maybe we'll find the genes responsible for such malevolence and eradicate them? Hmmm...I predict a new occupation: genomic attorney!
Saturday, October 07, 2006
Yahoo Mail APIs: Cathedral or Bazaar?
As best I can tell from what little info is available (noting that I was not a participant of Hack Day, or in any way part of the Hack cognoscenti), a primary difference between the free and premium API access will be that the free account APIs will not allow full email content access. I can only assume that this is to protect the revenue stream for premium features derived from full content access, e.g. email archive.
I can only assume that, in the short term, premium users represent so large a revenue source for Yahoo that it is willing to risk stifling long-term creativity. To be fair, it would be a difficult pro forma analysis: maximize the feature set available to developers who want to expand the utility of the platform vs. cannibalize current revenue streams. In a corporate setting, this is what we call a CLM.
Yet, from the sidelines, I would assert that if a user has already made a choice to use free accounts, it seems unlikely that they would switch to premium for any reason relevant to the APIs. (Ah, something my microeconomics prof lectured about is buzzing in my head...but I think I was asleep that day...) Wouldn't it be more likely that users from other mail services would migrate to Yahoo --- both free and premium --- if more creative and innovative services were available? If anyone from Yahoo would care to send me the data, I would gladly develop a predictive model, gratis. (Not holding my breath.)
It certainly may be that some of the premium services such as archival could be re-implemented by 3rd parties; I would assert that rather than eroding Yahoo's business, it would open new --- and most likely unexpected --- services that increase the attractiveness of the platform. Admittedly, this is only an opinion and would benefit from some number crunching. If I have a point here, it is this: has anyone at Yahoo crunched those numbers, or is it just that no one wants to speak out in the cathedral down in Sunnyvale?
"If you have the right attitude, interesting problems will find you." -Eric Steven Raymond