Tuesday, December 09, 2008
I hate it when my friends let me down
I was trying to make an Ajax request (using Prototype and link_to_remote) from Firefox to a JRuby on Rails application running in Tomcat over an https connection proxied by Apache Web Server. That's a lot of pieces that could go wrong. I was getting page not found in my poor little div. The link was being generated correctly and I could navigate to it.
Everything worked fine on my box. Of course, the problem was on the real server. Which is still in dev mode and hasn't been opened up so that I can connect to it from my desktop. Which means I have to open an X session to the server over ssh to launch a browser. Which doesn't have Firebug. Double check everything else. Have to find Firebug and get into the back end network...time passes...
Firebug said it was a 403.
It turns out Firefox doesn't send the Content-Length header on POST requests. It turns out the sysadmin had typed the innocent-looking phrase SecFilterEngine On into the Apache config. This causes Apache to reject POST requests that lack a Content-Length header. So secure I can't connect. I guess Firefox should send that header, but should Apache really demand it? Now I have a hell of a lot of working around to do. To the casual observer / tester / customer - it's a bug, a defect, something wrong with my code. I should have, and I wish I could have, a dev environment that even remotely resembles production. Amazing waste of a day. What's going to happen tomorrow?
Details here...
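For anyone who wants to see what the filter is looking for, here is a minimal sketch in Python (not the Prototype code I was actually fighting with). The function name and the example host and path are made up for illustration. The only point is that the request always carries a Content-Length header, even when the body is empty.

```python
def build_post_request(host: str, path: str, body: bytes = b"") -> bytes:
    """Build a raw HTTP/1.1 POST request that always carries Content-Length.

    Hypothetical helper for illustration; host and path are placeholders.
    """
    header_lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",  # present even for an empty body
        "Connection: close",
    ]
    # Headers end with a blank line, then the (possibly empty) body follows.
    return "\r\n".join(header_lines).encode("ascii") + b"\r\n\r\n" + body
```

Calling build_post_request("example.com", "/items/refresh") yields a request with "Content-Length: 0", which is the header the rejected requests were missing.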
Tuesday, September 23, 2008
Rolling Restarts, Migrations, and Deployments

The solution we came up with (before we heard about SeeSaw) was to take half of the mongrels offline from the load balancer. Shut them down. Update them. Start them up. Put those mongrels back online in the load balancer and take the other half off. Shut the second half down. Update the second half. Start them up. This greatly minimizes the time where you have two different versions of the application running simultaneously. I wrote a Windows bat file to do this. (Deploying on Windows is not recommended, btw)
A truly awesome solution to this would be a load balancer that is somehow aware of the version level of the balanced set and just makes the switch for you. Until that is invented, Apache mod_proxy_balancer is easy enough to control remotely.
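The half-and-half procedure above can be sketched in a few lines of Python. This is not the bat file I wrote; the disable, enable, and update callables are hypothetical stand-ins for whatever talks to your load balancer and restarts your mongrels.

```python
from typing import Callable, Sequence


def rolling_restart(
    backends: Sequence[str],
    disable: Callable[[str], None],
    enable: Callable[[str], None],
    update: Callable[[str], None],
) -> None:
    """Update backends in two halves so one half keeps serving traffic."""
    middle = len(backends) // 2
    for half in (backends[:middle], backends[middle:]):
        for backend in half:
            disable(backend)  # take it out of the load balancer
        for backend in half:
            update(backend)   # shut down, deploy the new code, start up
        for backend in half:
            enable(backend)   # put it back into rotation
```

Note that with a single backend the first half is empty and the second half is everything, so you still get downtime. You need at least two mongrels for this to buy you anything.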
It is very important to note that database migrations can make the whole approach a little dangerous. If you have only additive migrations, you can run those at any time before the deployment. If you are removing columns, you need to do it after the deployment. If you are renaming columns, it is better to split the rename in two: a migration that creates the new column and copies the data into it, run before deployment, and a separate script that removes the old column, run after deployment. In fact, it may be dangerous to use your regular migrations on a production database in general if you don't make a specific effort to organize them. All of this points to making more frequent deliveries so each update is lower risk and less complex, but that's a subject for another response.
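Here is a rough sketch of the rename-as-two-steps idea, using Python's sqlite3 module rather than Rails migrations. The users table and the name and full_name columns are invented for the example. The second step rebuilds the table instead of dropping the column, because older SQLite versions have no DROP COLUMN.

```python
import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    """Run BEFORE deployment: add the new column and copy the data into it."""
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
    conn.execute("UPDATE users SET full_name = name")
    conn.commit()


def contract(conn: sqlite3.Connection) -> None:
    """Run AFTER deployment: remove the old column once no code reads it."""
    conn.executescript(
        """
        CREATE TABLE users_new (id INTEGER PRIMARY KEY, full_name TEXT);
        INSERT INTO users_new (id, full_name)
            SELECT id, full_name FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
        """
    )
```

One thing this sketch ignores: anything the old code writes to the old column between the two steps never reaches the new column. A real version would need to keep the two in sync (or re-copy) until the old code is gone.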
What's a defect? What's a missing feature?
Many software testers cause great frustration among software developers. One of the biggest issues that arises on agile projects is that testers have a hard time distinguishing between defects and features that have not yet been implemented.
It's always a bit of a challenge to deal with bugs- even outside of agile. What to some people is an obvious bug is a feature that was never requested to someone else.
For example, I was called into a meeting today to discuss a bug that had been discovered during a user demo. The search was "not finding telephone numbers in documents". The search term entered was 5551212. Some documents contained 555 1212. Some contained 555-1212. Some contained 1(703)555-1212.
They tried 555*1212, still not working. Search is broken. Developer suggests- try "555 1212", magic happens. * matches any character, not word boundaries...
It was an obvious problem...to the developers who understood that searching for "breakup" was not going to match documents containing "break up". With text, it's obvious, but I can certainly see where people might not see the issue with phone numbers.
We'll add normalized versions of phone numbers into the search index and we'll normalize search terms that look like phone numbers or something like that, but...this is not an insignificant effort. (Even though there is some decent code out there to handle it.) There are trade-offs in performance that have to be considered. Ask a tester though- they'll say they found a bug; it's their job, and they want to be able to show how good they are at finding them. If it's a bug, we can't even mark our existing search work as complete. If it's a missing feature, we have to allocate it to the next release. Fortunately, a tester didn't find this one, so we don't quite have that problem, but the users do want the feature of being able to match many different formats of phone numbers.
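To make the normalization idea concrete, here is a small Python sketch. It is not what our search engine does. The regular expression is a guess at what "looks like a phone number", and the suffix match is just one possible matching rule.

```python
import re

# Assumption: a "phone-like" run starts and ends with a digit and contains
# only digits, spaces, parentheses, dots, and dashes in between.
PHONE_LIKE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def normalize_phone(text: str) -> str:
    """Strip everything except digits."""
    return re.sub(r"\D", "", text)


def index_terms(document: str) -> set[str]:
    """Normalized phone numbers found in a document, to add to the index."""
    return {normalize_phone(m.group()) for m in PHONE_LIKE.finditer(document)}


def matches(query: str, document: str) -> bool:
    """True if the normalized query is a suffix of any indexed number."""
    wanted = normalize_phone(query)
    return bool(wanted) and any(
        term.endswith(wanted) for term in index_terms(document)
    )
```

With this, a search for 5551212 matches documents containing 555 1212, 555-1212, and 1(703)555-1212. The trade-off is visible even here: every document gets scanned for phone-like runs at index time, and every query gets an extra normalization pass.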
It seems like a simple semantic difference, and it seems like developers are being too sensitive, but it's actually a big deal. Some "bugs" might actually end up costing a huge amount of money and not be worth fixing- particularly if they aren't really bugs. I have seen teams show excessive deference to testers and spend as much time handling edge cases as they spent on the basic features, even when those cases would never really occur and were better handled by error messages than by trying to do something useful. Meanwhile, the project sponsor is seldom asked to decide whether a bug should be fixed- few bugs are even given a cost estimate.
I say- if you want to fix a bug, you have to pay up. If you are really smart, you do the five whys and find the root cause, but if it happens to be something that it never occurred to anyone to ask for, try not to ask the developers to take care of it on their own time.
Tuesday, September 16, 2008
Reading about Hadoop
There is a two-day Hadoop Camp at the upcoming ApacheCon in New Orleans. Learning about Hadoop is a great way to become familiar with some of the innovations that Google has put forward in the last few years- and to see the technology behind Yahoo's big set of nodes. What follows is the beginning of Mr. White's book. I am looking forward to the chapter on HBase.
"Hadoop was created by Doug Cutting, the creator of Lucene, the widely-used text search library. Hadoop has its origins in Nutch, an open source web search engine, itself a part of the Lucene project.
Building a web search engine from scratch was an ambitious goal, for not only is the software required to crawl and index websites complex to write, it is also a challenge to run without a dedicated operations team — there are lots of moving parts. It's expensive too — Cutting and Cafarella estimated a system supporting a 1-billion-page index would cost around half a million dollars in hardware, with a $30,000 monthly running cost. Nevertheless, they believed it was a worthy goal, as it would open up and ultimately democratise search engine algorithms.
Nutch was started in 2002, and a working crawler and search system quickly emerged. However, they realized that their architecture wouldn't scale to the billions of pages on the web. Help was at hand with the publication of "The Google File System" (GFS) [ref] in 2003. This paper described the architecture of Google's distributed filesystem that was being used in production at Google. GFS, or something like it, would solve their storage needs for the very large files generated as a part of the web crawl and indexing process. In particular, GFS would free up time being spent on administrative tasks such as managing storage nodes. In 2004 they set about writing an open source implementation, the Nutch Distributed File System (NDFS) as it became to be known.
NDFS and the MapReduce implementation in Nutch were applicable beyond the realm of search, and in February 2006 they moved out of Nutch to form an independent subproject of Lucene called Hadoop. Yahoo! hired Doug Cutting and, with a dedicated team, provided the resources to turn Hadoop into a system that ran at web scale. This was demonstrated in February 2008 when Yahoo! announced that their production search index was being generated by a 10,000 core Hadoop cluster.
Earlier, in January 2008, Hadoop was made a top-level project at Apache, confirming its success and dynamic community."
Tuesday, September 02, 2008
Another day, another blog
Friday, August 08, 2008
The KML Handbook

Wednesday, August 06, 2008
SpatialKey - Flash Mapping...
Thursday, July 31, 2008
Tribes...

Seth Godin got me to join triiibes- his amazing scheme to get people to pre-order his new book
I have been interested in the tribal dimensions of human behavior for a while. Ray Immelman's Great Boss Dead Boss
Another book along the same lines, but with a completely different approach, is Status Anxiety
I do have a slight anxiety of being part of the sucker tribe for joining up with this stuff...well, I guess the sucker tribe is not really a tribe as much as it is the people excluded from the non-sucker tribe. Anyway, the book's pretty cheap, so it looks like the cost of briefly suspending my natural cynicism won't exceed $15.
Thursday, July 17, 2008
Job ad

We need a few more "can do it all" types at our company. I struggle with how to list these positions, because not all of the skills are required, but we need a few specializing generalists or generalizing specialists or what have you. I read "Hiring Technical People" (the book and the blog), but it's still not easy. Anyway- we are still making our stock plan available to people hired in 2008, so if you know anyone that is looking...we've got a smart team. If anyone wants to point out obvious errors or inconsistencies in the technology laundry list, it would be greatly appreciated, as I don't know all of this stuff.
GIS Integration Engineer
Duties to include: Integration and testing of a cutting edge commercial imagery / video storage product into various types of commercial tools and custom government software. This includes integrating an encoder for various data types and integrating the viewers as well. The work will involve working with customer site representatives to analyze and produce the solutions to interface the software together. This could involve using the product's C API, scripting languages, WMS, Java or .Net wrappers. If you are a hard core geospatial person with minimal programming experience or an expert generalist programmer with a little geospatial experience and/or a solid math background, please apply.
We are looking for people with some subset of these skills:
GIS, Imagery analysis, c and c integration, Java, .Net, SQL and database programming, Python, Perl, Unix Shell Scripting, Windows and Unix Installation packages
OGC standards (WMS, WCS, WFS), PCI Geomatica, ERDAS, ESRI, ITT, GDAL, OSSIM, RemoteView, Imagine, FalconView, SocetSet, Warp, Manipulation of commercial imagery, IKONOS, QuickBird, NTM, Aerial and Aerial Video feeds, file types such as GeoPDF, GeoTIFF, NITF, JPEG2000, pix, Shapefiles, MPEG and other video formats, Calculus or Trig, Orthorectification, Programmer on Unix variants and/or Windows
Location: Sterling, VA, USA, 20164
Compensation: Substantial!
resumes to: jobs@lmnsolutions.com
Friday, July 11, 2008
JRuby- The Element of Surprise

It's really surprising how much faster things seem to get done. Now, I know that's sort of silly. People can work fast in any technology that they are good at. In fact, I think the source of most of the disagreement in technology and product selection (which is a plague in the Java world, the GIS world, the database world, etc.) is that people want to use what they are best at because it allows them to shine. I ultimately only care about the end result. I don't mind switching to some technology I know nothing about. It is harder for me to provide value, but I love nothing more than learning new things and evaluating them.
Anyway, that's why JRuby is even more awesome. Let them write Java. The JVM is an environment where we can all get along.
Saturday, June 21, 2008
Some fun agile observations
2. I think proposing software concepts like Enterprise Service Catalogs seems decidedly un-agile. No one really wants that crap. Just publish, document, and version some HTTP APIs for your apps. 100% of real developers would take some sample code that uses your service API over a catalog.
3. I was reading "The Toyota Way" this morning. Just the foreword and the preface while my kid was at Lil' Kickers. Some choice paraphrasing- Page xii: Embrace change. Page xv: None of the individual elements are special, but the system as a whole is important. Success derives from balancing the role of people in an organizational culture that expects and values their continuous improvements, with a technical system focused on high value added flow. Did Kent Beck ever work there?
4. I've been getting some flak from one of my business partners because I can't get agile implemented in any useful fashion on the current project that I am working on. We're dealing with multiple several hundred page requirements documents that are written into the contract. And they all suck. Agile has to start earlier- in the contracting phase, as Alistair Cockburn (pronounced "Coe-burn") pointed out. Regardless, there is mandatory scrum training- we must be able to pretend that we are agile.
5. As I recently twittered, my Wii Fit thinks I'm a "yoga master" but in reality I can't touch my toes (without bending my knees). I think there is a good analogy for agile there- I love it, I just don't live it.
Sunday, June 01, 2008
What do computers want to do?
What if there is a book someday called, "What Computers Won't Do"? This of course would mean...that they want to do something. What would they want? When I was young I thought...power? (not in the megalomania sense, more in the megawatt sense) This is before I knew about Maslow. ("Honey, would you please fix the computer? It needs to be self-actualized again.")
But then again, what is "wanting"? Rocks don't want to roll down hills...but Sisyphus wants to push them up.
Monday, May 12, 2008
BigTable, NoJoins
I am still having trouble getting my head around the challenge of "no joins" and systems that do a lot of writes. The fundamental issue is- at some point you have to do so many writes to update the denormalized data that locking becomes a performance problem.
I was looking into updating my stupid application that was striving towards the NetFlix prize to make it use one of these services, just for fun. It's fine for that- no users making updates. But what if it were a system where you had many people updating the same data? It turns out that Google's BigTable actually handles that pretty well- with a very tight time driven history of updates.
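As I understand it from Google's published Bigtable paper, each cell can hold multiple versions of a value, each tagged with a timestamp. Here is a toy Python sketch of that idea. It is an in-memory illustration with made-up names, not BigTable's or HBase's API.

```python
import time


class VersionedTable:
    """Toy store that keeps every write as a (timestamp, value) version."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value) pairs
        self._cells = {}

    def put(self, row, column, value, ts=None):
        """Append a new version instead of overwriting the old one."""
        ts = time.time_ns() if ts is None else ts
        self._cells.setdefault((row, column), []).append((ts, value))

    def get(self, row, column, as_of=None):
        """Return the newest value, or the newest one at or before as_of."""
        versions = [
            (ts, value)
            for ts, value in self._cells.get((row, column), [])
            if as_of is None or ts <= as_of
        ]
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]
```

Writers only ever append, so two people updating the same cell don't have to fight over one value. Readers pick the version they want by time. Whether that is good enough depends on whether "latest timestamp wins" is acceptable for your data.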
Digging into HBase now trying to see if I can get some of that action.
Monday, April 28, 2008
Fewer, better developers
Even worse is just taking the first person to show up with the right languages on their resumes. That's happened to me quite a bit in the consulting world. Hey- train this new guy, get him up to speed. Three months later, he's still trying to get Word running. Once I was training someone to be the overseas system administrator for a system I had built. I walk over there and she's trying to talk into the mouse. It was a long week of training.
Depending on your needs, you may want a place to train new people. May I humbly suggest not throwing those people out there on consulting assignments? How do you do that as a consultant, where you don't want to put junior people out there? They've got to go out there with mentors and protectors. You've got to have a few big jobs where they get some support. But still it happens to us, where we have people that just end up out there alone on assignment. It can suck even if you are good, but it must be hell when you are not.
Maybe consulting shouldn't be someone's first job...
Monday, April 21, 2008
Don't Mix Transactions and Reporting

"You don't need to keep all historical data online in the same database. In particular, reporting and ad hoc analysis should never be done in the production database....Data mining, reporting, or any other kind of analysis should be done in a true warehouse anyway. The OLTP (Online Transaction Processing) schema is no good for data warehousing." -Michael Nygard, Release It!
I am working on a weird incremental system replacement project at the moment. It's odd in that they are replacing the reporting, search, and view parts of the system, without replacing the forms. A lot of the design and architecture is optimized for those operations. We're all really quite worried about what happens when we have to edit some of this data- when it turns into an OLTP system. It's hard to update de-normalized data without all sorts of locking issues.
My current silly idea is to try to tell people that the schema and system that we are working on now is really just the reporting and search system. The "real" system replacement schema will be coming in the next increment. It sort of pains me, because the "genius" solution we came up with for incremental system replacement on my last project would have worked even better here- one screen at a time, one table at a time. Plenty of complexity, to be sure, but you can get new stuff out there tomorrow.
As it is, we are running a ridiculously high risk of creating shelfware at the moment. Sigh.
Sunday, April 13, 2008
Reading is FUNdamental
I didn't watch any of them. With A/V you are really making an investment in watching/listening. With text content, it's so easy to skim, skip the boring bits and figure out if you want to read something in depth. With A/V in the browser, you hit the big play button and then start to stream. You can't even jump ahead easily until you build up your buffer (okay, only takes a second, thanks FIOS!). I've watched and listened to things at high speed (current project indoctrination videos, this means you), and then, you're still listening to this thing. Podcasts are great for the car. Thirty second skip is great for skipping commercials, but I can move through text with just a few long saccades.
I guess it comes down to this. I can read faster than I can listen, with the added bonus that I can skim text even faster than I can skim audio content. Thus text is more efficient for communicating a lot of information. I won't dispute that A/V content is richer, and can create more of a connection, but I just don't have time for all of that. I have to write blog posts for other people to skim.
Tuesday, April 08, 2008
Google App Engine

I got my Google App Engine account. I haven't gotten too far yet, "Hello World", a few silly deployment things. I am curious to see how nice the web app framework is. I am really curious to mess around with BigTable and see if I can put it to any good kind of use.
So far I'd compare it to Heroku (Rails thing on AWS), but the web framework standard WSGI is interesting. People really enjoy programming in Rails- it's unlikely, but possible, that Google's framework will achieve that. Just about any normal sort of web app thing you want to do in Rails is already done for you- and often done in a way that is better than I'd be able to do it myself in a reasonable amount of time. [update- it looks like you can run Django on there, minus the relational stuff, of course]
I know Google hired Guido, but I think they would have hit a home run if they had used a Rails stack with this. Of course, the piece that is "missing" is the relational database. BigTable might be a decent replacement, but it would certainly require a serious adaptation of ActiveRecord.
I was looking around to see if anyone had done a knockout port of ActiveRecord to use SimpleDB, since that is somewhat similar to what we are doing on my own current web application project. We have stepped away from the RDBMS and are going with MarkLogic. It's great for reads, but I am really worried that doing updates is going to kill us.
Anyway, it's an interesting road. I'm going to hack some Python tonight and get my app deployed. Hope you're having fun too!
Monday, March 17, 2008
Release It!
Sample:
"Integration points are the number-one killer of systems. Every single one of those feeds presents a stability risk. Every socket, process, pipe, or remote procedure call can and will hang. Even database calls can hang, in ways obvious and subtle. Every feed into the system can hang it, crash it, or generate other impulses at the worst possible time. You’ll look at some of the specific ways these integration points can go bad and what you can do about them."
I wasn't really interested in the book until I heard the podcasts he did last year. A lot of sound advice in there...
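One obvious defense against a hanging call is to refuse to wait forever. Here is a minimal Python sketch of wrapping a call in a timeout. The helper name is mine and this is my own illustration, not code from the book.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(func, timeout_seconds, *args, default=None):
    """Run func(*args) in a worker thread; stop waiting after the timeout."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # The caller moves on, but the worker thread may still be stuck.
        return default
    finally:
        # Do not block here waiting for a call that might never return.
        executor.shutdown(wait=False)
```

This only protects the caller. The stuck thread is not killed, so if the remote end really hangs you still leak a thread per call, which is why timeouts set on the socket or driver itself are usually the better place to start.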
Monday, March 10, 2008
Java for Rails developers
Resist urge to yell: "Stop making everything so complicated!". Merely state it politely, as often as you can. Perhaps a recording would come in handy here.
Much of what Spring does with dependency injection is better handled by either reflection or simplifying code.
The DAO pattern makes no sense when compared with the majesty of ActiveRecord.
Oh no, they use XML for THAT!?!?!? How much XML? Oh no...
Wednesday, February 20, 2008
Why Maven Sucks (as compared to good old ant)

Maven is a build tool for Java. It suuuuuuuuuucks. I am not alone in thinking this.
Get out while you still can.
[Ship, 2007]
Any other community would be like "what the hell is this?"
[Rocher, 2008]
While you may think of the Maven 2 version you downloaded as building your code, it actually is just an engine that by itself, can't build squat. When you start Maven 2, it will try to download the latest and greatest of every core plugin, the bits that actually do the work. This means that even though two developers are using Maven 2.0.8, they could be using different plugin versions and therefore, the build is not consistent.
[Brown, 2008] (Maven patch submitter)
Maven suffers from a lack of flexibility and robustness
[Hanin, 2008]
It might cost you as much time to build your first Maven project as it does to LEARN Ant and build your first complex project.
[Carapetyan, 2008] (Maven evangelist)
Do they have a chapter called "Why do the repositories suck so much?" :)
[TSS Comment on free Maven book]
It introduces additional dependencies in your build process.
- Dependencies suck. The simple formula for operational availability is to multiply the availabilities of the interdependent systems. When you start with maven, there are a lot more things that have to be running- just to be able to compile some code. This is a measurable cost.
It introduces additional complexity in your build process.
- You are adding a lot of code to a project by adding Maven. You are adding dependent jars without necessarily knowing where they came from. If you build your own jar repository- it is a significant amount of work. If you don't, you are at the mercy of the Internet.
It is hard to debug when something is going wrong.
It took us a couple of days to get working. And then it immediately broke again. With completely different errors on three separate developer machines.
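To put a number on the availability formula mentioned above, here is a tiny Python sketch. The 99% figures are invented for illustration, not measurements of any real repository or network.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain where every dependency must be up at once."""
    return prod(availabilities)


# Hypothetical build chain: workstation, network, proxy, remote repository,
# plugin download site, each assumed to be up 99% of the time.
build_chain = [0.99, 0.99, 0.99, 0.99, 0.99]
chance_build_can_run = serial_availability(build_chain)  # 0.99 ** 5
```

Five dependencies at 99% each leave you with roughly a 95% chance that a build can run at any given moment, and every extra dependency pushes that number down.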
The side of evil will persistently keep trying to shoehorn maven into their projects and pretend they’re enjoying it. The side of good would rather gnaw off their own unmentionables than touch maven, and merrily keep getting things done faster, better, and lighter with ant (without paying Bruce Tate any denomination of any currency).
[Hani, 2004]
[Edit, 2009]
New graph of Maven adoption curve!
Go check out the above link for more...

Thursday, February 07, 2008
Business and IT, together again
− Robert McGill, Standard Life, Edinburgh, Scotland
From Mary Poppendieck's slides here. The point being that this is IT+Business as an extension of the XP "onsite developer" concept.
The most rewarding period of my career was like this: working as part of a mission team, creating software for them, as it was needed to get our job done. Then came the CIO- in the name of killing off "duplicate" efforts they came at us. We had been smart and extracted a product from our work and had been offering it to other divisions in return for help in funding the organization. So was another group. Instead of taking a market based approach and letting the two applications find their niche- they had to merge us.
Oh, and at the same time as you merge the two systems, take these complex desktop applications and turn them into a web application that is 10 times slower and has almost none of the capabilities of the other systems. And use web standards that prevent adding usability features. And do everything with this XML services stuff that seems to be the future. Cue abject failure.
And then it happened again. And again. Soon the CIO is populated by tons of PowerPoint pushers with agendas and favorite technologies. Development talent is promoted into positions where they don't...make things. Budget is growing. No one I work with has seen a user for a long time. Does any of this software ever reach users? All we hear is they hate it and don't want it...So why are we doing it?
I've had the fortune to work on lots of good projects, but I have to say the whole concept of sucking IT out of the business and making it its own shop that makes things for the business is not as effective as the opposite in 90% of cases I have seen. I do believe in core services of identity management, PKI, etc. But very very thin core services. I believe the CIO should have a budget to fund integrating core technologies and to sponsor the cost of implementing minimal standards across groups. But overall, it's a concept that doesn't make sense for most organizations.
According to Ms. Poppendieck, "You don’t get to be world class by chasing 'best practices'. You get there by inventing them." I don't care who invented it- let's just do it.
Monday, January 14, 2008
Yet another reason to raise my rates

Apparently, people drinking wines actually enjoy them more- based solely on the price.
The researchers discovered that people given two identical red wines to drink said they got much more pleasure from the one they were told had cost more. Brain scans confirmed that their pleasure centres were activated far more by the higher-priced wine.
I am not sure I want to activate customers' "pleasure centers", but if higher rates can lead to satisfaction- it's a win-win!! I can sorta see the logic too- the customer is thinking, "Wow, this guy seems slow, but he's the most expensive one we could afford, so I guess it must be how much consultants cost these days."
Then again, Seth Godin says we should be giving away stuff for free. Oh, maybe that's what this blog is! Even then, it's not worth the price.
Thursday, January 10, 2008
The Backbone of SOA is Standardization?!?

Hurry out and buy this report on SOA from Input. I read the insightful reformatting of it on Washington Technology. It is frankly amazing how this publication can continue to peddle such inane drivel. Apparently, SOA is about standardization now:
SOA calls for agencies to standardize their technologies, which could create a significant shift in the market, said Deniece Peterson, senior analyst with Reston, Va.-based Input’s Executive Program.
“Standardization is the backbone of SOA,” she said. “For providers who usually supply proprietary solutions, this will force them to find other ways to be best-of-breed. If they don’t, they’ll just become very easily replaced.” On the other hand, providers who understand SOA and can market their offerings accordingly stand to gain.
Well, at least it's not about alignment anymore. I am glad this genius is spreading her BS straight to the executives. Do they notice when everyone says SOA is about something different?
At least I understand how it works now. Years ago, some developer decided that applications should have APIs (and the running instances of those APIs are services) so the data+application could be used in a loosely joined confederation of multiple systems. And lots of other software architect types noticed and gave it a name and said it was good, while slipping in their own favorite protocol or application. So, before anyone really agreed on what it meant, the word was out that SOA was good. Since basically all it means is good, any fool can say whatever gibberish they want is SOA as a shorthand for saying it's good in some technical way you don't understand.
Googling "backbone of soa" is instructive. I found these:
"Backbone of SOA is Service"
"Web services are the backbone of SOA"
"ESBs have become universally accepted at the backbone of SOA"
"ESB is the backbone of SOA"
"we suggest that you seriously consider an en-masse deployment of AMQP as the backbone of SOA"
"Messaging is the backbone of SOA"
"XML is the backbone of SOA"
"With processes serving as the backbone of SOA-based composite applications"
"distributed computing, which is the backbone of SOA"
"It's very appealing to mainframe customers, and the System z, which is the latest mainframe, is fully enabled to be the backbone of SOA"
With all of these backbones, maybe we should switch the software design metaphor from architect to anatomist. I am personally in favor of REST+JSON as the middle finger of SOA.
Frequent readers of my blog may recall that the appeal of the content-free term is simple: people in the business appreciate being served by IT, as opposed to the unfortunate situations that always seem to occur when IT departments are given the job of enforcer of business rules via the replacement of efficient ad hoc systems with official electronic processes from which thou shall not deviate.
Are you being served?