Back around the year two-thousand, shortly after the world failed to end on schedule, I worked as a sysadmin for a company now known only as “LexisNexis”. My job consisted of building out automated deployment and provisioning systems, all of which were written in Perl, and some web-based frontends to those tools, most of which were written in PHP.
Even in those days, when we had to walk uphill – both ways – just to get to our compilers, web development was a lot of fun, and I wanted to learn more. Since the company I worked for sold a large webapp built using Serious Enterprise Tools, I figured that if I wanted to get into Serious Web Development, I should probably learn Java.
So I signed up for an “Introduction to Object-Oriented Programming in Java” course at my local community college.
They might as well have taught me to play the banjo.
If you’ve ever taken a class on Object-Oriented Programming (OOP), or read through a book that explains the fundamental concepts, you probably saw an example – somewhere in the chapter where they explain what objects are – that looked a bit like this:
class Animal # methods shared by all animals end class Pig < Animal # pig-specific code end class Dog < Animal # dog-specific code end
There’s a mountain of weapons-grade badness here, but we’re going to focus
on one specific concept: the idea that objects are things. Maybe not
a Pig
or a Dog
, but certainly a User
, a Shipment
or a Product
.
It’s such a sneaky idea. So subtle. I didn’t even realize this how I was thinking of objects, until years later, when I received a much-needed smack-in-the-head by a talented coworker. What makes it super awful is that things work just fine… at first.
To see what I mean, we’re going to need to look at some real-world code.
Let’s make a mess.
Many moons ago, I was tasked with adding search to an application for a client. This isn’t the exact code I wrote for them, of course, but it’s a good idea of my general approach at the time. I’m also hugely simplifying the code, so that we can focus on the insidious – rather than the obvious – badness.
This particular client sold things on the internet, and the application was their online store, little more than a catalog with a shopping cart. The client wanted to add search to make products more discoverable, which is simple enough.
Since Product
is an object, it should know how to search
for itself, so
let’s add a static method to the class:
class Product < ActiveRecord::Base # Many, many other codes live up here. def self.search(query) keywords = split_query_into_keywords(query) build_finder_from_keywords(keywords) end def self.split_query_into_keywords(query) query.split(/\s+/) end def self.build_finder_from_keywords(keywords) keywords.inject(Product) do |model, keyword| model.where("description LIKE ?", keyword) end end end
Even if you’re not that into Rails, it should be easy to work out what’s
going on here. We’ve added a search
method to the Product
class, which
splits a user-supplied query apart by spaces, then searches for products
that match all terms in the description.
The next step is to try and make our search a little more relevant, by sorting the resulting product list by popularity. This isn’t so bad; we can just add another method to handle ordering:
def self.search(query) keywords = split_query_into_keywords(query) finder = build_finder_from_keywords(keywords) order_results_by_popularity(finder) end def self.order_results_by_popularity(finder) # Big SQL goes here. end
Still not too bad. Now, customers are price sensitive, so the client wanted
customers to be able to search by price as well as by the description. So if
the customer typed in something like $30
, then we should find products in
that price range:
def self.build_finder_from_keywords(keywords) keywords.inject(Product) do |model, keyword| result = maybe_finder_from_price(model, keyword) result || maybe_finder_from_keyword(model, keyword) end end def self.maybe_finder_from_price(model, keyword) return unless (match = keyword.match(/^\$(\d+)/)) price = match[1].to_i lower = (price * 0.90) upper = (price * 1.25) model.where("(price > ? AND price < ?)", lower, upper) end def self.maybe_finder_from_keyword(model, keyword) model.where("description LIKE ?", keyword) end
Things are starting to get a little uglier. It’s all contained within
Product
, but at every step, we’re adding more and more methods that do
nothing but hang off of Product.search
, and this is in a
simplified-for-a-blog-post example. The real code was as appealing as a a
pork milkshake.
The next thing my client wanted was suggestions. Given a user’s purchase history, recommend items that other people with similar tastes have purchased, that are in the same price range as what the user is searching for.
Now we’ve got a problem. The database queries aren’t that bad, but now
Product#search
needs to know about the user performing the search, and we
need to pass back more than just an array of results… where do we even put
things? Maybe just return a hash?
def self.search(query, user) keywords = split_query_into_keywords(query) finder = build_finder_from_keywords(keywords) ordered_results = order_results_by_recent_popularity(finder) suggestions = find_friend_suggestions(ordered_results, user) { results: ordered_results, suggestions: suggestions } end def self.find_friend_suggestions(products, user) # More SQL goes here. end
Of course, this breaks all of the code that was expecting Product.search
to return an array, so we’ll need to update that. And next in the pipeline
is saving which queries led to products being purchased…
Things are officially ugly.
Product
has a bunch of new methods, all of which get called from
Product#search
. We’ve got variables getting passed around everywhere, and
have no place to add out-of-band results to the search query.
Not to mention, we can’t store that user
or query
anywhere, because
we’re in the Product
class, and this is a multithreaded app, so if anybody
else runs a search at the same time, we’ve got an nondeterministic bug with
undefined behavior (those are buckets of fun to track down).
Every step was small and sensible, and yet I had neatly managed to wedge my wooden footwear deep inside the machine I was trying to build.
One object to rule them all.
By thinking of the Product
as the bucket that should hold all
product-related methods, I had accidentally created a God
Object, and not one of the cool
gods like Thor or Dionysus.
Cthulhu had come to my land of milk and honey.
Starting out, this felt like a very natural approach. Objects know everything about themselves. Rather than telling each object how to do something, you’re supposed to ask the object to do things for you – which is what I had been doing.
But this approach leads to a violation of the conjunction rule we touched on
when we talked about writing readable
code. To describe all of what Product
does, we need large assortment of “ands”, a choice selection of “ors” and
more than a few “buts” – and probably lots of profanity as well.
So it seems that we need to split things apart, but how?
You must construct additional pylons.
Let’s take a step back.
We write software to solve problems. This process involves taking what we know about the problem, figuring out what sort of solution would be acceptable, and then figuring out the bits in between those two things.
Those in-between bits usually involve turning data into other, different data. Our programs choreograph the process of turning web requests into data structures into credit-card transactions into boxes sent to peoples' houses on their birthdays – and in return, get a paycheck and an assortment of bagels on Thursdays.
To my mind, this sounds like a story – the epic tale of Alice Places An Order For Artisan Coffee. And stories are told by actors, which as it turns out, is a much better metaphor for how objects behave:
Each actor has a well-defined role: the hero, the villain, the plucky assistant, and so on.
When an actor does something out-of-character, the audience notices. We might not be able to predict the entire plot, but we have a good sense of how the character should behave.
Every action moves the plot forward.
Who knew that thespians had such a solid grasp of software engineering!
When I stopped thinking of objects as things, and started thinking of them as actors, something magical happened inside my brain.
Make a mess, then clean it up.
Let’s go back to our problem and try breaking things into a more sensible
set of objects. At the moment, we’ve got two roles in one body: the Holder
of Product-Knowledge and the Searcher of Things. We can start by moving
search out of Product
:
class Product < ActiveRecord::Base def search(query, user) ProductSearch.new(query, user).results end end
…and into a separate ProductSearch
class:
class ProductSearch attr_reader :query, :user def initialize(query, user) @query = query @user = user end def results keywords = split_query_into_keywords(query) finder = build_finder_from_keywords(keywords) order_results_by_popularity(finder) end private # Followed by all the search methods from Product. end
Leaving Product.search
where it is is intentional. Other code depends on
search
being where it is, and we don’t want to have to change any
more than is really necessary. This is a the essence of refactoring – the
code is cleaner, but we’re not adding any new features.
Now it’s time to tackle the problem we got stuck on before: adding suggestions to the search results. This creates a new role in our code: the Suggester of Products:
class ProductSuggester def initialize(finder, user) @finder = finder @user = user end def results # Many deep SQL magics. end # Other methods that also do magic. end
Which we then need to tie into the existing code. Right now, results are
returned as a plain Array
, which is no longer versatile enough to handle
the role of Holder of Results.
Since we don’t want to rewrite everything that relied on search
returning
an array, we can craft a replacement that respects the Array
interface, on
to which we can hang a set of product suggestions:
class ProductSearchResult include Enumerable attr_reader :products, :suggestions def initialize(products, suggestions) @products = products @suggestions = suggestions end def each(&block) products.each(&block) end end
Now that there’s a place to put those product suggestions, we can tie the final pieces together:
class ProductSearch def results keywords = split_query_into_keywords(query) finder = build_finder_from_keywords(keywords) products = order_results_by_popularity(finder) suggestions = ProductSuggester.new(finder, user).results results = ProductSearchResult.new(products, suggestions) end end
Not only is each piece small enough to understand, but we now have a clear path for adding new functionality going forward – and we didn’t even need to make drastic changes to the existing codebase.
A Natural Consequence of Extreme Sanity.
Even though we started with (my) tangled mess, it wasn’t that nasty to fix, because we rethought the code first, rather than just scrapping everything and starting over from scratch.
I’ve led my share of rewrites, and experience tells me that even if you rebuild everything, if you don’t change how you code, then the end result will probably be a worse mess than what you started with.
For me, this approach – thinking of objects as a set of actors working together to solve the problem – has been not inly incredibly enlightening, but a key part of learning to build software that can scale.
It’s a pity it never came up in any of my programming classes. Clever algorithms and data structures are certainly useful, but I would rather have learned first how to not bury myself – not to mention the rest of my team - under a pile of technical debt.
Maybe I should have studied theater instead…