
"Integration Tests Are A Scam" Is A Scam

I made this announcement at a TDD course in Bucuresti this week, and I wanted to make it here: I now fully and freely admit that when I use the term “integration tests” I confuse people unnecessarily, and so I will stop.

As a result, you will notice me change from “integration tests” to “integrated tests”, because I believe the latter term better fits the meaning I intend to convey as well as avoids confusion with what everyone else means by “integration tests”. I agree to reserve the term “integration tests” for tests that focus on checking the integration points between subsystems, systems, or any nontrivial client/supplier relationship. Integration tests might be integrated tests, and might be collaboration tests. Your choice.

I apologize for the confusion I created, and appreciate you for hanging in there while I refactor my considerable library of legacy articles. That will take time and I can’t make it my full-time job.

February 14, 2010 13:53 integrated tests are a scam

Using integration tests mindfully: a case study

Gus Power commented about the way he uses integration tests in his work.

Interesting series of articles & comments. I also read Steve Freeman’s article in response to the same topic. It’s got me thinking about how we work and I thought I’d take the time to describe it here.

You define an integration test as “… any test whose result (pass or fail) depends on the correctness of the implementation of more than one piece of non-trivial behavior.” We have many such components that exhibit such non-trivial behaviour in the products we create, many of which are not developed by us. And we have integration tests to verify they work. I’m not just talking about 3rd party libraries and frameworks here, I’m referring to the whole system: caching layers, load balancers, DNS servers, CDNs, virtualization etc. When we build software it only becomes a product or service for our users when it has been deployed into a suitable environment; an environment that typically contains more than just the software we have written and packaged. Since our users’ experience and perception of quality result from their interaction with a deployed instance of the whole system, not just their interaction with the software at a unit level, we have come to value end-to-end integration testing. I believe there’s merit in testing these components in symphony and will attempt to clarify what kind of integration testing I’m talking about.

For a given piece of functionality we write an executable acceptance test in human readable form (for web projects we typically use some domain-specific extensions to selenium, for services we have used FIT and its ilk, sometimes we roll our own if there’s nothing expressive enough available). We run it against a deployed version of the application (usually local though not always) which typically has a running web/application server and database. The test fails. We determine what endpoint needs to be created/enhanced and then we switch context down into unit-test land. A typical scenario would involve enhancing a unit test for the url mappings, adding one for the controller, then one for any additional service, domain object etc. When we’re happy and have tested and designed each of the required units we jump back up a level and get our acceptance test to progress further. The customer steers the development effort as he sees vertical ‘slices’ of functionality emerge. The acceptance test is added to a suite for that functional area. The continuous build system will then execute that test against a fully deployed (but scaled down) replica of the production environment, with hardware load balancer, vlans, multiple nodes (session affinity) and so forth. Any additional environmental monitoring (e.g. nagios alerting) is also done as part of this development effort and is deployed into the test environment along with the updated code.

Setting up the infrastructure to do this kind of testing takes investment, both initial and ongoing. The continuous build needs to be highly ‘parallelized’ so you get feedback from a checkin in 10 mins or less (we’re heavy users of virtualization, usually VMWare or OpenVZ). The individual acceptance test suites need to be kept small enough to run quickly before check-in.

Benefits of this approach
* The continuous context-switch between acceptance test and unit test is key to our staying focused on delivering what the customer actually wants.
* The customer has multiple feedback points that he can learn from and use to steer the development effort.
* It confirms that the whole system works together – networking, DNS, load balancing, automated deployment, session handling, database replication etc.
* We create additional ‘non-functional’ acceptance tests that automatically exercise other aspects of the system such as fail-over and recovery.
* Upgrades to parts of the system (switches, load balancers, web caches, library versions, database server versions etc.) can be tested in a known and controlled way.

We’ve caught a number of integration-related issues using this approach (a few examples: broken database failover due to missing primary keys, captcha validation not working due to a web cache not behaving correctly, data not persisting because one database server had the wrong locale) and stopped them before they have reached our users. We have used the feedback as a basis for improving our products and their delivery at a system level.

OK this reply has now become far too long :-/ It would of course be good to discuss this in person sometime :)

—Gus Power

Thanks for the substantial comment, Gus. For those who don’t know Gus, he is one of the joint recipients of the 2009 Gordon Pask Award for contribution to Agile practice. I invite you to follow his work and learn from his example. On to the substance of Gus’ comment.

Gus, it appears you do not use integration tests to check basic behavior exhaustively. While I try not to use integration tests to check basic behavior at all, I mostly hope to stop programmers from attempting to write exhaustive integration tests that check basic correctness conditions. I wrote in Not Just Slow: Integration Tests are a Vortex of Doom about the vicious cycle I see when teams rely on integration tests to check basic correctness. I encourage them to stop that particular insanity. I would hesitate to use integration tests as even smoke tests for basic correctness, but if I found myself in a situation where I needed to write such tests, I’d do it, then look for ways to render them obsolete.

Also, you mention writing “human-readable acceptance tests”, and I certainly use such tests in my work. When I counsel against using integration tests, I advise it within the context of programmer tests only. While I strongly encourage teams to allow even some of their acceptance tests to check policy or business rule behavior directly and in isolation, I understand and agree that one generally needs to write some acceptance tests as integration tests.

In general, you describe using integration tests quite purposefully, mindfully, and responsibly. I expect no less from a practitioner of your caliber. I would truly enjoy working with you on a project.

Finally, you mention that your integration tests catch system-level issues, such as a broken database schema, mistaken cache integration, and so on. I expect integration tests to find only, or at least mostly, these problems. None of these sound like basic correctness problems.

So Gus, I appreciate you for writing a great description of using integration tests well. I wish we had more examples like this. I truly wish I saw more examples like this. Sadly, I don’t: I see teams trying to check basic correctness issues with plodding, brittle, misleading tests. For those, I stress the need to eliminate integration tests.

January 31, 2010 08:00 testing, agile, design, integrated tests are a scam

Not Just Slow: Integration Tests are a Vortex of Doom

“Aha! So @jbrains is really against the integration tests just because they are too slow for hourly use”

It reminds me about the Ferrari IT story (XP team, dozens of deployments a year on many continents) that started from getting a big visible counter of a total number of tests and wrote just big amount of any tests first. You need to start somewhere and getting large integration tests is definitely better than nothing. As long as you are prepared to improve the testing practices later. —Artem Marchenko

I agree with this sentiment. I tell the story of my very first attempt at test-first programming[1], how I wrote about 125 tests, many of which fit my definition of “integration test”, and which took 12 minutes to execute. This meant that, on average, I only made 8-12 edits per hour when writing that code. I recognized then, and I still recognize now, that even making only 8-12 edits per hour (4-6 edits per hour towards the end), I produced better software than I did when I would write code almost continuously for several hours at a time. As much as I disparage those integration tests today, I appreciated them a great deal at the time I wrote them. I find integration tests useful for finding system-level problems, as the first step in fixing a mistake, and if I genuinely can’t write a focused object test, then I will usually write an integration test.

As you say, Artem, I simply don’t stop there.

When I label integration tests a scam, I mean to emphasize the self-replicating nature of integration tests. It starts simply enough: you write a handful of integration tests, which give you a lot of freedom to implement your design in a way that introduces unfortunate dependencies, which makes focused object testing quite difficult. As a result, you will probably resign yourself to writing more integration tests, which do nothing to improve your dependency problems, and the cycle begins again.

Integration tests help cause pain, even though they appear to help reduce pain. Therein lies the scam.

I must acknowledge this: if you started writing tests this week, or this month, or even this year, then you will probably benefit more from writing integration tests than trying to write perfectly focused object tests. I have said and written elsewhere that I believe a programmer needs to write about 1500 tests to burn into her brain the basic patterns of good tests. Even so, as you write those tests, I want you to remain aware of the cost. Even if you don’t know how to write a good, focused object test, if you want to write more such tests, and especially if you try to write more such tests, then I will have completed the first phase of my mission to eradicate programmer reliance on integration tests to show the basic correctness of their code.

Join us! Turn one integration test into a small suite of focused object tests today. If you don’t yet see how to replace an entire integration test with equivalent focused object tests, then write at least one or two focused object tests alongside the integration test. Try it. I promise, you’ll like it.

[1] I use the term test-first programming to refer to test-driven design without the evolutionary design part. With test-first programming, I develop a specific design, then I use tests to help me type it in correctly.

One last comment to my good friend Artem: please don’t put me to sleep with the word “just”!

Part 4: Surely we need integration tests for the Mars rover!

Recently, “Guest” commented about my Agile 2009 tutorial, Integration Tests Are A Scam. “Guest” wrote this:

A Mars rover mission failed because of a lack of integration tests. The parachute system was successfully tested. The system that detaches the parachute after the landing was successfully – but independently – tested. On Mars when the parachute successfully opened the deceleration “jerked” the lander, then the detachment system interpreted the jerking as a landing and successfully detached the parachute. Oops. Integration tests may be costly but they are absolutely necessary.

I don’t doubt the necessity of integration tests. I depend on them to solve difficult system-level problems. By contrast, I routinely see teams using them to detect unexpected consequences, and I don’t think we need them for that purpose. I prefer to use them to confirm an uneasy feeling that an unintended consequence lurks.

Let’s consider a clean implementation of the situation my commenter describes. I see this design, comprising the lander, the parachute, the detachment system, an accelerometer and an altimeter. A controller connects all these things together. Let’s look at the “code”, which I’ve written in a fantasy language that looks a little like Java/C# and a little like Ruby.

Ashley Moran has posted a working Ruby version of this example. If you speak Ruby, then I highly recommend looking at that example after you’ve read this.

Controller.initialize() {
  accelerometer = Accelerometer.new()
  lander = Lander.new(accelerometer, Altimeter.new())
  parachute = Parachute.new(lander)
  detachment_system = DetachmentSystem.new(parachute)
  accelerometer.add_observer(detachment_system)
}
          
Parachute {
  needs a lander
  
  open() {
    lander.decelerate()
  }
  
  detach() {
    if (lander.has_landed == false)
      raise "You broke the lander, idiot."
  }
}
                        
AccelerationObserver is a role {
  handle_acceleration_report(acceleration) {
    raise "Subclass responsibility"
  }
}
                        
DetachmentSystem acts as AccelerationObserver {
  needs a parachute
  
  handle_acceleration_report(acceleration) {
    if (acceleration <= -50.ms2) {
      parachute.detach()
    }
  }
}
 
Accelerometer acts as Observable {
  manages many acceleration_observers
                                    
  report_acceleration(acceleration) {
    acceleration_observers.each() {
      each.handle_acceleration_report(acceleration)
    }
  }
}
 
Lander {
  needs an accelerometer
  needs an altimeter
  
  decelerate() {
    // I know how much to decelerate by
    accelerometer.report_acceleration(how_much)
  }
}
 

I need to test what happens when I open the parachute. The lander should decelerate.

testOpenParachute() {
  parachute = Parachute.new(lander = mock(Lander))
  lander.expects().decelerate()
  
  parachute.open()
}
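
For readers who want to run this step, here is a minimal sketch of the same collaboration test in Python. It assumes `unittest.mock` as a stand-in for the fantasy language's `mock()` and `expects()`, and it checks the expectation after the call, which is how that library works. The `Parachute` class is only my Python rendering of the pseudocode above.

```python
from unittest.mock import Mock


class Parachute:
    """Python rendering of the article's Parachute; it needs a lander."""

    def __init__(self, lander):
        self.lander = lander

    def open(self):
        # Opening the parachute slows the lander down.
        self.lander.decelerate()


def test_open_parachute():
    # The mock stands in for Lander and only knows these two methods.
    lander = Mock(spec=["decelerate", "has_landed"])
    parachute = Parachute(lander)

    parachute.open()

    # The expectation from the pseudocode, verified after the fact.
    lander.decelerate.assert_called_once_with()


test_open_parachute()
```

The test says nothing about what decelerating does. It only records that the parachute relies on the lander being able to decelerate, which is exactly the assumption the next test has to justify.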
 

Since this test expects the lander to decelerate, I have to test that. When the lander decelerates, the accelerometer should report its deceleration.

testLanderDecelerates() {
  accelerometer = mock(Accelerometer)
  lander = Lander.new(accelerometer)
  accelerometer.expects().report_acceleration(-50.ms2)
  
  lander.decelerate()
}
 

Since this test shows that the accelerometer can report acceleration of -50 m/s2, I have to test that.

testAccelerometerCanReportRapidAcceleration() {
  accelerometer = Accelerometer.new()
  accelerometer.add_observer(observer = mock(AccelerationObserver))
  observer.expects().handle_acceleration_report(-50.ms2)
  
  accelerometer.report_acceleration(-50.ms2)
}
 

Since this test shows that any acceleration observer must be prepared to handle an acceleration report of -50 m/s2, I have to test that.

First, the general test for the contract of the interface:

AccelerationObserverTest {
  testAccelerationObserverCanHandleRapidAcceleration() {
    observer = create_acceleration_observer() // subclass responsibility
    this_block {
      observer.handle_acceleration_report(-50.ms2)
    }.should execute_without_incident
  }
}
 

Now the test for DetachmentSystem, which acts as an AccelerationObserver. What should it do if it detects such sudden deceleration? It should detach the parachute.

DetachmentSystemTest extends AccelerationObserverTest {
  // I inherit testAccelerationObserverCanHandleRapidAcceleration()
  
  create_acceleration_observer() {
    detachment_system = DetachmentSystem.new(parachute = mock(Parachute))
    parachute.expects().detach()
    return detachment_system
  }
}
 

You might find that easier to read this way, by inlining the method create_acceleration_observer():

DetachmentSystemTest {
  testRespondsToRapidAcceleration() {
    detachment_system = DetachmentSystem.new(parachute = mock(Parachute))
    parachute.expects().detach()
    this_block {
      detachment_system.handle_acceleration_report(-50.ms2)
    }.should execute_without_incident
  }
}
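
The contract test and its DetachmentSystem specialization can be sketched in Python roughly as follows. This assumes the standard `unittest` module, with the contract written as a mixin so that the test loader does not try to run it on its own; that idiom is my choice, not something the pseudocode prescribes. Accelerations are plain floats in m/s², and `RAPID_DECELERATION` is a name I made up for the article's -50 m/s2.

```python
import unittest
from unittest.mock import Mock

RAPID_DECELERATION = -50.0  # m/s^2, the value used throughout the article


class DetachmentSystem:
    """The article's first version: detach on any rapid deceleration."""

    def __init__(self, parachute):
        self.parachute = parachute

    def handle_acceleration_report(self, acceleration):
        if acceleration <= RAPID_DECELERATION:
            self.parachute.detach()


class AccelerationObserverContract:
    """Contract tests shared by every AccelerationObserver implementation."""

    def create_acceleration_observer(self):
        raise NotImplementedError("Subclass responsibility")

    def test_can_handle_rapid_acceleration(self):
        observer = self.create_acceleration_observer()
        # Passes as long as no exception escapes.
        observer.handle_acceleration_report(RAPID_DECELERATION)


class DetachmentSystemTest(AccelerationObserverContract, unittest.TestCase):
    def create_acceleration_observer(self):
        self.parachute = Mock(spec=["detach"])
        return DetachmentSystem(self.parachute)

    def test_detaches_parachute_on_rapid_acceleration(self):
        observer = self.create_acceleration_observer()
        observer.handle_acceleration_report(RAPID_DECELERATION)
        self.parachute.detach.assert_called_once_with()
```

Running `python -m unittest` against a file containing this sketch should execute two tests for `DetachmentSystemTest`: the inherited contract test and the implementation-specific one. Any other observer would get the contract test for free by supplying its own factory method.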
 

Since this test expects the parachute to be able to detach, I have to test that. Now, detaching only works if we’ve landed. (I’ve simplified on purpose. Suppose the parachute can’t survive a drop from any height. It’s easy to add that detail in later.)

ParachuteTest {
  testDetachingWhileLanded() {
    parachute = Parachute.new(lander = mock(Lander))
    lander.stubs().has_landed().to_return(true)
    this_block {
      parachute.detach()
    }.should execute_without_incident
  }
  
  testDetachingWhileNotLanded() {
    parachute = Parachute.new(lander = mock(Lander))
    lander.stubs().has_landed().to_return(false)
    this_block {
      parachute.detach()
    }.should raise("You broke the lander, idiot.")
  }
}
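
Here is a hedged Python sketch of those two parachute tests, again assuming `unittest.mock`. The stubbed `has_landed` is set with `return_value` and never verified, since it is a stub and not an expectation. `LanderBroken` and `make_parachute` are hypothetical names of mine; the pseudocode raises a bare message.

```python
from unittest.mock import Mock


class LanderBroken(Exception):
    """Hypothetical exception type; the article raises a plain message."""


class Parachute:
    def __init__(self, lander):
        self.lander = lander

    def detach(self):
        if not self.lander.has_landed():
            raise LanderBroken("You broke the lander, idiot.")


def make_parachute(landed):
    lander = Mock(spec=["has_landed", "decelerate"])
    lander.has_landed.return_value = landed  # stub, not an expectation
    return Parachute(lander)


def test_detaching_while_landed():
    make_parachute(landed=True).detach()  # must not raise


def test_detaching_while_not_landed():
    try:
        make_parachute(landed=False).detach()
    except LanderBroken:
        return
    raise AssertionError("expected detach() to fail before landing")


test_detaching_while_landed()
test_detaching_while_not_landed()
```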
 

Hm. I notice that parachute.detach() might fail. But I just wrote a test that uses parachute.detach() and doesn’t yet show how it handles that method failing. I have to test that.

DetachmentSystemTest {
  testRespondsToDetachFailing() {
    detachment_system = DetachmentSystem.new(parachute = mock(Parachute))
    parachute.stubs().detach().to_raise(AnyException)
 
    this_block {
      detachment_system.handle_acceleration_report(-50.ms2)
    }.should raise(AnyException)
  }
}
 

Hm. So handling an acceleration report of -50 m/s2 can fail. Who might issue such a report? The accelerometer. Since the detachment system doesn’t handle this failure, I have to test what the accelerometer does when issuing an acceleration report might fail.

testAccelerometerCanRespondToFailureWhenReportingAcceleration() {
  accelerometer = Accelerometer.new()
  accelerometer.add_observer(observer = mock(AccelerationObserver))
  observer.stubs().handle_acceleration_report().to_raise(AnyException)
 
  this_block {
    accelerometer.report_acceleration(-50.ms2)
  }.should raise(AnyException)
}
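
The failure-propagation tests all follow one pattern, sketched here in Python for the accelerometer. It assumes `unittest.mock`, where `side_effect` makes a stubbed method raise. `ObserverFailed` is a hypothetical stand-in for the pseudocode's `AnyException`.

```python
from unittest.mock import Mock


class Accelerometer:
    def __init__(self):
        self.observers = []

    def add_observer(self, observer):
        self.observers.append(observer)

    def report_acceleration(self, acceleration):
        for observer in self.observers:
            # No try/except here: an observer's failure reaches the caller.
            observer.handle_acceleration_report(acceleration)


class ObserverFailed(Exception):
    """Hypothetical stand-in for the article's AnyException."""


def test_accelerometer_propagates_observer_failure():
    observer = Mock(spec=["handle_acceleration_report"])
    observer.handle_acceleration_report.side_effect = ObserverFailed()
    accelerometer = Accelerometer()
    accelerometer.add_observer(observer)

    try:
        accelerometer.report_acceleration(-50.0)
    except ObserverFailed:
        return
    raise AssertionError("expected the observer's failure to propagate")


test_accelerometer_propagates_observer_failure()
```

Each such test documents a decision: this layer does not handle the failure, so whoever calls it inherits the question.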
 

It turns out that the accelerometer might fail when reporting acceleration of -50 m/s2. When might it do that? When the lander decelerates. What happens then?

testLanderDeceleratesRespondsToFailure() {
  accelerometer = mock(Accelerometer)
  lander = Lander.new(accelerometer)
  accelerometer.stubs().report_acceleration().to_raise(AnyException)
 
  this_block {
    lander.decelerate()
  }.should raise(AnyException)
}
 

Hm. So decelerating could fail! All right, who causes the lander to decelerate? That code might fail. Oh yes… the parachute opening!

testOpenParachuteRespondsToFailure() {
  parachute = Parachute.new(lander = mock(Lander))
  lander.stubs().decelerate().to_raise(AnyException)
  
  this_block {
    parachute.open()
  }.should raise(AnyException)
}
 

So opening the parachute could fail! We probably want to nail down when that happens. We have a test that shows us when:

testDetachingWhileNotLanded() {
  parachute = Parachute.new(lander = mock(Lander))
  lander.stubs().has_landed().to_return(false)
  this_block {
    parachute.detach()
  }.should raise("You broke the lander, idiot.")
}
 

So the parachute opening could cause it to detach because the lander hasn’t landed yet. I don’t know about you, but I think the parachute provides the most value when it helps the lander land, and not once it has landed. That tells me that someone, somewhere needs to handle the exception that detach() would raise, or at least prevent detach() from happening while the altimeter reads above a few meters off the ground.

testDoNotDetachWhenTheLanderIsTooHighUp() {
  altimeter = mock(Altimeter)
  altimeter.stubs().altitude().to_return(5.m)
  
  detachment_system = DetachmentSystem.new(parachute = mock(Parachute))
  parachute.expects(no_invocations_of).detach()
  
  detachment_system.handle_acceleration_report(-50.ms2)
  
  // ???
}
 

In writing this test, I see that in order to stop the detachment system from telling the parachute to detach, it needs access to the altimeter.

Integration problem detected. When I wire the detachment system up to the altimeter, even the collaboration test shows how to ensure that the parachute doesn’t detach in this kind of dangerous situation.

testDoNotDetachWhenTheLanderIsTooHighUp() {
  detachment_system = DetachmentSystem.new(parachute = mock(Parachute), altimeter = mock(Altimeter))
  altimeter.stubs().altitude().to_return(5.m)
  parachute.expects(no_invocations_of).detach()
  
  detachment_system.handle_acceleration_report(-50.ms2)
}
 

This means I have to add the following production behavior.

DetachmentSystem acts as AccelerationObserver {
  needs a parachute
  needs an altimeter // NEW!
  
  handle_acceleration_report(acceleration) {
    if (acceleration <= -50.ms2 and altimeter.altitude() < 5.m) {
      parachute.detach()
    }
  }
}
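
A runnable Python sketch of this revised detachment system, with its collaboration tests, might look like the following. It assumes `unittest.mock`, plain floats for metres and m/s², and the same strict "below 5 m" comparison as the pseudocode. The second test, for detaching close to the ground, is my addition and does not appear in the article.

```python
from unittest.mock import Mock

RAPID_DECELERATION = -50.0   # m/s^2
SAFE_DETACH_ALTITUDE = 5.0   # m; detach only strictly below this


class DetachmentSystem:
    def __init__(self, parachute, altimeter):
        self.parachute = parachute
        self.altimeter = altimeter

    def handle_acceleration_report(self, acceleration):
        jolted = acceleration <= RAPID_DECELERATION
        near_ground = self.altimeter.altitude() < SAFE_DETACH_ALTITUDE
        if jolted and near_ground:
            self.parachute.detach()


def detachment_system_at(altitude):
    """Hypothetical helper: a system whose altimeter is stubbed."""
    parachute = Mock(spec=["detach"])
    altimeter = Mock(spec=["altitude"])
    altimeter.altitude.return_value = altitude
    return DetachmentSystem(parachute, altimeter), parachute


def test_does_not_detach_when_too_high_up():
    system, parachute = detachment_system_at(5.0)
    system.handle_acceleration_report(RAPID_DECELERATION)
    parachute.detach.assert_not_called()


def test_detaches_close_to_the_ground():
    system, parachute = detachment_system_at(0.5)
    system.handle_acceleration_report(RAPID_DECELERATION)
    parachute.detach.assert_called_once_with()


test_does_not_detach_when_too_high_up()
test_detaches_close_to_the_ground()
```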
 

Integration problem solved with no integration tests. Instead, I have a bunch of collaboration tests, one important contract test, and, in place of what I first called “the ability to notice things”, a systematic approach to choosing the next test, which I describe in the comments below. Any questions?

Dan Fabulich rightly jumped on me for using the phrase “an ability to notice things” just a little earlier in this article. I chose that phrase lazily because I didn’t want to patronize you by writing, “an ability to perform basic reasoning”. Oops. I thought about how I choose the next test, and I decided to take the time to include that here. Enjoy.

In this example, I used no magic to choose the next test; but rather some fundamental reasoning.

Every time I say “I need a thing to do X” I introduce an interface. In my current test, I end up stubbing or mocking one of those interfaces.

(See A sign you’re mocking too much for more about when I avoid interfaces and when I routinely create them.)

Every time I stub a method, I make an assumption about what values that method can return. To check that assumption, I have to write a test that expects the return value I’ve just stubbed. I use only basic logic there: if A depends on B returning x, then I have to know that B can return x, so I have to write a test for that.
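
To make the stubbing rule concrete: the parachute tests stubbed `has_landed()` to return both true and false, so the real lander needs tests showing it can give both answers. The Python sketch below assumes `unittest.mock`. The rule that the lander has landed when the altimeter reads zero or less is purely hypothetical, because the article never shows how `has_landed` is computed.

```python
from unittest.mock import Mock


class Lander:
    """Hypothetical: the article never shows how has_landed is computed."""

    def __init__(self, altimeter):
        self.altimeter = altimeter

    def has_landed(self):
        return self.altimeter.altitude() <= 0.0


def lander_at(altitude):
    altimeter = Mock(spec=["altitude"])
    altimeter.altitude.return_value = altitude
    return Lander(altimeter)


# The parachute tests stubbed has_landed() to return True and False,
# so both answers need a test showing the real Lander can produce them.
def test_lander_can_report_landed():
    assert lander_at(0.0).has_landed() is True


def test_lander_can_report_not_landed():
    assert lander_at(120.0).has_landed() is False


test_lander_can_report_landed()
test_lander_can_report_not_landed()
```

Notice that these tests stub `altitude()` in turn, which by the same rule obliges tests showing that the altimeter can report those readings.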

Every time I mock a method, I make an assumption about a service the interface provides. To check that assumption, I have to write a test that tries to invoke that method with the parameters I just expected. Again, I use only basic logic there: if A causes B to invoke c(d, e, f) then I have to know that I’ve tested what happens when B invokes c(d, e, f), so I have to write a test for that.
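
The mocking rule pairs up tests in the same way. The parachute test expected `lander.decelerate()` with no arguments, so some test has to invoke the real method exactly like that. A minimal Python sketch follows, assuming `unittest.mock` and a fixed deceleration of -50 m/s² as in the pseudocode tests; `DECELERATION_ON_OPEN` is a name I invented for illustration.

```python
from unittest.mock import Mock

DECELERATION_ON_OPEN = -50.0  # m/s^2; assumed fixed, as in the pseudocode tests


class Lander:
    def __init__(self, accelerometer):
        self.accelerometer = accelerometer

    def decelerate(self):
        self.accelerometer.report_acceleration(DECELERATION_ON_OPEN)


def test_lander_decelerates():
    # The parachute test expected lander.decelerate(); here the real
    # method is invoked with exactly those (empty) arguments.
    accelerometer = Mock(spec=["report_acceleration"])
    Lander(accelerometer).decelerate()
    # This new expectation obliges a further test of Accelerometer.
    accelerometer.report_acceleration.assert_called_once_with(
        DECELERATION_ON_OPEN
    )


test_lander_decelerates()
```

Following each new expectation to the test it demands is what walks the chain from the parachute to the accelerometer and on to its observers.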

Every time I introduce a method on an interface, I make a decision about its behavior, which forms the contract of that method. To justify that decision, I have to write tests that help me implement that behavior correctly whenever I implement that interface. I write contract tests for that. Once again, I use only basic logic there: if A claims to be able to do c(d, e, f) with outcomes x, y, and z, then when B implements A, it must be able to do c(d, e, f) with outcomes x, y, and z (and possibly other non-destructive outcomes).

I simply kept applying these points over and over again until I stopped needing tests. Along the way, I found a problem and fixed it before it left my hands.

If I can describe the steps well enough for others to follow – and I posit I’ve just done that here – then I don’t agree to labeling it “magic”.

Interlude: The Tutorial

I presented Integration Tests Are A Scam as a tutorial at Agile 2009. Since I haven’t managed to summarize more of my ideas on the subject, let me lean on Gabino Roche, Jr. to do that for me. He refers to my earlier work describing contract tests.

I hope this helps those of you who’ve anticipated my writing more on the topic.

September 02, 2009 08:00 integrated tests are a scam