
Using integration tests mindfully: a case study

Gus Power commented about the way he uses integration tests in his work.

Interesting series of articles & comments. I also read Steve Freeman’s article in response to the same topic. It’s got me thinking about how we work and I thought I’d take the time to describe it here.

You define an integration test as “… any test whose result (pass or fail) depends on the correctness of the implementation of more than one piece of non-trivial behavior.” We have many such components that exhibit such non-trivial behaviour in the products we create, many of which are not developed by us. And we have integration tests to verify they work. I’m not just talking about 3rd party libraries and frameworks here, I’m referring to the whole system: caching layers, load balancers, DNS servers, CDNs, virtualization etc. When we build software it only becomes a product or service for our users when it has been deployed into a suitable environment; an environment that typically contains more than just the software we have written and packaged. Since our users’ experience and perception of quality result from their interaction with a deployed instance of the whole system, not just their interaction with the software at a unit level, we have come to value end-to-end integration testing. I believe there’s merit in testing these components in symphony and will attempt to clarify what kind of integration testing I’m talking about.

For a given piece of functionality we write an executable acceptance test in human readable form (for web projects we typically use some domain-specific extensions to Selenium, for services we have used FIT and its ilk, sometimes we roll our own if there’s nothing expressive enough available). We run it against a deployed version of the application (usually local though not always) which typically has a running web/application server and database. The test fails. We determine what endpoint needs to be created/enhanced and then we switch context down into unit-test land. A typical scenario would involve enhancing a unit test for the url mappings, adding one for the controller, then one for any additional service, domain object etc. When we’re happy and have tested and designed each of the required units we jump back up a level and get our acceptance test to progress further. The customer steers the development effort as he sees vertical ‘slices’ of functionality emerge. The acceptance test is added to a suite for that functional area. The continuous build system will then execute that test against a fully deployed (but scaled down) replica of the production environment, with hardware load balancer, VLANs, multiple nodes (session affinity) and so forth. Any additional environmental monitoring (e.g. Nagios alerting) is also done as part of this development effort and is deployed into the test environment along with the updated code.

Setting up the infrastructure to do this kind of testing takes investment, both initial and ongoing. The continuous build needs to be highly ‘parallelized’ so you get feedback from a checkin in 10 mins or less (we’re heavy users of virtualization, usually VMWare or OpenVZ). The individual acceptance test suites need to be kept small enough to run quickly before check-in.

Benefits of this approach
* The continuous context-switch between acceptance test and unit test is key to our staying focused on delivering what the customer actually wants.
* The customer has multiple feedback points that he can learn from and use to steer the development effort.
* It confirms that the whole system works together – networking, DNS, load balancing, automated deployment, session handling, database replication etc.
* We create additional ‘non-functional’ acceptance tests that automatically exercise other aspects of the system such as fail-over and recovery.
* Upgrades to parts of the system (switches, load balancers, web caches, library versions, database server versions etc.) can be tested in a known and controlled way.

We’ve caught a number of integration-related issues using this approach (a few examples: broken database failover due to missing primary keys, captcha validation not working due to a web cache not behaving correctly, data not persisting because one database server had the wrong locale) and stopped them before they have reached our users. We have used the feedback as a basis for improving our products and their delivery at a system level.

OK this reply has now become far too long :-/ It would of course be good to discuss this in person sometime :)

—Gus Power

Thanks for the substantial comment, Gus. For those who don’t know Gus, he is one of the joint recipients of the 2009 Gordon Pask Award for contribution to Agile practice. I invite you to follow his work and learn from his example. On to the substance of Gus’ comment.

Gus, it appears you do not use integration tests to check basic behavior exhaustively. While I try not to use integration tests to check basic behavior at all, I mostly hope to stop programmers from attempting to write exhaustive integration tests that check basic correctness conditions. I wrote in Not Just Slow: Integration Tests are a Vortex of Doom about the vicious cycle I see when teams rely on integration tests to check basic correctness. I encourage them to stop that particular insanity. I would hesitate to use integration tests as even smoke tests for basic correctness, but if I found myself in a situation where I needed to write such tests, I’d do it, then look for ways to render them obsolete.

Also, you mention writing “human-readable acceptance tests”, and I certainly use such tests in my work. When I counsel against using integration tests, I advise it within the context of programmer tests only. While I strongly encourage teams to allow even some of their acceptance tests to check policy or business rule behavior directly and in isolation, I understand and agree that one generally needs to write some acceptance tests as integration tests.

In general, you describe using integration tests quite purposefully, mindfully, and responsibly. I expect no less from a practitioner of your caliber. I would truly enjoy working with you on a project.

Finally, you mention that your integration tests catch system-level issues, such as a broken database schema, mistaken cache integration, and so on. I expect integration tests to find only, or at least mostly, these problems. None of these sound like basic correctness problems.

So Gus, I appreciate you for writing a great description of using integration tests well. I truly wish I saw more examples like this. Sadly, I don’t: I see teams trying to check basic correctness issues with plodding, brittle, misleading tests. For those, I stress the need to eliminate integration tests.

January 31, 2010 08:00 testing, agile, design, integrated tests are a scam

Not Just Slow: Integration Tests are a Vortex of Doom

“Aha! So @jbrains is really against the integration tests just because they are too slow for hourly use”

It reminds me about the Ferrari IT story (XP team, dozens of deployments a year on many continents) that started from getting a big visible counter of a total number of tests and wrote just big amount of any tests first. You need to start somewhere and getting large integration tests is definitely better than nothing. As long as you are prepared to improve the testing practices later. —Artem Marchenko

I agree with this sentiment. I tell the story of my very first attempt at test-first programming1, how I wrote about 125 tests, many of which fit my definition of “integration test”, and which took 12 minutes to execute. This meant that, on average, I only made 8-12 edits per hour when writing that code. I recognized then, and I still recognize now, that even making only 8-12 edits per hour (4-6 edits per hour towards the end), I produced better software than I did when I would write code almost continuously for several hours at a time. As much as I disparage those integration tests today, I appreciated them a great deal at the time I wrote them. I find integration tests useful for finding system-level problems, as the first step in fixing a mistake, and if I genuinely can’t write a focused object test, then I will usually write an integration test.

As you say, Artem, I simply don’t stop there.

When I label integration tests a scam, I mean to emphasize the self-replicating nature of integration tests. It starts simply enough: you write a handful of integration tests, which give you a lot of freedom to implement your design in a way that introduces unfortunate dependencies, which makes focused object testing quite difficult. As a result, you will probably resign yourself to writing more integration tests, which do nothing to improve your dependency problems, and the cycle begins again.

Integration tests help cause pain, even though they appear to help reduce pain. Therein lies the scam.

I must acknowledge this: if you started writing tests this week, or this month, or even this year, then you will probably benefit more from writing integration tests than trying to write perfectly focused object tests. I have said and written elsewhere that I believe a programmer needs to write about 1500 tests to burn into her brain the basic patterns of good tests. Even so, as you write those tests, I want you to remain aware of the cost. Even if you don’t know how to write a good, focused object test, if you want to write more such tests, and especially if you try to write more such tests, then I will have completed the first phase of my mission to eradicate programmer reliance on integration tests to show the basic correctness of their code.

Join us! Turn one integration test into a small suite of focused object tests today. If you don’t yet see how to replace an entire integration test with equivalent focused object tests, then write at least one or two focused object tests alongside the integration test. Try it. I promise, you’ll like it.
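To make that invitation concrete, here is a minimal sketch in Python of one integration test and the focused object tests that could sit beside it. Everything in it (Checkout, FlatTaxPolicy, StubTaxPolicy, amounts in cents) is hypothetical and invented for illustration; none of it comes from a real project.

```python
# Hypothetical example: all class names and numbers are illustrative assumptions.

class FlatTaxPolicy:
    """Computes tax as a whole-number percentage of an amount in cents."""

    def __init__(self, percent):
        self.percent = percent

    def tax_on(self, amount_cents):
        return amount_cents * self.percent // 100


class Checkout:
    """Adds whatever tax its policy reports to the subtotal."""

    def __init__(self, tax_policy):
        self.tax_policy = tax_policy

    def total(self, prices_cents):
        subtotal = sum(prices_cents)
        return subtotal + self.tax_policy.tax_on(subtotal)


# Integration test: passes only if BOTH Checkout and FlatTaxPolicy are correct.
def test_checkout_with_real_tax_policy():
    assert Checkout(FlatTaxPolicy(10)).total([1000, 2000]) == 3300


# Focused object test 1: checks the tax policy alone.
def test_flat_tax_policy_takes_a_percentage():
    assert FlatTaxPolicy(10).tax_on(3000) == 300


# Focused object test 2: checks Checkout alone, with a stub in place of the policy.
class StubTaxPolicy:
    def tax_on(self, amount_cents):
        return 500  # canned answer, independent of any real tax rule


def test_checkout_adds_reported_tax_to_subtotal():
    assert Checkout(StubTaxPolicy()).total([1000, 2000]) == 3500
```

If the integration test fails, the mistake could be in either class; each focused test points at exactly one of them.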

1 I use the term test-first programming to refer to test-driven design without the evolutionary design part. With test-first programming, I develop a specific design, then I use tests to help me type it in correctly.

One last comment to my good friend Artem: please don’t put me to sleep with the word “just”!

How test-driven development works (and more!)

It surprises me, from time to time, how much I still need to justify test-driven development to prospects and would-be course attendees. Many feel that TDD has crossed the chasm, while others still see TDD as a cultish practice worth marginalizing. I take some blame for those who find TDD cultish, because until now I haven’t had a strong, sensible, theoretical basis to justify TDD as an idea. I could do no better than “it works for me” or “my friends like it”. That has changed since I’ve started giving my talk “Introduction to Agile with the Theory of Constraints” in which I use concepts from Theory of Constraints to motivate the practices of agile software development, notably those of extreme programming. If you buy in to ideas from Theory of Constraints or Lean Manufacturing, then I think I now have a stronger argument to justify the core programming practices in extreme programming in particular and agile software development in general. I don’t even need all of the Theory of Constraints but rather a simple appeal to fundamental concepts in Queuing Theory.

Queuing Theory?

Yes, Queuing Theory. (And I don’t plan to capitalize that any longer.) I don’t claim to have any particular expertise in this area, but I have already seen how to use queuing theory ideas in optimizing network-based systems, and I see no reason we couldn’t extend that to software delivery systems. Better, I only need to appeal to a single idea from queuing theory to make my point.

Given a process B, which follows a process A, sometimes in performing B we need to perform some of A again. We can remove the need to rework by taking some portion of process B and performing it before process A1.

This merits a diagram. If we have this problem

then we can solve it by doing this

and the resulting system will work more efficiently by removing wasteful rework. I assume here that we derive no significant benefit from the rework itself, which I suppose I must justify, but let’s not ruin a good story with the truth. Here I’ve described the general problem, and by applying it to software development, I can… well, I find it more effective if I save the punchline for the end.
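One way to put rough numbers on this result is a toy model. The model is an assumption of this sketch, not a cited queuing theory formula: each pass through process B sends the work back to process A with a fixed probability, so the expected number of passes follows a geometric distribution. The costs and probabilities below are made up.

```python
def expected_cost(cost_a, cost_b, rework_probability):
    """Expected total cost of A followed by B when each pass through B
    sends the work back to A with the given probability.

    Illustrative geometric model: expected passes = 1 / (1 - p).
    """
    if not 0 <= rework_probability < 1:
        raise ValueError("rework_probability must be in [0, 1)")
    expected_passes = 1 / (1 - rework_probability)
    return (cost_a + cost_b) * expected_passes


# Before: B discovers problems late, so half of all passes loop back to A.
before = expected_cost(cost_a=4, cost_b=2, rework_probability=0.5)

# After: a slice of B (costing 1 extra unit, an assumed figure) runs ahead of A,
# and the assumed rework probability drops to 0.25.
after = expected_cost(cost_a=4 + 1, cost_b=2, rework_probability=0.25)

print(f"before: {before:.2f}, after: {after:.2f}")
```

Under these assumed numbers, doing part of B first costs more per pass but less overall, because fewer passes loop back.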

Winston Royce, 1970, revisited

I imagine you know this diagram

and appreciate that Royce wrote in his now infamous paper that this single-phase waterfall is risky and invites failure. If you don’t appreciate that, then I cannot recommend strongly enough that you read the original paper in its entirety, rather than stopping after page 2 as most people have done2.

We can apply the queuing theory result I’ve just cited to this diagram and generate some interesting conclusions. I’ll start by focusing in on this portion of the system

We write code, then we test it. Sadly, we occasionally find a bug3 which makes us change the code we wrote after we thought we’d finished it. That makes a loop of the type we can unravel with our queuing theory result.

Since “coding” is process A and “testing” is process B, we need to do some testing before we start coding.

It doesn’t take long for this to become a virtuous loop where we write only the code we need to write in order to pass the tests we write.

I use the term test-first programming to describe this cycle4. When we practise test-first programming, we design as much detail as we can before writing the first test, then use the tests to help us type in our implementation correctly. Most teams most of the time can use test-first programming to reduce their mistake count to near zero, which increases their productivity and improves their ability to deliver, by helping them waste less time agonizing over whether to fix mistakes late in a release. I started this way in 2000 when I first discovered JUnit and stopped making silly mistakes in the code I wrote, which I found significantly beneficial in helping me code more confidently. I still designed most of what I built mostly up front.
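A minimal sketch of that cycle in Python, using a hypothetical leap-year function chosen only for illustration. The tests appear first in the file to mirror the order in which I would write them; the design (one function, one rule) is decided up front, and the tests check that it gets typed in correctly.

```python
# Step 1: write the tests for the design we already have in mind.
def test_years_divisible_by_four_are_leap_years():
    assert is_leap_year(2024)


def test_other_years_are_not_leap_years():
    assert not is_leap_year(2023)


def test_century_years_are_not_leap_years():
    assert not is_leap_year(1900)


def test_years_divisible_by_400_are_leap_years():
    assert is_leap_year(2000)


# Step 2: write only the code needed to pass those tests.
def is_leap_year(year):
    """Gregorian leap-year rule."""
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    return year % 4 == 0
```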

After a while, though, I recognized a new process loop: I found some parts of my design difficult to test, or I found some parts of my design didn’t fit together when I tried to type them in.

Returning to our queuing theory result, since “designing” is process A and “doing test-first programming” is process B, we need to do some test-first programming before we start designing.

It doesn’t take long for this to become a virtuous loop where we check our design ideas as we think of them and implement only the parts of the design we can justify needing. When we include refactoring in our practice, we can confidently “under-design” compared to the level of design we expect to need by the end of a task, which I believe amounts to designing appropriately for the code we need to implement right now. This virtuous loop combines test-first programming and evolutionary design, including guiding principles like “you aren’t gonna need it” and the four elements of simple design into test-driven development, where we check our implementation by running tests and we check our design ideas by writing tests.

Where test-first programming helps most teams most of the time reduce their mistake count to near zero, test-driven development helps them reduce their design inventory—mostly code that gets in our way because it doesn’t actively help us deliver a feature—to near zero. This further increases productivity and improves their ability to deliver by helping them waste less time agonizing over design problems they find costly to fix. I waited until I’d spent an entire release practising test-first programming before doing more test-driven development. My transition consisted of trying to do less and less up-front design for each task, letting myself feel comfortable with each new step. Within two years I estimate I designed about 5% as much up front as I did before I started practising test-first programming. I can’t measure the corresponding improvement in my design, but I look back at projects that took 3 months before I practised test-driven development that I now feel confident I could complete—truly complete—in one week. Of course, we can’t stop here!

Enter our friend analysis. To simplify the discussion, I will treat analysis as “discovering the features we want in our software” without forcing myself to state too precisely how that happens5. Once again, we have our familiar situation.

Once again, we face the situation where in the process of implementing features we discover new features we need, current features we don’t need, and learn new things about features we know we need to build. This adds to our analysis, meaning that we should try test-driving some features before we try to implement others.

It doesn’t take long for this to become a virtuous loop in which our desire to implement (and deliver!) features drives them ever smaller, as we extract more concentrated value out of each one6. When we implement feature 12 we learn something about features 23, 30 and 52. We might decide not to deliver feature 30 any more. We might decide to expand feature 23 to encompass a few more key cases. We might decide to rush feature 52 to the top of the pile. Most teams most of the time find that this cycle helps them reduce the number of rarely- or infrequently-used features in their system7. This yet again increases productivity and improves their ability to deliver meaningful software to their stakeholders by eliminating the time wasted on delivering too much of a feature too soon, the time wasted on entire features we thought we needed but realized we don’t, and the time wasted arguing about what a feature means, rather than writing examples together: business-oriented tests that describe how a feature works in enough detail for the business and technical project community to agree on the conditions of satisfaction for delivering the feature.

I call this behavior-driven development, and refuse to spell it with the u that provides as much value to the word as your appendix does to your body8.

Once again, I didn’t coin the phrase, and some might argue against the way I use it, but I find it apt. This cycle includes practices like business and technical people writing examples together, feature injection, feature splitting, and value-based (rather than cost-based) planning.
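As a rough illustration of examples written together, here is a sketch in Python. The feature (free shipping at or above a threshold), the amounts, and every name are invented for this sketch; the point is only that each row reads as a sentence the business can confirm and the programmers can run.

```python
# Hypothetical feature and figures, invented for illustration.
FREE_SHIPPING_THRESHOLD_CENTS = 5000
STANDARD_SHIPPING_CENTS = 499


def shipping_cost(order_total_cents):
    """Orders at or above the threshold ship free; others pay the standard rate."""
    if order_total_cents >= FREE_SHIPPING_THRESHOLD_CENTS:
        return 0
    return STANDARD_SHIPPING_CENTS


# Each example: (description agreed with the business, order total, expected shipping).
EXAMPLES = [
    ("an order just under the threshold pays standard shipping", 4999, 499),
    ("an order exactly at the threshold ships free", 5000, 0),
    ("a large order ships free", 12000, 0),
]


def check_examples():
    """Return the descriptions of any examples the implementation fails."""
    return [
        description
        for description, order_total, expected in EXAMPLES
        if shipping_cost(order_total) != expected
    ]
```

An empty list from `check_examples()` means the implementation satisfies every agreed example; a non-empty list names, in business language, what remains to discuss or fix.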

At this point, I think I’ve done my job. I believe I’ve justified not only test-first programming or test-driven development, but full-on behavior-driven development, using only a single result from fundamental queuing theory. I’ve made only a single assumption—that we agree on the appropriateness of applying queuing theory to a software development system. I’ve tried to add as little as possible to my reasoning in order to keep it as context-free as possible. As a result I claim that most teams most of the time will benefit from moving along the path from code-and-fix to test-first programming to test-driven development to behavior-driven development.

Now, for homework, what happens when we consider these processes?

Surely at least once you’ve needed to deliver more features for software you’d already deployed. How well does that work? What problems do you encounter? What if you applied our new favorite queuing theory result to that rework loop?


1 I really need a citation for this, and when I find it, I will place it here.

2 I digress, but I really can’t help myself on that one.

3 Also known as defect or, for the truly congruent, mistake.

4 Clearly I didn’t coin the phrase, but I know many people who treat “test-driven development” as a simple renaming of “test-first programming”, and I believe making a stronger distinction adds real value to the conversation.

5 I don’t think “gathering requirements”, as though we could pick them like berries, fits as a metaphor. I like “trawling for requirements”, which I believe I first read in Mike Cohn’s User Stories Applied.

6 We can easily apply the “Pareto Distribution” here in that we can deliver 80% of the value from implementing 20% of the feature.

7 You recall that Jim Johnson of the Standish Group reported at the XP 2002 conference that 45% of developed features are “never used”. As I recall, only 7% of features were used very frequently.

8 My Canadian and British brethren and sistren be damned. I assert my right as a Canadian to choose the British spelling when I prefer it and the American spelling when it saves me time.

The World's Shortest Article on Behavior-Driven Development, revisited

I added more to this article on September 18, 2009.

On May 21, 2006, I wrote the world’s shortest article on Behavior-Driven Development. Although the title links to the entire article, it is so short that I can reproduce it here.

What is Behavior-Driven Development (BDD)?

It is Test-Driven Development (TDD) practiced correctly; nothing more.

At the time, I wrote this in anger, for reasons that I’m too tired to get into just now (it is 4:30 AM on the last day of Agile 2006), but I wanted to share with you that my anger is changing to some more positive emotion regarding this topic.

The fact that BDD and TDD are equivalent—isomorphic, even—has its good points and bad points. I am unclear at the present moment whether the good outweigh the bad or the other way around.

What I dislike about the existence of two (perhaps three or more) different names for the same thing is that it can confuse people and divide them. Think of a single language written in two alphabets: while the speakers understand one another, they cannot read one another’s literature. I would hate to see that happen.

What I like about it is that we have two (perhaps three or more) standard approaches to explaining the technique that suit different audiences. To some, the word “test” resonates well, and to others, the words “behavior” or “example” resonate well. Rather than haphazardly sprinkling the word “behavior” into conversations about TDD, we can use an entire, cohesive vocabulary to explain TDD to someone who prefers to talk about behaviors over tests. I imagine this would help.

I would like to thank the people in room 2411 of the Hyatt Regency in Minneapolis for their willingness to participate in a spirited debate on this topic. It was tiring, and it was late, but I found it worth the effort.

Times have changed

In the time since I first wrote this article, BDD has evolved and my opinion of it has evolved as well. I now see how BDD ideas map well to the way I deliver features, complete with Feature Injection and the inner BDD design cycle. The BDD community have described how they set up a pull system for features, which I’ve been doing for years. As always seems the case, we had much more in common with one another than we originally thought!

Thanks to all the BDDers who have patiently worked with me on this unification, even when they didn’t know they were doing it: Dan North, Chris Matts, Olav Maassen, Aslak Hellesøy and Liz Keogh.

September 19, 2009 08:00 testing, agile, people, agile 2006, article

When to fake; when to mock

I originally published this on January 3, 2005. One of my other postings links to this article, so I wanted to bring it back to life. Thanks to James Breen for pointing out the broken link to me.

In working with Pat Welsh I had occasion to describe in a rather pithy way when to fake and when to mock, so I thought I’d share that with you.

To be clear, I mean “mock” in the sense of interaction-based testing with mock objects, and I mean “fake” in the sense of state-based testing with fakes or stubs. Google if you need a stronger definition of those terms, but they suit me for now.

Note that I now use the term “stub” where I used to say “fake”.

So then, I want to test some object in isolation from the rest of the system. In broad terms, I need to write tests to answer two essential questions:

  1. Am I using the objects around me correctly? Am I invoking the right methods at the right times with the right parameters?
  2. Am I reacting to the objects around me correctly? Do I handle errors reasonably? Do I respond well to the answers they send me?

For the first kind of test, I generally mock; and for the second kind of test, I generally fake.

Put another way, I lean on interaction-based testing to verify how my object talks to its collaborators; but I lean on state-based testing to verify how well that object listens.
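The two questions can be sketched in Python with the standard library's `unittest.mock`. The OrderProcessor, its gateway collaborator, and the PaymentDeclined error are hypothetical names invented for this sketch.

```python
from unittest.mock import Mock


class PaymentDeclined(Exception):
    """Hypothetical error a payment gateway might raise."""


class OrderProcessor:
    """Object under test: talks to a gateway and reacts to its answers."""

    def __init__(self, gateway):
        self.gateway = gateway

    def place(self, order_id, amount_cents):
        try:
            receipt = self.gateway.charge(order_id, amount_cents)
        except PaymentDeclined:
            return "declined"
        return f"paid:{receipt}"


# Question 1 (how my object talks): mock the collaborator, verify the interaction.
def test_charges_gateway_with_order_id_and_amount():
    gateway = Mock()
    OrderProcessor(gateway).place("A-1", 2500)
    gateway.charge.assert_called_once_with("A-1", 2500)


# Question 2 (how my object listens): stub the collaborator, verify the result.
class ApprovingGateway:
    def charge(self, order_id, amount_cents):
        return "R-42"  # canned receipt


class DecliningGateway:
    def charge(self, order_id, amount_cents):
        raise PaymentDeclined()


def test_reports_receipt_when_payment_succeeds():
    assert OrderProcessor(ApprovingGateway()).place("A-1", 2500) == "paid:R-42"


def test_reports_declined_when_gateway_declines():
    assert OrderProcessor(DecliningGateway()).place("A-1", 2500) == "declined"
```

The mock test says nothing about what `place` returns, and the stub tests say nothing about which arguments reach the gateway; each kind of test answers only its own question.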
