Physics And Testing

At a number of conferences I attended in the past, speakers connected software development in general, and testing in particular, to fields that (at first) seem unrelated.

Since then, I have pondered this for a while (read: more than 4 years). With my background in physics, I see a number of parallels to software testing. This is also a great opportunity to answer a question I get asked frequently: How did you enter software testing, given your background in physics?

Physics

Let’s start with a definition for physics:

Physics is an experimental science. Physicists observe the phenomena of nature and try to find patterns that relate these phenomena.

— Young, Freedman. “Sears and Zemansky’s University Physics: With Modern Physics”. Pearson Education.
Also see the Wikipedia article on physics.

The patterns that relate those phenomena are the theories (or laws) of physics. They are models that describe an aspect of reality. None of these models is complete in the sense that it describes everything; there is no single physical theory that explains it all. A nice view of the landscape of physical models is shown in the image by Dominic Walliman (see sciencealert.com for details):

[Image: The landscape of physical theories; see Dominic Walliman’s ‘Domain of Science’ video on YouTube.]

In physics (as in science in general), experimental results are compared with the predictions created by models, in order to find out where a model does not match observed behaviour. This is important: Experiments can only ever invalidate a model; they can never conclusively confirm its correctness.

Software

To me, software systems are models, too: Even though they may represent reality closely, a software system is not the thing it represents.
Peter Naur described the relationship between theory building and programming in his paper ‘Programming as Theory Building’ (Microprocessing and Microprogramming 15, 1985, pp. 253-261).

Testing

My mental model of software testing is very similar to the one of physics: I see software systems as partial implementations of aspects of expected real-world behaviour. In my view, testing a system means exercising it and comparing observed results with expectations.
The expectations may come from requirements (written or otherwise), previous experiences with similar systems (e.g. another web application from the same company) or other sources.
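
To make this concrete, here is a minimal sketch of that comparison in Ruby. The system under test and the expectation are stand-ins, not taken from any real project:

# A minimal sketch of the comparison at the heart of a test.
# The lambda stands in for the real system under test; the
# expectation would come from requirements, experience, or
# other sources. All names are illustrative only.
system_under_test = ->(input) { input.upcase }
expected = 'HELLO'

observed = system_under_test.call('hello')
if observed == expected
  puts 'observation matches expectation'
else
  puts "mismatch: expected #{expected.inspect}, observed #{observed.inspect}"
end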

There are many approaches to testing, similar to the many approaches to physics. Some of them work well in one area but not so well in another. What kind of testing is done heavily depends on the kind of software system: Testing embedded software used in medical devices is drastically different from testing, say, a text editor.

Science

It is interesting to go one step further, from physics to science. The Cambridge Dictionary defines science as

the intellectual and practical activity encompassing the systematic study of the structure and behaviour of the physical and natural world through observation and experiment

https://dictionary.cambridge.org/dictionary/english/science

With a different wording, this seems to fit software testing as well:

Software testing is the intellectual and practical activity encompassing the systematic study of the structure and behaviour of software systems through observation and experiment.


To me, many very basic principles of science in general and physics in particular apply to software testing, too. And that’s why I think that my physics background is very helpful in software testing.

Now, I am interested in this: What is your background, and how does it help you in software testing (especially if your background is not in computer science or software engineering)?

The Computer Is Always Right

The other day I had trouble getting a Cucumber scenario to work. Here’s what happened, partly to have a post I can come back to, when (!) my future self runs into a similar problem.

I am using Cucumber and a very stripped down version of the project looked like this:

% tree
.
└── features
    ├── example.feature
    ├── step_definitions
    │   └── step_def.rb
    └── support
        └── env.rb

All code is entirely contrived and tailored to demonstrate the issue I bumped into. The example file features/example.feature first:

Feature: What is happening?

Scenario Outline: Matching step definitions -- or not

Given a step that mentions '<a_string>'
And another step that uses <a_number> like this
Then all is good

Examples:
|  a_string |  a_number |
|      word |        42 |
| two words | 3.1415927 |

There are very plain step definitions too (in file step_definitions/step_def.rb):

Given("a step that mentions {string}") do |string|
  pending # Write code here …
end

Given(/another step that uses (\d+(?:\.\d+)?) like this/) do |number|
  pending # Write code here …
end

Then("all is good") do
  pending # Write code here …
end

Running this gives the expected result: Cucumber kindly informs that there are pending steps:

% bundle exec cucumber
Feature: What is happening?

  Scenario Outline: Matching step definitions -- or not # features/example.feature:3
    Given a step that mentions '<a_string>'             # features/example.feature:5
    And another step that uses <a_number> like this     # features/example.feature:6
    Then all is good                                    # features/example.feature:7

    Examples:
      | a_string   | a_number  |
      | word       | 42        |
      | two words  | 3.1415927 |

2 scenarios (2 pending)
6 steps (4 skipped, 2 pending)
0m0.012s

However, at some point I started getting another result (comments added by Cucumber removed):

% bundle exec cucumber
Feature: What is happening?

  Scenario Outline: Matching step definitions -- or not
    Given a step that mentions '<a_string>'
    And another step that uses <a_number> like this
    Then all is good

    Examples:
      | a_string   | a_number  |
      | word       | 42        |
      | two words  | 3.1415927 |

2 scenarios (2 undefined)
6 steps (4 skipped, 2 undefined)
0m0.009s

You can implement step definitions for undefined steps with these snippets:

Given("a step that mentions {string}") do |string|
  pending # Write code here that turns the phrase above into concrete actions
end

Wait. What?!? I stared at the existing step definition for a while, comparing it with the one printed in the message above:

Given("a step that mentions {string}") do |string|
  pending # Write code here …
end

Confusion and disbelief kicked in.

One of the principles I use is this:

The computer is always right.

— Not sure where I picked this up. If you know the (or a) source, tell me please.

This is true even if the behaviour is wrong. In this case: If cucumber cannot find a step definition … it CANNOT find a step definition. But why would that be? Why did it happen in this particular case?

I even called in colleagues (remotely) and we stared at the code collectively. Still nothing.

Luckily a trace of a previous successful run was still available in the console output. So I copy-and-pasted the scenarios, and compared them piece by piece in a Pry session:

% pry
[1] pry(main)> works = File.read 'features/works.feature'
=> "Feature: …"
[2] pry(main)> broken = File.read 'features/broken.feature'
=> "Feature: …"
[3] pry(main)> works == broken
=> false

So there is in fact a difference. But what? Where?
Using the same Pry session, we found out:

[4] pry(main)> works.each_codepoint.zip(broken.each_codepoint).select{|el| el[0] != el[1] }
=> [
    [0] [
        [0] 32,
        [1] 160
    ]
]

What this does: The codepoints of both strings are combined in pairs, and then the pairs that are different are selected. Here’s what the Ruby Documentation says about each_codepoint:

Passes the Integer ordinal of each character in str, also known as a codepoint when applied to Unicode strings to the given block. For encodings other than UTF-8/UTF-16(BE|LE)/UTF-32(BE|LE), values are directly derived from the binary representation of each character. If no block is given, an enumerator is returned instead.

"hello\u0639".each_codepoint {|c| print c, ' ' }

produces:

104 101 108 108 111 1593

https://ruby-doc.org/core-2.7.1/String.html#method-i-each_codepoint

The small piece that changed the behaviour was a space, just not a ‘normal’ space, but a NO-BREAK SPACE (see, for example, https://en.wikipedia.org/wiki/Non-breaking_space for more about this topic).
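
In hindsight, a few lines of Ruby would have pointed at the culprit directly. Here is a minimal sketch; the file name is just an example:

# A minimal sketch: report the position of every NO-BREAK SPACE
# (codepoint 160, U+00A0) in a file. The file name is an example.
text = File.read('features/example.feature')
text.each_char.with_index do |char, index|
  puts "NO-BREAK SPACE at offset #{index}" if char.ord == 160
end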

Lessons I learned (again):

  1. Some problems are hard to see; in this case the problem was even invisible.
  2. The message was correct: The step was not defined, it only looked (to the human eye) as if it was.
  3. Pressing <spacebar> gives a different result than pressing <option>-<spacebar>.

CRUDCA – Create, Read, Update, Delete, Create Again

I recently tweeted:

A new test sequence: CRUDCA: Create, read, update, delete, create again. — Because sometimes an object can’t be created again.

I got some feedback on this, so let’s present the idea in a bit more detail.
Many systems know these 4 basic actions for an object:
  • Create
  • Read
  • Update
  • Delete
That’s where the abbreviation CRUD comes from. In many cases that’s also a rather typical test sequence to check whether the system can handle these actions. — And often it’s also good enough testing.
But sometimes there are more restrictions at work:
  • An object may be created once and only once. It may be removed, but then not created again.
  • Or it must be possible to create an object with identical properties again and again.
This is how I came up with ‘CRUDCA’: Create, read, update, delete, create again.
Obviously, it depends on the particular (business) context whether or not it is allowed to create a certain object again. The ‘create again’ part covers the action of attempting to create the object; the expected behaviour comes from the requirements or specification of the system. A sketch of such a sequence follows below.
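
Here is a minimal sketch of a CRUDCA sequence in Ruby. The Client class is a made-up in-memory stand-in, not a real API; in an actual test the calls would go to the system under test, and the ‘create again’ step would assert whatever the specification demands:

# A minimal sketch of a CRUDCA test sequence. Client is a trivial
# in-memory stand-in for the real system under test.
class Client
  def initialize
    @store = {}
    @next_id = 0
  end

  def create(attrs)
    @next_id += 1
    @store[@next_id] = attrs.dup
    @next_id
  end

  def read(id)
    @store[id]
  end

  def update(id, attrs)
    @store.fetch(id).merge!(attrs)
  end

  def delete(id)
    @store.delete(id)
  end
end

client = Client.new

id = client.create(name: 'example')           # Create
raise 'read failed' unless client.read(id)    # Read
client.update(id, name: 'renamed')            # Update
client.delete(id)                             # Delete

# Create again: whether this must succeed or must fail depends on
# the business rules of the system under test. The stand-in allows it.
client.create(name: 'example')
puts 'CRUDCA sequence completed'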

Doing Something And The Result of It

While I was recently working on a test automation task, I had the feeling that the act of automating the tests was more important than the actual automated tests (or checks) I created. It seemed to me that this is very similar to the saying that the planning in agile projects (and likely in other projects as well) is more important (or more valuable) than the plan you get out of the planning. So I tweeted about it, and within seconds Lisa Crispin agreed.

It seems to me there is a pattern—actually a question—at work: Is doing something more important than the result you get out of that activity? There’s a book by Mary and Tom Poppendieck ‘Leading Lean Software Development’, subtitled ‘Results Are Not the Point’, that hints in the same direction.

Is actually doing something really more valuable or important than the result of this activity? I’m not sure about this.

The result of having done something seems obvious: We have the result, something that didn’t exist before and has some value to someone (hopefully). But what about the value of the activity itself? A few things come to my mind immediately:

  • While actively working on a task, no matter whether it’s performing or rehearsing, we exercise and get more used to doing it. We’re very likely to get better at doing it in the future.
  • It’s also likely that we learn something about the task itself, some technology we’re using or social aspects of work life.
  • A secondary result is scripts that help me create a certain state in the system: a base camp from where I can start explorations into new areas of the system.

What do you think about this? What makes the doing itself valuable to you, apart from the outcome?

Let’s go back to creating automated tests: While I was looking at newly implemented parts of a software system, as soon as an automated test executed and yielded a reproducible result… it became a regression test (or check, as some would say). Now, regression tests are not particularly likely to disclose new defects, so what’s the value of automating anyway? While I can’t offer a generally valid answer, the automation effort itself helped me to uncover a number of bugs.

PS: An additional example is this tweet by Ben Simo (@qualityfrog):

Read in 100+ yo teach book: teacher should script lesson, then throw away script at class start. Scripting useful for prep, not instruction.

Another great example of the pattern.

The Opposite of ‘More of Something’ Is Not Less of It

One pattern I see often is that people (me included) apply more of some X, if applying the usual amount doesn’t yield the desired result (anymore).
Two examples from very unrelated fields:

  1. If a dog or other pet doesn’t react in the desired way to a command (‘sit’, ‘down’…), many people start shouting at the dog again and again, which may or may not work at first, but will also fail eventually. But simply not shouting at the poor dog any more won’t solve the problem either.
  2. In software development I’ve seen rules and policies added to a given process again and again. In these cases the goal was to make the teams produce better products faster (and probably more predictably). That doesn’t lead to the desired results either. But merely reducing the number of rules is not enough to get teams out of trouble.

So when X doesn’t work, or doesn’t work anymore, applying more of it won’t help. Alas, reducing it (less/no shouting at the pet, removing process load from projects) won’t solve the issue either.

It seems a lot more promising to try an entirely different approach. After realising the issues with waterfall projects, we (as software developers) found agile methods to be helpful. In the case of training a dog, it may be a different way of helping the dog learn a new command; in some cases it may even be necessary to teach the dog its name first.

Have you seen anything like this? Let me know.
