This is interesting food for thought. I think one of the perceived dangers of "confirmation bias" is that people stop after confirming their initial hypothesis. Taking the expanded data from the Wason study, did the 6/29 people who failed to guess the correct rule keep trying? What would have happened if it were a study whose participation had some barrier to entry that would weed out unmotivated participants? (E.g., conducting it over repeated visits, or via email with long delays between a guess and its validation.) How many would be content to never learn the correct rule after a few of their guesses had been dismissed? What if there were *no* validation at all? How many would keep searching?
I think those situations better describe what we now think of as "confirmation bias" -- the person on Facebook who finds one article they agree with and shares it immediately. There is no immediate validation: no one is in a position to say their guess is absolutely right or wrong; they only have their friends, whom they may or may not trust as authorities on the subject. There may be little motivation to keep searching for the correct answer in the face of failure.
I think the article makes a good point when it steers away from the conclusion that trying to validate your hypotheses before trying to invalidate them is bad. You need both kinds of data, and it doesn't particularly matter in what order you gather them, positive or negative. However, if you only ever perform one side of the test, that is confirmation bias at work, and it is misleading or dangerous.
For example, in QA: if you are looking for text to appear on a page in a certain context, you set up the context, the text appears, and that looks like a passing test, right? But you also need to test that the text does *not* appear when that context is absent. You cannot confidently say that the context caused the text to show up unless you verify that, without the context, the text does not show up. Perhaps the logic that detects the context is faulty, e.g. it always reports that the context is true, and you would never know this unless you checked the scenario with the context removed. There were many times in QA where a test would pass, but only because a second bug was present that masked the first. It might turn out that the context-detection logic was never connected to the text at all, so the text appeared all the time. In that case, you cannot say anything about whether the logic is correct, because it is completely decoupled from the behavior you're observing.
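To make that concrete, here is a minimal sketch of the positive/negative pair. The `render_page` function and the "Special offer!" banner are invented stand-ins for whatever system a real QA test would drive; the point is only the shape of the two tests:

```python
# Hypothetical sketch: render_page stands in for the system under test.
# A real QA suite would drive an actual page or component instead.

def render_page(context_present: bool) -> str:
    """Show a banner only when the context is present."""
    return "Special offer!" if context_present else "Welcome."

def test_banner_shows_in_context():
    # Positive test: the text appears when the context is set up.
    assert "Special offer!" in render_page(context_present=True)

def test_banner_hidden_without_context():
    # Negative test: without this check, a bug that always shows the
    # banner (or detection logic decoupled from the banner entirely)
    # would slip through, because the positive test alone still passes.
    assert "Special offer!" not in render_page(context_present=False)
```

Only the second test would catch the "context logic always returns true" bug described above; the first test passes either way.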
Your average QA is fairly unmotivated; it is a high-stress, low-wage, and frequently disrespected job -- or at least, that is the impression I got from my time in the industry, and those are certainly the complaints I would hear from the people I worked with. When training new QA, I saw the same pattern over and over: they would stop once the positive test passed. I couldn't say whether that is more on par with the 6/29 who were unable to guess the numerical rule after 5 tries, the 16/29 who failed after 2 tries, or the 23/29 who failed after 1 try. I think it would be worthwhile to find out.