Software Test & Performance Collaborative

The Patterns That Surround Us

By Ross Collard on Jun 1st, 2009

Patterns Exist Everywhere In Nature. Seeing How They Affect Our Data Can Be The Key To Unlocking Untold Efficiencies In Your Testing.

This is the concluding installment of a series of articles on using live data in testing, the objective of which is to improve your performance testing through the smarter use of live data. This is the second in the series specifically about test data patterns, and it categorizes and outlines patterns used by performance testers to enhance their data to fit specific testing goals.

Performance testers with intermediate or advanced skills are the intended audience for this article, but no specific technical knowledge is assumed or required. The content should also be useful for functional testers and for non-testers who manage performance test projects.

An Introduction to Patterns

The word “pattern” is defined here as a practice, model or blueprint. While repetition is central to the concept of patterns, so are learning and improving on them. In this article I describe more than 80 test data patterns, most of which are massaged or enhanced for a particular test purpose.

Here I describe them as many individual patterns so that the purpose and mechanics of each is clearer. Managing and using more than a few data patterns is unwieldy, so testers generally consolidate them into a single one or a small collection.

PART I:

Interaction Patterns

Rendezvous Pattern

This is a type of spike testing where at least two events, and more likely many, rendezvous. That is, they happen simultaneously, or within a small enough time interval that they are not independent and could impact or influence each other, for example by competing for the same resources.

A rendezvous pattern might be when we use a software tool to simulate a hundred users hitting the enter keys on a hundred keyboards concurrently, at almost exactly the same time. Rendezvous tests are often unrealistic, because a hundred users are unlikely to hit their enter keys simultaneously. If we allow a small spread of events across time, instead of requiring an instantaneous happening, then the rendezvous test becomes much more realistic. For example, we might create a scenario to simulate what would happen if a hundred users all hit their enter keys across the span of two seconds.
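The difference between a strict rendezvous and a spread one can be sketched in a few lines of Python. This is only an illustration: the function name `rendezvous_offsets`, the uniform distribution and the fixed seed are assumptions, not part of any particular load-testing tool.

```python
import random

def rendezvous_offsets(users, spread_seconds, seed=0):
    """Return sorted start offsets (in seconds), one per simulated user.

    A spread of 0 gives the strict, instantaneous rendezvous; a positive
    spread scatters the events uniformly across that time window.
    """
    rng = random.Random(seed)  # fixed seed so the schedule is repeatable
    return sorted(rng.uniform(0.0, spread_seconds) for _ in range(users))

strict = rendezvous_offsets(100, 0.0)    # everyone at exactly the same instant
relaxed = rendezvous_offsets(100, 2.0)   # everyone within a two-second window
print(max(strict), min(relaxed), max(relaxed))
```

A load tool would then release each virtual user at its offset. Varying the spread from zero upward shows how sensitive the system is to the degree of simultaneity.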

Interference Pattern

This type of testing attempts to stress a system by having features, processes or threads interfere with each other.

Suppose that a system contains two features, A and B. In the feature testing, we run a set of test cases to exercise feature A, and another set for feature B. These features can interact and possibly interfere with each other, for example, by both being able to simultaneously access and update the same records in a common database.

In this situation, the feature testing usually includes feature interaction testing, but only to a limited degree and for the simplest cases. For example, in a manual test two different testers, working from two workstations, attempt to update the same database record at the same time.

Usually the feature test team cannot easily try more complex combinations of many concurrent activities. By contrast, a load or stress test by its nature usually incorporates multiple concurrent demands on the system we are testing. In the feature interaction variation of a load test, we deliberately engineer the test workload to include complicated and interacting mixes of demands.

Interoperability, Interface Pattern

Errors can occur because of mismatches at interfaces. The U.S. lost a spacecraft at a cost approaching $200 million because of miscommunication. One software subsystem on the spacecraft assumed that numbers passed between subsystems were in meters, while another assumed feet.

Interoperability testing addresses the joint behavior of multiple different components and systems which interact, usually in complex ways. These may use different technologies and were probably built at different times by different people. In addition, the details of the internal workings of the individual systems may not be available to the testers. Both interoperability and interface testing also focus on conformance with standard protocols.

Deadlock Pattern

This type of testing attempts to stress a system by locking a database, either directly or through transactions which interfere with each other. The testers identify situations where deadlock might occur and design the test workload to try to trigger a deadlock.

Deadlocking is a situation in which a set of interdependent tasks is blocked, with each one waiting for an action by another one of the interdependent tasks.

For example, let’s say program 1 requests resource A and receives it. Program 2 requests resource B and receives it. Program 1 requests resource B and is queued up, pending the release of B. Program 2 requests resource A and is queued up, pending the release of A. Now neither program can proceed until the other releases a resource. Deadlock often arises from adding synchronization mechanisms to avoid race conditions.
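The two-program scenario can be reproduced deliberately in a test. The sketch below uses Python threads and locks as stand-ins for the programs and resources. The function name `provoke_deadlock` and the 0.2 second timeout are assumptions for illustration, and the timeout substitutes for the indefinite wait of a real deadlock so that the test itself can finish.

```python
import threading

def provoke_deadlock(timeout=0.2):
    """Run two workers that take two locks in opposite order.

    Returns a dict recording whether each worker obtained its second lock.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)  # each worker now holds one lock
    both_attempted = threading.Barrier(2)   # each worker has tried the other
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            both_attempted.wait()           # keep holding `first` until both tried
            if acquired:
                second.release()

    t1 = threading.Thread(target=worker, args=("program 1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("program 2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return got_second

result = provoke_deadlock()
print(result)  # neither program obtains its second resource
```

The barriers force the unlucky interleaving every time, which is the point of the pattern: the test workload is engineered so that the deadlock is triggered rather than left to chance.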

Most databases are built to support several concurrent users. This means that there is a risk of one user updating a piece of data while another user is trying to read or update the same data. To avoid this problem, stored procedures can be built to include locks. A lock temporarily denies access to other users, for a short duration, while one user is reading or updating the data. Careless use of locks can also lead to deadlock.

Synchronization Pattern

This type of testing attempts to stress a system by causing timing problems and out-of-sync processes. These are also called race conditions.

Systems often have inadvertent and unrecognized assumptions built into them, about the expected sequence of events or the expected timing of events. Let’s say that the system assumes that event A always precedes event B. What happens if this assumption is not met, if an event happens later or earlier than anticipated? Does the system time out (when it is not supposed to) because of the late event? Does the early event go unnoticed? The aim of synchronization test cases is to answer these questions.

PART II:

Human Error Patterns

“Bad Day” Pattern

A user scenario test case is one that employs a real-world set of activities based on how the users actually use the system. Feature testers develop conventional test cases using standard test case design techniques such as boundary value, and derive the test cases for one feature at a time.

User scenario test cases generally stress the system better than simple one-feature test cases, but tend to under-represent complex interactions among transactions. The user scenarios are often more messy, unstructured and demanding than feature test cases developed according to conventional test case design techniques.

Bad day testing is based on the premise that we all can hit the wrong button or have a bad day. If someone does push the wrong button, we want to ensure that we don’t suffer any horrendous consequences. Bad day testing is similar to usability testing and operator error testing.

Soap Opera Pattern

A soap opera test is a type of user scenario test that exaggerates the day-by-day actions of users. (Just think of how many crises and storms-in-teacups are packed into a half-hour “soap.”)

There are two objectives in a soap opera test:

  • Increase the rate at which bugs are found by focusing on larger-than-life situations.
  • Stress the system with tougher-than-average user scenarios.

The intention is to exaggerate deliberately, but not to go to unrealistic extremes. For example, it might be appropriate to bang away frantically on the keyboard for a few minutes but not to draw a gun and blast the screen, unless your users are in unusually high stress situations.

Disaster Recovery Pattern

This type of testing uses the disaster scenarios which were identified in the organization’s disaster recovery plans as a source of test cases.

I will illustrate this point with an example of a system failure. In what was considered a major crisis, the Nasdaq stock market halted trading on a busy Friday in 2001. An employee of WorldCom, which provides communications services to the stock exchange, had inadvertently forced Nasdaq’s communications network to shut down. (WorldCom later said that testing a new system being developed for Nasdaq had caused the service interruption. With hindsight, a busy Friday was not a very smart time to run the system test.)

Nasdaq was able to restore service fairly quickly, but a secondary problem blocked its stockbroker clients from using the system for several more hours. The outage had disconnected all of Nasdaq’s users from their network. When these users attempted to log back in after the network administrators had resolved the problem, all at approximately the same time, the system’s log-in process was unable to cope with the huge surge of demand.

Environmental Pattern

The term “environmental testing” originally came from hardware engineering, where it is defined as testing for physical factors such as the loss of power, vibration, G (gravity) forces, air pollutants in factories, electric shock and other hazards, electromagnetic radiation, extremes of temperature, humidity and so on.

In software, an environmental test is one that concentrates on the impact of physical hazards or physical failures on the operation of the software. For example, at altitudes above 50,000 feet, cosmic radiation can arbitrarily change the values of data (the radiation “writes” to the magnetic media which stores the data). The U.S. Air Force tests high-altitude software in labs with high radiation exposure.

Live Change Pattern

Many systems must keep running no matter what. One example is that of an aircraft flying over the ocean. What happens when an emergency fix or routine maintenance must be done on such systems?

This type of testing assesses the ability to make live modifications to the system without interrupting service.

Always-on 24×7 and 24×365 (24×366 in leap years) systems need to be maintained literally on the fly. By way of analogy, if there is no place to land, we cannot land an airplane, make the change on the ground with the system inactive, and then return to flight.

Examples of live maintenance include adding new devices to a network, changing the way systems are partitioned and resource capacity is dedicated to applications, and backing up a database.

Since the live modifications create stresses which otherwise we may not encounter, and since preserving business continuity is critical in always-on operations, we need to try these adjustments as part of the test project.

System Change Impact Assessment Pattern

This type of testing assesses the impact of a change or a group of changes to an existing system.

Infrastructure Impact Assessment Pattern

This type of testing assesses the impact of a change on an existing infrastructure which is supporting a mix of work load demands. This testing focuses not so much on the immediate application being changed, or the new application being introduced into the environment, or a change in the demand patterns within one application, but on its side effects on the other uses of the infrastructure. The issues are the capacity and the utilization of resources in that infrastructure. This is also called an environmental assessment.

Error Detection & Recovery Pattern

Many software developers are blessed with eternal optimism. But have you ever seen a software product that you couldn’t make fail if you wanted to?

Error handling requires prevention and detection controls, dependable back-up systems and dependable recovery systems. Error recovery testing is intended to ensure that the system’s controls and manual and automated back-up and recovery mechanisms work as expected.

The ANSI/IEEE standards define recovery as the return of a system to a reliable operating state after failure. Systems have written and unwritten recovery objectives, stating how we expect them to recover from software errors, power failures, hardware and network outages or degradations and data errors.

The heart of this method is to reverse-engineer test cases from the set of error messages we expect the system to generate. We trace each output error message back to its various causes (as there may be more than one cause for a particular error message). Then we create test cases to generate each error message, with one test case for each significant cause.
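As a rough sketch of this reverse-engineering step, the catalog of error messages can be held as a simple mapping from each message to its significant causes. The messages, causes and function name below are hypothetical examples, not taken from any real system.

```python
# Hypothetical catalog: each expected error message mapped to its
# significant causes, as traced back by the testers.
ERROR_CAUSES = {
    "Database not available": ["database server down", "connection pool exhausted"],
    "Session expired": ["idle timeout reached"],
}

def derive_test_cases(error_causes):
    """Return one (message, cause) test case for each significant cause."""
    return [
        (message, cause)
        for message, causes in error_causes.items()
        for cause in causes
    ]

test_cases = derive_test_cases(ERROR_CAUSES)
for message, cause in test_cases:
    print(f"Trigger '{cause}' and expect the message: {message}")
```

Keeping the catalog as data makes gaps visible: an error message with an empty list of causes is one that nobody yet knows how to trigger.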

Degraded Mode Of Operation Pattern

Systems are designed to use a given set of resources, such as hardware, networks and databases. Their users expect many systems to provide ongoing service, even at reduced rates of performance and capacity, when not all the resources are working (e.g., a database is unavailable). The purpose of degraded mode testing is to determine whether the system can still provide the reduced level of service as expected.

An example of a degraded mode test is to deliberately power down an application server in a server cluster with redundant application servers and attempt to continue normal operation.

Fault Injection Pattern

Software fault injection is a specialized type of design for testability, to provide the testers with the capability to easily and safely trigger or simulate system errors which otherwise might be difficult to observe in the test lab but which nevertheless may happen in the real world.
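One simple way to build such a capability is a test hook which the system consults at chosen points and which the tester can arm on demand. The sketch below is a minimal illustration. The `FaultInjector` class, the "disk_write" injection point and the recovery behavior are all assumptions made for the example.

```python
class FaultInjector:
    """Hypothetical test hook that triggers named faults on demand."""

    def __init__(self):
        self._armed = {}

    def arm(self, point, exception):
        """Arrange for the named injection point to raise `exception` once."""
        self._armed[point] = exception

    def check(self, point):
        """Called by the system at an injection point; raises if armed."""
        exception = self._armed.pop(point, None)
        if exception is not None:
            raise exception

injector = FaultInjector()

def save_record(record, store):
    injector.check("disk_write")  # injection point; normally does nothing
    store.append(record)
    return "saved"

def save_with_recovery(record, store):
    """The behavior under test: a failed write must not lose the request."""
    try:
        return save_record(record, store)
    except OSError:
        return "queued for retry"

store = []
injector.arm("disk_write", OSError("simulated disk full"))
first = save_with_recovery("order-1", store)   # the fault fires here
second = save_with_recovery("order-2", store)  # the fault is now disarmed
print(first, second, store)
```

Because the hook does nothing unless it is armed, it can stay in the code with little risk, and a disk-full condition that is awkward to stage in the lab becomes a one-line test setup.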

Just because these circumstances are unlikely to occur in live operation does not mean they should not be tested. If the consequences of these weird, once-in-a-blue-moon circumstances could be catastrophic, they deserve attention from the testers.

Despite the similarity of names, software fault injection is different from software fault insertion, which is a way of assessing test effectiveness by deliberately inserting errors into systems in an experimental mode.

PART III:

Patterns That Trigger Measurable Behavior

Response Time Pattern

This testing measures how long the system takes to complete a task or group of tasks. It usually represents the user viewpoint, i.e., we measure the likely delay as perceived by an external user. We also can measure the efficiency of an internal software activity or hardware component that is not directly accessible by the user. Response time is the total end-to-end elapsed time, which includes wait time in a queue prior to processing and service time (the actual time to process the request for service). Wait time and processing time both can vary, and may be affected by different factors.
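The split between wait time and service time can be illustrated with a toy model of a single server which handles requests in arrival order. This is a sketch under simplifying assumptions (one server, first-in first-out, known service times), and the function name is hypothetical.

```python
def response_times(arrivals, service_times):
    """Return a (wait, service, response) tuple for each request.

    Models one server that processes requests first-in, first-out.
    """
    results = []
    server_free_at = 0.0
    for arrival, service in zip(arrivals, service_times):
        start = max(arrival, server_free_at)  # queue if the server is busy
        wait = start - arrival
        server_free_at = start + service
        results.append((wait, service, wait + service))
    return results

# Three requests arrive one second apart; each needs two seconds of service.
timings = response_times([0.0, 1.0, 2.0], [2.0, 2.0, 2.0])
for wait, service, response in timings:
    print(f"wait {wait:.1f}s + service {service:.1f}s = response {response:.1f}s")
```

The service time is identical for all three requests, yet the response time doubles from the first request to the third, entirely because of queuing.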

Throughput Pattern

Throughput testing measures how much traffic passes through a system within a specified period of time and under a specified load. The test load may be light, average, heavy or vary over time. We can measure throughput in megabits per second, events (database queries, requests or transactions) per second, or another metric.

The selection of the units of measure for throughput can influence the test results. For example, let’s say that the test objective is to rank a group of competing servers from best to worst in terms of throughput. Their rankings may be different if we measure the throughput in megabits per second rather than in events per second.
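A small worked example shows how a ranking can flip with the unit of measure. The two servers and their figures below are invented for illustration.

```python
# Hypothetical measurements: (events per second, average bytes per event).
servers = {
    "server X": (1000, 500),   # many small events
    "server Y": (400, 2000),   # fewer, larger events
}

def megabits_per_second(events_per_second, bytes_per_event):
    return events_per_second * bytes_per_event * 8 / 1_000_000

by_events = sorted(servers, key=lambda name: servers[name][0], reverse=True)
by_megabits = sorted(
    servers, key=lambda name: megabits_per_second(*servers[name]), reverse=True
)

print("Ranked by events per second:  ", by_events)
print("Ranked by megabits per second:", by_megabits)
```

Server X leads on events per second (1000 against 400), while server Y leads on megabits per second (6.4 against 4.0), so the test plan needs to state which unit matters for the decision at hand.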

The recorded throughput also depends on where in a system we count the bits or the events – the volumes of events usually are not the same at each internal point. We can count the throughput in a network as the amount of traffic which originates from, is received at, or passes an internal point within a given period of time. In a simple situation where one specific input triggers each output, the count of the output traffic received at the destination is exactly the same as the count of the input traffic. But this ratio can be less than 1 to 1, within a given period of time, if there are bottlenecks and inefficiencies within the system, or can be higher than 1 to 1 if a single stimulus triggers the broadcast of multiple messages.

There also may be questions about what to count and how to count it. For example, let’s say that a Web server has an ongoing, low-level flow of administrative management messages and error messages, as well as the “real” traffic, namely the requests from visitors to the Web site which this server supports. The measurers will need to decide which of these traffic categories to include in their throughput counts.

Availability Pattern

Availability is the percentage of uptime for a system or component, so testing availability is essentially a process of recording when the system is up or down, under both typical and stress working conditions.

Availability measures can either include or exclude planned downtime, leading to apples and oranges comparisons. Another complication in measuring availability is that many systems can operate in a degraded mode if the need arises, e.g., if part of a network is down the other parts will still function. During this degraded mode, some users may experience limited availability.
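The apples and oranges problem is easy to see with numbers. The sketch below computes availability both ways for a hypothetical 30-day month (720 hours) with 2 hours of unplanned outage and 8 hours of planned maintenance. The function name and figures are assumptions for illustration.

```python
def availability(total_hours, unplanned_down, planned_down, count_planned):
    """Return availability as a percentage of the measured period."""
    if count_planned:
        # Planned maintenance counts against the system.
        uptime = total_hours - unplanned_down - planned_down
        return 100.0 * uptime / total_hours
    # Planned maintenance is excluded from the measured period entirely.
    scheduled = total_hours - planned_down
    return 100.0 * (scheduled - unplanned_down) / scheduled

including = availability(720, 2, 8, count_planned=True)
excluding = availability(720, 2, 8, count_planned=False)
print(f"Including planned downtime: {including:.2f}%")
print(f"Excluding planned downtime: {excluding:.2f}%")
```

The same month of operation yields roughly 98.6 percent by one definition and 99.7 percent by the other, which is why an availability figure is only meaningful alongside its definition.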

Resource Utilization Pattern

Monitoring the levels of utilization of system resources provides insights into how the system works (which may not be the same as how its designers think it works). It helps to identify bottlenecks, assess spare capacity and the potential for scalability, and find ways to improve the efficiency of the system.

The resources and events which are monitored can include processor activity, use of cache memory and hard disk accesses, I/O traffic, page swaps, lengths of queues, overflows, number of ports which are busy, network bandwidth utilization, and number of concurrent software threads or processes which are running.

Monitoring the resource utilization means we need access to the system logs which are recorded by the operating system, network management system, and database management system. Plus, and this is an important plus, we need to know how to read these logs. Often the entries in these logs are so voluminous that it’s a good idea to use software tools to edit, extract and summarize the meaningful information.

Although they are voluminous, these logs generally do not provide everything we need. In addition, we may need home-built or third-party plug-in tools to place probes into the system under test and gather the data, hopefully without materially changing the system’s performance and robustness characteristics.

Testability Pattern

As systems become more complex, it becomes more difficult and eventually impossible to test them adequately unless they have been specifically designed to be testable. Much of a system’s behavior may be hidden and not directly observable from the outside, which severely limits the effectiveness of non-invasive black-box testing. For example, an internal buffer overflow may be extremely difficult to observe in testing or in live operation, unless a capability has deliberately been designed into the system to provide this information.

To be testable, a system has to be (a) observable and (b) controllable. A system is relatively easy to observe if the outputs from that system are dependent only on the inputs, regardless of the internal state of the system or the state of its supporting infrastructure. But it’s not easy to test without monitoring the internal behavior of the system, if the outputs are dependent not just on the inputs but also on hidden, transient internal states of the system.

Designing systems for testability is often not done very well. In small, simple systems, the system architecture is fairly obvious to the test professionals, and there is a ready availability of access points to observe the internal states of the system. It is in large, complex systems that designing for testability becomes more important and also, unfortunately, much more difficult.

Usually the main problem is one of communications. Designing for testability requires a solid gray-box understanding of the system. With many large systems, the test & QA professionals do not understand the intricacies of the system architecture. This happens because the systems architects have not adequately tutored the test & QA people, so that they do not know how to exploit the gray-box perspective in testing.

The system architects may know the system but are not focused on its testability because they do not have the perspective of the test professionals. Testability is often a side issue or an afterthought, if it is considered by the architects at all. The test & QA professionals need to train and show the system architects what features need to be built into the system to make it testable.

A worse situation occurs when nobody understands the system architecture. Sometimes the system we are testing is an acquired product, where the testers do not have access to the designers, or the system is integrated from many disparate sources with no overall architect. Or the system is not new – the original architects may no longer be available, the system may have been patched heavily and the architecture “muddied” over the years.

Capacity Forecasting Pattern

Capacity is the ability of a system to grow or to support an additional work load without degrading performance to an unacceptable degree.

This type of testing aims to measure whether the allocated resources are sufficient for the job, how much spare capacity still remains in the system for further growth of demand, and at what point in the growth the resources supporting the system will need to be upgraded. This is the point at which the response time or throughput becomes unacceptable as the demand grows.

Performance is mostly non-linear; as the load increases, the response time may not increase at all because the system has ample spare capacity, or may instead rapidly approach infinity when the system nears the point of saturation. So we usually need to test with normal and peak loads, and with overloads.

Non-linearity complicates the forecasting: a 10 gallon bucket when partly filled with let’s say seven gallons of liquid still has the spare capacity to hold another three gallons. Since system performance can degrade as utilization increases, effectively a half-full bucket may be considered to have no spare capacity.
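A standard textbook model makes the shape of this curve concrete. Assuming, purely for illustration, that the system behaves like a single-server queue with random arrivals (the M/M/1 model), the mean response time is the service time divided by one minus the utilization. Real systems will deviate from this model, so treat the numbers as indicative only.

```python
def mm1_response_time(service_time, utilization):
    """Mean response time of a single-server (M/M/1) queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be at least 0 and below 1")
    return service_time / (1.0 - utilization)

service_time = 0.1  # hypothetical seconds per request with no queuing
for utilization in (0.10, 0.50, 0.70, 0.90, 0.99):
    seconds = mm1_response_time(service_time, utilization)
    print(f"{utilization:.0%} busy -> {seconds:.2f} s mean response time")
```

Under this model the response time has already doubled at 50 percent utilization and is ten times the bare service time at 90 percent, which is the sense in which a half-full bucket may have no usable spare capacity.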

For example, let’s say a system provides acceptable response times under a given work load, but the CPU and the computer’s semiconductor memory happen to be fully utilized at this load. In this situation, there is no remaining capacity for any minor increase in load.

We typically perform capacity testing, and testing for related interests such as scalability, by steadily increasing the work load on the system and measuring the performance at each level of load, until we reach the point where the performance becomes unacceptable.
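Once the measurements from such a stepped test are in hand, finding the capacity point is a simple scan. The load levels, response times and half-second threshold below are invented for illustration.

```python
# Hypothetical results of a stepped load test:
# load level (concurrent users) -> measured response time (seconds).
measured = {100: 0.21, 200: 0.22, 300: 0.26, 400: 0.41, 500: 1.90, 600: 9.50}

def highest_acceptable_load(results, threshold):
    """Return the last load level before response time exceeds the threshold."""
    last_ok = None
    for load in sorted(results):
        if results[load] > threshold:
            break
        last_ok = load
    return last_ok

capacity = highest_acceptable_load(measured, threshold=0.5)
print(f"Highest acceptable load: {capacity} users")
```

In this made-up data the knee of the curve lies between 400 and 500 users, so a follow-up run with smaller steps in that interval would locate the limit more precisely.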

We typically measure the capacity of processors in MHz or GHz, which we also refer to as the processor speed, or as the number of calculations performed per second. The practical or real processor capacity is usually somewhat less than the rated or theoretical capacity because of the overhead of the operating system. In well-tuned systems with an appropriate level of resources, the system designers typically target the processor utilization to be within the range of 40 to 65 percent of the rated capacity.

We usually measure the capacity of databases in MB or GB. The practical or real database capacity is usually somewhat less than the rated or theoretical capacity, because of overheads and design limitations such as index files, links between records, and overflow buffers. Often databases, especially if they have not been de-fragmented, begin to provide unacceptable performance when they are only about two-thirds full according to their rated capacities.

We can measure the capacity of a network, which is also called bandwidth, in bits per second, packets per second, Erlangs or other units. The Erlang is a calculated, dimensionless measure of traffic intensity. (A. K. Erlang, for whom the unit is named, is considered to be the father of modern queuing theory.)
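Traffic intensity in Erlangs is the arrival rate multiplied by the mean holding time, with both expressed in the same time unit. A quick calculation with invented figures:

```python
def traffic_intensity_erlangs(arrivals_per_hour, mean_holding_minutes):
    """Offered traffic in Erlangs: arrival rate times mean holding time."""
    return arrivals_per_hour * (mean_holding_minutes / 60.0)

# Hypothetical workload: 30 calls per hour, each lasting 6 minutes on average.
offered = traffic_intensity_erlangs(30, 6)
print(f"Offered traffic: {offered:.1f} Erlangs")
```

Three Erlangs means that, on average, three circuits (or sessions, or connections) are busy at any moment, which gives a starting point for sizing the resource pool.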

The practical or real capacity is usually somewhat less than the rated or theoretical capacity, because of overheads and design limitations. For example, in local area networks that use CSMA/CD technology (carrier sense multiple access with collision detection), the practical capacity is usually only one-third to one-half of the rated or official capacity.

Measurement of Delays Pattern

To be able to measure response times for particular events, we need to assume a straightforward cause-and-effect relationship: this stimulus triggers that outcome. In this situation, we can easily identify the stimulus for each outcome, and we can measure the delay from this particular stimulus to its particular outcome.

If we cannot easily link the system outcomes to the stimuli, though, we need a more elaborate measurement (and model) of system behavior than the end-to-end response time. Consider a situation, for example, where the system logs and stores a stream of events in a file, but takes no action until the accumulated number of events reaches a certain threshold.

We could reach this threshold within seconds or not for several weeks, depending on the work conditions. In this situation, we may be interested in three elapsed-time numbers: (1) from the first event to the observable outcome, (2) from the very last event to the outcome, and (3) a typical response time, from the median event to the outcome.
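Given the timestamps of the accumulated events and of the eventual outcome, the three numbers fall out directly. The timestamps below are hypothetical, in seconds from the start of the test.

```python
import statistics

def batch_delays(event_times, outcome_time):
    """Return the three elapsed-time measures for a threshold-triggered outcome."""
    events = sorted(event_times)
    return {
        "first_to_outcome": outcome_time - events[0],
        "last_to_outcome": outcome_time - events[-1],
        "median_to_outcome": outcome_time - statistics.median(events),
    }

# Five events accumulate; the system acts at t=45 once the threshold is reached.
delays = batch_delays([0, 10, 20, 30, 40], outcome_time=45)
print(delays)
```

Reporting all three avoids the misleading impression which any single figure would give: here the oldest event waited 45 seconds while the newest waited only 5.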

Loss Measurement Pattern

In networks especially, losses are a way of life. In analog networks, signals can attenuate (weaken) and their wave shapes become corrupted. In a congested switch, blocking may cause a loss – all ports or connections into the switch are already busy, and the system simply drops an incoming message when the input hopper (buffer) is already full. In packet-switched digital networks, a data packet can be lost in transit. In the Internet, for example, a data packet is deliberately killed when the number of hops which the packet takes from node to node (i.e., from router to router) exceeds a threshold (usually 16 hops), in order to prevent packets from endlessly circulating within the Internet and royally gumming up the works. Elaborate facilities have been designed into the Internet to take care of these packet losses.

Despite the fact that losses are considered routine, a high rate of loss (usually anything more than a few percent) is unattractive. Every lost packet in the Internet generates at least two more messages: a request back to the point of origin to re-send the lost packet, and the re-sending of that packet.

The rate of loss tends to have a major impact on performance. A NASA study found that a 3 percent loss of data packets in the Internet leads to a 50 percent degradation in throughput.

Error Rate Measurement Pattern

Let’s say that the response to a database query happens in 0.1 second, but this response says: “Database not available”. Or that a Web service can handle 10,000 users simultaneously, but 500 of these users receive error messages. Fast response time and high throughput are irrelevant if the user can’t do his job.
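The arithmetic behind this point is simple but worth making explicit in test reports. In the sketch below, the 10,000 users and 500 errors come from the example above, while the raw rate of 200 requests per second is an invented figure.

```python
def error_rate_percent(total_requests, failed_requests):
    return 100.0 * failed_requests / total_requests

def successful_per_second(requests_per_second, total_requests, failed_requests):
    """Throughput counting only the requests that did useful work."""
    succeeded = total_requests - failed_requests
    return requests_per_second * succeeded / total_requests

rate = error_rate_percent(10_000, 500)
useful = successful_per_second(200.0, 10_000, 500)
print(f"Error rate: {rate:.1f}%  Useful throughput: {useful:.1f} requests/s")
```

Reporting the useful throughput next to the raw figure keeps a fast but failing system from looking better than it is.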

Since this type of testing counts the incidence of errors or failures, we need a catalog of errors. Some lists contain items relevant to system and network administrators, such as “race conditions: timing out of sync.”, “memory leaks”, “page locking”, and “processor saturation,” but which are meaningless to the end users.

We also need a user-centric list of errors. We need to distinguish between symptoms of failure and causes of failure. Here we will be focusing almost exclusively on the symptoms, not the underlying causes, and only on those symptoms which the end users see and which are meaningful to them. Most of the items on the user error list are not catastrophic errors (“dark screen,” “dead keyboard”), but annoyances to which the system administrators may be oblivious.

The next question for the test designer is how to observe and capture these user errors. One way is for knowledgeable users to manually test and evaluate the results. Another is to use automated feature-level testing tools. Both have strengths and limitations, so it’s generally a good idea to employ a mix of manual and automated error detection.

Component-Specific Test Pattern

This type of testing examines the robustness of one system component (or sub-assembly).

It can be done as soon as the component is ready, before other components are built and well before the fully integrated system is ready for testing. By examining the behavior of one component in isolation, this testing makes it easier to isolate and pinpoint problems. And component bugs which are found earlier can also be eliminated earlier, improving the initial quality of the fully integrated system when it is delivered for testing.

Component-specific testing may require component test drivers, which can be expensive to build.

Calibration or Settings Measurement Pattern

This type of testing is interleaved with tuning, and its purpose is to provide feedback on the consequences of each iteration of tuning.

The test work load is kept the same, and typically the testers strive for exact repeatability of the test run from iteration to iteration of tuning.

Scalability Pattern

This type of testing investigates a system’s ability to grow. Growth can occur in several ways, which we may need to separately test: increase in the total load; increase in the number of concurrent users; increase in the size of a database; increase in the number of devices connected to a network, and so on.

We can test systems for their ability to scale down as well as up. For example, we may be interested in this question: can the software run adequately on a cheaper, slower processor or with less memory?

Compatibility And Configuration Pattern

This method considers the various configurations in which a system can be used, and how to check for compatibility or consistent behavior across these configurations.

Risk Prioritization Pattern

This method uses a risk assessment to identify and prioritize the likely risks which the system faces in live operation. We use this risk assessment to allocate test resources to the various aspects of the system, i.e., to focus the test effort on the areas which most need depth and intensity.

Failure Modes And Effects Analysis (FMEA) Pattern

One of the main objectives of a stress or robustness test is to see if we can make the system fail within the relatively safe and controlled confines of the test lab, in order to observe the conditions under which the system fails, how it fails (what happens), and whether it recovers in an acceptable manner.

Systems can fail in many ways, ranging from relatively minor ways such as dropping transactions or failing to give appropriate and timely warnings, to catastrophic interruptions of service. We care about how a system fails for two reasons: (1) its behavior in failure may vary based on how it failed, and (2) it is useful to know how the system might fail, in order to know how to cause it to fail in testing. A list of the ways in which a system can fail is a useful source of test cases.

The number of ways in which a system can fail also becomes higher if we broaden the interpretation of the word “failure” to mean not only catastrophic termination of services but also unacceptable service, such as failure to meet a service level agreement (SLA). In developing the robustness test strategy, we have to take a broad view of the system’s possible modes of failure, and not simply test the obvious.

For example, consider a non-essential server which occasionally crashes, but never with any data loss or corruption, and we can always re-start the system with minimal fuss and delay. We’d view the crash more as an inconvenience than as a catastrophe.

By contrast, many users and system administrators do not see the presence of incorrect data values in a database as a failure. Creeping database corruption without a crash has more impact on system quality than losing the non-essential server, though, in part because the consequences of data corruption are less immediately obvious.

In any type of risk-based testing, the testers also can use a list of potential problems (i.e., modes of failure) as an important method to focus the testing efforts. If we can’t test everything, we should concentrate on the most significant opportunities: those risks which have the greatest exposure, that is, the likelihood of the failure occurring multiplied by the consequences if it does happen.
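Ranking by exposure can be sketched as follows. The failure modes echo the examples above, but the likelihoods and costs are invented figures which a real project would replace with its own estimates.

```python
# Hypothetical failure modes: (name, likelihood per year, cost if it happens).
failure_modes = [
    ("non-essential server crash", 0.30, 1_000),
    ("creeping database corruption", 0.05, 200_000),
    ("missed SLA response time", 0.20, 20_000),
]

def rank_by_exposure(modes):
    """Return (name, exposure) pairs, highest exposure first."""
    scored = [(name, likelihood * cost) for name, likelihood, cost in modes]
    return sorted(scored, key=lambda item: item[1], reverse=True)

ranking = rank_by_exposure(failure_modes)
for name, exposure in ranking:
    print(f"{name}: exposure {exposure:,.0f}")
```

With these figures the least likely failure, database corruption, carries the greatest exposure, while the most frequent one, the server crash, carries the least. Frequency alone is a poor guide to where the test effort should go.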

About Authors

Ross Collard
As an unusually experienced and accomplished consultant and self-proclaimed software quality guru, Ross says he functions best as a trusted senior advisor in information technology. The founder in 1980 of Collard & Com...
