NECESSARY AND SUFFICIENT – LET’S THINK ABOUT THOSE TERMS FOR A MOMENT

We use the words necessary and sufficient almost every day – but they have a specific meaning in evaluation, and play an important role in Impact Evaluation.

According to Dictionary.com:

  • Necessary:  being essential, indispensable or requisite; and
  • Sufficient:  adequate for the purpose, enough.

These absolutely hold true in evaluation nomenclature as well… but let’s take a closer look.

When we undertake an Impact Evaluation, we are looking to verify causality.  We want to know the extent to which the program caused the impacts or outcomes we observed.  The determination of causality is the essential element of all Impact Evaluations, as they not only measure or describe changes, but seek to understand the mechanisms that produced those changes.

This is where the words necessary and sufficient play an important role.

Imagine a scenario where your organisation delivers a skill-building program, and the participants who successfully complete your program have demonstrably improved their skills.  Amazing – that’s the outcome we want!

But, can we assume that the program delivered by your organisation caused the improvement in skills? 

Some members of the team are very confident – ‘yep, our program is great, we’ve received lots of comments from participants that they couldn’t have done it without the program.  It was the only thing that helped’.  Let’s call them Group 1.

Others in the team think the program definitely had something to do with the observed success, but they think the confidence-building program the organisation ran last year also played a part, and that the two programs build on each other.  We’ll call them Group 2.

Some others in the team think the program definitely helped people build their skills, but they’re also aware of other programs, delivered by other organisations, that have achieved similar outcomes.  Let’s call them Group 3.

Who is correct?  The particular strategies deployed within an Impact Evaluation will help determine this for us, but hopefully you can start to see an important role for the words necessary and sufficient.

  • Group 1 would assert that the program is necessary and sufficient to produce the outcome.  Their program, and only their program, can produce the outcome.
  • Group 2 would assert that the program is necessary, but not sufficient on its own, to cause the outcome.  Paired with the confidence-building program, the two together might be considered the cause of the impact.
  • Group 3 would claim that their program isn’t necessary, but is sufficient to cause the outcome.  It would seem there could be a few programs that could achieve the same results, so whilst their program might be effective, others are too.
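
To make the distinction concrete, here is a minimal sketch with entirely invented cases and hypothetical field names: ‘necessary’ is treated as ‘the outcome never occurs without the program’, and ‘sufficient’ as ‘whenever the program is present, the outcome occurs’.

```python
# A minimal, hypothetical sketch of the necessary/sufficient distinction.
# Each "case" records which programs a participant completed and whether
# the outcome (improved skills) was observed. All data are invented.

cases = [
    {"skills_program": True,  "confidence_program": True,  "outcome": True},
    {"skills_program": True,  "confidence_program": False, "outcome": False},
    {"skills_program": False, "confidence_program": True,  "outcome": False},
    {"skills_program": False, "confidence_program": False, "outcome": False},
]

def is_necessary(cases, condition):
    # Necessary: the outcome never occurs without the condition.
    return all(case[condition] for case in cases if case["outcome"])

def is_sufficient(cases, condition):
    # Sufficient: whenever the condition is present, the outcome occurs.
    return all(case["outcome"] for case in cases if case[condition])

print(is_necessary(cases, "skills_program"))   # True  - no improvement without it
print(is_sufficient(cases, "skills_program"))  # False - on its own, not enough
```

With these invented cases, the skill-building program comes out as necessary but not sufficient on its own – which is Group 2’s position.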

Patricia Rogers has developed a simple graphic depicting the different types of causality – sole causal, joint causal and multiple causal attribution. 

Sole causal attribution is pretty rare, and wouldn’t usually be the model we would propose is at play.  But a joint causal or multiple causal model can usually explain causality. 

Do you think about the terms necessary and sufficient a little differently now? Whilst we use them almost every day, when talking causality, they are very carefully and purposefully selected words – they really do mean what they mean.

CLARITY OF PURPOSE IS SO IMPORTANT

Everything always comes back to purpose.

Have you been part of evaluations, where 6-12 months in, you’re starting to uncover some really important learnings… but you can’t quite recall exactly what you set out to explore when you started, and now you’re overwhelmed with choices about what to do with what you’ve learned… and sometimes you don’t end up doing anything with the learnings?

Or perhaps the opposite, where 6-12 months in, the learnings that are starting to emerge are really not meeting your expectations, and you’re wondering if this whole evaluation thing was a waste of time and resources?

Whilst there are many types of evaluations, one evaluation cannot evaluate everything.  A good evaluation is purposely designed to answer the specific questions of the intended users, to ensure it can be utilised for its intended use.  It’s critically important to ensure the evaluation, and all those involved in it, remain clear about its intended use by intended users.

A simple taxonomy that I find helpful is one proposed by Huey T. Chen (originally presented in 1996, and later adapted in his 2015 Practical Program Evaluation).

Chen’s framework acknowledges that evaluations tend to have two main purposes or functions – a constructive function, with a view to making improvements to a program; and a conclusive function, where an overall judgement of the program’s merit is formed.  He also noted that evaluations can be split across program stages – the process phase, where the focus is on implementation; and the outcome phase, where the focus is on the impact the program has had.

The four domains are shown below:

  • Constructive process evaluation – provides information about the relative strengths and weaknesses of the program’s implementation, with the purpose of program improvement.
  • Conclusive process evaluation – judges the success of program implementation, e.g. whether the target population was reached, or whether the program has been accepted and embedded as business as usual (BAU).
  • Constructive outcome evaluation – explores various program elements in an effort to understand if and how they are contributing to outcomes, with the purpose of program improvement.
  • Conclusive outcome evaluation – provides an overall assessment of the merit or worth of the program.
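
Purely as an illustration (and not Chen’s own notation), you could keep the matrix handy as a tiny lookup keyed by purpose and program stage – something like this sketch:

```python
# A hypothetical way to keep Chen's matrix in view when scoping work;
# the cell wording simply paraphrases the four domains listed above.
chen_matrix = {
    ("constructive", "process"): "strengths and weaknesses of implementation, for improvement",
    ("conclusive", "process"): "judge implementation success (reach, acceptance, embedded as BAU)",
    ("constructive", "outcome"): "how program elements contribute to outcomes, for improvement",
    ("conclusive", "outcome"): "overall assessment of the program's merit or worth",
}

def evaluation_focus(function, phase):
    """Return the focus for a given purpose (function) and program stage (phase)."""
    return chen_matrix[(function, phase)]

print(evaluation_focus("conclusive", "outcome"))
```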

This simple matrix can serve to remind us of the purpose of the particular evaluation work we are doing at any given time.  There are of course nuances – you may have an evaluation that spans neighbouring domains, or transitions from one domain to another – but despite its simplicity, I have found it a useful tool to remind me about the focus of the current piece of work or line of enquiry.

ARE WE TRACKING THE RIGHT INDICATORS?

Not all indicators are created equal, and if you’ve ended up with less-than-optimal ones – they’ll be a constant thorn in your side. 

Indicators tell us the state of something – but there are two critical elements to that statement that need to be clearly defined.  The ‘us’ and the ‘something’. 

Let’s start with the ‘something’.

Good indicators should tell us the extent to which we are achieving our objective or goal – this is the ‘something’.  Good indicators are derived from selecting the best evidence that would indicate that we are achieving our objective or goal.  Let’s use a pretty common objective as an example – Improved client satisfaction.  What have been some indicators your organisation has tracked to give you an indication that this objective is on track or heading in the right direction?  How about numbers of complaints or did-not-attend rates or program completion rates?  Do these give you the best evidence of improved client satisfaction?  Whilst they might be loosely linked, they do not give you the best evidence of the extent to which you are improving client satisfaction. 

  • Tracking the number of complaints is flawed, as is tracking any spontaneously provided feedback – positive or negative – because it is biased.  Only a subset of the population is motivated to spontaneously provide feedback about their experience.  It is not a true representation of client satisfaction for people who access your program… therefore, not the best evidence.
  • Monitoring did-not-attend rates might give you some indication of client satisfaction, but people may not attend for a host of other reasons, which may speak more to the appropriateness or accessibility of your program, and not so much about its quality, or how satisfied clients are with it.  Maybe you run a weekly group program, and part way through the program, you make a change to the start time from 10:00am to 9:00am, and subsequent to this, two of the 10 participants do not attend for two weeks straight.  Their non-attendance could be due to poor satisfaction, or it could just as well be due to accessibility issues, as the earlier time clashes with school drop-off.  The converse could also be true, where improved attendance may not necessarily indicate improved satisfaction, but rather could be due to other circumstances that have changed in the person’s life.
  • Using program completion rates to give an indication of satisfaction is similarly flawed.  It could just as well give you an indication of accessibility or appropriateness – not necessarily satisfaction.

Good indicators need to convince us that we are achieving our objective, by presenting evidence that is as indisputable as possible. It can’t just be loosely aligned – and this is where the ‘us’ comes in.

For indicators to convince ‘us’, we need to be the ones determining the best evidence.  The best evidence for improved client satisfaction, or improved outcomes, or improved performance in any area needs to be determined by the people who will use the indicator to inform a judgement or decision, and obviously should take into consideration the person most impacted by the decision.

Let’s use another common example – improved client outcomes.  The evidence that will convince a funder that clients are experiencing improved outcomes might be changes in a standard outcome measure (e.g. the widely used K10), whereas the evidence that would convince a front-line service provider, or more importantly, the client, might be different.  We need to imagine what success looks like – what would we see; what would we hear; how would people be different; what would have changed?  These tangible things, which align with our definition of achieving the objective, should be considered as evidence.
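
As a rough illustration of the ‘change in a standard outcome measure’ idea, the sketch below uses invented K10 scores (the K10 ranges from 10 to 50, with lower scores indicating less distress); the five-point ‘meaningful change’ threshold is purely an assumption for the example, not a clinical standard.

```python
# Invented K10 pre/post scores for three hypothetical clients.
clients = [
    {"id": "A", "k10_pre": 32, "k10_post": 24},
    {"id": "B", "k10_pre": 28, "k10_post": 27},
    {"id": "C", "k10_pre": 36, "k10_post": 29},
]

MEANINGFUL_CHANGE = 5  # illustrative threshold only - not a clinical standard

improved = [c for c in clients if c["k10_pre"] - c["k10_post"] >= MEANINGFUL_CHANGE]
print(f"{len(improved)} of {len(clients)} clients showed a meaningful reduction in K10 score")
```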

The indicators we commit to tracking must be bespoke and fit-for-purpose.  It is a burdensome task collecting, tracking and reporting on indicators – so if they’re not the right ones, it’s a waste of precious time and resources.

DO WE REALLY TARGET OUR TARGET POPULATION?

When designing programs, we usually have a specific target population in mind.  In fact it would be highly unusual if we didn’t.  The target population, after all, is the group of people for whom the program is purposely designed.  Entry into the program is often restricted by certain criteria that help to differentiate between people who are, and are not, the target population. 

Once our program is operational, we actively recruit the target population into our programs, we advise our referrers who the target population is, and is not, and we try to be clear and transparent about this in all our comms.  We may have an alternative pathway for people who are referred to our programs but are not the target population.

Examples of target populations could be:

  • adults experiencing mild psychological distress;
  • young people aged 12-25 experiencing mild to moderate mental illness;
  • adults experiencing two or more chronic illnesses;
  • adults with a BMI of 30 or over;
  • young people who identify as part of the LGBTIQ+ community;
  • people from Culturally and Linguistically Diverse backgrounds;
  • young people disengaged from schooling;
  • people who smoke and reside in a particular geographic area…

Whilst all the above populations are different, it’s helpful to think of target populations as groups defined by a common need profile – people within a target population are connected by a common set of needs.

There may be a range of needs, with people at the lower and upper end of a need profile still being considered part of the target population, but it’s important to acknowledge that there are boundaries to our target population.  These boundaries are critically important, because we have designed our program to target a particular set of needs. 

No matter how amazing your program is, it can never meet everyone’s needs, which is what’s great about the diverse market of health and social service providers we have in Australia – we have a variety of services and service providers, with a variety of practitioner types, skilled in a variety of areas, located in a variety of settings.  Each individual program cannot appropriately and effectively serve every need profile, by design.

So, when we end up with people in our programs who don’t improve as we expect they should – what do we do?  A lot of the time, when we look closely at where programs have failed people, we find that their need profile is not within the range of the target population.  They are beyond either the lower or upper limits.  Maybe their needs were below the lower limit of our target population, and therefore they didn’t really need our program, didn’t find value in it, and therefore didn’t genuinely commit… or maybe their needs were very high, beyond the upper limit of our target population, and the program or service wasn’t comprehensive, intensive or frequent enough to genuinely meet their needs.

There are many reasons why people beyond the range of our target population end up enrolled in our programs, but do we know who they are, and do we take them into consideration when we’re assessing our program’s performance?  When you look at the effectiveness of your program, or its appropriateness or quality as determined by those who use it – do you think it would make a difference if you split your sample into two groups: those in the target population (who have a need profile the program was designed to support), versus those outside the target population?

It’s reasonable to expect, if you have a program that’s well designed, that effectiveness, appropriateness, and quality would be higher for the target population.  If you did discover that your program really wasn’t meeting the needs of people outside your target population – what would you change?  Are we doing people a disservice by enrolling them in our program if the evidence suggests it’s not going to be helpful?  If you discover that your program is actually reasonably effective for people with a more complex need profile than it was originally designed for – perhaps you could expand your scope.  This evidence could be used to lobby for additional funding.
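
If you wanted to try that split, a minimal sketch might look something like this – the field names and scores below are invented, and in practice you would use your own effectiveness, appropriateness or quality measures.

```python
# Invented data: an outcome (or satisfaction) score per participant, plus a
# flag for whether their need profile sits within the target population.
from statistics import mean

participants = [
    {"in_target": True, "outcome_score": 7.5},
    {"in_target": True, "outcome_score": 8.1},
    {"in_target": False, "outcome_score": 4.2},
    {"in_target": False, "outcome_score": 5.0},
]

for flag, label in [(True, "within target population"), (False, "outside target population")]:
    scores = [p["outcome_score"] for p in participants if p["in_target"] is flag]
    print(f"{label}: mean score {mean(scores):.1f} (n={len(scores)})")
```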

Are you targeting your target population?

ARE WE NATURAL EVALUATORS?

Think about the last time you bought a car, chose where to live or decided which breakfast cereal to throw in the trolley (or in the online cart if you’re isolating like we have been for the past few days) … without overtly realising it, you quite possibly followed the logic of evaluation.

For those interested, you can read more about Michael Scriven’s work on the logic of evaluation here, but to paraphrase, evaluative thinking has four key steps.

  1. Establishing criteria of merit
  2. Constructing standards
  3. Assessing performance
  4. Evaluative judgement

Let’s apply this to buying your next car. 

Establishing criteria of merit

What criteria are important to consider in buying a car?  Maybe we have a firm budget, so any car we’re going to consider will need to perform well against that criterion.  Maybe we’re climate conscious, so we will place more value on a car that has a better emission rating.  Maybe our family has just expanded, so we need to find a car that will accommodate a minimum number of people.  Maybe we don’t like white cars, maybe we’re looking for a car with low kms, maybe we’re only interested in cars made by a certain company, because we believe them to be a safe and trustworthy company.

The list of possible meritorious criteria is extensive, and quite dependent on who needs to make the evaluative judgement.  Two people shopping for a car at the same time may have quite different criteria.

Constructing standards

This is where we define how well any potential car needs to perform against the various criteria we identified earlier.  If our budget is a firm $10,000, then any car that comes in over budget is not going to score well against that criterion.  Some of us may even write a list of our must haves and nice to haves, and include our standards in that list.  Maybe our next car must be within our budget, must be able to carry five people, and must be no older than 10 years.  We’d also, if possible, like to find a purple car that’s located within 50km of where we live, so we don’t have to travel too far to collect it, and has had fewer than two previous owners.  We now have a list of criteria that are important to us, and we have set standards against each of those criteria.

Assessing performance

Now’s the point at which we assess how any contenders stack up.  Regardless of whether we’re scrolling through an online marketplace for cars, browsing various websites, or physically walking through car yards – we are assessing how each car we review compares to our defined criteria.  Maybe we find a purple five-seater car within budget, but it’s 200km away.  Maybe we find a blue five-seater car within budget and within 50km, but it’s 15 years old.  In the same way that our programs, policies, products and people don’t perform ideally against all criteria – our potential cars are the same.  We will take note in our heads, or some of us will actually take notes in a notebook or prepare a spreadsheet, where we track how each car performs.  We may find that one or two of the criteria we determined were important to us simply couldn’t be assessed because data wasn’t available.  We should learn from this, and perhaps rethink our criteria and/or standards for next time.

Evaluative judgement

Now we need to make our evaluative judgement, which will inform our decision.  Which car performed the best?  Often not all criteria are equally important, and we apply different weightings.  Maybe we determine that blue is pretty close to purple, so despite the car not being the right colour, it’s still a contender.  We need to determine how we will synthesise all we have learned from our evaluation activities, and make an evaluative judgement. (For those interested in more about how to integrate or synthesise multiple criteria, Jane Davidson has a nice chapter on Synthesis Methodology in her book Evaluation Methodology Basics).
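
For those who do like a spreadsheet (or a script), here is a rough sketch of one simple way to synthesise the car example: treat the must-haves as hard bars, score the nice-to-haves from 0 to 1, weight them, and rank.  All of the names, weights and scores below are invented, and a weighted sum is only one of the synthesis approaches Davidson describes.

```python
# A hypothetical weighted-sum synthesis for the car example.
must_haves = ["within_budget", "seats_five"]            # hard bars: fail one and you're out
weights = {"colour": 0.3, "distance": 0.4, "age": 0.3}  # nice-to-have weightings (sum to 1)

cars = {
    "purple 5-seater, 200km away": {"within_budget": True, "seats_five": True,
                                    "colour": 1.0, "distance": 0.3, "age": 0.8},
    "blue 5-seater, 15 years old": {"within_budget": True, "seats_five": True,
                                    "colour": 0.7, "distance": 1.0, "age": 0.2},
}

def synthesise(car):
    # A car that fails any must-have is excluded, regardless of its other scores.
    if not all(car[m] for m in must_haves):
        return None
    return sum(weights[c] * car[c] for c in weights)

for name, car in cars.items():
    score = synthesise(car)
    if score is None:
        print(f"{name}: excluded (fails a must-have)")
    else:
        print(f"{name}: {score:.2f}")
```

Notice how sensitive the ranking is to the weightings – change the weight on colour or distance and the two contenders can easily swap places, which is exactly why the synthesis step deserves deliberate thought.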

We often work through these steps fairly quickly in our heads, especially if the decision is about what cereal to buy, or what to have for dinner.  Are all the kids going to be home? Does one child have a particular allergy or dietary preference? Is the budget tighter this week because we needed to spend extra on fuel?  We might not go to the effort of making a spreadsheet, or getting evaluation support to make these decisions… but we do employ some logical evaluative thinking more often than we realise.

NEW EVALUATION RESOURCE AVAILABLE

Late last year (December 2021), the Commonwealth Government released a swanky new evaluation resource, complete with a bunch of useful information, resources and templates to help guide evaluation planning, implementation and use.

“The guide is designed to help Commonwealth entities and companies meet existing requirements. It provides a principles-based approach for the conduct of evaluations across the Commonwealth, and can be used to help plan how government programs and activities will be evaluated across the policy cycle in line with better practice approaches.”

The Guide comprises the Commonwealth’s Evaluation Policy, together with an Evaluation Toolkit, inclusive of a swag of templates, completed exemplars, case studies and other frameworks and tools to support evaluation activities.

Check it out – maybe there’s something that might be of use for your next evaluation activity.

WHAT’S NOT WORKING – THE INTERVENTION, OR IMPLEMENTATION OF IT?

I watched a really informative webinar on applying implementation science to evaluation a few years back that really struck a chord with me.  The facilitator, Dr Jessica Hateley-Brown, walked participants through the foundations and key concepts of implementation science, including the Consolidated Framework for Implementation Research (CFIR) – which, if you get the chance, is definitely worth digging into a little more… but it was a little snippet on the differentiation between intervention failure and implementation failure that blew my mind.  In hindsight, it’s still a little embarrassing that I hadn’t understood it so clearly prior to this, but I guess sometimes we can’t see the forest for the trees.

Having spent many years delivering services as a front-line clinician, and then managing and commissioning services from a bit further afield, hearing the difference between intervention failure and implementation failure explained so plainly gave me a language I could finally use for something I hadn’t been able to put into words.  I had lived and breathed the difference between intervention failure and implementation failure so many times – but I’d never thought about it so simply.

The concept of implementation outcomes – which are the result of deliberate actions and strategies to implement our interventions – was not new to me, and won’t be new to most of you.  We often collect data about the implementation of our services… but we don’t review it and use it as much as we should.  Implementation outcomes give us an indication of the quality of the implementation of our program or intervention – things like reach, acceptability, fidelity, cost and sustainability.  These things don’t tell us anything about the service outcomes or client outcomes.  They are the outcomes of successful implementation – if we’re lucky.  Hopefully the quality of implementation is high, which lays the foundations for achieving our desired service outcomes and ultimately the client outcomes.

Service outcomes give us an indication of the quality of the service.  This might include things like safety, person-centredness, timeliness, and satisfaction.  These things don’t tell us about the quality of implementation, nor about any outcomes experienced by the client. 

And finally, client outcomes are the ultimate outcomes we are hoping clients experience – and might look like changes in knowledge, skillset, confidence, wellbeing, functioning or health status.

The outcomes of implementation are distinct from service outcomes, which are distinct from client outcomes.  Obvious yet mind-blowing at the same time!! 
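
If it helps, the distinction can be captured as simply as a list of which outcomes sit at which level – the groupings in this sketch just restate the examples above, and your own program’s outcomes will of course differ.

```python
# Outcome examples from the text, grouped by level.
outcome_levels = {
    "implementation": ["reach", "acceptability", "fidelity", "cost", "sustainability"],
    "service": ["safety", "person-centredness", "timeliness", "satisfaction"],
    "client": ["knowledge", "skills", "confidence", "wellbeing", "functioning", "health status"],
}

def level_of(outcome):
    """Return which level an outcome sits at, or None if it isn't listed."""
    for level, outcomes in outcome_levels.items():
        if outcome in outcomes:
            return level
    return None

print(level_of("fidelity"))      # implementation
print(level_of("satisfaction"))  # service
```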

As front-line staff working with people and implementing programs every day would well be aware, program implementation is dynamic.  Of course there’s a service or operating model guiding practice, but minor adjustments are made often to accommodate people, or meet people where they’re at.  We may learn that some staff are better suited to certain tasks, or some clients are more engaged on certain days of the week.  Noticing these things, and making adjustments, can have significant effects on the reach or acceptability of our programs.  It’s an early step towards the client outcomes we are hoping eventuate.

But sometimes… programs don’t work.  The people we are working with don’t experience the outcomes both they and the provider were hoping for.  Is it the intervention that failed, or did we fail in the implementation of it? 

Maybe staff weren’t trained adequately to deliver the program; maybe the program was new and never fully embraced by the organisation; maybe the workplace had poor communication channels; maybe the program was seen as an add-on to existing work, and attitudes towards it were negative.  All of these things will likely affect implementation quality.  In some situations, it might be that the program or intervention never really got a chance, and was deemed ineffective and phased out… when in fact it was poor implementation – implementation failure – that caused its demise.

When thinking about the programs you deliver, support or manage – can you articulate the outcomes of successful implementation, as distinct from service outcomes and client outcomes?  It might be a useful task to undertake as a team.  Of course, some programs or interventions are flawed in their design… but in many cases, failure to achieve client outcomes is not due to intervention failure alone – it could be partially, or fully, the result of implementation failure.

WHAT’S MORE IMPORTANT – FINANCIAL STABILITY OR POSITIVE IMPACT?

I guess depending on your upbringing, your educational background, or the role you play in an organisation – you may pick one option more easily over the other.

Obviously I feel more strongly about the impact you or your organisation has.  What does it matter if you’re in a good position financially if you’re not achieving great things? 

But, the converse is also not ideal – achieving great things, but with the very real likelihood that you’ll lose great staff because of poor job security, or needing to let people go because you simply cannot fit them into the budget.  Before too long, this starts impacting the quality and reach of your programs… and then all of a sudden, you’re not achieving great things anymore.  You really can’t have one without the other.

Unsurprisingly, the title of this post is purposefully misleading, as both these areas of your business are critically important and deserve dedicated attention to ensure their success… but it struck me recently that the foundations for both areas are not the same. It’s much easier to reach a shared understanding, and therefore have meaningful conversations about our financial position, than it is for us to reach a shared understanding about our impact.

Think of regular reporting to Boards as an example.  It’s pretty commonplace to have a dedicated section of the Board report that details the organisation’s financial position.  Skilled staff within organisations prepare complex reports which speak to the solvency, liquidity, and cashflow of the organisation, and Boards usually have a director or two with similar skillsets who speak the same language.  When complex tables, charts and ratios are presented to the Board – they know what it means.  They can quickly form a judgement about whether things are good, or whether there is cause for concern.  The transition from the ‘what?’ to the ‘so what?’ happens pretty seamlessly.

For some reason, the same situation doesn’t exist for our impact reports.  I absolutely acknowledge that some organisations prepare meaningful impact reports for their Executive and Boards, but lots don’t, and of those that do, there doesn’t seem to be the same common language spoken.  In financial nomenclature, solvency and liquidity ratios have a specific and objective meaning.  The same can’t be said about effectiveness, efficiency, impact and reach.  It also certainly isn’t as commonplace for there to be Board members with particular skillsets in the performance monitoring or evaluation space.  This generally results in fewer questions being asked about performance in terms of impact, and a less than concrete awareness of how the organisation is performing when it comes to the impact on the people and communities they are funded to serve.  The translation of the ‘what?’ to the ‘so what?’ is not as easy.

The nebulous space that performance assessment occupies is often complex and subjective – but it doesn’t need to be. 

Organisations can certainly progress work to agree on what success looks like regarding the impact of their various programs.  They don’t need to bite off more than they can chew at their first bite either – it can be a staged process.

My experience has been that Board Directors enjoy the increased awareness they have about performance, and are genuinely excited by the prospect of learning more about the great things that are being achieved.  The same could be said about most people really – in that most of us enjoy knowing and learning more about something than we did previously.

So… how can we remedy this?  What can we do to make conversations about performance more mainstream?  How can we normalise a language that people understand and aren’t afraid to use?  How do we create an environment where the skilled staff who understand and champion better performance in both these areas of our businesses, can flourish, and can increase awareness and excitement in others?