An Institutional View of Algorithmic Impact Assessments

Selbst, A. (2021). An Institutional View of Algorithmic Impact Assessments. Harvard Journal of Law and Technology, 35(10), 78. The author has indicated that the downloadable version of the paper has “draft” status.
First some general points about its relevance:
  1. Rich people get personalised one-to-one attention and services. Poor people get processed by algorithms. That may be a bit of a caricature, but there is also some truth there. Consider loan applications, bail applications, recruitment decisions, welfare payments. And perhaps medical diagnoses and treatments, depending on the source of service. There is therefore a good reason for any evaluators concerned with equity to pay close attention to how algorithms affect the lives of the poorest sections of societies.
  2. This paper reminded me of the importance of impact assessments, as distinct from impact evaluations. The former are concerned with “effects-of-a-cause”, as distinct from the “causes-of-an-effect”, which is the focus of impact evaluations. In this paper impact assessment is specifically concerned with negative impacts, which is a narrower ambit than I have seen previously in my sphere of work, but complementary to the expectations of positive impact associated with impact evaluations. It may reflect the narrowness of my inhabited part of the evaluation world, but my feeling is that impact evaluations get way more attention than impact assessments. Yet one could argue that the default situation should be the reverse. Though I can’t quite articulate my reasoning … I think it is something to do with the perception that most of the time the world acts on us, relative to us acting on the world.
Some selected quotes:
  1. The impact assessment approach has two principal aims. The first goal is to get the people who build systems to think methodically about the details and potential impacts of a complex project before its implementation, and therefore head off risks before they become too costly to correct. As proponents of values-in-design have argued for decades, the earlier in project development that social values are considered, the more likely that the end result will reflect those social values. The second goal is to create and provide documentation of the decisions made in development and their rationales, which in turn can lead to better accountability for those decisions and useful information for future policy  interventions (p.6)
    1. This Article will argue in part that once filtered through the institutional logics of the private sector, the first goal of improving systems through better design will only be effective in those organizations motivated by social obligation rather than mere compliance, but second goal of producing information needed for better policy and public understanding is what really can make the AIA regime worthwhile (p.8)
  2. Among all possible regulatory approaches, impact assessments are most useful where projects have unknown and hard-to-measure impacts on society, where the people creating the project and the ones with the knowledge and expertise to estimate its impacts have inadequate incentives to generate the needed information, and where the public has no other means to create that information. What is attractive about the AIA (Algorithmic Impact Assessment) is that we are now in exactly such a situation with respect to algorithmic harms. (p.7)
  3. The Article proceeds in four parts. Part I introduces the AIA, and explains why it is likely a useful approach…. Part II briefly surveys different models of AIA that have been proposed as well as two alternatives: self-regulation and audits… Part III examines how institutional forces shape regulation and compliance, seeking to apply those lessons to the case of AIAs…. Ultimately, the Part concludes that AIAs may not be fully successful in their primary goal of getting individual firms to consider social problems early, but that the second goal of policy-learning may well be more successful because it does not require full substantive compliance. Finally, Part IV looks at what we can learn from the technical community. This part discusses many relevant developments within technology industry and scholarship: empirical research into how firms understand AI fairness and ethics, proposals for documentation standards coming from academic and industrial labs, trade groups, standards organizations, and various self-regulatory framework proposal. (p.9)

 

 

The revised UNEG Ethical Guidelines for Evaluations (2020)

The UNEG Ethical Guidelines for Evaluation were first published in 2008. This document is a revision of the original document and was approved at the UNEG AGM 2020. These revised guidelines are consistent with the standards of conduct in the Charter of the United Nations, the Staff Regulations and Rules of the United Nations, the Standards of Conduct for the International Civil Service, and in the Regulations Governing the Status, Basic Rights and Duties of Officials other than Secretariat. They  are also consistent with the United Nations’ core values of Integrity, Professionalism and Respect for Diversity, the humanitarian principles of Humanity, Neutrality, Impartiality and Independence and the values enshrined in the Universal Declaration of Human Rights.

This document aims to support UN entity leaders and governing bodies as well as those organizing and conducting evaluations for the UN to ensure that an ethical lens informs day to day evaluation practice.

This document provides:

  • Four ethical principles for evaluation;
  • Tailored guidelines for entity leaders and governing bodies, evaluation organizers, and evaluation practitioners;
  • A detachable Pledge of Commitment to Ethical Conduct in Evaluation that all those involved in evaluations will be required to sign.

These guidelines are designed to be useful and applicable to all UN agencies, regardless of differences in mission (operational vs. normative agencies), in structures (centralized vs. decentralized), in the contexts for the work (development, peacekeeping, humanitarian) and in the nature of evaluations that are undertaken (oversight/accountability focused vs. learning).

“The Checklist Manifesto”, another perspective on managing the problem of extreme complexity

The Checklist Manifesto by Atul Gawande, 2009

Atul differentiates two types of problems that we face when dealing with extreme complexity. One is that of ignorance: there is a lot we simply don’t know. Unpredictability is a facet of complexity that many writers on the subject have given plenty of attention to, along with possible ways of managing that unpredictability. The other problem that Atul identifies is that of ineptitude. This is our inability to make good use of knowledge that is already available. He gives many examples where complex bodies of knowledge already exist that can make a big difference to people’s lives, notably in the field of medicine. But because of the very scale of those bodies of knowledge, people are often not capable of making full use of them, and sometimes the consequences are disastrous. This facet of complexity is not something I’ve seen given very much attention in the literature on complexity, at least that which I have come across. So I read this book with great interest, an interest magnified no doubt by my previous interest in, and experiments with, the use of weighted checklists, which are documented elsewhere on this website.

Another distinction that he makes is between task checklists and communication checklists. The first are all about avoiding dumb mistakes: forgetting to do things that we know have to be done. The second are about coping with unexpected events, and about how we should cope by communicating relevant information to relevant people. He gives some interesting examples from the (big) building industry where, given the complexity of modern construction activities, and despite the extensive use of task checklists, there are still inevitably various unexpected hitches which have to be responded to effectively, without jeopardising the progress or safety of the construction process.

Some selected quotes:

  • Checklists helped ensure a higher standard of baseline performance.
  • Medicine has become the art of managing extreme complexity – and a test of whether such extreme complexity can, in fact, be humanely mastered
  • Team work may just be hard in certain lines of work. Under conditions of extreme complexity, we inevitably rely on a division of tasks and  expertise…But the evidence suggests that we need them to see their job not just as performing their isolated  set of tasks well, but also helping the group get the best possible results
  • It is common to misconceive how checklists function in complex lines of work. They are not comprehensive how-to guides, whether for building a skyscraper or getting a plane out of trouble. They are quick and simple tools aimed to buttress the skills of expert professionals. And by remaining swift and usable and resolutely modest, they are saving thousands upon thousands of lives.
  • When you are making a checklist, you have a number of key decisions. You must define a clear pause point at which the checklist is supposed to be used (unless the moment is obvious, like when a warning light goes on or an engine fails). You must decide whether you want a do-confirm checklist or a read-do checklist. With a do-confirm checklist, team members perform their jobs from memory and experience, often separately. But then they stop. They pause to run the checklist and confirm that everything that was supposed to be done was done. With the read-do checklist, on the other hand, people carry out the tasks as they check them off, it’s more like a recipe. So for any new checklist created from scratch, you have to pick the type that makes the most sense for the situation.
  • We are obsessed in medicine with having great components – the best drugs, the best devices, the best specialists – but pay little attention to how to make them fit together well. Berwick notes how wrongheaded this approach is. ‘Anyone who understands systems will know immediately that optimising parts is not a great route to system excellence,’ he says.

I could go on, but I would rather keep reading the book… :-)

 

On the usefulness of deliberate (but bounded) randomness in decision making

 

An introduction

In many spheres of human activity, relevant information may be hard to find, and it may be of variable quality. Human capacities to objectively assess that information may also be limited and variable. Extreme cases may be easy to assess, e.g. projects or research that are definitely worth/not worth funding, or papers that are definitely worth/not worth publishing. But in between these extremes there may be substantial uncertainty and thus room for tacit assumptions and unrecognised biases to influence judgements. In some fields the size of this zone of uncertainty may be quite big (see Adam, 2019 below), so the consequences at stake can be substantial. This is the territory where a number of recent papers have argued that an explicitly random decision making process may be the best approach to take.

After you have scanned the references below, continue on to some musings about implications for how we think about complexity.

The literature (a sample)

  • Osterloh, M., & Frey, B. S. (2020, March 9). To ensure the quality of peer reviewed research introduce randomness. Impact of Social Sciences. https://blogs.lse.ac.uk/impactofsocialsciences/2020/03/09/to-ensure-the-quality-of-peer-reviewed-research-introduce-randomness/  
    • Why random selection of contributions to which the referees do not agree? This procedure reduces the “conservative bias”, i.e. the bias against unconventional ideas. Where there is uncertainty over the quality of a contribution, referees have little evidence to draw on in order to make accurate evaluations. However, unconventional ideas may well yield high returns in the future. Under these circumstances a randomised choice among the unorthodox contributions is advantageous.
    • …two [possible] types of error: type I errors (“reject errors”) implying that a correct hypothesis is rejected, and type 2 errors implying that a false hypothesis is accepted (“accept errors”). The former matters more than the latter. “Reject errors” stop promising new ideas, sometimes for a long time, while “accept errors” lead to a waste of money, but may be detected soon once published. This is the reason why it is more difficult to identify “reject errors” than “accept errors”. Through randomisation the risks of “reject errors” are diversified.
  • Osterloh, M., & Frey, B. S. (2020). How to avoid borrowed plumes in academia. Research Policy, 49(1), 103831. https://doi.org/10.1016/j.respol.2019.103831 Abstract: Publications in top journals today have a powerful influence on ac
  • Liu, M., Choy, V., Clarke, P., Barnett, A., Blakely, T., & Pomeroy, L. (2020). The acceptability of using a lottery to allocate research funding: A survey of applicants. Research Integrity and Peer Review, 5(1), 3. https://doi.org/10.1186/s41073-019-0089-z
    • Background: The Health Research Council of New Zealand is the first major government funding agency to use a lottery to allocate research funding for their Explorer Grant scheme. …  the Health Research Council of New Zealand wanted to hear from applicants about the acceptability of the randomisation process and anonymity of applicants.   The survey asked about the acceptability of using a lottery and if the lottery meant researchers took a different approach to their application. Results:… There was agreement that randomisation is an acceptable method for allocating Explorer Grant funds with 63% (n = 79) in favour and 25% (n = 32) against. There was less support for allocating funds randomly for other grant types with only 40% (n = 50) in favour and 37% (n = 46) against. Support for a lottery was higher amongst those that had won funding. Multiple respondents stated that they supported a lottery when ineligible applications had been excluded and outstanding applications funded, so that the remaining applications were truly equal. Most applicants reported that the lottery did not change the time they spent preparing their application. Conclusions: The Health Research Council’s experience through the Explorer Grant scheme supports further uptake of a modified lottery.
  • Roumbanis, L. (2019). Peer Review or Lottery? A Critical Analysis of Two Different Forms of Decision-making Mechanisms for Allocation of Research Grants. Science, Technology, & Human Values, 44(6), 994–1019. https://doi.org/10.1177/0162243918822744
  • Adam, D. (2019). Science funders gamble on grant lotteries. A growing number of research agencies are assigning money randomly. Nature, 575(7784), 574–575. https://doi.org/10.1038/d41586-019-03572-7
    • ….says that existing selection processes are inefficient. Scientists have to prepare lengthy applications, many of which are never funded, and assessment panels spend most of their time sorting out the specific order in which to place mid-ranking ideas. Low- and high-quality applications are easy to rank, she says. “But most applications are in the midfield, which is very big”
    • The fund tells applicants how far they got in the process, and feedback from them has been positive, he says. “Those that got into the ballot and miss out don’t feel as disappointed. They know they were good enough to get funded and take it as the luck of the draw.”
  • Fang, F. C., & Casadevall, A. (2016). Research Funding: The Case for a Modified Lottery. MBio, 7(2). https://doi.org/10.1128/mBio.00422-16
    • ABSTRACT The time-honored mechanism of allocating funds based on ranking of proposals by scientific peer review is no longer effective, because review panels cannot accurately stratify proposals to identify the most meritorious ones. Bias has a major influence on funding decisions, and the impact of reviewer bias is magnified by low funding paylines. Despite more than a decade of funding crisis, there has been no fundamental reform in the mechanism for funding research. This essay explores the idea of awarding research funds on the basis of a modified lottery in which peer review is used to identify the most meritorious proposals, from which funded applications are selected by lottery. We suggest that a modified lottery for research fund allocation would have many advantages over the current system, including reducing bias and improving grantee diversity with regard to seniority, race, and gender.
    • See also: Fang, F. C., & Casadevall, A. (2014, April 14). Taking the Powerball Approach to Funding Medical Research. Wall Street Journal. https://online.wsj.com/article/SB10001424052702303532704579477530153771424.html
  • Stone, P. (2011). The Luck of the Draw: The Role of Lotteries in Decision Making. In The Luck of the Draw: The Role of Lotteries in Decision Making. https://doi.org/10.1093/acprof:oso/9780199756100.001.0001
    • From the earliest times, people have used lotteries to make decisions–by drawing straws, tossing coins, picking names out of hats, and so on. We use lotteries to place citizens on juries, draft men into armies, assign students to schools, and even on very rare occasions, select lifeboat survivors to be eaten. Lotteries make a great deal of sense in all of these cases, and yet there is something absurd about them. Largely, this is because lottery-based decisions are not based upon reasons. In fact, lotteries actively prevent reason from playing a role in decision making at all. Over the years, people have devoted considerable effort to solving this paradox and thinking about the legitimacy of lotteries as a whole. However, these scholars have mainly focused on lotteries on a case-by-case basis, not as a part of a comprehensive political theory of lotteries. In The Luck of the Draw, Peter Stone surveys the variety of arguments proffered for and against lotteries and argues that they only have one true effect relevant to decision making: the “sanitizing effect” of preventing decisions from being made on the basis of reasons. While this rationale might sound strange to us, Stone contends that in many instances, it is vital that decisions be made without the use of reasons. By developing innovative principles for the use of lottery-based decision making, Stone lays a foundation for understanding when it is–and when it is not–appropriate to draw lots when making political decisions both large and small
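The “modified lottery” that several of these papers describe can be sketched in a few lines of Python. This is only an illustration of the general idea (fund the clearly outstanding, exclude the clearly weak, draw lots among the midfield), not any funder's actual procedure. The function name, the score scale and the two cut-off values are all assumptions made up for the example.

```python
import random


def modified_lottery(scores, n_awards, fund_above, reject_below, seed=None):
    """Split proposals into three bands by review score, then draw lots.

    scores       -- dict mapping proposal id to a panel score (hypothetical scale)
    n_awards     -- total number of grants available
    fund_above   -- scores at or above this are funded outright (assumed cut-off)
    reject_below -- scores below this are excluded from the ballot (assumed cut-off)
    """
    rng = random.Random(seed)

    # Rank once so that outright awards go to the highest scores first.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))

    outstanding = [p for p in ranked if scores[p] >= fund_above]
    midfield = [p for p in ranked if reject_below <= scores[p] < fund_above]

    funded_on_merit = outstanding[:n_awards]
    remaining = n_awards - len(funded_on_merit)

    # Every proposal left in the midfield has an equal chance of being drawn.
    funded_by_lot = rng.sample(midfield, min(remaining, len(midfield)))
    return funded_on_merit, funded_by_lot


# Invented scores for seven proposals, purely for illustration.
scores = {"A": 9.1, "B": 8.7, "C": 6.5, "D": 6.1, "E": 5.8, "F": 5.5, "G": 2.0}
on_merit, by_lot = modified_lottery(scores, n_awards=4, fund_above=8.0,
                                    reject_below=5.0, seed=1)
print("Funded on merit:", on_merit)
print("Funded by lot:  ", by_lot)
```

The reviewers' judgement is still used where it is most reliable (the two extremes), and chance only decides within the zone where their rankings are least trustworthy.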

Randomness in other species

  • Drew, L. (2020). Random Search Wired Into Animals May Help Them Hunt. Quanta Magazine. Retrieved 2 February 2021, from https://www.quantamagazine.org/random-search-wired-into-animals-may-help-them-hunt-20200611/
    • Of special interest here is the description of Lévy walks, a variety of randomised movement where the frequency distribution of distances moved has one long tail. Lévy walks have been the subject of exploration across multiple disciplines, as seen in…
  • Reynolds, A. M. (2018). Current status and future directions of Lévy walk research. Biology Open, 7(1). https://doi.org/10.1242/bio.030106
    • Levy walks are specialised forms of random walks composed of clusters of multiple short steps with longer steps between them…. They are particularly advantageous when searching in uncertain or dynamic environments where the spatial scales of searching patterns cannot be tuned to target distributions…Nature repeatedly reveals the limits of our imagination. Lévy walks once thought to be the preserve of probabilistic foragers have now been identified in the movement patterns of human hunter-gatherers
Figure: Lévy walk random movement versus Brownian motion random movement
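The difference between the two kinds of random movement comes down to the distribution of step lengths, and it can be sketched as follows. This is a rough illustration rather than a model taken from the papers above. It assumes half-normal step lengths for the Brownian-style walk and Pareto-distributed step lengths (with an arbitrarily chosen shape of 1.5) for the Lévy-style walk, and all the function names are invented for the example.

```python
import math
import random


def brownian_step(rng):
    # Step lengths from a half-normal distribution: long steps are vanishingly rare.
    return abs(rng.gauss(0.0, 1.0))


def levy_step(rng, alpha=1.5):
    # Step lengths from a Pareto distribution (minimum length 1). The long tail
    # means an occasional step is far longer than the typical one.
    # alpha=1.5 is an illustrative assumption, not a value from the literature cited.
    return rng.paretovariate(alpha)


def walk(n_steps, step_length, seed=None):
    """Return the visited points and the step lengths of a 2-D random walk."""
    rng = random.Random(seed)
    x = y = 0.0
    path, lengths = [(x, y)], []
    for _ in range(n_steps):
        angle = rng.uniform(0.0, 2.0 * math.pi)  # direction is uniform in both walks
        length = step_length(rng)
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        path.append((x, y))
        lengths.append(length)
    return path, lengths


_, brownian_lengths = walk(1000, brownian_step, seed=42)
_, levy_lengths = walk(1000, levy_step, seed=42)
print("Longest Brownian step:", round(max(brownian_lengths), 2))
print("Longest Lévy step:    ", round(max(levy_lengths), 2))
```

Plotting the two paths would show the Brownian walk staying in one tangled cluster, while the Lévy walk produces several clusters of short steps linked by a few long jumps.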

Implications for thinking about complexity

Uncertainty of future states is a common characteristic of many complex systems, though not unique to these. One strategy that human organisations can use to deal with uncertainty is to build up capital reserves, thus enhancing longer term resilience albeit at the cost of more immediate efficiencies. From the first set of papers referenced above, it seems like the deliberate and bounded use of randomness could provide a useful second option. The work being done on Lévy walks also suggests that there are interesting variations on randomisation that should be explored. It is already the case that the designers of search/optimisation algorithms have headed this way. If you are interested, you can read further on the subject of what are called “Lévy flight” algorithms.

On a more light hearted note, I would be interested to hear from the Cynefin school on how comfortable they would be marketing this approach to “managing” uncertainty to the managers and leaders they seem keen to engage with.

Another thought… years ago I did an analysis of data that had been collected on development projects funded by the then DFID’s Civil Society Challenge Fund. This included data on project proposals, proposal assessments, and project outcomes. I used RapidMiner Studio’s Decision Tree module to develop predictive models of achievement ratings of the funded projects. Somewhat disappointingly, I failed to identify any attributes of project proposals, or of how they had been initially assessed, which were good predictors of the subsequent performance of those projects. There are a number of possible reasons why this might be so. One of these may be the scale of the uncertainty gap between the evident likely failures and the evident likely successes. Various biases may have skewed judgements within this zone in a way that undermined the longer term predictive use of the proposal screening and approval process. Somewhat paradoxically, if instead a lottery mechanism had been used for selecting fundable proposals in the uncertainty zone, this may well have led to the approval process being a better predictor of eventual project performance.

Postscript: Subsequent finds…

  • The Powerball Revolution. By Malcolm Gladwell (n.d.). Revisionist History Season 5 Episode 3. Retrieved 7 April 2021, from http://revisionisthistory.com/episodes/44-the-powerball-revolution
    • On school student council lotteries in Bolivia
      • “Running for an office” and “Running an office” can be two very different things. Lotteries diminish the former and put the focus on the latter
      • “It’s a more diverse group” that ends up on the council, compared to those selected via election
      • “Nobody knows anything”: initial impressions of capacity are often not good predictors of leadership capacity. Contra the assumption that voters can be good predictors of capacity in office.
    • Medical research grant review and selection
      • Review scores of proposals are poor predictors of influential and innovative research (based on citation analysis), but have been in use for decades.
    • A boarding school in New Jersey

 

Mapping the Standards of Evidence used in UK social policy.

Puttick, R. (2018). Mapping the Standards of Evidence used in UK social policy. Alliance for Useful Evidence.
“Our analysis focuses on 18 frameworks used by 16 UK organisations for judging evidence used in UK domestic social policy which are relevant to government, charities, and public service providers.
In summary:
• There has been a rapid proliferation of standards of evidence and other evidence frameworks since 2000. This is a very positive development and reflects the increasing sophistication of how evidence is generated and used in social policy.
• There are common principles underpinning them, particularly the shared goal of improving decision-making, but they often ask different questions, are engaging different audiences, generate different content, and have varying uses. This variance reflects the host organisation’s goals, which can be to inform its funding decisions, to make recommendations to the wider field, or to provide a resource for providers to help them evaluate.
• It may be expected that all evidence frameworks assess whether an intervention is working, but this is not always the case, with some frameworks assessing the quality of evidence, not the success of the intervention itself.
• The differences between the standards of evidence are often for practical reasons and reflect the host organisation’s goals. However, there is a need to consider more philosophical and theoretical tensions about what constitutes good evidence. We identified examples of different organisations reaching different conclusions about the same intervention; one thought it worked well, and the other was less confident. This is a problem: Who is right? Does the intervention work, or not? As the field develops, it is crucial that confusion and disagreement is minimised.
• One suggested response to minimise confusion is to develop a single set of standards of evidence. Although this sounds inherently sensible, our research has identified several major challenges which would need to be overcome to achieve this.
• We propose that the creation of a single set of standards of evidence is considered in greater depth through engagement with both those using standards of evidence, and those being assessed against them. This engagement would also help share learning and insights to ensure that standards of evidence are effectively achieving their goals.”

Computational Modelling: Technological Futures

Council for Science and Technology & Government Office for Science, 2020. Available as pdf

Not the most thrilling/enticing title, but definitely of interest. Chapter 3 provides a good overview of different ways of building models. Well worth a read, and very readable.

Recommendation 2: Decision-makers need to be intelligent customers for models, and those that supply models should provide appropriate guidance to model users to support proper use and interpretation. This includes providing suitable model documentation detailing the model purpose, assumptions, sensitivities, and limitations, and evidence of appropriate quality assurance.


Chapters 1-3

The Alignment Problem: Machine Learning and Human Values

By Brian Christian. 334 pages. 2020 Norton. Author’s web page here

Brian Christian talking about his book on YouTube

RD comment: This is one of the most interesting and informative books I have read in the last few years. Totally relevant for evaluators thinking about the present and about future trends.

Releasing the power of digital data for development. A guide to new opportunities

Releasing the power of digital data for development: A guide to new opportunities. (2020). Frontier Technologies, UKAID, NIRAS.
Contents

Section 1  Executive Summary
Section 2 Introduction
Section 3 Understanding and navigating the new data landscape
Section 4  What is needed to release the new potential?
Section 5  Further considerations
Appendix 1: Data opportunities potentially useful now in testing  environments
Appendix 2: Bibliography and further reading
Appendix 3: Methodological notes

Executive Summary

There are 8 conclusions we discuss in this report.

1. There is justified excitement and proven benefits in the use of new digital data sources, particularly where timeliness of data is important or there are persistent gaps in traditional data sources.  This might include data from fragile and conflict-affected states, data supporting decision-making about marginalised population groups, or in finding solutions to address persistent ethical issues where traditional sources have not proved adequate.

2. In many cases, improvements in and greater access to traditional data sources could be more effective than just new data alone, including developing traditional data in tandem with new data sources. This includes innovations in digitising traditional data sources, supporting the sharing of data between and within organisations, and integrating the use of new data sources with traditional data.

3. Decision-making around the use of new data sources should be highly devolved by empowering individual staff and be focused on multiple dimensions of data quality, not least because there are no “one size fits all” rules that determine how new digital data sources fit to specific needs, subject matters or geographies. This could be supported by ensuring:
a. Research, innovation, and technical support are highly demand-led, driven by specific data user needs in specific contexts; and
b. Staff have accessible guidance that demystifies the complexities of new data sources, clarifies the benefits and risks that need to be managed, and allows them to be ‘data brokers’ confident in navigating the new data landscape, innovating in it, and coordinating the technical expertise of others.

The main report includes a description of the evidence and conclusions in a way that supports these aims, including a set of guides for staff about the most promising new data sources.

4. Where traditional data sources are failing to provide the detailed data needed, most new data sources provide a potential route to helping with the Agenda 2030 goal to ‘leave no-one behind,’ as often they can provide additional granularity on population sub-groups.  But, to avoid harming the interests of marginalised groups, strong ethical frameworks are needed, and affected people should be involved in decisionmaking about how data is processed and used. Action is also required to ensure strong data protection environments according to each type of new data and the contexts of its use.

5. New data sources with the highest potential added value for exploitation now, especially when combined with each other or traditional data sources, were found to be:
a. data from Earth Observation (EO) platforms (including satellites and drones)
b. passive location data from mobile phones

6. While there are specific limitations and risks in different circumstances, each of these data sources provides for significant gains in certain dimensions of data quality compared to some traditional sources and other new data sources. The use of Artificial Intelligence (AI) techniques, such as through machine learning, has high potential to add value to digital datasets in terms of improving aspects of data quality from many different sources, such as social media data, and particularly with large complex datasets and across multiple data sources.

7. Beyond the current time horizon, the most potential for emerging data sources is likely to come from:
• The next generation of Artificial Intelligence
• The next generation of Earth Observation platforms
• Privacy Preserving Data Sharing (PPDS) via the Cloud and
• the Internet of Things (IoT).
No significant other data sources, technologies or techniques were found with high potential to benefit FCDO’s work, which seems to be in line with its current research agenda and innovative activities. Some longer-term data prospects have been identified and these could be monitored to observe increases in their potential in the future.

8. Several other factors are relevant to the optimal use of digital data sources which should be investigated and/or work in these areas maintained. These include important internal and external corporate developments, importantly including continued support to Open Data/ data sharing and enhanced data security systems to underpin it, learning across disciplinary boundaries with official statistics principles at the core, and continued support to capacity-building of national statistical systems in developing countries in traditional data and data innovation.

Calling Bullshit: THE ART OF SKEPTICISM IN A DATA-DRIVEN WORLD

Reviews

Wired review article

Guardian review article

Forbes review article

Kirkus Review article

Podcast interview with the authors

ABOUT CALLING BULLSHIT (=publisher blurb)
“Bullshit isn’t what it used to be. Now, two science professors give us the tools to dismantle misinformation and think clearly in a world of fake news and bad data.

Misinformation, disinformation, and fake news abound and it’s increasingly difficult to know what’s true. Our media environment has become hyperpartisan. Science is conducted by press release. Startup culture elevates bullshit to high art. We are fairly well equipped to spot the sort of old-school bullshit that is based in fancy rhetoric and weasel words, but most of us don’t feel qualified to challenge the avalanche of new-school bullshit presented in the language of math, science, or statistics. In Calling Bullshit, Professors Carl Bergstrom and Jevin West give us a set of powerful tools to cut through the most intimidating data.

You don’t need a lot of technical expertise to call out problems with data. Are the numbers or results too good or too dramatic to be true? Is the claim comparing like with like? Is it confirming your personal bias? Drawing on a deep well of expertise in statistics and computational biology, Bergstrom and West exuberantly unpack examples of selection bias and muddled data visualization, distinguish between correlation and causation, and examine the susceptibility of science to modern bullshit.

We have always needed people who call bullshit when necessary, whether within a circle of friends, a community of scholars, or the citizenry of a nation. Now that bullshit has evolved, we need to relearn the art of skepticism.”

Evaluation Failures: 22 Tales of Mistakes Made and Lessons Learned

Edited by Kylie Hutchinson, Community Solutions, Vancouver, Canada. Published by Sage, 2018. https://us.sagepub.com/en-us/nam/evaluation-failures/book260109

But $30 for a 184-page paperback is going to limit its appeal! The electronic version is similarly expensive, more like the cost of a hardback. Fortunately, two example chapters (1 and 8) are available as free pdfs, see below. Reading those two chapters makes me think the rest of the book would also be well worth reading. It is not often you see anything written at length about evaluation failures. Perhaps we should set up an online confessional, where we can line up to anonymously confess our un/professional sins. I will certainly be one of those needing to join such a queue! :)

PART I. MANAGE THE EVALUATION
Chapter 2. The Scope Creep Train Wreck: How Responsive Evaluation Can Go Off the Rails
Chapter 3. The Buffalo Jump: Lessons After the Fall
Chapter 4. Evaluator Self-Evaluation: When Self-Flagellation Is Not Enough
PART II. ENGAGE STAKEHOLDERS
Chapter 5. That Alien Feeling: Engaging All Stakeholders in the Universe
Chapter 6. Seeds of Failure: How the Evaluation of a West African
Chapter 7. I Didn’t Know I Would Be a Tightrope Walker Someday: Balancing Evaluator Responsiveness and Independence
PART III. BUILD EVALUATION CAPACITY
Chapter 9. Stars in Our Eyes: What Happens When Things Are Too Good to Be True
PART IV. DESCRIBE THE PROGRAM
Chapter 10. A “Failed” Logic Model: How I Learned to Connect With All Stakeholders
Chapter 11. Lost Without You: A Lesson in System Mapping and Engaging Stakeholders
PART V. FOCUS THE EVALUATION DESIGN
Chapter 12. You Got to Know When to Hold ’Em: An Evaluation That Went From Bad to Worse
Chapter 13. The Evaluation From Hell: When Evaluators and Clients Don’t Quite Fit
PART VI. GATHER CREDIBLE EVIDENCE
Chapter 14. The Best Laid Plans of Mice and Evaluators: Dealing With Data Collection Surprises in the Field
Chapter 15. Are You My Amigo, or My Chero? The Importance of Cultural Competence in Data Collection and Evaluation
Chapter 16. OMG, Why Can’t We Get the Data? A Lesson in Managing Evaluation Expectations
Chapter 17. No, Actually, This Project Has to Stop Now: Learning When to Pull the Plug
Chapter 18. Missing in Action: How Assumptions, Language, History, and Soft Skills Influenced a Cross-Cultural Participatory Evaluation
PART VII. JUSTIFY CONCLUSIONS
Chapter 19. “This Is Highly Illogical”: How a Spock Evaluator Learns That Context and Mixed Methods Are Everything
Chapter 20. The Ripple That Became a Splash: The Importance of Context and Why I Now Do Data Parties
Chapter 21. The Voldemort Evaluation: How I Learned to Survive Organizational Dysfunction, Confusion, and Distrust
PART VIII. REPORT AND ENSURE USE
Chapter 22. The Only Way Out Is Through
Conclusion