Artificial Intelligence & Machine Learning

What is AI and ML?

Artificial Intelligence (AI) is a field of computer science dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, problem solving, and pattern recognition. Put another way, AI is a catch-all term used to describe new types of computer software that can mimic human intelligence. There is no single, precise, universal definition of AI.

Machine learning (ML) is a subset of AI. Essentially, machine learning is one of the ways computers “learn.” ML is an approach to AI that relies on algorithms trained on large datasets to develop their own rules. This is an alternative to traditional computer programs, in which rules have to be hand-coded. Machine learning extracts patterns from data and places that data into different sets. ML has been described as “the science of getting computers to act without being explicitly programmed.” Two short videos provide simple explanations of AI and ML: What Is Artificial Intelligence? | AI Explained and What is machine learning?

Other subsets of AI include speech processing, natural language processing (NLP), robotics, cybernetics, vision, expert systems, planning systems and evolutionary computation. (See Artificial Intelligence – A modern approach).

[Diagram: types of artificial intelligence]

The diagram above shows the many different types of technology fields that comprise AI. When referring to AI, one can be referring to any or several of these technologies or fields, and applications that use AI, like Siri or Alexa, utilize multiple technologies. For example, if you say to Siri, “Siri, show me a picture of a banana,” Siri utilizes “natural language processing” to understand what you’re asking, and then uses “vision” to find a banana and show it to you. The question of how Siri understood your question and how Siri knows something is a banana is answered by the algorithms and training used to develop Siri. In this example, Siri would be drawing from “question answering” and “image recognition.”

Most of these technologies and fields are very technical and relate more to computer science than political science. It is important to know that AI can refer to a broad set of technologies and applications. Machine learning is a tool used to create AI systems.

As noted above, AI doesn’t have a universal definition. There are lots of myths surrounding AI, ranging from the notion that it will take over the world by enslaving humans to the belief that it will cure cancer. This primer is intended to provide a basic understanding of artificial intelligence and machine learning, as well as outline some of the benefits and risks posed by AI. It is hoped that this primer will enable you to have a conversation about how best to regulate AI so that its potential can be harnessed to improve democracy and governance.

Definitions

Algorithm: An algorithm is defined as “a finite series of well-defined instructions that can be implemented by a computer to solve a specific set of computable problems.” Algorithms are unambiguous, step-by-step procedures. A simple example of an algorithm is a recipe; another is a procedure to find the largest number in a set of randomly ordered numbers. An algorithm may either be created by a programmer or generated automatically. In the latter case, it is generated using data via ML.
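The largest-number procedure mentioned above can be written out as a minimal Python sketch. The function name find_largest and the sample numbers are illustrative assumptions, not taken from any particular source.

```python
def find_largest(numbers):
    """Return the largest value in a non-empty list of numbers."""
    if not numbers:
        raise ValueError("numbers must not be empty")
    largest = numbers[0]         # Step 1: assume the first number is the largest.
    for value in numbers[1:]:    # Step 2: look at every remaining number once.
        if value > largest:      # Step 3: keep whichever of the two is larger.
            largest = value
    return largest               # Step 4: finish after the last number.


print(find_largest([7, 42, 3, 19]))  # prints 42
```

Each step is unambiguous and the procedure always ends after a finite number of comparisons, which is what makes it an algorithm in the sense defined above.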

Algorithmic decision-making/Algorithmic decision system (ADS): A system in which an algorithm makes decisions on its own or supports humans in doing so. ADSs usually function by data mining, regardless of whether they rely on machine learning or not. Examples of fully automated ADSs are the electronic passport control checkpoint at airports, and an online decision made by a bank to award a customer an unsecured loan based on the person’s credit history and data profile with the bank. An example of a semi-automated ADS is the set of driver-assistance features in a car that control its brake, throttle, steering, speed and direction.

Big Data: Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions. Data is classified as Big Data based on its volume, velocity, variety, veracity and value. This short explainer video provides an introduction to big data and the concept of the 5Vs.

Class label: The label applied after the ML system has classified its inputs; for example, whether a given email message is spam or not spam.

Data mining: “The practice of examining large pre-existing databases in order to generate new information.” Data mining is also defined as “knowledge discovery from data.”

Deep model: Also called a “deep neural network,” a deep model is a type of neural network containing multiple hidden layers.

Label: A label is what the ML system is predicting.

Model: The representation of what a machine learning system has learned from the training data.

Neural network: A biological neural network (BNN) is a system in the brain that enables a creature to sense stimuli and respond to them. An artificial neural network (ANN) is a computing system inspired by its biological counterpart in the human brain. In other words, an ANN is “an attempt to simulate the network of neurons that make up a human brain so that the computer will be able to learn things and make decisions in a humanlike manner.” Large-scale ANNs drive several applications of AI.

Profiling: Profiling involves automated data processing to develop profiles that can be used to make decisions about people.

Robot: Robots are programmable, artificially intelligent automated devices. Fully autonomous robots, e.g., self-driving vehicles, are capable of operating and making decisions without human control. AI enables robots to sense changes in their environments and adapt their responses/ behaviors accordingly in order to perform complex tasks without human intervention. – Report of COMEST on robotics ethics (2017).

Scoring: “Scoring is also called prediction, and is the process of generating values based on a trained machine-learning model, given some new input data. The values or scores that are created can represent predictions of future values, but they might also represent a likely category or outcome.” When used vis-a-vis people, scoring is a statistical prediction that determines if an individual fits a category or outcome. A credit score, for example, is a number drawn from statistical analysis that represents the creditworthiness of an individual.
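As a rough illustration of scoring, the sketch below applies an already-trained model to one new input record. The weights, the bias value, the field names and the 0.5 cut-off are all hypothetical stand-ins chosen for this example; they are not taken from any real credit-scoring system.

```python
import math

# Hypothetical parameters, standing in for what training would have produced.
WEIGHTS = {"missed_payments": -0.8, "years_of_history": 0.3}
BIAS = 0.5


def score(applicant):
    """Return a value between 0 and 1 for one new input record."""
    total = BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-total))  # logistic function squashes to 0..1


# New input data that the model has not seen before (hypothetical).
new_applicant = {"missed_payments": 2, "years_of_history": 5}
s = score(new_applicant)

# The score can be read as a likely category by applying a cut-off.
category = "likely to repay" if s >= 0.5 else "unlikely to repay"
print(round(s, 3), category)
```

The same number can therefore serve either as a predicted value or, once a cut-off is chosen, as a predicted category. The choice of cut-off is a design decision made by people, not something the model determines.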

Supervised learning: ML systems learn how to combine inputs to produce predictions on never-before-seen data.

Unsupervised learning: Refers to training a model to find patterns in a dataset, typically an unlabeled dataset.

Training: The process of determining the ideal parameters comprising a model.

 

How do artificial intelligence and machine learning work?

Artificial Intelligence

Artificial Intelligence is a cross-disciplinary approach that combines computer science, linguistics, psychology, philosophy, biology, neuroscience, statistics, mathematics, logic and economics to “understanding, modeling, and replicating intelligence and cognitive processes by invoking various computational, mathematical, logical, mechanical, and even biological principles and devices.”

AI applications exist in every domain, industry, and across different aspects of everyday life. Because AI is so broad, it is useful to think of AI as made up of three categories:

  • Narrow AI or Artificial Narrow Intelligence (ANI) is an expert system in a specific task, like image recognition, playing Go, or asking Alexa or Siri to answer a question.
  • Strong AI or Artificial General Intelligence (AGI) is an AI that matches human intelligence.
  • Artificial Superintelligence (ASI) is an AI that exceeds human capabilities.

Modern AI techniques are developing quickly, and AI applications are already pervasive. However, these applications only exist presently in the “Narrow AI” field. Narrow AI, also known as weak AI, is AI designed to perform a specific, singular task, for example, voice-enabled virtual assistants such as Siri and Cortana, web search engines, and facial-recognition systems.

Artificial General Intelligence and Artificial Superintelligence have not yet been achieved and likely will not be for the next few years or decades.

Machine Learning

Machine Learning is an application of Artificial Intelligence. Although we often find the two terms used interchangeably, machine learning is a process by which an AI application is developed. The machine-learning process involves an algorithm that makes observations based on data, identifies patterns and correlations in the data, and uses the pattern/correlation to make predictions about something. Most of the AI in use today is driven by machine learning.

Just as it is useful to break up AI into three categories, machine learning can also be thought of as three different techniques: Supervised Learning; Unsupervised Learning; and Deep Learning.

Supervised Learning

Supervised learning efficiently categorizes data according to pre-existing definitions embodied in a labeled data set. One starts with a data set containing training examples with associated labels. Take the example of a simple spam-filtering system that is being trained using spam as well as non-spam emails. The “input” in this case is all the emails the system processes. After humans have marked certain emails as spam, the system sorts spam emails into a separate folder. The “output” is the categorization of email. The system finds a correlation between the label “spam” and the characteristics of the email message, such as the text in the subject line, phrases in the body of the message, or the email address or IP address of the sender. Using the correlation, it tries to predict the correct label (spam/not spam) to apply to all the future emails it gets.

“Spam” and “not spam” in this instance are called “class labels”. The correlation that the system has found is called a “model” or “predictive model.” The model may be thought of as an algorithm the ML system has generated automatically by using data. The labelled messages from which the system learns are called “training data.” The “target variable” is the feature the system is searching for or wants to know more about — in this case, it is the “spaminess” of email. The “correct answer,” so to speak, in the endeavor to categorize email is called the “desired outcome” or “outcome of interest.” This type of learning paradigm is called “supervised learning.”
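The spam-filtering example can be sketched in a few lines of Python. This is a deliberately tiny word-count classifier in the style of naive Bayes, not how any production filter works. The four training messages, the function names train and predict, and the choice to ignore how common each class is are all assumptions made for illustration.

```python
import math
from collections import Counter

# Training data: messages that humans have already labeled (hypothetical examples).
training_data = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
]


def train(examples):
    """Build the model: per-label word counts learned from the labeled examples."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def predict(model, text):
    """Predict the class label whose training words best match the message."""
    vocabulary = set(model["spam"]) | set(model["not spam"])
    scores = {}
    for label, words in model.items():
        total = sum(words.values())
        # Add-one smoothing so an unseen word does not rule a label out entirely.
        scores[label] = sum(
            math.log((words[w] + 1) / (total + len(vocabulary)))
            for w in text.split()
        )
    return max(scores, key=scores.get)


model = train(training_data)
print(predict(model, "claim your free prize"))       # prints spam
print(predict(model, "agenda for the team lunch"))   # prints not spam
```

Here training_data plays the role of the training data, the word counts returned by train are the model, and the strings "spam" and "not spam" are the class labels.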

Unsupervised Learning

Unsupervised learning involves having neural networks learn to find a relationship or pattern without having access to datasets of input-output pairs that have been labelled already. They do so by organizing and grouping the data on their own, finding recurring patterns, and detecting a deviation from the usual pattern. These systems tend to be less predictable than those with labeled datasets and tend to be deployed in environments that may change at some frequency and/or are unstructured or partially structured. Examples include:

  1. an optical character-recognition system that can “read” handwritten text even if it has never encountered the handwriting before.
  2. the recommended products a user sees on online shopping websites. These recommendations may be determined by associating the user with a large number of variables such as their browsing history, items they purchased previously, their ratings of those items, items they saved to a wish list, the user’s location, the devices they use, their brand preference and the prices of their previous purchases.
  3. detection of fraudulent monetary transactions based on, say, their timing and locations. For instance, two consecutive transactions made on the same credit card within a short span of time in two different cities may be flagged as suspicious.
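Detecting a deviation from the usual pattern, without any labels, can be illustrated with a small sketch. It uses a simple statistical rule (the median absolute deviation) rather than a neural network. The transaction amounts, the function name find_outliers and the threshold of 5 are hypothetical choices for this example.

```python
from statistics import median

# Unlabeled data: nobody has marked any of these amounts as normal or fraudulent.
amounts = [12, 15, 11, 14, 13, 16, 12, 500]  # hypothetical transaction amounts


def find_outliers(values, threshold=5.0):
    """Flag values that lie far from the typical value in the data itself."""
    typical = median(values)
    # Median absolute deviation: how far values usually sit from the typical one.
    spread = median(abs(v - typical) for v in values)
    return [v for v in values if abs(v - typical) > threshold * spread]


print(find_outliers(amounts))  # prints [500]
```

The notion of "usual" is derived entirely from the data, so if the data changes, what counts as a deviation changes with it.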

A combination of supervised and unsupervised learning (called “semi-supervised learning”) is used when a relatively small dataset with labels is available, which can be used to train the neural network to act upon a larger, unlabelled dataset. An example of semi-supervised learning is software that creates deepfakes – photos, videos and audio files that look and sound real to humans but are not.

Deep Learning

Deep learning makes use of large-scale artificial neural networks (ANNs), called deep neural networks, to create AI that can detect financial fraud, conduct medical image analysis, translate large amounts of text without human intervention and automate moderation of content on social networking websites. These neural networks learn to perform tasks by utilizing numerous layers of mathematical processes to find patterns or relationships among different data points in the datasets. A key attribute of deep learning is that these ANNs can peruse, examine and sort huge amounts of data, which enables them, theoretically, to find new solutions to existing problems.
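To make "numerous layers of mathematical processes" more concrete, the sketch below passes two input numbers through two hidden layers and one output layer. The weights and biases are hand-picked, hypothetical values; in a real deep neural network they would be set by training, and there would be far more of them.

```python
def relu(values):
    """Keep positive values and replace negative ones with zero."""
    return [max(0.0, v) for v in values]


def layer(inputs, weights, biases):
    """One layer: a weighted sum of the inputs, plus a bias, for each output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical parameters (weights, biases) for each layer.
hidden_1 = ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1])
hidden_2 = ([[0.3, 0.7], [-0.6, 0.2]], [0.05, 0.0])
output = ([[1.0, -1.0]], [0.0])


def forward(inputs):
    """Pass the inputs through two hidden layers and then the output layer."""
    h1 = relu(layer(inputs, *hidden_1))
    h2 = relu(layer(h1, *hidden_2))
    return layer(h2, *output)[0]


print(forward([1.0, 2.0]))
```

Each layer only does simple arithmetic. The depth comes from feeding the result of one layer into the next.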

Although there are other types of machine learning, these three – Supervised Learning, Unsupervised Learning and Deep Learning – represent the basic techniques used to create and train AI systems.

Bias in AI and ML

Artificial intelligence doesn’t come from nowhere; it comes from data received and derived from its developers and from you and me.

And humans have biases. When an AI system learns from humans, it may inherit their individual and societal biases. In cases where it does not learn directly from humans, the “predictive model” as described above may be biased because of the presence of biases in the selection and sampling of data that train the AI system, the “class labels” identified by humans, the way class labels are “marked” and any errors that may have occurred while identifying them, the choice of the “target variable,” “desired outcome” (as opposed to an undesired outcome), “reward”, “regret” and so on. Bias may also occur because of the design of the system; its developers, designers, investors or makers may have ended up baking their own biases into it.

There are three types of biases in computing systems:

  • Pre-existing bias has its roots in social institutions, practices, and attitudes.
  • Technical bias arises from technical constraints or considerations.
  • Emergent bias arises in a context of use.

Bias may affect, for example, the political advertisements one sees on the Internet, the content pushed to the top of the pile in the feeds of social media websites, the amount of insurance premium one needs to pay, if one is screened out of a recruitment process, or if one is allowed to go past border-control checks in another country.

Bias in a computing system is a systematic and repeatable error. Because ML deals with large amounts of data, even a small error rate gets compounded or magnified and greatly affects the outcomes from the system. A decision made by an ML system, especially one that processes vast datasets, is often a statistical prediction. Hence, its accuracy is related to the size of the dataset. Larger training datasets are likely to yield decisions that are more accurate and lower the possibility of errors.
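The effect of scale on a small error rate can be shown with simple arithmetic. The 1 percent error rate and the decision counts below are hypothetical figures chosen only to illustrate the point.

```python
def expected_errors(decisions, error_rate):
    """Number of wrong decisions expected at a given error rate."""
    return round(decisions * error_rate)


# The same hypothetical 1% error rate applied at three different scales.
for decisions in (1_000, 1_000_000, 100_000_000):
    print(decisions, "decisions ->", expected_errors(decisions, 0.01), "errors")
```

At one thousand decisions the rate produces ten errors; at one hundred million it produces one million. If the errors are systematic rather than random, they fall repeatedly on the same kinds of people.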

Bias in AI/ML systems may create new inequalities, exacerbate existing ones, reproduce existing biases, discriminatory treatment and practices, and hide discrimination. See this explainer related to AI bias.


How are AI and ML relevant in civic space and for democracy?

Elephant tusks pictured in Uganda. In wildlife conservation, AI/ML algorithms and past data can be used to predict poacher attacks. Photo credit: NRCN.

The widespread proliferation, rapid deployment, scale, complexity and impact of AI on society is a topic of great interest and concern for governments, civil society, NGOs, human-rights bodies, businesses and the general public alike. AI systems may require varying degrees of human interaction or none at all. AI/ML, when applied in the design, operation and delivery of services, offers the potential to provide new services, and improve the speed, targeting, precision, efficiency, consistency, quality or performance of existing ones. It may provide new insights by making apparent previously undiscovered linkages, relationships and patterns, and offer new solutions. By analyzing large amounts of data, ML systems save time, money and effort. Some examples of the application of AI/ML in different domains include using AI/ML algorithms and past data in wildlife conservation to predict poacher attacks and discovering new species of viruses.

TB Microscopy Diagnosis in Uzbekistan. AI/ML systems aid healthcare professionals in medical diagnosis and easier detection of diseases. Photo credit: USAID.

The predictive abilities of AI and the application of AI and ML in categorizing, organizing, clustering and searching information have brought about improvements in many fields and domains, including healthcare, transportation, governance, education, energy, security and safety, crime prevention, policing, law enforcement, urban management and the judicial system. For example, ML may be used to track the progress and effectiveness of government and philanthropic programs. City administrations, including those of smart cities, use ML to analyze data accumulated over time about energy consumption, traffic congestion, pollution levels, and waste, in order to monitor and manage them and identify patterns in their generation, consumption and handling.

Digital maps created in Mugumu, Tanzania. Artificial intelligence can support planning of infrastructure development and preparation for disaster. Photo credit: Bobby Neptune for DAI.

AI is also used in climate monitoring, weather forecasting, prediction of disasters and hazards, and planning of infrastructure development. In healthcare, AI systems aid healthcare professionals in medical diagnosis, robot-assisted surgery, easier detection of diseases, prediction of disease outbreaks, tracing the source(s) of disease spread, and so on. Law enforcement and security agencies deploy AI/ML-based surveillance systems, face recognition systems, drones, and predictive policing for the safety and security of citizens. On the other side of the coin, many of these applications raise questions about individual autonomy, personhood, privacy, security, mass surveillance, reinforcement of social inequality and negative impacts on democracy (see the Risks section).

Fish caught off the coast of Kema, North Sulawesi, Indonesia. Facial recognition is used to identify species of fish to contribute to sustainable fishing practices. Photo credit: courtesy of USAID SNAPPER.

The full impact of the deployment of AI systems on the individual, society and democracy is not known or knowable, which creates many legal, social, regulatory, technical and ethical conundrums. The topic of harmful bias in artificial intelligence and its intersection with human rights and civil rights has been a matter of concern for governments and activists. The EU General Data Protection Regulation (GDPR) has provisions on automated decision-making, including profiling. The European Commission released a whitepaper on AI in February 2020 as a prelude to potential legislation governing the use of AI in the EU, while another EU body has released recommendations on the human rights impacts of algorithmic systems. Similarly, Germany, France, Japan and India have drafted AI strategies for policy and legislation. Physicist Stephen Hawking once said, “…success in creating AI could be the biggest event in the history of our civilization. But it could also be the last, unless we learn how to avoid the risks.”


Opportunities

Artificial intelligence and machine learning can have positive impacts when used to further democracy, human rights and governance issues. Read below to learn how to more effectively and safely think about artificial intelligence and machine learning in your work.

Detect and overcome bias

Humans come with individual and cognitive biases and prejudices and may not always act or think rationally. By removing humans from the decision-making process, AI systems potentially eliminate the impact of human bias and irrational decisions, provided the systems are not biased themselves, and that they are intelligible, transparent and auditable. AI systems that aid traceability and transparency can be used to avoid, detect or trace human bias (some of which may be discriminatory) as well as non-human bias, such as bias originating from technical limitations. Much research has shown how automated filtering of job applications reproduces multiple biases; however, research has also shown that AI can be used to combat unconscious recruiter biases in hiring. For processes like job hiring where many hidden human biases go undetected, responsibly designed algorithms can act as a double check for humans and bring those hidden biases into view, and in some cases even nudge people into less-biased outcomes, for example by masking candidates’ names and other bias-triggering features on a resume.
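Masking bias-triggering features before a reviewer sees an application can be sketched as follows. The field names, the set of masked fields and the sample record are hypothetical; a real screening tool would define its own, and masking alone does not remove proxies for these features elsewhere in a resume.

```python
# Hypothetical field names; a real screening tool would define its own.
MASKED_FIELDS = {"name", "gender", "date_of_birth", "photo_url"}


def mask_candidate(record):
    """Return a copy of the record with bias-triggering fields hidden."""
    return {
        field: "[hidden]" if field in MASKED_FIELDS else value
        for field, value in record.items()
    }


candidate = {
    "name": "A. Example",
    "gender": "F",
    "skills": ["Python", "SQL"],
    "years_experience": 6,
}
print(mask_candidate(candidate))
```

The original record is left unchanged, so the hidden fields can be restored later in the process, for example once a shortlist has been made.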

Improve security and safety

Automated systems based on AI can be used to detect attacks, such as credit card fraud or a cyberattack on public infrastructure. As online fraud becomes more advanced, companies, governments, and individuals need to be able to identify fraud even more quickly, even before it occurs. It is like a game of cat and mouse. Computers are creating more complex, unusual patterns to avoid detection, and human understanding of these patterns is limited. Humans therefore need equally agile and unusual patterns of their own that can adapt and iterate in real time, and machine learning can provide this.

Moderate harmful online content

Enormous quantities of content are uploaded every second to the social web (videos on YouTube and TikTok, photos and posts to Instagram and Facebook, etc.). There is simply too much for human reviewers to examine themselves. Filtering tools like algorithms and machine-learning techniques are used by many social media platforms to screen every post for illegal or harmful content (like child sexual abuse material, copyright violations, or spam). Indeed, artificial intelligence is at work in your email inbox, automatically filtering unwanted marketing content away from your main inbox. Recently, the arrival of deepfakes and other computer-generated content has required similarly advanced approaches to identify it. Deepfakes take their name from the deep learning artificial-intelligence technology used to make them. Fact-checkers and other actors working to defuse the dangerous, misleading power of these false videos are developing their own artificial intelligence to identify these videos as false.

Web Search

Search engines run on algorithmic ranking systems. Of course, search engines are not without serious biases and flaws, but they allow us to locate information from the infinite stretches of the internet. Search engines on the web (like Google and Bing) or within platforms (like searches within Wikipedia or within The New York Times) can enhance their algorithmic ranking systems by using machine learning to favor certain kinds of results that may be beneficial to society or of higher quality. For example, Google has an initiative to highlight “original reporting.”

Translation

Machine learning has allowed for truly incredible advances in translation. For example, DeepL is a small machine-translation company that has surpassed even the translation abilities of the biggest tech companies. Other companies have also created translation algorithms that allow people across the world to translate texts into their preferred languages, or communicate in languages beyond those they know well, which has advanced the fundamental right of access to information, as well as the right to freedom of expression and the right to be heard.


Risks

The use of emerging technologies can also create risks in civil society programming. Read below on how to discern the possible dangers associated with artificial intelligence and machine learning in DRG work, as well as how to mitigate unintended – and intended – consequences.

Discrimination against marginalized groups

There are several ways in which AI may make decisions that can lead to discrimination, including how the “target variable” and the “class labels” are defined; during the process of labeling the training data; when collecting the training data; during the feature selection; and when proxies are identified. It is also possible to intentionally set up an AI system to be discriminatory towards one or more groups. This video explains how commercially available facial recognition systems trained on racially biased data sets discriminate against people with dark skin, women and gender-diverse people.

The accuracy of AI systems is based on how ML processes Big Data, which in turn depends on the size of the dataset. The larger the size, the more accurate the system’s decisions are likely to be. However, women, Black people and people of color (PoC), disabled people, indigenous people, LGBTQ+ people, and other minorities are less likely to be represented in a dataset because of structural discrimination, group size, or attitudes that prevent their full participation in society. Bias in training data reflects and systematizes existing discrimination. Because an AI system is often a black box, it is hard to conclusively prove or demonstrate that it has made a discriminatory decision, and why it makes certain decisions about some individuals or groups of people. Hence, it is difficult to assess whether certain people were discriminated against on the basis of their race, sex, marginalized status or other protected characteristics. For instance, AI systems used in predictive policing, crime prevention, law enforcement and the criminal justice system are, in a sense, tools for risk-assessment. Using historical data and complex algorithms, they generate predictive scores that are meant to indicate the probability of the occurrence of crime, the probable location and time, and the people who are likely to be involved. When relying on biased data or biased decision-making structures, these systems may end up reinforcing stereotypes about underprivileged, marginalized or minority groups.
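One way to see how under-representation can hide poor performance is to compare accuracy group by group instead of looking only at the overall figure. The sketch below is an illustration, not a method described in the sources cited here; the group names and the ten audit records are invented for the example.

```python
from collections import defaultdict

# Hypothetical audit records: (group, model_prediction, true_outcome).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0),
]


def accuracy_by_group(rows):
    """Share of correct predictions, computed separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in rows:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


overall = sum(p == a for _, p, a in records) / len(records)
print(overall)                     # prints 0.9
print(accuracy_by_group(records))  # prints {'group_a': 1.0, 'group_b': 0.5}
```

An overall accuracy of 90 percent looks acceptable, yet the smaller group receives a wrong decision half the time. An audit of this kind requires access to the system's predictions and the true outcomes, which a black-box system may not expose.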

A study by the Royal Statistical Society notes that “…predictive policing of drug crimes results in increasingly disproportionate policing of historically over‐policed communities… and, in the extreme, additional police contact will create additional opportunities for police violence in over‐policed areas. When the costs of policing are disproportionate to the level of crime, this amounts to discriminatory policy.” Likewise, when mobile applications for safe urban navigation, software for credit-scoring, banking, housing, insurance, healthcare, and selection of employees and university students rely on biased data and decisions, they reinforce social inequality and negative and harmful stereotypes.

The risks associated with AI systems are exacerbated when AI systems make decisions or predictions involving minorities such as refugees or “life and death” matters such as medical care. A 2018 report by The University of Toronto and Citizen Lab notes, “Many [asylum seekers and immigrants] come from war-torn countries seeking protection from violence and persecution. The nuanced and complex nature of many refugee and immigration claims may be lost on these technologies, leading to serious breaches of internationally and domestically protected human rights, in the form of bias, discrimination, privacy breaches, due process and procedural fairness issues, among others. These systems will have life-and-death ramifications for ordinary people, many of whom are fleeing for their lives.” For medical and healthcare uses, the stakes are especially high because an incorrect decision made by the AI system could potentially put lives at risk or drastically alter the quality of life or wellbeing of the people affected by it.

Security vulnerabilities

Malicious hackers and criminal organizations may use ML systems to identify vulnerabilities in and target public infrastructure or privately-owned systems such as IoT devices and self-driven cars, for example.

If malicious entities target AI systems deployed in public infrastructure, such as smart cities, smart grids, and nuclear installations, as well as healthcare facilities and banking systems, among others, they “will be harder to protect, since these attacks are likely to become more automated and more complex and the risk of cascading failures will be harder to predict. A smart adversary may either attempt to discover and exploit existing weaknesses in the algorithms or create one that they will later exploit.” Exploitation may happen, for example, through a poisoning attack, which interferes with the training data if machine learning is used. Attackers may also “use ML algorithms to automatically identify vulnerabilities and optimize attacks by studying and learning in real time about the systems they target.”

Privacy and data protection

The deployment of AI systems without adequate safeguards and redress mechanisms may pose many risks to privacy and data protection. (See also Data Protection.) Businesses and governments collect immense amounts of personal data in order to train the algorithms of AI systems that render services or carry out specific tasks and activities. Criminals, rogue states, governments or government bodies, and people with malicious intent often try to target these data for various reasons ranging from carrying out monetary fraud to commercial gains to political motives. For instance, health data captured from smartphone applications and Internet-enabled wearable devices, if leaked, can be misused by credit agencies, insurance companies, data brokers, cybercriminals, etc. The breach or abuse of non-personal data, such as anonymized data, simulations, synthetic data, or generalized rules or procedures, may also affect human rights.

Chilling effect

AI systems used for surveillance, policing, criminal sentencing, legal purposes, etc. become a new avenue for abuse of power by the state to control citizens and political dissidents. The fear of profiling, scoring, discrimination and pervasive digital surveillance may have a chilling effect on citizens’ ability or willingness to exercise their rights or express themselves. Most people will modify their behavior in order to obtain the benefits of a good score and to avoid the disadvantages that come with having a bad score.

Opacity (Black box nature of AI systems)

Opacity may be interpreted as either a lack of transparency or a lack of intelligibility. Algorithms, software code, ‘behind-the-scenes’ processing and the decision-making process itself may not be intelligible to those who are not experts or specialized professionals. In legal/judicial matters, for instance, the decisions made by an AI system do not come with explanations, unlike those of judges, who are required to state the reasons on which their legal order or judgment is based. The legal order or judgment is quite likely to be on public record.

Technological unemployment

Automation systems, including AI/ML systems, are increasingly being used to replace human labor in various domains and industries, eliminating a large number of jobs and causing structural unemployment (known as technological unemployment). With the introduction of AI/ML systems, some types of jobs will be lost, some others will be transformed, and new jobs will appear. The new jobs are likely to require specific or specialized skills that are amenable to AI/ML systems.

Loss of individual autonomy and personhood

Profiling and scoring in AI raise apprehensions that people are being dehumanized and reduced to a profile or score. Automated decision-making systems may impact the wellbeing, physical integrity and quality of life of people, the information they find or are targeted with, and the services and products they can or cannot avail themselves of, among other things. This affects what constitutes an individual’s consent (or lack thereof), the way consent is formed, communicated and understood, and the context in which it is valid. “[T]he dilution of the free basis of our individual consent – either through outright information distortion or even just the absence of transparency – imperils the very foundations of how we express our human rights and hold others accountable for their open (or even latent) deprivation”. – Human Rights in the Era of Automation and Artificial Intelligence

Questions

If you are trying to understand the implications of artificial intelligence and machine learning in your work environment, or are considering using aspects of these technologies as part of your DRG programming, ask yourself these questions:

  1. Is artificial intelligence or machine learning an appropriate, necessary, and proportionate tool to use for this project and with this community?
  2. Who is designing and overseeing the technology? Can they explain to you what is happening at different steps of the process?
  3. What data are being used to design and train the technology? How could these data lead to biased or flawed functioning of the technology?
  4. What reason do you have to trust the technology’s decisions? Do you understand why you are getting a certain result, or might there be a mistake somewhere? Is anything not explainable?
  5. Are you confident the technology will work as it is claimed when it is used with your community and on your project, as opposed to in a lab setting (or a theoretical setting)? What elements of your situation might cause problems or change the functioning of the technology?
  6. Who is analyzing and implementing the AI/ML technology? Do these people understand the technology, and are they attuned to its potential flaws and dangers? Are these people likely to make any biased decisions, either by misinterpreting the technology or for other reasons?
  7. What measures do you have in place to identify and address potentially harmful biases in the technology?
  8. What regulatory safeguards and redress mechanisms do you have in place for people who claim that the technology has been unfair to them or abused them in any way?
  9. Is there a way that your AI/ML technology could perpetuate or increase social inequalities, even if the benefits of using AI and ML outweigh these risks? What will you do to minimize these problems and stay alert to them?
  10. Are you certain that the technology complies with relevant regulations and legal standards, including GDPR?
  11. Even if this technology does not discriminate against people by itself, could it lead to discrimination or other rights violations, for instance when it is deployed in different contexts or shared with untrained actors? What can you do to prevent this?

Case Studies

“Preventing echo chambers: depolarising the conversation on social media”

“RNW Media’s digital teams… have pioneered moderation strategies to support inclusive digital communities and have significantly boosted the participation of women in specific country settings. Through Social Listening methodologies, RNW Media analyses online conversations on multiple platforms across the digital landscape to identify digital influencers and map topics and sentiments in the national arena. Using Natural Language Processing techniques (such as sentiment and topic detection models), RNW Media can mine text and analyze this data to unravel deep insights into how online dialogue is developing over time. This helps to establish the social impact of the online moderation strategies while at the same time collecting evidence that can be used to advocate for young people’s needs.”

Forecasting climate change, improving agricultural productivity

In 2014, the International Center for Tropical Agriculture, the Government of Colombia, and Colombia’s National Federation of Rice Growers, using weather and crop data collected over the prior decade, predicted climate change and resultant crop loss for farmers in different regions of the country. The prediction “helped 170 farmers in Córdoba avoid direct economic losses of an estimated $ 3.6 million and potentially improve productivity of rice by 1 to 3 tons per hectare. To achieve this, different data sources were analyzed in a complementary fashion to provide a more complete profile of climate change… Additionally, analytical algorithms were adopted and modified from other disciplines, such as biology and neuroscience, and were used to run statistical models and compare with weather records.”

Doberman.io developed an iOS app

Doberman.io developed an iOS app that employs machine learning and speech recognition to automatically analyze speech in a meeting room. The app determines the amount of time each person has spoken and tries to identify the sex of each speaker. It visualizes each speaker’s contribution almost in real time, along with the relative percentages of time during which males and females have spoken. “When the meeting starts, the app uses the mic to record what’s being said and will continuously show you the equality of that meeting. When the meeting has ended and the recording stops, you’ll get a full report of the meeting.”

Food security: Detecting diseases in crops using image analysis (2016)

“Using a public dataset of 54,306 images of diseased and healthy plant leaves collected under controlled conditions, we train a deep convolutional neural network to identify 14 crop species and 26 diseases (or absence thereof). The trained model achieves an accuracy of 99.35% on a held-out test set, demonstrating the feasibility of this approach.”

Can an ML model potentially predict the closure of civic spaces more effectively than traditional approaches? The USAID-funded INSPIRES project is testing the proposition that machine learning can help identify early flags that civic space may shift and generate opportunities to evaluate the success of interventions that strive to build civil society resilience to potential shocks.

Data Protection

What is data protection?

Data protection refers to practices, measures and laws that aim to prevent certain information about a person from being collected, used or shared in a way that is harmful to that person.

Interview with fisherman in Bone South Sulawesi, Indonesia. Data collectors must receive training on how to avoid bias during the data collection process. Photo credit: Indah Rufiati/MDPI – Courtesy of USAID Oceans.

Data protection isn’t new. Bad actors have always sought to gain access to individuals’ private records. Before the digital era, data protection meant protecting individuals’ private data from someone physically accessing, viewing or taking files and documents. Data protection laws have been in existence for more than 40 years.

Now that many aspects of peoples’ lives have moved online, private, personal and identifiable information is regularly shared with all sorts of private and public entities. Data protection seeks to ensure that this information is collected, stored and maintained responsibly and that unintended consequences of using data are minimized or mitigated.

What are data?

Data refer to digital information, such as text messages, videos, clicks, digital fingerprints, a bitcoin, search history and even mere cursor movements. Data can be stored on computers and mobile devices, in clouds and on external drives. They can be shared via e-mail, messaging apps and file transfer tools. Your posts, likes and retweets, your videos about cats and protests, and everything you share on social media is data.

Metadata are a subset of data: information stored within a document or file that describes that document or file, like an electronic fingerprint. Let’s use an email as an example. If you send an email to your friend, the text of the email is data. The email itself, however, contains all sorts of metadata, such as who created it, who the recipient is, the IP address of the author and the size of the email.
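The email example can be made concrete with a short Python sketch. It is only an illustration: the message, the addresses and the X-Originating-IP header are made up, and real emails usually carry many more header fields. The sketch uses Python's standard `email` module to separate what the sender wrote (data) from the header fields that describe the message (metadata).

```python
from email import message_from_string

# Hypothetical raw message; every address and value is invented for illustration.
RAW_EMAIL = """\
From: alice@example.org
To: bob@example.org
Date: Mon, 01 Jan 2024 10:00:00 +0000
Subject: Lunch?
X-Originating-IP: 203.0.113.7

Are you free at noon?
"""


def split_data_and_metadata(raw):
    """Return (body_text, headers_dict) for a simple, single-part email."""
    msg = message_from_string(raw)
    metadata = dict(msg.items())      # header fields describe the message
    data = msg.get_payload().strip()  # the body is what the sender wrote
    return data, metadata


data, metadata = split_data_and_metadata(RAW_EMAIL)
print("Data:", data)
for name, value in metadata.items():
    print(f"Metadata: {name} = {value}")
```

Even without reading the body, the header fields alone show who wrote to whom, when, and from which network address, which is why metadata can need protection too.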

Large amounts of data get combined and stored together. These large files containing thousands or millions of individual files are known as datasets. Datasets then get combined into very large datasets. These very large datasets, referred to as big data, are used to train machine-learning systems.

Personal Data and Personally Identifiable Information

Data can seem to be quite abstract, but the pieces of information are very often reflective of the identities or behaviors of actual persons. Not all data require protection, but some data, even metadata, can reveal a lot about a person. This is referred to as Personally Identifiable Information (PII). PII is commonly referred to as personal data. PII is information that can be used to distinguish or trace an individual’s identity such as a name, passport number or biometric data like fingerprints and facial patterns. PII is also information that is linked or linkable to an individual, such as date of birth and religion.

Personal data can be collected, analyzed and shared for the benefit of the persons involved, but they can also be used for harmful purposes. Personal data are valuable for many public and private actors. For example, they are collected by social media platforms and sold to advertising companies. They are collected by governments to serve law-enforcement purposes like prosecution of crimes. Politicians value personal data to target voters with certain political information. Personal data can be monetized by people with criminal purposes such as selling false identities.

“Sharing data is a regular practice that is becoming increasingly ubiquitous as society moves online. Sharing data does not only bring users benefits, but is often also necessary to fulfill administrative duties or engage with today’s society. But this is not without risk. Your personal information reveals a lot about you, your thoughts, and your life, which is why it needs to be protected.”

Access Now’s ‘Creating a Data Protection Framework’, November 2018.

How does data protection relate to the right to privacy?

The right to protection of personal data is closely interconnected to, but distinct from, the right to privacy. The understanding of what “privacy” means varies from one country to another based on history, culture, or philosophical influences. Data protection is not always considered a right in itself. Read more about the differences between privacy and data protection here.

Data privacy is also a common way of speaking about sensitive data and the importance of protecting them against unintentional sharing and undue or illegal gathering and use of data about an individual or group. USAID recently shared a resource about promoting data privacy in COVID-19 and development, which defines data privacy as ‘the right of an individual or group to maintain control over and confidentiality of information about themselves’.

How does data protection work?

Participant of the USAID WeMUNIZE program in Nigeria. Data protection must be considered for existing datasets as well. Photo credit: KC Nwakalor for USAID / Digital Development Communications

Personal data can and should be protected by measures that protect the identity or other information about a person from harm and that respect their right to privacy. Examples of such measures include determining which data are vulnerable based on privacy-risk assessments; keeping sensitive data offline; limiting who has access to certain data; anonymizing sensitive data; and only collecting necessary data.

There are a couple of established principles and practices to protect sensitive data. In many countries, these measures are enforced via laws, which contain the key principles that are important to guarantee data protection.

“Data Protection laws seek to protect people’s data by providing individuals with rights over their data, imposing rules on the way in which companies and governments use data, and establishing regulators to enforce the laws.”

Privacy International on data protection

A couple of important terms and principles are outlined below, based on The European Union’s General Data Protection Regulation (GDPR).

  • Data Subject: any person whose personal data are being processed, such as added to a contacts database or to a mailing list for promotional emails.
  • Processing: any operation performed on personal data, whether manual or automated.
  • Data Controller: the actor that determines the purposes for, and means by which, personal data are processed.
  • Data Processor: the actor that processes personal data on behalf of the controller, often a third-party external to the controller, such as a party that offers mailing list or survey services.
  • Informed Consent: individuals understand and agree that their personal data are collected, accessed, used and/or shared and how they can withdraw their consent.
  • Purpose limitation: personal data are only collected for a specific and justified use and the data cannot be used for other purposes by other parties.
  • Data minimization: data collection is limited to the essential details.
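As a rough illustration of how informed consent, purpose limitation and data minimization can show up in practice, the Python sketch below handles a sign-up for a newsletter. Everything in it is an assumption made for the example: the field names, the purpose and the `collect` and `minimize` helpers are hypothetical and are not taken from the GDPR or from any particular tool.

```python
# Hypothetical purpose: sending a newsletter. Only these fields are needed for it.
NEEDED_FOR_NEWSLETTER = {"email", "language"}


def minimize(record, needed_fields):
    """Keep only the fields required for the stated purpose."""
    return {key: value for key, value in record.items() if key in needed_fields}


def collect(record, needed_fields):
    """Store nothing without consent; otherwise store the minimized record."""
    if not record.get("consent_given", False):
        return None
    return minimize(record, needed_fields)


# A sign-up form that asked for more than the newsletter needs.
signup = {
    "email": "sam@example.org",
    "language": "en",
    "date_of_birth": "1990-05-17",
    "religion": "undisclosed",
    "consent_given": True,
}

stored = collect(signup, NEEDED_FOR_NEWSLETTER)
print(stored)
```

A real system would also need to record when and how consent was given and let people withdraw it. The better fix is usually upstream: not asking for date of birth or religion on the form at all.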

 

Healthcare provider in Eswatini. Quality data and protected datasets can accelerate impact in the public health sector. Photo credit: Ncamsile Maseko & Lindani Sifundza.

Access Now’s guide lists eight data-protection principles that come largely from international standards, in particular the Council of Europe Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (widely known as Convention 108) and the Organisation for Economic Co-operation and Development (OECD) Privacy Guidelines. These principles are considered to be “minimum standards” for the protection of fundamental rights by countries that have ratified international data protection frameworks.

A development project that uses data, whether establishing a mailing list or analyzing datasets, should comply with laws on data protection. When there is no national legal framework, international principles, norms and standards can serve as a baseline to achieve the same level of protection of data and people. Compliance with these principles may seem burdensome, but implementing a few steps related to data protection from the beginning of the project will help to achieve the intended results without putting people at risk. 

common practices of civil society organizations relate to the terms and principles of the data protection framework of laws and norms

The figure above shows how common practices of civil society organizations relate to the terms and principles of the data protection framework of laws and norms.  

The European Union’s General Data Protection Regulation (GDPR)

The data-protection law in the EU, the GDPR, went into effect in 2018. It is often considered the world’s strongest data-protection law. The law aims to enhance how people can access their information and limits what organizations can do with the personal data of people in the EU. Although it originates in the EU, the GDPR can also apply to organizations that are based outside the region when they handle the data of people in the EU. The GDPR therefore has a global impact.

The obligations stemming from the GDPR and other data protection laws may have broad implications for civil society organizations. For information about the GDPR compliance process and other resources, see the European Center for Not-for-Profit Law’s guide on data-protection standards for civil society organizations.

Notwithstanding its protections, the GDPR also has been used to harass CSOs and journalists. For example, a mining company used a provision of the GDPR to try to force Global Witness to disclose sources it used in an anti-mining campaign. Global Witness successfully resisted these attempts.

Personal or organizational protection tactics

How to protect your own sensitive information or the data of your organization will depend on your specific situation in terms of activities and legal environment. A first step is to assess your specific needs in terms of security and data protection. For example, which information could, in the wrong hands, have negative consequences for you and your organization?

Digital security specialists have developed online resources you can use to protect yourself. One example is the Security Planner, an easy-to-use guide with expert-reviewed advice for staying safer online and recommendations on implementing basic online practices. Another is the Digital Safety Manual, which offers information and practical tips on enhancing digital security for government officials working with civil society and Human Rights Defenders (HRDs). This manual offers 12 cards tailored to various common activities in the collaboration between governments (and other partners) and civil society organizations. The first card helps to assess digital security needs.

Digital Safety Manual

  1. Assessing Digital Security Needs
  2. Basic Device Security
  3. Passwords and Account Protection
  4. Connecting to the Internet Securely
  5. Secure Calls, Chat, and Email
  6. Security and Social Media Use
  7. Secure Data Storage and Deletion
  8. Secure File Transfer
  9. Secure Contract Handling
  10. Targeted Malware and Other Attacks
  11. Phone Tracking and Surveillance
  12. Security Concerns Related to In-Person Meetings

 

The Digital First Aid Kit is a free resource for rapid responders, digital security trainers, and tech-savvy activists to better protect themselves and the communities they support against the most common types of digital emergencies. Global digital safety responders and mentors can help with specific questions or mentorship, for example, the Digital Defenders Partnership and the Computer Incident Response Centre for Civil Society (CiviCERT).

How is data protection relevant in civic space and for democracy?

Many initiatives that aim to strengthen civic space or improve democracy use digital technology. There is a widespread belief that the increasing volume of data and the tools to process them can be used for good. And indeed, integrating digital technology and the use of data in democracy, human rights and governance programming can have significant benefits; for example, they can connect communities around the globe, reach underserved populations better, and help mitigate inequality.

“Within social change work, there is usually a stark power asymmetry. From humanitarian work, to campaigning, documenting human rights violations to movement building, advocacy organisations are often led by – and work with – vulnerable or marginalised communities. We often approach social change work through a critical lens, prioritising how to mitigate power asymmetries. We believe we need to do the same thing when it comes to the data we work with – question it, understand its limitations, and learn from it in responsible ways.”

What is Responsible Data?

When quality information is available to the right people when they need it, the data are protected against misuse and the project is designed with protection of its users in mind, it can accelerate impact.

  • USAID funded improved vineyard inspection using drones and GPS data in Moldova, allowing farmers to quickly inspect, identify, and isolate vines infected by a phytoplasma disease of the vine.
  • Círculo is a digital tool for female journalists in Mexico to help them create strong networks of support, strengthen their safety protocols and meet needs related to protection of themselves and their data. The tool was developed with the end-users through chat groups and in-person workshops to make sure everything built in the app was something they needed and could trust.

At the same time, data-driven development brings a new responsibility to prevent misuse of data when designing, implementing or monitoring development projects. When the use of personal data is a means to identify people who are eligible for humanitarian services, privacy and security concerns are very real.

  • Refugee camps in Jordan have required community members to allow scans of their irises to purchase food and supplies and take out cash from ATMs. This practice has not integrated meaningful ways to ask for consent or allow people to opt out. Additionally, the use and collection of highly sensitive personal data like biometrics to enable daily purchasing habits is disproportionate, because other less personal digital technologies are available and used in many parts of the world.

Governments, international organizations and private actors can all – even unintentionally – misuse personal data for purposes other than intended, negatively affecting the wellbeing of the people related to that data. Some examples have been highlighted by Privacy International:

  • The case of Tullow Oil, the largest oil and gas exploration and production company in Africa, shows how a private actor considered extensive and detailed research by a micro-targeting research company into the behaviors of local communities in order to get ‘cognitive and emotional strategies to influence and modify Turkana attitudes and behavior’ to Tullow Oil’s advantage.
  • In Ghana, the Ministry of Health commissioned a large study on health practices and requirements in Ghana. This resulted in an order from the ruling political party to model future vote distribution within each constituency based on how respondents said they would vote, and a negative campaign trying to get opposition supporters not to vote.  

There are resources and experts available to help with this process. The Principles for Digital Development website offers recommendations, tips and resources to protect privacy and security throughout a project lifecycle, such as the analysis and planning stage, for designing and developing projects and when deploying and implementing. Measurement and evaluation are also covered. The Responsible Data website offers the Illustrated Hand-Book of the Modern Development Specialist with attractive, understandable guidance through all steps of a data-driven development project: designing it, managing data, with specific information about collecting, understanding and sharing it, and closing a project.

NGO worker prepares for data collection in Buru Maluku, Indonesia. When collecting new data, it’s important to design the process carefully and think through how it affects the individuals involved. Photo credit: Indah Rufiati/MDPI – Courtesy of USAID Oceans.

Opportunities

Data protection measures advance democracy, human rights and governance goals. Read below to learn how to more effectively and safely think about data protection in your work.

Privacy respected and people protected

Implementing data protection standards in development projects protects people against potential harm from abuse of their data. Abuse happens when an individual, company or government accesses personal data and uses them for purposes other than those for which the data were collected. Intelligence services and law enforcement authorities often have legal and technical means to enforce access to datasets and abuse the data. Individuals hired by governments can access datasets through hacking the security of software or clouds. This has often led to intimidation, silencing and arrests of human rights defenders and civil society leaders criticizing their government. Privacy International maps examples of governments and private actors abusing individuals’ data.

Strong protective measures against data abuse ensure respect for the fundamental right to privacy of the people whose data are collected and used. Protective measures allow positive development such as improving official statistics, better service delivery, targeted early warning mechanisms and effective disaster response. 

It is important to determine how data are protected throughout the entire life cycle of a project. Individuals should also be assured of protection after the project ends, either abruptly or as intended, when the project moves into a different phase or when it receives funding from different sources. Oxfam has developed a leaflet to help anyone handling, sharing or accessing program data to properly consider responsible data issues throughout the data lifecycle, from making a plan to disposing of data.

Risks

The collection and use of data can also create risks in civil society programming. Read below on how to discern the possible dangers associated with collection and use of data in DRG work, as well as how to mitigate unintended – and intended – consequences.

Unauthorized access to data

Data need to be stored somewhere: on a computer or an external drive, in a cloud or on a local server. Wherever the data are stored, precautions need to be taken to protect the data from unauthorized access and to avoid revealing the identities of vulnerable persons. The level of protection that is needed depends on the sensitivity of the data, i.e. to what extent it could have negative consequences if the information fell into the wrong hands.

Data can be stored on a nearby and well-protected server that is connected to drives with strong encryption and very limited access, which is a method to stay in control of the data you own. The free versions of cloud services offered by well-known tech companies often provide only basic protection measures and wide access to the dataset. More advanced security features are available for paying customers, such as storage of data in certain jurisdictions with data protection legislation. The guidelines on how to secure private data stored and accessed in the cloud can help you understand various aspects of cloud storage and decide what suits a specific situation.

Every system needs to be secured against cyberattacks and manipulation. One common challenge is finding a way to protect identities in the dataset, for example, by removing all information that could identify individuals from the data, i.e. anonymizing it. Proper anonymization is of key importance and harder than often assumed.
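To see why removing names is not the same as anonymizing, consider the small Python sketch below. It uses a k-anonymity style check, a technique brought in here purely for illustration and not one this resource prescribes: it counts how many records share the same combination of seemingly harmless fields. The rows, field names and the `smallest_group_size` helper are all hypothetical.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[field] for field in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical survey rows with names already removed.
rows = [
    {"district": "North", "birth_year": 1988, "sex": "F", "answer": "yes"},
    {"district": "North", "birth_year": 1988, "sex": "F", "answer": "no"},
    {"district": "North", "birth_year": 1991, "sex": "M", "answer": "yes"},
    {"district": "South", "birth_year": 1975, "sex": "F", "answer": "no"},
    {"district": "South", "birth_year": 1969, "sex": "M", "answer": "yes"},
]

# With all three fields, at least one person is the only one in their group.
k = smallest_group_size(rows, ["district", "birth_year", "sex"])
print("smallest group with district, birth year and sex:", k)

# Keeping only the district makes every group larger, at the cost of detail.
k_coarse = smallest_group_size(rows, ["district"])
print("smallest group with district only:", k_coarse)
```

A smallest group of size one means someone who knows a person's district, birth year and sex could pick out that person's answer, even though no name is stored. Larger groups lower this particular risk but do not by themselves make a dataset safe.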

One can imagine that a dataset of GPS locations of people living with albinism across Uganda requires strong protection. People with albinism face persecution based on the belief that certain of their body parts can transmit magical powers, or on the presumption that they are cursed and bring bad luck. A spatial-profiling project mapping the exact location of individuals belonging to a vulnerable group can improve outreach and delivery of support services to them. However, hacking of the database or other unlawful access to their personal data might put them at risk from people wanting to exploit or harm them.

One could also imagine that the people operating an alternative system to send out warning sirens for air strikes in Syria run the risk of being targeted by authorities. While data collection and sharing by this group aims to prevent death and injury, it diminishes the impact of air strikes by the Syrian authorities. The location data of the individuals running and contributing to the system needs to be protected against access or exposure.

Another risk is that private actors who run or cooperate in data-driven projects could be tempted to sell data if they are offered large sums of money. Such buyers could be advertising companies or politicians that aim to target commercial or political campaigns at specific people.

The Tiko system designed by social enterprise Triggerise rewards young people for positive health-seeking behaviors, such as visiting pharmacies and seeking information online. Among other things, the system gathers and stores sensitive personal and health information about young female subscribers who use the platform to seek guidance on contraceptives and safe abortions, and it tracks their visits to local clinics. If these data are not protected, governments that have criminalized abortion could potentially access and use that data to carry out law-enforcement actions against pregnant women and medical providers.

Unsafe collection of data

When you are planning to collect new data, it is important to carefully design the collection process and think through how it affects the individuals involved. It should be clear from the start what kind of data will be collected, for what purpose, and that the people involved agree with that purpose. For example, an effort to map people with disabilities in a specific city can improve services. However, the database should not expose these people to risks, such as attacks or stigmatization that can be targeted at specific homes. Also, establishing this database should answer to the needs of the people involved and not be driven by the mere wish to use data. For further guidance, see the chapter Getting Data in the Hand-book of the Modern Development Specialist and the OHCHR Guidance to adopt a Human Rights Based Approach to Data, focused on collection and disaggregation.

If data are collected in person by people recruited for this process, proper training is required. They need to be able to create a safe space to obtain informed consent from people whose data are being collected and know how to avoid bias during the data-collection process. 

Unknowns in existing datasets

Data-driven initiatives can either gather new data, for example, through a survey of students and teachers in a school, or use existing datasets from secondary sources, for example by using a government census or scraping social media sources. Data protection must also be considered when you plan to use existing datasets, such as images of the Earth for spatial mapping. You need to analyze what kind of data you want to use and whether it is necessary to use a specific dataset to reach your objective. For third-party datasets, it is important to gain insight into how the data that you want to use were obtained, whether the principles of data protection were met during the collection phase, who licensed the data and who funded the process. If you are not able to get this information, you must carefully consider whether to use the data or not. See the Hand-book of the Modern Development Specialist on working with existing data. 

Questions

If you are trying to understand the implications of lacking data protection measures in your work environment, or are considering using data as part of your DRG programming, ask yourself these questions:

  1. Are data protection laws adopted in the country or countries concerned?
    Are these laws aligned with international human rights law, including provisions protecting the right to privacy?
  2. How will the use of data in your project comply with data protection and privacy standards?
  3. What kind of data do you plan to use? Are personal or other sensitive data involved?
  4. What could happen to the persons related to that data if the government accesses these data?
  5. What could happen if the data are sold to a private actor for other purposes than intended?
  6. What precaution and mitigation measures are taken to protect the data and the individuals related to the data?
  7. How are the data protected against manipulation, and against access and misuse by third parties?
  8. Do you have sufficient expertise integrated during the entire course of the project to make sure that data are handled well?
  9. If you plan to collect data, what is the purpose of the collection of data? Is data collection necessary to reach this purpose?
  10. How are collectors of personal data trained? How is informed consent generated when data are collected?
  11. If you are creating or using databases, how is anonymity of the individuals related to the data guaranteed?
  12. How is the data that you plan to use obtained and stored? Is the level of protection appropriate to the sensitivity of the data?
  13. Who has access to the data? What measures are taken to guarantee that data are accessed for the intended purpose?
  14. Which other entities – companies, partners – process, analyze, visualize and otherwise use the data in your project? What measures are taken by them to protect the data? Have agreements been made with them to avoid monetization or misuse?
  15. If you build a platform, how are the registered users of your platform protected?
  16. Is the database, the data storage system, or the platform open to audit by independent researchers?


Case Studies

People Living with HIV Stigma Index and Implementation Brief

The People Living with HIV Stigma Index is a standardized questionnaire and sampling strategy to gather critical data on intersecting stigmas and discrimination affecting people living with HIV. It monitors HIV-related stigma and discrimination in various countries and provides evidence for advocacy in those countries. The data in this project are the experiences of people living with HIV. The implementation brief provides insight into data protection measures. People living with HIV are at the center of the entire process, continuously linking the data that are collected about them to the people themselves, from research design, through implementation, to using the findings for advocacy. Data are gathered through a peer-to-peer interview process, with people living with HIV from diverse backgrounds serving as trained interviewers. A standard implementation methodology has been developed, including the establishment of a steering committee with key stakeholders and population groups.

RNW Media’s Love Matters Program Data Protection

RNW Media’s Love Matters Program offers online platforms to foster discussion and information-sharing on love, sex and relationships to 18-30 year-olds in areas where information on sexual and reproductive health and rights (SRHR) is censored or taboo. RNW Media’s digital teams introduced creative approaches to data processing and analysis, Social Listening methodologies and Natural Language Processing techniques to make the platforms more inclusive, create targeted content and identify influencers and trending topics. Governments have imposed restrictions such as license fees or registrations for online influencers as a way of monitoring and blocking “undesirable” content, and RNW Media has invested in the security of its platforms and the digital literacy of its users to protect them from unauthorized access to their sensitive personal information. Read more in the publication ‘33 Showcases – Digitalisation and Development – Inspiration from Dutch development cooperation’, Dutch Ministry of Foreign Affairs, 2019, pp. 12-14.

The Indigenous Navigator

The Indigenous Navigator is a framework and set of tools for and by indigenous peoples to systematically monitor the level of recognition and implementation of their rights. The data in this project are the experiences of indigenous communities and organizations, and the tools facilitate indigenous communities’ own generation of quality data. One objective of the navigator is that this quality data can be fed into existing human rights and sustainable development monitoring processes at local, national, regional and international levels. The project’s page about privacy shows data protection measures, such as the requirement of community consent and how to obtain it, and explains how the Indigenous Navigator uses personal data.

Girl Effect

Girl Effect, a creative non-profit working where girls are marginalized and vulnerable, uses media and mobile tech to empower girls. The organisation embraces digital tools and interventions and acknowledges that any organisation that uses data also has a responsibility to protect the people it talks to or connects with online. Their ‘Digital safeguarding tips and guidance’ provides in-depth guidance on implementing data protection measures while working with vulnerable people. Referring to Girl Effect as inspiration, Oxfam has developed and implemented a Responsible Data Policy and shares many supporting resources online. The publication ‘Privacy and data security under GDPR for quantitative impact evaluation’ provides detailed considerations of the data protection measures Oxfam implements while doing quantitative impact evaluation through digital and paper-based surveys and interviews.

The LAND (Land Administration for National Development) Partnership

The LAND (Land Administration for National Development) Partnership led by Kadaster International aims to design fast and affordable land administration to meet people’s needs. Through the processing and storage of geodata such as GPS, aerial photographs and satellite imagery (determining general boundaries instead of fixed boundaries), a digital spatial framework is established that enables affordable, real-time and participatory registration of land by its owners. Kadaster is aware of the sensitive nature of some of the data in the system that needs to be protected, in view of possible manipulation and privacy violation, and the need to train people in the digital processing of data. Read more in the publication ‘33 Showcases – Digitalisation and Development – Inspiration from Dutch development cooperation’, Dutch Ministry of Foreign Affairs, 2019, pp. 25-26.


Digital IDs

What are digital IDs?

Families displaced by Boko Haram violence in Maiduguri, Northeast Nigeria. Implementation of a digital ID system requires informed consent from participants. Photo credit: USAID.

Digital IDs are identification systems that rely on digital technology. Biometric technology is one kind of tool often used for digital identification: biometrics allow people to prove their identity based on a physical characteristic or trait (biological data). Other forms of digital identification include cards and mobile technologies. This resource, which draws on the work of The Engine Room, will look at different forms and implications of digital IDs, with a particular focus on biometric IDs, including their integration with health systems and their potential for e-participation.

“Biometrics are not new – photographs have been used in this sector for years, but current discourse around ‘biometrics’ commonly refers to fingerprints, face prints and iris scans. As technology continues to advance, capabilities for capturing other forms of biometric data are also improving, such that voice prints, retinal scans, vein patterns, tongue prints, lip movements, ear patterns, gait, and of course, DNA, can be used for authentication and identification purposes.”

The Engine Room

Definitions

Biometric Data: automatically measurable, distinctive physical characteristics or personal traits used to identify or verify the identity of an individual.

Consent: Article 4(11) of the General Data Protection Regulation (GDPR) defines consent: “Consent of the data subject means any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her.” See also the Data Protection resource.

Data Subject: the individual whose data are collected.

Digital ID: an electronic identity-management system used to prove an individual’s identity or their right to access information or services.

E-voting: an election system that allows a voter to record their secure and secret ballot electronically.

Foundational Biometric Systems: systems that supply general identification for official uses, like national civil registries and national IDs.

Functional Biometric Systems: systems that respond to a demand for a particular service or transaction, like voter IDs, health records, or financial services.

Identification/One-to-Many Authentication: using the biometric identifier to identify the data subject from within a database of other biometric profiles.

Immutability: the quality of a characteristic that does not change over time (for example, DNA).

Portable Identity: an individual’s digital ID credentials may be taken with them beyond the initial issuing authority, to prove official identity for new user-relationships/entities, without having to repeat verification each time.

Self-Sovereign Identity: a digital ID that gives the data subject full ownership over their digital identity, guaranteeing them lifetime portability, independent from any central authority.

Uniqueness: a characteristic that sufficiently distinguishes individuals from one another. Most forms of biometric data are unique to the individual involved.

Verification/One-to-One Authentication: using the biometric identifier to confirm that the data subject is who they claim to be.

How do digital IDs work?

Young Iraqi woman pictured at the Harsham IDP camp in Erbil, Iraq. Digital IDs and biometrics have potential to facilitate the voting process. Photo credit: Jim Huylebroek for Creative Associates International.

There are three primary categories of technology used for digital identification: biometrics, cards, and mobile. Within each of these areas, a wide range of technologies may be used.

The U.S. National Institute of Standards and Technology (NIST), one of the primary international authorities on digital IDs, identifies three parts in how the digital ID process works.

Part 1: Identity proofing and enrollment

This is the process of binding the data on the subject’s identity to an authenticator, which is a tool that is used to prove their identity.

  • With a biometric ID, this involves collecting the data (through an eye scan, fingerprinting, submitting a selfie, etc.), verifying that the person is who they claim to be, and connecting the individual to an identity account (profile).
  • With a non-biometric ID, this involves giving the individual a tool (an authenticator) they can use for authentication, like a password, a barcode, etc.
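The non-biometric case above can be sketched in a few lines. This is a minimal illustration, assuming a password is the authenticator and that identity proofing has already happened elsewhere; the in-memory `registry`, the function names `enroll` and `authenticate`, and the iteration count are hypothetical choices made for this example, not part of any NIST specification.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory registry: subject_id -> (salt, derived_key).
# A real system would use protected, persistent storage.
registry = {}

ITERATIONS = 200_000  # assumed work factor, chosen only for illustration


def enroll(subject_id: str, password: str) -> None:
    """Bind an already-proofed subject to a password authenticator."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Only the salt and the derived key are stored, never the password itself.
    registry[subject_id] = (salt, key)


def authenticate(subject_id: str, password: str) -> bool:
    """Check a presented password against the stored binding."""
    record = registry.get(subject_id)
    if record is None:
        return False
    salt, expected = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, expected)


enroll("subject-001", "correct horse battery staple")
print(authenticate("subject-001", "correct horse battery staple"))
print(authenticate("subject-001", "wrong password"))
```

The point of the sketch is the binding step: enrollment stores something derived from the authenticator against the subject's record, and later authentication checks a presented authenticator against that record.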

Part 2: Authentication

This is the process of using the digital ID to prove identity or access services.

Biometric authentication: There are two different types of biometric authentication.

  • Biometric Verification (or One-to-One Authentication) confirms that the person is who they say they are. This allows organizations to determine, for example, that a person is entitled to certain food, vaccines, or housing.
  • Biometric Identification (or One-to-Many Authentication) is used to identify an individual from within a database of biometric profiles. Organizations may use biometrics for identification to prevent fraudulent enrollments and to “de-duplicate” lists of people. One-to-many authentication systems pose more risks than one-to-one systems because they require a larger amount of data to be stored in one place and because they lead to more false matches (read more in the Risks section).
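The difference between the two modes, and the reason one-to-many searches produce more false matches, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: real biometric matchers are far more sophisticated than cosine similarity on short vectors, the threshold of 0.95 is arbitrary, and the false-match formula assumes each comparison is independent.

```python
import math


def similarity(a, b):
    """Cosine similarity between two feature vectors (toy stand-in for a biometric matcher)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(probe, claimed_template, threshold=0.95):
    """One-to-one: compare the probe with the single template of the claimed identity."""
    return similarity(probe, claimed_template) >= threshold


def identify(probe, database, threshold=0.95):
    """One-to-many: return every enrolled ID whose template matches the probe."""
    return [sid for sid, tpl in database.items() if similarity(probe, tpl) >= threshold]


def enroll_if_new(subject_id, template, database, threshold=0.95):
    """De-duplication: enroll only if no existing profile matches.

    Returns the list of matching IDs (empty if the subject was enrolled).
    A non-empty result is a candidate duplicate that still needs human review.
    """
    matches = identify(template, database, threshold)
    if not matches:
        database[subject_id] = template
    return matches


def prob_any_false_match(per_comparison_fmr, database_size):
    """Chance of at least one false match across N independent comparisons."""
    return 1.0 - (1.0 - per_comparison_fmr) ** database_size


# Hypothetical three-person database with made-up feature vectors.
database = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
probe = [0.99, 0.05, 0.0]

print(verify(probe, database["A"]))   # one comparison
print(identify(probe, database))      # one comparison per enrolled profile
print(prob_any_false_match(0.0001, 100_000))
```

Under the independence assumption, even a per-comparison false match rate of 1 in 10,000 makes at least one false match nearly certain when a probe is searched against 100,000 profiles, whereas a one-to-one check makes only a single comparison.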

The chart below synthesizes the advantages and disadvantages of different biometric authentication tools. For further details, see the World Bank’s “Technology Landscape for Digital Identification (2018).”

| Biometric Tool | Advantages | Disadvantages |
|---|---|---|
| Fingerprints | Less physically/personally invasive; advanced and relatively affordable method | Not fully inclusive: some fingerprints are harder to capture than others |
| Iris Scan | Fast, accurate, inclusive, and secure | More expensive technology; verification requires precise positioning of data subject; can be misused for surveillance purposes (verification without data subject’s permission) |
| Face Recognition | Relatively affordable | Prone to error; can be misused for surveillance purposes (verification without data subject’s permission); not enough standardization among technology suppliers, which could lead to vendor lock-in |
| Voice Recognition | Relatively affordable; no concerns about hygiene (unlike some other biometrics that involve touch) | Collection process can be difficult and time-consuming; technology is difficult to scale |
| Behavior Recognition, also known as “Soft Biometrics” (e.g., a person’s gait, how they write their signature) | Can be used in real time | Prone to error; not yet a mature technology; can be misused for surveillance purposes (verification without data subject’s permission) |
| Vascular Recognition (a person’s distinct pattern of veins) | Secure, accurate, and inclusive technology | More expensive; not yet a mature technology and not yet widely understood; not interoperable/data are not easily portable |
| DNA Profiling | Secure; accurate; inclusive; useful for large populations | Collection process is long; technology is expensive; involves extremely sensitive information which can be used to identify race, gender, and family relationships, etc. that could put the individual at risk |

Non-biometric authentication: There are two common forms of digital ID that are not based on physical characteristics or traits, which also have authentication methods. Digital ID cards and digital ID applications on mobile devices can also be used to prove identity or to access services or aid (much like a passport, residence card, or driver’s license).

  • Cards: These are a common digital identifier, which can rely on many kinds of technology, from microchips to barcodes. Cards have been in use for a long time which makes them a mature technology, but they are also less secure because they can be lost or stolen. “Smart cards” exist in the form of an embedded microchip combined with a password. Cards can also be combined with biometric systems. For example, Mastercard and Thales began offering cards with fingerprint sensors in January of 2020.
  • Apps on mobile devices: Digital IDs can be used on mobile devices by relying on a password, a “cryptographic” (specially encoded) SIM card, or a “Smart ID” app. These methods are fairly accurate and scalable, but they have security risks and also risks over the long term due to reliance on technology providers: the technology may not be interoperable or may become outdated (see Privatization of ID and Vendor Lock-In in the Risks section).

Part 3: Portability and interoperability

Digital IDs are usually generated by a single issuing authority (NGO, government entity, health provider, etc.) for an individual. However, portability means that digital ID systems can be designed to allow the person to use their ID elsewhere than with the issuing authority, for example with another government entity or non-profit organization.

To understand interoperability, consider different email providers, for instance Gmail and Yahoo mail: these are separate service providers, but their users can send emails to one another. Data portability and interoperability are critical from a fundamental rights perspective, but it is first necessary that different networks (providers, governments) be interoperable with one another to allow for portability. Interoperability is increasingly important for providing services within and across countries, as can be seen in the European Union and Schengen community, the East African community, and the West African ECOWAS community.

Self-Sovereign Identity (SSI) is an important, emerging type of digital ID that gives a person full ownership over their digital identity, guaranteeing them lifetime portability, independent from any central authority. The Self-Sovereign Identity model aims to remove the trust issues and power imbalances that generally accompany digital identity, by giving a person full control over their data.


How are digital IDs relevant in civic space and for democracy?

People across the world who are not identified by government documents face significant barriers to receiving government services and humanitarian assistance. Biometrics are widely used by donors and development actors to identify individuals and connect them with services. Biometric technology can increase access to finance, healthcare, education, and other critical services and benefits. It can also be used for voter registration and in facilitating civic participation.

Resident of the Garin Wazam site in Niger exchanges her e-voucher with food. Biometric technology can increase access to critical services and benefits. Photo credit: Guimba Souleymane, International Red Cross Niger.

The United Nations High Commissioner for Refugees (UNHCR) launched its global Biometric Identity Management System (“BIMS”) in 2015, and the following year the World Food Programme began using biometrics for multiple purposes, including refugee protection, cash-based interventions and voter registration. In recent years, a growing preference in aid delivery for cash-based interventions has been part of the push towards digital IDs and biometrics, as these tools can facilitate monitoring and reporting of assistance distribution.

The automated nature of digital IDs brings many new challenges, from gathering meaningful informed consent, to guaranteeing personal security and organization-level security, to potentially harming human dignity and increasing exclusion. These technical and societal issues are detailed in the Risks section.

Ethical Principles for Biometrics

Founded in July 2001 in Australia, the Biometrics Institute is an independent and international membership organization for the biometrics community. In March of 2019, they released seven “Ethical Principles for Biometrics.”

  1. Ethical behaviour: We recognise that our members must act ethically even beyond the requirements of law. Ethical behaviour means avoiding actions which harm people and their environment.
  2. Ownership of the biometric and respect for individuals’ personal data: We accept that individuals have significant but not complete ownership of their personal data (regardless of where the data are stored and processed) especially their biometrics, requiring their personal data, even when shared, to be respected and treated with the utmost care by others.
  3. Serving humans: We hold that technology should serve humans and should take into account the public good, community safety and the net benefits to individuals.
  4. Justice and accountability: We accept the principles of openness, independent oversight, accountability and the right of appeal and appropriate redress.
  5. Promoting privacy-enhancing technology: We promote the highest quality of appropriate technology use including accuracy, error detection and repair, robust systems and quality control.
  6. Recognising dignity and equal rights: We support the recognition of dignity and equal rights for all individuals and families as the foundation of freedom, justice and peace in the world, in line with the United Nations Universal Declaration of Human Rights.
  7. Equality: We promote planning and implementation of technology to prevent discrimination or systemic bias based on religion, age, gender, race, sexuality or other descriptors of humans.


Opportunities

Biometric voter registration in Kenya. Collection and storage of biometric data require strong data protection measures. Photo credit: USAID/Kenya Jefrey Karang’ae.

If you are trying to understand the implications of digital IDs in your work environment, or are considering using aspects of digital IDs as part of your DRG programming, consider the following opportunities.

Potential fraud reduction

Biometrics are frequently cited for their potential to reduce fraud and more generally manage financial risk by facilitating due diligence oversight and scrutiny of transactions. According to The Engine Room, these are frequently-cited justifications for the use of biometrics among development and humanitarian actors, but The Engine Room also found a lack of evidence to support this claim. It should not be assumed that fraud only occurs at the beneficiary level: the real problems with fraud may occur elsewhere in an ecosystem.

Facilitate E-Voting

Beyond the distribution of cash and services, digital IDs and biometrics have the potential to facilitate the voting process. The right to vote, and to participate in democratic processes more broadly, is a fundamental human right. Recently, the use of biometric voter registration and biometric voting systems has become more widespread as a means of empowering civic participation, securing electoral systems, and protecting against voter fraud and multiple enrollments.

Advocates claim that e-voting can reduce the costs of participation and make the process more reliable. Meanwhile, critics claim that digital systems are at risk of failure, misuse, and security breaches. Electronic ballot manipulation, poorly written code, or any other kind of technical failure could compromise the democratic process, particularly when there is no back-up paper trail. For more, see “Introducing Biometric Technology in Elections” (2017) by the International Institute for Democracy and Electoral Assistance, which includes detailed case studies on e-voting in Bangladesh, Fiji, Mongolia, Nigeria, Uganda, and Zambia.

Health Records

Securing electronic health records, particularly when care services are provided by multiple actors, can be very complicated, costly, and inefficient. Because biometrics link a unique verifier to a single individual, they are useful for patient identification, allowing doctors and health providers to connect someone to their health information and medical history. Biometrics have potential in vaccine distribution, for example, by being able to identify who has received specific vaccines (see the case study by The New Humanitarian about Gavi technology).

Access to healthcare can be particularly complicated in conflict zones, for migrants and displaced people, or for other groups without their documented health records. With interoperable biometrics, when patients need to transfer from one facility to another for whatever reason, their digital information can travel with them. For more, see the World Bank Group ID4D, “The Role of Digital Identification for Healthcare: The Emerging Use Cases” (2018).

Increased access to cash-based interventions

Digital ID systems have the potential to include the unbanked or those underserved by financial institutions in the local or even global economy. Digital IDs grant people access to regulated financial services by enabling them to prove their official identity. Populations in remote areas can benefit especially from digital IDs that permit remote, or non-face-to-face, identity proofing/enrollment for customer identification/verification. Biometrics can also make accessing banking services much more efficient, reducing the requirements and hurdles that beneficiaries would normally face. The WFP provides an example of a successful cash-based intervention: in 2017, it launched its first cash-based assistance for secondary school girls in northwestern Pakistan using biometric attendance data.

According to the Financial Action Task Force, by bringing more people into the regulated financial sector, biometrics further reinforce financial safeguards.

Improved distribution of aid and social benefits

Biometric systems can reduce much of the administrative time and human effort behind aid assistance, liberating human resources to devote to service delivery. Biometrics permit aid delivery to be tracked in real time, which allows governments and aid organizations to respond quickly to beneficiary problems.

Biometrics can also reduce redundancies in social-benefit and grant delivery. For instance, in 2015, the World Bank Group found that biometric digital IDs in Botswana achieved a 25 percent savings in pensions and social grants by identifying duplicated records and deceased beneficiaries. Indeed, the issue of “ghost” beneficiaries is a common problem. In 2019, the Namibian Government Institutions Pension Fund (GIPF) began requiring pension recipients to register their biometrics at their nearest GIPF office and return to verify their identity three times a year. Of course, social-benefit distribution can be aided by biometrics, but it also requires human oversight, given the possibility of glitches in digital service delivery and the critical nature of these services (see more in the Risks section).

Proof of identity

Migrants, refugees, and asylum seekers often struggle to prove and maintain their identity when they relocate. Many lose the proof of their legal identities and assets (for example, degrees and certifications, health records, financial assets) when they flee their homes. Responsibly-designed biometrics can help these populations reestablish and maintain proof of identity. For example, in Finland, a blockchain startup called MONI has been working since 2015 with the Finnish Immigration Service to provide refugees in the country with a prepaid credit card backed by a digital identity number stored on a blockchain. The design of these technologies is critical: data should be distributed rather than centralized to prevent security risks and misuse or abuse that come with centralized ownership of sensitive information.


Risks

The use of emerging technologies can also create risks in civil society programming. Read below on how to discern the possible risks associated with the use of digital ID tools in DRG work.

Dehumanization of beneficiaries

The way that biometrics are regarded — bestowing an identity on someone as if they did not have an identity previously — can be seen as problematic and even dehumanizing.

As The Engine Room explains, “the discourse around the ‘identifiability’ benefits of biometrics in humanitarian interventions often tends to conflate the role that biometrics play. Aid agencies cannot ‘give’ a beneficiary an identity, they can only record identifying features and check those against other records. Treating the acquisition of biometric data as constitutive of identity risks dehumanising beneficiaries, most of whom are already disempowered in their relationship with humanitarian entities upon whom they rely for survival. This attitude is evident in the remarks of one Burmese refugee undergoing fingerprint registration in Malaysia in 2006 — ‘I don’t know what it is for, but I do what UNHCR wants me to do’ — and of a Congolese refugee in Malawi, who upon completing biometric registration told staff, ‘I can be someone now.’”

Lack of informed consent

It is critical to obtain the informed consent of individuals in the process of biometric enrollment. But this rarely happens in humanitarian and development settings, given the many confusing technical aspects of the technology, language and cultural barriers, etc. An agreement that is potentially coerced does not constitute consent, as illustrated by the case of the biometric registration program in Kenya, which was challenged in court after many Kenyans felt pressured into it. It is difficult to guarantee and even to evaluate consent when the power imbalance between the issuing authority and the data subject is so great. “Refugees, for instance, could feel they have no choice but to provide their information, because they are in a vulnerable situation.”

Minors also face a similar risk of coerced or uninformed consent. As the Engine Room pointed out in 2016, “UNHCR has adopted the approach that refusal to submit to biometric registration amounts to refusal to submit to registration at all. If this is true, this constrains beneficiaries’ right to contest the taking of biometric data and creates a considerable disincentive to beneficiaries voicing opposition to the biometric approach.”

For consent to be truly given, the individual must have an alternative method available to them so they feel they can refuse the procedure without being disproportionately penalized. Civil society organizations could play an important role in helping to remedy this power imbalance.

Security risks

Digital ID systems provide many important security features, but they increase other security risks, like the risk of data leakage, data corruption or data use/misuse by unauthorized actors. Digital ID systems can involve very detailed data about the behaviors and movements of vulnerable individuals, for example, their financial histories and their attendance at schools, health clinics, and religious establishments. This information could be used against them, if in the hands of other actors (corrupt governments, marketers, criminals).

The loss, theft or misuse of biometric data is one of the greatest risks for organizations deploying these technologies. By collecting and storing their biometric data in centralized databases, aid organizations could be putting their beneficiaries at serious risk, particularly if their beneficiaries are people fleeing persecution or conflict. In general, because digital IDs rely on the Internet or other open communications networks, there are multiple opportunities for cyberattacks and security breaches. The Engine Room also cites anecdotal accounts of humanitarian workers losing laptops, USB keys and other digital files containing beneficiary data. See also the Data Protection resource.

Data Reuse and Misuse

Because biometrics are unique and immutable, once biometric data are out in the world, people are no longer the only owners of their identifier. The Engine Room describes this as the “non-revocability” of biometrics. This means that biometrics could be used for other purposes than those originally intended. For instance, governments could require humanitarian actors to give them access to biometric databases for political purposes, or foreign countries could obtain biometric data for intelligence purposes. People cannot easily change their biometrics as they would a driver’s license or even their name: for instance, with facial recognition, they would need to undergo plastic surgery in order to remove their biometric data.

There is also the risk that biometrics will be put to use in future technologies that may be more intrusive or harmful than current usages.

“Governments playing hosts to large refugee populations, such as Lebanon, have claimed a right to access to UNHCR’s biometric database, and donor States have supported UNHCR’s use of biometrics out of their own interest in using the biometric data acquired as part of the so-called ongoing ‘war on terror.’”

The Engine Room

For more on the potential reuse of biometric data for surveillance purposes, see also “Aiding surveillance: An exploration of how development and humanitarian aid initiatives are enabling surveillance in developing countries,” I&N Working Paper (2014).

Malfunctions and inaccuracies

Because they are so technical and rely on multiple steps and mechanisms, digital ID systems can experience many errors. Biometrics can return false matches, linking someone to the incorrect identity, or false negatives, failing to link someone to their actual identity. Technology does not always function in real communities as it does in the laboratory. Furthermore, some populations are on the receiving end of more errors than others: for instance, as has been widely documented, people of color are more often incorrectly identified by facial recognition technology.
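The two error types described above are usually reported as rates at a chosen decision threshold, and comparing those rates across population groups is one way to surface uneven performance. The sketch below is illustrative only: the function name `error_rates`, the group labels, the scores, and the threshold are all invented for this example and are not measurements from any real system.

```python
def error_rates(comparisons, threshold):
    """Compute (false_match_rate, false_non_match_rate) at a threshold.

    comparisons: list of (score, same_person) pairs, where same_person is True
    for a genuine comparison and False for an impostor comparison.
    """
    impostor = [score for score, same in comparisons if not same]
    genuine = [score for score, same in comparisons if same]
    # False match: an impostor comparison scores at or above the threshold.
    fmr = sum(s >= threshold for s in impostor) / len(impostor) if impostor else 0.0
    # False non-match (false negative): a genuine comparison scores below it.
    fnmr = sum(s < threshold for s in genuine) / len(genuine) if genuine else 0.0
    return fmr, fnmr


# Invented test trials for two hypothetical groups.
trials = {
    "group_a": [(0.92, True), (0.88, True), (0.95, True), (0.90, True),
                (0.30, False), (0.45, False), (0.20, False), (0.55, False)],
    "group_b": [(0.91, True), (0.70, True), (0.65, True), (0.85, True),
                (0.40, False), (0.82, False), (0.35, False), (0.60, False)],
}

for group, comparisons in trials.items():
    fmr, fnmr = error_rates(comparisons, threshold=0.8)
    print(group, "false match rate:", fmr, "false non-match rate:", fnmr)
```

Raising the threshold generally lowers the false match rate and raises the false non-match rate, so a single system-wide threshold can work well for one group and poorly for another. Reporting both rates per group makes that trade-off visible before deployment.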

Some technologies are more error-prone than others. For example, soft biometrics like a person’s gait are less mature and less accurate than iris scans. Even fingerprints, though relatively mature and widely used, still have a high error rate. The performance of some biometrics can also diminish over time: aging can change a person’s facial features and even their irises in a way that can impede biometric authentication. Digital IDs can also suffer from connectivity issues: lack of reliable infrastructure can reduce the system’s functioning in a particular geographic area for a significant period of time. To mitigate this, it is important that digital ID systems be designed to support both offline and online transactions.

When it comes to providing life-saving aid services, even a small mistake or malfunction during a single step in the process can cause severe harm. Unlike manual processes where humans are involved and can intervene in the case of error, automated processes bring the possibility that no one will notice a seemingly small technicality until it is too late.

Exclusionary potential

Biometrics may exclude individuals for several reasons, according to The Engine Room: “Individuals may be reluctant to submit to providing biometric samples because of cultural, gender or power imbalances. Acquiring biometric samples can be more difficult for persons of darker skin color or persons with disabilities. Fingerprinting, in particular, can be difficult to undertake correctly, particularly when beneficiaries’ fingerprints are less pronounced due to manual and rural labor. All of these aspects may inhibit individuals’ provision of biometric data and thus exclude them from the provision of assistance.”

The kinds of errors mentioned in the section above are more frequent for minority populations, such as people of color and persons with disabilities, who tend to be underrepresented in training data sets.

Lack of access to technology or lower levels of technology literacy can compound exclusion: for example, lack of access to smartphones or lack of cellphone data or coverage may increase exclusion in the case of smartphone-reliant ID systems. As mentioned, manual laborers typically have worn fingerprints, which biometric readers can struggle to read; similarly, the elderly may experience match failure due to changes in their facial characteristics, such as hair loss or other signs of aging or illness. All of these factors increase the risk of exclusion.

The World Bank ID4D program explains that it often observes differential rates of coverage for the following groups and their intersections: women and girls; orphans and vulnerable children; poor people; rural dwellers; ethnolinguistic minorities; migrants and refugees; stateless populations or populations at risk of statelessness; older people; persons with disabilities; non-nationals. It bears emphasizing that these groups tend to be the most vulnerable populations in society, precisely those that biometric technology and digital IDs aim to include and empower. When considering which kind of ID or biometric technology to deploy, it is critical to assess all of these types of potential errors in relation to the population, and in particular how to mitigate the exclusion of certain groups.

Insufficient regulation

“Technology is moving so fast that laws and regulations are struggling to keep up… Without clear international legislation, businesses in the biometrics world are often faced with the dilemma, ‘Just because we can, should we?’”

Isabelle Moeller, Chief Executive of the Biometrics Institute

Digital identification technologies exist in a continually evolving regulatory environment, which presents challenges to providers and beneficiaries alike. There are many efforts to create international standards for biometrics and digital IDs — for example, by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). But beyond the GDPR, there is not yet sufficient international regulation to enforce these standards in many of the countries where they are being implemented.

Privatization of ID and Vendor Lock-In

The technology behind digital identities and biometrics is almost always provided by private-sector actors, often in partnership with governments and international organizations and institutions. The major role played by the private sector in the creation and maintenance of digital IDs can put beneficiaries, aid organizations, and governments at risk of vendor lock-in: if switching to a new service provider is too expensive or onerous, the organization may be forced to stay with its original supplier. Overreliance on a private-sector supplier can also bring security risks (for instance, when the original supplier’s technology is insecure) and can pose challenges to partnering with other services and providers when the technology is not interoperable. For these reasons it is important for technology to be interoperable and to be designed with open standards.

IBM’s Facial Recognition Ban
In June 2020, IBM decided to withdraw its facial-recognition technology from use by law enforcement in the U.S. Such one-off decisions by private actors should not replace legal judgments and regulations. Debbie Reynolds, data privacy officer for Women in Identity, believes that facial recognition will not soon disappear, and so, considering the many flaws in the technology today, companies should focus on further improving the technology rather than on banning it. International regulation and enforcement are necessary first and foremost, as this will provide private actors with guidelines and incentives to design responsible, rights-respecting technology over the long term.

Questions

If you are considering using digital ID tools as part of your programming, ask yourself these questions to understand the possible implications for your work and for your community and partners.

  1. Has the beneficiary given their informed consent? How were you able to check their understanding? Was consent coerced in any way, perhaps due to a power dynamic or lack of alternative option?
  2. How does the community feel about the technology? Does the technology fit with cultural norms and uphold human dignity?
  3. How affordable is the technology for all stakeholders, including the data subjects?
  4. How mature is the technology? How long has the technology been in use, where, and with what results? How well is it understood by all stakeholders?
  5. Is the technology accredited? When and by whom? Is the technology based on widely accepted standards? Are these standards open?
  6. How interoperable is the technology with the other technologies in the identity ecosystem?
  7. How well does the technology perform? How long does it take to collect the data, to validate identity, and so on? What is the error rate?
  8. How resilient is the digital system? Can it operate without internet access or without reliable internet access?
  9. How easy is the technology to scale and use with larger or other populations?
  10. How secure and accurate is the technology? Have all security risks been addressed? What back-up methods do you have (for example, a paper trail for electronic voting)?
  11. Is the collection of biometric data proportionate to the task at hand? Are you collecting the minimal amount of data necessary to achieve your goal?
  12. Where are all data being stored? What other parties might have access to this information? How are the data protected?
  13. Are any of the people who would receive biometric or digital IDs part of a vulnerable group? If digitally recording their identity could put them at risk, how could you mitigate this risk (for instance, by avoiding a centralized database, minimizing the amount of data collected, or taking cybersecurity precautions)?
  14. What power does the beneficiary have over their data? Can they transfer their data elsewhere? Can they request that their data be erased, and can the data in fact be erased?
  15. If you are using digital IDs or biometrics to automate the fulfillment of fundamental rights or the delivery of critical services, is there sufficient human oversight?
  16. Who is technological error most likely to exclude or harm? How will you address this potential harm or exclusion?

Case studies

Aadhaar, India, the world’s largest national biometric system

Aadhaar is India’s national biometric ID program, and the largest in the world. It is an essential case study for understanding the potential benefits and risks of such a system. Aadhaar is controversial. Many have attributed hunger-related deaths to failures in the Aadhaar system, which does not have sufficient human oversight to intervene when the technology malfunctions and prevents individuals from accessing their benefits. However, in 2018, the Indian Supreme Court upheld the legality of the system, saying it does not violate Indians’ right to privacy and could therefore remain in operation. “Aadhaar gives dignity to the marginalized,” the judges asserted, and “Dignity to the marginalized outweighs privacy.”
WFP Iris Scan Technology in Zaatari Refugee Camp
In 2016, the World Food Programme introduced biometric technology to the Zaatari refugee camp in Jordan. “WFP’s system relies on UNHCR biometric registration data of refugees. The system is powered by IrisGuard, the company that developed the iris scan platform, Jordan Ahli Bank and its counterpart Middle East Payment Services. Once a shopper has their iris scanned, the system automatically communicates with UNHCR’s registration database to confirm the identity of the refugee, checks the account balance with Jordan Ahli Bank and Middle East Payment Services and then confirms the purchase and prints out a receipt – all within seconds.” As of 2019, the program, which relies in part on blockchain technology, was supporting more than 100,000 refugees.
Kenya’s Huduma Namba
In January 2020, the New York Times reported that Kenya’s digital IDs may exclude millions of minorities. In February, the Kenyan ID Huduma Namba was suspended by a High Court ruling, halting the $60 million Huduma Namba scheme until adequate data protection policies are implemented. The panel of three judges ruled in a 500-page report that the National Integrated Identification Management System (NIIMS) scheme is constitutional, reports The Standard, but current laws are insufficient to guarantee data protection. […] Months after biometric capture began, the government passed its first data protection legislation in late November 2019, after the government tried to downgrade the role of data protection commissioner to a ‘semi-independent’ data protection agency with a chairperson appointed by the president. The data protection measures have yet to be implemented. The case was brought by civil rights groups including the Nubian Rights Forum and Kenya National Commission on Human Rights (KNCHR), citing data protection and privacy issues, that the way in which data protection legislation was handled in parliament prevented public participation, and how the NIIMS scheme is proving ethnically divisive in the country, particularly in border areas.”
E-voting terminated in Kazakhstan
A study published in May 2020 on the discontinuation of e-voting in Kazakhstan highlights some of the political challenges around e-voting. Kazakhstan used e-voting between 2004 and 2011 and was considered a leading example. See Kazakhstan: Voter registration Case Study (2006) produced by the ACE Project Electoral Knowledge Network. However, the country returned to a traditional paper ballot due to lack of confidence from citizens and civil society in the government’s ability to ensure the integrity of e-voting procedures. See Politicization of e-voting rejection: reflections from Kazakhstan, by Maxat Kassen. It is important to note that Kazakhstan did not employ biometric voting, but rather electronic voting machines that operated via touch screens.
Biometrics for child vaccination
As explored in The New Humanitarian, 2019: “A trial project is being launched with the underlying betting that biometric identification is the best way to help boost vaccination rates, linking children with their medical records. Thousands of children between the ages of one and five are due to be fingerprinted in Bangladesh and Tanzania in the largest biometric scheme of its kind ever attempted, the Geneva-based vaccine agency, Gavi, announced recently. Although the scheme includes data protection safeguards – and its sponsors are cautious not to promise immediate benefits – it is emerging during a widening debate on data protection, technology ethics, and the risks and benefits of biometric ID in development and humanitarian aid.”
Financial Action Task Force Case Studies
See also the case studies assembled by the Financial Action Task Force (FATF), the intergovernmental organization focused on combating money laundering and terrorist financing. In 2020, it released a comprehensive resource on digital identity, which includes brief case studies.

References

This primer draws from the work of The Engine Room, and the resource they produced in collaboration with Oxfam on Biometrics in the Humanitarian Sector, published in March 2018.