Preventing Ethnic Group (Race) and Gender Bias in Algorithms and Models

Machine Learning - ML | Deep Learning - DL | Artificial Neural Networks - ANNs | Artificial Intelligence - AI

These are the steps we take to avoid Ethnic Group (race) and Gender Bias in our Algorithms and Models:

  • Algorithm and Model Training Data | We carefully evaluate where the data that we use to train our algorithms and models comes from.
  • Algorithm and Model Training Data Collection | We carefully evaluate the methods used to collect the data used to train our algorithms and models.
  • Algorithm and Model Training Data Collection Criteria | We carefully evaluate the criteria that were established to collect the data we use to train our algorithms and models.
  • Remove Race and Gender Bias from Algorithm and Model Training Data | We take the time to filter out all data that contains Ethnic Group (race) and Gender Bias before using it to train our models and algorithms.
  • Ethical Parameters are Baked Into Our Algorithms and Models during the Design and Development Phase | We ensure that all of our algorithms and models are designed and developed without Ethnic Group (race) or Gender Bias.
  • Human Rights and Ethics are Baked Into our Algorithms and Models | We establish ethical parameters within all of our algorithms and models to ensure that they behave ethically and support Human Rights, Civil Rights, Equal Rights, and Disability Rights.
  • Algorithm and Model Ethical Testing Phase | Algorithms and Models must pass a range of Ethical, Human Rights, Civil Rights, Equal Rights, and Disability Rights tests before being released into full production.
  • Algorithm and Model Performance Evaluation Phase | Algorithms and Models must pass a performance evaluation phase where they are tested for their ability to handle Ethnic Group (race) and Gender Bias as they encounter them under a diverse set of circumstances.
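
One way the testing and performance evaluation phases above could be made concrete is a check on how often a model produces a positive outcome for each group. The sketch below is illustrative only and is not HubBucket's actual test suite: the function names, the sample data, and the 0.1 tolerance are all assumptions, and demographic parity is just one of several fairness measures that could be used.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the share of positive predictions (1) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def passes_bias_check(predictions, groups, max_gap=0.1):
    """True if the gap is within the tolerance (0.1 is an assumed value)."""
    return demographic_parity_gap(predictions, groups) <= max_gap


# Hypothetical model outputs for two groups, "a" and "b".
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(preds, groups)
gap = demographic_parity_gap(preds, groups)
print(rates, gap, passes_bias_check(preds, groups))
```

In this sample, group "a" receives a positive prediction three times out of four and group "b" once out of four, so a model producing these outputs would fail the check and would not be released.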

“All policies, even the most trivial, are either racist or anti-racist: they support equity, or they don't”

Editorial | Volume 1, ISSUE 8, Pe375, December 01, 2019

There is no such thing as race in health-care algorithms

The Lancet Digital Health

Open Access Published: December, 2019



On Oct 24, 2019, a study published in Science showed how bias is scaled up and compounded by algorithms. Software sold by Optum, a leading health services company, and used to identify high-risk patients with complex health needs has been unintentionally but systemically discriminating against black people. The researchers found that the algorithm assigned consistently lower risk scores to black patients even though they were just as sick as their white counterparts. Hospitals and insurers use this algorithm, and others like it, to help manage care for over 200 million people in the USA each year.

Obermeyer and colleagues found that the algorithm assigned risk scores to patients based on total health-care costs accrued in 1 year. Given that higher health-care costs are associated with greater needs, this assumption seemed reasonable. However, the data showed that the health care provided to black people cost an average of US$1800 less per year than the care given to a white person with the same number of chronic health problems. This finding meant that the algorithm identified fewer than half of the black patients at risk of complicated medical needs who would have been flagged had they been scored like equally sick white patients. Obermeyer and colleagues are quick to comment that the development of the racist algorithm might not have been driven by hateful ideology, but the result was that black patients were less likely to receive targeted interventions to improve their health.
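
The arithmetic behind this finding can be illustrated with a small worked example. In the sketch below, only the US$1800 cost gap comes from the study as reported here; the cost per chronic condition, the flagging threshold, and the patient list are hypothetical values chosen to make the effect visible, and actual cost stands in for the algorithm's predicted cost.

```python
COST_PER_CONDITION = 3000  # hypothetical annual cost per chronic condition (US$)
COST_GAP = 1800            # average gap reported in the editorial (US$)
THRESHOLD = 12000          # hypothetical cost above which a patient is flagged


def annual_cost(conditions, group):
    """Cost used as the risk score; black patients cost less at equal illness."""
    cost = COST_PER_CONDITION * conditions
    if group == "black":
        cost -= COST_GAP
    return cost


def flagged(conditions, group):
    """A patient is flagged as high-risk when cost reaches the threshold."""
    return annual_cost(conditions, group) >= THRESHOLD


# Identical illness profiles in both groups: 1 to 6 chronic conditions.
patients = [(c, g) for g in ("white", "black") for c in range(1, 7)]

flagged_counts = {"white": 0, "black": 0}
for conditions, group in patients:
    if flagged(conditions, group):
        flagged_counts[group] += 1

print(flagged_counts)
print(flagged(4, "white"), flagged(4, "black"))
```

With these assumed numbers, a white patient with four chronic conditions is flagged while a black patient with the same four conditions is not, even though race never appears as an input to the threshold rule itself; the disparity enters entirely through the cost label.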

The developers excluded race from the algorithm's data and therefore considered the algorithm to be “race-blind”. However, they did not account for embedded prejudice, or for the fact that using projected cost as a risk factor discriminates against black people. This study was conducted on one health-care system in the USA, using only patients with commercial insurance or Medicare. However, imbalances embedded in society, such as poverty, geography, and education, all differentially affect black patients, which is noted in the major historic role Medicare plays in ensuring health-care coverage to black people. A recent survey highlighted that a third of very ill Medicare beneficiaries had trouble paying for prescription drugs and medical bills, and 40% of these patients said that they had exhausted all their savings, while about 30% reported being contacted by a collection agency. Deep inequalities in institutional infrastructures are major drivers of bias in data used to train algorithms, and Obermeyer and colleagues suggested similar biases exist across several industries globally.

To ensure that systemic racism is not amplified in our health-care systems by rapidly progressing artificial intelligence (AI), forward-thinking policies are desperately needed. NHSx, a UK health department focused on digital transformation, published a report last month with the ambitious title “Artificial intelligence: how to get it right”, which highlighted policies to ensure AI is developed within frameworks that “not only cover the intentions and responsibilities of different people involved in developing, deploying or using AI, but also the impacts that AI has on individuals, groups, systems, or whole populations.” The report identifies the need for algorithms to be developed with a deeper understanding of the wider cultural context.

However, in the 107-page NHSx report, there is little guidance on the use of proprietary algorithms. Obermeyer and colleagues' study is unique as the researchers were allowed to access the algorithm and modify the model, which led to an 84% reduction of bias. This level of transparency, sharing, and accountability is currently not common practice. Regulators, such as NHSx, must mandate open and transparent evaluation of algorithms and data sharing for rigorous validation to ensure safe real-world clinical use in patients. Health-care software developers must test and validate algorithms to ensure that they do not reinforce racial biases and, like doctors, their guiding principle must be to do no harm. Emphasis must be placed on the importance of diversifying the AI workforce in health care, to ensure that race as a proxy for bias, disadvantage, and poor treatment is better understood.
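
A simple form of the validation called for here is to check whether patients from different groups who receive similar risk scores are in fact similarly sick. The following is a minimal sketch of such an audit, not the method used in the Science study: the record layout, the function names, the band width, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_need_by_score_band(records, band_width=10):
    """Average chronic conditions per (score band, group).

    records: iterable of (group, risk_score, chronic_conditions) tuples.
    """
    buckets = defaultdict(list)
    for group, score, conditions in records:
        band = int(score // band_width) * band_width
        buckets[(band, group)].append(conditions)
    return {key: mean(values) for key, values in buckets.items()}


def need_gaps(records, group_a, group_b, band_width=10):
    """Mean need of group_a minus group_b in each band containing both groups."""
    means = mean_need_by_score_band(records, band_width)
    bands = sorted({band for band, _ in means})
    return {
        band: means[(band, group_a)] - means[(band, group_b)]
        for band in bands
        if (band, group_a) in means and (band, group_b) in means
    }


# Hypothetical audit records: (group, risk score 0-100, chronic conditions).
records = [
    ("black", 72, 5), ("black", 78, 6), ("white", 71, 4), ("white", 75, 4),
    ("black", 91, 7), ("white", 95, 6), ("white", 93, 6),
]

gaps = need_gaps(records, "black", "white")
print(gaps)
```

A positive gap in a band means that patients in the first group are sicker than patients in the second group who were given similar scores, which is the pattern the editorial describes. An audit like this requires access to the scores and to an independent measure of health need, which is why open evaluation of proprietary algorithms matters.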

A 2016 NEJM editorial stated that the term racism is rarely used in medical literature and called for a shift in clinical and research focus from race to racism. In the application of AI there is certainly no such thing as race in health-care, only racism, and this must be overcome to safeguard the future equity of our health-care system.





Copyright © 2019 The Author(s). Published by Elsevier Ltd.

User License: Creative Commons Attribution (CC BY 4.0)

Algorithms and Models

A short list of some of the algorithms and models we use at HubBucket

  • Linear Discriminant Analysis
  • Classification and Regression Trees
  • K-Nearest Neighbors - kNN
  • Learning Vector Quantization - LVQ
  • Self-Organizing Map - SOM
  • Locally Weighted Learning - LWL
  • Support Vector Machine - SVM
  • Ridge Regression
  • Least Absolute Shrinkage and Selection Operator - LASSO
  • Elastic Net
  • Least-Angle Regression - LARS
  • Classification and Regression Tree - CART
  • Iterative Dichotomiser 3 - ID3
  • C4.5 and C5.0 algorithms
  • Chi-squared Automatic Interaction Detection - CHAID
  • Decision Stump
  • M5 algorithm
  • Conditional Decision Trees
  • Naive Bayes
  • Naive Bayes Classifier
  • Gaussian Naive Bayes
  • Multinomial Naive Bayes
  • Averaged One-Dependence Estimators - AODE
  • Bayesian Belief Network - BBN
  • Bayesian Network - BN
  • k-Means
  • k-Medians
  • K-Means Clustering
  • Expectation Maximisation - EM
  • Hierarchical Clustering
  • Apriori algorithm
  • Eclat algorithm
  • Perceptron
  • Multilayer Perceptrons - MLP
  • Back-Propagation
  • Stochastic Gradient Descent
  • Hopfield Network
  • Radial Basis Function Network - RBFN
  • Convolutional Neural Networks - CNN / ConvNets
  • Recurrent Neural Networks - RNNs
  • Long Short-Term Memory Network - LSTM
  • Stacked Auto-Encoders
  • Deep Boltzmann Machine - DBM
  • Deep Belief Networks - DBN
  • Principal Component Analysis - PCA
  • Principal Component Regression - PCR
  • Partial Least Squares Regression - PLSR
  • Sammon Mapping
  • Multidimensional Scaling - MDS
  • Projection Pursuit
  • Linear Discriminant Analysis - LDA
  • Mixture Discriminant Analysis - MDA
  • Quadratic Discriminant Analysis - QDA
  • Flexible Discriminant Analysis - FDA
  • Boosting algorithm
  • Bootstrapped Aggregation - Bagging
  • AdaBoost algorithm
  • Weighted Average - Blending
  • Stacked Generalization - Stacking
  • Gradient Boosting Machines - GBM
  • Gradient Boosted Regression Trees - GBRT
  • Random Forest algorithm
  • Feature Selection Algorithms
  • Algorithm Accuracy Evaluation
  • Performance Measures
  • Optimization algorithms
  • Evolutionary algorithms
  • Genetic algorithms
  • Genetic Programming
  • Evolutionary Programming
  • Gene Expression Programming - GEP
  • Evolution Strategy
  • Differential Evolution
  • Neuroevolution algorithm
  • Learning Classifier System
  • Artificial Bee Colony algorithm
  • Cuckoo Search algorithm
  • Particle Swarm Optimization
  • Hunting Search algorithm
  • Adaptive Dimensional Search
  • Firefly algorithm
  • Harmony Search algorithm
  • Gaussian Adaptation algorithm
  • Memetic algorithm
  • Monte Carlo algorithm
  • Q-Learning algorithm
  • SARSA algorithm
  • Q-Learning - Lambda algorithm
  • SARSA - Lambda algorithm
  • DQN algorithm - Deep Q Network
  • DDPG algorithm
  • A3C algorithm
  • NAF algorithm
  • TRPO algorithm
  • PPO algorithm
  • TD3 algorithm
  • SAC algorithm
  • Natural Language Processing - NLP / NLProc algorithms
  • Natural Language Understanding - NLU algorithms
  • Neural Machine Translation - NMT / MT algorithms
  • Computer Vision algorithms
  • Machine Vision algorithms
  • Recommender Systems algorithms
  • Reinforcement Learning algorithms
  • Graphical Models algorithms
  • Random Forest algorithms
  • Decision Tree algorithms
  • Ordinary Least Squares Regression algorithm
  • Linear Regression algorithm
  • Logistic Regression algorithm
  • Stepwise Regression algorithm
  • Multivariate Adaptive Regression Splines - MARS
  • Locally Estimated Scatterplot Smoothing - LOESS
  • Ensemble Methods
  • Principal Component Analysis
  • Singular Value Decomposition
  • Reinforcement / Semi-Supervised Machine Learning algorithm
  • Independent Component Analysis