Mindful Data Scientists: Create a Bias Free Model

Photo by Iryna Drozd on canva

Originally published here

Be aware of cognitive biases

Data scientists can build inherent flaws into their models if they do not carefully consider cognitive biases. These biases may lead to mistaken conclusions about causation and relationships between variables, which can result in incorrect predictions and judgments. It’s important for any data scientist to be aware of cognitive biases in order to make a model as unbiased as possible!

Cognitive biases can stem from a variety of causes, but mental shortcuts play an important role. These shortcuts are often surprisingly accurate, but they can also lead to incorrect ideas. Age and cognitive flexibility (or a lack of it) are other reasons for these biases: cognitive flexibility can decrease as we become older, which can make biases harder to shake. Cognitive biases are also more likely to occur when someone would otherwise have to think through every possible option to make a decision, because a shortcut takes far less effort.

Let’s go through a number of them.

1. Confirmation bias

Confirmation bias is a psychological phenomenon in which people seek out, interpret, and remember information that confirms their preconceived ideas and attitudes. It is a problem in data science because data scientists must be willing to test their assumptions against the data in order to find the real patterns in it. Confirmation bias can lead data scientists to cherry-pick data samples, build models that are biased towards a certain set of beliefs or values, and interpret results inaccurately. This can lead to invalid conclusions that are potentially dangerous.

A data scientist should always remain unbiased when creating data models and it’s important to seek out contradictory data in order to test assumptions, avoid confirmation biases within the data science process, and create useful data products for consumers of their data.

To reduce confirmation bias during the data science process data scientists must create data models that focus on the data itself and not their personal beliefs. It’s important to remain unbiased in order to produce data products that provide value without causing harm or bias towards a certain set of values, beliefs, and opinions.

Being aware of confirmation bias and other cognitive biases can help data scientists overcome these behavioral challenges. You can fight confirmation bias by being open-minded and ready to examine things from a different angle than the one you are accustomed to. The inclination is unconscious, but training your mind to be more flexible in its thinking patterns can help mitigate its effects.
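One way to put a belief to the test instead of looking for support for it is a permutation test. The sketch below is only an illustration, not a prescribed method: the function name `permutation_p_value` and the sample numbers are made up for this example. It asks how often a difference in means at least as large as the observed one would show up if the group labels were assigned at random.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Hypothetical helper for illustration. A large p-value means the data
    does not back up the belief that the two groups differ.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values, which is the same as reassigning labels at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add one to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up numbers: a metric measured with and without a new feature.
with_feature = [12, 15, 11, 14, 13, 16]
without_feature = [12, 14, 13, 15, 11, 13]
print(permutation_p_value(with_feature, without_feature))
```

If the printed value is large, the data gives little reason to believe the feature made a difference, no matter how convinced you were beforehand.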

2. Availability bias

Availability bias (also referred to as the availability heuristic) refers to the inclination to believe that examples of things that spring readily to mind are more prevalent than they are. It’s not the result of a faulty memory, but rather of giving too much weight to what is easiest to remember.

If you can immediately recall several facts supporting a judgment, you will probably think the judgment is correct. For example, if you keep seeing stories and headlines about alcohol and bar fights, you may conclude that such fights are more common than they really are. When you are exposed to information constantly, your memory of it is stronger. According to research, information that is readily retrieved from memory appears to be more reliable than it actually is.

As data scientists, we strive to build unbiased models that are data-driven and data-supported. Awareness of availability bias is a key element in building data models, especially in making sure we do not ignore potentially relevant data or features just because they don’t come to mind first.

3. Anchoring bias

Similar to availability bias, anchoring bias has to do with over-emphasizing one piece of information and potentially ignoring other relevant information. Anchoring bias is the propensity to rely excessively on the first information obtained when making a decision. What you learn early in your research can have an outsized impact on your hypothesis and modeling.

Once the anchor data is obtained, it becomes the basis for data analysis and interpretation. This typically leads to a biased data model, in the sense that it over-emphasizes data points that may not be relevant to the problem statement.

The anchoring bias occurs when data scientists put too much weight on one data point and fail to consider other data points, leading them astray or into making incorrect decisions. Anchoring bias can occur in a variety of ways that result from over-weighting an initial data point in the data analysis.

The anchoring bias can make data scientists prone to certain types of errors when analyzing data sets for particular outcomes. Data science involves statistical thinking and reasoning about uncertainty, and an anchor makes us more certain than the data justifies. Data scientists should always ask the question: what could we be missing?

To address potential anchoring bias, data scientists must form data acquisition strategies that are not solely reliant on one data point or data source. Data scientists should instead be asking the question: what would I miss if I only look at this data set? Further, don’t forget to question the applicability of the first dataset.
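As a rough illustration of questioning the first dataset, the sketch below compares the mean of the first source with everything collected afterwards. The helper name `anchor_gap` and the numbers are hypothetical, and a single summary statistic is only a starting point, not a full check of applicability.

```python
from statistics import mean, stdev


def anchor_gap(first_source, other_sources):
    """Standardized gap between the first source and all later sources.

    Hypothetical helper for illustration. Returns how many standard
    deviations (of the later data) the first source's mean sits away
    from the mean of the later data.
    """
    rest = [value for source in other_sources for value in source]
    spread = stdev(rest)
    if spread == 0:
        raise ValueError("later sources have no variation to compare against")
    return (mean(first_source) - mean(rest)) / spread


# Made-up numbers: the first dataset collected, then two later ones.
first = [20, 21, 22]
later = [[10, 11, 12], [9, 10, 11]]
print(anchor_gap(first, later))
```

A gap far from zero suggests the first dataset may not represent the rest, so conclusions anchored on it deserve a second look.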

4. Self-serving bias

The self-serving bias is a psychological concept that refers to the inclination of people to take personal credit for good results and blame external circumstances for unfavorable ones. Our minds prefer to reaffirm our beliefs, and we feel good when we are right. It feels safe to know that what we believe is true, and being right also means people will like us and not get mad at us.

You have seen this at work. When a model succeeds, some of us quickly take credit for it, attributing the success to our hard work and dedication. When it fails, we point to external factors such as a shift in the external environment or unanticipated events.

We can’t help it. We want to be right and get recognition for our work while we want to avoid being responsible when things don’t go the way that we wanted. This leads to not being open to data that doesn’t support our existing ideas. We only seek out data to confirm what we already believe and ignore data or results that challenge us. Additionally, we typically become defensive when others are trying to give us valuable feedback.

Cognitive biases are just one of the many reasons why data scientists should be vigilant in their work. It is important to take care when building models so that they do not become biased with errors or faulty conclusions, which can have detrimental effects on your business and its success!

Are you able to think of other common cognitive biases I didn’t mention? Leave your comments below and let me know!