Discussions about biased algorithms, data handling and inclusive technologies have taken over news, podcasts, events and online conferences around the world.
One of the reasons for this problematic situation is what is called a data desert.
What is a Data Desert?
There’s not much sense in having technology, committed professionals and resources to create innovative artificial intelligence if we don’t have data to train it!
A data desert is exactly that: the lack of reliable and complete data on a given subject. Generally, those who are left out are the most vulnerable populations and minorities. There may also be a lack of information about certain locations, such as more peripheral neighborhoods.
The existence of data deserts means that there is data not being counted. Stories that aren’t told. Realities that aren’t portrayed.
Data deserts also enable incorrect and discriminatory interpretations of groups or subjects about which there is very little or no information.
This gap in data generates conclusions based on the generalization of groups and has very serious social consequences. Decisions are made without considering these groups or considering them incorrectly!
Working with incorrect and incomplete data has consequences such as uneven advances in the treatment of diseases, public policies that do not understand or solve the root of the problems, and investments in lower-priority areas. It can also lead to the accident fallacy, in which a general rule is applied to cases it does not fit.
Underreporting and incomplete or non-standardized records can hinder the identification of situations of vulnerability by preventing the real picture from being seen, making it even more difficult to find reliable solutions and to look for correlations with other social indicators.
Why do data deserts happen?
Difficult access to digital and communication services by certain groups and locations
Without access to the services and systems that collect and store data, these groups cannot contribute data about themselves. Disparities in access to the internet and technology, and in the level of digital literacy, must always be considered to avoid data deserts.
We must think of different solutions for different realities. More than that, we need solutions that engage these groups! We need the right tools, and we need to mobilize people to use them, with accessible language that fits each group's culture. We cannot allow these groups to be forgotten and to have an even harder time developing.
An interesting example is Cocôzap. Cocôzap partnered with two mobilizers who live in Complexo da Maré and, through communication over WhatsApp, gathers information on access to basic sanitation, open sewage and garbage accumulation.
The mobilizers engage local leaders and residents, asking them to collect and send data in a simple way, through text message, audio or video, or on the project website. Now, residents are more participative and aware of the conditions of the place where they live. This is empowerment!
Data Aggregation
Whether for convenience, unfamiliarity or a genuine lack of data, much information is stored at a level of aggregation (granularity) that is incompatible with the solution it is meant to support. This can lead to the generalization of groups and to discrimination, creating data deserts.
For example, in disaster prevention, there are some ways to predict events such as landslides and floods by analyzing previous events. However, each location has its physical and geological characteristics, so the prediction for one part of the city cannot be repeated in exactly the same way for others.
What if we imagined an artificial intelligence that defines which assistance program to offer to a person? Let’s assume two people: both women, poor, in the same age group and with a similar family structure, living in the same country and sharing the same values. We can imagine that the same assistance would be offered to both women, right? But if we disaggregate the data on where these women live a little more, we realize that one of them lives in a rural, predominantly agricultural area with a low literacy rate, while the other lives in a technological and industrial region with high rates of employability… OK, let’s say that the assistance they need is very different indeed…
We should also remember that, in both social and business problems, people are often classified only by gender, as “man” or “woman”, without taking into account other factors that may influence our models, such as color, age, background and other preferences.
Collecting data at this finer level of detail generally consumes more time and resources, but it can reduce the presence of data deserts.
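The effect of aggregation described in this section can be sketched with a toy example. The records, field names and numbers below are invented purely for illustration; the point is only that a single aggregated figure can hide very different realities that appear once the data is grouped by a finer attribute.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records (all values invented): literacy rate in percent
# for four localities of the same country.
records = [
    {"region": "rural", "literacy_rate": 55},
    {"region": "rural", "literacy_rate": 60},
    {"region": "industrial", "literacy_rate": 95},
    {"region": "industrial", "literacy_rate": 90},
]

def mean_by(rows, key, value):
    """Average `value` separately for each distinct `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: mean(values) for group, values in groups.items()}

# Aggregated view: one number for the whole country.
national = mean(row["literacy_rate"] for row in records)

# Disaggregated view: one number per type of region.
by_region = mean_by(records, "region", "literacy_rate")

print("national:", national)
print("by region:", by_region)
```

A model that only sees the national figure would treat both women from the example in the same way, while the per-region view suggests that they may need different kinds of assistance.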
How to avoid data deserts?
Humanize data
We have to remember that behind each row there is a person with a name, gender, color, dreams, wishes and ideas; or a place with its own characteristics, unique interactions between its population and specific needs. We need to humanize our analysis and remember that the machine does not have our ability to interpret and does not know how to deal with new information without having been trained on it.
Awareness
Particularly in the public sector, there is a need to increase efforts to collect data from populations that are normally unrepresented. Ensuring the representation of the entire population or location is essential to avoid data deserts.
Report limitations
When publishing a database, make clear what the limitations of data collection and processing are, as well as any problems of representativeness. Indicate unrepresented or underrepresented groups.
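One possible way to support such a statement of limitations is to compare each group's share of the dataset with its share of a reference population. The sketch below is only an illustration: the function name `representation_report`, the group names, the counts and the 0.8 tolerance are all assumptions, and a real reference would come from a census or similar source.

```python
def representation_report(sample_counts, population_shares, tolerance=0.8):
    """Label each group by comparing its dataset share to a reference share.

    A group is "unrepresented" when it has no records at all, and
    "underrepresented" when its dataset share is below `tolerance` times
    its reference share. The tolerance is an arbitrary illustrative choice.
    """
    total = sum(sample_counts.values())
    report = {}
    for group, population_share in population_shares.items():
        count = sample_counts.get(group, 0)
        sample_share = count / total if total else 0.0
        if count == 0:
            status = "unrepresented"
        elif sample_share < tolerance * population_share:
            status = "underrepresented"
        else:
            status = "ok"
        report[group] = {
            "sample_share": round(sample_share, 3),
            "population_share": population_share,
            "status": status,
        }
    return report

# Hypothetical numbers, invented for illustration only.
sample_counts = {"center": 700, "suburbs": 280, "periphery": 20}
population_shares = {"center": 0.40, "suburbs": 0.30, "periphery": 0.25, "rural": 0.05}

report = representation_report(sample_counts, population_shares)
for group, info in report.items():
    print(group, info)
```

The resulting table could be published alongside the database so that readers know which groups the data speaks for and which it does not.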
Get to know underrepresented groups
Meet the groups that are part of the data desert. Understand their culture, their desires, their fears, their suspicions. Talk to community leaders and find out how to engage the group and insert technology into their daily lives, so that technology is a solution and not an agent of segregation.
Identify problems and evaluate solutions
Understand where there is a lack of information and which groups are affected by it. In addition, identify what is currently being done that does not meet the needs of these groups. Study the current data collection methodologies and find out what is not working. Research solutions that worked in other locations and adapt them to your case to reduce data deserts.
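As a first, rough way to see where information is lacking, one could measure how often fields are left empty for each group. The sketch below assumes hypothetical survey rows and field names (`neighborhood`, `income`, `sanitation`); it does not replace talking to the groups themselves, but it can point to where to look first.

```python
from collections import defaultdict

def missing_rate_by_group(rows, group_key, fields):
    """Return, for each group, the fraction of expected values that are missing.

    A value counts as missing when the field is absent, None or an empty string.
    """
    expected = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row[group_key]
        for field in fields:
            expected[group] += 1
            if row.get(field) in (None, ""):
                missing[group] += 1
    return {group: missing[group] / expected[group] for group in expected}

# Hypothetical survey rows, invented for illustration only.
rows = [
    {"neighborhood": "A", "income": 1200, "sanitation": "yes"},
    {"neighborhood": "A", "income": 900, "sanitation": "no"},
    {"neighborhood": "B", "income": None, "sanitation": ""},
    {"neighborhood": "B", "income": 700},
]

rates = missing_rate_by_group(rows, "neighborhood", ["income", "sanitation"])
print(rates)
```

A high rate for one group suggests that the current collection method is not reaching it, which is a cue to review the methodology for that group.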
References