Open data for the social good

A rapidly growing number of businesses are recognising the importance of a company’s wider role in society, and of having corporate social responsibility (CSR) programmes to support their growth.

As such, many have been investing in programmes to help overcome the systemic issues surrounding social injustice and economic inequity. This work has grown in importance as the Black Lives Matter movement and the COVID-19 pandemic have raised awareness of those issues.

Today, data is at the heart of everything, from health research and science to marketing, advertising, political policy-making and business objectives. Open data, once a concept for free knowledge sharing, has now exploded into a community driving innovation globally.

However, given the amount of data generated each day, it’s often challenging to know how best to extract value from all the information. This is why we need new roles, such as data analysts and data scientists, to better inform decision making. Groups of data specialists may suffer from hive mind, but if we can bring together diverse minds from around the world, we can come up with new creative applications of data. This worldwide community has been increasingly focusing on data for good.

Undeniably, data helps paint a much clearer picture of the world, and being able to leverage it can make a real difference when it comes to eradicating global problems, from social and health to environmental challenges.

Overcoming COVID-19 is one of the world’s toughest problems today. Everyone can and should contribute to solving one of the biggest challenges in human history with the basics – social distancing and washing hands. However, battling COVID-19 doesn’t only involve protecting ourselves from the virus; we also need to tackle issues around drug discovery, vaccines and testing, as well as patient care and resource allocation. To make important and well-informed decisions, it is fundamental to understand the underlying data and work together as a community to solve the problems we face.


Data’s role

Data plays a key role in the fight against the pandemic, and the community can help by providing crucial and actionable insight into the patterns behind the data: how the growth rate of confirmed cases and deaths in each region correlates with social distancing guidelines, how far social distancing is flattening the curve, and more.

At its core, COVID-19 is a medical problem affecting people’s wellbeing and lives, but it is also an epidemiological problem. Ultimately, improving understanding through data will help the medical community make better decisions and inform appropriate public health policies that will keep people from becoming patients.

As part of the community, we at Databricks are determined to play our part in data for good, against not only coronavirus but other issues as well. This is why we came up with Hackathon for Social Good – a competition encouraging anyone, from first-time data explorers to data professionals, to participate in the effort. The science and data analysis, as well as the machine learning technologies, used during the competition will allow end-users to better comprehend the data related to these issues and provide greater insights not only into the pandemic, but into a range of challenges, from localised community issues to those affecting our planet, such as climate change.

The open source solution

Open source technology has a big role to play in the current COVID-19 pandemic. In fact, those who competed in Hackathon for Social Good were able to use Apache Spark, Python, pandas, BERT, Delta Lake and many other open source technologies.

These technologies are useful for a number of reasons. They can be used to perform exploratory data analysis to gain insight and intuition into the parsed data, to execute natural language processing (NLP) on completely unstructured data related to scientific papers, and also to build machine learning models to help experts have a better understanding of what may happen next. These are just a few examples of how data and open source can be used to tackle the massive disruption brought by the pandemic. They can also be used for hospital resource utilisation modelling, pandemic projections and real-time tracking of pathogen evolution.
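As a rough illustration of the kind of exploratory analysis described above, the sketch below uses pandas to compute the day-over-day growth rate of confirmed cases per region. It is not taken from any hackathon entry: the function name `daily_growth_rate`, the column names `region`, `date` and `confirmed`, and the sample figures are all assumptions made up for this example.

```python
import pandas as pd


def daily_growth_rate(cases: pd.DataFrame) -> pd.DataFrame:
    """Add new-case counts and day-over-day growth for each region.

    Assumes hypothetical columns: 'region', 'date' and 'confirmed'
    (cumulative confirmed cases).
    """
    out = cases.sort_values(["region", "date"]).copy()
    grouped = out.groupby("region")["confirmed"]
    # Difference from the previous day within the same region.
    out["new_cases"] = grouped.diff()
    # Fractional change from the previous day within the same region.
    out["growth_rate"] = grouped.pct_change()
    return out


# Made-up sample data, for illustration only.
sample = pd.DataFrame(
    {
        "region": ["A", "A", "A", "B", "B", "B"],
        "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
        "confirmed": [100, 120, 150, 10, 20, 30],
    }
)

result = daily_growth_rate(sample)
print(result)
```

The first day in each region has no previous value, so its growth rate is left empty rather than guessed. A real analysis would work from a published case dataset and would probably smooth the series over several days before comparing it with the dates of social distancing guidelines.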

The pandemic and the Black Lives Matter movement have only increased the importance of corporate citizenship and our motivation to ask how we can help our communities and our world, promote human rights and environmental sustainability, and ultimately bring about change. Global health issues, community problems and climate change can’t be solved by a single organisation, nor can they be solved effectively without data.

Data enterprises have a huge part to play in the tech for good movement. It is our responsibility to ensure the right data is broadly available and actionable so data teams around the globe can do their best work to tackle the world’s toughest problems. Data and analytics leaders have the power to build programmes and enterprises that transcend organisational boundaries and encourage the use of data to improve society globally. As those on the front lines fighting to better our world still lack analytics resources and expertise, the data community can help by ensuring that data is truly used for the good of the planet.

Ultimately, the future of open source technologies and open data will be centred around how businesses handle the data they capture or create, and the ecosystem built around it. It’s incredible to see the emergence of economies built around data and we’re only just getting started.


Ryan Boyd

Head of Developer Relations at Databricks

