Originally published in Forbes
Enterprise companies accumulated massive amounts of data over the years in pursuit of the promised rewards of big data. The question that now often comes from management is, “There must be some value, somewhere—can’t you find it?”
The data democratization movement was born, in part, in response to the uncomfortable dialogue around this question.
The transition to a more democratized view of data should not be a binary choice between fully centralized and fully decentralized. Data democratization must be viewed as a full continuum of options between those extremes. The ideal solution is found by monitoring for a ‘goldilocks’ state for data democratization (not too much and not too little) and engaging in the culture and process changes inside a company that make it possible.
Unless there is an existential threat to an organization, entrenched business culture can block technology transformation. Moving to a more democratized use of data may mean significant organizational change that a company has not previously considered and for which it is unprepared. Where do you start?
Whether intentional or not, your organization has already chosen to live in a specific way in an increasingly quantified world. You have a particular vision of data science, and your IT organization reflects this. IT as an isolated cost center, or IT as an integrated and well-understood part of daily organization practices, are two very different scenarios.
Given your cultural attitude and organizational structure around data, what’s the current predisposition for change in parts of the company that have not yet embraced or experimented with data?
People’s thinking is shaped by those with whom they interact. To alleviate fears about the unknowns of data and understand its power, individuals should engage in mentoring relationships with data scientists.
Rotation programs are a practical way for data experts to temporarily move onto other teams to provide education within the context of how those people work daily and the specific challenges they face. It’s about meeting people where they already are and helping them become citizen developers.
In many organizations, newly data-informed mindsets and relationships will quickly run head-on into the roadblock of organizational inertia, which has limited where data can go and what it can do. The old way of thinking about data is founded on applying digital tools to incrementally improve an existing process inside an existing structure. The new way of thinking is to use digital tools to inform decision-making that may necessitate organizational change.
The types of data you expose inside and potentially outside the company have ethical, compliance and competitive intelligence ramifications. And there are always bad actors lurking to take advantage of unwitting mistakes and naïve uses.
The basics of data privacy laws need to be understood so that a company’s citizen developers properly consider and abide by them. Strict compliance needs to be maintained with regulations such as GDPR in the European Union, state laws such as CCPA in California and rules within specific industries such as healthcare.
As an organization moves to a more democratized data environment, new people and teams may honestly believe they are aggregating and maintaining anonymized data. But that exposed data maintains hidden relationships that can be used to reverse engineer private data by correlating multiple data sources. As an organization decentralizes certain aspects of data access, governance must largely remain centralized with experts who provide clear guardrails.
The means by which bad actors can de-anonymize data are generally a significant, and unpleasant, surprise to new data users and citizen developers. They’re surprised to learn that there are many external sources of data that can be used with bad intent to ‘enrich’ the data they release and de-anonymize it—even when the belief is that what is shared complies with privacy laws and the company’s ethical stance.
Among the better-known examples is the reliable identification of people based on a combination of ZIP code, date of birth and gender, or on a partial social graph.
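To make the ZIP code, date of birth and gender example concrete, here is a minimal Python sketch of how such a linkage could work. Everything in it is hypothetical: the records, the field names and the helper functions `smallest_group_size` and `link` are invented for illustration and do not describe any particular tool or real dataset. The first helper reports the size of the smallest group of records sharing the same three attributes (a group of one means someone is unique in the release), and the second joins the "anonymized" release to an outside list that happens to carry names.

```python
from collections import Counter

# Attributes that look harmless alone but can identify a person in combination.
QUASI_IDENTIFIERS = ("zip_code", "birth_date", "gender")

# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
released = [
    {"zip_code": "02138", "birth_date": "1965-07-31", "gender": "F", "diagnosis": "asthma"},
    {"zip_code": "02138", "birth_date": "1971-02-14", "gender": "M", "diagnosis": "diabetes"},
    {"zip_code": "02139", "birth_date": "1980-11-02", "gender": "F", "diagnosis": "migraine"},
    {"zip_code": "02139", "birth_date": "1980-11-02", "gender": "F", "diagnosis": "fracture"},
]

# Hypothetical external source that carries names alongside the same attributes.
public = [
    {"name": "Alice Example", "zip_code": "02138", "birth_date": "1965-07-31", "gender": "F"},
    {"name": "Bob Example", "zip_code": "02138", "birth_date": "1971-02-14", "gender": "M"},
    {"name": "Carol Example", "zip_code": "02139", "birth_date": "1980-11-02", "gender": "F"},
]


def key(record):
    """Return the quasi-identifier combination for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def smallest_group_size(records):
    """Size of the smallest group of records sharing one quasi-identifier combination."""
    counts = Counter(key(r) for r in records)
    return min(counts.values())


def link(released_records, public_records):
    """Pair names with sensitive values where the match is unique on both sides."""
    counts = Counter(key(r) for r in released_records)
    names = {}
    for person in public_records:
        names.setdefault(key(person), []).append(person["name"])
    matches = []
    for r in released_records:
        candidates = names.get(key(r), [])
        if counts[key(r)] == 1 and len(candidates) == 1:
            matches.append((candidates[0], r["diagnosis"]))
    return matches


print(smallest_group_size(released))
print(link(released, public))
```

In this toy data, two of the four released records are unique on the three attributes, so the join attaches a name to each of them, while the two records that share a combination stay ambiguous. A centralized governance team could run a check of this kind before a release and require generalizing the fields (for example, birth year instead of full date) until no group is too small; the right threshold is a policy decision this sketch does not settle.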
An enterprise needs to consider where it wants to be on the continuum between a fully centralized and a fully decentralized approach to data, including the many ‘goldilocks solutions’ in between.
Guidance around when and how an organization exposes data to other departments or to the public should always remain a centralized responsibility. The ethical, financial and legal costs are too high for inadvertent mistakes about complex laws from well-meaning individuals. Centralized governance must define the nature of actions around data access, formats, quality, authoring, dictionaries and sharing.
Individual departments are responsible for deciding what data can be exposed from their department to other groups to optimize operations and create new applications. A key element of this model is that the publisher does not control who consumes the published data or how they use it.
For instance, a production plant may decide to start exposing volumes and inventory turnaround times by product line—data that can be used by the sales and marketing organizations to tune outreach based on available supply. Once that data is available, other departments can use it in their own, unforeseen ways, compounding the return on investment.
The journey to the optimal position on the centralized-decentralized data democratization continuum is unique for every company, and it will continually change over time. What is your company’s goldilocks decision about data democratization, and what culture and organizational changes are you willing to make to get there?