Jargon Buster: Big Data De-Mystified

For many businesses, big data is a hot topic.  Whether you’ve got it, you’re building it or you’re thinking about it, tech start-ups have demonstrated how important big data can be.  But for those who are yet to take the plunge and embrace big data, I thought we should go back to basics and ask: what is Big Data?

Simply put, Big Data is an on-trend label for having lots and lots of data.  When you have a lot of data, and you can link it all together, you can start to see patterns, trends and behaviours.  Once you can see these patterns, you can use the data to experiment with different approaches, track their outcomes and even start to predict future behaviours and responses to products and services.  It’s all very complex and scientific, so let’s think through a simplified example.

For years and years, you’ve been purchasing food.  Perhaps you often shop at a particular store, but you may have moved to a new town and switched to a different one.  Maybe you sometimes eat out, and of course, you’ll eat somewhere completely different when you’re on holiday.  Imagine if you could gather together every single item of food you’d ever purchased and link it all in a database.  You could add all sorts of additional data (e.g. “I love chicken, but I hate carrots”), pricing information (e.g. “wow, it was cheap to eat out on holiday”), even the evolution of your taste buds (e.g. “I never used to enjoy celery until my 30s”).

What you’d create is a huge database.  You’d be able to query the data to look at your spend, see how many calories you consumed, check whether you ate more healthily at home or on holiday, and so on.  You might even be able to predict how much money you’ll spend on food in the next five years.  Amazing!  This is Big Data, and that is why it’s so exciting!
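For the more technically curious, here’s a rough idea of what “querying” your food history might look like.  This is a minimal Python sketch, assuming a handful of made-up purchase records (every field name and figure is invented for illustration, not real data).  It totals spend and calories at home versus on holiday, then fits a simple straight-line trend to yearly spend to project five years ahead.  A straight line is about the crudest prediction possible; real predictive analytics would use far more data and a better-suited model.

```python
from collections import defaultdict

# Hypothetical purchase history: every field and figure here is invented.
purchases = [
    {"year": 2019, "location": "home", "item": "chicken", "price": 4.0, "calories": 800},
    {"year": 2019, "location": "holiday", "item": "paella", "price": 6.0, "calories": 900},
    {"year": 2020, "location": "home", "item": "celery", "price": 2.0, "calories": 20},
    {"year": 2020, "location": "holiday", "item": "pizza", "price": 10.0, "calories": 1200},
    {"year": 2021, "location": "home", "item": "chicken", "price": 5.0, "calories": 800},
    {"year": 2021, "location": "holiday", "item": "curry", "price": 9.0, "calories": 1000},
]


def totals_by(records, group_field, value_field):
    """Sum one numeric field per value of another field (a tiny GROUP BY)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[group_field]] += record[value_field]
    return dict(totals)


def linear_forecast(yearly_totals, years_ahead):
    """Fit a least-squares straight line through yearly totals and extend it."""
    years = sorted(yearly_totals)
    values = [yearly_totals[y] for y in years]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    last = years[-1]
    return {last + k: intercept + slope * (last + k) for k in range(1, years_ahead + 1)}


spend_by_location = totals_by(purchases, "location", "price")
calories_by_location = totals_by(purchases, "location", "calories")
forecast = linear_forecast(totals_by(purchases, "year", "price"), years_ahead=5)

print(spend_by_location)     # {'home': 11.0, 'holiday': 25.0}
print(calories_by_location)  # {'home': 1620.0, 'holiday': 3100.0}
print(forecast)              # {2022: 16.0, 2023: 18.0, 2024: 20.0, 2025: 22.0, 2026: 24.0}
```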

So now you’re “hungry” to get yourself a slice of Big Data, but where are you going to put it all?  Big Data resides in a data warehouse.  Unlike CRM systems, which over the years have evolved into single products that let you do everything you need (e.g. Salesforce, Sugar, Oracle, etc.), data warehouse software is still maturing.  You’ll need some technical expertise to build your data warehouse.  Wise businesses employ a data scientist to build it, connect up the data and help find the patterns worth analysing.

The first challenge is working out where you’ll store all of this data.  Security, privacy & cost are all factors in your decision – but you can be assured that a couple of servers in your IT comms room won’t cut the mustard!

Next comes linking all of your disparate data sources and ensuring that like-for-like data is matched.  For example, if your CRM package uses the DD-MM-YY date format but your accounting package uses MM-DD-YY, your data scientist needs to make sure the data warehouse converts both to a single format; otherwise 03-04-21 could mean either 3 April or 4 March.  Another complexity faced by the data scientist, especially where you have numerous data sources, is understanding the hierarchy: which data supersedes which, and which systems are already connected, populating each other with data.  Don’t underestimate the time and complexity involved in building a single view of your data.  Data scientists tend to use ETL (Extract, Transform, Load) tools to assist with this stage.
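To make the date problem concrete, here’s a minimal Python sketch of the three ETL steps, assuming two tiny hypothetical extracts (the customer IDs, emails and field names are all made up).  The Transform step parses each system’s date strings with that system’s own format, so “03-04-21” from the CRM (DD-MM-YY) and “04-03-21” from accounts (MM-DD-YY) both become 3 April 2021.  The Load step then merges everything into one record per customer, assuming a hierarchy in which the CRM supersedes the accounting package wherever both hold the same field.  A real ETL tool does far more (validation, deduplication, scheduling), so treat this as an illustration only.

```python
from datetime import datetime

# Extract: hypothetical rows pulled from each system (all values invented).
crm_rows = [
    {"customer_id": 1, "email": "ann@example.com", "last_contact": "03-04-21"},  # DD-MM-YY
    {"customer_id": 2, "email": "bob@example.com", "last_contact": "15-01-21"},
]
accounts_rows = [
    {"customer_id": 1, "email": "ann.old@example.com", "last_invoice": "04-03-21"},  # MM-DD-YY
    {"customer_id": 3, "email": "cat@example.com", "last_invoice": "12-31-20"},
]


def transform(rows, date_field, fmt):
    """Transform: copy each row, parsing its source-specific date string into a date."""
    return [{**row, date_field: datetime.strptime(row[date_field], fmt).date()} for row in rows]


# Assumed hierarchy: sources are applied in this order, so later ones win on conflicts.
PRECEDENCE = ["accounts", "crm"]


def load(rows_by_source):
    """Load: merge all sources into a single record per customer."""
    warehouse = {}
    for source in PRECEDENCE:
        for row in rows_by_source.get(source, []):
            warehouse.setdefault(row["customer_id"], {}).update(row)
    return warehouse


warehouse = load({
    "crm": transform(crm_rows, "last_contact", "%d-%m-%y"),
    "accounts": transform(accounts_rows, "last_invoice", "%m-%d-%y"),
})
# Ann's record keeps the CRM email, and both of her dates are 3 April 2021.
print(warehouse[1])
```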

Once the data warehouse has been set up, accessing the data is the next challenge.  Whilst a data scientist can easily set up lots of generic reports to be created and emailed around your business, asking questions of the data, or pulling together the right fields and sources to see patterns or trends, is more difficult.  Most businesses overlay their data warehouse with a reporting tool (more correctly referred to as a business intelligence, or BI, tool) to make this less mind-boggling.
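Many business intelligence tools work by turning point-and-click questions into database queries, often SQL.  As a minimal sketch, the example below uses Python’s built-in sqlite3 module with a throwaway in-memory database standing in for the warehouse; the tables, columns and figures are hypothetical.  It pulls fields from two sources (customers and orders), joins them and totals revenue by region, the kind of ad-hoc question that generic emailed reports tend not to answer.

```python
import sqlite3

# An in-memory database stands in for the warehouse; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 2, 15.0), (13, 3, 5.0);
""")

# Join two sources, then aggregate: orders and revenue per region.
query = """
    SELECT c.region, COUNT(o.order_id) AS orders, SUM(o.amount) AS revenue
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY revenue DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)  # [('North', 3, 55.0), ('South', 1, 15.0)]
```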

Managing and controlling access to the data is another core task for the data scientist, because once you have a single view of your customer, you’ll be very eager to experiment.  Let the data scientist keep the data sources pure and maintain the data integrity.
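One way to let people experiment without touching the curated data is to give them a copy to play with and keep the master read-only.  The sketch below shows that idea with Python’s sqlite3 module, assuming a hypothetical customers table held in memory: `Connection.backup` copies the master into a sandbox, and SQLite’s `query_only` pragma makes the master connection refuse writes.  In a real warehouse you would more likely do this with user roles and permissions in the database itself, so the specifics here are illustrative only.

```python
import sqlite3

# Hypothetical master table standing in for the curated warehouse.
master = sqlite3.connect(":memory:")
master.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT)")
master.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "gold"), (2, "silver")])
master.commit()

# Give experimenters their own copy, so their changes never reach the master.
sandbox = sqlite3.connect(":memory:")
master.backup(sandbox)

# From now on, this connection to the master may read but not write.
master.execute("PRAGMA query_only = ON")

# An experiment: re-segment every customer, but only in the sandbox.
sandbox.execute("UPDATE customers SET segment = 'trial'")
sandbox.commit()

# The same change against the master is rejected.
try:
    master.execute("UPDATE customers SET segment = 'trial'")
    master_write_blocked = False
except sqlite3.Error:
    master_write_blocked = True

select = "SELECT segment FROM customers ORDER BY customer_id"
master_segments = [segment for (segment,) in master.execute(select)]
sandbox_segments = [segment for (segment,) in sandbox.execute(select)]
print(master_write_blocked, master_segments, sandbox_segments)
```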

Finally, employing a data analyst to work between the business and the data scientist, and to do the “crunching” of the data warehouse output, is a wise decision.  This frees the data scientist from day-to-day requests and sets clear responsibilities for maintaining your Big Data.  The data scientist acts as guardian of data integrity, responsible for the single view of the customer and predictive analytics.  The data analyst is guardian of the infrastructure, responsible for managing the data flow and business needs.

Hope this helped give some clarity, and whet your appetite for digging into your very own Big Data!

