>>> This is the first part (1/2) of our interview with Paul Sonderegger <<<
We had an excellent interview with big data strategist Paul Sonderegger from Oracle at the ‘Oracle Big Data and Analytics Summit’ in Istanbul, Turkey.
Hi, Paul. Could you tell us about yourself, first of all, please?
I am a big data strategist at Oracle. At Oracle, I lead the company’s research on data capital. That involves talking with the heads of various product groups in data management, integration, analytics, and applications, both in the cloud and on-premise, and pulling those pieces together to see how all these technologies interact. I also spend time with our customers, helping them understand how the changing world of data changes the way companies compete.
You say that big data is changing everything. Could you tell us what big data is? How much data, or what kind of data, counts as big?
It seems that there are as many definitions of big data as there are companies that use it. Our definition, ‘Oracle’s definition of big data’, is very simple: big data is the capture and use of more data in more daily activities. Big data is not a particular technology. It’s not a specific solution, not even one made by Oracle. And you don’t have to rise above a certain amount of data, or have a certain number of kinds of data, to say you have big data.
The important thing about big data is that companies are now capturing more data from more daily activities, in lots of different shapes and sizes: video and audio in addition to digital records. They are then using that data in many more ways, in new algorithms, analytics, and applications.
You mentioned digital data. Is captured video also data?
It can be. There is a subtle distinction here: the difference between digitization and datafication. Digitization is when you take something analog and turn it into digital form, for example information that was on a piece of paper.
Let’s take video. When you capture a video, you’re digitizing that experience. But you need to do some additional work to turn it into data: analyzing the footage to break it up into scenes, maybe doing some voice analysis to figure out when there is a lot of action or when things are silent. The metadata that you add, all of that is datafication. So there’s a difference between those two things.
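To make the digitization/datafication distinction concrete, here is a minimal sketch that turns digitized audio samples into metadata about when a clip is silent or active. It is a deliberately simple energy-based stand-in, not real voice analysis, and the function name `label_segments`, the window length, and the RMS silence threshold are all hypothetical choices for illustration.

```python
import math

def label_segments(samples, rate, window_s=0.5, silence_rms=0.02):
    """Turn raw audio samples (floats in [-1, 1]) into (start_s, end_s, label) metadata.

    Hypothetical parameters: window_s is the analysis window length in seconds,
    silence_rms is the RMS level below which a window is labeled "silent".
    Adjacent windows with the same label are merged into one segment.
    """
    win = max(1, int(window_s * rate))
    segments = []
    for start in range(0, len(samples), win):
        chunk = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        label = "silent" if rms < silence_rms else "active"
        t0, t1 = start / rate, (start + len(chunk)) / rate
        if segments and segments[-1][2] == label:
            # Same label as the previous window: extend that segment.
            segments[-1] = (segments[-1][0], t1, label)
        else:
            segments.append((t0, t1, label))
    return segments

# Toy clip at 10 samples/second: 1 s of silence, 1 s of sound, 0.5 s of silence.
clip = [0.0] * 10 + [0.5, -0.5] * 5 + [0.0] * 5
segments = label_segments(clip, rate=10)
```

The raw samples are the digitized recording; the list of labeled time ranges is the added metadata, which is what the datafication step produces.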
We came across a term on the internet: ‘datavist’. What is a datavist?
A datavist is the digital native of analytics. One of the big shortages right now is managers who can think with data, managers who solve problems by using data to put context around the challenge and the solution. What lots of big companies need is more managers who are datavists. They may be familiar with marketing strategy and with customer segmentation, but they also need to be familiar with how to process and refine data to get more out of it.
For example, a traditional marketing manager will be superb at thinking about customer segmentation: demographics certainly, and psychographics, meaning how these people feel and how their feelings change the way they interact with the products. But a datavist marketing manager will say: we should tap the Twitter firehose. We should pull a particular stream of just the tweets that mention these hashtags and include these individual product names. Then let’s run some entity extraction and natural language processing over them to turn these tweets into data, pulling out pieces of those messages between people so that we have data we can analyze to better understand what we want to know. The difference is that datavist managers can look out at the world, see data resources that haven’t been used before, and figure out new ways to use them to solve problems.
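The datavist manager’s idea can be sketched as two steps: filter the stream, then extract entities. This is a minimal illustration that assumes the tweets have already been fetched as plain strings; it does not use any Twitter API. The hashtags and product names (`AcmePhone`, `AcmeWatch`) are hypothetical, and the regex-based extraction is a crude stand-in for a real natural language processing pipeline.

```python
import re
from collections import Counter

# Hypothetical tracking criteria a marketing manager might choose.
HASHTAGS = {"#summersale", "#newphone"}
PRODUCT_NAMES = {"AcmePhone", "AcmeWatch"}

def matches_stream(tweet: str) -> bool:
    """Keep a tweet only if it has a tracked hashtag AND mentions a tracked product."""
    tags = {t.lower() for t in re.findall(r"#\w+", tweet)}
    has_tag = bool(tags & HASHTAGS)
    has_product = any(name.lower() in tweet.lower() for name in PRODUCT_NAMES)
    return has_tag and has_product

def extract_entities(tweet: str) -> dict:
    """Crude entity extraction: hashtags, @mentions and product names as a record."""
    return {
        "hashtags": [t.lower() for t in re.findall(r"#\w+", tweet)],
        "mentions": re.findall(r"@\w+", tweet),
        "products": [p for p in sorted(PRODUCT_NAMES) if p.lower() in tweet.lower()],
    }

def tweets_to_rows(tweets):
    """Turn free-text tweets into structured rows that can be analyzed."""
    return [extract_entities(t) for t in tweets if matches_stream(t)]

tweets = [
    "Loving my new AcmePhone! #SummerSale @acme",
    "AcmeWatch battery died again",           # no tracked hashtag: dropped
    "#SummerSale is on, but nothing for me",  # no tracked product: dropped
]
rows = tweets_to_rows(tweets)
product_counts = Counter(p for row in rows for p in row["products"])
```

Once tweets are rows rather than text, ordinary analysis (counts per product, trends over time, joins with sales data) becomes possible.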
Are there any artificial intelligence algorithms for these kinds of purposes?
There are, and every day there are more.
Are they enough?
It depends on what you want to do. For example, Oracle provides a product called ‘Big Data Prep Service’. Big Data Prep is a cloud-based service for preparing data for analysis, and it uses a lot of natural language processing techniques, which are artificial intelligence techniques, to pore over that data, pull out particular mentions of things, and perhaps also identify phrases and grammatical relationships. That’s all artificial intelligence applied to that data to make it more useful.
Otherwise, it would not be possible to handle all those tweets. It’s impossible for a human to do it without AI, isn’t it?
That’s exactly right. We’re producing data beyond all human scale. We need tools that help us analyze it beyond what humans are capable of.
So in the future, the datavist will not collect all the data; instead, the data will be gathered and put in front of that person, the manager.
Is the datavist the person who asks the right questions about how to analyze this data? What, more specifically, characterizes a datavist?
The datavist mindset can be found in managers, in data scientists, and in journalists, as a matter of fact. When it comes to analyzing data to make it more useful, what we see with most of our customers are teams of people, and on that team you’ll have different skill sets. There is often someone who is familiar with the business and capable of coming up with new questions to ask. They have a sense of what kind of data is available, but they don’t have programming skills, and they don’t necessarily have statistical skills either. So they rely on the data scientist, who does have the mathematical skills to represent relationships in the real world mathematically. But usually those people don’t like to write code, and they certainly don’t like to do data preparation, so there’s another person, the data engineer. The data engineer is usually a good programmer, often in Python or SQL, and this person prepares the data for the data scientist.
Then there is another person, often one of the three I just talked about, and this person is the storyteller. It may be another individual or one of these three, but the storyteller helps somebody else understand why they asked this question, what data was necessary to build the answer, and what they did to the data so that you know you can rely on the answer. These are the four roles that we see in these data science teams.
You mentioned storytelling. Do you have a storyteller position at Oracle?
It’s more a part of the job for many people at Oracle. It’s something that we certainly look for in the leadership of the product groups: you have to be able to talk about why it matters. Of course, you have to know what the product does, but you also have to know why that’s important and be able to communicate it. We have lots of people who do that.
What are ‘data wars’? Is the term related to big data? Do you use these concepts?
Let me put it in a different light: the light of data capital.
The reason big data is such a big deal is that there’s a larger economic story behind all the talk about big data, and that economic story is the rise of data capital. What we see happening is that data is now on a par with financial capital when it comes to creating new digital products and services. This is not a metaphor.
This is not like ‘data is the new oil’, ‘data is the new gold’, or ‘data is the new electricity’, although that last one is a good metaphor. What we’re saying is that data fulfills the literal, economics-textbook definition of capital. Capital is a produced good, as opposed to a natural resource: you have to invest in building it, whether it’s equipment or financial capital. And it is then a necessary input into some other good or service. That’s what capital is, and data fulfills that definition.
For example, a retailer that wants to open up a new market needs financial capital to do it. They need money. If they lack the financial capital to expand their inventory or to build new facilities in that region, they can’t do it. If that same retailer wants to create a new pricing algorithm or a new recommendation engine and lacks the data to feed those services, it can’t do that either. Data is now an economic factor of production in new digital products and services. If you don’t have the necessary data, you can’t deliver the service you’re thinking of. Here is where this becomes crucial for how companies compete with one another: it means that businesses are in a race to digitize and datafy activities before their rivals do.
For example, we see insurers providing health and life insurance who are concerned that consumer electronics makers, the companies making wearable gadgets such as bands and watches, will end up with data that prices risk better than the data the insurers have. That sets off a real competition for that particular source of data capital.
Whoever uses big data will have an advantage. Can we call that ‘data wars’?
That’s one way to look at it. But this is essentially about competitive strength, and strategy is about creating unique value in a unique way. That’s how Michael Porter of Harvard Business School defines strategy.
You end up in a situation where companies have to digitize and datafy activities before their rivals do, so that they can capture the source of data. Because once an activity happens, if you were not recording it, your chance to capture that data is lost; it doesn’t come back. So there is a competition for the sources of data.
The next piece, though, is that data tends to make more data. When you create algorithms that use the data you captured, those algorithms create data about their own performance that can be fed back into the system to improve their future performance.
We have customers who do fraud detection on person-to-person mobile payments, that is, payments made between people using mobile phones. Their algorithm checks every single one of those transactions in real time, as they happen, for fraudulent activity. Through other forms of investigation, they capture data about which fraudulent transactions they missed. They also capture data about legitimate transactions that they mistakenly flagged as fraud, because customers complain. That data goes back into the system to change the algorithm and improve its future performance. All of that data produced by the algorithm is something only this company has. That virtuous circle becomes a competitive advantage that is very hard to catch up with.
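The feedback loop described above can be illustrated with a toy screen. This is a minimal sketch, not how any particular customer’s system works: it assumes each payment already has a risk score between 0 and 1, flags payments at or above a threshold, and “retrains” by picking the threshold with the fewest errors on the accumulated labeled history. The class `FraudScreen` and its methods are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FraudScreen:
    """Toy fraud screen that learns from its own mistakes (hypothetical design).

    A payment is flagged when its risk score >= threshold. Feedback about missed
    fraud and wrongly blocked payments is stored as (score, is_fraud) pairs.
    """
    threshold: float = 0.5
    history: list = field(default_factory=list)

    def check(self, score: float) -> bool:
        return score >= self.threshold

    def record_missed_fraud(self, score: float) -> None:
        # Fraud discovered later through other investigation (false negative).
        self.history.append((score, True))

    def record_complaint(self, score: float) -> None:
        # Legitimate payment that was flagged; the customer complained (false positive).
        self.history.append((score, False))

    def record_outcome(self, score: float, is_fraud: bool) -> None:
        # Any other confirmed outcome.
        self.history.append((score, is_fraud))

    def retrain(self) -> None:
        """Choose the candidate threshold with the fewest errors on the history.

        Ties go to the lowest threshold, because min() returns the first minimum.
        """
        if not self.history:
            return
        candidates = sorted({s for s, _ in self.history} | {self.threshold})

        def errors(t: float) -> int:
            return sum((s >= t) != is_fraud for s, is_fraud in self.history)

        self.threshold = min(candidates, key=errors)

screen = FraudScreen(threshold=0.8)
screen.record_missed_fraud(0.6)
screen.record_missed_fraud(0.7)
screen.record_complaint(0.85)
screen.record_outcome(0.9, True)
screen.record_outcome(0.2, False)
screen.retrain()  # data produced by the screen itself moves the threshold
```

The design point mirrors the interview: the labeled history exists only because the screen ran and made mistakes, so each cycle adds data that a rival without the same transaction flow would not have.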
The next thing that happens is that platforms tend to win. Platform competition is something that we have seen in information-intensive industries like software and equity trading. Because of the digitization and datafication of more daily activities, platform competition is now coming to real-world industries that haven’t seen it before.
If you are in farming, for example, it’s now possible to have a drone go out and take pictures of crops, then analyze those photos for how much green is in each picture, which serves as a proxy for the chlorophyll content of the crops. That information is then fed to the fertilizer spreader on the back of the tractor to change the amount and mix of fertilizer put on the field, tailoring the fertilizer to those individual plants. The tractor in the middle is now in competition to be the platform for digital agricultural services. That tractor maker is in competition with the drone maker, and with the maker of the fertilizer spreader too. It is not the traditional way that makers of big agricultural equipment think about competition, but it’s the way they have to think about it now.
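The drone-to-spreader pipeline can be sketched in a few lines. This minimal example assumes the photo is given as a list of (R, G, B) pixel tuples and uses the excess-green index (ExG = 2g − r − b on normalized channels) as the “how much green” measure; that index is one common choice swapped in here, not necessarily what any real product uses. The rate range and calibration bounds in `fertilizer_rate` are hypothetical and are not agronomic guidance.

```python
def excess_green(pixels):
    """Mean excess-green index ExG = 2g - r - b over a list of (R, G, B) pixels.

    r, g, b are chromatic coordinates: each channel divided by R + G + B.
    """
    total, count = 0.0, 0
    for R, G, B in pixels:
        s = R + G + B
        if s == 0:
            continue  # skip pure-black pixels to avoid dividing by zero
        r, g, b = R / s, G / s, B / s
        total += 2 * g - r - b
        count += 1
    return total / count if count else 0.0

def fertilizer_rate(exg, min_rate=50.0, max_rate=150.0, exg_low=0.0, exg_high=0.5):
    """Map greenness to a spreader rate: paler crops get more fertilizer.

    Hypothetical calibration: ExG at or below exg_low gets max_rate, ExG at or
    above exg_high gets min_rate, with linear interpolation in between.
    """
    x = min(max(exg, exg_low), exg_high)          # clamp into calibration range
    frac = (x - exg_low) / (exg_high - exg_low)   # 0.0 = pale, 1.0 = lush
    return max_rate - frac * (max_rate - min_rate)

pale_patch = [(120, 130, 110), (125, 135, 115)]  # grayish-green pixels
lush_patch = [(40, 160, 40), (30, 170, 30)]      # strongly green pixels
pale_rate = fertilizer_rate(excess_green(pale_patch))
lush_rate = fertilizer_rate(excess_green(lush_patch))
```

Per-patch rates like these are the kind of signal that would travel from the drone, through whatever platform sits in the middle, to the spreader.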
>>> This is the end of first part (1/2) of our interview with Paul Sonderegger <<<
We invite you to watch the video interview we recorded during our meeting with Big Data Strategist Paul Sonderegger from Oracle.