
"What do you do, anyway?"


I have recently been having a lot of conversations in which people ask me “what do you do?” and are looking for more than “I am a Data Scientist/Analyst.” I have struggled with this question for a multitude of reasons. The primary struggle is that most people do not really want to spend the time to understand what Data Science is, and instead instantly decide that it is something magical, beyond their comprehension. Second, I have struggled to answer this question because I have largely been a generalist and have applied Data Science to a wide range of problems up to this point (as I have basically been developing a startup inside a larger company). So many times, when I do attempt to describe what I do, I say something along the lines of “I improve/optimize systems.” This generally receives a response of “what type of systems or problems?” to which I must respond “any type, really.”

People seem to think this is a ridiculous notion. How could one skill set solve any type of problem or improve any type of system, regardless of field or industry? Well, the answer is really quite simple: I do the same thing everyone else does to solve a problem.

  1. Listen to what people's problems are and investigate all of the surrounding touch points

  2. Decipher and identify what the primary issues really are

  3. Develop KPIs which will accurately measure the problem and the impact of surrounding touch points

  4. Analyze any relevant data and acquire additional data if necessary

  5. Use a wide range of math, algorithms, and processes to improve/balance the KPIs as necessary

  6. Validate and test expected optimal results

  7. Implement the best method

While this is a simple enough system, which I think everyone uses, even unconsciously, to solve problems, it is the direct application of math in steps 3, 4, and mainly 5 that tends to confuse people. However, once elaborated on, these steps are not all that mysterious.

Let’s walk through a quick example that seemingly has nothing to do with math.

“Mary is an adult woman who lives with her new husband Joe and his dog Sam. Joe and Mary have been living with each other for the past year. Recently, Mary has been growing increasingly irritated with Sam and has even asked Joe to start looking for a new home for Sam, even though Mary loved the dog when she and Joe first met. Mary says she is fed up with the dog barking all the time and that her head can handle it no longer. While it was true that Sam definitely barked and chased after the mailman, Joe had not really noticed much of a change in Sam’s behavior over the past year. However, to make his new wife happy, Joe compromised and suggested that they get a shock collar to help teach Sam not to bark. Mary agreed to this, but after several months Mary was still insisting that Sam get a new home because her head was going to burst from the noise. Joe was perplexed because he was sure that Sam had been quieter since the acquisition of the shock collar. In the end, Joe was forced to find Sam a new home.”

While this problem has seemingly nothing to do with Data Science, I can solve the issue with Data Science and ensure Sam gets to stay with his loving father Joe.

First we look at the initial information.

  • Mary is continuously getting upset

  • Mary is getting upset because her head is hurting

  • Mary’s hypothesis is that Sam’s barking is making her head hurt

What is the fundamental problem?

  • The fundamental problem is that Mary’s head is hurting, not Sam’s existence

How should we move forward?

  1. Acquire data

  • Create a log of when, where, and why the dog is barking

  • Create a log of when and where Mary’s headaches start, and how long they last

  • What is Mary’s regular schedule?

  • Look for other data which may be impacting Mary in other ways

  • Has Mary’s diet changed?

  • Has Mary gone through any major changes in health since moving in with Joe?

  • How is the economy doing? Is there stress on Mary’s company?

  • Has the environment had any impact on Mary in any capacity?

  • How has their relationship changed? Have Joe and Mary started to fight more over the past year?

  2. Analyze the data

  • Well, it seems the headaches only really happen on days the mailman comes

  • Mary works until the afternoon, just before the mailman arrives

  • Sam does seem to bark predominantly in the afternoons when the mailman comes

  • Mary’s company has been doing great recently and so is the economy

  • Mary makes most of the dinners in her traditional manner and hasn’t changed diets much

  • Mary has started working out more with Joe during the week

  • Mary has started eating larger breakfasts in the morning since moving in with Joe

  • Mary has started attending church with Joe

  • Joe and Mary have almost never fought

  3. Use algorithms and maths

  • Correlation between work and headaches

  • Sam’s barking and Mary’s headaches are highly correlated

  • Greater negative correlation between church and headaches

  • Greatest correlation between breakfast and headaches

  4. Test and validate the hypothesis

  • Does Mary have headaches during a week when Sam is at a friend’s house?

  • Yes but fewer than expected based on historical data

  • Does Mary have headaches if we stop going to church on Sundays?

  • Yes

  • Does Mary stop having headaches if she stops eating Joe’s big breakfast?

  • Almost completely!

  • Do the headaches stop if Sam is at a friend’s house and Mary skips Joe’s breakfast?

  • Completely gone!
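
As a rough sketch of the "algorithms and maths" step, the correlations could be computed along these lines. The two-week log below is made up purely for illustration (the story gives no actual numbers), and the names `pearson`, `log`, and the column order are my own assumptions, not part of any particular tool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL daily log, Monday through Sunday for two weeks.
# Each row: (barking, big_breakfast, church, headache), 1 = yes, 0 = no.
log = [
    (1, 1, 0, 1),  # Mon
    (1, 1, 0, 1),  # Tue
    (1, 0, 0, 0),  # Wed: breakfast skipped, no headache
    (1, 1, 0, 1),  # Thu
    (1, 1, 0, 1),  # Fri
    (1, 1, 0, 1),  # Sat
    (0, 0, 1, 0),  # Sun: no mail, church, no big breakfast
    (0, 1, 0, 1),  # Mon: holiday, no mailman, still a headache
    (1, 1, 0, 1),  # Tue
    (1, 1, 0, 1),  # Wed
    (1, 0, 0, 1),  # Thu: breakfast skipped, headache anyway
    (1, 1, 0, 1),  # Fri
    (1, 1, 0, 1),  # Sat
    (0, 0, 1, 0),  # Sun
]

# Split the rows into one sequence per column.
barking, breakfast, church, headache = zip(*log)

r_bark = pearson(barking, headache)
r_breakfast = pearson(breakfast, headache)
r_church = pearson(church, headache)

print(f"barking vs headaches:   {r_bark:+.2f}")
print(f"breakfast vs headaches: {r_breakfast:+.2f}")
print(f"church vs headaches:    {r_church:+.2f}")
```

With this invented log, breakfast shows the strongest positive correlation, barking a weaker positive one, and church a negative one. Correlation alone does not say which factor is the cause, which is exactly why the testing step exists.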

While this is a simplistic problem, it illustrates that a methodical, mathematical approach is much more likely to find seemingly hidden information, and that it increases your ability not only to measure impact but also to truly identify what is impacting your systems. In the example, we see that Mary’s headaches were likely caused predominantly by Joe’s big breakfasts in some capacity and only partially by Sam’s presence. As a result, in the original story Mary would have continued to have headaches (although likely fewer) despite Sam’s absence. It was pure happenstance that the only day Joe and Mary did not have a big breakfast was also the day the mailman did not come! The fact that the headaches were delayed until the afternoon falsely confirmed Mary’s belief that Sam was the issue. However, in the initial story, because neither Joe nor Mary wanted to examine the root of the issue or break it down to its fundamentals, they never even realized that the big breakfast could be the primary factor causing the headaches.

 

What are some examples that your company can likely use and implement quickly?

As mentioned above, it can be hard to see areas for improvement for your company or the fundamental issues of a problem if you have never thought to look for them. Below are some easier ways to significantly improve your systems with only some minor research and development.

Marketing

  • Determine your Customer Lifetime Value

  • If you don’t know what your CLV is, how can you know how much you’re able to spend to acquire a customer?

  • Are you using surveys to determine what your customers really think?

  • Are you really asking the right questions to improve your company?

  • Are you getting the number of responses you need?

  • Are your questions good enough to be used for a wide range of data science exploration?

  • Are you offering incentives for survey response? Are they immediately gratifying? Are you willing to pay more?

  • Are your responses easily accessible inside of databases?

  • Are you still loading those surveys into your system by hand? Is there any room for error?

  • Are you administering your survey through the right channels?

  • Model your survival rates

  • Want to improve CLV? Find out the easiest way to get your customers to return.

  • Do you even prospect?

  • If you are not targeting your prospects and are instead spending the same amount on everyone, you are just throwing away money.

  • Are you taking advantage of all of the most reasonable marketing channels?

  • Attribution Modeling

  • Are you really spending your advertising dollars as efficiently as you should be?

  • Do you know mathematically how your customers compare you to your competition?

  • Are you pricing your product in the best way possible?

  • Are you accounting for seasonality, demand, inflation, perception, and even the local and national economic climate?

  • Do you know the areas of your product that customers care the most about?

  • Are you A/B testing your various campaigns and methods?

  • Are you optimizing your advertising campaign spend?

  • Are you winning the SEO wars for not only your own brand name but for other highly correlated keywords?

  • Are you really getting the most out of your Google Ads?

  • Is your product being perceived how you think it is being perceived?

  • Are you controlling the perception through PR campaigns?

  • Are you placing too much money on content generation or not enough?
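
To make the CLV question above a bit more concrete, here is a minimal sketch. It assumes a constant margin per period, a constant retention rate (about the simplest survival model possible), and a fixed discount rate. The function name and every number are hypothetical, and a real model would estimate retention from your own customer data.

```python
def customer_lifetime_value(margin_per_period, retention_rate, discount_rate, periods):
    """Discounted expected margin from one customer over a fixed horizon.

    Simplifying assumptions: the margin is earned at the start of each
    period, and the chance of keeping the customer from one period to the
    next is the same every period.
    """
    clv = 0.0
    survival = 1.0  # probability the customer is still active in period t
    for t in range(periods):
        clv += margin_per_period * survival / (1 + discount_rate) ** t
        survival *= retention_rate
    return clv


# HYPOTHETICAL inputs: $100 margin per year, 80% yearly retention,
# 10% discount rate, 10-year horizon.
clv = customer_lifetime_value(
    margin_per_period=100.0,
    retention_rate=0.8,
    discount_rate=0.1,
    periods=10,
)

# Under these assumptions, spending more than the CLV to acquire a
# customer loses money on average.
print(f"Estimated CLV (and acquisition spend ceiling): ${clv:.2f}")
```

Rerunning the function with a higher `retention_rate` shows how much getting customers to return is worth, which is the link between survival modeling and CLV.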

Operations

  • Are you focusing too much on the quality of your product?

  • Are you paying your employees too much or too little?

  • Are you accounting for random variations and standard deviations within your process modeling? Do you know what the losses are if you don’t?

  • Do you understand your communication webs? Have you optimized your company’s structure around them?

  • Have you modeled your expected demand? Accounted for seasonality and expected economic indicators? Can you handle the capacity?

  • Where are your bottlenecks? Are your process flows optimized?

  • Are you getting the attention of the right employees?

  • Are you hiring the right employees?

  • Are you hiring the correct number of employees?

  • Are you scheduling employees in the most effective manner?

  • Do you know what your customers think of your employees?

  • Do you know your employees’ strengths and weaknesses?

  • Are you doing any continuous busy work? Data entry? Etc.?

  • Because that is, like, sooo 2000.
