3 factors for successful Data Science Projects
According to Gartner, 60% of Data Science projects fail. Other estimates put the failure rate even higher, at up to 85%.
Well, the key is to understand what it takes to be in the remaining 15%. Isn’t it?
Based on our own experience, we were able to identify key success factors that can take us there!
By watching these factors and taking the right steps, we have successfully delivered Data Science projects from concept through execution to production.
CoreView has built a strong Data Engineering and Data Science competency and successfully completed Machine Learning projects involving Prediction Engines, Computer Vision, Sentiment Analysis, Churn Analysis, etc.
In the paragraphs that follow, I would love to share my findings on the top 3 success factors.
1) Balancing the team:
A Data Science project team usually consists of stakeholders from Operations, Data Science and IT Development. The Operations team focuses on the required business outcomes, the Data Scientists model the existing data to get the desired outcome, and the development team builds the data lake and the visualizations. If you observe closely, each group has its own language, its own way of communicating and its own focus. What is missing is the link between them.
The “technology generalist” at CoreView bridges this gap. This person is in charge of the overall success of the project and must be competent enough to collaborate and communicate across the business, the data scientists and IT.
2) Featured Data:
The right amount and the right quality of data are a must for the success of a project.
It is typically assumed that having a lot of data makes the problem easier to solve. But more data means more processing, more exploration, more complex tech stacks, more computing and more scaling issues. All of this adds to the cost.
What it takes is spending a good amount of time in data exploration to identify the “most influential features” for achieving desired results. Build your data engineering processes and models around these features.
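As a rough illustration of what identifying the “most influential features” can look like during exploration, here is a minimal sketch that ranks candidate features by the absolute value of their Pearson correlation with the target. The data is synthetic, and the column names (usage, tenure, noise) and the helper names are hypothetical, chosen only for this example. Correlation captures only linear relationships, so treat a ranking like this as a first pass, not a final answer.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target):
    """Return (feature name, |correlation with target|) pairs, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic data (an assumption for illustration): the target depends
# strongly on "usage", weakly on "tenure", and not at all on "noise".
rng = random.Random(42)
n = 500
usage = [rng.gauss(0, 1) for _ in range(n)]
tenure = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
target = [3.0 * u + 0.5 * t + rng.gauss(0, 1) for u, t in zip(usage, tenure)]

ranking = rank_features({"usage": usage, "tenure": tenure, "noise": noise}, target)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Features that score near zero here are candidates to leave out of the pipeline, which keeps the data engineering effort focused on what actually moves the result.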
Not all Data Science projects need big data, and not all projects have the same objectives.
3) Uncertainty is a given:
Most data science projects carry a good amount of uncertainty about achievable results versus expected results. Are you really going to get what you want? Asking and answering that question upfront is critical.
It helps to start with a proof-of-concept (POC) phase: perform exploratory data analysis (EDA), identify features, try out different algorithms and see which combinations give us the desired results.
By the end of this phase, we refine expectations and pin down the effort, data, team, budget and achievable results, based on the quality and quantity of the available data.
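To make the POC idea concrete, here is a minimal sketch, under stated assumptions, of comparing candidate algorithms against an agreed target on held-out data. Everything in it is hypothetical: the synthetic data, the two candidates (a mean baseline and a single-feature least-squares line), the run_poc helper and the target_mae threshold standing in for the "expected result".

```python
import random


def mean_abs_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate: ordinary least squares on a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def run_poc(xs, ys, candidates, target_mae, holdout_fraction=0.25, seed=0):
    """Fit each candidate on a training split and score it on a holdout split."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * (1 - holdout_fraction))
    train, test = indices[:cut], indices[cut:]
    report = {}
    for name, fit in candidates.items():
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        mae = mean_abs_error([ys[i] for i in test], [model(xs[i]) for i in test])
        report[name] = {"mae": mae, "meets_target": mae <= target_mae}
    return report


# Synthetic data (an assumption for illustration): a noisy linear relationship.
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(400)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

report = run_poc(xs, ys, {"mean_baseline": fit_mean, "linear": fit_line}, target_mae=1.5)
for name, result in report.items():
    print(name, round(result["mae"], 2), result["meets_target"])
```

If no candidate meets the target on the holdout split, that is the signal to revisit the expectation, the data or the budget before committing to a full build.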
At CoreView, we believe and rigorously practice all of the above.
To summarise:
- Bring in a “technology generalist”
- Identify the “most influential features”
- Conduct POCs and redefine expectations
Hope you like what I am saying. I will be very happy to see your comments and hear from you about your own experience in this area.