When I took my first ML course (by Andrew Ng) in 2015, data science didn’t ring any bells, at least not for me. Back then, I was just curious about data and the possibilities it could unleash. I had participated in an analytics hackathon where I somehow managed to place 3rd. That was not an expected outcome, considering how little skill and knowledge I had. Learning Hive queries over the weekend was an odyssey, and Hortonworks did not even work on my laptop. I felt like the least prepared participant, and a little SQL along with some reading on Big Data was all I had in my toolbox. That result led me to believe I should pursue a career in data.
Back then, there were only two ways in my mind in which data could deliver value: creating analytics or serving predictions. Six years later, I have learned more than I knew was possible. The data science field exploded my simplified idea of just analytics and predictions, and I’ve had opportunities to pioneer some of its applications in the companies I’ve worked at.
My first customer-facing data product was a recommendation engine, which I have written about before here. Later I worked on several more projects around recommendations, a relevancy ranking engine for a search tool, and some others. These were all full-fledged customer-facing data science applications, and as I developed them I came to appreciate the distinct lessons that come from building data-fuelled tools that influence real users’ experiences. Here I will share the most important lessons from my journey so far.
A data product is described as “an application or tool that uses data to help businesses improve their decisions and processes”. — Dr. Carlo Velten.
Take a hybrid mindset
Many new data learners arrive with a fascination with complex machine learning (ML) models. They want to try deep neural networks on most problems, and because of that, hyperparameter tuning often comes before any thought of feature engineering. I would not claim that the data-centric approach is always best, but I would still advocate for thinking about your data first. In many industry problems, algorithms can only take you halfway. The rest of the way lies in the details of your data and the quality of the features you have crafted.
Prof. Andrew Ng voices similar opinions in his amazing talk on data-centric AI.
Let’s be real: context matters most, so data scientists should optimize their work for the problem they are trying to solve. Constraints such as the need for rapid results can determine which options you can explore for a particular problem.
When we want fast results, engineering new features or devising preprocessing techniques might be the less efficient way forward. And despite my affinity for the data-centric approach, one should understand that some data issues cannot be fixed sustainably downstream. So be context-aware: there will be times when a working product is what you need to ship first.
In the early phase of a project, such as when you are building a proof of concept (POC), take the path of least resistance. Getting buy-in is more important than an extra percentage point of model accuracy. If you are building for production, you can guesstimate which approach is likely to improve performance and optimize with that in mind. In future iterations, you can validate both data-centric and model-centric approaches. That is the time to adopt a hybrid approach and explore the best ideas.
Business metrics first
Let me put it this way: business is king. I have spent too much time pointing out my RMSE or F1 scores to people who don’t care. Data scientists are taught on Kaggle and in academia to strive for an extra X% of accuracy, but in industry they are expected to bring value and impact. Statistical metrics alone do not show that value or impact to your business stakeholders. Your stakeholders don’t understand how your RMSE translates into metrics they care about.
As the field matures and companies focus on the usefulness of data science, data scientists should pay more attention to building their business acumen and learning how to prove the impact of their work.
It’s one of the things that took me a long time to learn. In fact, it took me longer than learning backpropagation and gradient descent combined. You have to figure out the language and the ways to articulate your model’s impact in terms of revenue, clicks, and conversions.
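To make this concrete, here is a minimal sketch of how a change in click-through rate could be restated as expected revenue. Everything in it is an assumption for illustration: the `FunnelRates` class, the simple sessions → clicks → orders funnel, and all the numbers are hypothetical and do not come from any of my projects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunnelRates:
    """Hypothetical funnel for a recommendation widget."""
    click_through_rate: float   # clicks per session
    conversion_rate: float      # orders per click
    average_order_value: float  # revenue per order


def expected_revenue(sessions: int, rates: FunnelRates) -> float:
    """Expected revenue for a number of sessions under the given funnel rates."""
    clicks = sessions * rates.click_through_rate
    orders = clicks * rates.conversion_rate
    return orders * rates.average_order_value


def incremental_revenue(sessions: int, baseline: FunnelRates, candidate: FunnelRates) -> float:
    """Revenue difference between a candidate model and the current baseline."""
    return expected_revenue(sessions, candidate) - expected_revenue(sessions, baseline)


# Made-up numbers: the candidate model lifts click-through from 4% to 5%.
baseline = FunnelRates(click_through_rate=0.04, conversion_rate=0.10, average_order_value=50.0)
candidate = FunnelRates(click_through_rate=0.05, conversion_rate=0.10, average_order_value=50.0)

uplift = incremental_revenue(100_000, baseline, candidate)
print(f"Estimated incremental revenue per 100k sessions: {uplift:,.0f}")
```

A number like this is only an estimate until an online experiment confirms it, but it puts the conversation in terms your stakeholders already use.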
UI matters so much
Fine-tuning parameters and feature engineering can take you far, but they don’t matter if the product’s appeal or the user’s perception of it is lacking. This part of a data science project is the toughest, as it requires A/B testing and close collaboration with the product and developer teams to facilitate the experiments. Customer-facing products need the customer’s attention as a prerequisite, and in my experience you can drive significant impact from improvements to the UI.
Additionally, the positioning and wording of the product title are part of creating the appeal. It goes without saying, but the excitement of having a project out in production can blind the data scientist to these details. As an example, my recommendation carousel was titled “items you may like” and produced mediocre returns. Interaction spiked after just tweaking the shading of the carousel and using “For you” instead, which conveys the personal touch of the section.
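If you want to check whether a UI change like this really moved the needle, one common option is a two-proportion z-test on click rates from an A/B test. The sketch below is a generic illustration and not the analysis I ran: the function name and the click and view counts are made up, and it assumes users were randomly split between the two variants.

```python
import math


def two_proportion_z_test(clicks_a: int, views_a: int, clicks_b: int, views_b: int):
    """Return (z, two-sided p-value) for H0: both variants share the same click rate."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled click rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        raise ValueError("Click rate is 0 or 1 in both variants; the test is undefined.")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A keeps the old title, variant B uses the new one.
z, p_value = two_proportion_z_test(clicks_a=200, views_a=10_000, clicks_b=260, views_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Note that changing the shading and the title at the same time, as I did, means a test like this can only tell you that the combination worked, not which change drove it.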
There are many more lessons, but these 3 were the most expensive in terms of time and perhaps the most underrated. During this journey, there were times of frustration when I thought my manager or stakeholders were unreasonable. It was tough to work for weeks only to be blocked at deployment. Only when I allowed myself to see past my own work did I learn what my humble self should have realized earlier. It took me time to learn some of the rules, and I hope you benefit from this and play a better game in your next project.