Can we abstract best practices to real design patterns yet?

According to its definition, a design pattern is a reusable solution to a commonly occurring problem. In software engineering, the concept dates back to 1987 when Beck and Cunningham started to apply it to programming. By the 2000s, design patterns — especially the SOLID design principles for OOP — were considered common knowledge to programmers. Fast forward 15 years and we arrive at the era of Software 2.0: machine learning models start to replace classical functions in more and more places of code. Today, we look at software as a fusion of traditional code, machine learning models and the underlying…

With automation in machine learning, the field opens up, and one may ask whether we will still need stats skills in the future

The world of automated machine learning

Today, if you want to be a data scientist, there are countless courses promising a quick entry into the field of data. Of course, nobody believes that one can learn, in a matter of weeks, the ins and outs of a complex field that requires hard skills in math/stats and software engineering as well as soft skills in R&D. Yet, because of automation, it seems sensible to train machine learning technicians: people with knowledge of the tools and APIs that unleash the power of data analytics.

In fact, I’m also trying to…

Bogeyman tasks are best handled just like bogeymen in movies: by chunking.

I’m part of a cross-functional data science team, and we are working on cutting-edge AI-based products. When a product is built around AI, it usually means a complex software architecture under the hood… and a backlog that is full of frightening bogeyman tasks. Today, I had a discussion with one of our new interns about exactly these.

Everybody tries to avoid dealing with these tasks, but either the team's working method does not allow it (e.g. kanban) or they simply realize too late that the story they picked is Mr. Bogeyman. In…

We have asked ourselves this question many times. Here is the answer.

The prototypical situation that has been puzzling me for a while is the following:
- We estimated 3 days for a task
- Bob has been working on the task for 3 days already
- At the daily scrum, he says that he just needs to take care of a couple more things and the task will be finished
- The task is not finished the next day

Sometimes such tasks are not finished even after 5 days of work. Since they are typically “more research, less development” tasks, many people suggest accepting this kind of uncertainty as inherent to research…

Whether they work on AI products in retail, IoT, or marketing, data scientists face similar challenges. Some of these challenges are common to many technology areas; others are specific to AI, due to the unique way this field blurs the boundaries between state-of-the-art research and application development. Let’s look at three prototypical examples of these challenges in the areas of business specification, iterative development of analytics, and testing.

Specification of business requirements

The challenge

One of the most frequent topics that we data science practitioners like to talk about is the question of how business requirements are formulated. This can bring with it a…

1. data science practice is full of waste
2. explicit hypothesis testing helps to fine-tune ideas
3. communication is the key to integrating data scientists into the software development lifecycle

I joined this field because of the excitement that we feel upon discovering new patterns in data. With time, though, it became clear that identifying patterns is just the first half of this journey. The other half is about sharing this discovery. ‘Sharing’ can take the form of a presentation, a change in an existing product, or even an entirely new product…

At first, I intended this as a post around stories and conclusions. But since I did not find any good tutorials online that would explain TDD from scratch to a working model, I’ve decided to make one myself to supplement this post. If you are familiar with TDD and would like to jump right to the tutorial, then click here. In the following, I briefly introduce TDD and the reasons why I think TDD is useful in data analysis, and then I conclude with a complete step-by-step TDD workflow using an example project from Kaggle.

Test-Driven Development

Test-driven development was popularized by Kent Beck…

From talents to assets

After the first part of this mini-series, I received some very interesting insights from fellow data professionals. Interestingly, one of the recurring topics was talent management. This topic is also very close to me. I know from first-hand experience that without good mentors and talent management I would never have been able to get to this point and still enjoy my everyday work so much. In this article, I would like to discuss two specific aspects of talent management. …

The business

This is the first part of a mini-series that summarizes my impressions of what makes certain data science teams effective. Over the years, I have worked in several teams: teams of various sizes, from as small as a duo to as large as a dozen data experts; and of various players, from college dropout savants through rebranded software engineers to PhD/postdocs with long years of academic experience. My role also changed: I started as a junior and, with time and experience, grew into a team lead.

Data science is a new field. It is so new that even…

Ágoston Török

I’m the Director of Data Science at AGT. Previously, I was R&D Lead at Synetiq. I have a PhD in cognitive neuroscience and have done research around the globe.
