PyData Global 2022

Steering a data science project
12-02, 13:30–14:00 (UTC), Talk Track I

Starting a new data science project is an exciting time, full of exotic model possibilities and incredible faraway features. However, this ocean of potentialities is treacherous, and the risks of veering off course are numerous.

This talk aims to provide a checklist to help you set a course for your data science project and keep to it. An industrial project on image pseudo-classification will be used as a working example.


The early stages of data science projects are full of potential and the wonders of discovery. Yet to see a project through to the end, you need a clear vision of your final destination, some idea of how to reach it, and methods to confirm you are on the right course.

While each project represents its own uncharted sea, there are a lot of tips, tricks and tools to navigate the unknown. In this talk, I compile a few of them into a checklist (PORTULAN) and apply it to an industrial image processing project (retail product image embedding and comparison).

While this talk is focused on data science, many of the tips are general project management best practices and apply outside the domain. People with experience of data science in an industry context are more likely to relate to the practical application.

What you can expect from this talk:

  • a portulan as a takeaway checklist to help steer a data project

  • an example application of said checklist

  • bad maritime puns


Prior Knowledge Expected

No previous knowledge expected

Data scientist at Spark hq, an Irish consulting company specialized in data projects. I started my career with a PhD in materials science simulations and have spent a few years as a data engineer, including two at AWS. Since joining Spark hq, I have had the opportunity to be lead data scientist on several projects. I guide people underwater in my free time.

Data scientist at Spark, an Irish consulting company. Since joining this company, I have had the opportunity to lead the data science part of a few projects. I have a PhD in materials science (polymer networks simulation) and some years of experience as a data engineer, including at AWS. I guide people underwater in my free time.