13.7.2023
Mariia Snihyr
Table of contents
- Introduction
- Pandas 2.0 and beyond
- Large Scale Feature Engineering and Data Science with Python & Snowflake
- An opinionated introduction to Polars
- Common issues with Time Series data and how to solve them
- WALD: A Modern & Sustainable Analytics Stack
- Towards Learned Database Systems
- Rusty Python: A Case Study
- The search for meaningful test data
- Creating Synthetic Data for Open Access
- Most of you don’t need Spark. Large-scale data management on a budget with Python
- Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem
- Postmodern Architecture – The Python Powered Modern Data Stack
I am Mariia, a Data Engineer in the Data Product team at FELD M. In April 2023, my colleague and I visited Berlin to attend the famous PyCon – the largest European convention for the discussion and promotion of the Python programming language.
Every year it gathers Python users and enthusiasts from all over the world and gives them a platform to share information about new developments, exchange knowledge, and learn best practices from each other.
In 2023, PyCon Berlin was merged with PyData, a forum for users and developers of data analysis tools. It lasted for three days and included so many presentations that it would take a team of at least seven people to attend all of them.
Fortunately, the sessions were recorded, and now, after some months, they are available for everyone. You will find a link to the YouTube playlist of PyCon Berlin 2023 talks at the end of this article.
But first, I would like to offer you my own overview of the presentations that we attended and liked the most. Please remember that this overview is based on personal opinion, so it may be biased and different from yours. Feel free to add your perspective in the comments!
1. Pandas 2.0 and beyond
- For whom: Software and Data Engineers, Data Scientists, and everyone who works with Pandas (except animal keepers in public zoos, maybe)
- Why it’s worth watching: The talk not only covers the changes in Pandas 2.0 compared with Pandas 1.0, but also touches on PyArrow, which is actively used in the latest version of Pandas. (If you are curious about what PyArrow is, there is a link to a talk about it near the end of this list.)
- Our verdict: Interesting topic, very relevant for our work. Rating: 9/10
- More details can be found here
- View a video of the talk on YouTube
2. Large Scale Feature Engineering and Data Science with Python & Snowflake
- For whom: Data Scientists, Data Engineers, and those who are interested in Snowflake
- Why it’s worth watching: This talk was essentially an introduction to Snowpark, Snowflake’s framework for machine learning development that can work with big data in Python, Scala, or Java.
- Our verdict: Good presentation, but you won’t get much out of it if you don’t work with Snowflake regularly. Rating: 7/10
- More details can be found here
- View a video of the talk on YouTube
3. Raised by Pandas, striving for more: An opinionated introduction to Polars
- For whom: Software and Data Engineers, Data Scientists, and everyone who works with Pandas (but is striving for more)
- Why it’s worth watching: The talk gives a really good overview of Polars and inspires you to test it as a more powerful alternative to Pandas.
- Our verdict: The speaker was passionate about Polars and very engaging. The slides were great fun! On top of that, Polars is quite a hot topic at the moment. Rating: 10/10
- More details can be found here
- View a video of the talk on YouTube
4. Common issues with Time Series data and how to solve them
- For whom: Mostly Data Scientists, but still relevant for anyone working with data
- Why it’s worth watching: This talk walks you through four common issues with Time Series data and gives you hints on how to resolve them.
- Our verdict: The presentation was quite good, but covered relatively basic material. Rating: 7/10
- More details can be found here
- View a video of the talk on YouTube
5. WALD: A Modern & Sustainable Analytics Stack
- For whom: Data Engineers, BI specialists, and companies and teams who aim to become more data-driven
- Why it’s worth watching: The presentation was dedicated to the tools you can use for building a modern reporting pipeline, and WALD, a solution in which these tools are already combined.
- Our verdict: We were really curious to check out which technologies our colleagues from other companies use for building reporting pipelines. Also, I have to admit, the slides were very cool! Rating: 8/10
- More details can be found here
- View a video of the talk on YouTube
If you are looking for a ready-to-use solution that helps you extract more value from your data, check out what our Data Product team has built: the Datacroft Analytics Stack. Contact us for more details!
6. Towards Learned Database Systems
- For whom: Anyone working with databases
- Why it’s worth watching: It presents a new direction, so-called Learned Database Management Systems (DBMS), in which core components of the DBMS are replaced by machine learning models, an approach that has shown significant performance benefits.
- Our verdict: The topic is exciting per se, and kudos to the speaker: he made it even better with his excellent and well-balanced presentation! Rating: 10/10
- More details can be found here
- View a video of the talk on YouTube
7. Rusty Python: A Case Study
- For whom: Software and Data Engineers working with Python
- Why it’s worth watching: An overview of Rust and its benefits for Python developers, plus an exciting case study of implementing a solution in Rust and integrating it with a Python application using PyO3.
- Our verdict: Very interesting topic and excellent presentation. Rating: 10/10
- More details can be found here
- View a video of the talk on YouTube
8. "Lorem ipsum dolor sit amet"
- For whom: Everyone working with software and data
- Why it’s worth watching: The talk with its tongue-in-cheek title is dedicated to the process of finding meaningful test data for your software. The importance of this topic can’t be overstated, so those who work with data on a regular basis should definitely check it out.
- Our verdict: Fun slides, but I’ve got a feeling that the main message was a bit diluted by the number of jokes and examples. Still, it was a useful and engaging session. Rating: 8/10
- More details can be found here
- View a video of the talk on YouTube
9. Unlocking Information – Creating Synthetic Data for Open Access
- For whom: Data Scientists, but might be interesting to anyone working with data
- Why it’s worth watching: If you’ve ever wondered how to make the data you used in your work public without disclosing any personal information, this presentation might be exactly what you are looking for.
- Our verdict: The topic is a bit niche, though still good for general professional development. Rating: 7/10
- More details can be found here
- View a video of the talk on YouTube
10. Most of you don’t need Spark. Large-scale data management on a budget with Python
- For whom: Software and Data Engineers, Data Scientists
- Why it’s worth watching: The talk covered a lot of aspects and technologies that can help you manage large volumes of data and build scalable infrastructure for its processing.
- Our verdict: The speaker asks some questions that might make you feel a bit dumb and trigger an episode of impostor syndrome, but besides that the talk was great! Rating: 9/10
- More details can be found here
- View a video of the talk on YouTube
11. Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem
- For whom: Software and Data Engineers, Data Scientists
- Why it’s worth watching: If you have heard about PyArrow or Apache Arrow before (e.g., while watching the “Pandas 2.0 and beyond” talk) and you want to dive deeper and find out more about this technology, this presentation is for you. If you haven’t heard of PyArrow before, this presentation is even more perfect for you.
- Our verdict: Arrow is fantastic, but the talk was not too light-hearted, so it requires some concentration. Rating: 8/10
- More details can be found here
- View a video of the talk on YouTube
12. Postmodern Architecture – The Python Powered Modern Data Stack
- For whom: Data Engineers, BI specialists, and companies and teams who aim to become more data-driven
- Why it’s worth watching: The speaker and his team basically built a competitor to WALD (see #5 in this list). They offer it as a set of technologies forming a flexible stack that can integrate data and extract value from it.
- Our verdict: Again, if you are curious about technologies that can be used for building a modern reporting pipeline, you should watch it. And as a fan of Brooklyn Nine-Nine, I can’t help but admire the slides. Rating: 8/10
- More details can be found here
- View a video of the talk on YouTube
As already mentioned above, there were many more exciting presentations at PyCon Berlin 2023. You can find the full list of sessions with descriptions on the conference schedule page. And, fortunately, the majority of the recordings are now available to everyone on YouTube!
To wrap it up, I can say that PyCon is a great event for everyone who is passionate about programming, data, and, of course, Python. It inspires you to try new things and re-think your approaches, brings you closer to your fellow developer community, and gives you the joy of learning from the best experts in your field.
And of course, it’s a perfect reason to visit the vibrant city of Berlin and enjoy its amazing local food, nightlife scene, rich history and some of the most remarkable sights! We are looking forward to PyCon 2024, and hope that after this article you are too!
If you are interested in our work within the Data Product Team, you can find more information here.
We also showcase some of our data engineering & architecture projects here.