Insights from the Databricks Data + AI Summit 2024: Data Engineering, Architectures and Real-World Applications


Our team recently attended the Databricks Data + AI World Tour in Munich, where several companies shared their experiences with Databricks and the platform's latest developments were presented. We gained insights from both the main event and a dedicated training session on data engineering with Databricks, which one of our team members attended.

Here, we share our main takeaways, which may be of interest to other data engineers, architects, and organizations building data-driven infrastructures.

 


 

Training Session Highlights: Data Engineering with Databricks 

One of our colleagues joined a hands-on training session focused on data engineering with Databricks. The session provided an in-depth look at how the platform structures and manages data, and offered insights into its core components for data processing, transformation, and governance. Databricks also offers a dedicated learning page with various free courses and learning paths.

 

Key highlights included: 

  • Control and Data Plane Architecture

    The session underscored Databricks’ separation of the control and data planes. The control plane, hosted by Databricks, interacts with users and APIs, while the data plane can be independently managed, allowing flexibility in how resources are allocated and scaled. 

  • Delta Lake and Unity Catalog

    The session emphasized Delta Lake as an open-source data storage protocol that supports ACID transactions and schema evolution. Unity Catalog provides centralized governance across workspaces, simplifying user and group access control and unifying metadata management.

  • Medallion Architecture for Data Transformation

    The Medallion Architecture, used to progressively refine data quality through Bronze, Silver, and Gold layers, was a core part of the training. This staged approach is designed to help data engineers organize data workflows and align them with analytics-ready quality standards. The architecture’s similarity to dbt’s layered approach makes it a familiar yet flexible method for those already accustomed to dbt practices.

  • Orchestration with Delta Live Tables and Workflow Jobs

    The training session also introduced Delta Live Tables (DLT) and Workflow Jobs, two tools for organizing ETL processes. DLT is useful for moving data through the medallion layers, while Workflow Jobs handle broader orchestration tasks. Together, they allow teams to structure data pipelines efficiently and with a high level of automation.
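To make the Bronze, Silver, and Gold idea from the list above more concrete, here is a minimal sketch in plain Python. It does not use any Databricks, Delta Lake, or DLT API; the sample records and the function names `to_silver` and `to_gold` are our own assumptions for illustration. On the platform, each layer would typically be a table rather than an in-memory list.

```python
from collections import defaultdict
from datetime import datetime

# Bronze: raw records kept as ingested, including duplicates and bad values.
# (Hypothetical sample data, not taken from the training.)
BRONZE = [
    {"order_id": "1", "customer": " alice ", "amount": "19.90", "ts": "2024-11-01T10:00:00"},
    {"order_id": "2", "customer": "Bob", "amount": "5.00", "ts": "2024-11-01T11:30:00"},
    {"order_id": "2", "customer": "Bob", "amount": "5.00", "ts": "2024-11-01T11:30:00"},
    {"order_id": "3", "customer": "alice", "amount": "not-a-number", "ts": "2024-11-02T09:15:00"},
]


def to_silver(bronze_rows):
    """Silver: typed, cleaned, deduplicated rows; invalid rows are dropped."""
    seen_ids = set()
    silver_rows = []
    for row in bronze_rows:
        try:
            cleaned = {
                "order_id": int(row["order_id"]),
                "customer": row["customer"].strip().lower(),
                "amount": float(row["amount"]),
                "ts": datetime.fromisoformat(row["ts"]),
            }
        except (KeyError, ValueError):
            continue  # row fails validation and does not reach Silver
        if cleaned["order_id"] in seen_ids:
            continue  # duplicate order
        seen_ids.add(cleaned["order_id"])
        silver_rows.append(cleaned)
    return silver_rows


def to_gold(silver_rows):
    """Gold: analytics-ready aggregate, here revenue per customer."""
    revenue = defaultdict(float)
    for row in silver_rows:
        revenue[row["customer"]] += row["amount"]
    return dict(revenue)


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)
```

The point of the staging is that each layer has one job: Bronze preserves what arrived, Silver enforces types and uniqueness, and Gold answers a business question.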


 

Summit Insights: Emerging Trends in Databricks 

Databricks’ focus on Serverless Architecture 

At the main event, Databricks introduced updates to its infrastructure, including a shift towards a serverless, service-oriented architecture. This design, in line with industry-wide trends, supports flexible scaling and provides data engineering teams with increased control over their data environment.

Delta Lake and Unity Catalog for Governance 

Delta Lake and Unity Catalog, highlighted in both the training and the main sessions, support structured data governance. Delta Lake offers data storage with transaction consistency, historical tracking, and schema flexibility, while Unity Catalog centralizes access control across workspaces.
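The combination of transaction consistency and historical tracking can be illustrated with a toy example. The sketch below is plain Python and only loosely inspired by the idea of a versioned table; the class `ToyVersionedTable` and its methods are hypothetical and are not part of the Delta Lake API. It shows two properties: a commit either succeeds completely or changes nothing, and earlier versions stay readable.

```python
import copy


class ToyVersionedTable:
    """In-memory table that keeps every committed version (illustration only)."""

    def __init__(self):
        self._versions = [[]]  # version 0 is the empty table

    @property
    def latest_version(self):
        return len(self._versions) - 1

    def commit(self, new_rows):
        """Append rows atomically: all rows are accepted, or none are."""
        staged = copy.deepcopy(self._versions[-1])
        for row in new_rows:
            if not isinstance(row.get("id"), int):
                # Nothing has been published yet, so the table is unchanged.
                raise ValueError(f"rejected commit, bad row: {row!r}")
            staged.append(dict(row))
        self._versions.append(staged)  # publish the new version in one step
        return self.latest_version

    def read(self, version=None):
        """Read the latest version, or an older one by number."""
        if version is None:
            version = self.latest_version
        if not 0 <= version <= self.latest_version:
            raise IndexError(f"unknown version {version}")
        return copy.deepcopy(self._versions[version])


table = ToyVersionedTable()
v1 = table.commit([{"id": 1, "status": "new"}])
v2 = table.commit([{"id": 2, "status": "new"}])

try:
    # The second row is invalid, so the whole commit is rejected.
    table.commit([{"id": 3, "status": "new"}, {"id": "oops"}])
except ValueError as err:
    print("commit rejected:", err)

print(table.latest_version)
print(table.read(version=v1))
print(table.read())
```

A real table format has to solve much more than this (concurrent writers, file storage, schema changes), but the reader-facing behavior is similar: failed writes leave no partial data behind, and past states can still be queried.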

AI/BI Genie powered by generative AI 

Of course, AI was also a topic on stage. One of the latest releases is AI/BI Genie, which enables users to interact with their data through natural language. It makes use of Unity Catalog to answer business questions in a ChatGPT-like conversation. This means that the quality of the conversation depends on the quality of your catalog. The tool also learns from users' feedback to improve the conversations.

 

Industry Applications of AI and Data Engineering 

Vehicle Data with CARIAD 

CARIAD demonstrated how Databricks is applied to manage IoT data for connected vehicles. By monitoring sensor data in real time, they enhance safety and aim to reduce vehicle production costs through advanced data engineering techniques. CARIAD's presentation underscored Databricks' capability to support IoT applications where data scalability is key.

Frankfurt Airport’s AI-Powered Operations 

Frankfurt Airport shared their experience with AI-driven process automation for airport operations. Using image recognition and automation, they’ve streamlined labor-intensive tasks, such as baggage handling, reducing the need for manual supervision. This is one example of how AI can be applied to operational tasks traditionally managed by human oversight. 

Finance AI at BASF 

BASF highlighted their use of Databricks to support finance and controlling tasks. Their AI assistant is designed to answer common queries in finance, such as booking codes and financial analysis, with data-driven support. The session provided a look into how AI tools are being integrated into finance departments to support data engineers and finance teams alike. 

 


 

Conclusion

From the hands-on training to the main presentations, the event showcased Databricks' developments alongside broader trends in data management and AI applications. The event attracted a lot of visitors, and for some sessions you had to be quick to get a good spot. Still, we had a good time and gained new insights into the Databricks ecosystem.

If you need help organizing your data and your data architecture, our service teams, from Data Engineering to Data Science, are happy to support you.

 

Contact us