ON DATA ENGINEERING

Financial methods, applications, and modeling for Data Engineers

Because their role is meant to support the decision-making process, Data Engineers need to understand certain financial concepts and know how to best leverage them in their data models.

Some concepts are particularly important for Data Engineers’ activities, namely amortization and allocation, while other controlling concepts, such as , can also help them understand how some of their data consumers might be leveraging the data.

Amortization & Depreciation

Introduction to amortization

What is amortization?

Amortization represents the process of gradually writing off the initial cost of an asset over its useful life — when we distinguish between amortization and depreciation…
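As a minimal sketch, assuming the straight-line method (the amortizable base, cost minus residual value, is spread evenly across the periods of the useful life), an amortization schedule can be generated per period. The function name straight_line_schedule and its parameters are hypothetical and not taken from the article. Decimal is used so that monetary amounts do not accumulate floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP


def straight_line_schedule(cost, residual_value, periods):
    """Return (period, charge, book_value) tuples for straight-line amortization.

    Hypothetical helper: (cost - residual_value) is spread evenly over `periods`.
    Rounding differences are pushed into the final period so that the total
    charge equals the amortizable base exactly.
    """
    cost = Decimal(str(cost))
    residual_value = Decimal(str(residual_value))
    base = cost - residual_value
    charge = (base / periods).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

    schedule = []
    book_value = cost
    for period in range(1, periods + 1):
        # The last period absorbs rounding so the book value ends at the residual value
        amount = charge if period < periods else book_value - residual_value
        book_value -= amount
        schedule.append((period, amount, book_value))
    return schedule


for period, amount, book_value in straight_line_schedule(1000, 0, 3):
    print(period, amount, book_value)
```

In a data model, each tuple would typically become one row of a fact table at a period grain, which is what lets a one-off purchase show up as a recurring cost in reporting.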


ON DATA ENGINEERING

Data Engineering application of data classes

Data classes are a relatively new addition to Python, first released in Python 3.7. They provide an abstraction layer that leverages type annotations to define container objects for data. Compared to a normal Python class, data classes provide some syntactic sugar for instantiation, such as an automatically generated __init__ method, and there are a number of areas where data classes can add value to data engineering.

Understanding Data Classes

Data classes

The dataclasses module introduces a lightweight way to define objects, generating boilerplate such as the __init__ and __repr__ methods for the different fields defined within it.

from dataclasses import dataclass
@dataclass
class CustomerDataClass:
    customer_id: int  # hypothetical field; the original field list is not shown

As shown above, it relies on a decorator pattern…
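To show where this helps in a pipeline, here is a minimal sketch that turns raw rows into typed records and back into plain dictionaries. The Customer class, its fields, and the sample rows are hypothetical illustrations, not the article's original example.

```python
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class Customer:
    # Hypothetical fields chosen for illustration only
    customer_id: int
    email: str
    signup_date: date
    tags: List[str] = field(default_factory=list)  # mutable defaults need a factory


raw_rows = [
    {"customer_id": 1, "email": "a@example.com", "signup_date": date(2020, 1, 15)},
    {"customer_id": 2, "email": "b@example.com", "signup_date": date(2020, 3, 2), "tags": ["vip"]},
]

# __init__ is generated from the annotations, so dict rows can be unpacked directly
customers = [Customer(**row) for row in raw_rows]

# __eq__ is generated as well: instances compare field by field
assert customers[0] == Customer(1, "a@example.com", date(2020, 1, 15))

# asdict() converts an instance back into a plain dict, e.g. before writing to a sink
records = [asdict(c) for c in customers]
print(records[1])
```

Note that data classes do not validate types at runtime: the annotations drive the generated methods, but passing a string as customer_id would not raise an error on its own.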


ON DATA ENGINEERING

Reflections on one year of using DBT for modeling a data warehouse

DBT is a tool that aims to facilitate the work of analysts and data engineers in transforming data within a data warehouse. It provides a command-line interface as well as a documentation server and an RPC server.

After more than a year working with DBT, I thought it would be good to reflect on what it offers, what it currently lacks, and which features would be desirable to incorporate into the tool.

Jinja capabilities

Jinja is a Python templating engine used in data tools such as Airflow and Superset, as well as in infrastructure-as-code tools such as Ansible.

DBT leverages Jinja, at the…
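To illustrate the kind of SQL generation Jinja enables, here is a minimal sketch using the jinja2 package directly (it must be installed separately). The template, table, and column names are hypothetical; DBT layers its own context on top of plain Jinja (functions such as ref() and source()), which is not available in this standalone example.

```python
from jinja2 import Template

# Hypothetical DBT-style model: one aggregated column per payment method.
# DBT-specific functions such as ref() are NOT available in plain Jinja.
PIVOT_SQL = Template(
    "select order_id"
    "{% for method in payment_methods %}, "
    "sum(case when payment_method = '{{ method }}' then amount else 0 end)"
    " as {{ method }}_amount"
    "{% endfor %} "
    "from {{ source_table }} group by order_id"
)

# The loop expands into one column per value, so adding a payment method
# only requires changing the input list, not the SQL itself.
sql = PIVOT_SQL.render(payment_methods=["card", "cash"], source_table="payments")
print(sql)
```

The design choice this reflects is keeping repetitive SQL in a single loop rather than copy-pasting near-identical column expressions.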


Data modeling seems to have become a lost art amongst data engineers. What was once the core part of a data engineer’s job seems to have been relegated to a secondary rank.

Shaping the data by developing an understanding of the underlying data and the business process going along with it doesn’t seem nearly as important these days as the ability to move data around.

In a large number of organizations, the role of a data engineer has been transformed from a data shaper to a data mover.

Data Engineering’s role shift

Data Engineering has been a , at the same…


This month was quite active with news and releases in the data space: two big conferences took place, Amazon’s re:Invent and NeurIPS, and we also saw the official release of Airflow 2.0 and an introduction to the principles and architecture of the Data Mesh…

SQL, Databases, and ETL

providing increased support for recursive queries and an improved query planner.

Amazon to provide a SQL Server/T-SQL compatibility layer for Postgres. Postgres also received .

Cockroach DB explained why they are , as well as


ON DATA ENGINEERING

The shift towards real-time data flow has a major impact on the way applications are designed and on the . Dealing with real-time data flows brings a paradigm shift and an added layer of complexity compared to traditional integration and processing methods (i.e., batch).

There are real benefits to leveraging real-time data, but it requires specialized considerations in setting up the ingestion, processing, storing, and serving of that data. It brings about specific operational needs and a change in the way data engineers work. These should be taken into account when considering embarking on a real-time journey.

Use cases for leveraging Real-time Data


There are different salary estimates available online. Glassdoor provides a general overview of salaries for Data Scientists, StackOverflow provides a , and several recruitment agencies provide salary estimates for different positions.

For instance, Harnham produces an annual salary guide for many data professions across the UK, several European countries, and the US. This is an exercise shared by , and . provides an estimate of tech salaries for fintech companies. While specifically focused on a report of Data Science salaries across Europe. …


With everyone trying to get everything out the door before the holiday season, November was a busy month for the data world: Airflow 2.0 moved to beta status, Google released new tooling to help with machine learning for NLP and with managing ML model bias, and Apple released some benchmarks of its new M1 chip’s performance on ML workloads.

SQL and ETL

SQL got some attention this month: Google made instances available on the latest version, Postgres 13. Databricks released , providing a familiar SQL interface for querying Delta Lake…


ON DATA ENGINEERING

Postgres is a very versatile database with a high degree of extensibility. It can be extended through , UDFs, , . Quite a few features are not currently available within the native implementation. Not all extensibility options are supported in PaaS (platform as a service) implementations; AWS, for instance, doesn’t support PL/Python as part of Amazon Relational Database Service (RDS).

Some companies such as Uber have explained why they have been , but for Data Engineers, different functionality for a database used as a data warehouse than one…


ON DATA ENGINEERING

SQL is one of the key tools used by data engineers to model business logic, extract key performance metrics, and create reusable data structures. There are, however, different types of SQL to consider for data engineers: Basic, Advanced Modelling, Efficient, Big Data, and Programmatic. The path to learning SQL involves progressively learning these different types.

Basic SQL

What is “Basic SQL”

Learning “Basic SQL” is all about learning the key SQL operations used to manipulate data, such as aggregations, joins, and controlling the grain of the result.
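As a minimal sketch of these basic operations, the query below joins two tables and aggregates the result to a coarser grain. It uses Python’s built-in sqlite3 module with an in-memory database so it runs without any setup; the tables and rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table customers (customer_id integer primary key, country text);
    create table orders (order_id integer primary key, customer_id integer, amount real);
    insert into customers values (1, 'NL'), (2, 'NL'), (3, 'CH');
    insert into orders values (10, 1, 20.0), (11, 1, 30.0), (12, 2, 15.0), (13, 3, 40.0);
    """
)

# Join orders to customers, then aggregate to a country grain:
# one output row per country instead of one row per order.
rows = conn.execute(
    """
    select c.country,
           count(o.order_id) as order_count,
           sum(o.amount)     as revenue
    from orders o
    join customers c on c.customer_id = o.customer_id
    group by c.country
    order by c.country
    """
).fetchall()
print(rows)
```

Changing the group by clause (for example to c.customer_id) changes the grain of the output, which is why grain is worth reasoning about explicitly before writing the aggregation.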

Where to learn it

Basic SQL can be learned from websites such as or looking for a more practical approach to learning from websites…

Julien Kervizic

Living at the interstice of business, data and technology | Head of Data | iptiQ by SwissRe, Facebook and Amazon | linkedin:
