

MultiGPU Kubernetes Cluster for Scalable and Cost-Effective Machine Learning with Ray and Kubeflow
Introduction: Large Language Models (LLMs) are very much in demand right now, and they need a lot of compute power to train. Llama 1 used...

Sadik Bakiu
Aug 19, 2023 · 8 min read


Dockerizing dbt Transformations for Managed Airflow: Docker, dbt, and GCP Cloud Composer
Airflow is one of the most popular pipeline orchestration tools out there. It has been around for more than 8 years, and it is used...

Bujar Bakiu
Oct 14, 2022 · 5 min read


Distributed Machine Learning Model Training with Spark (PySpark)
GitHub repo: https://github.com/data-max-hq/pyspark-3-ways What is Spark? Apache Spark was designed to function as a simple API for...

Kejdi Tako
Sep 14, 2022 · 3 min read


Serving Dog Breed Classification model with Seldon-Core, TensorFlow Serving and Streamlit
GitHub Repo: https://github.com/data-max-hq/dog-breed-classification-ml In a modern Machine Learning workflow, after figuring out the...

Bujar Bakiu
Aug 29, 2022 · 6 min read


Deploy Airflow and Metabase in Kubernetes using Infrastructure-as-Code
A step-by-step guide to deploying Airflow and Metabase in GCP with Terraform and Helm providers. With the extensive usage of cloud...

Igli
Aug 24, 2022 · 4 min read


A hands-on project with dbt, Streamlit, and PostgreSQL
Data engineering with dbt and Streamlit: how to build a project with dbt, Streamlit, and PostgreSQL.

Bujar Bakiu
Jul 12, 2022 · 7 min read


Modern Data Team Hats
This blog was written together with Martin Rusnak from Rusnak Consulting and Bujar Bakiu. Not that long ago (maybe somewhere this is still the...

Sadik Bakiu
Apr 23, 2022 · 3 min read