Elvijs Pārpucis · Senior Data Engineer
Experience
Senior Data Engineer
SolutionLab
- Architected and built scalable ETL/ELT data pipelines using Python, SQL, and Spark (EMR/Databricks) to process large-scale batch and streaming datasets for analytics and reporting systems.
- Designed and optimized cloud data warehouse solutions (Snowflake, Redshift, BigQuery, Azure Data Explorer (ADX)), including schema design, data modeling (star and snowflake schemas, medallion architecture), and query performance tuning.
- Developed and maintained real-time and batch data ingestion pipelines using Kafka, Kinesis, Azure Data Factory, AWS Glue, and Pub/Sub to support high-volume, low-latency data processing.
- Designed and implemented real-time event streaming pipelines using Apache Flink and Kafka to enable live user behavior tracking and support personalized feature delivery.
- Implemented workflow orchestration and scheduling systems using Apache Airflow, Prefect, AWS Step Functions, and Kubernetes-based jobs for reliable data operations.
- Established enterprise-grade data governance, quality, and observability frameworks using Great Expectations, Soda, Monte Carlo, Prometheus, Grafana, and the ELK stack.
- Designed and maintained data models, warehouses, and analytical marts supporting BI dashboards and reporting tools such as Power BI and Tableau.
- Led performance and cost optimization across distributed cloud systems, including query tuning, storage efficiency improvements, and compute cost reduction.
Data Engineer
Deloitte
- Designed and implemented scalable end-to-end data pipelines (ETL/ELT) using Python, SQL, and Apache Spark (EMR/Databricks) to process high-volume transactional and behavioral data.
- Built and optimized AWS-based data architectures (S3, Glue, Lambda, Kinesis, Redshift) enabling reliable ingestion, storage, transformation, and analytics at scale.
- Developed and maintained orchestrated workflows using Apache Airflow and AWS Step Functions, ensuring fault-tolerant, scheduled, and observable data processing pipelines.
- Designed robust data models (star schema, dimensional modeling, analytical data marts) to support BI dashboards, reporting, and self-service analytics across business teams.
- Implemented Infrastructure as Code (Terraform/CloudFormation) and integrated CI/CD pipelines (Jenkins, CircleCI) to automate deployment and improve system reliability.
- Established data quality frameworks, monitoring, logging, and alerting systems, improving data accuracy, pipeline observability, and production stability.
- Partnered with product, engineering, analysts, and ML teams to standardize data access, support feature engineering, and enable machine learning and analytics workloads.
- Performed SQL performance tuning and big data optimization, improving query efficiency and reducing processing time on large-scale datasets across cloud warehouses (Redshift/Snowflake/BigQuery).
Data Scientist
Zabbix
- Led end-to-end development of machine learning and statistical models in Python (pandas, scikit-learn), covering problem framing, feature engineering, training, evaluation, and production deployment.
- Designed and executed structured experiments (A/B testing, hypothesis testing, causal inference methods) to evaluate model effectiveness and optimize data-driven decision-making.
- Performed deep exploratory data analysis (EDA) on large, messy datasets to uncover patterns, detect data quality issues, and identify key predictive drivers.
- Engineered and optimized data pipelines using SQL and big data frameworks (Spark, Hadoop) to support scalable feature generation and high-volume data processing.
- Developed and deployed deep learning models (CNN, RNN, LSTM) using TensorFlow, Keras, and PyTorch for complex predictive and pattern recognition tasks.
- Developed and maintained messaging-based data ingestion pipelines using RabbitMQ, enabling asynchronous, scalable, and reliable data movement across distributed systems.
- Collaborated with cross-functional teams (product, engineering, marketing) and business stakeholders to integrate ML solutions into production systems and deliver measurable business impact.
Data Analyst
SIDC Group Ltd
- Leveraged SQL, Python, Excel, and Power Query to analyze large-scale datasets covering SKU performance, cost structures, service levels, and reporting KPIs across multiple markets.
- Designed and built ETL data pipelines using Python and Apache Spark to enable reliable processing of large-scale structured enterprise datasets for reporting, analytics, and business intelligence.
- Designed and maintained data validation frameworks for supplier invoices, improving financial accuracy and operational efficiency while reducing discrepancies.
- Managed and optimized relational and distributed data platforms, including PostgreSQL and Hive, to support high-volume reporting and large-scale historical data processing.
Industry Experience
Experienced in Information Technology and Professional Services.
Business Area Experience
Experienced in Business Intelligence and Information Technology.
Summary
Senior Data Engineer & Data Scientist with 10+ years of experience designing and building large-scale distributed data systems, real-time streaming architectures, and cloud-native AI/ML data platforms across fintech, healthcare, SaaS, and e-commerce domains. Strong expertise in Python, SQL, Apache Spark, Flink, Kafka, Airflow, Databricks, Snowflake, TensorFlow, PyTorch, AWS, Azure, and GCP, with deep experience in ETL/ELT pipeline development supporting both batch and streaming workloads. Skilled in modern data lake and analytics ecosystems including Trino, ClickHouse, HDFS, and S3-compatible storage, focused on performance, scalability, and reliability. Experienced in containerized and cloud-native infrastructure using Kubernetes, Docker, CI/CD pipelines, and GitOps (Argo CD). Proven ability to deliver end-to-end solutions across data engineering, machine learning systems, MLOps workflows, and real-time analytics that drive business growth, operational efficiency, and data-driven decision-making.
Skills
- Programming Languages: Python (pandas, NumPy, Matplotlib), SQL, Java, Bash, Shell Scripting, YAML
- Data Engineering & Pipelines: ETL/ELT Design, Batch & Real-Time Streaming, Data Modeling, Data Warehousing, Lakehouse Architectures, Data Integration, Data Transformation
- Big Data & Distributed Systems: Apache Spark, Flink, Kafka, RabbitMQ, Hadoop (Hive, HDFS), Trino, ClickHouse, Databricks, Airflow, dbt, Prefect
- Analytics Tools: Tableau, Power BI, Looker, Google Data Studio
- Cloud Platforms & Data Services: AWS (S3, Glue, Lambda, Redshift, Athena), Azure (Data Factory, Synapse), GCP (BigQuery, Pub/Sub, Cloud Storage)
- Databases & Storage: PostgreSQL, MySQL, Cassandra, Redis, Snowflake, Data Lakes (S3, HDFS, Ceph)
- DevOps & CI/CD: Docker, Kubernetes, Terraform, Helm, Argo CD, Jenkins, Infrastructure as Code (IaC), CI/CD
- Monitoring & Reliability: Prometheus, Grafana, ELK Stack, CloudWatch, Logging, Alerting, Performance Tuning
- AI / ML Data Engineering & Tools: Feature Engineering, ML Pipelines, AI Data Pipelines, SageMaker, Git, Agile, SDLC, Jira, Confluence, AI-Assisted Tools (Cursor, Claude, GitHub Copilot)
Education
Riga Technical University
Bachelor’s Degree · Computer Science · Riga, Latvia
