Akram - Python Data Engineer
[email protected]
Location: Addison, Illinois, USA
Relocation: Any
Visa: H-1B
AKRAM
(469) 334-4999 | LinkedIn
________________________________________
PROFESSIONAL SUMMARY
- Experienced Data Engineer with 7+ years across the design, development, testing, administration, and maintenance of Big Data and Data Warehousing solutions, specializing in building data engineering applications and data warehouses on Microsoft Azure.
- Active in data lake projects using Databricks Spark and in business applications built with Microsoft Azure, Apache Hadoop, Apache Spark, HDFS, MapReduce, Hive, Spark SQL, Python, Oracle, PL/SQL, MS SQL Server, Git, and Unix shell scripting.
- Plan, design, and develop applications on Microsoft Azure; built prototypes with Azure Data Factory (ADF), Logic Apps, Virtual Machines, Key Vault, Azure SQL DB, Kubernetes services, and Azure Data Lake Storage (ADLS) to validate proposed solutions and gather stakeholder feedback.
- Design and develop ADF pipelines that extract data from relational sources (Teradata, Oracle, SQL Server, Sybase, Postgres, DB2) and non-relational sources such as flat files.
- Implement ADLS processes by developing Databricks Spark notebooks and integrating ETL data pipelines in Azure Data Factory (see the sketch after the skills list).
- Built processes to implement data marts in Azure SQL Managed Instance (SQL MI) on top of Azure Data Lake to support Power BI and Tableau applications.
- Proficient in Azure Cosmos DB, Azure Functions, Azure DevOps, and Blob Storage.
- Migrated data from MS SQL Server, Oracle, MySQL, and PostgreSQL to Teradata, Hadoop, and cloud data lakes.
- Developed external tables in Azure Synapse Analytics (formerly SQL Data Warehouse) for data visualization and reporting, and created stored procedures to implement complex business transformations.
- Extensive experience building Hadoop solutions, with strong knowledge of Hive, MapReduce, Spark, and Sqoop.
- Expert in writing large, complex SQL queries; hands-on with query tools such as SQL Developer and Teradata SQL Assistant.
- Good knowledge of relational and dimensional data modeling (star and snowflake schemas).
________________________________________
TECHNICAL SKILLS
Big Data & Cloud Platforms: Databricks Spark, Azure Data Factory, Azure Synapse Analytics, Azure Data Lake Storage, Azure Logic Apps, Azure Key Vault, Kubernetes
Programming & Scripting: Python, SQL, Java, Unix Shell Scripting
Data Engineering Tools: Hadoop (Hive, MapReduce, Sqoop)
Data Modeling: Relational and dimensional (star and snowflake schemas)
Version Control & CI/CD: Git, Azure DevOps
Databases: SQL Server, Oracle, PostgreSQL, DB2
Certifications: Azure Data Engineer Associate
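For illustration, a minimal PySpark sketch of the ADLS ingestion-and-transformation pattern referenced in the summary: a Databricks notebook step that reads flat files from ADLS, applies basic transformations, and writes a curated Delta table. The storage account, container, paths, and table names are hypothetical.

```python
# Sketch of a Databricks notebook step: ingest flat files from ADLS,
# clean and type the data, and publish a curated Delta table.
# Storage account, container, and table names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls-ingest").getOrCreate()

raw = (
    spark.read
    .option("header", "true")
    .csv("abfss://[email protected]/orders/")
)

curated = (
    raw.withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .dropDuplicates(["order_id"])
)

(curated.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")   # partition the curated zone by business date
    .saveAsTable("curated.orders"))
```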
________________________________________
EDUCATION
MS in Computer Science, NIU - May 2021
Bachelor of Technology in Computer Science and Engineering, JNTU - June 2017
________________________________________
EXPERIENCE

Greenbrier, Oregon | Sr. Data Engineer | Nov 2021 - Present
- Data Management Solutions: Designed and implemented robust data engineering solutions for top U.S. railroads, integrating and managing diverse data from over 90 sources.
- Strategic Transformation: Spearheaded a transformative data management initiative for the railcar industry, projected to save billions by optimizing operations and enhancing decision-making capabilities.
- End-to-End Pipelines: Delivered scalable, high-performance ETL/ELT pipelines leveraging Databricks, Azure Blob Storage, Data Lake, and Azure DevOps, ensuring seamless data ingestion, transformation, and delivery.
- Data Lineage Framework: Developed a comprehensive data lineage framework across 10+ transformation phases, enabling complete traceability and lifecycle transparency.
- Automated Data Validation: Implemented asynchronous validation workflows for multi-format datasets, integrated with lineage tracking for accurate and efficient issue resolution (see the sketch after this section).
- Stakeholder Collaboration: Acted as the primary liaison with SMEs and business stakeholders, delivering bi-weekly progress updates and ensuring alignment with business goals.
- Leadership: Led a team of four developers, driving onboarding, knowledge transfer, and adherence to best practices in data engineering and governance.
- Optimization & Cost Savings: Redefined workflows to eliminate bottlenecks and redundant resource utilization, achieving a 30% cost reduction and significantly improving processing efficiency.
- Observability: Introduced asynchronous logging with Elastic Logger, boosting system diagnostics and monitoring capabilities.
- KPI Dashboards: Built dynamic dashboards in Databricks to monitor project KPIs, pipeline performance, and system health, enabling proactive issue resolution.
- Processed over 20M daily records with complex transformations in under 5 minutes.
- Designed a scalable architecture for asynchronous data validation across multi-phased pipelines.
- Enabled onboarding of new clients by demonstrating MVP results to business stakeholders, earning accolades.
Environment: Azure Databricks, Azure Data Factory, Azure Repos, Databricks Notebooks, Azure Data Lake Storage, Azure Blob Storage, Azure DevOps, Python, SQL, PySpark, Elastic Logger
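A simplified sketch of the asynchronous multi-format validation pattern described in the Greenbrier role above. The validators, file formats, and directory layout are illustrative stand-ins, not the production framework; the point is dispatching format-specific checks concurrently and collecting issues for resolution.

```python
# Illustrative asynchronous validation across multi-format files.
# Validators and the "landing" directory are hypothetical stand-ins.
import asyncio
import json
from pathlib import Path

def validate_json(path: Path) -> list[str]:
    """Return a list of issues found in a JSON file (empty list = clean)."""
    try:
        json.loads(path.read_text())
        return []
    except json.JSONDecodeError as exc:
        return [f"{path}: invalid JSON ({exc})"]

def validate_csv(path: Path) -> list[str]:
    """Check that every row has the same column count as the header."""
    rows = [line.split(",") for line in path.read_text().splitlines() if line]
    width = len(rows[0]) if rows else 0
    return [f"{path}: row {i} has {len(r)} columns, expected {width}"
            for i, r in enumerate(rows[1:], start=2) if len(r) != width]

VALIDATORS = {".json": validate_json, ".csv": validate_csv}

async def validate_file(path: Path) -> list[str]:
    # Run the blocking validator in a worker thread so files are checked concurrently.
    validator = VALIDATORS.get(path.suffix)
    return await asyncio.to_thread(validator, path) if validator else []

async def main(paths: list[Path]) -> None:
    results = await asyncio.gather(*(validate_file(p) for p in paths))
    for issues in results:
        for issue in issues:
            print("VALIDATION:", issue)

if __name__ == "__main__":
    asyncio.run(main(list(Path("landing").glob("*.*"))))
```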
Verizon, Irving, TX | Azure Data Engineer | Dec 2019 - Nov 2021
- Designed and developed scalable pipelines in Azure Data Factory to integrate data from relational sources (Teradata, SQL Server) and non-relational sources (flat files, SharePoint).
- Engineered ETL workflows using Databricks notebooks and BigQuery, optimizing business logic implementation and performance across environments.
- Automated data validation processes and built robust frameworks to streamline modifications to existing code.
- Designed interfaces with Azure Databricks for seamless data ingestion into and export from Azure Data Lake Storage.
- Developed stored procedures for complex transformations in BigQuery, ensuring high-performance processing.
- Utilized Azure DevOps for end-to-end CI/CD pipelines, leveraging Git and automation scripts to manage deployments efficiently.
- Engaged stakeholders to translate business requirements into technical solutions, presenting progress through Agile ceremonies and sprint reviews.
- Conducted unit, system, and integration testing to ensure pipeline reliability and data accuracy.
- Supported cross-functional teams by reviewing code, ensuring adherence to best practices, and enabling process improvements.
Environment: ADF, Azure SQL DB, Azure Synapse Analytics, Azure Logic Apps, Azure Databricks, ADLS, Azure DevOps, Azure Blob Storage, Hive, Oracle GoldenGate (OGG), Informatica PowerCenter 10.x, Teradata, Oracle, SQL Server

S&P Global, India | Big Data/Hadoop Developer | Feb 2018 - Mar 2019
- Engineered MapReduce programs and optimized Hive queries to process and transform large datasets for business analytics.
- Designed and deployed scalable ETL solutions using Informatica, integrating data from diverse sources, including Oracle and flat files, into data warehouses.
- Created and optimized Hive tables with dynamic partitioning and bucketing, improving data accessibility and query performance.
- Developed Oozie workflows to automate data ingestion and processing in HDFS, using Pig for transformation and pre-aggregation tasks.
- Designed integrated dashboards that interact with HBase data through the Thrift API, supporting CRUD operations.
- Migrated HiveQL queries to Spark SQL for improved performance and scalability, leveraging Spark for real-time analysis of structured and semi-structured data (see the sketch at the end of this document).
- Troubleshot and optimized cluster performance, resolving errors in shell, Hive, and MapReduce jobs.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Oozie, HBase, Linux, Spark, Spark SQL, Oracle, Scala

Vertilink Technologies, India | Java/J2EE Developer | Jan 2017 - Feb 2018
- Designed and developed multi-tiered web applications using Struts, EJB, and JDBC, implementing MVC architecture for enhanced maintainability.
- Created reusable components with the Java Persistence API (JPA) and optimized database interactions through stored procedures and PL/SQL scripts in Oracle.
- Led UI development efforts, ensuring responsive designs with JSP, HTML, and CSS, and implemented client-side validations using JavaScript.
- Automated data updates and maintenance with UNIX scripts, ensuring database integrity and operational consistency.
- Participated in UAT and production support, resolving issues efficiently to meet deployment deadlines.
Environment: Java, Struts, Oracle, EJB, JPA, JDBC, PL/SQL, Shell Scripting, Apache Tomcat, HTML, CSS
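To illustrate the HiveQL-to-Spark SQL migration with dynamic partitioning described in the S&P Global role, a minimal sketch follows; the database, table, and column names are hypothetical, and the aggregation stands in for the actual business queries.

```python
# Sketch of migrating a HiveQL aggregation to Spark SQL with dynamic partitioning.
# Database, table, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-to-sparksql")
    .enableHiveSupport()  # lets Spark read and write Hive-managed tables
    .getOrCreate()
)

# Allow Hive-style dynamic partition inserts.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# The same aggregation formerly run in Hive, now executed by Spark's engine,
# writing into a table partitioned by trade_date (the last SELECT column
# feeds the dynamic partition).
spark.sql("""
    INSERT OVERWRITE TABLE analytics.daily_positions PARTITION (trade_date)
    SELECT account_id,
           SUM(quantity) AS total_qty,
           AVG(price)    AS avg_price,
           trade_date
    FROM   raw.trades
    GROUP  BY account_id, trade_date
""")
```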