Harsh Jain - Data Engineer
United States | Austin, Texas | +1 (737) 304 0779 | [email protected]
Relocation: Yes (onsite, anywhere in the US)
Visa: H-1B

PROFESSIONAL SUMMARY
Experienced Data Engineer with 8 years of experience in developing and optimizing data pipelines, ETL processes, and data models on AWS and Azure platforms. Proven ability to design scalable, secure, and efficient data integration solutions across industries such as pharmaceuticals, finance, and retail.
Expertise in Azure (Data Factory, Synapse Analytics, Logic Apps) and AWS (Glue, Lambda, S3) for cloud-based data solutions, ensuring seamless data flow and real-time analytics.
Extensive experience in Snowflake, including tools such as Snowpipe and dbt, enabling efficient data ingestion and transformation for large datasets.
Skilled in Python scripting and SQL, including writing complex queries and optimizing data workflows, reducing processing times by up to 40%.
Strong experience in T-SQL for writing complex queries, stored procedures, and optimizing performance for relational databases in Azure SQL, Redshift, and Snowflake.
Advanced knowledge of big data technologies (Spark, Hadoop) and DevOps practices (Azure DevOps, AWS CodePipeline, Git), supporting scalable and automated CI/CD pipelines.
Strong focus on data security and compliance, implementing robust measures such as column-level encryption and role-based access controls in line with GDPR and CCPA regulations.
Exceptional collaboration and communication skills, consistently translating complex technical solutions into actionable insights for business stakeholders.

WORK EXPERIENCE
Data Engineer
NYSE | New York, NY | Jun 2023 - Present
Collaborated with cross-functional teams on a data migration project to transition NYSE's legacy on-premises data infrastructure to a modern cloud-based architecture on Azure and Snowflake, ensuring minimal disruption to critical financial operations.
Designed and developed scalable, real-time data pipelines using Azure Data Factory and Azure Event Hubs to migrate and process large volumes of financial data, reducing data latency by 30%.
Migrated and re-engineered SSIS packages within Azure Data Factory's SSIS integration runtime, facilitating seamless migration of on-premises ETL processes to the cloud.
Implemented advanced data processing frameworks using Apache Spark and Hadoop to handle batch and real-time ingestion, transformation, and storage of vast financial datasets. This improved data processing efficiency by 40%.
Utilized Spark and SQL to clean, transform, and aggregate data during migration, enhancing data quality by 25% and reducing processing time by 35%.
Optimized and modernized data warehousing workflows by designing and implementing Snowflake data warehouses, reducing query execution time by 40% and providing efficient data retrieval for analytical and operational needs.
Automated routine data workflows and integrated new cloud-based pipelines using Azure Logic Apps, reducing manual intervention by 50% and ensuring faster time-to-insight.
Developed interactive Tableau dashboards to monitor migration progress and post-migration performance, empowering stakeholders with real-time insights into the health of the new cloud infrastructure.
Implemented robust security measures, including data encryption, role-based access controls, and data masking, ensuring secure handling of sensitive financial information and compliance with GDPR and CCPA regulations during and after migration.
Applied Agile methodologies to plan and execute migration phases iteratively, improving project delivery speed by 20% and enabling rapid adaptation to evolving stakeholder requirements.
Conducted in-depth code reviews, performance tuning, and system optimizations throughout the migration, resulting in a 25% reduction in system downtime and ensuring high availability of critical data systems.
Azure Data Engineer
Hilmar Cheese Company | Dallas, TX | Nov 2020 - May 2023
Designed and implemented scalable data storage solutions using Azure Data Lake and Blob Storage, managing large volumes of production and supply chain data.
Developed and maintained robust ETL pipelines integrating data from multiple sources, leveraging Snowflake for real-time ingestion and transformation, achieving a 99.5% data accuracy rate.
Executed complex SQL queries and created stored procedures to support the analysis of production metrics, inventory levels, and sales data, enhancing data-driven decision-making.
Identified and resolved bottlenecks in data processing workflows, improving data processing speeds by 20% and optimizing system performance.
Implemented data governance policies and procedures, ensuring data quality and achieving a 95% success rate in internal audits.
Developed real-time data solutions using Apache Kafka and Azure Stream Analytics, enabling immediate insights into production and supply chain operations.
Created and maintained Power BI dashboards featuring key performance indicators (KPIs) for production and sales teams, contributing to a 15% increase in operational efficiency.
Worked closely with data scientists and business analysts to design and implement data solutions supporting predictive analytics, such as inventory forecasting and demand planning.
Automated routine data processing tasks using Python and Azure Data Factory, reducing manual efforts by 30% and minimizing the risk of human error.
Utilized GitLab for version control and CI/CD pipelines, streamlining deployment across development, testing, and production environments and ensuring smooth operations.
Collaborated with IT and production teams to integrate new data sources, ensuring seamless data flow and timely access to business-critical information.
Data Engineer
Pfizer | Boston, MA | Mar 2019 - Oct 2020
Designed and implemented data integration solutions to aggregate clinical trial, patient, and supply chain data from diverse sources, supporting business intelligence and research initiatives.
Collaborated with business stakeholders and research teams to define data requirements, developing scalable relational and NoSQL data models that improved data storage efficiency by 15%.
Automated complex ETL processes using AWS Glue, processing datasets exceeding millions of records daily and reducing data processing time by 30%, enabling faster analysis of clinical trial outcomes.
Contributed to the migration of legacy on-premises data warehouses to Snowflake, ensuring a seamless transition and optimizing data storage and query performance for analytical workloads.
Leveraged Snowflake's built-in scaling capabilities to manage large volumes of structured and semi-structured data, enhancing query performance and ensuring real-time data availability for critical decision-making.
Conducted performance tuning on AWS RDS, Redshift, and Snowflake, improving SQL query efficiency by 40% and streamlining data workflows across research and operational units.
Implemented robust data security measures using AWS KMS, IAM, and Snowflake's role-based access control, ensuring secure handling of sensitive patient and research data and compliance with HIPAA, GDPR, and CCPA regulations.
Developed interactive dashboards and real-time visualizations in Amazon QuickSight, enabling researchers and executives to gain actionable insights into clinical and operational data.
Collaborated with data scientists to operationalize machine learning models into production using AWS Lambda and SageMaker, supporting real-time analytics for drug development and supply chain optimization.
Actively participated in Agile development sprints, working with cross-functional teams to deliver high-quality, scalable data solutions within project timelines.
Identified and resolved bottlenecks in data pipelines, improving system reliability and reducing downtime, ensuring high availability of critical data for research and operations.
Data Engineer
Carrer Soft Solutions | India | Jan 2016 - Feb 2019
Analyzed customer transactional data to identify patterns and trends, contributing to a 15% increase in cross-selling opportunities.
Collaborated with marketing teams to develop targeted campaigns based on data insights, resulting in improved customer engagement metrics.
Implemented Azure Data Factory (ADF) pipelines to orchestrate data movement and executed complex data transformations, ensuring seamless data flow.
Conducted data cleansing and normalization processes using Python scripting and SQL, enhancing data accuracy by 20% and improving the reliability of analytical reports.
Leveraged Azure Blob Storage, ADLS Gen2, and Delta Lake for storing and managing various types of data, optimizing storage solutions for performance and scalability.
Managed project timelines and delivery commitments, effectively communicating with management to ensure successful project completion.
Developed predictive models for customer churn analysis using Python, achieving a significant reduction in customer attrition rates and saving millions in potential revenue loss.
Utilized SQL and T-SQL queries for data extraction, transformation, and loading (ETL) from relational databases, improving query performance and reducing report generation time.
Designed and maintained Tableau dashboards for executive reporting, leading to enhanced data visualization capabilities and more informed decision-making.
Employed advanced SQL concepts such as joins, subqueries, indexing, and stored procedures to optimize database performance and ensure efficient data retrieval.
Worked closely with business teams and technical analysts to understand business requirements and translate them into technical solutions.
Participated in cross-functional meetings to gather requirements and prioritize project deliverables, resulting in improved project turnaround time.
Provided data-driven insights and recommendations to senior management, contributing to increased adoption of data-driven decision-making practices across the organization.
TECHNICAL SKILLS
Programming: Python, NumPy, Pandas, Spark, Scikit-Learn, SQL (joins, stored procedures, views, scalar/window functions, CTEs), R, Java, PowerShell, Bash scripting, NoSQL databases
Data Warehousing: Data warehousing concepts, ETL processes, dbt, Snowflake, data pipeline development, cloud-based solutions on the Azure platform, data modeling
Cloud/Big Data: Azure Data Factory, Logic Apps, Functions, Azure Databricks, Azure Synapse Analytics, API Gateway, Azure Blob Storage, Azure SQL Database, Hadoop, AWS (EC2, S3, Glue, IAM, Kinesis, SageMaker, Lambda, Redshift, RDS, Athena)
Visualization: Power BI, Tableau, custom visualization design, dashboard development, storytelling through data
Analytics: Statistical analysis, data preprocessing, exploratory data analysis, data cleansing/transformation
Machine Learning: Machine learning algorithms (regression, classification, clustering), predictive modeling, NLP, deep learning techniques
DevOps & Tools: Azure DevOps, CI/CD pipelines, Apache Spark, Airflow, Kafka, Hadoop, Git, GitHub, JIRA, Docker, Kubernetes

EDUCATION
Bachelor of Engineering in Computer Science Chennai, India
Anna University, CGPA: 3.8

ACHIEVEMENTS & CERTIFICATIONS
Microsoft Azure Fundamentals (AZ-900), Microsoft Azure Administrator (AZ-104), Microsoft Azure Data Fundamentals (DP-900), Tableau Desktop Certified Professional