| Tarun - Data Engineer |
| [email protected] |
| Location: Lubbock, Texas, USA |
| Relocation: Yes |
| Visa: H1B |
| Resume file: TarunChitturi_DE_1761142457171.docx |
Over 9 years of hands-on experience designing, developing, and optimizing complex data pipelines to extract actionable business insights.
- Proficient in big data technologies including PySpark, Snowflake, GCP, AWS, and Databricks, ensuring scalable solutions for large datasets.
- Architected and deployed cloud-based data warehousing solutions, facilitating integration with the CrowdStrike Marketplace and driving $700K in annual revenue growth.
- Extensive experience with GCP, AWS, Databricks, and Snowflake, leveraging these platforms for data processing, storage, and real-time analytics.
- Architected a robust data monitoring and alerting system using Kafka, Falcon LogScale, and NoSQL databases, enhancing security teams' ability to detect ePHI-related threats.
- Implemented high-performance data pipelines for ePHI activity and telemetry log data, reducing processing time by 50%.
- Developed and optimized an ETL framework using Python, PySpark, and SQL, enabling efficient data migration from multiple sources to Snowflake.
- Improved query performance by 25% by optimizing complex SQL queries during data migration from Oracle and GCP to Snowflake.
- Developed an automated testing model in Databricks that eliminated 80% of manual data validation, speeding up the deployment process by over 30 hours per sprint.
- Skilled in tools such as Alteryx, Tableau, Power BI, and Falcon LogScale, converting raw data into structured formats for insightful analysis.
- Maintained and monitored real-time data pipelines using Apache Airflow, reducing troubleshooting time by 60% and ensuring reliable data loads in production environments.
- Demonstrated ability to work closely with cross-functional teams, including security, development, and operations, to deliver data-driven solutions that drive business growth.
- Developed and optimized high-volume data pipelines using Python, Spark, and SQL, ingesting telemetry into data lakes and enabling seamless data joins across diverse cloud sources.
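For illustration only (not part of the candidate's materials): the rule-based data-validation pattern described above could be sketched in plain Python as below. All function names, column names, and sample rows are hypothetical; the actual model ran in Databricks and is not shown here.

```python
# Minimal sketch of rule-based data validation over dict-shaped rows.
# All names and data are hypothetical stand-ins for the Databricks model.

def validate_rows(rows, required, non_null):
    """Return a list of (row_index, error) tuples for failed checks."""
    errors = []
    for i, row in enumerate(rows):
        for col in required:
            if col not in row:
                errors.append((i, f"missing column: {col}"))
        for col in non_null:
            # Only flag nulls for columns that are present; absence is
            # already reported by the required-column check above.
            if col in row and row[col] is None:
                errors.append((i, f"null value in: {col}"))
    return errors

rows = [
    {"id": 1, "event": "login", "ts": "2024-01-01"},
    {"id": 2, "event": None, "ts": "2024-01-02"},
    {"id": 3, "ts": "2024-01-03"},  # missing "event"
]
errs = validate_rows(rows, required=("id", "event", "ts"), non_null=("event",))
```

A real pipeline would emit these errors to a quality dashboard or fail the load instead of returning them in-process.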
- Developed and optimized Apache Spark jobs within Databricks to process streaming data, implementing efficient data pipelines for real-time analytics using Apache Airflow.
- Expertise in building robust ETL pipelines from Oracle Fusion Cloud to Snowflake using PySpark.
- Proficient in data transformation and ETL processes using Python and PySpark.
- Implemented Databricks for big data processing, leveraging Delta Lake for ACID transactions and optimized query performance.
- Optimized Databricks cluster configurations, fine-tuning resource allocation and autoscaling for performance and cost efficiency.
- Utilized Databricks collaborative notebooks for cross-functional projects, facilitating seamless teamwork across data science and engineering teams.
________________________________________
TECHNICAL SKILLS
Languages: Python, SQL, NoSQL, Unix Shell Scripting, Bash scripting
Big Data Technologies: PySpark, Snowflake, Google Cloud Platform (GCP), Apache Airflow, AWS, Hadoop
Data Engineering: ETL pipeline development, data warehousing, real-time data processing, data modeling, and optimization
Data Visualization & Modeling: Alteryx, Talend, Tableau, Power BI, Falcon LogScale (CrowdStrike)
Cloud Technologies: AWS (Glue, Lambda, EC2, Redshift), Google Cloud Platform (BigQuery, Dataflow, Dataproc)
Tools & Technologies: MySQL, Kafka, Git, Jira, Docker, Kubernetes, Jenkins
Database Management Systems: PostgreSQL, Oracle, SQL Server
________________________________________
Keywords: business intelligence
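As an illustrative aside (not from the resume): the cross-source join pattern mentioned in the summary — combining telemetry from different clouds on a shared key — can be sketched with a stdlib-only hash join standing in for the Spark DataFrame joins described. All field names and sample records are hypothetical.

```python
# Stdlib-only sketch of an inner hash join between two telemetry sources,
# a stand-in for Spark DataFrame joins across cloud data lakes.
# All keys and sample data are hypothetical.

def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key`, merging matched rows."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})  # right-side fields win on clash
    return joined

aws_logs = [{"host": "a", "cpu": 0.7}, {"host": "b", "cpu": 0.4}]
gcp_logs = [{"host": "a", "region": "us-east1"}]
result = hash_join(aws_logs, gcp_logs, key="host")
```

At Spark scale the same idea is expressed as `left_df.join(right_df, on="host")`, with the engine handling partitioning and shuffle.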