Srikanth - Data Engineer
[email protected]
Location: Remote, USA
Relocation: Yes
Visa: H-1B
VISHWESHWAR REDDY ABBIDI | Texas, USA | +1 (614) 504-3594 | [email protected]

SUMMARY
- Overall 7 years of experience in Data Engineering, specializing in building and maintaining robust data pipelines and architectures on cloud platforms.
- Expertise in managing distributed data processing systems using Spark and Hive for real-time data streaming and batch processing.
- Extensive experience managing and processing large-scale datasets with Big Data technologies such as HDFS.
- Experience in the development of Big Data projects using open-source tools such as Hadoop, Spark, Kafka, Hive, HDP, Pig, Flume, Storm, and MapReduce.
- Proficient in building data pipelines and integrating data from multiple sources into data lakes and warehouses using AWS Redshift, EMR, and S3.
- Expertise in building scalable data pipelines using AWS services such as Redshift, Glue, S3, Lambda, and EC2 for efficient extraction, transformation, and loading (ETL); a brief sketch follows this list.
- Proficient in designing and deploying serverless applications using AWS Lambda and automating workflows with AWS Glue and CloudFormation.
- Experience using shell scripting for batch processing in Big Data environments and scheduling jobs on Apache Airflow and AWS Glue.
- Proficient in creating scalable data pipelines using Azure Data Factory to automate complex ETL workflows and ensure seamless data integration from multiple sources.
- Hands-on experience with Azure SQL Database for efficient data storage and retrieval, optimizing query performance, and managing relational data systems.
- Implemented CI/CD (Continuous Integration/Continuous Deployment) pipelines using Azure DevOps to automate the build, test, and deployment processes for data applications and services.
- Skilled in designing and deploying serverless applications using Azure Functions to automate backend processes and event-driven workflows, improving operational efficiency.
- Expertise in Azure Synapse Analytics for end-to-end data management, from ingestion to transformation, enabling large-scale data analysis and insights.
- Implemented robust data governance and security using Azure Active Directory (Azure AD) and Role-Based Access Control (RBAC) to ensure secure access to data resources.
- Strong proficiency in integrating Azure Blob Storage for scalable data storage, ensuring durability, availability, and accessibility for analytics.
- Leveraged Azure Cosmos DB for NoSQL data management, supporting globally distributed applications and high-throughput workloads.
- Implemented Azure Databricks to accelerate big data analytics workflows using Apache Spark, enhancing data processing performance and productivity.
- Proficient in automation and deployment tools such as Jenkins, Ansible, and Bamboo for streamlining continuous integration and delivery processes across AWS, Azure, and GCP.
- Expert in utilizing GCP services such as Compute Engine, Cloud Storage, Persistent Disks, Cloud Load Balancing, Cloud SQL, Pub/Sub, VPC, BigQuery, Cloud Deployment Manager, Cloud Monitoring, and Stackdriver Logging to design and manage scalable data solutions.
- Experience in designing and implementing GCP-based database solutions, including Google Cloud SQL and Firestore, and migrating existing on-premises databases to Cloud SQL.
- Leveraged GCP tools such as Dataproc, BigQuery, Dataflow, and Dataprep for cloud-based data processing and analytics, ensuring seamless data integration and transformation.
- Implemented and managed microservices architectures using Google Kubernetes Engine (GKE), Cloud Run, and Cloud Functions, improving application scalability and reliability.
- Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and Big Data modeling techniques using Python.
- Expertise in managing both SQL and NoSQL databases, including Amazon Redshift, DynamoDB, MySQL, PostgreSQL, and Cassandra.
- Expertise in building data pipelines and ETL workflows within Snowflake, integrating with AWS and Azure ecosystems for seamless data movement.
- Configured ZooKeeper for coordination and support across Kafka, Spark, Spark Streaming, and HBase deployments.
- Utilized Snowflake, BigQuery, SQL Server, Hive, and Teradata for effective data warehousing, data lake design, and complex query handling, supporting extensive data analysis.
- Strong understanding of Waterfall and Agile (Scrum) methodologies, contributing to efficient project management and software delivery lifecycles.
- Experience in developing data processing applications using Apache Hadoop and Dataproc for analyzing big data and transforming it per business requirements.
- Experience with databases such as SQL Server, Oracle, MySQL, and MongoDB, writing stored procedures, functions, joins, and triggers for different data models.
- Managed project documentation and workflows using Jira and Confluence, ensuring seamless collaboration and efficient project execution in cloud environments.
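As an illustration of the AWS ETL work summarized above, the following is a minimal PySpark batch-job sketch: read raw CSV from S3, deduplicate, derive a partition date, and write partitioned Parquet back to S3. The bucket paths and column names (event_id, event_ts) are hypothetical placeholders, not details from any specific engagement.

```python
# Illustrative PySpark batch ETL: raw CSV in S3 -> cleaned, partitioned Parquet.
# All paths and columns below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-batch-etl").getOrCreate()

raw = (spark.read
       .option("header", "true")
       .csv("s3://example-raw-bucket/events/"))        # hypothetical input path

cleaned = (raw
           .dropDuplicates(["event_id"])               # assumed business key
           .withColumn("event_date", F.to_date("event_ts"))
           .filter(F.col("event_date").isNotNull()))   # drop unparseable rows

(cleaned.write
 .mode("overwrite")
 .partitionBy("event_date")                            # partition for downstream scans
 .parquet("s3://example-curated-bucket/events/"))      # hypothetical output path
```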
EDUCATION
Cleveland State University, Cleveland, OH | May 2023
Master's in Information Science

PROFESSIONAL EXPERIENCE

Cigna Group, USA | Sr. Data Engineer | July 2023 - Present
Responsibilities:
- Analyzed 750GB+ of healthcare data using SQL, Python, and R to identify trends and insights, improving data-driven decision-making for SaaS healthcare solutions.
- Developed ETL pipelines with Alteryx, Talend, SQL, and Python, ensuring data integrity and smooth integration into MongoDB and DynamoDB systems.
- Automated data processing using Python scripts and Kubernetes, saving 100+ hours/month and enhancing CI/CD workflows.
- Built predictive models using Scikit-learn and TensorFlow to forecast medication demand, reducing shortages by 15% and optimizing inventory (see the sketch after this list).
- Applied statistical methods (regression, hypothesis testing, A/B testing) to evaluate healthcare programs, leveraging machine learning models for deeper insights.
- Collaborated with cross-functional teams, improving project success rates by 20% through enhanced data modeling and communication.
- Implemented real-time monitoring with Apache Kafka, Grafana, and Docker to detect anomalies and improve operational response times.
- Designed scalable data pipelines on AWS, Snowflake, and Databricks, ensuring reliable data processing and machine learning deployments.
- Optimized cloud storage and processing with AWS Lambda, S3, and Docker, boosting efficiency by 40% and cutting infrastructure costs by 20%.
- Utilized clustering and segmentation to identify key customer groups, increasing satisfaction by 25% through tailored services.
- Enhanced system reliability by incorporating Azure Data Lake and Data Factory, improving system uptime by 30%.
- Created interactive dashboards using Power BI and Grafana to visualize key metrics, enabling data-driven decision-making across healthcare platforms.
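A minimal sketch of the kind of demand-forecasting model described above, using scikit-learn. The CSV file, feature columns, and target name are hypothetical placeholders; the production models also reportedly used TensorFlow.

```python
# Hypothetical demand-forecasting sketch with scikit-learn.
# File name, features, and target are illustrative, not real project artifacts.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("medication_demand.csv")                # hypothetical dataset
X = df[["week_of_year", "region_id", "prior_4wk_avg"]]   # assumed features
y = df["units_dispensed"]                                # assumed target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out split before any inventory decision is made on it.
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```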
Mizuho Bank, New York, NY | Data Engineer/Analyst | Jan 2022 - Jun 2023
Responsibilities:
- Implemented Azure Data Lake, Azure Data Factory, and Azure Databricks to move and conform data from on-premises to the cloud as part of a migration to serve the company's analytical needs.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster; applied the Spark DataFrame API for data manipulation within the Spark session.
- Worked on Spark architecture for performance tuning, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors and tasks, deployment modes, the execution hierarchy, fault tolerance, and collection.
- Created Azure Blob storage and worked on enterprise-level centralized Azure Data Lake Storage, loading data into Azure SQL Synapse Analytics (DW).
- Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from various sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and Write-Back tools.
- Deployed pipelines to Azure Data Factory (ADF) that process data using SQL activities, created UNIX shell scripts for database connectivity, and developed JSON scripts for executing queries in parallel job executions.
- Wrote Kafka producers to stream real-time JSON messages to Kafka topics, processed them with Spark Streaming, and performed streaming inserts to Synapse SQL (see the sketch after this list).
- Proficient in Informatica PowerCenter, with experience in designing, developing, and deploying ETL processes.
- Knowledgeable in Neo4j's API and client libraries, enabling the development of custom applications and tools.
- Involved in ingesting large-scale data from Teradata tables (data warehouse) to Delta Lake tables in Databricks on top of ADLS.
- Extensively worked on optimization and troubleshooting of Spark applications, reducing execution times, debugging failures, and providing production support for various pipelines.
- Strong experience in query optimization, including query profiling, caching, and materialized views in Snowflake.
- Proficient in Neo4j database administration, including setup, configuration, and maintenance tasks.
- Designed the logical data model using Erwin and transformed the logical model into a physical database using PowerDesigner.
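A minimal producer sketch for the Kafka bullet above, using the kafka-python client. The broker address, topic name, and payload fields are hypothetical.

```python
# Hedged sketch: publish JSON messages to a Kafka topic for downstream
# stream processing. Broker, topic, and payload are hypothetical.
import json
from kafka import KafkaProducer  # kafka-python package

producer = KafkaProducer(
    bootstrap_servers="broker:9092",                       # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {"account_id": "A-1001", "amount": 250.75, "currency": "USD"}
producer.send("transactions", value=event)                 # hypothetical topic
producer.flush()                                           # block until delivered
```

On the consuming side, a Spark Structured Streaming job would typically read the same topic via spark.readStream.format("kafka") and parse the JSON values before performing the streaming inserts into Synapse SQL.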
HealthKart, Gurgaon, IN | Data Engineer | Mar 2020 - Jul 2021
Responsibilities:
- Created ETL workflows using Apache Spark, Python, and Kubernetes to process over 2TB of healthcare and e-commerce data daily, enhancing data precision and system reliability.
- Designed and implemented scalable infrastructure on Google Cloud Platform (GCP), leveraging BigQuery, Google Cloud Storage, and Cloud Functions, reducing data processing time by 40%.
- Managed data warehouses in BigQuery, processing complex analytical queries and improving performance by 50%; integrated MongoDB and DynamoDB for unstructured data handling.
- Implemented real-time data ingestion using Apache Flink, Kafka, and Docker, ensuring low-latency data availability for healthcare and transactional data.
- Optimized MySQL and PostgreSQL databases, achieving a 30% increase in query performance and ensuring fast, accurate data handling for health and e-commerce transactions.
- Automated data workflows with Python and Kubernetes, reducing manual intervention by 60% and streamlining cloud-based operations.
- Integrated data from fitness trackers and medical records, improving data accuracy and delivering personalized health recommendations, enhancing user experiences.
- Developed machine learning models to analyze customer behavior, resulting in a 15% sales increase through targeted marketing strategies.
- Built data pipelines on Teradata and Snowflake, optimizing cross-platform integration between GCP, AWS, and Azure and improving data retrieval speeds by 35%.
- Implemented API authentication and adhered to industry compliance standards, ensuring the security of sensitive health and transaction data across cloud and on-premise environments.
- Conducted root cause analysis on data discrepancies and applied DevOps best practices, improving system reliability and preventing failures.

Sigma InfoSolutions, Gujarat, IN | Data Engineer | Aug 2017 - Mar 2020
Responsibilities:
- Utilized advanced statistical techniques and Python, with a focus on data modeling, to derive actionable insights from vast financial datasets, enhancing data-driven financial planning.
- Developed optimized SQL queries, improving data retrieval times by 20% for financial reporting, and built robust ETL pipelines to ensure timely and accurate data integration.
- Led the implementation of A/B testing frameworks with machine learning models, optimizing user engagement and increasing customer conversion rates by 35% across digital platforms.
- Reduced financial data discrepancies by 25% through a data quality initiative, leveraging MongoDB and DynamoDB for improved data integrity in financial systems.
- Leveraged Azure services such as Azure Data Lake and Azure Data Factory to implement scalable cloud-based storage and processing solutions, improving accessibility for financial analytics by 20%.
- Architected and maintained data pipelines with Apache Airflow, Apache Kafka, and Kubernetes, streamlining financial data ingestion and processing workflows for enhanced reliability (see the DAG sketch after this list).
- Developed and maintained data models and machine learning algorithms, improving data warehouse efficiency and enabling real-time financial analytics.
- Collaborated on SaaS-based financial tools, integrating APIs for secure financial data handling and ensuring compliance with security standards.
- Integrated DevOps principles into the data engineering workflow, improving CI/CD processes and ensuring smooth data pipeline operations across AWS, Azure, and on-premise environments.
- Collaborated with finance managers to create customized Power BI dashboards and reports, reducing the reporting workload by 30% and automating repetitive tasks with Python and Docker.
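A minimal Airflow DAG sketch in the spirit of the orchestration work described above: a two-task daily pipeline. The DAG id and the task bodies are hypothetical placeholders for real extract and load logic.

```python
# Illustrative Airflow 2.x DAG: daily extract -> load, with placeholder tasks.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull source extracts")      # placeholder for real extraction logic

def load():
    print("load into warehouse")       # placeholder for real load logic

with DAG(
    dag_id="finance_daily_etl",        # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,                     # skip backfill of past intervals
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task          # load runs only after extract succeeds
```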
TECHNICAL SKILLS
Methodologies: SDLC, Agile/Scrum, Waterfall
Languages & Databases: Python, SQL, R, Scala, MySQL, MS SQL Server, ETL, MongoDB, DynamoDB
Python Packages: Pandas, NumPy, Matplotlib, SciPy, Scikit-Learn, Seaborn, PyTorch, ggplot2, Plotly
Big Data Components: HDFS, Hue, MapReduce, Pig, Hive, HCatalog, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, Cloudera Manager, Kerberos, PySpark, Airflow, Snowflake
Data Analytics Skills: Data Manipulation, Data Cleaning, Data Visualization, Exploratory Data Analysis, Data Analysis
Others: AWS, Azure (Databricks), NLP, A/B Testing, Hypothesis Testing, ETL, Hadoop, Spark, BigQuery, Apache Airflow
Tools: Tableau, Power BI, Advanced Excel, Visual Studio, Git, Jupyter Notebook, Docker, Kubernetes, Grafana, Data Lake
Version Control: Git, GitHub
Operating Systems: Windows, macOS

CERTIFICATION
AWS Solutions Architect - Associate