HCL - Big Data Engineer, NJ Remote (Remote, USA)
Email: [email protected]
Auto req ID: 1516081BR
SR Number: DBS-/DBS-/2025/2573563
Experience: 11-15 Years
Skill (Primary): Data Fabric - Big Data Processing - Apache Spark
Job Family: Architecture / Design
Buy Rate Vendor: $60/hr C2C

Other Requirement / Position Details
SR Number: DBS-/DBS-/2025/2573563
Job Location/Client Location (City & State): NJ, USA
Remote OK (Yes/No): Yes
Project Duration: 6+ Months
Project Start Date: ASAP
Buy Rate: $60/hr
Mode (TP/FTE): TP
No. of Openings/Positions: 1
Job Title/Role: Big Data Engineer
Mandatory Skills: Apache Spark, Hive, Kafka, AWS Glue, Google Dataflow, Talend MDM, Hadoop, Presto; strong experience with MySQL, PostgreSQL, MongoDB, Cassandra

Job Description

Role: Data Engineer / Big Data Engineer

Job Overview:
We're seeking a highly skilled Data Engineer / Big Data Engineer to build scalable data pipelines, develop ML models, and integrate big data systems. You'll work with structured, semi-structured, and unstructured data, focusing on optimizing data systems, building ETL pipelines, and deploying AI models in cloud environments.

Key Responsibilities:
- Data Ingestion: Build scalable ETL pipelines using Apache Spark, Talend, AWS Glue, Google Dataflow, or Apache NiFi. Ingest data from APIs, file systems, and databases (sketched below).
- Data Transformation & Validation: Use Pandas, Apache Beam, and Dask for data cleaning, transformation, and validation. Automate data quality checks with Pytest or Unittest (sketched below).
- Big Data Systems: Process large datasets with Hadoop, Kafka, Apache Flink, and Apache Hive. Stream real-time data using Kafka or Google Cloud Pub/Sub (sketched below).
- Task Queues: Manage asynchronous processing with Celery, RQ, RabbitMQ, or Kafka. Implement retry mechanisms and track task status (sketched below).
- Scalability: Optimize for performance with distributed processing (Spark, Flink), parallelization (joblib), and data partitioning (sketched below).
- Cloud & Storage: Work with AWS, Azure, GCP, and Databricks. Store and manage data with S3, BigQuery, Redshift, Synapse Analytics, and HDFS.

Required Skills:
- ETL & Data Processing: Expertise in Apache Spark, AWS Glue, Google Dataflow, Talend.
- Big Data Tools: Proficient with Hadoop, Kafka, Apache Flink, Hive, Presto.
- Databases: Strong experience with MySQL, PostgreSQL, MongoDB, Cassandra.
- Machine Learning: Hands-on with TensorFlow, PyTorch, Scikit-learn, XGBoost.
- Cloud Platforms: Experience with AWS, Azure, GCP, Databricks.
- Task Management: Familiar with Celery, RQ, RabbitMQ, Kafka.
- Version Control: Git for source code management.

Additional Skills:
- Real-time Data Processing: Experience with Apache Pulsar, Google Cloud Pub/Sub.
- Data Warehousing: Familiarity with Redshift, BigQuery, Synapse Analytics.
- Scalability & Optimization: Knowledge of load balancing (NGINX, HAProxy) and parallel processing.
- Data Governance: Use of MLflow, DVC, or other tools for model and data versioning.

Tools & Technologies:
- ETL: Apache Spark, Talend, AWS Glue, Google Dataflow.
- Big Data: Hadoop, Kafka, Apache Flink, Presto.
- Databases: MySQL, PostgreSQL, MongoDB, Cassandra.
- Cloud: AWS, GCP, Azure, Databricks.
- Storage: S3, BigQuery, Redshift, Synapse Analytics, HDFS.
- Version Control: Git.
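The sketches below illustrate, in Python, the kind of work each responsibility above involves. All bucket names, paths, column names, and schemas are illustrative assumptions, not details from the posting. First, a minimal PySpark ETL pipeline: read a raw CSV drop, clean it, and write partitioned Parquet for downstream query engines such as Hive or Presto.

```python
# Minimal PySpark ETL sketch. The s3a:// bucket, columns, and job name are
# hypothetical, and reading from S3 assumes the S3A connector is configured.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-etl")  # hypothetical job name
    .getOrCreate()
)

# Extract: raw CSV files landed by an upstream system
raw = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("s3a://example-bucket/landing/orders/*.csv")
)

# Transform: deduplicate, filter nulls, derive a partition column
orders = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_total").isNotNull())
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: partitioned Parquet for Hive/Presto/Athena-style consumers
(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3a://example-bucket/curated/orders/")
)

spark.stop()
```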
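For automated data quality checks, one common pattern is to validate an extract with Pandas inside pytest tests, so checks run in CI alongside the pipeline code. The columns and rules here are assumptions for illustration.

```python
# Data-quality checks with Pandas and pytest (schema and rules are assumed).
import pandas as pd
import pytest


def load_orders(path: str) -> pd.DataFrame:
    """Load and lightly normalize an extract before validation."""
    df = pd.read_csv(path, parse_dates=["order_date"])
    df["order_total"] = pd.to_numeric(df["order_total"], errors="coerce")
    return df


@pytest.fixture
def orders() -> pd.DataFrame:
    # Hypothetical fixture path; point this at a real extract in practice.
    return load_orders("data/orders_sample.csv")


def test_required_columns_present(orders):
    assert {"order_id", "order_date", "order_total"} <= set(orders.columns)


def test_no_duplicate_order_ids(orders):
    assert not orders["order_id"].duplicated().any()


def test_order_totals_are_positive(orders):
    assert (orders["order_total"].dropna() > 0).all()
```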
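For real-time streaming, a minimal consumer using the kafka-python client is sketched below. The topic name, broker address, and message shape are assumptions; in production the handler would validate, enrich, and load each event rather than print it.

```python
# Minimal Kafka consumer sketch using kafka-python (topic/broker are assumed).
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders-events",                   # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="orders-etl",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Downstream handling (validation, enrichment, load) would go here.
    print(message.topic, message.partition, message.offset, event.get("order_id"))
```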
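Asynchronous processing with retries is typically handled by a task queue. Below is a Celery-over-RabbitMQ sketch; the broker URL and the fetch_and_load helper are placeholders, and Celery records task state so retries can be tracked.

```python
# Celery task with bounded retries (broker URL and helper are placeholders).
from celery import Celery

app = Celery("pipeline", broker="amqp://guest:guest@localhost//")


def fetch_and_load(source_uri: str) -> None:
    """Placeholder for the real ingestion call (API pull, DB extract, ...)."""
    print(f"ingesting {source_uri}")


@app.task(bind=True, max_retries=3, default_retry_delay=30)
def ingest_partition(self, source_uri: str):
    try:
        fetch_and_load(source_uri)
    except Exception as exc:
        # Re-queue with a growing delay; gives transient failures time to clear.
        raise self.retry(exc=exc, countdown=30 * (self.request.retries + 1))


# With a broker and worker running, a caller would enqueue work like:
# ingest_partition.delay("s3://example-bucket/landing/2025-01-28/")
```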
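Finally, for parallelization outside Spark, joblib can fan small independent jobs across local cores, e.g. pre-checking many landed partitions before a heavier distributed step. Paths and the per-file check are assumptions.

```python
# Local parallelization sketch with joblib (paths and check are assumed).
from pathlib import Path

import pandas as pd
from joblib import Parallel, delayed


def row_count(path: Path) -> tuple[str, int]:
    """Cheap per-partition check; swap in real validation logic."""
    return path.name, len(pd.read_csv(path))


files = sorted(Path("data/landing").glob("*.csv"))

# n_jobs=-1 uses all available cores; each file is processed independently.
results = Parallel(n_jobs=-1)(delayed(row_count)(f) for f in files)
print(dict(results))
```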
Thanks and Regards,
Ankush Verma | Lead Recruiter
Cygnus Professional Inc.
Office: 732 485 0000 - 9086 | Direct: 209-260-5752
Email: ankush@cygnuspro.com
https://www.linkedin.com/in/ankush-verma-7a1818b2/
01:18 AM 28-Jan-25