HCL - Big Data Engineer, NJ Remote (Remote, USA)
Email: [email protected]
Auto req ID: 1516081BR
SR Number: DBS-/DBS-/2025/2573563
Experience: 11-15 Years
Skill (Primary): Data Fabric - Big Data Processing - Apache Spark
Job Family: Architecture / Design
Buy Rate Vendor: $60/hr C2C

Other Requirement / Position Details
SR Number: DBS-/DBS-/2025/2573563
Job Location/Client Location (City & State): NJ, USA
Remote OK (Yes/No): Yes
Project Duration: 6+ Months
Project Start Date: ASAP
Buy Rate: $60/hr
Mode (TP/FTE): TP
No. of Openings/Positions: 1
Job Title/Role: Big Data Engineer
Mandatory Skills: Apache Spark, Hive, Kafka, AWS Glue, Google Dataflow, Talend MDM, Hadoop, Presto; strong experience with MySQL, PostgreSQL, MongoDB, Cassandra

Job Description

Role: Data Engineer / Big Data Engineer

Job Overview:
We're seeking a highly skilled Data Engineer / Big Data Engineer to build scalable data pipelines, develop ML models, and integrate big data systems. You'll work with structured, semi-structured, and unstructured data, focusing on optimizing data systems, building ETL pipelines, and deploying AI models in cloud environments.

Key Responsibilities:
- Data Ingestion: Build scalable ETL pipelines using Apache Spark, Talend, AWS Glue, Google Dataflow, or Apache NiFi. Ingest data from APIs, file systems, and databases (sketched below).
- Data Transformation & Validation: Use Pandas, Apache Beam, and Dask for data cleaning, transformation, and validation. Automate data quality checks with Pytest or Unittest (sketched below).
- Big Data Systems: Process large datasets with Hadoop, Kafka, Apache Flink, and Apache Hive. Stream real-time data using Kafka or Google Cloud Pub/Sub (sketched below).
- Task Queues: Manage asynchronous processing with Celery, RQ, RabbitMQ, or Kafka. Implement retry mechanisms and track task status (sketched below).
- Scalability: Optimize for performance with distributed processing (Spark, Flink), parallelization (joblib), and data partitioning (sketched below).
- Cloud & Storage: Work with AWS, Azure, GCP, and Databricks. Store and manage data with S3, BigQuery, Redshift, Synapse Analytics, and HDFS.

Required Skills:
- ETL & Data Processing: Expertise in Apache Spark, AWS Glue, Google Dataflow, Talend.
- Big Data Tools: Proficient with Hadoop, Kafka, Apache Flink, Hive, Presto.
- Databases: Strong experience with MySQL, PostgreSQL, MongoDB, Cassandra.
- Machine Learning: Hands-on with TensorFlow, PyTorch, Scikit-learn, XGBoost.
- Cloud Platforms: Experience with AWS, Azure, GCP, Databricks.
- Task Management: Familiar with Celery, RQ, RabbitMQ, Kafka.
- Version Control: Git for source code management.

Additional Skills:
- Real-time Data Processing: Experience with Apache Pulsar, Google Cloud Pub/Sub.
- Data Warehousing: Familiarity with Redshift, BigQuery, Synapse Analytics.
- Scalability & Optimization: Knowledge of load balancing (NGINX, HAProxy) and parallel processing.
- Data Governance: Use of MLflow, DVC, or other tools for model and data versioning.

Tools & Technologies:
- ETL: Apache Spark, Talend, AWS Glue, Google Dataflow.
- Big Data: Hadoop, Kafka, Apache Flink, Presto.
- Databases: MySQL, PostgreSQL, MongoDB, Cassandra.
- Cloud: AWS, GCP, Azure, Databricks.
- Storage: S3, BigQuery, Redshift, Synapse Analytics, HDFS.
- Version Control: Git.
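The sketches below illustrate, in Python, the kind of work each responsibility above involves. All bucket names, paths, column names, and schemas are illustrative assumptions, not details from the posting. First, a minimal PySpark ETL pipeline: read a raw CSV drop, clean it, and write partitioned Parquet for downstream query engines such as Hive or Presto.

```python
# Minimal PySpark ETL sketch. The s3a:// bucket, columns, and job name are
# hypothetical, and reading from S3 assumes the S3A connector is configured.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("orders-etl")  # hypothetical job name
    .getOrCreate()
)

# Extract: raw CSV files landed by an upstream system
raw = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("s3a://example-bucket/landing/orders/*.csv")
)

# Transform: deduplicate, filter nulls, derive a partition column
orders = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("order_total").isNotNull())
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: partitioned Parquet for Hive/Presto/Athena-style consumers
(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3a://example-bucket/curated/orders/")
)

spark.stop()
```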
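For automated data quality checks, one common pattern is to validate an extract with Pandas inside pytest tests, so checks run in CI alongside the pipeline code. The columns and rules here are assumptions for illustration.

```python
# Data-quality checks with Pandas and pytest (schema and rules are assumed).
import pandas as pd
import pytest


def load_orders(path: str) -> pd.DataFrame:
    """Load and lightly normalize an extract before validation."""
    df = pd.read_csv(path, parse_dates=["order_date"])
    df["order_total"] = pd.to_numeric(df["order_total"], errors="coerce")
    return df


@pytest.fixture
def orders() -> pd.DataFrame:
    # Hypothetical fixture path; point this at a real extract in practice.
    return load_orders("data/orders_sample.csv")


def test_required_columns_present(orders):
    assert {"order_id", "order_date", "order_total"} <= set(orders.columns)


def test_no_duplicate_order_ids(orders):
    assert not orders["order_id"].duplicated().any()


def test_order_totals_are_positive(orders):
    assert (orders["order_total"].dropna() > 0).all()
```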
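For real-time streaming, a minimal consumer using the kafka-python client is sketched below. The topic name, broker address, and message shape are assumptions; in production the handler would validate, enrich, and load each event rather than print it.

```python
# Minimal Kafka consumer sketch using kafka-python (topic/broker are assumed).
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders-events",                   # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="orders-etl",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Downstream handling (validation, enrichment, load) would go here.
    print(message.topic, message.partition, message.offset, event.get("order_id"))
```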
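Asynchronous processing with retries is typically handled by a task queue. Below is a Celery-over-RabbitMQ sketch; the broker URL and the fetch_and_load helper are placeholders, and Celery records task state so retries can be tracked.

```python
# Celery task with bounded retries (broker URL and helper are placeholders).
from celery import Celery

app = Celery("pipeline", broker="amqp://guest:guest@localhost//")


def fetch_and_load(source_uri: str) -> None:
    """Placeholder for the real ingestion call (API pull, DB extract, ...)."""
    print(f"ingesting {source_uri}")


@app.task(bind=True, max_retries=3, default_retry_delay=30)
def ingest_partition(self, source_uri: str):
    try:
        fetch_and_load(source_uri)
    except Exception as exc:
        # Re-queue with a growing delay; gives transient failures time to clear.
        raise self.retry(exc=exc, countdown=30 * (self.request.retries + 1))


# With a broker and worker running, a caller would enqueue work like:
# ingest_partition.delay("s3://example-bucket/landing/2025-01-28/")
```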
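Finally, for parallelization outside Spark, joblib can fan small independent jobs across local cores, e.g. pre-checking many landed partitions before a heavier distributed step. Paths and the per-file check are assumptions.

```python
# Local parallelization sketch with joblib (paths and check are assumed).
from pathlib import Path

import pandas as pd
from joblib import Parallel, delayed


def row_count(path: Path) -> tuple[str, int]:
    """Cheap per-partition check; swap in real validation logic."""
    return path.name, len(pd.read_csv(path))


files = sorted(Path("data/landing").glob("*.csv"))

# n_jobs=-1 uses all available cores; each file is processed independently.
results = Parallel(n_jobs=-1)(delayed(row_count)(f) for f in files)
print(dict(results))
```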
Thanks and Regards,
Ankush Verma | Lead Recruiter
Cygnus Professional Inc.
Office: 732 485 0000 - 9086 | Direct: 209-260-5752
Email: ankush@cygnuspro.com
https://www.linkedin.com/in/ankush-verma-7a1818b2/
01:18 AM 28-Jan-25