Job Details

Big Data Engineer || Remote || 10+ years at Remote, Remote, USA

Greetings from ICONIC Infosys Inc!

ICONIC Infosys Inc is an IT Development & IT Staffing firm with more than a
decade of experience in providing IT Staffing Solutions & Services. Our
expertise is in sourcing and deploying highly skilled IT specialists in
mainstream and niche technologies to meet clients' temporary, permanent, and
SOW project needs.

Job Title/Role: Big Data Engineer

Location: Remote

Duration: Long Term

Mandatory skills: Apache Spark, Hive, Kafka, AWS Glue, Google Dataflow, Talend MDM, Hadoop, Presto; strong experience with MySQL, PostgreSQL, MongoDB, Cassandra.

Job Description:

We're seeking a highly skilled Big Data Engineer to build scalable data
pipelines, develop ML models, and integrate big data systems. You'll work
with structured, semi-structured, and unstructured data, focusing on
optimizing data systems, building ETL pipelines, and deploying AI models in
cloud environments.

Key Responsibilities:

Data Ingestion: Build scalable ETL pipelines using
Apache Spark, Talend, AWS Glue, Google Dataflow, Apache NiFi. Ingest data from
APIs, file systems, and databases.

Data Transformation & Validation: Use Pandas, Apache
Beam, and Dask for data cleaning, transformation, and validation. Automate data
quality checks with pytest and unittest.

Big Data Systems: Process large datasets with
Hadoop, Kafka, Apache Flink, Apache Hive. Stream real-time data using Kafka
and Google Cloud Pub/Sub.

Task Queues: Manage asynchronous processing with
Celery, RQ, RabbitMQ, or Kafka. Implement retry mechanisms and track task
status.

Scalability: Optimize for performance with
distributed processing (Spark, Flink), parallelization (joblib), and data
partitioning.

Cloud & Storage: Work with AWS, Azure, GCP,
Databricks. Store and manage data with S3, BigQuery, Redshift, Synapse
Analytics, and HDFS.

Required Skills:

ETL & Data Processing: Expertise in Apache Spark,
AWS Glue, Google Dataflow, Talend.

Big Data Tools: Proficient with Hadoop, Kafka,
Apache Flink, Hive, Presto.

Databases: Strong experience with MySQL,
PostgreSQL, MongoDB, Cassandra.

Machine Learning: Hands-on with TensorFlow,
PyTorch, Scikit-learn, XGBoost.

Cloud Platforms: Experience with AWS, Azure, GCP,
Databricks.

Task Management: Familiar with Celery, RQ,
RabbitMQ, Kafka.

Version Control: Git for source code management.

Desirable Skills:

Real-time Data Processing: Experience with Apache
Pulsar, Google Cloud Pub/Sub.

Data Warehousing: Familiarity with Redshift,
BigQuery, Synapse Analytics.

Scalability & Optimization: Knowledge of load
balancing (NGINX, HAProxy) and parallel processing.

Data Governance: Use of MLflow, DVC, or other
tools for model and data versioning.

Tools & Technologies:

ETL: Apache Spark, Talend, AWS Glue, Google
Dataflow.

Big Data: Hadoop, Kafka, Apache Flink, Presto.

Databases: MySQL, PostgreSQL, MongoDB, Cassandra.

Cloud: AWS, GCP, Azure, Databricks.

Storage: S3, BigQuery, Redshift, Synapse
Analytics, HDFS.

Version Control: Git.

--

Thanks & Regards,
Preethi
Email: preethi@iconicinfosys.com

Iconic Infosys

11:11 PM 27-Jan-25
