
Urgent Requirement: Machine Learning Engineer in CA or TX (Remote, USA)
Email: [email protected]
http://bit.ly/4ey8w48
https://jobs.nvoids.com/job_details.jsp?id=1095385&uid=

From: Stacy, Karma Consulting, Inc
Email: [email protected]
Reply to: [email protected]

URGENT REQUIREMENT

Machine Learning Engineer

Pleasanton, CA or Plano, TX

6-12 months

Job Description

Good understanding of Spark architecture with Databricks and Structured Streaming; setting up Microsoft Azure with Databricks, creating Databricks workspaces for business analytics, managing clusters in Databricks, and managing the machine learning lifecycle.
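
The posting does not name a lifecycle tool; MLflow is the component Databricks typically ships for this, so the following is only a minimal, illustrative tracking sketch. The experiment path, model, parameters, and metric names are hypothetical, not details from the posting.

```python
# Minimal MLflow lifecycle sketch (assumption: MLflow is the lifecycle tool;
# the experiment path and names below are illustrative).
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("/Shared/churn-demo")            # hypothetical experiment path
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    mlflow.log_param("n_estimators", 200)              # track hyperparameters
    mlflow.log_metric("test_auc", auc)                 # track evaluation metrics
    mlflow.sklearn.log_model(model, "model")           # version the fitted model
```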

Experienced in manipulating existing data columns, merging data, adding data tables, connecting to in-database data, and using bookmarks, tags, and lists to capture aspects of dynamic analysis, considering the available aggregation options and expression shortcuts.

Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn in Python to develop various machine learning algorithms.

Responsible for working with various teams on a project to develop an analytics-based solution that specifically targets customer subscribers.

Performed end-to-end delivery of PySpark ETL pipelines on Azure Databricks to transform the data.
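
As a rough illustration of such a pipeline, the sketch below reads raw files, applies a simple transformation, and writes a curated output. The paths, column names, and aggregation are assumptions, not details from the posting.

```python
# Minimal PySpark ETL sketch (assumption: paths and columns are illustrative;
# on Databricks the SparkSession is usually provided as `spark`).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-demo").getOrCreate()

# Extract: read raw CSV landed in cloud storage.
raw = spark.read.option("header", True).csv("/mnt/raw/transactions/")

# Transform: fix types, drop bad rows, derive a per-customer aggregate.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount").isNotNull())
       .groupBy("customer_id")
       .agg(F.sum("amount").alias("total_amount"),
            F.count("*").alias("txn_count"))
)

# Load: write the curated table for downstream analytics.
clean.write.mode("overwrite").parquet("/mnt/curated/customer_totals/")
```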

Built and automated a data engineering ETL pipeline over Snowflake DB using Apache Spark, integrated data from disparate sources with Python APIs, consolidated them in a data mart (star schema), and orchestrated the entire pipeline using

Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingest data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and process the data in Azure Databricks.

Predicted the likelihood of customer churn based on customer attributes. The models deployed in the production environment helped detect churn in advance and helped sales/marketing teams plan retention strategies ahead of time.

Design and implement streaming solutions using Kafka or Azure Stream Analytics.
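
One way to cover the Kafka side of this is Spark Structured Streaming (which the posting also calls out). In the sketch below the broker address, topic, windowing, and sink are placeholders, and the spark-sql-kafka connector package is assumed to be on the cluster.

```python
# Minimal Structured Streaming + Kafka sketch (assumption: broker, topic,
# and checkpoint/sink locations are placeholders).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
         .option("subscribe", "events")                      # placeholder topic
         .load()
         .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
)

# Simple windowed count as the transformation stage.
counts = (
    events.withWatermark("timestamp", "10 minutes")
          .groupBy(F.window("timestamp", "5 minutes"))
          .count()
)

query = (
    counts.writeStream.outputMode("update")
          .format("console")                      # swap for a Delta/Parquet sink in practice
          .option("checkpointLocation", "/tmp/checkpoints/kafka-demo")
          .start()
)
# query.awaitTermination()  # uncomment to block the driver
```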

Experience managing Azure Data Lakes (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure Services.

Design and develop ETL integration patterns using Python on Spark, and create PySpark DataFrames to bring data from DB2 into Azure ADF.

Process and load marketing data from a Google Pub/Sub topic to BigQuery using Cloud Dataflow with Python.
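
A minimal Apache Beam pipeline of the kind Cloud Dataflow runs might look like the following; the project, topic, table, schema, and JSON message format are assumptions made for illustration.

```python
# Minimal Beam (Cloud Dataflow) Pub/Sub-to-BigQuery sketch (assumption:
# project, topic, table, and schema are placeholders).
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # add --runner=DataflowRunner etc. to run on GCP

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/marketing")
        | "ParseJSON" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteBQ" >> beam.io.WriteToBigQuery(
            "my-project:marketing.events",                       # placeholder table
            schema="user_id:STRING,event:STRING,ts:TIMESTAMP",   # placeholder schema
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```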

Developed new transformation pipelines using Google Cloud Composer, Python, and BigQuery.

Used Apache Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib libraries.

Data integration ingests, transforms, and integrates structured data and delivers it to a scalable data warehouse platform, using traditional ETL (Extract, Transform, Load) tools and methodologies to collect data from various sources into a single data warehouse.

Implemented statistical modeling with the XGBoost machine learning package in Python to determine the predicted probabilities of each model.
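
A small illustration of getting predicted probabilities out of XGBoost; the synthetic dataset and hyperparameters below are placeholders rather than the actual setup.

```python
# Minimal XGBoost predicted-probability sketch (assumption: synthetic data
# stands in for the real features).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2_000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                      eval_metric="logloss")
model.fit(X_train, y_train)

# Predicted probability of the positive class for each row.
proba = model.predict_proba(X_test)[:, 1]
print(proba[:5])
```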

Developed a customer churn model using Logistic Regression and Random Forest ensemble methods.
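
A minimal scikit-learn sketch of the two model families named above, trained on synthetic data standing in for real churn features.

```python
# Minimal churn-model sketch (assumption: make_classification replaces the
# real churn dataset, and the class imbalance is illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=25, weights=[0.8, 0.2],
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

for name, model in [("logistic_regression", LogisticRegression(max_iter=1000)),
                    ("random_forest", RandomForestClassifier(n_estimators=300, random_state=1))]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```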

Responsible for monitoring and troubleshooting the Spark Databricks cluster.

Collaborate with Data Engineers and Software Developers to develop experiments and deploy solutions to production.

Wrote production-level machine learning classification models and ensemble classification models from scratch using Python and PySpark to predict binary values for certain attributes within a given time frame.
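
For the PySpark side, a compact pyspark.ml sketch of a binary classifier is shown below; the toy rows and column names are invented for illustration only.

```python
# Minimal pyspark.ml binary classification sketch (assumption: the columns
# and toy data are illustrative, not the posting's actual schema).
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("binary-clf-demo").getOrCreate()

df = spark.createDataFrame(
    [(34.0, 2.0, 0), (120.0, 9.0, 1), (15.0, 1.0, 0), (98.0, 7.0, 1)],
    ["monthly_spend", "support_calls", "label"],
)

assembler = VectorAssembler(inputCols=["monthly_spend", "support_calls"],
                            outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction", "probability").show()
```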

Used the Cloud SDK in GCP to configure services such as Dataproc, Storage, and BigQuery.

Working with ORC, Avro, JSON, and Parquet file formats, creating external tables and querying on top of these files using BigQuery.
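
As one illustration, the google-cloud-bigquery client can define an external table over Parquet files in GCS and query them in place; the project, dataset, bucket, and table names below are placeholders.

```python
# Minimal BigQuery external-table sketch (assumption: project, dataset, and
# GCS URIs are placeholders; requires the google-cloud-bigquery client).
from google.cloud import bigquery

client = bigquery.Client(project="my-project")           # placeholder project

external_config = bigquery.ExternalConfig("PARQUET")     # also ORC, AVRO, NEWLINE_DELIMITED_JSON
external_config.source_uris = ["gs://my-bucket/events/*.parquet"]

table = bigquery.Table("my-project.analytics.events_ext")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

# Query the files in place through the external table.
rows = client.query("SELECT COUNT(*) AS n FROM `my-project.analytics.events_ext`").result()
print(list(rows))
```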

Worked with customer churn models, including Random Forest regression and Lasso regression, along with pre-processing of the data.

Worked on data that was a combination of unstructured and structured data from multiple sources, and automated the cleaning using Python scripts.

Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python scikit-learn.
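
A short sketch of tree-based feature selection with scikit-learn's SelectFromModel; the synthetic data and the "median" threshold are assumptions, not the actual fraud features or tuning.

```python
# Minimal feature-selection sketch with gradient boosting (assumption:
# make_classification stands in for the real fraud dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=3_000, n_features=40, n_informative=8,
                           random_state=7)

# Rank features by importance with a boosted ensemble, keep only the strong ones.
selector = SelectFromModel(GradientBoostingClassifier(random_state=7),
                           threshold="median")
X_reduced = selector.fit_transform(X, y)

# A downstream model (here a random forest) is then trained on the reduced set.
clf = RandomForestClassifier(n_estimators=300, random_state=7).fit(X_reduced, y)
print(f"kept {X_reduced.shape[1]} of {X.shape[1]} features")
```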

Write research reports describing the experiments conducted, results, and findings, and make strategic recommendations to technology, product, and senior management.

Worked closely with regulatory delivery leads to ensure robustness in prop trading control frameworks using Hadoop, Python Jupyter Notebooks, Hive, and NoSQL.

Performed all necessary day-to-day Git support for different projects; responsible for the design and maintenance of the Git repositories and the access control strategies.

Developed the features, scenarios, and step definitions for BDD (Behavior Driven Development) and TDD (Test Driven Development) using Cucumber and Ruby.

Creating pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, and in the reverse direction.

Knowledge of U-SQL.

Developed PySpark scripts using Python on Azure HDInsight for data aggregation and validation, verified their performance against MR jobs, and extracted data from the web server output files to load into HDFS.
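
A rough PySpark sketch of that kind of log aggregation job; the storage paths, log-format regexes, and HDFS target are placeholders rather than details from the posting.

```python
# Minimal PySpark log-aggregation sketch (assumption: the source path, log
# format, and HDFS target are placeholders).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("log-aggregation-demo").getOrCreate()

# Extract: read raw web server output files (one request per line).
logs = spark.read.text("wasbs://logs@storageacct.blob.core.windows.net/web/*.log")

# Transform: pull out the URL and status code with simple regexes, then aggregate.
parsed = logs.select(
    F.regexp_extract("value", r'"\w+ (\S+) HTTP', 1).alias("url"),
    F.regexp_extract("value", r'" (\d{3}) ', 1).alias("status"),
)
agg = parsed.groupBy("url", "status").count()

# Load: land the validated aggregate in HDFS for downstream jobs.
agg.write.mode("overwrite").parquet("hdfs:///data/curated/web_request_counts")
```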

Built a new CI pipeline with testing and deployment automation using Docker, Swarm, Jenkins, and Puppet. Utilized continuous integration and automated deployments with Jenkins and Docker.

Knowledge of numerical optimization, anomaly detection and estimation, A/B testing, statistics, and Maple. Experience with big data analysis using big-data-related techniques, i.e., Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib and Scala, NumPy, SciPy, Pandas, and scikit-learn.
