Job Opening :: DataHub Developer in Austin, TX | Full Time at Austin, Texas, USA
Email: sachin.chauhan@tanishasystems.com
From: Sachin Kumar Chauhan, Tanisha Systems

Reply to: sachin.chauhan@tanishasystems.com

Hello,

Please let me know your interest in and availability for the position below, and send me your updated resume.

Position: DataHub Developer

Location: Austin, TX (hybrid)

Mode: Full-Time

Job Description: DataHub Developer with Committer Experience

Position Overview

We are looking for an experienced DataHub Developer with Committer Experience to join our team and contribute to the design, development, and optimization of enterprise metadata management and data lineage solutions. The ideal candidate will have strong expertise in data cataloging, data lineage, data governance, and hands-on experience with DataHub, Spark-based frameworks, and machine learning for anomaly detection. This role demands a mix of open-source contribution, technical problem-solving, and metadata management expertise.

Key Responsibilities

DataHub Development and Integration

Lead projects involving metadata cataloging using the DataHub open-source framework.

Design and develop custom APIs to integrate ETL pipelines and enable real-time metadata ingestion.

Ingest metadata from multiple systems, including data lakes and upstream and downstream systems, to provide a holistic metadata ecosystem.

Customize and extend DataHub to enrich impact analysis by identifying pipelines reading/writing to data assets.

Data Lineage and Governance Implementation

Provide end-to-end data lineage solutions for PII identification, governance, and compliance reporting.

Develop and implement processes to enhance impact analysis and ensure seamless data governance practices.

Spark-Based Framework Development

Design, develop, and maintain custom Spark-based frameworks that use config-as-code mechanisms to facilitate data enrichment and transfer.

Improve the performance and scalability of Spark applications to ensure seamless data processing.

Provide recommendations and guidance on the design and development of ETL pipelines using Spark.

Machine Learning Integration for Anomaly Detection

Collaborate with ML engineers to create features from profiled batch data.

Develop and integrate machine learning models for anomaly detection in data patterns.

AWS Cost Optimization and Platform Efficiency

Lead AWS cost optimization initiatives to enhance platform-wide efficiency.

Support Spark version upgrades and ensure the platform's scalability and performance.

Community Engagement and Contributions

Act as a committer to the DataHub open-source community by contributing new features, fixing issues, and enhancing documentation.

Participate in open-source discussions, propose architectural improvements, and represent the organization in community events.

Required Qualifications

Experience:

5+ years in metadata management, data lineage, or data governance roles.

Proven track record as a committer or active contributor to the DataHub open-source project.

Technical Skills:

Proficiency in Java, Python, and REST API development.

Strong experience with Apache Spark for ETL pipeline design and custom framework development.

Expertise in metadata ingestion from systems like data lakes, databases, and ETL tools.

Hands-on experience with AWS services and cost optimization strategies.

Familiarity with machine learning techniques for anomaly detection.

Other Skills:

Strong analytical and problem-solving skills.

Excellent communication and collaboration abilities.

Preferred Qualifications

Knowledge of data governance regulations like GDPR, CCPA, or HIPAA.

Experience with infrastructure-as-code tools such as Terraform or Helm.

Familiarity with other metadata management tools like Amundsen, Collibra, or Alation.

Understanding of version control, CI/CD pipelines, and open-source development practices.

Thanks & Warm Regards,

Sachin Kumar Chauhan

Sr. Technical Recruiter

Tanisha Systems Inc.

99 Wood Ave South, Suite # 308, Iselin, NJ 08830

Email: sachin.chauhan@tanishasystems.com

Phone: 732-365-2553 Ext. 782

05:59 AM 25-Jan-25

