Work location: New York, NY (US: 10001) / Remote
JOB DESCRIPTION
"""What you will be doing:
• Build and maintain platform-independent business functions in Python using the PySpark library.
• Integrate business-logic solutions with the PySpark framework running on Azure Databricks, GCP Dataproc, and on-premises Spark environments.
• Integrate business logic with upstream and downstream systems and applications, including RDBMSs, file systems, Hive, Delta Lake, Azure Data Lake, Azure Event Grid, Azure Functions, Azure Event Hubs, etc.
• Collaborate with the PySpark framework's backbone developers so that business-logic Python programs are plug-and-play with the framework (illustrated in the sketch after this list).
• Build and maintain the DevOps process that integrates the CI/CD pipeline through Jenkins, GitLab, Nexus Repository, and Checkmarx, ensuring development and deployment quality control with automated testing and security governance.
• Publish data processing and modeling results/datasets to an RDBMS on Azure SQL, and generate microservices through RESTful Azure Functions APIs.
• Manage and maintain Databricks job execution through the Azure job scheduler and/or other orchestration tools.
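
For illustration only, a plug-and-play business-logic module in such a framework might resemble the minimal PySpark sketch below. The run(spark, config) entry point, table names, paths, and config keys are hypothetical assumptions, not the actual framework interface.

# Minimal sketch of a plug-and-play business-logic module. The
# run(spark, config) entry point, paths, and config keys are hypothetical;
# a real framework would supply its own conventions and credentials.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

def transform(orders: DataFrame) -> DataFrame:
    """Example business logic: daily completed-order totals per customer."""
    return (
        orders
        .where(F.col("status") == "COMPLETED")
        .groupBy("customer_id", F.to_date("order_ts").alias("order_date"))
        .agg(F.sum("amount").alias("daily_total"))
    )

def run(spark: SparkSession, config: dict) -> None:
    # Read upstream data from a Delta Lake table (path is an assumption).
    orders = spark.read.format("delta").load(config["input_path"])
    result = transform(orders)
    # Publish downstream to Azure SQL over standard SQL Server JDBC.
    (result.write
        .format("jdbc")
        .option("url", config["jdbc_url"])  # e.g. jdbc:sqlserver://<host>...
        .option("dbtable", "dbo.daily_order_totals")
        .option("user", config["sql_user"])
        .option("password", config["sql_password"])
        .mode("overwrite")
        .save())

Keeping transform() free of I/O makes the business logic unit-testable and portable across Databricks, Dataproc, and on-prem Spark, since only run() touches environment-specific endpoints.
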
Here's what you will need to know/have:
• A minimum of 2 years of professional, hands-on data management/operations and/or analytics experience
• A minimum of 3 years of professional, hands-on experience building enterprise-level, full-stack applications, including data preparation, processing, publication, and consumption via UI and microservices
• A minimum of 2 years of experience building enterprise-level solutions in the Azure cloud with Azure Functions, Azure Databricks, Azure Event Hubs, Azure Event Grid, Azure Data Lake, and Azure Data Factory (a minimal Azure Functions sketch follows this list)
• A minimum of 1 year of hands-on coding experience in Python and/or PySpark in an Azure Databricks environment
• A minimum of 1 year of experience with Snowflake
• Demonstrated experience interacting with and influencing decision-making by non-analytical business audiences
• Excellent problem-solving skills, along with experience constructing automated data management solutions, including building applications/processes to address business problems
• Proficiency in accessing, manipulating, and retrieving data from large databases with SQL on RDBMSs such as Teradata, Oracle, and SQL Server
• Experience with data access, manipulation, and statistical analysis using Python and/or PySpark
• Experience using Terraform and/or similar infrastructure-as-code tooling to provision environments on Azure
• Experience in DevOps CI/CD with GitLab, Jenkins, Nexus Repository, Checkmarx, and/or other quality control or security governance tools
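
As a companion to the PySpark sketch above, here is a minimal, hypothetical example of publishing such a dataset as a microservice with the Azure Functions Python v2 programming model; the route, table name, and SQL_CONN_STR environment variable are assumptions for illustration only.

# Minimal sketch of a RESTful Azure Functions (Python v2 model) endpoint
# that serves the table published by the Spark job above. The route,
# table name, and SQL_CONN_STR environment variable are hypothetical.
import json
import os

import azure.functions as func
import pyodbc

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.route(route="daily-totals", methods=["GET"])
def daily_totals(req: func.HttpRequest) -> func.HttpResponse:
    customer_id = req.params.get("customer_id")
    if not customer_id:
        return func.HttpResponse("customer_id is required", status_code=400)
    # Parameterized query keeps the endpoint safe from SQL injection.
    with pyodbc.connect(os.environ["SQL_CONN_STR"]) as conn:
        rows = conn.execute(
            "SELECT order_date, daily_total FROM dbo.daily_order_totals "
            "WHERE customer_id = ?",
            customer_id,
        ).fetchall()
    payload = [
        {"order_date": str(r.order_date), "daily_total": float(r.daily_total)}
        for r in rows
    ]
    return func.HttpResponse(json.dumps(payload), mimetype="application/json")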