Big Data Architect
Ref No.: 19-09660
Location: Columbus, Ohio
11-15 years of total IT experience, including 4+ years of Big Data experience (Hadoop, Spark (Java, Scala, or Python), HBase, Hive, Impala, Kafka, etc.)
Hands-on experience designing and programming with Big Data tools and technologies is mandatory
Experience with the Hortonworks distribution is mandatory
Must have hands-on experience with PySpark, Kafka, and Spark Streaming for ETL on a Big Data lake
Must have data architecture and data modeling skills, including use of Erwin as a data modeling tool
Strong hands-on UNIX shell scripting / Python scripting experience
Knowledge of developer productivity tools and other productivity management tools is preferred
Experience in Agile methodology is a must
Knowledge of standard methodologies, concepts, best practices, and procedures within Big Data environment
Bachelor's degree in Engineering, Computer Science, or Information Technology; Master's degree in Finance, Computer Science, or Information Technology a plus
Exposure to infrastructure-as-a-service (IaaS) providers such as Google Compute Engine, Microsoft Azure, or Amazon Web Services (AWS) is a plus
Self-starter, able to implement solutions independently
Good problem-solving and communication skills
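The PySpark/Kafka/Spark Streaming ETL work above typically centers on per-record transformation logic. A minimal sketch, assuming a hypothetical JSON event schema (the field names `id`, `ts`, and `amount` are illustrative, not from this posting), written as a plain Python function so it can be applied inside a Spark job and unit tested in isolation:

```python
import json
from datetime import datetime, timezone
from typing import Optional

def transform_event(raw: str) -> Optional[dict]:
    """Parse one raw Kafka message (JSON) and normalize it for the lake.

    Hypothetical schema: {"id": ..., "ts": epoch_seconds, "amount": ...}.
    Returns None for malformed records so the caller can route them to a
    dead-letter path instead of failing the whole streaming job.
    """
    try:
        event = json.loads(raw)
        return {
            "id": str(event["id"]),
            # Normalize the epoch timestamp to an ISO-8601 UTC string.
            "event_time": datetime.fromtimestamp(
                int(event["ts"]), tz=timezone.utc
            ).isoformat(),
            # Coerce to float and round to cents; default to 0.0 if absent.
            "amount": round(float(event.get("amount", 0.0)), 2),
        }
    except (ValueError, KeyError, TypeError):
        return None
```

In a Spark Streaming job this kind of function would be applied per message (e.g. mapped over the stream, then filtered for `None`); keeping it a pure function is what makes the unit testing required above straightforward.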
Job Description
Strong Big Data Architect with hands-on data modeling in the Big Data lake space (schema-based and schema-less data model designs and implementations)
Develop data pipelines using Big Data technologies that deliver value to the customer; understand customer use cases and workflows and translate them into engineering deliverables
Actively participate in scrum calls, story pointing, and estimation, and own the development work
Analyze user stories, understand the requirements, and develop code per the design
Develop test cases, perform unit testing and integration testing
Support QA Testing, UAT and production deployment
Develop batch and real-time data load jobs from a broad variety of data sources into Hadoop
Design ETL jobs that read data from Hadoop and pass it to a variety of consumers / downstream applications
Perform analysis of vast data stores to uncover insights
Analyze long-running queries and jobs and tune their performance using query optimization techniques and Spark code optimization
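The batch data-load responsibility above usually means grouping source records into fixed-size batches so each batch becomes one bulk write (one HDFS file, one HBase bulk put) rather than one write per record. A minimal pure-Python sketch; the batch size and record shape are illustrative assumptions:

```python
from itertools import islice
from typing import Iterable, Iterator, List

def batched(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive batches of at most `size` records from any
    record iterator, so the sink sees a few large writes instead of
    many tiny ones (small files are a classic Hadoop anti-pattern)."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Each yielded batch would then be handed to the sink writer; because the function consumes a generic iterator, it works equally well for a bounded batch extract or a buffered real-time feed.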