Site Reliability Engineer
Responsibilities:
· Conceptualize and implement Site Reliability Engineering frameworks and components that improve predictive monitoring and drive the SRE team's journey toward a "Robotics First" approach.
· Research the latest technologies and concepts, conceptualize solutions, and develop proofs of concept that improve the resiliency and performance of the production infrastructure. Design and implement innovative solutions and frameworks that improve software engineering velocity, infrastructure resiliency and security, and data availability.
· Develop common framework components to be leveraged by enterprise applications, and define standards for configuration, monitoring, reliability, and performance engineering.
· Work with the operations team to resolve major incidents.
· Continuously improve automated remediation tasks to ensure the highest …
Qualifications:
· A BS degree in Computer Science, Computer Engineering, or another technical discipline, or equivalent work experience.
· 8+ years of hands-on technical experience in systems analysis, including design methodology, production support and engineering, and enterprise-level technologies (including, but not limited to, OpenShift, WebSphere administration, and JEE: JSP, Servlets, XML, Java), as well as internet-related technologies used to deliver complex Internet-facing solutions.
· Broad technical exposure, with preference for the following skills: cloud infrastructure, VMs, load balancing, containers, Kubernetes, JVMs, web servers, application debugging, queuing technologies, caching technologies, databases, routing and switching, etc.
· Experience working with relational and NoSQL databases such as DB2, Oracle, Cassandra, and Redis.
· Strong knowledge of Linux internals and experience managing Linux systems in high-traffic environments.
· Fluent in programming languages, particularly Python.
· Strong interpersonal communication skills and the ability to work well in a diverse, team-focused environment.
· Experience with Splunk (experience with ELK is a plus).
· Familiarity with financial services and authorization systems.
· Understanding of Agile practices in operations teams.