Sr. Site Reliability Engineer / Santa Clara, CA / 5+ Month Contract
Ref No.: 18-01104
Location: Santa Clara, California
Position Type: Contract
Duration: 5+ months
 
Work authorization: US citizen, Green Card, GC EAD, or TN visa
No OPT or H-1B
 
Responsibilities
  • Design, build, and maintain infrastructure in AWS to enable reliable, rapid deployment of microservices with effective monitoring and resilient operations
  • Set up critical infrastructure and develop tools and frameworks to automate operational tasks and the deployment of machines, services, and applications
  • Work closely with engineering teams to ensure microservices are designed for scale, operability, and performance
  • Create meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively
  • Define Service Level Objectives (SLOs) for products to continuously measure their reliability in production
  • Maximize service uptime and availability, ensuring functional and performance SLAs are met
  • Develop custom code or scripts to automate infrastructure and monitoring services
  • Work cross-functionally with engineering teams: contribute to architecture diagrams and other documentation for security reviews
  • Initiate and lead scripting and automation to streamline system updates and upgrades
Qualifications
  • Deep understanding of at least one modern programming language: Java, C, C++, Python, or C#
  • Fluency in Linux, AWS services, and systems management tools (Ansible, Puppet, Chef, etc.)
  • Fundamental understanding of distributed systems, including the CAP theorem, microservices, and the Twelve-Factor App
Skills and Experience
  • Expertise in configuration management with a framework such as Ansible, Chef, or Puppet
  • 5+ years of experience in site reliability or infrastructure engineering for a commercial SaaS solution
  • 5+ years of expertise in AWS cloud infrastructure and its related services
  • Strong troubleshooting skills across different levels of the stack
  • Deep experience in monitoring distributed application architectures
  • Experience monitoring cloud services with Datadog
  • Strong experience with Linux and MySQL
  • Proficiency with a programming language such as Python, Ruby, or Java, plus shell scripting, to automate tasks
  • Experience in CI/CD automation and GitHub
  • Experience writing custom code or scripts for 'destructive testing' to ensure adequate resiliency in production
  • Excellent problem-solving, critical-thinking, and teamwork skills
  • Excellent written and verbal communication; able to collaborate and rally support
  • BS or MS in Computer Science, related field, or equivalent professional experience