Site Reliability Engineer (SRE)
Ref No.: 26-00412
Location: Plano, Texas (onsite, 5 days per week)
Position Type: Contract, long-term project
Job Summary:
  • We are seeking a highly skilled Site Reliability Engineer (SRE) to join our Commercial & Investment Banking technology team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of applications and infrastructure. You will work with modern cloud technologies, implement automation, and proactively improve system health while collaborating across engineering and business teams.
Key Responsibilities:
  • Design, build, and maintain scalable, reliable, and high-performance systems
  • Collaborate with development teams to implement CI/CD pipelines and deployment strategies
  • Develop and manage infrastructure using Infrastructure as Code (IaC) practices
  • Monitor system performance and availability using observability tools
  • Implement and maintain SLOs/SLAs and proactively resolve potential issues
  • Troubleshoot complex system and network issues across distributed environments
  • Drive adoption of SRE best practices including automation, reliability, and performance optimization
  • Partner with stakeholders and technical teams to solve business-critical problems
Required Skills & Qualifications:
  • Bachelor's degree in Computer Science or related field (or equivalent experience)
  • 3+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering
  • Strong knowledge of system reliability, scalability, performance, and security principles
  • Proficiency in at least one programming language (Python, Java, or similar)
  • Experience with CI/CD tools such as Jenkins or GitLab, and with IaC tools such as Terraform
  • Hands-on experience with containerization and orchestration tools (Docker, Kubernetes, ECS)
  • Strong understanding of observability tools (Grafana, Prometheus, Datadog, Dynatrace, Splunk)
  • Experience with cloud platforms and distributed systems
  • Solid understanding of networking concepts and troubleshooting
Preferred Qualifications:
  • Experience implementing SLO/SLA frameworks for critical systems
  • Knowledge of chaos engineering tools (e.g., Gremlin, Chaos Monkey)
  • Familiarity with infrastructure components (load balancers, routers, storage systems)
  • Experience with tools such as Jira, Confluence, ServiceNow, and Netcool
  • Strong problem-solving skills and the ability to work in a fast-paced environment
  • Experience with log analysis and monitoring tools
Key Competencies:
  • Strong analytical and troubleshooting skills
  • Excellent collaboration and communication abilities
  • Proactive mindset with a focus on automation and continuous improvement
  • Ability to work effectively with cross-functional teams