Site Reliability Engineer (SRE)
| Ref No.: | 25-01552 |
| Location: | Tampa, Florida |
| Position Type: | Contract |
Role: Site Reliability Engineer (SRE)
Location: Tampa, FL (Onsite)
Duration: Contract
Job Description:
We are seeking a highly skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of mission-critical systems. The ideal candidate has strong experience in cloud platforms, container orchestration, automation, monitoring, and performance testing. This role involves building resilient systems, optimizing infrastructure, and collaborating with development teams to improve operational excellence across the organization.
Responsibilities
- Deploy, manage, and monitor applications on Google Kubernetes Platform (GKP), Docker, and Kubernetes.
- Optimize containerized workloads in private or hybrid cloud environments.
Database Reliability & Management
- Work with CockroachDB, Oracle, and SQL databases to ensure high availability and performance.
- Conduct reliability, failover, and stress testing for database systems.
Infrastructure as Code
- Automate infrastructure provisioning using Terraform, Helm, and Ansible.
- Implement consistent and repeatable environment deployments.
Monitoring & Observability
- Configure and manage monitoring and logging solutions including Prometheus, Grafana, ELK Stack, Splunk, and Dynatrace.
- Set up alerting, dashboards, and health checks to ensure system uptime.
Performance & Load Testing
- Use JMeter, Gatling, and Locust to run load, stress, and scalability tests.
- Benchmark system performance and identify optimization opportunities.
Reliability Engineering & Chaos Testing
- Perform chaos engineering using Gremlin or Chaos Mesh to validate system resilience.
- Implement disaster recovery testing and fault injection strategies.
CI/CD Pipeline Management
- Build and maintain automated pipelines using Jenkins, GitLab CI, and Azure DevOps.
- Ensure reliable deployment workflows across environments.
Scripting & Automation
- Develop automation scripts using Python, Bash, and Go.
- Automate routine operational tasks, monitoring, and test scenarios.
Security & Vulnerability Testing
- Conduct security testing using OWASP ZAP, Burp Suite, and Trivy.
- Validate container, application, and infrastructure-level security.
Service Mesh & Networking
- Work with Istio and Envoy to manage microservices traffic, security, and observability.
- Implement network policies, service routing, and distributed tracing.
Test Reporting & Analytics
- Use Allure and ReportPortal to generate, visualize, and analyze test results.
Version Control
- Manage source code, CI/CD pipelines, and configuration files using GitHub.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Strong understanding of Linux systems, distributed systems, and cloud-native architectures.
- Excellent troubleshooting, problem-solving, and communication skills.