The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of applications and infrastructure. This role combines software engineering and systems engineering principles to design automated solutions, monitor system health, and improve operational efficiency.
Key Responsibilities
Reliability & Performance
Maintain and improve system availability, latency, and scalability.
Implement SLA, SLO, and SLI metrics and monitor system health.
Troubleshoot incidents and perform root cause analysis.
Automation & DevOps
Build and maintain CI/CD pipelines, automated monitoring, and alerting systems.
Automate operational tasks to reduce toil and improve reliability.
Collaborate with DevOps and development teams to deploy and maintain services.
Infrastructure & Cloud
Design and operate cloud-native platforms (AWS, Azure, GCP) and on-prem infrastructure.
Implement infrastructure as code (IaC) using Terraform, CloudFormation, or similar tools.
Optimize system architecture for performance, cost, and scalability.
Monitoring & Incident Management
Implement monitoring, logging, and alerting solutions for production systems.
Respond to production incidents, outages, and alerts with urgency.
Conduct post-incident reviews and implement preventive measures.
Security & Compliance
Embed security best practices into deployment and operations processes.
Ensure systems comply with regulatory and internal standards.
Technical Skills (Mandatory)
Programming / Scripting: Python, Go, Bash, or Java
Infrastructure & Cloud: AWS, Azure, GCP
CI/CD & Automation: Jenkins, GitLab CI/CD, GitHub Actions
Containers & Orchestration: Docker, Kubernetes
Monitoring & Logging: Prometheus, Grafana, ELK Stack, Datadog
Version Control & Collaboration: Git, GitOps principles
Nice to Have
Experience with microservices, event-driven architecture, and serverless platforms
Knowledge of SRE principles, SLIs, SLOs, and error budgets
Exposure to financial services / banking / fintech production environments
Experience with performance tuning, chaos engineering, and capacity planning
Soft Skills
Strong problem-solving and analytical mindset
Effective communication with technical and non-technical stakeholders
Ability to work under pressure and manage incidents
Ownership and proactive improvement mindset
Job Type: Contract
Contract length: 12 months
Pay: RM4,243.90 - RM17,741.09 per month
Benefits:
Health insurance
Maternity leave
Opportunities for promotion
Professional development
Work Location: In person