Incident Manager

Bangi, M10, MY, Malaysia

Job Description

Requirements

  • Bachelor's degree in Information Technology, Computer Science, Engineering, Business Administration, or a related discipline.
  • 3 to 5 years of experience in IT Service Operations, Technical Support, or Enterprise Incident Management, including hands-on exposure to critical incident handling.
  • Strong working knowledge of ITIL v4 practices, especially Incident Management, Problem Management, and Change Control processes.
  • Solid organizational and coordination skills, with the ability to manage multiple teams and activities simultaneously during high-pressure incidents.
  • Demonstrated experience engaging third-party Managed Service Providers (Service Desk), Service Delivery Engineers, Technical Operations (TechOps), Development Operations (DevOps), and Engineering teams in structured technical investigation workflows.
  • Excellent communication skills to deliver accurate, timely updates to internal leadership, external clients, and stakeholders during live incidents.
  • Ability to analyze logs, technical evidence, and cross-tier investigative outputs to facilitate informed decision-making.
  • Proven capability to lead root cause analysis, prepare structured reports, and drive corrective and preventive actions to closure.
  • Skilled in managing documentation, ensuring process compliance, and maintaining audit-ready incident records.
  • Ability to work independently, maintain composure under pressure, and demonstrate initiative in rapidly evolving operational conditions.

Responsibilities



  • Lead and govern the entire lifecycle of Severity 1 and Severity 2 incidents, from activation to recovery.
  • Validate incident severity and ensure proper classification, de-escalation, or progression based on service impact.
  • Initiate and coordinate Incident Response activities, ensuring accurate ticket information, correct severity tagging, and alignment with the P052 workflow.
  • Mobilize relevant technical responders and ensure timely engagement throughout remote checks, on-premises checks, cloud checks, system-level troubleshooting, or code-level analysis.
  • Coordinate stakeholder management, ensuring that communication flows consistently across internal teams, clients, and management as the incident develops.
  • Evaluate workaround feasibility, confirm whether additional information is needed, and guide responders through the sequence of troubleshooting actions until recovery is achieved.
  • Ensure all investigative activities, insights, and decisions are documented within the ticketing system in alignment with operational governance requirements.
  • Track SLA performance throughout the incident lifecycle, identify risk of breach, and initiate escalation or remediation actions to protect contractual obligations.
  • Confirm recovery, validate service stability, and authorize closure of the incident after verifying that all required checks are completed.
  • Trigger the formal transition into the Post-Incident Review (PIR) phase once the incident is closed.
  • Prepare all necessary information for the PIR, including timelines, technical findings, escalation notes, system behaviour evidence, and resolution steps.
  • Lead the PIR session and ensure participation from all stakeholders involved (Problem Manager, Service Delivery Lead, Technical Responders, and Knowledge teams).
  • Support the Problem Manager in validating and finalizing the Root Cause Analysis (RCA), ensuring accuracy and completeness of technical details shared by L2, L3, and engineering teams.
  • Collaborate with the Problem Manager and Service Delivery Lead to define corrective and preventive actions (CPA), ensuring operational alignment with current processes and technical capabilities.
  • Verify completion of all required corrective actions before PIR closure, following up with responsible teams to ensure full remediation and evidence submission.
  • Ensure that technical insights and validated solutions are fed into Knowledge Management for documentation updates, KB creation, or refinement of existing SOPs.
  • Provide clear guidance to the Knowledge Coordinator to ensure successful publication, distribution, and version control of updated knowledge articles and service catalog entries.
  • Maintain accurate PIR documentation, ensuring it is audit-ready, stored appropriately, and distributed to relevant leadership and governance bodies.
  • Champion process adherence for the High-Critical Incident Process and Post-Incident Review (PIR), ensuring technical teams follow the required sequence of checks and documentation activities.
  • Identify recurring patterns or systemic risks observed during incident handling and feed insights into Problem Management and Service Operations improvement initiatives.
  • Recommend improvements to escalation flows, communication templates, investigative steps, and knowledge articles based on PIR outcomes and operational experience.
  • Support refinement of SOPs, integration paths, system dependencies, and troubleshooting playbooks in collaboration with technical SMEs.
  • Maintain readiness to support after-hours escalations or urgent service disruptions to ensure operational continuity.
  • Take on ad-hoc assignments from the Head of Product Operations.

Job Type: Contract
Contract length: 12 months

Pay: RM5,000.00 - RM6,000.00 per month

Benefits:

  • Free parking
  • Maternity leave
  • Meal provided
  • Opportunities for promotion
  • Parental leave
  • Professional development

Ability to commute/relocate:

  • Bandar Baru Bangi (43650): Reliably commute or plan to relocate before starting work (Preferred)

Application Question(s):

  • How long is your notice period?
  • What is your expected salary?

Work Location: In person



Job Detail

  • Job Id
    JD1378055
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Contract
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Bangi, M10, MY, Malaysia
  • Education
    Not mentioned