Application Monitoring Subject Matter Expert

Puchong, Selangor - Petaling, Selangor, Malaysia

Job Description


MISSION / JOB PURPOSE

  • Responsible for monitoring, analyzing, troubleshooting and reporting on the operational performance of AXA's applications. This includes, but is not limited to, Infrastructure, Application, Network and Security.
  • Responsible for driving performance enhancements, and leading targeted process improvement initiatives.
  • Responsible for defining the metrics, data collection methods, and reporting mechanisms as well as implementation of an overall performance management strategy.
  • Ensures the effective capture of all logging and monitoring of all aspects of system and application behavior to facilitate fast detection and resolution of Application availability issues.
  • SME in troubleshooting all performance issues across the Enterprise. This role will work closely with IT, Application Development, Project Management and external vendors to ensure the consistent tracking and reporting of metrics and performance data across the Enterprise.
  • Supports cost transparency efforts and helps develop mature cost metrics and cost optimization.

KEY RESPONSIBILITIES

  • Define and maintain IT's performance monitoring and reporting strategy (processes, tools, & templates): develop enhanced reporting capabilities through standardization and automation.
  • Proactively analyze trends in performance across IT: collaborate with process owners and stakeholders to identify and implement process improvements to increase operational efficiency and Application availability
  • Analyze and recommend performance improvements for capacity, availability, performance, support and security.
  • Stays informed of production changes that could affect functionality and alerting.
  • Coordinates across teams, working closely with peers to ensure that the appropriate focus and sense of urgency are applied to all issues.
  • Troubleshoots using logs, alerts and external data sources to identify network, application, or security issues, correlating data to determine root cause.
  • Accurately troubleshoots, reproduces, and documents issues and other pertinent information in Incident or Problem tickets.
  • Handles incident queue and performs various tasks as assigned and determines business impact.
  • Handles ad hoc requests and takes on new procedures as required.
  • When working on projects, identifies and tracks project issues and dependencies, and ensures follow-through so that appropriate actions are taken to complete the project on time.
  • Recommend, implement and manage cloud Automation using native Cloud tools.

QUALIFICATIONS

  • A minimum of five years of experience related to Performance analysis and monitoring across multiple areas including Infrastructure, Application, Network and Security for medium to large scale companies.
  • Bachelor's degree in computer science or information systems or an equivalent combination of education, work experience and/or applicable certifications.
  • Expert knowledge of IT performance metrics. Experience with data management, report design, data visualization and presentation techniques
  • Hands-on experience using open source and commercial tools such as LoadRunner/Performance Center, JMeter, Gatling and Locust, and APM tools like Dynatrace, AppDynamics, New Relic, Splunk, etc.
  • Ability to troubleshoot Application performance and monitoring issues and provide detailed analysis.
  • Ability to provide documentation that other Performance Operations Engineers can use.
  • Provide runbooks for other departments to execute.
  • Recommend ideas to streamline operations, improve operations, and create processes to proactively determine potential issues.
  • Provide training and mentoring to other team members.
  • Ability to work independently.
  • Experience with one or more Cloud platforms (Microsoft Azure, Amazon Web Services (AWS), Google Cloud or IBM Cloud) as it relates to performance, monitoring and cost management.
  • Expert experience with Application and Network Performance Management Tools
  • Knowledge and understanding of microservices and web application Protocols
  • Thorough understanding of throughput, latency, memory and CPU utilization
  • Knowledge of CI/CD technologies such as Jenkins, Ansible and Docker containers
  • Excellent communication, collaboration, reporting, analytical and problem-solving skills
  • Design, implementation and integration experience with Azure Monitor, New Relic, Splunk and infrastructure monitoring tools like Nagios
  • Scripting expertise in one or more languages such as Python, PowerShell or Perl
  • Integration experience with third-party monitoring (log/event triggers), ticketing (event/workflow triggers) and orchestration/automation (event/workflow triggers) tools
  • Support solving complex performance issues, event correlation, resource optimization, tuning and/or triaging performance problems across on-premise and cloud environments
  • Collaborate and work with other senior staff to recommend and design systems architecture and topology from both general and specific perspectives.
  • Interact with IT Operation teams to communicate and understand the monitoring requirements and provide support on an on-call rotation model

AXA

Job Detail

  • Job Id
    JD1000382
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Puchong, Selangor - Petaling, Selangor, Malaysia
  • Education
    Not mentioned