Lead, Systems Reliability Engineering

Malaysia, Malaysia

Job Description


Role Responsibilities

We're looking for an SRE Engineer to work alongside our Conversational Engagement team, which offers a new-wave digital platform for client messaging. We're a small but growing squad, with new and exciting technologies to work on along with our technology partners. We work in project-based sprints in small, interdisciplinary teams. As an SRE Engineer you will work on and solve some of the many interesting challenges we are facing. You will be expected to help troubleshoot, optimise and improve our stack, whether that is our cloud infrastructure or our on-prem deployments. Understanding the full stack and being able to support and debug issues is a core requirement.

Our Ideal Candidate

  • At least 5 to 8 years of experience in production support and development environments
  • Good technical knowledge of and hands-on experience with Azure DevOps, Python, Kubernetes, Podman, any cloud-based deployment, Kibana, Logstash, etc.
  • Experience with MSSQL, DB2 and Oracle DB objects/design, such as stored procedures
  • DevOps and SRE engineering experience preferred
  • Strong debugging and analysis skills are a must
  • Understanding of basic networking concepts and protocols such as TCP/IP, subnets/gateways, HTTP, DNS and NTP
  • Mandarin/Chinese language skills are an added advantage, as the role works very closely with the Northern Asia and Far East Asia markets
  • Standby support is required, based on the support roster and on a rotation basis
  • Demonstrated ability to learn new things and take on new challenges

Role Specific Technical Competencies

  • Data Science
  • Azure DevOps
  • Chatbot Framework: conversational AI and machine learning concepts to build advanced bots
  • Good analytical and problem-solving skills
  • Able to interact and communicate well with users and business stakeholders
  • Working experience in Agile
  • Technical expertise in VM, Linux, Windows Server and Kubernetes-based deployments
  • Perform production support activities: root cause analysis and resolution within the specified SLA, system maintenance, capacity planning and data analysis
  • Experience with revision control systems such as GitHub and with CI/CD unit testing
  • Demonstrable understanding of UNIX and Windows operating systems
  • Excellent spoken and written communication skills
  • Good understanding of the product
  • Interest in independently learning new technologies
  • Good interpersonal skills and the ability to work across functions with other teams

RESPONSIBILITIES

  • Collaborate with software development groups to ensure operational needs are adequately considered and built into new software releases.
  • Define monitoring and alarming thresholds.
  • Set up clear and accurate SLOs/SLIs for efficient service monitoring.
  • Develop infrastructure (network, compute, storage) and application capacity models.
  • Drive toward automated deployments and modern approaches to configuration management.
  • Focus on application reliability. Ensure applications and infrastructure designs avoid the pitfalls of large-scale SaaS offerings (bottlenecks, single points of failure, etc.).
  • Full-service handling (analysis, debugging, response and resolution) of application-level issues using any application monitoring tool.
  • Focus on service availability.
  • Reduce MTTR by assisting the incident command, operations and application development teams to diagnose and resolve service outages (incidents with significant customer impact).
  • Perform independent functional and technical analysis.
  • Detect, analyse and resolve major or standard incidents.
  • Provide regular updates using the ticketing systems.
  • Communicate and work with all stakeholders.
  • Perform RCA and resolution within the specified SLA.
  • Design and provide service improvements.
  • Provide end-to-end support and implement resolutions for incident tickets.
  • Be ready to provide standby support for any critical/major incidents.
  • Support weekly system maintenance, capacity planning, data analysis, DR drills, and audit and compliance related matters.
  • Provide change implementation support. Coordinate and validate the implementation of changes.
  • DevOps and SRE practice preferred.
  • Azure ADO knowledge is an added advantage.
  • Mandarin language is an added advantage.

Strategy

  • An SRE engineer is responsible for the support and maintenance of the bank's technical stack, including cloud and application services. You will troubleshoot and debug customer issues and help monitor and optimise the solution.
  • As an SRE engineer, you are responsible for system reliability, working to increase productivity and reduce time to market by striving to reduce the technical debt of the services the SRE team supports.
  • Be responsible for the support, monitoring and optimisation of the infrastructure and application services, particularly with regard to identifying and fixing issues whilst scaling the platform to support customer growth.
  • Troubleshoot the system and solve problems across all platform and application domains to improve the customer experience.
  • Support the microservice-based application suite related to customer support and general technical issues.
  • Recommend process and technical improvements and any automation approach.
  • Refine real-time observability and metrics to improve visibility into the performance of the full-stack environment.
  • Participate in system testing to ensure the high quality of the company's services and products released into the production environment.

Business

  • Responsible for CPBB Digital Channel, Chat and Chatbot application support

Processes

  • Responsible for executing and supporting the technology and innovation processes, system deployment and stabilisation, system compliance and vendor SLA management

People & Talent

  • Lead through example and build the appropriate culture and values. Set the appropriate tone and expectations for the team and work in collaboration with other tech teams and with risk and control partners.
  • Ensure the provision of ongoing training and development of people, and ensure that holders of all critical functions are suitably skilled and qualified for their roles, with effective supervision in place to mitigate any risks.

Risk Management

  • Responsibilities relating to identifying, assessing, monitoring, controlling and mitigating risks to the Group, as well as an awareness and understanding of the main risks facing the Group and the role the individual plays in managing them.
Governance

  • Responsibilities relating to the planning, structure, frameworks (e.g. processes and policies) and overall technology standards

Regulatory & Business Conduct

  • Display exemplary conduct and live by the Group's Values and Code of Conduct.
  • Take personal responsibility for embedding the highest standards of ethics, including regulatory and business conduct, across Standard Chartered Bank. This includes understanding and ensuring compliance with, in letter and spirit, all applicable laws, regulations, guidelines and the Group Code of Conduct.
  • Support the Chat and Chatbot based applications for the Technology Support team to achieve the outcomes set out in the Bank's Conduct Principles.

Key stakeholders

  • Product Owner - Conversation Engagement Hive
  • Country Business users
  • Technology Partners - External Vendors
  • Technology Partners - Digital Channels Hive

Other Responsibilities

  • Embed Here for good and the Group's brand and values in Group / CPBB TTO / Technology Support team.
  • Perform other responsibilities assigned under Group, Country, Business or Functional policies and procedures.
  • Multiple functions (double hats).
  • Provide technical expertise on aspects of the organisation's IT infrastructure, software applications, architecture and hardware to internal customers, and advise them of appropriate actions to fulfil procedural and regulatory requirements or solve immediate problems.
  • Manage uptime of the application and integration of Conversational Banking Services (Chatbot/Alexa/WhatsApp) using AI-based integration with Finacle, Bank CRM, Credit Cards, Loans and similar applications within the Bank.

QUALIFICATIONS

TRAINING, LICENSES, MEMBERSHIPS AND CERTIFICATIONS

  • Bachelor's degree in IT/Computing/AI, Genesys Certification, Agile, ITIL, SCRUM, DevOps

About Standard Chartered

We're an international bank, nimble enough to act, big enough for impact. For more than 160 years, we've worked to make a positive difference for our clients, communities, and each other.
We question the status quo, love a challenge and enjoy finding new opportunities to grow and do better than before. If you're looking for a career with purpose and you want to work for a bank making a difference, we want to hear from you. You can count on us to celebrate your unique talents. And we can't wait to see the talents you can bring us.

Our purpose, to drive commerce and prosperity through our unique diversity, together with our brand promise, to be here for good, are achieved by how we each live our valued behaviours. When you work with us, you'll see how we value difference and advocate inclusion. Together we:

  • Do the right thing and are assertive, challenge one another, and live with integrity, while putting the client at the heart of what we do
  • Never settle, continuously striving to improve and innovate, keeping things simple and learning from doing well, and not so well
  • Be better together, we can be ourselves, be inclusive, see more good in others, and work collectively to build for the long term

In line with our Fair Pay Charter, we offer a competitive salary and benefits to support your mental, physical, financial and social wellbeing.

  • Core bank funding for retirement savings, medical and life insurance, with flexible and voluntary benefits available in some locations
  • Time-off including annual, parental/maternity (20 weeks), sabbatical (12 weeks maximum) and volunteering leave (3 days), along with minimum global standards for annual and public holiday, which is combined to 30 days minimum
  • Flexible working options based around home and office locations, with flexible working patterns
  • Proactive wellbeing support through Unmind, a market-leading digital wellbeing platform, development courses for resilience and other human skills, global Employee Assistance Programme, sick leave, mental health first-aiders and all sorts of self-help toolkits
  • A continuous learning culture to support your growth, with opportunities to reskill and upskill and access to physical, virtual and digital learning
  • Being part of an inclusive and values-driven organisation, one that embraces and celebrates our unique diversity across our teams, business functions and geographies, where everyone feels respected and can realise their full potential

Recruitment assessments - some of our roles use assessments to help us understand how suitable you are for the role you've applied to. If you are invited to take an assessment, this is great news. It means your application has progressed to an important stage of our recruitment process.



Job Detail

  • Job Id
    JD1015529
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Malaysia, Malaysia
  • Education
    Not mentioned