Story Behind the Need
• Business group: The GTS Technology Operations & Site Reliability Team is supporting a large initiative to streamline and automate the bank’s security protocols. This team will apply SRE practices to the client’s Kubernetes-based container platform (Athena).
• Need: The Site Reliability Engineer within the GTS Technology Operations & Site Reliability Engineering team will be responsible for implementing and operating secure solutions related to security, access governance, and data protection.
Candidate Value Proposition
• The successful candidate will have the opportunity to work on multiple initiatives on public cloud platforms, gaining exposure to multiple bank streams and utilizing new technology.
• The individual is comfortable working with business and technical staff, ensuring that systems are designed and maintained according to enterprise architectural standards. Collaborating with team members, they will utilize agile best practices and metrics to build high-quality technology solutions in line with the product’s vision.
• The main function of a systems engineer is to apply the principles of computer science and mathematical analysis to the design, development, testing, and evaluation of the software and systems that make computers work.
• Project: Hiring 4 Site Reliability Engineers (SREs) with a combination of pure software development and infrastructure skills (strong knowledge of networking and storage devices), plus strong knowledge of application service level agreements.
Typical expectations in the Role:
• Implement standards for Service Level Indicators, Service Level Objectives and Service Level Agreements.
• Provide management with reports, KPIs, and metrics to improve operations, fulfillment, and release services
• Create dashboards and client reports to provide customer status on a regular basis
• Review production problems to resolve difficult issues and identify systemic problems and their resolutions
• Provide leadership in production resiliency practices to platform and application operations teams
• Implement, through code, best practices and standardize patterns for resiliency processes to provide exceptional
• Design and perform application pre-release testing practices (non-functional characteristics)
• Advise on or implement code that improves application resiliency to operational issues
• Review and improve internal procedures, documentation, and data reporting to ensure prompt resolution of incidents and of provisioning, fulfillment, and operational service needs.
• Use automation to implement the improvements identified
Must-have skills:
• 5+ years’ software development with Java, Python, Shell, or PowerShell
• 3+ years’ exposure to networking protocols and storage devices
• Recent project exposure (minimum 1) with cloud deployments (GCP highly preferred)
• 3+ years’ experience with a Kubernetes-based container platform or similar
• 3+ years’ hands-on experience with CI/CD tools (Jenkins, Gradle/Maven, Bitbucket, Artifactory, SonarQube)
• 3+ years’ experience with source code versioning tools (Git).
• Excellent team player with experience working in an Agile environment.
• Excellent communication skills to work cross-functionally with business groups.