Lead Site Reliability Engineer with Java :: San Antonio, TX
Company: TOPSYSIT
Location: San Antonio
Posted on: April 11, 2025
Job Description:
Role: Lead Site Reliability Engineer with Java
Location: San Antonio, Texas
Project Tenure: 18 Months
Customer: Banking Client
Relevant Experience: 14+ Years
Job Description & Key Responsibilities:
As a Lead Site Reliability Engineer (SRE), you will leverage your extensive experience in SRE practices to maintain and enhance the reliability, performance, and scalability of mission-critical systems. You will play a crucial role in ensuring the continuous availability and optimal functioning of our services.
Key Responsibilities:
--- Senior-Level SRE Expertise: Apply your deep understanding of SRE principles to lead efforts in improving system reliability and operational efficiency.
--- Incident Management: Provide expert-level support during incidents, ensuring swift resolution with minimal service disruption. Lead post-incident reviews to drive continuous improvement.
--- Monitoring & Alerting: Design, implement, and optimize monitoring, alerting, and incident response processes. Ensure the effectiveness of these systems to proactively address potential issues.
--- Automation: Drive the automation of manual processes to enhance operational efficiency, reduce human error, and increase overall system resilience.
--- CI/CD Pipeline Management: Develop, maintain, and improve automated CI/CD pipelines using tools such as GitLab CI/CD and Jenkins, ensuring seamless and reliable deployment processes.
--- Cross-Functional Collaboration: Work closely with cross-functional teams to ensure the reliability, performance, and scalability of our infrastructure. Foster a culture of collaboration and knowledge sharing.
--- Support Across Time Zones: Provide support across all U.S. time zones, with the flexibility to work weekends, rotational shifts, and overtime as required to maintain service continuity.
Required Skills & Qualifications:
--- Java Programming: Advanced proficiency in Java, with a deep understanding of contemporary software development practices.
--- Kubernetes & Containerization: Extensive hands-on experience with Kubernetes, including containerization technologies like Docker and Kubernetes storage solutions such as Portworx.
--- Linux/Unix Systems: Strong command of Linux/Unix operating systems and shell scripting (Bash), with a focus on system reliability and automation.
--- Functional Programming: Proficiency in functional and logic programming languages such as Haskell, OCaml, and Prolog.
--- Scripting & Automation: Experience with Python or Go, particularly in the context of scripting and automation tasks.
--- Virtualization: In-depth knowledge of VMware and other virtualization platforms, with a focus on optimizing virtual environments for reliability and performance.
--- Streaming Technologies: Expertise with Kafka Stream Generator, ksqlDB, cluster federation, and Spark Streaming, including experience in managing and optimizing streaming data architectures.
--- Service Mesh & Networking: Familiarity with Istio and Anthos Service Mesh, with the ability to manage and optimize service meshes for complex environments.
--- Performance Monitoring & Debugging: Proficiency in using eBPF (extended Berkeley Packet Filter) for performance monitoring and debugging.
--- Monitoring & Logging Tools: Experience with industry-standard monitoring and logging tools such as Splunk, Prometheus, Datadog, and Kiali.
--- Load Balancing: Familiarity with Nginx Controller and Seesaw for effective load balancing and traffic management.
--- Infrastructure-as-Code (IaC): Competence in using Terraform for managing cloud infrastructure, ensuring consistency and scalability across environments.
Additional Requirements:
--- Flexibility: Willingness to work weekends and rotational shifts and to provide 24/7 support as necessary to maintain service reliability and meet project deadlines.
--- Certifications Required:
o Kubernetes
o Azure
Thanks,
Prem Kusuma