Site Reliability Engineer
Company: INSPYR Solutions
Location: Miami
Posted on: November 19, 2024
Job Description:
Client: Royal Caribbean Cruise Lines
Location: Miami, FL
Website: www.rccl.com
Duration: 6+ month contract
Description:
The consultant will play a critical role in ensuring the reliability,
performance, and seamless operation of our digital ecosystem. This
includes our guest-facing mobile apps, websites, and the backend
systems that power them. You will work collaboratively with
development, operations, and product teams to build and maintain a
highly resilient and scalable digital experience for our
guests.
Essential Duties and Responsibilities:
Incident Response and Resolution: Respond to and resolve production
incidents, prioritizing guest-facing issues to minimize disruption.
Conduct root cause analysis with guidance from senior team members
and implement preventive measures to avoid recurrence.
Monitoring and Observability: Build, maintain, and enhance
monitoring tools and dashboards (using Prometheus, Grafana, or
similar) to provide visibility into system health, performance, and
guest impact. Proactively detect and address potential issues.
Automation and Tooling: Develop and implement automation scripts
and tools to streamline operations, reduce manual intervention, and
improve system reliability. Use configuration management tools and
infrastructure-as-code (IaC) principles.
Collaboration: Work closely with product teams to incorporate
reliability principles into new feature development. Collaborate
with operations teams to ensure smooth deployments and
transitions.
Documentation and Knowledge Sharing: Create and maintain clear
documentation on system architecture, troubleshooting guides, and
incident postmortems. Share knowledge and best practices with the
team.
On-Call Support: Participate in the on-call rotation as team needs
dictate, primarily acknowledging and escalating incidents with
guidance from senior team members.
Working Hours: Non-standard working hours are expected, including
morning, night, and weekend rotations.
Knowledge and Skills:
Technical Expertise: Strong knowledge of mobile (iOS, Android) and
web technologies, backend systems, cloud infrastructure (AWS, Azure,
etc.), and database technologies.
Programming: Proficiency in one or more programming languages
(e.g., Python, Java, Go) for scripting and automation, along with
automation tooling such as Jenkins. Working knowledge of Kubernetes
is a strong plus.
Monitoring and Observability: Experience with tools like
Prometheus, Grafana, Splunk, or similar.
Incident Management: Experience with incident management tools like
PagerDuty, ServiceNow, or similar.
Security: Understanding of security best practices, vulnerability
identification, and incident response.
Communication: Excellent written and verbal communication skills
for collaborating with diverse teams and stakeholders.
Customer Service: Understands and is aligned with the purpose of
providing a great client experience (client-focused attitude).
Detail-Oriented: Ability to understand and attend to fine, granular
details.
SQL Databases: Ability to work with large volumes of customer data,
using Oracle SQL (or similar) to query databases and edit SQL
queries.
Preferred Qualifications:
5+ years of demonstrated proficiency in one or more scripting
languages such as Python, Go, etc.
3+ years of experience with Kubernetes or equivalent
5+ years of software development experience in Java, JavaScript,
etc.
3+ years of experience with containers and container orchestrators
(e.g., Docker, Kubernetes)
5+ years of demonstrated experience debugging and fixing
system/infrastructure and application issues
5+ years of experience working with monitoring tools such as
Prometheus, Grafana, Splunk, Google Stackdriver, etc.
5+ years of experience with databases (SQL or NoSQL)
5+ years of experience with log analysis and building
dashboards
At least 6 years in a reliability engineering, DevOps, or
infrastructure-focused role
Advanced experience with programming languages (e.g., Python, Java)
Deep systems and infrastructure knowledge
Excellent troubleshooting and problem-solving skills
Experience with high-traffic, guest-facing systems.
Our benefits package includes:
Comprehensive medical benefits
Competitive pay
401(k) retirement plan
...and much more!
About INSPYR Solutions
Technology is our focus and quality is our commitment. As a
national expert in delivering flexible technology and talent
solutions, we strategically align industry and technical expertise
with our clients' business objectives and cultural needs. Our
solutions are tailored to each client and include a wide variety of
professional services, project, and talent solutions. By always
striving for excellence and focusing on the human aspect of our
business, we work seamlessly with our talent and clients to match
the right solutions to the right opportunities. Learn more about us
at inspyrsolutions.com.
INSPYR Solutions provides Equal Employment Opportunities (EEO) to
all employees and applicants for employment without regard to race,
color, religion, sex, national origin, age, disability, or
genetics. In addition to federal law requirements, INSPYR Solutions
complies with applicable state and local laws governing
nondiscrimination in employment in every location in which the
company has facilities.