Site Reliability Engineer

 

Join us in helping the world’s most admired brands innovate and deliver great customer experiences

 

WHY WORK HERE?

 

“The chance to work with an amazing team, partnered with the world’s most admired brands.”
 
At Message Broadcast, we have created a world-class platform that delivers messaging at scale for Fortune 100 companies in the finance, energy, healthcare, and telecommunications industries. Our messaging platform manages enterprise communications and delivers positive results by automating interactions with sophisticated personalization based on time, place, preference, profile, and response data.
 
All solutions are designed to dramatically increase customer engagement through self-service, acquisition, retention, and conversion across all channels, including TXT/SMS, MMS, RCS, voice, PUSH, email, and social media. Our highly scaled Platform as a Service (PaaS) reduces call center volume and delivers proactive, effective customer communications, improved J.D. Power CSAT scores, support for customer channel preferences, real-time monitoring and reporting, scalable and deliverable notifications, and TCPA mitigation.
 

OUR CULTURE

With solid leadership, a clear growth path, and a wealth of expertise, we foster a collaborative environment and welcome those who want to work alongside like-minded talent on a modern technology stack. We embrace positive change and open communication. The long tenure of our team is a testament to our commitment to the growth of our employees and to the success of Message Broadcast and our valued clients.

We are looking for a Site Reliability Engineer. The successful candidate will have a strong aptitude for learning new technologies and the ability to drive complex, meaningful projects to completion. Close collaboration with the software and network engineering teams and the ability to thrive under pressure are key to succeeding in this role. This individual should be self-motivated and have a passion for quality.

RESPONSIBILITIES

Operational Performance and Stability

Works with other team members to monitor applications/platforms and ensure they meet performance and stability requirements.

  • Monitor traffic to multiple data centers to ensure proper balancing, availability, and efficient resource utilization

  • Monitor real-time system performance against availability and SLA requirements

  • Analyze trends in hardware resource consumption, network latency, software errors, and application logging

Monitors and Metrics

Works with Application Development and Network Engineering to ensure that applications/platforms have the monitoring and metrics in place to accurately measure performance and stability.

  • Identify key areas where software needs to expose elements for performance measurement and debugging

  • Identify key areas in network and hardware systems where proper monitoring will quickly identify performance problems

  • Ensure monitoring provides a holistic view of system health and availability

  • Configure dashboards to provide quick visibility into problems or trends that might indicate impending performance issues

 

Operational Readiness

Ensures that applications/platforms are operationally ready for production.

  • Ensure that new applications/platforms have adequate monitoring and that the monitoring has been tested to expose areas that may impact reliable operations

  • Review any new feature launch or other significant change that may impact monitoring

  • Write SOPs and knowledge articles for new features and update any affected support documentation

  • Train the Network Operations Center (NOC) and first-level application support staff on new SOPs

Problem Management

Performs post-incident reviews of all major incidents and determines the action items required to avoid similar issues and minimize downtime in future incidents.

SKILLS REQUIRED

  • Bachelor’s degree in Computer Science or equivalent, and 4 years of relevant work experience

  • 2+ years of SRE/DevOps/infrastructure experience

  • Experience with the use, maintenance and configuration of monitoring, metrics and logging infrastructure (Datadog, Elasticsearch, Graylog, Kibana, Logstash, Nagios, etc.)

  • Experience with tracing and packet analysis tools (OpenTracing, Wireshark)

  • Experience configuring and monitoring containerized deployments (Docker, Docker Swarm)

  • Full-stack troubleshooting experience, including networking, operating system (CentOS), Nginx, DNS, and load balancing

 

BONUS SKILLS

  • Knowledge of Node.js, Redis, MongoDB, and RabbitMQ

 

BENEFITS

  • On-site, full-time position with a reasonably flexible work schedule, once established and approved by your manager

  • Fully covered medical, dental, and vision insurance for the employee

  • 401(k)

  • 14 days PTO

  • Well-stocked kitchen with energy drinks and other snacks

  • Onsite gym with showers

Message Broadcast is an Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, or any other characteristic protected by law.

***Message Broadcast does not provide visa sponsorship, transfer or assistance***