Track Awesome SRE Updates Daily
A curated list of Site Reliability and Production Engineering resources.
😺 dastergon/awesome-sre · ⭐ 9.1K · 🏷️ Miscellaneous
Sep 02, 2022
Newsletters
- Monitoring Weekly - What's new in monitoring? Curated monitoring articles to your inbox each week.
- Observability news - Updates around observability (o11y) with a special focus on open source.
Aug 12, 2022
Reliability
Aug 05, 2022
Culture
Education
Books
On-Call
Capacity Planning
Jul 07, 2022
Service Level Agreement
Jul 06, 2022
Monitoring & Observability & Alerting
Jun 29, 2022
Books
Jun 28, 2022
Podcasts
May 30, 2022
Culture
May 15, 2022
Culture
May 08, 2022
Books
Apr 24, 2022
Performance
Mar 01, 2022
Service Level Agreement
Feb 22, 2022
Service Level Agreement
Feb 10, 2022
Culture
Feb 03, 2022
SRE Tools
- SRE cheat sheet (⭐116) - A cheat sheet for Site Reliability Engineering principles and numbers
Jan 07, 2022
Books
- Systems Performance: Enterprise and the Cloud [Sample chapter titled CPUs]
Dec 21, 2021
Blogs
- Logit.io Blog - Resources on log management, SRE and DevOps.
Dec 14, 2021
Service Level Agreement
Dec 13, 2021
Books
Reliability
Post-Mortem
Service Level Agreement
Dec 04, 2021
On-Call
Service Level Agreement
Nov 28, 2021
Reliability
On-Call
Service Level Agreement
Sep 22, 2021
Misc Articles
Jul 13, 2021
On-Call
Jul 11, 2021
Culture
Jun 26, 2021
Blogs
- incident.io Blog - Guides, advice and resources on incident management and response.
May 19, 2021
Blogs
- Rootly Blog - Incident management best practices and guides.
May 17, 2021
Misc Articles
- Site Reliability Engineering for Native Mobile Apps - Abhijith Krishnappa - Case study: Halodoc's adaptation of SRE principles for native mobile apps.
Newsletters
- ChaosEngineering.news - Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!
Apr 09, 2021
Education
Mar 09, 2021
Culture
Reliability
Post-Mortem
Feb 22, 2021
Capacity Planning
Jan 24, 2021
Culture
Jan 22, 2021
Books
Jan 14, 2021
Reliability
Dec 22, 2020
Blogs
- FireHydrant Blog - Posts about complex systems, incident response, and SRE best practices.
Dec 10, 2020
Capacity Planning
Dec 09, 2020
Education
Nov 14, 2020
Conferences & Meetups
- Site Reliability Engineering India - SRE Meetup India
Oct 03, 2020
Service Level Agreement
Oct 02, 2020
Culture
Sep 30, 2020
Misc Articles
Sep 25, 2020
Reliability
On-Call
Sep 24, 2020
Education
Reliability
On-Call
Sep 21, 2020
Culture
Sep 11, 2020
Culture
Reliability
Monitoring & Observability & Alerting
On-Call
Post-Mortem
Service Level Agreement
Blogs
- Squadcast Blog - Blog posts about SRE best practices, reliability, on-call and incident management.
- The SRE Dev - SRE-related Posts from dev.to.
Aug 07, 2020
Culture
Aug 06, 2020
Education
Jul 04, 2020
Newsletters
- KubeWeekly - The weekly newsletter for all things Kubernetes. KubeWeekly is curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
May 29, 2020
Newsletters
- DevOpsLinks - A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
- SRE Weekly - Weekly Site Reliability Newsletter.
- O’Reilly Systems Engineering and Operations Newsletter - Weekly systems engineering and operations news and insights from industry insiders.
May 03, 2020
Education
Apr 30, 2020
Books
Apr 28, 2020
Reliability
Apr 25, 2020
SRE Tools
- Awesome SRE Tools (⭐617) - A curated list of Site Reliability and Production Engineering tools
Apr 09, 2020
Books
Post-Mortem
Feb 12, 2020
On-Call
Jan 31, 2020
Culture
Jan 30, 2020
On-Call
Jan 23, 2020
Blogs
- Resilience Roundup - Weekly analysis of Resilience Engineering and Human Factors research designed for software systems
Dec 21, 2019
Reliability
Dec 12, 2019
Reliability
Service Level Agreement
Dec 02, 2019
Service Level Agreement
Dec 01, 2019
Misc Articles
Nov 26, 2019
Conferences & Meetups
- Site Reliability Engineering Paris, France - SRE Meetup in the city of light.
Nov 11, 2019
Culture
Monitoring & Observability & Alerting
Nov 06, 2019
Service Level Agreement
Oct 31, 2019
Misc Articles
Oct 29, 2019
Culture
Monitoring & Observability & Alerting
Oct 22, 2019
On-Call
Oct 18, 2019
Conferences & Meetups
- ADDO - All Day DevOps - A 24-hour conference that is completely online and free.
Oct 10, 2019
Monitoring & Observability & Alerting
Oct 08, 2019
Culture
Monitoring & Observability & Alerting
Sep 11, 2019
Reliability
Sep 04, 2019
Post-Mortem
SRE Tools
Aug 28, 2019
Reliability
Aug 26, 2019
Service Level Agreement
Aug 05, 2019
Education
Aug 02, 2019
Reliability
On-Call
Jul 24, 2019
Culture
Jul 23, 2019
Reliability
On-Call
Jul 22, 2019
Service Level Agreement
Performance
Jul 20, 2019
On-Call
Jul 18, 2019
Culture
Education
Post-Mortem
Misc Articles
Blogs
- Blameless Blog - Blog posts about SRE culture and practices.
Jul 15, 2019
Education
On-Call
Service Level Agreement
Jul 05, 2019
On-Call
Jul 03, 2019
On-Call
Twitter
- Google SRE Twitter Account - Google's SRE Twitter Account.
- SREWorkbook - The Official Twitter Account of Site Reliability Workbook.
Jun 28, 2019
Culture
Education
Books
On-Call
Jun 23, 2019
Service Level Agreement
Jun 19, 2019
Monitoring & Observability & Alerting
Jun 18, 2019
Hiring
Real-time Messaging
Jun 16, 2019
Blogs
- Brendan Gregg's Blog - Highly Technical Blog Posts About Systems Internals, Performance and SRE.
May 31, 2019
Misc Articles
May 16, 2019
Service Level Agreement
May 08, 2019
On-Call
Apr 17, 2019
Education
Mar 13, 2019
Misc Articles
Feb 02, 2019
Post-Mortem
Jan 21, 2019
Culture
Education
Reliability
Misc Articles
Jan 08, 2019
Blogs
- Cindy Sridharan - Blog posts about distributed systems and their management.
Dec 23, 2018
Misc Articles
Dec 14, 2018
Culture
Dec 10, 2018
Performance
Dec 01, 2018
Twitter
- Twitter SRE Weekly - The Official Twitter Account of SRE Weekly Newsletter.
Nov 30, 2018
Education
Nov 16, 2018
Books
Nov 15, 2018
On-Call
Nov 14, 2018
Culture
Reliability
Nov 12, 2018
Culture
Nov 06, 2018
Culture
Oct 29, 2018
Culture
Oct 11, 2018
Culture
Sep 19, 2018
On-Call
Sep 12, 2018
Misc Articles
Sep 05, 2018
Culture
Books
Aug 24, 2018
Post-Mortem
Aug 14, 2018
Service Level Agreement
Jul 31, 2018
Books
Jul 20, 2018
Service Level Agreement
Jul 16, 2018
Culture
Jul 09, 2018
Culture
Jul 04, 2018
Service Level Agreement
Jul 02, 2018
Culture
Jun 15, 2018
Culture
On-Call
Jun 09, 2018
Service Level Agreement
Jun 02, 2018
Education
May 30, 2018
Post-Mortem
May 24, 2018
On-Call
Post-Mortem
May 23, 2018
Culture
Misc Articles
May 18, 2018
Monitoring & Observability & Alerting
May 06, 2018
On-Call
May 04, 2018
Education
Reliability
Apr 29, 2018
Culture
Conferences & Meetups
- Site Reliability Engineering Munich, Germany - SRE Meetup in the greater area of Oktoberfest city.
Apr 24, 2018
Books
Apr 18, 2018
Culture
Reliability
On-Call
Apr 17, 2018
Reliability
Misc Articles
Apr 16, 2018
Monitoring & Observability & Alerting
Apr 14, 2018
On-Call
Mar 24, 2018
On-Call
Mar 18, 2018
On-Call
Mar 14, 2018
On-Call
Mar 10, 2018
Monitoring & Observability & Alerting
Mar 09, 2018
Culture
Feb 19, 2018
On-Call
Feb 15, 2018
Culture
Monitoring & Observability & Alerting
Feb 13, 2018
Reliability
On-Call
Feb 12, 2018
Culture
Reliability
Monitoring & Observability & Alerting
On-Call
Service Level Agreement
Misc Articles
Feb 08, 2018
Culture
Books
Reliability
Monitoring & Observability & Alerting
On-Call
Jan 08, 2018
Service Level Agreement
Dec 29, 2017
Post-Mortem
Dec 25, 2017
On-Call
Nov 04, 2017
Blogs
- rachelbythebay - Technical Blog Posts.
Oct 27, 2017
Culture
Oct 25, 2017
Service Level Agreement
Oct 24, 2017
Service Level Agreement
Oct 23, 2017
Hiring
Oct 21, 2017
Culture
Sep 17, 2017
Culture
Education
Monitoring & Observability & Alerting
On-Call
Blogs
- GopherSRE - Blog Posts about Go and SRE.
Sep 16, 2017
Culture
Aug 26, 2017
Education
Aug 24, 2017
Culture
Monitoring & Observability & Alerting
Aug 16, 2017
Misc Articles
Aug 13, 2017
Books
Aug 08, 2017
Performance
Aug 07, 2017
Post-Mortem
Aug 03, 2017
Reliability
Jul 31, 2017
Culture
Jul 29, 2017
Post-Mortem
Jul 26, 2017
Reliability
Jul 24, 2017
Culture
Reliability
Programming
Jul 20, 2017
Programming
Jul 18, 2017
Books
Jul 09, 2017
Culture
Reliability
Jun 26, 2017
Service Level Agreement
Jun 14, 2017
Culture
Jun 13, 2017
Books
Reliability
Service Level Agreement
Jun 12, 2017
Culture
Reliability
Jun 02, 2017
Culture
Reliability
May 29, 2017
Post-Mortem
May 25, 2017
Service Level Agreement
May 23, 2017
Reliability
Service Level Agreement
May 22, 2017
Reliability
May 06, 2017
Culture
May 03, 2017
Culture
Misc Articles
Blogs
- Increment - A digital magazine about how teams build and operate software systems at scale.
Mar 24, 2017
Education
Mar 11, 2017
Culture
Blogs
- Stephen Thorne's Blog - Blog Posts About SRE
Mar 01, 2017
On-Call
Feb 03, 2017
Culture
Reliability
Service Level Agreement
Jan 31, 2017
Culture
Capacity Planning
Service Level Agreement
Jan 28, 2017
Reliability
Jan 27, 2017
Reliability
Monitoring & Observability & Alerting
Jan 26, 2017
Reliability
Monitoring & Observability & Alerting
Jan 20, 2017
Reliability
Jan 16, 2017
Reliability
On-Call
Jan 09, 2017
Culture
Jan 05, 2017
Culture
Education
Reliability
Dec 31, 2016
Culture
Reliability
Monitoring & Observability & Alerting
Dec 24, 2016
Culture
Dec 23, 2016
Reliability
On-Call
Post-Mortem
Service Level Agreement
Blogs
- SysAdvent - One article for each day of December, ending on the 25th article.
Nov 29, 2016
Misc Articles
Nov 03, 2016
Culture
Oct 26, 2016
Monitoring & Observability & Alerting
Oct 17, 2016
Culture
Books
Hiring
Blogs
- Susan J. Fowler - Various blog posts about SRE, Software Engineering and Microservices.
Oct 13, 2016
Culture
Sep 29, 2016
Reliability
Sep 26, 2016
Misc Articles
Sep 21, 2016
Conferences & Meetups
- San Francisco Reliability Engineering - A Group Of People Who Are Passionate About Reliable, Performant Software Systems.
Sep 14, 2016
Culture
Reliability
Post-Mortem
Conferences & Meetups
- South Bay Site Reliability Engineering (Sunnyvale, CA) Meetup - A Group For Individuals Who Tackle Reliability Challenges For Web-Scale Systems.
Sep 12, 2016
Culture
On-Call
Post-Mortem
Sep 04, 2016
Books
Sep 01, 2016
Post-Mortem
Aug 30, 2016
Reliability
Misc Articles
Aug 26, 2016
Reliability
Aug 25, 2016
Culture
Education
Books
Hiring
Reliability
Post-Mortem
Real-time Messaging
- #sre channel at Hangops Slack - Discussion of Site Reliability Engineering generally.
- #incident_response channel at Hangops Slack - Discussion about Incident Response.
Jul 13, 2016
Conferences & Meetups
- SRECon Conferences - The Official SRE Conference.
- LISA Conferences - Prominent Conference About SysAdmin/DevOps/SRE.
- SRE Tech Talks - SRE Talks Hosted by Google.
Twitter
- SREBook - The Official Twitter Account of Site Reliability Engineering Book.
- SREcon - SRECon's Official Twitter Account.
- USENIX Association - The Official USENIX Twitter Account.
Jul 06, 2016
Blogs
- Everything Sysadmin - Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.
- High Scalability - Technical Blog Posts About Systems Architecture.
Twitter
- Twitter SRE - The Official Twitter Account of Twitter's SRE team.
Jul 05, 2016
Reliability
Jun 23, 2016
Culture
Jun 12, 2016
Monitoring & Observability & Alerting
Misc Articles
May 15, 2016
Reliability
May 11, 2016
Capacity Planning
Apr 30, 2016
Culture
Reliability
Programming
Misc Articles
Apr 26, 2016
Service Level Agreement
Apr 22, 2016
On-Call
Service Level Agreement
Apr 18, 2016
Culture
Monitoring & Observability & Alerting
Apr 15, 2016
Culture
Reliability
Monitoring & Observability & Alerting
On-Call
Post-Mortem
Apr 13, 2016
Culture
Education
Books
Hiring
Reliability
Misc Articles