Challenge
A client running web services and internal applications on Ubuntu servers had no systematic monitoring, no centralised logging, and relied on manual update processes. Incidents were discovered only when users complained.
Solution
Took over full administration of the server fleet and put the following operations workflow in place:
- Hardening: applied CIS benchmark hardening to all servers, including disabling root SSH login, enforcing key-based authentication only, configuring UFW firewall rules, and enabling automatic security updates.
- Monitoring: deployed a Prometheus + Grafana stack for real-time visibility into CPU, memory, disk I/O and service health, with email and Telegram alerts on any threshold breach.
- Centralised logging: set up Loki + Promtail to aggregate logs from all servers into a single searchable interface.
- Automated maintenance: Ansible playbooks handle OS updates, configuration drift detection and routine tasks, reducing manual work to near zero.
- Backup: daily snapshot backups with a retention policy, stored offsite with a separate provider.
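To make the hardening and drift-detection items above concrete, here is a minimal Python sketch of a check that compares an sshd_config against a baseline. It is an illustration only, not the tooling used in this project (that was Ansible). The baseline covers just the two SSH settings named in the list, and the function names, the `EXPECTED` table and the sample config are all hypothetical.

```python
# Hypothetical baseline: the two SSH settings named in the hardening item.
EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
}


def parse_sshd_config(text: str) -> dict[str, str]:
    """Return the global settings found in sshd_config-style text.

    sshd keywords are case-insensitive and the first value seen for a
    keyword wins. Parsing stops at the first Match block, because the
    settings below it apply only conditionally. Include files are not
    followed in this sketch.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(keyword, value)  # keep the first occurrence
    return settings


def find_drift(text: str, expected: dict[str, str] = EXPECTED) -> dict[str, str]:
    """Map each drifted keyword to the value found ("<unset>" if absent)."""
    actual = parse_sshd_config(text)
    return {
        key: actual.get(key, "<unset>")
        for key, want in expected.items()
        if actual.get(key, "<unset>").lower() != want
    }


if __name__ == "__main__":
    # Made-up sample, not taken from the client's servers.
    sample = """\
# illustrative config
PermitRootLogin no
PasswordAuthentication yes
"""
    print(find_drift(sample))  # {'passwordauthentication': 'yes'}
```

An unset keyword is reported as drift because sshd would then fall back to its built-in default instead of the hardened value. Feeding the function the output of `sshd -T` instead of the raw file would also account for settings pulled in through Include files.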
Result
Mean time to detect issues dropped from hours to under 5 minutes. The client now has full visibility into server health and spends less on emergency incident response.
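For reference, mean time to detect is the average gap between when an issue starts and when it is first noticed. The Python sketch below shows that arithmetic. The function name and the two sample incidents are hypothetical and exist only to illustrate the calculation; they are not the client's data.

```python
from datetime import datetime, timedelta


def mean_time_to_detect(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (detected - started) over all (started, detected) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (detected - started for started, detected in incidents),
        timedelta(),  # start value so that sum() adds timedeltas
    )
    return total / len(incidents)


if __name__ == "__main__":
    # Made-up timestamps for illustration only.
    incidents = [
        (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 3)),
        (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 5)),
    ]
    print(mean_time_to_detect(incidents))  # 0:04:00
```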