Bruno Marcuche
Site Reliability Engineer, AIOps
Boulder, CO 80301 · 561-284-2441 · bmarcuche@gmail.com · linkedin.com/in/bruno-marcuche · resume.mindtunnel.org · github.com/bmarcuche
Summary
Site Reliability Engineer and technical leader who builds the systems other teams run on. I've scaled infrastructure across on-prem, hybrid, and cloud environments and automated deployment and operations for thousands of Linux and Windows instances. Most recently, I architected an internal AI agent platform in which a custom semantic router orchestrates 16 specialized LLM agents; it has handled 10,000+ ops and engineering tasks and cut change lead time by ~89%. I lead ops and SRE teams, drive observability with OpenTelemetry and PagerDuty, and turn slow, manual operations into fast, repeatable automation. Open to both leadership and senior IC roles.
Professional Experience
Operations Team Lead
AssetWorks
- Architected and built an internal AI agent platform on the Model Context Protocol: 16 specialized LLM agents coordinated by a custom semantic router (fine-tuned sentence-transformer embeddings with pgvector knowledge retrieval). It handled 10,000+ routed ops and engineering tasks in its first 12 weeks across deployment, cloud, CI/CD, and incident response.
- Cut FA-EAM change lead time (DORA lead time for changes) by ~89% after launch: customer upgrades dropped from ~27 days to under 3 days and new-environment provisioning from ~12 days to ~1.5 days, with lead time declining every month after go-live.
- Delivered 235 agent-built Ansible and CI/CD pipelines and automated 213 customer upgrade deployments, eliminating ~426 hours of manual deploy work across a fleet of 350+ servers serving 150+ government clients at 99.99% uptime.
- Designed multi-agent incident workflows that investigate and remediate across the fleet in a single session. They caught a bug that was deleting configuration on 32 servers and identified the root cause of a Windows Update and API regression that would otherwise have taken hours of manual log correlation.
- Lead a team of five and mentor engineers through 1:1s, training, and knowledge sharing; drove an observability rollout (OpenTelemetry and Observe) to reduce MTTR.
Backend Developer, Founder
EdventureTrek
- Founded and led development of an educational game focused on outdoor exploration and biodiversity.
- Designed custom taxonomy GPTs for plant and animal classification.
- Built a Python/FastAPI backend with MySQL and event logging.
- Managed CI/CD on GCP with GitHub Actions and internal tooling.
Site Reliability Engineering Manager
AnswerRocket
- Led a remote SRE team (4 reports); ran weekly syncs and architecture reviews.
- Expanded Ansible coverage across AWS, cutting manual deploy time by 15%.
- Supported SOC 2 audit by automating cloud environment validation.
Site Reliability Architect
OfficeSpace Software
- Led SRE hiring, onboarding, and 1:1s for a 3-person team.
- Built a Slackbot that lets teams deploy customer instances in under 10 minutes.
- Reduced deploy times by over 60% via a CI pipeline (CircleCI, Puppet, Docker, Terraform).
- Migrated infrastructure from Rackspace to GCP, saving $60K annually.
- Owned production and staging infrastructure on GCP; managed OS patching, config management, release packaging, and automation with Puppet and Python.
Sr. Technical Consultant / Team Lead
Hewlett Packard
- Delivered Tier 3 support for HP Server Automation; mentored junior engineers.
- Automated workflows with Python scripts built on the HPSA API.
- Ranked #1 in team for customer satisfaction.
Education
Information Systems | Bachelor's Degree
ITT Technical Institute
(Honors Graduate)
Strengths
Key Skills
Cloud & Automation
CI/CD & Delivery
Observability
AI & Agents
Development & Data
Systems & Serving
Hobbies
- Exploring AI tooling (LLMs, MCP)
- SRE meetups
- Home lab experimentation with Docker
Volunteering
Delivery Driver / Wellness Check
Meals on Wheels Boulder
As a Meals on Wheels delivery driver, I delivered meals, performed wellness checks, and enjoyed great conversations with some of Boulder's finest citizens.
mowboulder.org