LTRATO-2600: Cross-Domain Network Automation Lab Guide

Session Overview

Welcome to the Cross-Domain Network Automation lab session. This hands-on lab demonstrates how modern NetDevOps practices, CI/CD pipelines, and AI-driven automation can transform network operations across multiple domains.

Session Scenario

Session abstract

The demand for automated and stable networks keeps growing as agile applications and infrastructures become more distributed and complex than ever before. Networks are no longer isolated segments; they must adapt as business intent and cross-domain insights drive network changes. Cross-domain automation seamlessly connects multiple domains, enabling continuous delivery using Infrastructure as Code (IaC) and the integration of updates across the entire network infrastructure. By leveraging open-source large language models (LLMs), operators can deploy advanced troubleshooting use cases. These AI-driven solutions proactively identify, diagnose, and resolve network issues, enhancing operational efficiency and reducing downtime. Integrating AI within CI/CD frameworks ensures that updates and fixes are deployed quickly and consistently. Splunk serves as the data lake for network monitoring, ensuring that telemetry data is analyzed to maintain a healthy environment.

Learning Objectives

By the end of this lab, you will be able to:

  • Implement Infrastructure as Code (IaC) for multi-domain network configuration

  • Build and execute automated CI/CD pipelines for network changes

  • Deploy closed-loop automation using telemetry data and machine learning

  • Leverage Large Language Models (LLMs) for proactive network troubleshooting

  • Use different MCP (Model Context Protocol) servers together with an LLM such as Sonnet 4.5 and LibreChat (an AI chat platform)

  • Get all the code as a takeaway

During the lab session, you will work on three different scenarios:

  1. Configure the environment using modern NetDevOps tools to build a cross-domain automation pipeline.
  2. Develop a closed-loop automation system that utilizes telemetry data in Splunk to respond automatically to specific behaviors.
  3. Analyze certain syslog entries and take proactive measures by using a large language model (LLM) to create a service request in ServiceNow.

1. Scenario (must do!)

This scenario represents a comprehensive automation pipeline designed to push configuration changes across three different network domains: branch (Catalyst), data center (Nexus), and security (firewall). The primary objective of this automation scenario is to configure switch ports in both the branch and data center while ensuring that firewall rules are correctly applied to allow traffic between these domains. All these tasks are executed through a single automated GitLab pipeline (CI/CD), streamlining the entire process.

The workflow begins when a POD user submits code containing the required network configuration changes. These changes can include VLAN assignments, interface settings, and security policies. Once the code is submitted, GitLab triggers the pipeline to initiate the configuration process in a structured, version-controlled manner.
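To make the first step concrete, the change a POD user might commit can be sketched as a small intent plus a render step. This is an illustrative sketch only: the device name, VLAN ID, and render format below are hypothetical, not the lab's actual repository layout.

```python
# Hypothetical sketch of a committed change: an access-port intent
# rendered into Catalyst-style CLI lines. All names are placeholders.

intent = {
    "device": "branch-cat8kv-01",      # hypothetical device name
    "interface": "GigabitEthernet2",
    "vlan": 110,
    "description": "Branch user access port",
}

def render_access_port(change: dict) -> list[str]:
    """Render an access-port intent into Catalyst-style CLI lines."""
    return [
        f"interface {change['interface']}",
        f" description {change['description']}",
        " switchport mode access",
        f" switchport access vlan {change['vlan']}",
    ]

if __name__ == "__main__":
    for line in render_access_port(intent):
        print(line)
```

Keeping the intent as structured data (rather than raw CLI) is what lets the same commit drive Catalyst, Nexus, and firewall changes from one pipeline.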

To ensure secure authentication and accuracy, GitLab retrieves the necessary credentials from HashiCorp Vault. This step guarantees that sensitive access information is handled securely. Simultaneously, GitLab fetches inventory and configuration data from NetBox, which serves as a single source of truth for the network infrastructure. This ensures that all configuration changes are applied to the correct devices and interfaces.
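The two lookups the pipeline performs before touching any device can be sketched as plain REST requests. The endpoint shapes below follow the public Vault KV version 2 and NetBox REST APIs; the host names, mount point, and secret path are placeholders.

```python
# Sketch of the pre-deployment lookups. Endpoint shapes follow the public
# Vault KV v2 and NetBox APIs; hosts and paths are placeholders.

def vault_kv2_url(base: str, mount: str, path: str) -> str:
    """Vault KV version 2 read endpoint: GET /v1/<mount>/data/<path>."""
    return f"{base}/v1/{mount}/data/{path}"

def netbox_headers(token: str) -> dict:
    """NetBox's REST API authenticates with a 'Token <key>' header."""
    return {"Authorization": f"Token {token}", "Accept": "application/json"}

if __name__ == "__main__":
    print(vault_kv2_url("https://vault.example.lab:8200",
                        "secret", "pods/pod01/devices"))
```

In practice a pipeline job would perform these calls with a library such as `hvac` or `pynetbox`, but the URLs and headers are what those libraries build under the hood.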

Once the credentials and configuration data are available, GitLab dispatches the job to its Docker/shell runner, which is responsible for executing the automation scripts. The runner processes the job and applies configuration updates to the respective controllers managing the different network domains.

After the deployment, the automation pipeline executes verification steps to ensure that the applied configurations are functioning correctly. These verification tasks include connectivity tests between the branch and data center, validation of firewall rules, and consistency checks of port settings across both environments.
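One of the verification tasks above, the branch-to-data-center connectivity test, boils down to parsing the result of a ping issued on a device. A minimal sketch, assuming IOS-style ping output (the sample string is illustrative):

```python
# Sketch of a post-deployment connectivity check: extract the success
# percentage from IOS-style ping output. The sample output is illustrative.
import re

def ping_success_rate(output: str) -> int:
    """Return the success percentage reported by an IOS-style ping."""
    m = re.search(r"Success rate is (\d+) percent", output)
    if not m:
        raise ValueError("no success rate found in output")
    return int(m.group(1))

sample = "Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms"

if __name__ == "__main__":
    print(ping_success_rate(sample))
```

A pipeline verification stage would fail the job whenever the returned rate falls below a chosen threshold, so a broken firewall rule or port setting is caught before the change is considered complete.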

2. Scenario (must do)

Bringing together multiple domains in a pipeline is one thing. Letting them interact with each other and correlating events across them is what can make a real difference.

In this scenario, the Model Context Protocol (MCP) serves as a standardized communication layer that enables AI agents to access multiple data sources through a unified interface, eliminating the need for custom integrations for each system.

MCP represents a fundamental shift in how AI systems interact with enterprise infrastructure. Rather than building point-to-point integrations, organizations can deploy MCP servers that make their systems AI-accessible, enabling agents to orchestrate complex workflows across multiple platforms through natural language.
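The "one interface, many backends" idea can be illustrated with a toy dispatcher. This is a conceptual sketch only: the real protocol is JSON-RPC based and handled by MCP client/server libraries, and the tool names and return values below are invented.

```python
# Toy illustration of the MCP idea: the agent speaks one request format,
# and a registry routes each tool call to the right backend. Tool names
# and their placeholder results are invented for this sketch.
import json

TOOLS = {}

def tool(name):
    """Register a handler under a tool name, MCP-server style."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("netbox.get_device")
def get_device(params):
    # Placeholder backend; a real MCP server would query NetBox here.
    return {"name": params["name"], "site": "branch"}

@tool("splunk.search")
def splunk_search(params):
    # Placeholder backend; a real MCP server would run the Splunk search.
    return {"query": params["query"], "results": []}

def handle(request_json: str) -> dict:
    """Dispatch one tool-call request to its registered handler."""
    req = json.loads(request_json)
    return TOOLS[req["tool"]](req.get("params", {}))
```

The point is that the agent never learns NetBox's or Splunk's native APIs; it only learns the one request shape, which is exactly what lets an LLM in LibreChat orchestrate several platforms at once.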

3. Scenario (optional)

The blue path in this closed-loop automation scenario illustrates how telemetry data from network devices is collected, analyzed, and used to trigger automated configuration changes. Devices such as the Nexus 9KV and Catalyst 8KV send telemetry data in JSON format to the Telegraf agent. Telegraf processes and forwards the data to Splunk Core Enterprise, where it is analyzed for anomalies using Splunk’s machine learning (ML) engine.
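As an illustrative stand-in for the anomaly-detection step, a telemetry sample can be flagged when it falls far outside the spread of recent samples. Splunk's actual Machine Learning Toolkit models are more sophisticated; the three-sigma rule below is just the simplest version of the idea.

```python
# Simplified stand-in for Splunk's ML anomaly detection: flag a telemetry
# sample that sits more than k standard deviations from the recent mean.
from statistics import mean, stdev

def is_anomalous(history: list[float], sample: float, k: float = 3.0) -> bool:
    """Return True when the sample deviates more than k sigma from history."""
    mu, sigma = mean(history), stdev(history)
    return abs(sample - mu) > k * sigma
```

Fed with, say, interface CPU or error-counter values from the JSON telemetry stream, a check like this is what decides whether the webhook in the next step fires at all.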

When an anomaly is detected, Splunk triggers a GitLab pipeline via a webhook, initiating an automated configuration change process. Upon receiving the webhook, GitLab processes the pipeline using its Docker/shell runner, which in turn applies the necessary configuration changes to the respective controller platforms, such as Catalyst Center, ensuring that corrective actions reach the affected devices. This automation creates a continuous feedback loop, enhancing network stability and minimizing manual intervention.
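The webhook side of this loop is a single API call. The endpoint shape below follows GitLab's public pipeline-trigger API (`POST /api/v4/projects/:id/trigger/pipeline`); the project ID, trigger token, and variable names are placeholders.

```python
# Sketch of the call a Splunk alert action could make to start the pipeline.
# Endpoint shape follows GitLab's pipeline-trigger API; the project ID,
# token, and variables are placeholders.
from urllib.parse import urlencode

def gitlab_trigger_request(base: str, project_id: int, token: str,
                           ref: str, variables: dict) -> tuple[str, bytes]:
    """Build the POST URL and form body for a GitLab pipeline trigger."""
    url = f"{base}/api/v4/projects/{project_id}/trigger/pipeline"
    form = {"token": token, "ref": ref}
    for key, value in variables.items():
        form[f"variables[{key}]"] = value  # GitLab's form-encoded variable syntax
    return url, urlencode(form).encode()
```

Passing the detected anomaly as a pipeline variable (here a hypothetical `ALERT` key) lets the same pipeline branch into different remediation jobs depending on what Splunk saw.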

4. Scenario (optional)

The highlighted blue path in the diagram represents the automated syslog data flow from network devices to proactive notification and AI-driven recommendations. Network devices, such as the Nexus 9KV and Catalyst 8KV, send syslog data to the Telegraf agent, which collects and forwards it to Splunk Connect for Syslog (Docker). This component processes the logs and feeds them into Splunk Core Enterprise, where the data is analyzed for potential issues or anomalies.

Once the syslog data is ingested into Splunk, a Python webhook script is triggered to take further action based on detected events. This script can create a service request in ServiceNow, alerting IT teams to potential issues, or forward the data to an LLM (OpenAI) to generate recommendations for resolving the problem. Finally, the user receives a proactive notification, ensuring they are informed and can take action as needed to maintain a healthy network environment.
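The ServiceNow half of that script comes down to one Table API call (`POST /api/now/table/incident`). A minimal sketch, assuming a placeholder instance name and field values; `short_description`, `description`, and `category` are standard incident columns.

```python
# Sketch of the ServiceNow side of the webhook script: build the Table API
# request that opens an incident. Instance name and values are placeholders.
import json

def servicenow_incident(instance: str, short_desc: str,
                        description: str) -> tuple[str, bytes]:
    """Build the POST URL and JSON body for a new incident record."""
    url = f"https://{instance}.service-now.com/api/now/table/incident"
    payload = {
        "short_description": short_desc,
        "description": description,
        "category": "network",
    }
    return url, json.dumps(payload).encode()
```

A production script would send this with basic-auth or OAuth credentials and could first pass the raw syslog lines through the LLM, using its summary as the incident description so the IT team sees a recommendation rather than raw logs.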