
[Optional] Lab Task 7: Scenario 3 - AI for NetOps: Enhancing IT-Support Tickets with AI

In this lab, you will build a closed-loop automation system with a human in the loop, where network issues detected through syslog messages trigger automated actions.

Specifically, when a critical syslog error message appears, a ServiceNow ticket will be created automatically. However, instead of simply passing raw error messages to the support engineer, a Large Language Model (LLM) will enhance the ServiceNow ticket by suggesting potential solutions based on the syslog error.

This approach improves efficiency by providing engineers with actionable insights, reducing mean time to resolution (MTTR), and enabling AI-assisted troubleshooting in network operations.

By the end of this lab, you will have:

  • Configured a Cisco NX-OS switch and an IOS XE router to send syslog messages to Splunk.

  • Verified syslog message ingestion in Splunk.

  • Examined a Python-based LLM webhook service.

  • Created a Splunk alert to trigger the LLM webhook.

  • Tested the complete flow by generating a duplicate ARP error message and verifying its presence in ServiceNow.

Step 1: Introduction to Splunk Events

As you already know by now, Splunk is a powerful tool for log and event management. It helps analyze, monitor, and visualize data from various sources, including network devices.

In this scenario we are working with events, which are timestamped records of activities or changes that occur in a system. This time we will not use the Analytics tab in the Search & Reporting application; instead, we will use the search bar.

Example network engineering events in Splunk:

  • Interface down/up messages

  • BGP session flaps

  • Authentication failures

  • ARP conflict messages

  • High CPU/memory warnings

Events vs. Metrics

  • Events: Represent discrete occurrences, such as a syslog message indicating an interface down or an authentication failure.

  • Metrics: Represent numerical data over time, such as CPU utilization or bandwidth usage.
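
To make the distinction concrete, here is a small illustrative sketch; the field names and values are hypothetical and do not represent an actual Splunk schema:

python
# Illustrative only: the field names and values below are hypothetical
# and do not represent an actual Splunk schema.

# An event is a discrete, timestamped record of something that happened:
event = {
    "timestamp": "2025-03-12T10:15:42Z",
    "host": "nxos9kv-01",
    "message": "%ARP-3-DUP_SRC_IP: Source address of packet received is duplicate of local",
}

# A metric is a numeric measurement sampled over time:
metric = {
    "timestamp": "2025-03-12T10:15:00Z",
    "host": "nxos9kv-01",
    "metric_name": "cpu.utilization.percent",
    "value": 87.5,
}

print(event["message"])   # a one-off occurrence you search and alert on
print(metric["value"])    # a data point in a time series you chart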

Step 2: Sending Syslog Messages from NX-OS and IOS XE to Splunk

Like in the previous scenario, we first need to ingest data into Splunk in order to analyze and view it. For that, we send event messages via syslog, a well-known standard for message logging.

Syslog messages provide insights into network device operations and issues. Forwarding them to Splunk centralizes log management, simplifies troubleshooting, and enables automation workflows such as AI-driven enhancements and ServiceNow ticket creation.

Why use a separate Syslog Server?

Instead of directly sending syslog messages to Splunk, it is best practice to use a dedicated syslog server with the Splunk Syslog App.

This approach:

  • Reduces load on Splunk.

  • Allows for pre-processing and filtering of messages.

  • Enhances reliability by ensuring logs are not lost.

Preconfigured Syslog Server: In this lab, the syslog server is already configured to forward logs to Splunk. The Splunk Syslog App (which uses syslog-ng under the hood and features predefined data models) is running on an external virtual machine.

If you are interested, check out the documentation of the external application.

Later in Splunk you will see that the source of the event is defined as SC4S (=Splunk Connect for Syslog).

Background on Syslog

Syslog is a standard protocol used to send system logs from network devices to a centralized logging server. It uses UDP (default) or TCP on port 514.
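
If you would like to see the protocol itself in action from any Linux host, Python's standard library can emit a syslog message over UDP/514. This is just an illustrative sketch with a placeholder collector address; in this lab the network devices forward their logs to the preconfigured syslog server via their own CLI configuration:

python
import logging
import logging.handlers

# Minimal sketch: send one syslog message to a remote collector over UDP/514.
# The collector address below is a placeholder, not the lab's syslog server.
handler = logging.handlers.SysLogHandler(address=("192.0.2.10", 514))

logger = logging.getLogger("syslog-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("Hello from a Python syslog client!")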

Step 3: Configuring the NX-OS Switch and IOS XE Router for Syslog Forwarding

Since we know that the server is running, we need to configure our Nexus switch and Catalyst router to send all logs to the server.

Luckily, we thought ahead and already pushed the syslog server configuration to the devices. Let’s double-check that the devices are sending events to the syslog server.

Catalyst Router – IOS XE
bash
POD01
show run | include logging
show logging last 5

After the configuration lines, you should see:

bash
POD01
Logging to 10.49.232.221 (tcp port 514, audit disabled, link up), xxx message lines logged,
Nexus Switch – NX-OS

Then, let’s check whether the Nexus switch is configured to send logs to the server with the following commands.

bash
POD01
sh run | begin logging
show logging server

After the configuration lines, you should see:

bash
POD01
Logging server:                 enabled
{198.18.134.22}
        server status:          No errors found
        server severity:        notifications
        server facility:        local7
        server VRF:             management
        server port:            514

Step 4: Verifying Syslog Message Ingestion in Splunk

Now let’s check whether our syslog messages actually show up in Splunk. For that, let’s send a sample error message from our Catalyst router:

bash
POD01
send log 0 Hello from Catalyst!

Checking in Splunk

  1. Go to Splunk, specifically the Search & Reporting application. Be sure that you are in the search view, where you can see the search bar.

  2. Run the following search query to filter syslog messages from the Catalyst (IOS XE) router. You can adapt the host field to filter for the Nexus switch instead.

  3. If your message appears, syslog forwarding is working correctly.

bash
POD01
index="netops" sourcetype="cisco:ios" host="<insert-cat8kv-host-ip-from-pod>"

Explanation:

The Splunk Search Bar is where you enter search queries to retrieve, filter, and analyze data. It is very powerful and you can write multi-line queries using the Search Processing Language (SPL). It would take a whole lab or Cisco Live session to cover what is possible with SPL. In this lab we are just scratching the surface.

In our case we are filtering all logs with these metadata fields. They help to categorize and search for events effectively.

  • sourcetype: Represents the format or structure of the data.
  • host: Represents the machine or device where the event originated. In our case this is the IP address of the Catalyst 8000V router.

Try it out: Filter out more

Try out other filters and check out the details of the syslog message from the Catalyst router by clicking on the event. For example, by including the severity_id you can directly see your test log message.

bash
POD01
index="netops" sourcetype="cisco:ios" host="<insert-cat8kv-host-ip-from-pod>" severity_id=0

Step 5: Examining the LLM Webhook Service

Now that we are getting syslog messages into Splunk, let’s examine the Python-based LLM Webhook Service which enhances ServiceNow tickets with suggested solutions.

To test it:

  1. Open a browser and enter the webhook URL: http://198.18.134.22:5000/webhook

  2. If the service is running, you should see a simple JSON message with:
    "This the the webhook service for the LLM"

What does the service do?

The LLM webhook service runs inside a Podman container and uses the Python framework Flask (a Web Server Gateway Interface (WSGI) application with asynchronous view functions).

Find below the most important function when the webhook is triggered:

  • JSON is decoded
  • Check if severity level is less than 4. If yes:
  • Check if the mnemonic (short error description) is "DUP_SRC_IP". If yes:
  • Ask the LLM about the error_information: ask_llm()
  • Create a ServiceNow ticket via REST API: create_service_now_ticket()
python
POD01
@app.route('/webhook', methods=['POST'])
async def webhook_post():
    try:
        data = request.get_json()

        severity = int(data["result"]["severity_id"])

        # only check messages with severity less than 4
        if severity < 4:
            log.info(f"Received error message (severity<4) for host {data['result']['host']}")

            # important data points
            error_information = f"""
                error message: {data["result"]["message_text"]}
                host: {data["result"]["host"]}
                vendor: {data["result"]["vendor"]}
            """

            # check if the error message is a duplicate source IP issue
            if data["result"]["mnemonic"] == "DUP_SRC_IP":
                # first, ask the LLM for a suggested solution
                result = await ask_llm(error_information)

                # create the ServiceNow ticket information
                service_now_ticket = f"Issue Information from the device:\n{error_information}\n\nRecommended Solution:\n\n{result}"
                log.debug(service_now_ticket)

                snow_response = await create_service_now_ticket(service_now_ticket, sys_id)

                return jsonify({"response": snow_response}), 200

    except Exception as e:
        log.error(f"Error processing webhook: {e}")

        return jsonify({"error": "Invalid JSON"}), 400
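
If you want to exercise the webhook without waiting for a real alert, you can POST a hand-crafted payload to it. The sketch below is only a test aid: the values are made up, and only the field names mirror what the function above reads from Splunk's webhook alert action:

python
import requests

# Minimal sketch: trigger the webhook manually with a test payload.
# The values are made up; only the field names mirror what the webhook
# function reads from Splunk's webhook alert action.
payload = {
    "result": {
        "severity_id": "3",
        "host": "<insert-nxos9kv-host-ip-from-pod>",
        "vendor": "Cisco",
        "mnemonic": "DUP_SRC_IP",
        "message_text": "Source address of packet received is duplicate of local",
    }
}

response = requests.post("http://198.18.134.22:5000/webhook", json=payload, timeout=120)
print(response.status_code, response.text)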

Step 6: Creating a Splunk Alert for the LLM Webhook

You might have already guessed it: we can use Splunk to trigger this webhook.

Our use-case example: We would like to showcase that when a duplicate source IP address is detected by the Nexus switch, Splunk triggers the webhook, gets a solution from an LLM, and creates a ServiceNow ticket with the suggested solution for the support engineer.

Let’s create an alert in Splunk:

1. Navigate to Search in the Splunk Search & Reporting application.

2. Configure the search query to filter any syslog messages coming from your Nexus switch:

bash
POD01
index="netops" host="<insert-Nexus-host-ip>"

3. Execute the search. Then, in the top right corner, click Save As -> Alert.

4. Configure the alert as seen in this screenshot below.

  • Name: The unique identifier for the alert.

  • Alert type: Specifies the alert’s evaluation method. Every minute (via a cron expression), Splunk will check whether an alert needs to be sent.

  • Throttle: Prevents duplicate alerts by limiting how often an alert can trigger within a set period.

  • Suppress results containing field value: Ensures that alerts do not trigger multiple times for the same field value within the throttle period. In our case we use:

    bash
    POD01
    mnemonic = $result.mnemonic$
  • Suppress triggering for: The duration during which the same alert condition should not be triggered again after an initial alert. Here choose 4 hours.

  • Trigger Actions: Defines what happens when an alert is triggered. We will send a message to our Webhook Service on: http://198.18.134.22:5000/webhook

Step 7: End-to-End Testing

Finally let’s test if everything works!

There are actually two ways to enable the link in CML:

The conventional one: logging in to CML, choosing the link as shown in the picture, and starting it:

A) Triggering the Syslog Error (old-school way)

1. Return to CML and click on the link between datacenter_client02 and the nxos9kv-01 switch.

2. Then click on START as seen in the image below.

Model Context Protocol (MCP) Server for Cisco Modeling Labs (CML)

The second way builds on the knowledge you gained in Lab Task 6: we now introduce the MCP server for CML to enable the link.

Log in to LibreChat, enable the CML-MCP-Server, choose Claude Sonnet 4.5 as the LLM, and use the following prompt to start the link in CML.

Attention
Please replace the pod number with your own pod in the following [Prompt].
yaml
POD01
[Prompt]: Start the link between nxos9kv-01 and datacenter-client02 in lab POD-01

Please follow the pictures

3. Back in CML: Log into your Nexus switch: right-click on the device -> Console

4. Log into the datacenter_client02 and execute:

bash
POD01
ping 10.0.1.1

5. This will cause a Duplicate ARP error on the Nexus switch. Return to your Nexus switch to verify the error. You should see the error message already appearing in the console.

B) Verifying in Splunk (optional)
  1. Run the search query:

    bash
    POD01
    sourcetype="cisco:ios" host="<insert-nxos9kv-host-ip-from-pod>" ARP
  2. Ensure the ARP error is logged.

C) Checking the ServiceNow Ticket
  1. Log into our ServiceNow instance.

  2. Navigate to Incidents.

  3. Locate the newly created ticket.

  4. Verify that the AI-powered solution is included in the ticket.

Attention
Do not open the ServiceNow web page in your RDP session, because it makes it very slow. Please open the ServiceNow web page outside of the RDP session, in the browser on your workstation! Copy & paste the link: https://ven03091.service-now.com/
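
For reference, the webhook's create_service_now_ticket() call most likely talks to ServiceNow's Table API. The sketch below shows how an incident can be created that way, in case you want to experiment; the credentials and field values are placeholders, and this is an assumption about the approach, not the lab service's actual code:

python
import requests

# Minimal sketch: create a ServiceNow incident via the Table API.
# Credentials and field values are placeholders; this is an assumption
# about the approach, not the lab service's actual implementation.
SNOW_INSTANCE = "https://ven03091.service-now.com"
AUTH = ("<username>", "<password>")

incident = {
    "short_description": "Duplicate source IP detected on nxos9kv-01",
    "description": "Issue information from the device plus the LLM-recommended solution goes here.",
}

response = requests.post(
    f"{SNOW_INSTANCE}/api/now/table/incident",
    auth=AUTH,
    json=incident,
    headers={"Accept": "application/json"},
)

print(response.status_code, response.json()["result"]["number"])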

Conclusion

In this lab, you successfully:

  • Configured a Cisco switch to send syslog messages to Splunk.

  • Verified log ingestion in Splunk.

  • Examined the LLM webhook service.

  • Created a Splunk alert to trigger the AI solution.

  • Tested the full workflow, from syslog error detection to AI-enhanced ServiceNow ticket creation.

This AI-driven NetOps automation improves incident response time and helps network engineers resolve issues efficiently. 🚀