Network Performance Troubleshooting Guide

exodata.io

Published on: 19 July 2023

Your network is slow. Someone in the office says “the internet is down” (it’s not down — it’s slow, which is almost worse because it’s harder to diagnose). Before you start rebooting things at random, you need a systematic approach to figure out where the bottleneck actually is.

This is a practical guide to network performance troubleshooting — the tools, commands, and techniques that IT professionals use to isolate problems and fix them. No theory lectures, just the stuff that works.

Step 1: Define the Problem Before You Touch Anything

“The network is slow” isn’t a diagnosis. You need to narrow it down:

  • Is it slow for everyone or just one user?
  • Is it slow for all applications or just one (Teams, a specific web app, file shares)?
  • When did it start? Was anything changed recently (new equipment, a firmware update, a new ISP circuit)?
  • Is it consistently slow or intermittent?

These questions save hours of troubleshooting. If only one user is affected, you’re probably looking at a client-side issue (bad cable, WiFi interference, a machine saturating its own connection with a backup job). If everyone is affected for one application, you’re looking at a WAN or DNS issue. If everything is slow for everyone, start at the core — your firewall, core switch, or ISP circuit.
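The triage logic above can be sketched as a tiny decision helper (a hypothetical function, assuming a POSIX shell):

```shell
#!/bin/sh
# Map the two triage answers (who is affected, what is affected) to the
# layer most likely at fault, mirroring the decision logic above.
# Arguments: scope = "one-user" | "everyone"; apps = "one-app" | "all-apps"
triage() {
  scope="$1"; apps="$2"
  case "$scope:$apps" in
    one-user:*)        echo "client-side: cable, WiFi, or local host load" ;;
    everyone:one-app)  echo "WAN, DNS, or that application's backend" ;;
    everyone:all-apps) echo "core: firewall, core switch, or ISP circuit" ;;
    *)                 echo "unknown: gather more information" ;;
  esac
}

triage everyone all-apps
```

Trivial as it looks, writing the triage down this way forces you to answer the scope questions before touching equipment.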

Step 2: Bandwidth Analysis — How Much Pipe Do You Have?

Before blaming the network, verify what you’re actually working with.

Speed Testing the Right Way

Speedtest.net is fine for a quick gut check, but it tests your connection to a remote server — not your internal network. For accurate internal bandwidth measurement, use iPerf3.

Run iPerf3 in server mode on one machine:

iperf3 -s

Then test from another machine on the same network:

iperf3 -c 192.168.1.50 -t 30 -P 4

This runs a 30-second test with 4 parallel streams. On a gigabit LAN, you should see 900+ Mbps. If you’re seeing around 100 Mbps, a link in the path has negotiated down to Fast Ethernet (check the switch port negotiation — show interface status on Cisco gear). If you’re seeing 300-400 Mbps on WiFi, that’s normal for WiFi 5 (802.11ac) in a real-world environment.
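If you run this test on a schedule, you can sanity-check the result automatically. A sketch that pulls the receiver-side throughput out of an iPerf3 summary and compares it against those expectations (the sample line is illustrative; in practice you would pipe in real iperf3 output):

```shell
#!/bin/sh
# Extract the Mbits/sec figure from an iperf3 receiver summary line and
# classify it. The sample is illustrative output from a run like:
#   iperf3 -c 192.168.1.50 -t 30 -P 4
sample='[SUM]   0.00-30.00  sec  3.28 GBytes   938 Mbits/sec                  receiver'

mbps=$(printf '%s\n' "$sample" | awk '/receiver/ {for (i=1;i<=NF;i++) if ($(i+1)=="Mbits/sec") print $i}')
if [ "${mbps%.*}" -ge 900 ]; then
  echo "OK: $mbps Mbps - healthy gigabit link"
elif [ "${mbps%.*}" -ge 90 ] && [ "${mbps%.*}" -le 110 ]; then
  echo "WARN: $mbps Mbps - looks like a Fast Ethernet negotiation"
else
  echo "WARN: $mbps Mbps - below expectation, investigate"
fi
```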

Interface Utilization

Check whether your uplink ports are saturated. On a Cisco switch:

show interface GigabitEthernet0/1 | include rate

On a FortiGate firewall, check real-time throughput in the dashboard or run:

diagnose netlink interface list

If your 1 Gbps uplink is consistently running above 700-800 Mbps during business hours, you need a bigger pipe or link aggregation (LAG).
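The arithmetic behind that judgment is simple enough to script. A sketch that turns two byte-counter samples into a utilization percentage (on Linux the counters would come from /sys/class/net/eth0/statistics/tx_bytes, sampled twice; the numbers below are illustrative):

```shell
#!/bin/sh
# Utilization % = (bytes_delta * 8) / (interval_sec * link_bps) * 100.
# The counter values here are illustrative samples taken one second apart.
bytes_t0=1200000000
bytes_t1=1210000000   # 10 MB sent during the interval
interval=1            # seconds between samples
link_bps=1000000000   # 1 Gbps link

util=$(( (bytes_t1 - bytes_t0) * 8 * 100 / (interval * link_bps) ))
echo "uplink utilization: ${util}%"
```

Run it against your real uplink counters during business hours; sustained values in the 70-80% range are the signal to upsize or aggregate.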

Step 3: Latency Diagnosis — Where’s the Delay?

High latency kills user experience far more than low bandwidth does. A 10 Mbps connection with 20ms latency feels faster than a 100 Mbps connection with 200ms latency for interactive applications.

Traceroute

The classic starting point. On Windows:

tracert 8.8.8.8

On macOS/Linux:

traceroute -I 8.8.8.8

The -I flag uses ICMP instead of UDP, which gives more reliable results since many routers don’t respond to UDP probes. Look for hops where latency suddenly jumps — that’s where the bottleneck lives. If the jump happens between your firewall and the first ISP hop, the issue is your WAN circuit. If it happens 8 hops out, it’s an ISP backbone issue you can’t fix, but you can route around it.

MTR — Traceroute on Steroids

MTR (My Traceroute) combines traceroute and ping into a continuous test that shows packet loss and latency at every hop over time. It’s significantly more useful than a single traceroute because intermittent problems only show up if you watch long enough.

On Linux/macOS:

mtr -r -c 100 8.8.8.8

This runs 100 cycles and produces a report. Look for:

  • Packet loss at a single hop that doesn’t propagate: The router at that hop is probably just deprioritizing ICMP — not a real problem.
  • Packet loss that continues through subsequent hops: Real packet loss. The problem is at the first hop where loss appears.
  • High jitter (big variation in latency at a hop): Could indicate congestion or a failing interface.
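Those rules can be applied mechanically to a report. A sketch that flags lossy hops in an mtr -r report (the report text below is illustrative sample output):

```shell
#!/bin/sh
# Flag hops in an "mtr -r" report whose loss column exceeds 1%. Remember the
# rule above: loss is only real if it persists through later hops.
report='  1.|-- 192.168.1.1       0.0%   100    0.5   0.6   0.4   2.1   0.2
  2.|-- 10.20.0.1         0.0%   100    8.1   9.0   7.8  22.4   1.9
  3.|-- 100.64.12.7      40.0%   100    9.2   9.5   8.9  14.0   0.7
  4.|-- 203.0.113.9       0.0%   100   12.3  12.8  11.9  30.5   2.2
  5.|-- 8.8.8.8           3.0%   100   13.1  13.4  12.8  19.9   1.1'

printf '%s\n' "$report" | awk '$3+0 > 1 {print "loss at hop", $1, $2, "->", $3}'
```

In this sample, hop 3’s 40% loss vanishes by hop 4, so it is almost certainly ICMP deprioritization; hop 5’s 3% loss at the destination is the one worth chasing.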

Ping with Statistics

A simple extended ping gives you baseline latency and jitter numbers:

ping -c 100 8.8.8.8

On Windows:

ping -n 100 8.8.8.8

The summary statistics at the end show min/avg/max/stddev. Standard deviation over 10-15ms on a wired connection suggests an unstable link.
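If your ping build doesn’t print stddev (Windows doesn’t), you can compute it yourself. A sketch using illustrative RTT samples; real ones can be extracted with something like ping -c 100 8.8.8.8 | grep -o 'time=[0-9.]*' | cut -d= -f2:

```shell
#!/bin/sh
# Compute average and standard deviation of RTT samples, the same figures
# ping's summary reports. The values below are illustrative: a mostly stable
# link with one large spike.
rtts='12.1 12.4 11.9 12.2 55.8 12.0 12.3 12.1'

printf '%s\n' $rtts | awk '
  { sum += $1; sumsq += $1 * $1; n++ }
  END {
    avg = sum / n
    stddev = sqrt(sumsq / n - avg * avg)
    printf "avg=%.1f ms stddev=%.1f ms\n", avg, stddev
  }'
```

Here a single 55.8ms spike drags the stddev to about 14ms, right in the “unstable wired link” territory described above, even though most samples look fine.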

Step 4: DNS — The Silent Performance Killer

Slow DNS resolution adds delay to every single web request, API call, and cloud service connection. Users experience it as “the internet is slow” but can’t pinpoint why.

Test DNS Resolution Time

On Linux/macOS:

dig exodata.io | grep "Query time"

On Windows:

Measure-Command { Resolve-DnsName exodata.io } | Select TotalMilliseconds

DNS resolution should take under 50ms. If you’re seeing 200-500ms, your DNS server is either overloaded, misconfigured, or too far away.
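A sketch that applies that 50ms threshold to dig’s output (the sample line is illustrative; in practice you would feed in real output from dig exodata.io):

```shell
#!/bin/sh
# Extract the resolution time from dig's "Query time" line and flag
# anything over 50 ms. The sample line is illustrative.
sample=';; Query time: 212 msec'

ms=$(printf '%s\n' "$sample" | awk '/Query time/ {print $4}')
if [ "$ms" -gt 50 ]; then
  echo "SLOW: ${ms} ms - check your resolver"
else
  echo "OK: ${ms} ms"
fi
```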

Common DNS Fixes

  • Switch to a faster resolver. If you’re using your ISP’s DNS servers, try Cloudflare (1.1.1.1) or Google (8.8.8.8). For internal DNS, make sure your Active Directory DNS servers are responsive and not running on overloaded domain controllers.
  • Check for DNS forwarding chains. Sometimes a local DNS server forwards to another forwarder that forwards again — each hop adds latency. Keep the chain short.
  • Enable DNS caching. Most firewalls (FortiGate, Meraki, pfSense) can act as a caching DNS forwarder. This means the second request for the same domain resolves instantly from cache.
  • Look for excessive NXDOMAIN responses. Run a packet capture filtered for DNS (Wireshark filter: dns) and check for failed lookups. Malware often generates thousands of DNS queries to random domains, which can slow down the entire DNS infrastructure.
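To find the machine generating the junk queries, count NXDOMAIN responses per client. A sketch over an illustrative log format; real data could come from tshark with a dns.flags.rcode == 3 display filter or from your DNS server’s query log:

```shell
#!/bin/sh
# Count NXDOMAIN responses per client IP to spot a host spraying junk
# lookups. The log lines below are illustrative; field 1 is the client,
# field 3 the response code.
log='10.0.0.21 qjx7f2.example NXDOMAIN
10.0.0.21 zk19pa.example NXDOMAIN
10.0.0.21 m3vv0q.example NXDOMAIN
10.0.0.45 intranet.local NXDOMAIN'

printf '%s\n' "$log" | awk '$3 == "NXDOMAIN" {count[$1]++}
  END {for (c in count) print count[c], c}' | sort -rn
```

A single client at the top of this list with hundreds of failures to random-looking domains is a strong malware indicator, separate from any performance concern.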

Step 5: TCP Tuning for WAN Performance

TCP wasn’t designed for high-latency, high-bandwidth links. The default TCP window size on many operating systems is too small for WAN connections, which means you’re not filling the pipe.

Calculate Bandwidth-Delay Product

The formula: Bandwidth (bits/sec) x Round-Trip Time (seconds) = TCP window size needed

Example: A 100 Mbps link with 50ms RTT needs a window of at least 625 KB to fill the pipe. The default Windows TCP window size is 64 KB — meaning you’ll never use more than about 10 Mbps on that link without tuning.
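The same arithmetic, worked both directions, as a small shell sketch:

```shell
#!/bin/sh
# Bandwidth-delay product: the window size needed to keep the pipe full.
# bdp_bytes = bandwidth_bps * rtt_sec / 8
bandwidth_bps=100000000   # 100 Mbps link
rtt_ms=50                 # 50 ms round trip

bdp_bytes=$(( bandwidth_bps * rtt_ms / 1000 / 8 ))
echo "window needed: ${bdp_bytes} bytes (~$(( bdp_bytes / 1000 )) KB)"

# The inverse: the ceiling a fixed 64 KB window imposes on this RTT.
window=65536
max_bps=$(( window * 8 * 1000 / rtt_ms ))
echo "64 KB window tops out at ~$(( max_bps / 1000000 )) Mbps"
```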

Windows TCP Tuning

Check current settings:

netsh interface tcp show global

Enable receive window auto-tuning (should be on by default, but sometimes gets disabled by “optimization” tools):

netsh interface tcp set global autotuninglevel=normal

Linux TCP Tuning

Check and increase buffer sizes in /etc/sysctl.conf:

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

Apply with sysctl -p. This is particularly impactful for file transfers between offices over VPN tunnels.

Step 6: QoS — Stop Letting Bulk Traffic Ruin Everything

Without Quality of Service policies, all traffic is treated equally. That means a Windows Update download competes with your CEO’s Teams call for the same bandwidth. QoS fixes this by classifying and prioritizing traffic.

Basic QoS Priority Model

Priority | Traffic Type             | DSCP Marking | Why
Highest  | Voice (VoIP)             | EF (46)      | Zero tolerance for latency/jitter
High     | Video conferencing       | AF41 (34)    | Low latency needed
Medium   | Business apps (ERP, CRM) | AF21 (18)    | Important but tolerates some delay
Low      | Web browsing, email      | AF11 (10)    | Best effort is fine
Lowest   | Backups, updates         | CS1 (8)      | Can wait

On a FortiGate, you create traffic shaping policies that match application signatures (it can identify Teams, Zoom, Salesforce, etc.) and assign bandwidth guarantees and limits. On Meraki, traffic shaping is configured per-SSID for WiFi and per-port for switches.

The key insight: QoS is most effective on your WAN link, not your LAN. Internally, you usually have plenty of bandwidth. It’s the 100-500 Mbps internet circuit where contention happens.
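As a concrete sketch of the priority table on a Linux edge router (a hypothetical setup, not a FortiGate or Meraki recipe), the markings can be applied with iptables; the UDP port range follows Microsoft’s documented Teams media ports, and the backup example assumes rsync on its standard port:

```shell
#!/bin/sh
# Config sketch: DSCP-mark outbound traffic per the priority model above.
# Requires root; adapt interfaces/ports to your environment.

# Real-time media (Teams uses UDP 3478-3481 per Microsoft's documentation)
# gets EF so it jumps the queue.
iptables -t mangle -A POSTROUTING -p udp --dport 3478:3481 \
  -j DSCP --set-dscp-class EF

# Bulk traffic (rsync backups on TCP 873) gets CS1 so it yields first.
iptables -t mangle -A POSTROUTING -p tcp --dport 873 \
  -j DSCP --set-dscp-class CS1
```

Marking only matters if the device doing the queuing honors it, so apply the classification on or upstream of whatever box shapes your WAN link.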

Step 7: Caching and CDN — Reduce the Distance

Local Caching

If multiple users access the same cloud resources, a local caching proxy reduces redundant downloads. Windows Server BranchCache handles this for file shares and WSUS updates. For web content, a transparent proxy like Squid can cache frequently accessed resources.

CDN Configuration

If you’re hosting web applications, put them behind a CDN like Cloudflare, AWS CloudFront, or Azure CDN. A CDN caches static assets at edge locations worldwide, so a user in Nashville isn’t pulling JavaScript files from a server in Frankfurt. The performance difference is dramatic — page load times drop by 40-60% for geographically distributed users.

For internal applications, the equivalent strategy is to host application servers close to users. If your ERP runs in AWS us-east-1 but most of your users are in Los Angeles, every click has 70ms of physics-imposed latency that no amount of tuning will fix.

Common Bottlenecks and Their Fixes

Here are the problems I see most often in small and mid-size networks:

Problem: WiFi is slow in conference rooms. Fix: Add a dedicated AP in the conference room. A single Meraki MR36 or Aruba AP-505 handles 30-40 simultaneous clients comfortably.

Problem: File transfers to a branch office are painfully slow over VPN. Fix: Check TCP window size (see above). Also verify the VPN tunnel isn’t running through an overloaded firewall — check the firewall’s CPU utilization during transfers.

Problem: Teams/Zoom calls drop or have choppy audio. Fix: Implement QoS on your firewall to prioritize real-time media traffic. Also check for WiFi-to-Ethernet transitions — if users are on WiFi, 802.11 retransmissions add jitter that VoIP can’t tolerate.

Problem: Everything slows down at 2 PM every day. Fix: Check for scheduled tasks — cloud backups, antivirus scans, Windows updates. Reschedule these to off-hours or throttle their bandwidth with QoS.

Problem: Cloud applications (Microsoft 365, Salesforce) are slow despite fast internet. Fix: Check your DNS resolution time and verify you’re not backhauling cloud traffic through a VPN or proxy. Microsoft publishes their IP ranges — add direct routes for Microsoft 365 traffic so it goes straight to the internet.

When to Call In Help

If you’ve worked through these steps and the problem persists, you’re likely dealing with something that requires deeper packet analysis (Wireshark), ISP-level troubleshooting, or architectural changes to your network. That’s where a managed network provider earns their money.

Exodata’s IT operations team handles network performance troubleshooting and optimization for businesses across Nashville and the Southeast. Whether you need help diagnosing a specific issue or want ongoing network management to prevent problems before they start, our managed services team can help you get your network performing the way it should.


Need help with your IT infrastructure? Exodata helps businesses modernize, secure, and manage their infrastructure environments. Contact us to discuss your needs.