Azure VPN Gateway is the backbone of hybrid connectivity for thousands of organizations. It bridges on-premises networks, remote users, and cloud workloads through encrypted tunnels over the public internet. But when a VPN tunnel goes down or performance degrades, the impact is immediate: applications become unreachable, file shares time out, and users start filing tickets.
This guide walks through the most common Azure VPN Gateway issues across Site-to-Site (S2S), Point-to-Site (P2S), and VNet-to-VNet configurations. Each section follows a problem, diagnosis, solution format so you can get back to stable connectivity as quickly as possible. Whether you are managing a single tunnel or a complex cloud engineering environment with dozens of connections, the troubleshooting steps below apply.
Understanding Azure VPN Gateway Basics
Before diving into specific issues, it helps to understand the components involved.
Azure VPN Gateway is a virtual network gateway resource deployed within a dedicated gateway subnet in your Azure Virtual Network (VNet). It supports two VPN types:
- Route-based (RouteBased): Uses any-to-any traffic selectors and is required for most modern configurations, including P2S, VNet-to-VNet, and multi-site S2S connections. This is the recommended type for nearly all deployments.
- Policy-based (PolicyBased): Uses specific traffic selectors defined by combinations of address prefixes. Limited to a single S2S tunnel; does not support P2S or VNet-to-VNet.
VPN Gateway SKUs
Choosing the right SKU directly affects throughput, tunnel count, and available features. Mismatched SKU expectations are a frequent source of performance complaints.
| SKU | Max S2S Tunnels | Max P2S Connections | Aggregate Throughput |
|---|---|---|---|
| VpnGw1 / VpnGw1AZ | 30 | 250 | 650 Mbps |
| VpnGw2 / VpnGw2AZ | 30 | 500 | 1.25 Gbps |
| VpnGw3 / VpnGw3AZ | 30 | 1,000 | 2.5 Gbps |
| VpnGw4 / VpnGw4AZ | 100 | 5,000 | 5 Gbps |
| VpnGw5 / VpnGw5AZ | 100 | 10,000 | 10 Gbps |
The AZ variants support zone redundancy and are recommended for production workloads. For the full SKU comparison, see the Microsoft VPN Gateway SKU documentation.
You can check your current gateway SKU using the Azure CLI:
az network vnet-gateway show \
--name MyVpnGateway \
--resource-group MyResourceGroup \
--query "{Name:name, SKU:sku.name, VpnType:vpnType, GatewayType:gatewayType}" \
--output table
Site-to-Site (S2S) Tunnel Issues
S2S connections link your on-premises network to Azure through an IPsec/IKE tunnel. These are the most common VPN Gateway deployments and the most frequent source of support tickets.
Problem: Tunnel Will Not Connect
Symptoms: The connection resource in Azure shows a status of “Not Connected” or “Connecting” indefinitely. No traffic passes between on-premises and Azure.
Diagnosis:
Start by checking the connection status and verifying the configuration:
az network vpn-connection show \
--name MyS2SConnection \
--resource-group MyResourceGroup \
--query "{Status:connectionStatus, Protocol:connectionProtocol, EgressBytes:egressBytesTransferred, IngressBytes:ingressBytesTransferred}" \
--output table
If the status shows “Not Connected,” check the following in order:
- Pre-shared key (PSK) mismatch. This is the single most common cause of S2S tunnel failure. The PSK must match exactly on the Azure connection resource and on the on-premises device. Keys are case-sensitive and whitespace matters.

Verify the PSK configured on the Azure side:

az network vpn-connection shared-key show \
  --connection-name MyS2SConnection \
  --resource-group MyResourceGroup

- On-premises public IP mismatch. The IP address configured in the Azure Local Network Gateway must match the actual public IP of your on-premises VPN device. If your ISP has changed your public IP or you are behind a NAT device, the tunnel will fail.

az network local-gateway show \
  --name MyLocalGateway \
  --resource-group MyResourceGroup \
  --query "{GatewayIP:gatewayIpAddress, AddressPrefixes:localNetworkAddressSpace.addressPrefixes}"

- Firewall or NAT blocking UDP 500 and 4500. IPsec requires UDP port 500 (IKE) and UDP port 4500 (NAT-T) to be open between the two endpoints. Many corporate firewalls block these by default. Verify with your network team that both ports are allowed outbound and inbound.

- IKE version mismatch. Azure supports both IKEv1 and IKEv2. If your on-premises device is configured for IKEv1 but the Azure connection expects IKEv2 (or vice versa), the handshake will fail with little indication of why. Ensure both sides agree on the same version.
Solution:
Correct any mismatches identified above. If the PSK needs to be reset:
az network vpn-connection shared-key reset \
--connection-name MyS2SConnection \
--resource-group MyResourceGroup \
--key-length 32
Then set the same key on your on-premises device.
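If you prefer to generate the key yourself and push the same value to both sides, here is a minimal sketch using openssl; the connection name and resource group are placeholders, and the az step assumes an authenticated session:

```shell
# Generate a random 32-character pre-shared key (24 random bytes -> 32 Base64 characters)
PSK=$(openssl rand -base64 24)
echo "New PSK: $PSK"

# Set it on the Azure connection (placeholder names; uncomment with a live az session)
# az network vpn-connection shared-key update \
#   --connection-name MyS2SConnection \
#   --resource-group MyResourceGroup \
#   --value "$PSK"
```

Configure the identical value on the on-premises device, pasting it carefully to avoid stray whitespace.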
Problem: IKE Phase 1 or Phase 2 Negotiation Failures
Symptoms: The tunnel connection drops during negotiation. Logs on the on-premises device show “no proposal chosen” or “SA payload mismatch.”
Diagnosis:
IKE Phase 1 (Main Mode or Aggressive Mode) establishes the security association between the two peers. Phase 2 (Quick Mode) negotiates the actual IPsec tunnel parameters. Mismatches in either phase prevent the tunnel from forming.
Common mismatches include:
- Encryption algorithm: Azure supports AES-128, AES-192, AES-256, and others. Both sides must offer at least one matching algorithm.
- Integrity/hash algorithm: SHA-1, SHA-256, SHA-384, or MD5. SHA-256 or higher is recommended.
- DH Group: Group 2, 14, 24, ECP256, ECP384. Both sides must agree on at least one group.
- SA lifetime: Azure defaults to 28,800 seconds for IKE SA and 3,600 seconds for IPsec SA. A mismatch in lifetimes can cause rekey failures and intermittent drops.
You can configure a custom IPsec/IKE policy on the Azure connection to force specific parameters:
az network vpn-connection ipsec-policy add \
  --connection-name MyS2SConnection \
  --resource-group MyResourceGroup \
  --dh-group DHGroup14 \
  --ike-encryption AES256 \
  --ike-integrity SHA256 \
  --ipsec-encryption AES256 \
  --ipsec-integrity SHA256 \
  --pfs-group PFS2048 \
  --sa-lifetime 3600 \
  --sa-max-size 102400000
Solution:
Align the IPsec/IKE parameters on both sides. Document the exact algorithms, DH groups, and SA lifetimes in use. For reference, Microsoft publishes default IPsec/IKE parameters for Azure VPN Gateway.
If your on-premises device is an older model with limited cipher support, you may need to use a custom policy on the Azure side to match what the device can offer. Keep in mind that weaker algorithms like DES and MD5 should be avoided. Aligning on AES-256 with SHA-256 and DH Group 14 is a strong baseline that most modern devices support.
Problem: Traffic Selectors Mismatch
Symptoms: The tunnel shows as “Connected” in Azure, but traffic for certain subnets does not flow. Some address ranges work while others do not.
Diagnosis:
This is a traffic selector problem. In a policy-based configuration, each pair of local and remote address prefixes creates a separate IPsec SA. If the on-premises device has different address prefixes configured than what Azure expects, traffic for the mismatched ranges will be silently dropped.
Check the address spaces configured on the Azure Local Network Gateway:
az network local-gateway show \
--name MyLocalGateway \
--resource-group MyResourceGroup \
--query "localNetworkAddressSpace.addressPrefixes"
Compare this with the address prefixes configured on your on-premises device. They must match exactly.
Solution:
For route-based VPN gateways, Azure uses any-to-any (0.0.0.0/0 to 0.0.0.0/0) traffic selectors by default, which simplifies configuration. If you are using a route-based gateway but still experiencing selective traffic failures, enable the usePolicyBasedTrafficSelectors option:
az network vpn-connection update \
--name MyS2SConnection \
--resource-group MyResourceGroup \
--use-policy-based-traffic-selectors true
This is particularly useful when connecting to on-premises firewalls that require policy-based traffic selectors even though your Azure gateway is route-based. Note that enabling this option requires a custom IPsec/IKE policy to be configured on the connection. For more context on network segmentation concepts, see our guide on VLANs vs subnets in the cloud.
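If the comparison in the diagnosis showed that the Azure-side prefixes are the ones that are wrong, the Local Network Gateway can be corrected in place. A sketch with placeholder names and example prefixes (this call replaces the existing prefix list, so include every on-premises range):

```shell
# Replace the address prefixes on the Local Network Gateway
# (placeholder names; prefixes shown are examples, not recommendations)
az network local-gateway update \
  --name MyLocalGateway \
  --resource-group MyResourceGroup \
  --local-address-prefixes 10.10.0.0/16 10.20.0.0/24
```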
Point-to-Site (P2S) Issues
P2S connections allow individual clients (laptops, workstations) to connect to Azure VNets. These are commonly used for remote workers or administrators who need direct access to cloud resources without a full S2S tunnel.
Problem: Client Authentication Failures
Symptoms: The VPN client fails to connect with errors like “The remote connection was denied because the user name and password combination you offered is not recognized” or “Error 798: A certificate could not be found.”
Diagnosis:
P2S supports three authentication methods. Determine which one you are using:
- Azure certificate authentication: Requires a root certificate uploaded to the VPN Gateway and client certificates installed on each device.
- RADIUS authentication: Delegates authentication to a RADIUS server (e.g., NPS on Windows Server).
- Azure AD / Entra ID authentication: Uses Azure AD tokens for authentication. Only available with the OpenVPN protocol.
For certificate-based authentication, check that:
- The root certificate uploaded to Azure has not expired. You can verify this in the Azure portal under the VPN Gateway Point-to-Site configuration blade.
- The client certificate was issued by the same root CA that was uploaded to Azure.
- The client certificate has not expired and has not been revoked.
# Check client certificate details on Windows
Get-ChildItem -Path Cert:\CurrentUser\My | Where-Object {
$_.Subject -like "*VPN*"
} | Format-List Subject, NotAfter, Thumbprint, Issuer
For Entra ID authentication issues, refer to our Azure SSO troubleshooting guide as many of the same token and conditional access issues apply.
Solution:
If the root certificate has expired, generate a new root and client certificate pair, upload the new root certificate to the VPN Gateway, and distribute new client certificates. If using Entra ID, verify that the VPN app registration exists in your tenant, the user is assigned, and admin consent has been granted.
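The usual Windows path uses PowerShell's New-SelfSignedCertificate, but an equivalent root/client pair can be produced with openssl on Linux or macOS. A sketch with placeholder file names; note that Windows P2S clients typically need the client certificate converted to a .pfx before import:

```shell
# Self-signed root CA valid for two years; P2SRoot.pem holds the public cert Azure needs
openssl req -x509 -newkey rsa:2048 -nodes -days 730 \
  -keyout P2SRoot.key -out P2SRoot.pem -subj "/CN=P2SRootCert"

# Client certificate signed by that root, valid for one year
openssl req -newkey rsa:2048 -nodes \
  -keyout client.key -out client.csr -subj "/CN=P2SClientCert"
openssl x509 -req -in client.csr -CA P2SRoot.pem -CAkey P2SRoot.key \
  -CAcreateserial -out client.pem -days 365

# Sanity checks: the chain must validate and the expiry date must be in the future
openssl verify -CAfile P2SRoot.pem client.pem
openssl x509 -in client.pem -noout -enddate
```

What you paste into the portal's root certificate field is the Base64 body of P2SRoot.pem, without the BEGIN/END header and footer lines.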
Problem: DNS Resolution Fails Over P2S
Symptoms: Users connect to the VPN successfully but cannot resolve internal hostnames. IP-based access works fine.
Diagnosis:
By default, P2S VPN clients do not automatically use Azure DNS or your custom DNS servers. The client continues using its local DNS resolver, which cannot resolve private Azure DNS zones or on-premises DNS records.
Check the DNS servers configured on the VNet:
az network vnet show \
--name MyVNet \
--resource-group MyResourceGroup \
--query "dhcpOptions.dnsServers"
Solution:
Configure custom DNS servers in the VPN client configuration or push DNS settings through the VPN Gateway. For Azure DNS Private Resolver or custom DNS, ensure the DNS server IP is reachable from the VPN client address pool and that the server is configured to forward queries appropriately.
On Windows clients, you may need to add DNS suffixes manually or via Group Policy to ensure proper name resolution for split-DNS environments.
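To push a custom DNS server to future P2S clients, set it on the VNet and regenerate the client profile. A sketch with placeholder names and a hypothetical DNS server IP; existing clients must re-download the profile to pick up the change:

```shell
# Point the VNet at a custom DNS server (10.0.0.4 is a hypothetical example)
az network vnet update \
  --name MyVNet \
  --resource-group MyResourceGroup \
  --dns-servers 10.0.0.4

# Regenerate the P2S client configuration package so new downloads include the DNS setting
az network vnet-gateway vpn-client generate \
  --name MyVpnGateway \
  --resource-group MyResourceGroup \
  --output tsv
```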
Throughput Problems
Slow VPN performance is one of the most common complaints, and it frequently has nothing to do with the VPN tunnel itself.
Problem: Throughput Does Not Meet Expectations
Symptoms: File transfers over VPN are significantly slower than expected. Applications feel sluggish. Speed tests show throughput far below the SKU’s rated capacity.
Diagnosis:
- Verify your SKU's aggregate throughput limit. A VpnGw1 SKU is rated for 650 Mbps aggregate across all tunnels. If you have multiple S2S tunnels and P2S clients, they share this capacity. Check the SKU table above.
- Check whether a single TCP stream is the bottleneck. A single TCP connection over an IPsec tunnel typically maxes out at 200-300 Mbps due to encryption overhead and TCP window scaling. Throughput benchmarks should use multiple parallel streams.
- MTU and fragmentation. IPsec encapsulation adds overhead (typically 50-80 bytes). If the original packet is at the standard 1500-byte MTU, the encapsulated packet may exceed the path MTU, causing fragmentation and reassembly delays. Set the MTU on the VPN interface of your on-premises device to 1400 bytes to avoid this; Microsoft also recommends clamping TCP MSS to 1350.
- Internet path quality. VPN Gateway uses the public internet. Latency, jitter, and packet loss on the internet path directly affect throughput. Run a traceroute or MTR from on-premises to the Azure gateway's public IP to identify degraded hops.
# Check gateway metrics for throughput and packet drops
az monitor metrics list \
--resource "/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Network/virtualNetworkGateways/MyVpnGateway" \
--metric "TunnelIngressBytes,TunnelEgressBytes,TunnelIngressPacketDropCount" \
--interval PT1H \
--output table
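To confirm whether fragmentation is in play (the MTU check above), you can probe the path MTU from an on-premises Linux host with don't-fragment pings. The target IP below is a placeholder for a VM reachable through the tunnel; an ICMP payload of 1372 bytes plus 28 bytes of IP and ICMP headers makes a 1400-byte packet:

```shell
# Send a single DF-bit ping sized to a 1400-byte packet (1372 payload + 28 header).
# If this fails while smaller sizes succeed, the tunnel path MTU is below 1400.
ping -M do -c 1 -s 1372 10.1.0.4   # placeholder: a VM IP across the tunnel
```

Binary-search the payload size to find the largest that passes, then set the tunnel MTU accordingly.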
Solution:
- Upgrade the SKU if aggregate throughput is consistently near the limit. Use az network vnet-gateway update --sku VpnGw2 (with your gateway name and resource group) to resize without downtime, within the same generation.
- Enable multiple tunnels with ECMP (Equal-Cost Multi-Path) in an active-active configuration to distribute load.
- Reduce the MTU on your on-premises tunnel interface to 1400 to eliminate fragmentation.
- For sustained high-throughput needs, consider whether a VPN is the right solution at all. ExpressRoute provides dedicated private connectivity with predictable performance. See the section on escalation below.
Intermittent Disconnections
Problem: VPN Tunnel Drops and Reconnects Periodically
Symptoms: The S2S tunnel disconnects for a few seconds to a few minutes, then reconnects automatically. Users experience brief outages every few hours.
Diagnosis:
Intermittent disconnections typically fall into a few categories:
- IKE SA rekey failures. When the SA lifetime expires, both sides must renegotiate. If there is a timing mismatch or one side is slow to respond, the tunnel drops briefly during rekey. This is especially common when the SA lifetime is set very short (e.g., 900 seconds).
- Dead Peer Detection (DPD). Azure sends DPD probes every 10 seconds and considers the peer dead after 45 seconds of no response. If your on-premises device has aggressive DPD settings or its responses are delayed by firewall inspection, the tunnel may be torn down prematurely.
- On-premises device instability. High CPU utilization on the on-premises firewall or router during peak hours can cause it to miss DPD probes or fail to process rekey requests in time.
- Azure platform maintenance. Azure periodically performs maintenance on the underlying gateway infrastructure. Active-passive (single-active-instance) gateways may experience brief disconnections during these events.
Solution:
- Set SA lifetimes to reasonable values (28,800 seconds for IKE, 3,600 for IPsec) and ensure both sides match.
- Configure your on-premises device’s DPD interval to be no more aggressive than Azure’s (10-second intervals, 45-second timeout).
- Deploy an active-active VPN Gateway to maintain connectivity during Azure platform maintenance. This requires two gateway instances and two tunnels to your on-premises device.
- Monitor tunnel health using Azure Monitor alerts:
az monitor metrics alert create \
--name "VPN-Tunnel-Down" \
--resource-group MyResourceGroup \
--scopes "/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Network/virtualNetworkGateways/MyVpnGateway" \
--condition "avg TunnelAverageBandwidth < 1" \
--window-size 5m \
--evaluation-frequency 1m \
--action "/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Insights/actionGroups/NetworkAlerts"
VNet-to-VNet Issues
VNet-to-VNet connections use VPN Gateway to link two Azure VNets, often across regions. While similar to S2S in implementation, they have their own set of common issues.
Problem: VNet-to-VNet Connection Shows Connected but No Traffic Flows
Symptoms: The connection status shows “Connected” and IKE negotiation succeeds, but pings and application traffic between VNets fail.
Diagnosis:
- Overlapping address spaces. If both VNets use the same or overlapping IP ranges, routing will fail. Azure cannot route traffic between two networks with the same address space.

# Check address spaces for both VNets
az network vnet show --name VNet1 --resource-group RG1 --query "addressSpace.addressPrefixes"
az network vnet show --name VNet2 --resource-group RG2 --query "addressSpace.addressPrefixes"

- NSG rules blocking traffic. Network Security Groups on subnets in either VNet may be blocking inter-VNet traffic. Check both inbound and outbound rules.
- User-Defined Routes (UDRs) overriding gateway routes. If you have custom route tables with a 0.0.0.0/0 route pointing to an NVA (Network Virtual Appliance), the return traffic may not route back through the VPN gateway correctly.
- Missing gateway transit configuration. If one VNet is peered with another VNet that has the VPN Gateway, ensure “Allow Gateway Transit” and “Use Remote Gateways” are configured correctly on the peering.
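For the UDR and peering checks above, the effective route table on a VM NIC shows exactly which next hop Azure has chosen for each prefix. A sketch with placeholder NIC and resource group names (requires an authenticated az session and a running VM):

```shell
# Dump the routes actually applied to a NIC in the affected VNet.
# The remote VNet's prefix should appear with nextHopType VirtualNetworkGateway;
# if it shows VirtualAppliance or None instead, a UDR or peering setting is at fault.
az network nic show-effective-route-table \
  --name MyVmNic \
  --resource-group MyResourceGroup \
  --output table
```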
Solution:
Re-address one of the VNets if there is overlap. Review NSG flow logs to identify blocked traffic. For environments with NVAs and complex routing, ensure UDRs include explicit routes for the remote VNet’s address space pointing to the VPN Gateway. For a deeper understanding of how these network constructs interact, see our post on VLANs vs subnets in the cloud.
For production multi-VNet architectures, consider whether VNet peering (which operates over the Azure backbone without encryption overhead) would be a better fit than VNet-to-VNet VPN. Peering offers lower latency and higher throughput. Incorporate this decision into your Azure landing zone design to avoid rearchitecting later.
Diagnostic Tools and Techniques
Azure provides several built-in tools for VPN Gateway troubleshooting. Knowing which tool to reach for saves significant time.
VPN Diagnostics (Connection Troubleshoot)
The Network Watcher connection troubleshoot tool runs a series of automated checks against your VPN connection:
az network watcher troubleshooting start \
--resource "/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Network/connections/MyS2SConnection" \
--resource-type "Microsoft.Network/connections" \
--storage-account "/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/mydiagstorageacct" \
--storage-path "https://mydiagstorageacct.blob.core.windows.net/vpn-troubleshoot"
This generates a detailed diagnostic log that identifies issues including SA mismatches, unreachable peers, and certificate problems. The results are written to the specified storage account as a JSON file.
Packet Capture on VPN Gateway
For deeper analysis, capture packets directly on the VPN Gateway:
# Start packet capture (PowerShell)
$gateway = Get-AzVirtualNetworkGateway -Name "MyVpnGateway" -ResourceGroupName "MyResourceGroup"
Start-AzVirtualNetworkGatewayPacketCapture -InputObject $gateway
# After reproducing the issue, stop the capture
Stop-AzVirtualNetworkGatewayPacketCapture -InputObject $gateway `
-SasUrl "https://mydiagstorageacct.blob.core.windows.net/captures?sv=..."
The resulting .cap file can be opened in Wireshark for detailed protocol analysis. Look for IKE negotiation failures, retransmitted packets, and ESP sequence number gaps.
Log Analytics and Azure Monitor
For ongoing monitoring, send VPN Gateway diagnostic logs to a Log Analytics workspace:
az monitor diagnostic-settings create \
--name "VpnGatewayDiagnostics" \
--resource "/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Network/virtualNetworkGateways/MyVpnGateway" \
--workspace "/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.OperationalInsights/workspaces/MyWorkspace" \
--logs '[{"category":"IKEDiagnosticLog","enabled":true},{"category":"RouteDiagnosticLog","enabled":true},{"category":"TunnelDiagnosticLog","enabled":true},{"category":"P2SDiagnosticLog","enabled":true}]'
Once logs are flowing, use Kusto queries to identify patterns:
AzureDiagnostics
| where ResourceType == "VIRTUALNETWORKGATEWAYS"
| where Category == "IKEDiagnosticLog"
| where Message_s contains "FAILED" or Message_s contains "error"
| project TimeGenerated, Message_s, RemoteIP_s
| order by TimeGenerated desc
| take 50
These logs are invaluable for identifying recurring rekey failures, DPD timeouts, and authentication errors over time. For broader monitoring strategies, our guide on building Azure dashboards covers how to visualize these metrics alongside other infrastructure health data.
Security Considerations
VPN Gateway is a critical component of your network perimeter. Treat it accordingly:
- Rotate pre-shared keys regularly. At minimum, rotate PSKs annually or after any personnel changes. Use keys of at least 32 characters with mixed case, numbers, and symbols.
- Use IKEv2 with strong cryptography. AES-256 encryption, SHA-256 integrity, and DH Group 14 or higher should be the baseline. Avoid IKEv1, DES, and MD5 in production.
- Restrict access to the GatewaySubnet. Do not deploy NSGs directly on the GatewaySubnet (Azure does not support this and it can break gateway functionality), but do ensure that surrounding subnets and on-premises access are governed by least-privilege rules.
- Integrate with your zero trust architecture. VPN connectivity should be one layer in a broader zero trust security strategy. Combine VPN access with Conditional Access policies, MFA, and device compliance checks.
- Use Azure Bastion for administrative access. Rather than routing admin RDP/SSH traffic over VPN, consider Azure Bastion for direct, browser-based VM access without exposing management ports.
When to Escalate to ExpressRoute
VPN Gateway is an excellent solution for many hybrid connectivity scenarios, but it has inherent limitations tied to its use of the public internet. Consider migrating to Azure ExpressRoute when:
- You need guaranteed bandwidth. ExpressRoute provides dedicated circuits from 50 Mbps to 10 Gbps (up to 100 Gbps with ExpressRoute Direct) with SLA-backed uptime, whereas VPN throughput depends on internet conditions.
- Latency-sensitive workloads are impacted. Real-time applications like VoIP, database replication, or financial trading require predictable, low-latency paths that the public internet cannot guarantee.
- Compliance mandates private connectivity. Certain regulatory frameworks (HIPAA, PCI DSS, FedRAMP) may require that data in transit does not traverse the public internet. ExpressRoute meets this requirement.
- VPN tunnel count is becoming a constraint. Even VpnGw5 is limited to 100 S2S tunnels. ExpressRoute with Global Reach can connect multiple on-premises locations without individual tunnel management.
- Intermittent disconnections are unacceptable. If your business cannot tolerate the brief outages inherent in IPsec rekey events and Azure maintenance windows, ExpressRoute’s BGP-based failover is more resilient.
A common pattern is to use ExpressRoute as the primary path and maintain a VPN Gateway as a backup. This provides the reliability of a private circuit with the redundancy of an internet-based fallback.
Quick Reference: Common Issues at a Glance
| Symptom | Likely Cause | First Step |
|---|---|---|
| Tunnel stuck on “Not Connected” | PSK mismatch or firewall blocking UDP 500/4500 | Verify PSK and check firewall rules |
| “No proposal chosen” in on-prem logs | IKE/IPsec parameter mismatch | Align encryption, integrity, and DH group |
| Connected but no traffic for some subnets | Traffic selector or address prefix mismatch | Compare Local Network Gateway prefixes with on-prem config |
| P2S client cannot authenticate | Expired or mismatched certificate | Check root and client certificate validity |
| DNS not resolving over P2S | VPN client not using Azure DNS | Configure custom DNS on VNet and push to client |
| Slow throughput | SKU limit, MTU fragmentation, or single-stream TCP | Check SKU, reduce MTU to 1400, test with parallel streams |
| Tunnel drops every few hours | SA lifetime mismatch or DPD timeout | Align SA lifetimes and DPD settings |
| VNet-to-VNet connected but no traffic | Overlapping address spaces or NSG rules | Check address space overlap and NSG flow logs |
Summary
Azure VPN Gateway issues usually come down to configuration mismatches, SKU limitations, or environmental factors on the internet path. The troubleshooting process is methodical: verify the configuration, check the logs, align both sides, and monitor for recurrence.
Start with the Azure CLI commands and diagnostic tools outlined above. Use Network Watcher connection troubleshoot for automated analysis, packet captures for deep protocol inspection, and Log Analytics for long-term trend identification. When VPN limitations become a recurring business impact, evaluate ExpressRoute as a more resilient alternative.
If you need help designing, troubleshooting, or optimizing your Azure network connectivity, Exodata’s cloud engineering team can help assess your architecture and implement a solution that meets your performance and compliance requirements.