I have 2 servers on the same switch. I'm losing 5% of packets on ~16k pings between the two.
Below is my nasty ASCII diagram of the network configuration. All machines have a single interface.
a      b
|      |
-- S1 --
   |
   S2
   |
   S3
   |
   c
a = Sun Netra 240
b = Dell 2950
c = my machine
S1 - S3 = 3 x Cisco Catalyst 2960G
pings from a -> b lose 5% data
pings from b -> a lose 5% data
pings from c -> a lose 0 data
pings from c -> b lose 0 data
I can't think of a reason that I'd lose packets going between ports on the same switch, when I didn't lose data coming from a different switch but still using the same port.
Can anyone throw any ideas my way please?
Thanks
-
Do you get any loss if you ping using the default packet size? How about if you ping using ping -l 1472? How about when pinging using ping -l 1473?
Try pinging from C to A, C to B, A to B, and B to A using ping -l 1473 -f and post the results of each of them here.
chewy_fruit_loop : i'd not thought of using a different packet size. i'll give that a try too. thanks
From joeqwerty -
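The 1472 and 1473 sizes suggested above sit on either side of the fragmentation boundary. `-l` and `-f` are the Windows ping flags for payload size and Don't Fragment. Here is a minimal sketch of the arithmetic, assuming a standard 1500-byte Ethernet MTU and an IPv4 header without options (neither is stated in the thread); the function names are made up for illustration:

```python
# Sketch: why 1472 and 1473 bytes are the interesting ping payload sizes.
# Assumption: 1500-byte Ethernet MTU and IPv4 with no IP options.

IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_unfragmented_payload(mtu: int = 1500) -> int:
    """Largest ICMP echo payload that fits in one packet for the given MTU."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def needs_fragmentation(payload: int, mtu: int = 1500) -> bool:
    """True if an echo request carrying this payload exceeds the MTU."""
    return payload + IP_HEADER_BYTES + ICMP_HEADER_BYTES > mtu


if __name__ == "__main__":
    for size in (1472, 1473):
        print(size, "needs fragmentation:", needs_fragmentation(size))
```

If the 1472-byte ping gets through with Don't Fragment set and the 1473-byte one does not, the path MTU is the expected 1500 bytes. Loss that appears only at the larger sizes would point at fragmentation handling rather than the port itself.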
Another troubleshooting step would be to plug both machines into a different switch to see if the problem moves with the devices. My guess would be that you either have an interference problem as entens suggests, or one of those boxes is load bound and dropping packets.
chewy_fruit_loop : i've already switched the ports on the current switch, which sorted the problem out for a few hours. the boxes are being patched up to date and being left to run until next week. if there's still a problem then i'm moving one to a different switch
From Greeblesnort -
NIC Driver? duplex settings? any errors showing up on the switches? What are you using to measure the loss? ping?
Also, try disabling any offloading (checksum offloading etc.) on the NIC if it is enabled, so you can use Wireshark to find out what kind of traffic you are losing.
Hope that gives you some ideas.
chewy_fruit_loop : the ping loss is being measured on the sun box using ping -s to gather statistics. the other pings have statistics on by default. once you stop the ping, it tells you how many packets were sent and received and what percentage was lost.
From MrTimpi -
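To compare runs from the different machines, the loss percentage can be recomputed from the summary line that ping prints when it stops. The following is a rough sketch only: the thread does not show any actual output, so the summary format and the sample numbers (16000 sent, 15200 received) are illustrative assumptions chosen to match the roughly 5% loss on ~16k pings described in the question, and `parse_ping_summary` is a made-up helper name.

```python
import re

# Assumption: the summary line looks like
#   "N packets transmitted, M packets received, ..."  or
#   "N packets transmitted, M received, ..."
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received"
)


def parse_ping_summary(line: str) -> tuple[int, int, float]:
    """Return (sent, received, loss_percent) parsed from a ping summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a recognised ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    loss_percent = 100.0 * (sent - received) / sent
    return sent, received, loss_percent


if __name__ == "__main__":
    # Illustrative numbers, not taken from the thread.
    sample = "16000 packets transmitted, 15200 packets received, 5% packet loss"
    print(parse_ping_summary(sample))
```

Recomputing the figure from the raw counts avoids the rounding that ping applies to the printed percentage, which matters when comparing a small loss across several hosts.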
We have encountered cases where having the switch port and/or the NIC set to auto speed and/or auto duplex results in loss. Changing from auto to a fixed speed and duplex resolved the issue.
From Dave M -
Check the NIC and the Cat cables. Also, is there any other network traffic in the background?
chewy_fruit_loop : i've already stripped out all the cat5 that was plugged into them and replaced it with new cat6. yes, there is other network traffic going on. the reason we're investigating the problem is that the server is timing out when trying to communicate with the NIS master on the other side of the atlantic
Anicho : Is it possible to test it in a sandbox (test environment), with just the two servers communicating with each other and connected to nothing else? Test that if you can and see what it returns.
From Anicho -
it "looks" like the problem was port 0 on the nic in the sun box. we've transferred all the traffic to port 1 and the problem has vanished.
i'm not holding my breath though, this is the second time this year that this has happened. i had a bad feeling about the box when i found out that it had been end of lifed, 3 months after we bought it, had a memory failure 2 weeks before the end of the first year, and the boss won't pay for a service contract on it but prefers a case by case payment.
thanks to everyone who suggested courses of action
From chewy_fruit_loop