An odd networking issue (not Mageia)


Postby jiml8 » Aug 4th, '14, 18:57

I am asking this here because I have never seen anything like this before and I did not even know it was possible, but it is definitely happening.

Here is the topology. I have a Netgear WNDR3700v4 router running DD-WRT firmware. It is connected by a 50 foot CAT-6 cable to a Netgear GS-105 5-port switch.

Also connected to the router is my Mageia-based workstation.

The switch has two connections to my NAS. One connection is to the Asus motherboard and handles data transfers. The other connection is directly to the RAID controller card, enabling me to monitor and control it directly.

Also connected to this switch, presently, is an Acer computer running Windows Vista. I connected it because I am loading a bunch of old DVDs onto my NAS, and the Acer DVD reader seems to do a better job of reading these than the one on my workstation (which needs to be replaced).

I encountered some difficulty getting the Acer to open the Video share on my NAS and, rather than figure it out (this is Vista, after all), I had the Acer open a share on my workstation, which it did without difficulty. I am then dragging and dropping on the Acer to move files from the DVDs to the share on my workstation. On my workstation, I review the files, sometimes rename them, then ship them via an NFS connection to the Video share on the NAS, which is their final resting place.

So here is what happens.

When the Acer is transferring files in this fashion, the link lights on both the router and the switch, for the connection between them, go out intermittently. The lights come on briefly whenever the Acer is making a transfer, then go back out as the Acer is loading up the next chunk of data from the DVD. While the link lights are out, the link is down: I cannot communicate with my NAS from my workstation, pings fail, and the log on my workstation tells me that my iSCSI link and my NFS link are down.

When the Acer is not transferring, the link light for the connection is on solidly and apparently reliably.

The link light between the Acer and the switch remains on solidly throughout; it is only the connection between the router and the switch that goes up and down.

At first I thought this was a defective cable; when I move the cable to different ports on the router and the switch, the problem follows the cable. But I eventually noticed the correlation between what the Acer was doing and what the link was doing; the link comes alive long enough for the Acer to transfer a block of data, then goes dead again.
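One way to pin down that correlation is to log reachability of the NAS from the workstation and reduce the log to outage windows that can be compared against the Acer's transfer times. The following is only a sketch: the function names are made up for illustration, and `ping_once` assumes the Linux iputils `ping` with its `-c` and `-W` options.

```python
import subprocess
import time
from typing import Iterable, List, Tuple


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Return True if a single ICMP echo to host gets a reply.

    Assumes the Linux iputils ping (-c count, -W reply timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def collect_samples(host: str, count: int, interval_s: float = 1.0) -> List[Tuple[float, bool]]:
    """Ping host count times and return (timestamp, reachable) samples."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), ping_once(host)))
        time.sleep(interval_s)
    return samples


def outage_intervals(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, reachable) samples into (down_start, up_again) pairs.

    An outage still open at the end of the samples is closed at the
    timestamp of the last sample.
    """
    outages = []
    down_since = None
    last_ts = None
    for ts, reachable in samples:
        last_ts = ts
        if not reachable and down_since is None:
            down_since = ts                      # link just went away
        elif reachable and down_since is not None:
            outages.append((down_since, ts))     # link came back
            down_since = None
    if down_since is not None:
        outages.append((down_since, last_ts))
    return outages
```

Calling `outage_intervals(collect_samples(nas_address, 600))`, with `nas_address` standing in for the NAS's IP address, would give roughly ten minutes of outage windows to line up against when the Acer is copying and when it is reading the DVD.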

Now, sometimes the link DOES remain alive when the Acer is buffering more data. This, too, happens at intermittent intervals and when it happens I can communicate with the NAS and everything works.

It sure looks like Vista is doing something. But what? And why? I was not aware that this Netgear switch could even be managed; there are no docs I have seen that say so, but clearly its links can be raised and lowered remotely. Possibly DD-WRT in the router is doing it, but I see no configuration information to suggest it is even capable of that and if it IS doing it, how does it know to restore the link when the Acer wants to transfer a block of data? The data comes through at regular intervals, including during the times when the link remains alive, so I can't say that the Acer is transferring whenever DD-WRT brings the link back up.

Is there a software or signalling failure mode so severe that the solution is a hard reset of the link?

I do not plan to have the Acer continually available on this link; I pulled it out of the closet for this job and can't wait to put it back in the closet. But the fact that this happens is troubling; I don't understand it. Anyone have any ideas?
jiml8
 
Posts: 1253
Joined: Jul 7th, '13, 18:09

Re: An odd networking issue (not Mageia)

Postby doktor5000 » Aug 4th, '14, 20:21

Hmmm, first step would be to check the advanced energy settings on that Vista machine and disable everything (PCI Express and USB devices and such can be put into power saving mode; IIRC that's also the default).
The other thing is that 50m cable to your workstation ... I know the spec says up to 100m, but can you test with a shorter cable?

And for the easiest test to rule out Vista: boot a live CD on the Acer and copy one DVD over with that.
Also, to check simple network throughput, the classic ftp to /dev/null could be used. Does the link also go down during that?
Code:
ftp remoteserver
bin
put "|dd if=/dev/zero bs=1M count=10000" /dev/null


You could also try iperf/jperf to measure the jitter over that connection.
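If neither ftp nor iperf is handy, the same idea (push zeros through the link and throw them away at the far end) can be sketched with plain TCP sockets. This is an illustrative stand-in, not a replacement for iperf: the function names are invented here, and `loopback_demo` only exercises the code on 127.0.0.1. For a real test the receiver would run on the machine at the other end of the suspect link.

```python
import socket
import threading
import time
from typing import Tuple

CHUNK = 64 * 1024  # bytes per send/recv call


def discard_server(listener: socket.socket, result: dict) -> None:
    """Accept one connection and discard everything it sends."""
    conn, _ = listener.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:          # sender closed the connection
                break
            total += len(data)
    result["bytes"] = total


def measure_throughput(host: str, port: int, total_bytes: int) -> float:
    """Send total_bytes of zeros to host:port and return the rate in MiB/s.

    Timing is taken on the sending side, so short transfers are
    flattered by kernel buffering; use a large total_bytes on a real link.
    """
    payload = bytes(CHUNK)
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            sock.sendall(payload[:n])
            sent += n
    elapsed = time.perf_counter() - start
    return sent / (1024 * 1024) / max(elapsed, 1e-9)


def loopback_demo(total_bytes: int = 8 * 1024 * 1024) -> Tuple[int, float]:
    """Run receiver and sender on 127.0.0.1; return (bytes received, MiB/s)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=discard_server, args=(listener, result))
    receiver.start()
    rate = measure_throughput("127.0.0.1", port, total_bytes)
    receiver.join()
    listener.close()
    return result["bytes"], rate
```

If the link drops mid-transfer, `sendall` will stall or raise, which is itself a useful signal that the outage does not depend on Vista being the sender.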
Cauldron is not for the faint of heart!
Caution: Hot, bubbling magic inside. May explode or cook your kittens!
----
Disclaimer: Beware of allergic reactions in answer to unconstructive complaint-type posts
doktor5000
 
Posts: 17630
Joined: Jun 4th, '11, 10:10
Location: Leipzig, Germany

Re: An odd networking issue (not Mageia)

Postby jiml8 » Aug 5th, '14, 19:51

I am not really willing to put a lot of effort into diagnosing this; like I say, the Acer won't be staying on the link. However, it is worth some time because the switch ports are available and it is possible that at some later point I will add some other device at that end of the connection...if the problem is specific to Vista then I can ignore the problem. Given that the NAS seems to work on that end of that link very well, I can't claim there is a problem with the link.

Also, the link is at the end of a 50 foot cable, not 50 meters. About 16 meters, actually.
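For the record, the conversion works out as below. The 100 m figure is the usual maximum channel length for twisted-pair Ethernet; the helper names are just for illustration.

```python
FEET_TO_METERS = 0.3048      # exact, by definition of the international foot
MAX_CHANNEL_METERS = 100.0   # usual twisted-pair Ethernet channel limit


def feet_to_meters(feet: float) -> float:
    """Convert a length in feet to meters."""
    return feet * FEET_TO_METERS


def within_length_limit(feet: float) -> bool:
    """True if a cable of the given length in feet is within the 100 m limit."""
    return feet_to_meters(feet) <= MAX_CHANNEL_METERS


cable_m = feet_to_meters(50)  # a little over 15 m
```

So the cable length alone is nowhere near the limit, and length by itself should not be the cause.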

After further consideration, despite the way it looks, I don't see how Vista could be doing it - it almost has to be DD-WRT doing it, and your iperf suggestion is a good one; DD-WRT has iperf on it so I actually can make that measurement.

It has to be DD-WRT because DD-WRT controls one end of the connection while Vista controls neither end of the connection. Vista controls the connection between it and the switch, but not the link from the switch to the router. To claim Vista is doing it, I must then assert that the unmanaged switch can be managed, and Vista is capable of doing it. It seems far more credible to suggest that Vista (or the Acer) is doing something very wrong and DD-WRT is therefore bringing down the link to wait for the problem to clear up or to force a link reset on Vista.

Mostly, I was wondering if anyone had ever seen this kind of problem before; I'm not used to seeing links going up and down like this. I'm new to DD-WRT since I just purchased this 1Gb router and promptly installed DD-WRT on it, and I am still getting used to having a high-performance and fully configurable router sitting on my LAN.

Also, in my defense, I must note that the Acer was given to me when I built its replacement for a friend. I did not purchase that POS, and I have never purchased a Vista platform. :) It has been sitting in the closet awaiting some particular use.

Re: An odd networking issue (not Mageia)

Postby doktor5000 » Aug 5th, '14, 22:29

jiml8 wrote: Also, the link is at the end of a 50 foot cable, not 50 meters. About 16 meters, actually.

That's why civilised countries use the metric system :lol: Sorry, totally ignored the unit, must be force of habit.

But would be interesting to hear the outcome.

Re: An odd networking issue (not Mageia)

Postby jiml8 » Nov 30th, '14, 20:54

I just ran across this thread and realized I never posted the answer.

Turns out there was a bad cable between the router and the switch. It seems that the timing between the two ends was getting messed up due to some defect in the cable (I would assume an impedance problem, causing reflections) and the consequence was framing errors and timing errors. Whenever this happens, the device that detects the problem drops carrier, and the link gets renegotiated. Changing the cable solved the problem.
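For anyone chasing something similar: on a Linux box, this sort of fault usually shows up as receive error counters climbing under `/sys/class/net/<iface>/statistics/`. The sketch below reads a few of those counters and computes how much they changed between two readings. The function names are invented for illustration. It is an assumption that the device at the end of the bad cable exposes these counters for the port in question. In this case that would have been the router, and I have not checked what DD-WRT offers per port.

```python
from pathlib import Path
from typing import Dict

# Counters that tend to rise with cabling faults; these file names exist
# in the standard Linux sysfs network statistics directory.
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_frame_errors")


def read_counters(iface: str, sysfs_root: str = "/sys/class/net") -> Dict[str, int]:
    """Read the error counters for iface; counters that are absent are skipped."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    values = {}
    for name in COUNTERS:
        path = stats_dir / name
        if path.exists():
            values[name] = int(path.read_text().strip())
    return values


def counter_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return how much each counter in after grew relative to before."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Taking one reading before a transfer and one after, then looking at `counter_deltas`, would show whether CRC or framing errors are piling up while the link is flapping.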

Re: An odd networking issue (not Mageia)

Postby isadora » Nov 30th, '14, 21:17

Please don't forget to mark the topic [SOLVED]. ;)
..........bird from paradise..........

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
—Antoine de Saint-Exupéry
isadora
 
Posts: 2742
Joined: Mar 25th, '11, 16:03
Location: Netherlands

