wifi cautionary tale

Posted: Sun Sep 14, 2014 1:07 am
by kerberos
I encountered multiple hardware failures while bootstrapping a new Model B RPI with a generic wifi adapter based on the MediaTek RT5370 chipset. Dr Google and his associates provided plentiful advice of dubious value. Eventually, the story had a happy ending. If you're searching for a solution to a similar problem, you might find the following helpful. Be warned, though: it's long and not particularly entertaining.

The 802.11n adapter I am using is tiny: slightly bigger in cross-section than a USB connector, it protrudes just 5-10 mm from the socket. It has a pin-head-sized blue LED that faces the same way as the row of LEDs on the component side of the RPI mainboard. The lsusb command will produce something like:

Code:

Bus 00m Device 00n: ID 148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter
The initial deployment wasn't entirely smooth. After the first boot I completed the configuration while connected to our wired LAN. The RT5370 adapter was connected during the latter part of the setup. I noted these anomalies:
  • hot-plugging the RT5370 adapter causes a crash reboot.
  • MeteoHub cannot connect to an AP with a hidden SSID.
  • "normal" boot-up time is in the 25 - 35 second range.
Things got a little bumpy when I moved the RPI to its permanent home. When it is disconnected from the wired network, between 1 minute 35 seconds and 2 minutes elapse from power-up to the appearance of the blue light on the RT5370. It can then take a further 15-20 seconds to complete the start-up process. I've now observed this many times: all activity ceases for 75-95 seconds. These delays are reflected in entries from the kernel (dmesg) ring buffer.

This snapshot is typical of what happens when both wired (eth0) and wireless (wlan0) adapters are present:

Code:

[   18.394173] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[   18.951169] NET: Registered protocol family 10
[   18.970496] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   19.069616] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[   19.098913] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.29
[   19.469865] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   19.906459] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
By contrast, when only the wireless adapter is present, we experience a lengthy delay:

Code:

[   18.963838] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[  113.492075] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
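
Stalls like this one can be spotted mechanically by comparing consecutive kernel timestamps. The following is only a sketch: find_gaps is a hypothetical helper name, and the 30-second default threshold is an arbitrary assumption rather than anything MeteoHub defines.

```shell
# Print every kernel log line that follows a pause longer than a threshold.
# Reads dmesg-style text on stdin; $1 is the threshold in seconds
# (default 30, an arbitrary assumption).
# find_gaps is a hypothetical helper, not part of MeteoHub or the kernel tools.
find_gaps() {
    awk -v min="${1:-30}" '
        # Kernel lines look like: [  113.492075] message ...
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, 2, RLENGTH - 2) + 0   # drop the brackets, force a number
            if (seen && ts - prev > min)
                printf "%.1f s gap before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }
    '
}
```

Run as `dmesg | find_gaps 30`. Fed the two lines of the wireless-only snapshot above, it reports a gap of about 94.5 seconds before the firmware load.
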
After 7 hours of reading, experimenting and tweaking, I rebooted the RPI and left it "in production". About 3 days later it locked up. During the intervening period it had been logging and pushing aggregated data to Weather Underground and the Meteoplug server. I was able to restart it by cycling the power. It then failed several more times, and the gap between failures grew shorter each time. Eventually the RT5370 adapter would not connect after cycling the power.

Visually, the failure starts with the blue LED going dark. Sometimes it returns after a short interval. If it does that, a short time later it will go dark again. After that, the OK LED stops flickering and only the PWR LED remains illuminated. Linux will not respond to a ping. For all practical purposes it's dead.

Post-mortem analysis of /data/log/messages shows a similar pattern for each failure. All start with bursts of these two messages. Sometimes there are a few; other times there are literally thousands:

Code:

[18057.918951] ieee80211 phy0: rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry 15 in queue 2
[18057.935256] ieee80211 phy0: rt2800usb_txdone: Warning - Got TX status for an empty queue 2, dropping
Typically, these will be followed by multiple attempts to re-authenticate with the AP:

Code:

[28043.253783] wlan0: authenticate with 00:16:b6:d1:a3:70
[28043.283709] wlan0: send auth to 00:16:b6:d1:a3:70 (try 1/3)
[28043.332303] wlan0: send auth to 00:16:b6:d1:a3:70 (try 2/3)
[28043.371428] wlan0: send auth to 00:16:b6:d1:a3:70 (try 3/3)
[28043.414152] wlan0: authentication with 00:16:b6:d1:a3:70 timed out
This can go on for many cycles. All the while the RPI is unable to communicate.

There is a plethora of examples of these symptoms documented on RPI and Linux support sites, forums and blogs. Those that come to a conclusion or offer a remedy boil the solution down to one or more of three elements: update the firmware, modify /etc/network/interfaces and modify /etc/wpa_supplicant/wpa_supplicant.conf. My assessment was that only the first of these, the firmware, had any merit. I found the rationales offered for changing the control files a little too hand-wavy.

I did further research on the firmware. MeteoHub 5.0 ships with version 0.29. This is advertised each time the firmware is loaded:

Code:

[  146.084687] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[  146.108895] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.29
I have read claims that more recent firmware exists. The version of firmware in a file is difficult to determine without loading it into the kernel. All files are the same size (8192 bytes). I used md5sum to help me compare candidates from reputable sources. The digest for the file that ships with MeteoHub:

Code:

# md5sum -b /lib/firmware/rt2870.bin
36c944c3138125605d28c0a3a1338be9 */lib/firmware/rt2870.bin
This matches the "fingerprint" of the most recent file in the kernel.org git repository (it also matches one of the most commonly linked-to files). The highest version I could find on the MediaTek website was 0.22 (md5 digest: 2bb89af3a7d446deb4695c9a3daa7f9d).
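
Since every candidate file is the same size, comparing digests is the practical way to tell them apart. The sketch below maps a digest to the version it corresponds to according to this post. The two function names are hypothetical, and the table holds only the two digests quoted above.

```shell
# Map an MD5 digest to the firmware version reported for it in this post.
# identify_rt2870_md5 and identify_firmware are hypothetical helper names.
identify_rt2870_md5() {
    case "$1" in
        36c944c3138125605d28c0a3a1338be9) echo "0.29" ;;   # ships with MeteoHub 5.0
        2bb89af3a7d446deb4695c9a3daa7f9d) echo "0.22" ;;   # from the MediaTek website
        *) echo "unknown" ;;
    esac
}

# Print "<file>: <version>" for each candidate firmware file given as an argument.
identify_firmware() {
    for f in "$@"; do
        # md5sum prints the digest first, then the file name
        digest=$(md5sum "$f" | cut -d' ' -f1)
        printf '%s: %s\n' "$f" "$(identify_rt2870_md5 "$digest")"
    done
}
```

For example, `identify_firmware /lib/firmware/rt2870.bin` should print 0.29 on a stock MeteoHub 5.0 install if the file matches the digest above. An "unknown" result only means the digest is not in this small table.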

While researching firmware alternatives I wrote to Boris to see if he was aware of problems similar to mine. He encouraged me to contribute my experience to the forum. In the meantime, I discovered what I believe to be the true cause of the erratic behaviour of the RT5370 adapter: heat-related hardware failure.

The last time the RPI locked up, I powered it off and removed the adapter. It was very hot to the touch. Based on my description of the symptoms, the retailer exchanged the malfunctioning adapter for a new one. This time I've plugged the adapter into an unpowered USB hub so that its operating temperature is easy to monitor. I booted it up on Friday evening and it's been operating flawlessly since then. By tomorrow evening it will have outlasted its predecessor. I will report back if the reliability changes.

Why a cautionary tale? Using a search engine to match error messages can lead you to incorrect conclusions. I was convinced, by the similarity of my symptoms to those reported by others, that I needed updated firmware. Luckily, I was unable to find a more recent version before my hardware failed.

[edit: typos]

Re: wifi cautionary tale

Posted: Sun Sep 14, 2014 6:50 am
by kerberos
An update on the above.

After about 20 hours of nominal operation I've seen signs of the previous symptoms. I just observed a 36-minute RT5370 failure. In that time, we logged the following:
  • 10564 x rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry M in queue N
  • 895 x rt2800usb_txdone: Warning - Got TX status for an empty queue N, dropping
  • 65 x wlan0: send auth to AP:MAC:ADDR (try X/3)
  • 30 x wlan0: associate with AP:MAC:ADDR (try X/3)
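
A tally like the one above can be produced with a few pattern counts over the log. This is a minimal sketch and tally_wifi_errors is a hypothetical helper name. It counts every matching line in its input, so restricting it to a single outage means feeding it only the relevant part of the log.

```shell
# Count the four failure-related kernel messages in a log read from stdin.
# tally_wifi_errors is a hypothetical helper name.
tally_wifi_errors() {
    awk '
        /rt2800usb_entry_txstatus_timeout/                       { timeout++ }
        /rt2800usb_txdone: .*Got TX status for an empty queue/   { empty++ }
        /wlan0: send auth to/                                    { auth++ }
        /wlan0: associate with/                                  { assoc++ }
        END {
            # Counters that never matched print as 0
            printf "%d x rt2800usb_entry_txstatus_timeout\n", timeout
            printf "%d x rt2800usb_txdone (empty queue)\n", empty
            printf "%d x wlan0: send auth\n", auth
            printf "%d x wlan0: associate\n", assoc
        }
    '
}
```

Usage, assuming the messages are in the log file mentioned earlier in the thread: `tally_wifi_errors < /data/log/messages`.
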
The temperature probe I've put on the RT5370 had been sitting at 27 °C. The ambient temperature is 20 °C. During the above "outage" the probe registered 28 °C. This small change is probably not significant.

I'm hoping that someone with experience of this kind of thing might be able to offer some suggestions. In the meantime, I'm back to looking for a firmware file more recent than 0.29.

Re: wifi cautionary tale

Posted: Sun Sep 14, 2014 10:36 am
by PWS
I don't know much about the details of Pi WiFi adapters, but there seem to be multiple adapters in this miniature USB plug format. The Edimax one, for instance, seems to use the RTL8188 chipset rather than the RT5370 and is perhaps a little more Pi-friendly. Given how cheap these adapters are, might it not be worth trying one with the alternative chipset?

Re: wifi cautionary tale

Posted: Sun Sep 14, 2014 3:06 pm
by kerberos
Thank you for that information, PWS. I have no particular allegiance to a 5370-based device. The form factor caught my eye and it was available from the online shop where I purchased my RPI. I'd welcome hearing about the 802.11g/n USB devices that other MeteoHub owners are using.

I did find firmware 0.33 on the MediaTek website:

Code:

# cd /lib/firmware
# mv rt2870.bin rt2870.bin_V29
# cp -p /data/transfer/rt2870.bin_V33 .
# ln rt2870.bin_V33 rt2870.bin
# ls -l rt2870.bin*
-rwxr--r-- 2 root root 8192 Oct 22  2012 rt2870.bin
-rw-r--r-- 1 root root 8192 Jan  6  2013 rt2870.bin_V29
-rwxr--r-- 2 root root 8192 Oct 22  2012 rt2870.bin_V33
# md5sum -b rt2870.bin_V29
36c944c3138125605d28c0a3a1338be9 *rt2870.bin_V29
# md5sum -b rt2870.bin_V33 
ac4f6d8b679945208a978e397c016aa7 *rt2870.bin_V33
After rebooting:

Code:

# grep rt2x00lib_request_firmware /var/log/dmesg
[  115.067347] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[  115.106162] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.33
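
To check the loaded version without reading the whole log, the version number can be pulled out of the "Firmware detected" line. This is only a sketch and firmware_version is a hypothetical helper name.

```shell
# Print the firmware version(s) that rt2x00lib reported, from a kernel log on stdin.
# firmware_version is a hypothetical helper name.
firmware_version() {
    sed -n 's/.*rt2x00lib_request_firmware: Info - Firmware detected - version: \([0-9.]*\).*/\1/p'
}
```

Usage: `firmware_version < /var/log/dmesg` or `dmesg | firmware_version`. It prints one line for each time the firmware was loaded.
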
I'll report back once I've seen how this works out.

Re: wifi cautionary tale

Posted: Thu Sep 18, 2014 5:25 pm
by kerberos
The new (v0.33) MediaTek firmware did produce a more stable result than 0.29. The system logs recorded only a couple of bursts of the previous messages. Sustained throughput remained well below 2 MB/s.

Following the suggestion made by PWS, I've changed the hardware:

Code:

Bus 001 Device 006: ID 7392:7811 Edimax Technology Co., Ltd EW-7811Un 802.11n Wireless Adapter [Realtek RTL8188CUS]
I've managed to crank about twice the throughput from this adapter. The error rate is low and so far no error conditions have been logged.

Just before I shut down the RPI to change adapters:

Uptime: 4 days, 5 hours, 39 minutes

I'll report back in a week to let you know how the Edimax is behaving.

Re: wifi cautionary tale

Posted: Sat Oct 18, 2014 6:53 am
by kerberos
kerberos wrote: "I'll report back in a week to let you know how the Edimax is behaving."
I'm afraid that "week" quickly stretched to a month. The news is good. I can report that the Edimax EW-7811Un WiFi b/g/n adapter has proven to be an excellent solution for the MeteoHub. Not a single error in 4 weeks of operation, which is exactly as it should be.

Earlier in this thread I noted a long (~80 second) boot-time delay that I was unable to explain. The delay was still present when using the Edimax adapter. It wasn't present if the RPI was connected to our wired LAN. After a few experiments I concluded that the delay is caused by eth0 waiting for media to become available. It eventually times out and the boot process continues.

The effects of the timeout can be eliminated by reordering the device stanzas in /etc/network/interfaces so that the wired interface appears last.

Code:

# diff -c /etc/network/interfaces.org /etc/network/interfaces
*** /etc/network/interfaces.org	Fri Sep  5 23:36:36 2014
--- /etc/network/interfaces	Fri Sep 19 22:45:53 2014
***************
*** 10,23 ****
  auto lo
  iface lo inet loopback
  #
- # The interface used by default during boot
- auto eth0
- # netmask, gateway just used in case of static, unused for dhcp
- iface eth0 inet dhcp
- 	address 192.168.21.165
- 	netmask 255.255.255.128
- 	gateway 192.168.21.254
- #
  auto wlan0
  # netmask, gateway just used in case of static, unused for dhcp
  iface wlan0 inet dhcp
--- 10,15 ----
***************
*** 28,30 ****
--- 20,30 ----
  	address 192.168.21.166
  	netmask 255.255.255.128
  	gateway 192.168.21.254
+ #
+ # The interface used by default during boot
+ auto eth0
+ # netmask, gateway just used in case of static, unused for dhcp
+ iface eth0 inet dhcp
+ #	address 
+ 	netmask 255.255.255.128
+ 	gateway 192.168.21.254
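
After an edit like this, it can be worth confirming the order in which the interfaces will be brought up. The sketch below assumes that ifupdown handles the "auto" lines in file order. Both function names are hypothetical.

```shell
# List the interfaces named on "auto" lines, in file order, one per line.
# Reads the files given as arguments, or stdin when none are given.
# auto_order and eth0_is_last are hypothetical helper names.
auto_order() {
    awk '$1 == "auto" { for (i = 2; i <= NF; i++) print $i }' "$@"
}

# Succeed only if eth0 is the last interface brought up automatically.
eth0_is_last() {
    [ "$(auto_order "$@" | tail -n 1)" = "eth0" ]
}
```

Usage: `auto_order /etc/network/interfaces` should list lo, wlan0 and eth0 in that order after the change, and `eth0_is_last /etc/network/interfaces && echo ok` confirms it.
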
I'm grateful to our colleague PWS for directing my attention to the RTL8188 family of devices. My combination of RPI and MeteoHub is now behaving much as I expect.

Re: wifi cautionary tale

Posted: Sat Oct 18, 2014 7:21 pm
by admin
Thanks for sharing your findings. Unfortunately, I cannot provide support for every WiFi adapter, but this is what the forum is for: users can help each other.

Re: wifi cautionary tale

Posted: Fri Oct 24, 2014 4:26 pm
by gm4jjj
Thanks for that. I have only changed the order of the interfaces in the interfaces file, as you suggested, and my Ralink Technology, Corp. RT5370 Wireless Adapter certainly does start up much faster after a reboot.
