wifi cautionary tale
Posted: Sun Sep 14, 2014 1:07 am
I encountered multiple hardware failures while bootstrapping a new Model B RPI with a generic wifi adapter based on the MediaTek RT5370 chipset. Dr Google and his associates provided plentiful advice of dubious value. Eventually, the story had a happy ending. If you're searching for a solution to a similar problem, you might find the following to be helpful. Warning though, it's long and not particularly entertaining.
The 802.11n adapter I am using is tiny: slightly bigger in cross-section than a USB connector, it protrudes just 5-10 mm from the socket. It has a pin-head-sized blue LED that faces the same way as the row of LEDs on the component side of the RPI mainboard. The lsusb command will produce something like:
Code:
Bus 00m Device 00n: ID 148f:5370 Ralink Technology, Corp. RT5370 Wireless Adapter
The initial deployment wasn't entirely smooth. After the first boot I completed the configuration while connected to our wired LAN. The RT5370 adapter was connected during the latter part of the setup. I noted these anomalies:
- hot-plugging the RT5370 adapter causes a crash reboot.
- MeteoHub cannot connect to an AP with a hidden SSID.
- "normal" boot-up time is in the 25 - 35 second range.
This snapshot is typical of what happens when both wired (eth0) and wireless (wlan0) adapters are present:
Code:
[ 18.394173] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[ 18.951169] NET: Registered protocol family 10
[ 18.970496] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 19.069616] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 19.098913] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.29
[ 19.469865] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 19.906459] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
By contrast, when only the wireless adapter is present, we experience a lengthy delay:
Code:
[ 18.963838] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 113.492075] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
Here the firmware load is logged at about 113 seconds after boot, compared with about 19 seconds when eth0 is also connected.
After 7 hours of reading, experimenting and tweaking, I rebooted the RPI and left it "in production". About 3 days later it locked up. During the intervening period it had been logging and pushing aggregated data to Weather Underground and the Meteoplug server. I was able to restart it by cycling the power. It then failed several more times, and the gap between failures became shorter each time. Eventually the RT5370 adapter would not connect even after cycling the power.
Visually, the failure starts with the blue LED going dark. Sometimes it returns after a short interval. If it does that, a short time later it will go dark again. After that, the OK LED stops flickering and only the PWR LED remains illuminated. Linux will not respond to a ping. For all practical purposes it's dead.
Post-mortem analysis of /data/log/messages shows a similar pattern for each failure. All start with bursts of these two messages. Sometimes there are a few; other times there are literally thousands:
Code:
[18057.918951] ieee80211 phy0: rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry 15 in queue 2
[18057.935256] ieee80211 phy0: rt2800usb_txdone: Warning - Got TX status for an empty queue 2, dropping
Typically, these will be followed by multiple attempts to re-authenticate with the AP:
Code:
[28043.253783] wlan0: authenticate with 00:16:b6:d1:a3:70
[28043.283709] wlan0: send auth to 00:16:b6:d1:a3:70 (try 1/3)
[28043.332303] wlan0: send auth to 00:16:b6:d1:a3:70 (try 2/3)
[28043.371428] wlan0: send auth to 00:16:b6:d1:a3:70 (try 3/3)
[28043.414152] wlan0: authentication with 00:16:b6:d1:a3:70 timed out
This can go on for many cycles. All the while the RPI is unable to communicate.
There is a plethora of examples of these symptoms documented on RPI and Linux support sites, forums and blogs. Those that come to a conclusion or offer a remedy boil the solution down to one or more of three elements: update the firmware, modify /etc/network/interfaces and modify /etc/wpa_supplicant/wpa_supplicant.conf. My assessment was that only the first of these--the firmware--had any merit. I found the rationales offered for changing the control files a little too hand-wavy.
I did further research on the firmware. MeteoHub 5.0 ships with version 0.29. This is advertised each time the firmware is loaded:
Code:
[ 146.084687] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[ 146.108895] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.29
I have read claims that more-recent firmware exists. The version of the firmware in a file is difficult to determine without loading it into the kernel, and all the files I found are the same size (8192 bytes). I used md5sum to help me compare candidates from reputable sources. The digest for the file that ships with MeteoHub:
Code:
# md5sum -b /lib/firmware/rt2870.bin
36c944c3138125605d28c0a3a1338be9 */lib/firmware/rt2870.bin
This matches the "fingerprint" of the most recent file in the kernel.org git repository (it also matches one of the most-commonly linked-to files). The highest version I could find on the MediaTek website was 0.22 (md5 digest: 2bb89af3a7d446deb4695c9a3daa7f9d).
While researching firmware alternatives I wrote to Boris to see if he was aware of problems similar to mine. He encouraged me to contribute my experience to the forum. In the meantime, I discovered what I believe to be the true cause of the erratic behaviour of the RT5370 adapter: heat-related hardware failure.
The last time the RPI locked up, I powered it off and removed the adapter. It was very hot to the touch. Based on my description of the symptoms, the retailer exchanged the malfunctioning adapter for a new one. This time I have plugged the adapter into an unpowered USB hub, which makes it easy to check the adapter's operating temperature by touch. I booted it up on Friday evening and it's been operating flawlessly since then. By tomorrow evening it will have run for longer than its predecessor managed before its first lock-up. I will report back if the reliability changes.
Why a cautionary tale? Using a search engine to match error messages can lead you to the wrong conclusion. I was convinced, by the similarity of my symptoms to those reported by others, that I needed updated firmware. Luckily, I was unable to find a more-recent version before my hardware failed.
[edit: typos]