Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8 **solved**

All about the standard Meteobridge devices based on mobile routers from TP-Link, D-Link, ASUS

Moderator: Mattk

SANdood
Senior Boarder
Senior Boarder
Posts: 60
Joined: Wed Jan 28, 2015 1:36 pm

Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8 **solved**

Post by SANdood »

I have an application built on SmartThings and Hubitat that uses an HTML GET template to collect data from the MeteoBridge every 60 seconds. This is running on two separate platforms in my house for a couple of months, but started having issues at the beginning of May sometime.

After doing some testing, it appears that versions 3.9 (May 5) and the latest version 4.0 are suffering some sort of memory corruption or process lockup after these apps run for 2-3 hours. At some point, the MeteoBridge simply stops replying to the HTML GET requests from both apps - at the exact same time. If I reboot the MB, the apps will continue again for a while until the next time the MB stops responding.

I went back to version 3.8 (Jan 2) and ran both apps overnight - without failure...

So, while I know this isn't scientific, I lean towards suspecting that something introduced since the January 3.8 version is causing problems. I suspected perhaps the new OpenWeatherMap support, but the issue still occurs on 3.9 and 4.0 with that turned off.

(On a side note - I'm still hoping that an MB update implements template isolation SOON - having multiple apps using GET requests is definitively risky since every request uses the same temporary file names for the template and template responses, and these requests are taking upwards of 7-10 seconds each before my app gets a response - that's a big window for template collisions...I don't think this is locking up the MB, because v3.8 works, but it is still a concern that I have voiced earlier).

My platform info:

Platform: TL-MR3020V3 (no USB hub)
RAM: 60592 kB total, 20532 kB free (66% used)
SW Version: Meteobridge 3.8 (Jan 2 2019, build 1855), FW 1.3
Uptime: 0 hours, 6 minutes Buffer: 3 items (0%)

SANdood
Senior Boarder
Senior Boarder
Posts: 60
Joined: Wed Jan 28, 2015 1:36 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by SANdood »

If it helps, this is the template I am using - works on 3.8, MB running 3.9 or 4.0 (latest) stops responding to the request after 2-3 hours.

http://192.168.1.112/cgi-bin/template.c ... 0,"version":[mbsystem-swversion:1.0],"lat":[mbsystem-latitude:""],"lon":[mbsystem-longitude:""],"tz":"[mbsystem-timezone:]","current":["[M]/[D]/[YY]","[H]:[mm]:[ss] [apm]",[th*hum-act=.2:""],[thb*hum-act=.2:""],[th*temp-act=F.2:""],[th*dew-act=F.2:""],[th*heatindex-act=F.2:""],[wind*chill-act=F.2:""],[thb*temp-act=F.2:""],[thb*dew-act=F.2:""],[th*temp-dmax=F.2:""],[th*temp-dmin=F.2:""],[rain*total-daysum=in.3:""],[rain*total-sum1h=in.3:""],[sol*evo-act=in.3:""],[rain*rate-act=in.3:""],[thb*seapress-act=inHg.2:""],"[thb*seapress-delta10=enbarotrend:]","[wind*dir-act=endir:N/A]",[wind*wind-max10=mph.2:""],[wind*dir-act:""],[wind*wind-act=mph.2:""],[wind*wind-act=bft.0:""],"[mbsystem-daylength:]",[th*hum-dmax=.2:""],[th*hum-dmin=.2:""],"[mbsystem-sunrise:]","[mbsystem-sunset:]",[mbsystem-daylength=mins.0:""],[uv*index-act:""],[sol*rad-act:""],[mbsystem-lunarage:""],[mbsystem-lunarpercent:""],[mbsystem-lunarsegment:""],"[mbsystem-moonrise:]","[mbsystem-moonset:]",[mbsystem-isday=.0:""]]}&contenttype=application/json

This same template is sent by two different home automation hubs (SmartThings and Hubitat) every 60 seconds, but the two are not synchronized to send at different times (impossible to reliably accomplish).

It takes MB about 4.8 seconds to process and return response to this request.

PS. The template is actually transformed by the hub platform to be URL Encoded:

http://192.168.1.112/cgi-bin/template.c ... ion%2Fjson

User avatar
admin
Platinum Boarder
Platinum Boarder
Posts: 6257
Joined: Mon Oct 01, 2007 10:51 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by admin »

I tested with most recent version on a NAN SD. There were 3 processes requesting data by calling the Meteobridge template feature via wget in parallel. Meteobridge drops some of the requests because of overload. Dropped requests have an empty return. But I can't see any lockup.

Problem can be that Meteobridge runs out of RAM when too many requests are stacked up.

User avatar
admin
Platinum Boarder
Platinum Boarder
Posts: 6257
Joined: Mon Oct 01, 2007 10:51 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by admin »

I checked versions and discovered that newest code has not been deployed to all platforms. I just released an update, please give that a try.

SANdood
Senior Boarder
Senior Boarder
Posts: 60
Joined: Wed Jan 28, 2015 1:36 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by SANdood »

Thanks for the quick response!!!

OK, I updated and restarted the two processes (EDT).

FWIW, the MB isn't returning an empty response, at least, not before Hubitat decides that the request has timed out. I think the timeout period on both platforms is 10 seconds for a local hubAction request, and (unfortunately) I can't change it or catch the error (I'm working with Hubitat on that). What is the timeout period in MB?

Also, I have an MR3020V3, and there is plenty of available RAM. When I log into OpenWRT, it says that only 67-70% of RAM is used (almost 20000kB free) and a load factor no higher than 0.74.

Hopefully the version you pushed fixes the problem (it will take several hours before I will know), but you might want to look into if/why MB is discarding requests with so much memory still available...

SANdood
Senior Boarder
Senior Boarder
Posts: 60
Joined: Wed Jan 28, 2015 1:36 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by SANdood »

GOOD NEWS - the latest 4.0 update has run without failing for 8.5 hours...

SANdood
Senior Boarder
Senior Boarder
Posts: 60
Joined: Wed Jan 28, 2015 1:36 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by SANdood »

Well, it looks like I spoke too soon...

After almost 13 hours, the MB stopped responding to most of the cgi-bin/template requests from SmartThings, and ALL requests from Hubitat (same code and same template request running on both). They both stopped working at precisely the same point in time.

With 3.8, they both ran more than 36 hours without a single missed request from either platform.

I guess that something still isn't right with the latest 4.0.

Hubitat times out after 10 seconds without a response from the MB. SmartThings doesn't timeout, but it also never gets a response back from the MB (both are making asynchronous hubAction requests). The code makes another cgi-bin/template request every 60 seconds, independent of whether the prior request succeeded or not - but on both platforms, once they miss one response, it appears that the MB never, ever responds again to their request(s) - until I reboot the MeteoBridge, then my code on both platforms starts working again...

The MB System page shows 19548kB free.

Logging into OpenWRT, I see the following:

Load average 0.02, 0.04, 0.00
Total available memory 16056kB / 60592 kB (27%)
Free memory 13428 (22%)
Buffered memory 2940 (4%)

Please let me know if you need additional information...I can't run 3.8 continuously because of other dependencies on newer MB versions, and I'd also prefer not to have to rebooth the MB every 12 hours to keep things working. So, I hope the problem can be identified and repaired...

Thanks!
Barry

User avatar
admin
Platinum Boarder
Platinum Boarder
Posts: 6257
Joined: Mon Oct 01, 2007 10:51 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by admin »

It would help if you drop me a note when it hangs and then allow me to login. This way I can look myself. Apart from that I have no idea why it should work 13 hours and then stop working.

SANdood
Senior Boarder
Senior Boarder
Posts: 60
Joined: Wed Jan 28, 2015 1:36 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by SANdood »

Will do...

SANdood
Senior Boarder
Senior Boarder
Posts: 60
Joined: Wed Jan 28, 2015 1:36 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by SANdood »

I PM'd you last night - any chance you can look at this soon, so that I can revert back to 3.8 and get things working again?

User avatar
staze
Expert Boarder
Expert Boarder
Posts: 136
Joined: Fri Jan 18, 2019 2:56 am
Contact:

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by staze »

I'm seeing same issue. 4.1 seems to persist with this issue.

Reverting back to 3.8 for now... =(
Davis Vantage Vue, Weather Envoy, Meteobridge Nano SD

User avatar
admin
Platinum Boarder
Platinum Boarder
Posts: 6257
Joined: Mon Oct 01, 2007 10:51 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by admin »

There have been changes how the polling of template requests is handled since 3.9. Changes were necessary to avoid problems, but may be there is a side effect I have not thought through.

Please post me your template.cgi request that stalls after a while and I will try to reproduce the error on my system. What platform are you using? MR3020?

template that you say does not work reliable, right?

Code: Select all

template={"timeGet":1558659490070,"version":[mbsystem-swversion:1.0],"lat":[mbsystem-latitude:""],"lon":[mbsystem-longitude:""],"tz":"[mbsystem-timezone:]","current":["[M]/[D]/[YY]","[H]:[mm]:[ss] [apm]",[th*hum-act=.2:""],[thb*hum-act=.2:""],[th*temp-act=F.2:""],[th*dew-act=F.2:""],[th*heatindex-act=F.2:""],[wind*chill-act=F.2:""],[thb*temp-act=F.2:""],[thb*dew-act=F.2:""],[th*temp-dmax=F.2:""],[th*temp-dmin=F.2:""],[rain*total-daysum=in.3:""],[rain*total-sum1h=in.3:""],[sol*evo-act=in.3:""],[rain*rate-act=in.3:""],[thb*seapress-act=inHg.2:""],"[thb*seapress-delta10=enbarotrend:]","[wind*dir-act=endir:N/A]",[wind*wind-max10=mph.2:""],[wind*dir-act:""],[wind*wind-act=mph.2:""],[wind*wind-act=bft.0:""],"[mbsystem-daylength:]",[th*hum-dmax=.2:""],[th*hum-dmin=.2:""],"[mbsystem-sunrise:]","[mbsystem-sunset:]",[mbsystem-daylength=mins.0:""],[uv*index-act:""],[sol*rad-act:""],[mbsystem-lunarage:""],[mbsystem-lunarpercent:""],[mbsystem-lunarsegment:""],"[mbsystem-moonrise:]","[mbsystem-moonset:]",[mbsystem-isday=.0:""]]}&contenttype=application/json

User avatar
staze
Expert Boarder
Expert Boarder
Posts: 136
Joined: Fri Jan 18, 2019 2:56 am
Contact:

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by staze »

I'm like sandood, and using a MB Nano SD. Also using his code (Hubitat or ST grabbing from the MB). Let me ping him and see if he can answer the query question more authoritatively.
Davis Vantage Vue, Weather Envoy, Meteobridge Nano SD

User avatar
admin
Platinum Boarder
Platinum Boarder
Posts: 6257
Joined: Mon Oct 01, 2007 10:51 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by admin »

When template requests are beginning to stack up, Meteobridge returns an empty string for some requests, that have gone through a timeout period of 13 seconds. Does your processing handle that fine? Or does it endlessly wait for data? That way your post processing might stuck instead of the Meteobridge.

SANdood
Senior Boarder
Senior Boarder
Posts: 60
Joined: Wed Jan 28, 2015 1:36 pm

Re: Template causes MB lockup on 3.9 & 4.0 - works fine on 3.8

Post by SANdood »

Well, a 13 second timeout that responds with an empty string explains a lot.

On the Hubitat platform, they have enforced a hard 10 second timeout for asynchronous HTTP calls (which my code uses). So, I get an (uncatchable) hard time out error from Hubitat well before your code sends a null string. I'm researching to see if I can extend the timeout, but I doubt it.

The secondary problem is returning a null string instead of a valid HTTP error status code. My code only accepts responses with 200-Ok reponses; I should think your code would return a valid response with a 503-Service Unavailable or a 500-Internal Server Error, and not just a blank string. And if it could be returned in LESS than 10 seconds, my code could handle appropriately.
503 Service Unavailable
The server is not ready to handle the request. Common causes are a server that is down for maintenance or that is overloaded. Note that together with this response, a user-friendly page explaining the problem should be sent. This responses should be used for temporary conditions and the Retry-After: HTTP header should, if possible, contain the estimated time before the recovery of the service. The webmaster must also take care about the caching-related headers that are sent along with this response, as these temporary condition responses should usually not be cached.
The problem compounds itself in that I can't catch the error, so that code just makes the exact same request a minute later, which generally seems to cause the timeout to happen again. Now this might be an issue with Hubitat (I'll try to work with them on this too), but once we get into the "Read timed out" condition, it seems to NEVER clear unless we reboot the MeteoBridge...

Your assistance in resolving this is most appreciated...

Post Reply