"-age" modifier bug in template variables?

dolfs
Junior Boarder
Posts: 37
Joined: Mon Apr 20, 2015 11:26 am

"-age" modifier bug in template variables?

Post by dolfs »

I have an alarm event (one-time alarm) set up to activate when I get bad pressure data (Raise: [thb0seapress-act=inHg.2:-999] <= 25) and to clear once it goes back in range (Clear: [thb0seapress-act=inHg.2:-999] > 25). This should catch both not getting data at all and extremely low values.

When it triggers, I send an email that contains, among other things, ([thb0seapress-age:0] s), with the idea that it communicates how long ago good data was last seen.

Ever since this was installed, every email I get says "600 s" or "601 s", never anything else. This would suggest that by the time it triggers I have already gone 10 minutes without good data. I also have detailed logs of every single upload, and tonight I received an alarm email at 07:25:17 (email received time) that again said 601 s. When I checked the log, I could see that the first upload from the MeteoBridge with the value -999 was stamped 07:25:10, and the one before it, with good data, was stamped 07:24:53. So I would have expected the value to be 17 seconds.

Am I misunderstanding how "-age" works, or is this a bug? Running Meteobridge 3.2 (May 29 2017, build 11222), FW 1.4
admin
Platinum Boarder
Posts: 7874
Joined: Mon Oct 01, 2007 10:51 pm

Re: "-age" modifier bug in template variables?

Post by admin »

I just tested my own MB with this URL to evaluate the template variable you are referring to:

Code:

http://192.168.123.239/cgi-bin/template.cgi?template=age:[thb0seapress-age:0]secs
It returns varying results, as expected:

Code:

age:13secs
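
For anyone who wants to repeat this check from a script, here is a minimal Python sketch. It builds the same template.cgi URL as above and fetches the evaluated text. The helper names (`template_url`, `evaluate`) are made up for illustration, and percent-encoding the brackets is an assumption about what the device accepts (the URL above passes them raw). The fetch itself has not been tried against a real Meteobridge, which may also ask for credentials.

```python
from urllib.parse import quote
from urllib.request import urlopen


def template_url(host: str, template: str) -> str:
    """Build a template.cgi URL for the given template string.

    Assumption: the device decodes percent-encoded brackets like any CGI.
    """
    encoded = quote(template, safe=":-")
    return f"http://{host}/cgi-bin/template.cgi?template={encoded}"


def evaluate(host: str, template: str, timeout: float = 5.0) -> str:
    """Fetch the evaluated template text (hypothetical helper, needs a reachable device)."""
    with urlopen(template_url(host, template), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace").strip()


if __name__ == "__main__":
    # Only prints the URL; call evaluate(...) on your own network to fetch it.
    print(template_url("192.168.123.239", "age:[thb0seapress-age:0]secs"))
```

Calling `evaluate` in a loop every few seconds would show the age counting up between sensor updates.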

Re: "-age" modifier bug in template variables?

Post by dolfs »

So do I, but the scenario in which I see the issue is when the thb0seapress sensor actually returns invalid data, which my expression then turns into -999, triggering the raise condition. Only then do I see what [thb0seapress-age:0] converts to, and in those moments my emails contain 600 or 601.

This is not a situation I can (re)create with your template URL method. It does happen occasionally in my setup (reading data from an ObserverIP module), but when it does I can only capture it through an email, and the value is wrong.

The value of 600/601 is noticeable because it corresponds to 10 minutes, but I don't set ten minutes anywhere. Notice that my raise condition is not that the age exceeds a certain limit. I trigger only on values <= 25, which will only happen if either my sensor starts to produce ridiculously low values or there is no sensor data at all, in which case the -999 default raises the condition.

What causes MB to decide that a sensor has not provided valid data? Difficulty reading a value from the ObserverIP's Live Data page in this case, or not having been able to read this page for 10 minutes? Otherwise I cannot explain the 600 seconds.

Re: "-age" modifier bug in template variables?

Post by dolfs »

I started logging some of the ages of my sensor data earlier tonight and noticed something peculiar right after I rebooted my ObserverIP (columns are: time, th0temp-age, thb0temp-age, thb0seapress-age, mbsystem-lastdata, mbsystem-lastgooddata):

Code:

06:46:12	90	90	90	34	90
06:46:29	9	9	9	9	9
06:46:47	27	27	27	5	27
06:47:05	45	45	45	23	45
06:47:23	63	63	63	41	63
06:47:41	12	12	12	12	12
06:47:59	30	30	30	2	3
06:48:17	48	48	48	20	21
06:48:35	66	66	66	38	39
06:48:52	15	15	15	13	15
06:49:08	31	31	31	29	31
06:49:26	49	49	49	18	49
06:49:44	67	67	67	36	67
06:50:00	15	15	15	10	15
06:50:18	33	33	33	28	33
06:50:36	51	51	51	18	51
06:50:54	69	69	69	36	69
06:51:11	13	13	13	8	12
06:51:29	31	31	31	26	30
06:51:47	49	49	49	18	48
06:52:05	67	67	67	36	66
06:52:22	3	3	3	3	3
06:52:40	3	3	3	3	3
06:52:58	21	21	21	6	21
06:53:15	14	14	14	6	6
06:53:32	13	13	13	6	6
06:53:50	12	12	12	6	12
06:54:07	12	12	12	12	12
06:54:24	15	15	15	15	15
06:54:40	12	13	13	3	12
06:54:58	30	31	31	21	30
06:55:14	13	13	13	12	12
06:55:32	31	31	31	1	30
06:55:49	48	48	48	1	6
06:56:07	6	66	66	1	6
06:56:24	3	83	83	2	3
06:56:40	3	99	99	3	3
06:56:58	4	117	117	3	3
06:57:15	3	134	134	3	3
06:57:33	3	152	152	3	3
06:57:51	3	4	4	3	3
06:58:09	4	4	4	3	3
06:58:26	9	3	3	0	3
06:58:42	1	1	1	0	0
06:58:59	6	6	6	1	6
06:59:17	6	6	6	1	6
06:59:35	6	6	6	1	6
06:59:53	6	6	6	1	6
07:00:11	6	6	6	1	6
07:00:29	7	2	2	2	2
07:00:46	5	0	0	0	0
07:01:02	4	4	4	3	3
07:01:19	3	3	3	2	3
07:01:37	3	3	3	3	3
07:02:13	3	3	3	3	3
07:02:31	3	3	3	3	3
Just looking at the last column, you'll notice lastgooddata was creeping up to 90 seconds, then some data would be captured, it would build up again, and so on. I discovered it was because my ObserverIP was slow in responding, so I rebooted it around 05:55:35, and after that you see much better behavior. This also shows the value of being able to observe and alarm on such variables (but one must understand them, hence my questions below).

But you'll also notice that although good data is now retrieved every 3-6 seconds (last column), the age of the sensor data keeps increasing until it reaches 150+, then drops and stays in the same 3-6 second range.

So it appears that the last data and last good data variables track a little differently from what I understood from the documentation. It sees good data at 06:55:49 minus 6 seconds, and it even saw other data as little as one second before that, but that does not qualify as "good" data. And even though there was "good" data 6 seconds before, the age of the sensors is still significantly more than those 6 seconds. I am pretty sure that, because of the reboot, the ObserverIP is displaying only what I would consider good data, so what might be happening that causes MeteoBridge to think it is not good, and what causes the discrepancy?

Even more puzzling is that temperature age and pressure age continue increasing even though "good data" is being received. Should that not reset the age of these aging variables? Starting at 06:57:51 all ages appear to be reset and are more or less in sync with the "last good data" interval, but still the ages do not track identically in all cases?

I do not understand why not. From what I understand MeteoBridge "scrapes" all the values from the ObserverIP "Live Data" page. Does that not mean that all variables should have the same age (in the absence of bogus data being presented, which I know it was not)?
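
To make the discrepancy easier to spot, here is a small Python sketch that parses rows in the same column order as the log above and lists the sensors whose age runs ahead of lastgooddata. The `Row` type, the function names and the 5-second slack are arbitrary choices for illustration; the five sample rows are copied from the log.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    time: str
    th0temp_age: int
    thb0temp_age: int
    thb0seapress_age: int
    lastdata: int
    lastgooddata: int


# Five rows copied from the log above (whitespace-separated).
SAMPLE = """\
06:55:32 31 31 31 1 30
06:55:49 48 48 48 1 6
06:56:07 6 66 66 1 6
06:57:33 3 152 152 3 3
06:57:51 3 4 4 3 3
"""


def parse(text: str) -> List[Row]:
    """Turn log lines into Row tuples, skipping lines that do not have six columns."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 6:
            rows.append(Row(parts[0], *(int(p) for p in parts[1:])))
    return rows


def stale_sensors(row: Row, slack: int = 5) -> List[str]:
    """Names of sensors whose age exceeds lastgooddata by more than `slack` seconds."""
    names = ("th0temp_age", "thb0temp_age", "thb0seapress_age")
    return [n for n in names if getattr(row, n) - row.lastgooddata > slack]


if __name__ == "__main__":
    for row in parse(SAMPLE):
        print(row.time, stale_sensors(row))
```

On the sample rows this flags all three sensors at 06:55:49, only the two thb0 sensors from 06:56:07 through 06:57:33, and nothing at 06:57:51.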

Re: "-age" modifier bug in template variables?

Post by dolfs »

I think I figured out why I was always getting 600 s when the sensor indicated invalid, and the observations might be helpful for those of you who would like to tinker with alarms on MeteoBridge. If you don't care about how I discovered it, the conclusions are at the end.

I think MeteoBridge does not declare "sensor data invalid" right away when it cannot read the ObserverIP Live Data page. It only does so after the "Tolerated Age of data" is exceeded. I am pretty sure about my conclusion (see below for detail), but I am now verifying it by changing the setting from the prior 10 minutes (600 s) to 5 minutes. I'll have to wait until a problem develops because I do not want to switch off my ObserverIP and have no uploads for 5 minutes.

This also suggests that for the first 10 minutes of not being able to read a valid pressure, it continues to supply the last known value for uploads.

Here is what I found from my private logging, which happens (a) directly from my ObserverIP and (b) through MeteoBridge, in both cases to wunderground.com and AmbientWeather.net. My ObserverIP suffered the following data outages on wunderground.com (bounds given as precisely as I can):
  • 6:56PM < t < 7:01PM to 7:02PM ≤ t < 7:06PM (min duration is 1m, max is 10m)
  • 7:12PM < t < 7:15PM to 7:32PM ≤ t < 7:37PM (min duration is 17m, max is 25m)
  • 7:42PM < t < 7:47PM to 8:13PM ≤ t < 8:18PM (min duration is 26m, max is 36m)
The MeteoBridge data shows a gap only from 7:22PM < t ≤ 7:27PM to 7:32PM ≤ t < 7:37PM (I have data at 7:20 and at 7:40).

At the time, the tolerance setting was 10m. So, within the intervals stated, the first wunderground gap is too short for MB to go to "bad data" status. The second is long enough: MB would have reached "bad data" status no sooner than 7:22PM, and I received the email at 7:25PM, so that seems consistent. I do not receive email when the condition clears, but from the above we know it cleared no later than 7:37PM, when MB reset its "bad data" timer, which remained reset at least until 7:42PM, when the third gap starts. That gap is long enough to trigger MB, and I did indeed receive an email time-stamped 7:57PM.
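
The min/max durations and the comparison against the tolerance can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names are invented, and the rule that a gap "trips" once its shortest possible duration reaches the tolerance is my reading of the behavior, not something documented.

```python
def minutes(clock: str) -> int:
    """Convert a time like '7:02PM' to minutes since midnight."""
    hhmm, suffix = clock[:-2], clock[-2:].upper()
    hour, minute = (int(p) for p in hhmm.split(":"))
    hour = hour % 12 + (12 if suffix == "PM" else 0)
    return hour * 60 + minute


def duration_bounds(start_after, start_before, end_from, end_before):
    """Shortest and longest possible outage, in minutes, given bounds on start and end."""
    shortest = minutes(end_from) - minutes(start_before)
    longest = minutes(end_before) - minutes(start_after)
    return shortest, longest


def verdict(shortest: int, longest: int, tolerance_min: int = 10) -> str:
    """Assumed rule: invalid data is declared once the outage reaches the tolerance."""
    if shortest >= tolerance_min:
        return "trips"
    if longest <= tolerance_min:  # bounds are strict, so the real gap is shorter
        return "does not trip"
    return "undetermined"


# The three wunderground.com outages listed above.
GAPS = [
    ("6:56PM", "7:01PM", "7:02PM", "7:06PM"),
    ("7:12PM", "7:15PM", "7:32PM", "7:37PM"),
    ("7:42PM", "7:47PM", "8:13PM", "8:18PM"),
]

if __name__ == "__main__":
    for gap in GAPS:
        lo, hi = duration_bounds(*gap)
        print(gap[0], "->", gap[3], f"min {lo}m, max {hi}m:", verdict(lo, hi))
```

With the 10-minute tolerance, only the second and third gaps come out as long enough, which matches the two emails.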

During the true gaps, I can see MB delivering constant data for the indoor sensors, also consistent with my assumption above for supplying last known value. One interesting observation (to which I'll come back below) is that there is no gap in my outdoor sensor values.

For this particular station, where all readings are retrieved at the same time (not true for hardware where sensors are read out independently), this also implies that the "age" of all sensors should be the same and correspond to the "lastgooddata" value, and that is generally true in the supplied sample data (first three columns). It does seem that a 1-second offset (rounding/precision?) gets introduced starting at 06:51:11. (BTW, this appears in roughly the same time frame as the data gaps above, but it was observed on an entirely different date and is not related to those gaps.) This gets resolved after the ObserverIP reboot shortly before 06:52:22. The above theory, however, cannot explain what happens until 06:57:51, where the various ages do not line up.

That last bit can only be explained if, in that period, the Live Data page displays invalid data for the indoor temp and pressure, but not for the outdoor temp. This is indeed possible if we assume that after a reboot it takes the ObserverIP longer to (re)connect with the indoor sensors than with the outdoor ones. Observing, however, that "lastgooddata" remains in sync with the outdoor temperature also means that "lastgooddata" is reset as soon as a single sensor value is obtained successfully. It does not mean that all sensors were obtained successfully.

I had two separate alarms set, but only the second one triggered email:
  • [mbsystem-lastgooddata] >= 300
  • [thb0seapress-act=inHg.2:-999] <= 25
The first one never triggered because throughout the whole period, outdoor sensor values were valid. The second one triggered after 10 minutes of invalid data because then the default was supplied. "lastgooddata" likely never exceeded 10 or 15 seconds.

All of this leads me to the following extended explanation of the template variables:
  • [mbsystem-lastgooddata] corresponds to the most recent time at least one sensor was read successfully and recorded. Its value might be slightly less than the age of that sensor due to internal delays. It seems this time stamp is taken at a different moment from updating sensor ages. As long as at least one sensor is regularly and successfully read, this will also regularly be reset to a small value. It is not (yet) clear to me whether this is also delayed by the tolerance interval.
  • [mbsystem-lastdata] seems to correspond to the most recent time at least one sensor value was received, valid or not. This seems to always be set after lastgooddata has been updated because it is often smaller. Why it might be so different is not clear to me.
  • [<some sensor>-act:-999] will not produce -999 until the "tolerance interval" has expired. This implies that relying on such default values is not a reliable way to detect individual sensor failures quickly (unless you consider the 5-minute minimum tolerance interval quick).
  • [<some sensor>-age:9999] > <max age> is a trigger condition that will work to detect specific sensors failing, where <max age> is your specific tolerated age, and the default is larger than that max-age. The latter is only relevant if your <max age> is larger than any test you might have set for "lastgooddata".
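
The inferred behavior can be summed up in a toy Python model. This is not MeteoBridge code, just a sketch of my conclusions: the class and its methods are invented, and the exact boundary (invalid at age >= tolerance) and the rule that the age default only appears when no reading was ever seen are assumptions based on the 600/601 values in my emails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SensorModel:
    """Toy model of one sensor as MeteoBridge appears to treat it."""
    tolerance_s: int = 600                 # "Tolerated Age of data", in seconds
    last_value: Optional[float] = None
    last_good_t: Optional[float] = None

    def record(self, now: float, value: float) -> None:
        """Store a successfully read value and its timestamp."""
        self.last_value = value
        self.last_good_t = now

    def age(self, now: float, default: int) -> int:
        """Like [sensor-age:default]: seconds since the last good reading."""
        if self.last_good_t is None:
            return default
        return int(now - self.last_good_t)

    def act(self, now: float, default: float) -> float:
        """Like [sensor-act:default]: last value until the tolerance expires."""
        if self.last_good_t is None or now - self.last_good_t >= self.tolerance_s:
            return default
        return self.last_value


if __name__ == "__main__":
    s = SensorModel(tolerance_s=600)
    s.record(0, 29.92)                     # last good pressure reading at t = 0
    for now in (17, 61, 599, 600):
        value_alarm = s.act(now, -999) <= 25   # my original raise condition
        age_alarm = s.age(now, 9999) > 60      # age-based condition, 60 s max age
        print(now, value_alarm, age_alarm)
```

In this model the value-based alarm stays quiet until t = 600, while the age-based alarm already fires at t = 61, which is the argument for alarming on -age rather than on the default value.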

Re: "-age" modifier bug in template variables?

Post by admin »

Thanks for going that deeply. I think your findings are correct.