System Data7 Understanding (Heartbeat)

Discussion of the Meteohub software package

Moderator: Mattk

cbhiii
Gold Boarder
Posts: 306
Joined: Fri Feb 15, 2008 2:02 am
Location: Michigan, USA

System Data7 Understanding (Heartbeat)

Post by cbhiii »

Boris (or anyone),

Can you help me understand better how these new variables relate to one another?

I understand that this is the feature I was hoping for which shows system availability :) but I want to make sure I'm using the variables properly.

If it wouldn't be too much trouble, can someone explain the differences between each one? Particularly the rise, fall and perminute readings.

I have data7 setup on Heartbeat on my system. Here are the variables.

day1_data7_value_num 1.00
day1_data7_valuemin_num 1.00
day1_data7_valuemax_num 1.00
day1_data7_value_rise 0
day1_data7_value_fall 0
day1_data7_valuesum_num 0
day1_data7_valuesum_int 755
day1_data7_valuesum_perminute 0.52
day1_data7_valuesum_delta 0.00
Station: Davis Vantage Pro2 Plus
Hardware: Raspberry Pi 2 (Meteohub status)
skyewright
Platinum Boarder
Posts: 873
Joined: Fri Jan 25, 2008 6:27 pm
Location: Isle of Skye, Scotland

Re:System Data7 Understanding (Heartbeat)

Post by skyewright »

cbhiii wrote:Particularly the rise, fall and perminute readings.
I'd have said that the per minute was the "increase in count over a period divided by the number of minutes in the period", calculated over the relevant time frame, but for one of my plug-ins, which uses a data value, that doesn't seem to work.

My data18 sensor is an accumulating counter (i.e. it holds a total number of "events", and each time an "event" happens the total is increased by 1).

Ever since assigning the sensor, the value has (intentionally) remained at 1118.00, e.g.

20090511094350 data18 111800

Every data18 record in the raw data has that same value. Sensor readings arrive around once a minute, sometimes with slightly more than a minute between them.

A 5 minute bucket graph of data18, shows a straight line at 1118.00, which is just what I expected.

Based on my initial expectation, I thought a 5 minute bucket graph of data18-sum/min would show a rate of zero (because there has been no change), but instead it shows a line that hops occasionally between around 1118.0 and around 900.

The ~900 makes me think that sum/min works very differently to my expectation. I think the ~900 represents a 5 minute period in which there were only 4 sensor readings and that the value is the result of a calculation of (4 x 1118.00) / 5 = 894.4, i.e. "the sum of the readings received in the period divided by the number of minutes in the period".

Adding data18-sum to the graph ties in with that, because it produces a graph that hops between 5590 (i.e. 5 x 1118) and 4472 (i.e. 4 x 1118), so I think that "sum" is "the sum of the readings received in the period".

I'm guessing that this might be useful for a counter device of some sort that either sends a "1" every time the counted event occurs, or where the sensor value was "the number of events since the last reading". If the latter, I think that sending the same number of readings each minute might be a necessity for consistency?

Getting back to the subject of this thread, the "Heartbeat" would fit into that model as a "device" that sends a reading of "1" every minute, so a "sum/min" of "1.00" would indicate "meteohub 'alive' throughout the bucket period". If meteohub was 'alive' for, say, 45 minutes in a given hour, then with hourly buckets the "sum/min" would be "0.75" (i.e. 45/60).
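
A minimal sketch of that reading of "sum" and "sum/min", assuming a bucket is simply the list of readings received in the period. The function names are made up for illustration and are not Meteohub's own code.

```python
def bucket_sum(readings):
    """Sum of all readings received in the bucket."""
    return sum(readings)


def bucket_sum_per_minute(readings, bucket_minutes):
    """Sum of the readings divided by the bucket length in minutes."""
    return sum(readings) / bucket_minutes


# data18 stuck at 1118.00 with 5-minute buckets
full_bucket = [1118.0] * 5    # one reading in every minute
short_bucket = [1118.0] * 4   # one reading missing

print(bucket_sum(full_bucket), bucket_sum_per_minute(full_bucket, 5))    # 5590.0 1118.0
print(bucket_sum(short_bucket), bucket_sum_per_minute(short_bucket, 5))  # 4472.0 894.4

# Heart Beat: a reading of 1 per minute, system alive for 45 of 60 minutes
print(bucket_sum_per_minute([1.0] * 45, 60))  # 0.75
```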

Am I getting close Boris?
Is that the intended behaviour?

On first hearing of the feature what I was expecting was a device more like a rain gauge, i.e. something which reports a gradually incrementing total count, with the per minute feature deriving a rate from the increments.
sevenless
Gold Boarder
Posts: 160
Joined: Wed Jul 02, 2008 7:35 pm
Location: Seattle WA, USA

Re:System Data7 Understanding (Heartbeat)

Post by sevenless »

If I correctly understood Boris' original post about heartbeat in the update 4.2f thread, the sum-perminute measure is only really useful over longer periods of time.

Thus your reading of day1_data7_valuesum_perminute 0.52 means that your system has had a heartbeat for 52 percent of the day. This isn't too useful with day1, since it appears to be calculating the number of heartbeats it has seen today (presumably that's what _int is reporting?) divided by 1440 (minutes in a day), and the day itself isn't over yet, so it's tough to tell from this readout whether the system has had any downtime today. It's only really useful in graphing over multiple day spans, such as the example he shows in that same thread.

In his example he uses buckets of one day, and if the system had no downtime that day the daily total is equal to one (since every single minute of the day there was a heartbeat, and thus heartbeats = 1440.) When there is downtime, you lose a heartbeat for every minute the system was not running, so the perminute total is less than one.
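
To make the arithmetic concrete, here is a small sketch. It assumes that _int is the heartbeat count and that day1 divides by the full 1440 minutes, which is only the guess made above. The "elapsed minutes" variant is a hypothetical alternative, not something Meteohub is known to offer.

```python
MINUTES_PER_DAY = 1440


def fraction_of_full_day(heartbeats):
    """Heartbeat count divided by the minutes in a whole day."""
    return heartbeats / MINUTES_PER_DAY


def fraction_of_elapsed(heartbeats, elapsed_minutes):
    """Hypothetical alternative: divide by the minutes elapsed so far."""
    return heartbeats / elapsed_minutes


# cbhiii's values: valuesum_int 755, valuesum_perminute 0.52
print(round(fraction_of_full_day(755), 2))  # 0.52

# If exactly 755 minutes of the day had passed, there was no downtime so far
print(fraction_of_elapsed(755, 755))  # 1.0
```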

This is essentially the same as the theory skyewright posits for his '4 data18 beats in 5 minutes' supposition above for his other sensor.

Essentially, sum-perminute is taking the number of times the sensor pings the system and gives you the average over that given time period. It doesn't take into account any values within the data, and only cares that it actually heard from the sensor.

Thus, it's probably not directly useful for anything beyond confirming that the sensor remains "alive", but once you establish a baseline of how frequently it *should* be pinging the Meteohub when everything is working normally, you should be able to calibrate your readout such that you can determine when it is functioning normally, and when it's had some downtime or connection problems.

The confusing part for other sensor types is in the concept of 'sum/minute' since this value is apparently adding a pip every single time it receives a sensor reading, but instead of simply adding a '1', it's actually adding the value the sensor is reporting at any given time, which might result in some very large values for _int depending on each sensor's output value, especially if a sensor pings the Meteohub with new data multiple times every minute.

Of course that still doesn't explain what's reported in _rise, _fall, and _delta, although delta is usually a mathematical symbol standing for change, so that might be what skyewright was expecting to see as a flat-line zero graph for data18, and it would explain why that's zero for heartbeat, since it always sends a '1'.
admin
Platinum Boarder
Posts: 7879
Joined: Mon Oct 01, 2007 10:51 pm

Re:System Data7 Understanding (Heartbeat)

Post by admin »

Yes, heart beat is intended to say something about the availability of the system during the time period under consideration. If you take the "sum/min (sum per minute)" evaluation of the heart beat sensor for a time bucket of a day, the resulting number will tell you for what fraction of that time frame Meteohub was up and running. If you get 1.0 it has been working all the time; when it is 0.9 it has only been up 90% of the time. This is a measure of system availability.

By using the "falling edges" evaluation on the "uptime" data sensor, you can count and display the number of reboots. Used with rain totals, it indicates resets of the rain sensor. The "rising edges" evaluation on "rain rate" might indicate how often it has started raining that day, etc. Rise and fall counters are triggered by an event where the value switches from falling to rising and vice versa. When values stay flat, no trigger is given.
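
A rough sketch of how such edge counting could work. Two points here are assumptions rather than documented Meteohub behaviour: the first movement in a series counts as an edge, and flat stretches are ignored without forgetting the previous direction. The function name count_edges is made up for illustration.

```python
def count_edges(values):
    """Return (rises, falls): how often the series starts rising or falling."""
    rises = falls = 0
    direction = 0  # +1 rising, -1 falling, 0 no movement seen yet
    for prev, cur in zip(values, values[1:]):
        if cur > prev:
            step = 1
        elif cur < prev:
            step = -1
        else:
            continue  # flat: no trigger
        if step != direction:  # direction changed, so count one edge
            if step > 0:
                rises += 1
            else:
                falls += 1
            direction = step
    return rises, falls


uptime = [10, 20, 30, 5, 15, 25, 3, 13]  # two reboots
rain_rate = [0, 0, 1, 2, 0, 0, 3, 0]     # rain starts twice
heartbeat = [1, 1, 1, 1]                 # always flat

print(count_edges(uptime))     # (3, 2): two falling edges, one per reboot
print(count_edges(rain_rate))  # (2, 2)
print(count_edges(heartbeat))  # (0, 0)
```

The flat Heart Beat series gives zero for both counters, which matches the 0 values reported for day1_data7_value_rise and day1_data7_value_fall.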

Example in announcement of 4.2f might also shed some light on this.

The next update will also have an evaluation method called "deltasum", which takes an ever-increasing counter reading (like rain total) and will line out the increment during that time frame. The evaluation can also handle resets of counter totals during that time and will reflect the total increase (so it is more than a comparison of the first and last values of the time frame). Meteohub already uses this for rain totals and ignores rising jumps beyond a certain level. When applied to generic data, that limit will not be imposed.
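
One possible way to picture "deltasum", as a sketch only. It assumes that a drop in the reading means the counter was reset to zero, so the new reading itself is the increment since the reset. The real evaluation may treat resets differently, and delta_sum is a made-up name.

```python
def delta_sum(totals):
    """Total increase of an ever-increasing counter, tolerating resets."""
    increase = 0.0
    for prev, cur in zip(totals, totals[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # Assumption: the counter restarted from zero, so everything
            # counted since the reset is the current reading.
            increase += cur
    return increase


rain_total = [100.0, 102.0, 105.0, 1.0, 3.0]  # reset after 105.0

print(delta_sum(rain_total))            # 8.0
print(rain_total[-1] - rain_total[0])   # -97.0, the naive last-minus-first
```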
skyewright
Platinum Boarder
Posts: 873
Joined: Fri Jan 25, 2008 6:27 pm
Location: Isle of Skye, Scotland

Re:System Data7 Understanding (Heartbeat)

Post by skyewright »

admin wrote:...Next update will also have an evaluation method called "deltasum" which takes an ever increasing counter reading (like rain total) and will line out the increment during that time frame...
Thank you Boris. That all sounds excellent.

Loads of ways of looking at the data, with different ways suited to different types of data. :cheer:
cbhiii
Gold Boarder
Posts: 306
Joined: Fri Feb 15, 2008 2:02 am
Location: Michigan, USA

Re:System Data7 Understanding (Heartbeat)

Post by cbhiii »

Thank you for your responses.

I'm trying to look at this from a simple point of view, but I have found a problem.

In this variable:
last24h_data7_valuesum_int 1434


I would expect to see 1440 as its value, but it is only showing 1434. My Meteohub has been up and running for much longer than 24 hours with no problems or high usage, but my data7 counter only shows 1434 minutes' worth of counts.

What is going on here?

I also notice the same thing for last15m (which reads 14) and last60m (which reads 59).
skyewright
Platinum Boarder
Posts: 873
Joined: Fri Jan 25, 2008 6:27 pm
Location: Isle of Skye, Scotland

Re:System Data7 Understanding (Heartbeat)

Post by skyewright »

cbhiii wrote:I'm trying to look at this from a simple point of view, but I have found a problem.
<SNIP>
What is going on here?
Here, at least, the moment at which the "Heart Beat" is issued gradually shifts during the day.

Looking at Inspect Data and reducing it to just the Heart Beat, I see:

20090512080403 data7 100
20090512080504 data7 100
20090512080604 data7 100
20090512080704 data7 100
20090512080804 data7 100
20090512080904 data7 100
20090512081004 data7 100
20090512081104 data7 100
20090512081204 data7 100
20090512081304 data7 100
20090512081405 data7 100

i.e. (on my ALIX 1D) there's a shift of 1 second every 8-9 minutes. If that happens consistently then after around 8 hours there would be a minute that has no Heart Beat even though the system was fully up. That's less loss than you are seeing, but it may well be that the shift rate varies from Meteohub to Meteohub depending on configuration and loading.

Perhaps the Heart Beat needs 'anchoring' to a point safely inside the minute, e.g. at around 30 seconds? With that, if an unusual processing load caused the odd Heart Beat to be even a few seconds late, that would not matter so long as the loop tried to drag the next Heart Beat back toward the 30.

Edit: Here's an example of the slipping Heart Beat leaving a missed minute:
20090512072759 data7 100
20090512072859 data7 100
20090512073000 data7 100
20090512073100 data7 100

i.e. no Heart Beat during the minute 0729, even though the system was running just fine.
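
Skipped minutes like this can be spotted automatically. The sketch below assumes raw lines in the "timestamp sensor value" layout shown above. missing_minutes is a made-up helper, not a Meteohub tool.

```python
from datetime import datetime, timedelta


def missing_minutes(lines, sensor="data7"):
    """Return the HHMM labels of minutes with no reading for the sensor."""
    seen = set()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[1] == sensor:
            stamp = datetime.strptime(parts[0], "%Y%m%d%H%M%S")
            seen.add(stamp.replace(second=0))  # truncate to the minute
    if not seen:
        return []
    missing = []
    minute = min(seen)
    while minute <= max(seen):
        if minute not in seen:
            missing.append(minute.strftime("%H%M"))
        minute += timedelta(minutes=1)
    return missing


raw = [
    "20090512072759 data7 100",
    "20090512072859 data7 100",
    "20090512073000 data7 100",
    "20090512073100 data7 100",
]

print(missing_minutes(raw))  # ['0729']
```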
cbhiii
Gold Boarder
Posts: 306
Joined: Fri Feb 15, 2008 2:02 am
Location: Michigan, USA

Re:System Data7 Understanding (Heartbeat)

Post by cbhiii »

That must be what is happening with my unit as well. I see a one second shift every 4-5 minutes. That must be why I'm missing 6 counts in my "last24h" totals.
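
As a back-of-the-envelope check: a steady slip of one second every N minutes should skip one whole minute every 60 x N minutes. A tiny sketch, assuming the slip rate is constant (the function name is made up):

```python
MINUTES_PER_DAY = 1440


def missed_beats_per_day(minutes_per_one_second_slip):
    """Expected number of skipped minutes per day for a steady slip rate."""
    minutes_until_skip = 60 * minutes_per_one_second_slip
    return MINUTES_PER_DAY / minutes_until_skip


print(missed_beats_per_day(4))  # 6.0
print(missed_beats_per_day(5))  # 4.8
print(missed_beats_per_day(8))  # 3.0, i.e. one missed minute every 8 hours
```

A slip every 4 minutes gives 6 missed counts, which is the difference between 1440 and 1434.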

I think an anchor of some sort to send the data at the top of each minute would work, but I can't imagine how that would be done reliably unless it's a cron job.

Here is my data showing a missed minute at 0207:

20090512020158 data7 100
20090512020259 data7 100
20090512020359 data7 100
20090512020459 data7 100
20090512020559 data7 100
20090512020659 data7 100
20090512020800 data7 100
20090512020900 data7 100
20090512021000 data7 100
20090512021100 data7 100
20090512021200 data7 100

20090512021301 data7 100
20090512021401 data7 100
20090512021501 data7 100
20090512021601 data7 100
20090512021701 data7 100
20090512021802 data7 100
skyewright
Platinum Boarder
Posts: 873
Joined: Fri Jan 25, 2008 6:27 pm
Location: Isle of Skye, Scotland

Re:System Data7 Understanding (Heartbeat)

Post by skyewright »

cbhiii wrote:...but I can't imagine how that would be done reliably unless it's a cron job.
Internally Boris probably has more precise ways to control the timing of Heart Beat than relying on cron... :)
admin
Platinum Boarder
Posts: 7879
Joined: Mon Oct 01, 2007 10:51 pm

Re:System Data7 Understanding (Heartbeat)

Post by admin »

You are all right. The program that generates heart beats is very lazily programmed. It sends a heart beat and goes to sleep for 60 seconds. As a result we have a little drift. I changed that for the next release in a way that it synchronizes every minute, so drift might be at most a second but will not grow over time. I expect the availability reading will be precise from then on.
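
The difference between the two approaches can be simulated as follows. This is only a sketch and not the actual Meteohub code: it assumes a fixed per-beat overhead of 200 ms (a made-up figure) and works in integer milliseconds to keep the arithmetic exact.

```python
MINUTE_MS = 60_000


def naive_schedule(start_ms, overhead_ms, beats):
    """Send a beat, do some work, then sleep a full 60 s: overhead accumulates."""
    times, t = [], start_ms
    for _ in range(beats):
        times.append(t)
        t += overhead_ms + MINUTE_MS
    return times


def synced_schedule(start_ms, overhead_ms, beats):
    """Send a beat, do some work, then sleep only until the next minute boundary."""
    times, t = [], start_ms
    for _ in range(beats):
        times.append(t)
        t += overhead_ms                  # work done after sending the beat
        t += MINUTE_MS - (t % MINUTE_MS)  # remaining time to the boundary
    return times


naive = naive_schedule(0, 200, 101)
synced = synced_schedule(0, 200, 101)

print(naive[-1] % MINUTE_MS)   # 20000: 20 s of drift after 100 minutes
print(synced[-1] % MINUTE_MS)  # 0: still on the minute boundary
```

In a real loop the same idea would be something like `time.sleep(60 - time.time() % 60)` in place of `time.sleep(60)`.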
cbhiii
Gold Boarder
Posts: 306
Joined: Fri Feb 15, 2008 2:02 am
Location: Michigan, USA

Re:System Data7 Understanding (Heartbeat)

Post by cbhiii »

Excellent!
cbhiii
Gold Boarder
Posts: 306
Joined: Fri Feb 15, 2008 2:02 am
Location: Michigan, USA

Re:System Data7 Understanding (Heartbeat)

Post by cbhiii »

Now running 4.2h, but I still see an issue.

After rebooting, the values for [last15m...] and [last60m...] were both good at 15 and 60, but when I checked later they were at 14 and 59 for some reason. The raw data looks good, but I don't know why the counts are off.

Is it possible that the counts differ because the process that updates the variables is also running at 00 seconds? Would telling that process to update the variables at 10 seconds after each minute help avoid this error? (I'm only guessing)


last15m_data7_valuesum_int 14
last60m_data7_valuesum_int 59

15 min raw data is good:

20090513052200 data7 100
20090513052300 data7 100
20090513052400 data7 100
20090513052500 data7 100
20090513052600 data7 100
20090513052700 data7 100
20090513052800 data7 100
20090513052900 data7 100
20090513053000 data7 100
20090513053100 data7 100
20090513053201 data7 100
20090513053300 data7 100
20090513053400 data7 100
20090513053500 data7 100
20090513053600 data7 100
20090513053701 data7 100
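
To illustrate the guess about update timing (purely hypothetical, not a confirmed cause): if "last15m" were a trailing 15-minute window evaluated exactly at 00 seconds, the raw data above would give 14 counts, while evaluating at 10 seconds past the minute would give 15. The window rule used here (start excluded, end included) is an assumption.

```python
from datetime import datetime, timedelta

RAW_STAMPS = """
20090513052200 20090513052300 20090513052400 20090513052500
20090513052600 20090513052700 20090513052800 20090513052900
20090513053000 20090513053100 20090513053201 20090513053300
20090513053400 20090513053500 20090513053600 20090513053701
""".split()

STAMPS = [datetime.strptime(s, "%Y%m%d%H%M%S") for s in RAW_STAMPS]


def count_in_trailing_window(stamps, now, minutes=15):
    """Count readings with now - minutes < timestamp <= now."""
    start = now - timedelta(minutes=minutes)
    return sum(1 for s in stamps if start < s <= now)


at_00 = datetime(2009, 5, 13, 5, 37, 0)   # evaluated at 00 seconds
at_10 = datetime(2009, 5, 13, 5, 37, 10)  # evaluated 10 seconds later

print(count_in_trailing_window(STAMPS, at_00))  # 14
print(count_in_trailing_window(STAMPS, at_10))  # 15
```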
admin
Platinum Boarder
Posts: 7879
Joined: Mon Oct 01, 2007 10:51 pm

Re:System Data7 Understanding (Heartbeat)

Post by admin »

This minor bug only affects "last..." sensor readings. Will be fixed with next update.