
Out of Memory error on ebox-3300 after 30 days running

Posted: Wed Aug 19, 2009 1:52 pm
by cbhiii
Boris,

My eBox-3300 had been running for 30.5 days until this morning at 6:30am, when Meteohub stopped working.

I could not log in via ethernet, and Meteohub no longer sent or received data. I could not even log in at the console with a keyboard plugged into it (it responded, but it wouldn't accept my password). The only message on the console was "Out of memory ... process 1980 killed"

I do not believe this is my hardware, because it has been working fine for the past 30 days, and when it breaks or locks up it usually does so at 6:30am when some mystery process runs. Before these 30 days this happened twice or so within a week. I believe it is something in the software with the 4.4 code.

You can see on the graph that the swap file memory ran down starting at 6:30am when some process ran. As time went on the swap file ran out.

After 6:30am I started receiving alarm emails from Meteohub with these kinds of errors:

FTP Upload failed: cp /var/run/meteohub/mobile1.out /data/myweb/uploads/mobile.html
FTP Upload failed: mv /var/run/meteohub/m-HS9.png /var/run/meteohub/uploads/m-HS.png


At or about 7:30am the Meteohub software stopped and locked the system and nothing more was sent.

Is anyone else running v4.4a on an eBox-3300 for more than 30 days, or seeing freezes or lock-ups with the eBox-3300 or other x86 units?

Any help would be appreciated. Thank you.

Re:Out of Memory error on ebox-3300 after 30 days running

Posted: Wed Aug 19, 2009 10:21 pm
by cbhiii
By some miracle... the system log has something potentially useful in it!

This is what is in my log starting at 6:30am until the unit stopped. I hope it helps.
Aug 19 06:30:13 meteohub kernel: init invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
Aug 19 06:30:13 meteohub kernel: init invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
Aug 19 06:35:56 meteohub kernel: Pid: 1, comm: init Not tainted 2.6.24-etchnhalf.1-486 #1
Aug 19 06:35:56 meteohub kernel: Pid: 1, comm: init Not tainted 2.6.24-etchnhalf.1-486 #1
Aug 19 06:35:57 meteohub kernel: [<c014d460>] oom_kill_process+0x53/0xf7
Aug 19 06:35:57 meteohub kernel: [<c014d460>] oom_kill_process+0x53/0xf7
Aug 19 06:35:57 meteohub kernel: [<c014d785>] out_of_memory+0x141/0x16c
Aug 19 06:35:57 meteohub kernel: [<c014d785>] out_of_memory+0x141/0x16c
Aug 19 06:35:57 meteohub kernel: [<c014f12e>] __alloc_pages+0x232/0x2c0
Aug 19 06:35:57 meteohub kernel: [<c014f12e>] __alloc_pages+0x232/0x2c0
Aug 19 06:35:58 meteohub kernel: [<c016b640>] do_lookup+0x4f/0x14e
Aug 19 06:35:58 meteohub kernel: [<c016b640>] do_lookup+0x4f/0x14e
Aug 19 06:35:58 meteohub kernel: [<c0150a3f>] __do_page_cache_readahead+0x9b/0x183
Aug 19 06:35:58 meteohub kernel: [<c0150a3f>] __do_page_cache_readahead+0x9b/0x183
Aug 19 06:35:58 meteohub kernel: [<c0150e73>] do_page_cache_readahead+0x49/0x56
Aug 19 06:35:58 meteohub kernel: [<c0150e73>] do_page_cache_readahead+0x49/0x56
Aug 19 06:35:58 meteohub kernel: [<c014caa5>] filemap_fault+0x156/0x319
Aug 19 06:35:58 meteohub kernel: [<c014caa5>] filemap_fault+0x156/0x319
Aug 19 06:35:59 meteohub kernel: [<c0154b6b>] __do_fault+0x57/0x32e
Aug 19 06:35:59 meteohub kernel: [<c0154b6b>] __do_fault+0x57/0x32e
Aug 19 06:35:59 meteohub kernel: [<c01563a8>] handle_mm_fault+0x29c/0x5fd
Aug 19 06:35:59 meteohub kernel: [<c01563a8>] handle_mm_fault+0x29c/0x5fd
Aug 19 06:35:59 meteohub kernel: [<c0115c7d>] do_page_fault+0x1f4/0x5cf
Aug 19 06:35:59 meteohub kernel: [<c0115c7d>] do_page_fault+0x1f4/0x5cf
Aug 19 06:36:00 meteohub kernel: [<c0170728>] sys_select+0x160/0x186
Aug 19 06:36:00 meteohub kernel: [<c0170728>] sys_select+0x160/0x186
Aug 19 06:36:01 meteohub kernel: [<c0115a89>] do_page_fault+0x0/0x5cf
Aug 19 06:36:01 meteohub kernel: [<c0115a89>] do_page_fault+0x0/0x5cf
Aug 19 06:36:05 meteohub kernel: [<c02b571a>] error_code+0x6a/0x70
Aug 19 06:36:05 meteohub kernel: [<c02b571a>] error_code+0x6a/0x70
Aug 19 06:36:05 meteohub kernel: [<c02b0000>] vcc_sendmsg+0x5/0x2a2
Aug 19 06:36:05 meteohub kernel: [<c02b0000>] vcc_sendmsg+0x5/0x2a2
Aug 19 06:36:05 meteohub kernel: =======================
Aug 19 06:36:05 meteohub kernel: =======================
Aug 19 06:36:06 meteohub kernel: Mem-info:
Aug 19 06:36:06 meteohub kernel: DMA per-cpu:
Aug 19 06:36:06 meteohub kernel: DMA per-cpu:
Aug 19 06:36:06 meteohub kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Aug 19 06:36:06 meteohub kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Aug 19 06:36:06 meteohub kernel: Normal per-cpu:
Aug 19 06:36:06 meteohub kernel: Normal per-cpu:
Aug 19 06:36:06 meteohub kernel: CPU 0: Hot: hi: 90, btch: 15 usd: 42 Cold: hi: 30, btch: 7 usd: 18
Aug 19 06:36:06 meteohub kernel: CPU 0: Hot: hi: 90, btch: 15 usd: 42 Cold: hi: 30, btch: 7 usd: 18
Aug 19 06:36:06 meteohub kernel: Active:54626 inactive:4862 dirty:0 writeback:0 unstable:0
Aug 19 06:36:06 meteohub kernel: Active:54626 inactive:4862 dirty:0 writeback:0 unstable:0
Aug 19 06:36:06 meteohub kernel: free:744 slab:1337 mapped:338 pagetables:546 bounce:0
Aug 19 06:36:06 meteohub kernel: free:744 slab:1337 mapped:338 pagetables:546 bounce:0
Aug 19 06:36:06 meteohub kernel: DMA free:1076kB min:124kB low:152kB high:184kB active:5716kB inactive:4728kB present:16256kB pages_scanned:75618 all_unreclaimable? yes
Aug 19 06:36:06 meteohub kernel: DMA free:1076kB min:124kB low:152kB high:184kB active:5716kB inactive:4728kB present:16256kB pages_scanned:75618 all_unreclaimable? yes
Aug 19 06:36:06 meteohub kernel: lowmem_reserve[]: 0 238 238
Aug 19 06:36:06 meteohub kernel: lowmem_reserve[]: 0 238 238
Aug 19 06:36:07 meteohub kernel: Normal free:1900kB min:1908kB low:2384kB high:2860kB active:212788kB inactive:14720kB present:243840kB pages_scanned:763135 all_unreclaimable? yes
Aug 19 06:36:07 meteohub kernel: Normal free:1900kB min:1908kB low:2384kB high:2860kB active:212788kB inactive:14720kB present:243840kB pages_scanned:763135 all_unreclaimable? yes
Aug 19 06:36:07 meteohub kernel: lowmem_reserve[]: 0 0 0
Aug 19 06:36:07 meteohub kernel: lowmem_reserve[]: 0 0 0
Aug 19 06:36:07 meteohub kernel: DMA: 5*4kB 0*8kB 0*16kB 3*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1076kB
Aug 19 06:36:07 meteohub kernel: DMA: 5*4kB 0*8kB 0*16kB 3*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1076kB
Aug 19 06:36:07 meteohub kernel: Normal: 31*4kB 0*8kB 1*16kB 1*32kB 3*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1900kB
Aug 19 06:36:07 meteohub kernel: Normal: 31*4kB 0*8kB 1*16kB 1*32kB 3*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1900kB
Aug 19 06:36:07 meteohub kernel: Swap cache: add 151080, delete 151042, find 3471660/3484459, race 0+10
Aug 19 06:36:07 meteohub kernel: Swap cache: add 151080, delete 151042, find 3471660/3484459, race 0+10
Aug 19 06:36:07 meteohub kernel: Free swap = 0kB
Aug 19 06:36:07 meteohub kernel: Free swap = 0kB
Aug 19 06:36:07 meteohub kernel: Total swap = 200804kB
Aug 19 06:36:07 meteohub kernel: Total swap = 200804kB
Aug 19 06:36:07 meteohub kernel: Free swap: 0kB
Aug 19 06:36:07 meteohub kernel: 65536 pages of RAM
Aug 19 06:36:07 meteohub kernel: 0 pages of HIGHMEM
Aug 19 06:36:07 meteohub kernel: 1408 reserved pages
Aug 19 06:36:07 meteohub kernel: 10147 pages shared
Aug 19 06:36:07 meteohub kernel: 38 pages swap cached
Aug 19 06:36:07 meteohub kernel: 0 pages dirty
Aug 19 06:36:07 meteohub kernel: 0 pages writeback
Aug 19 06:36:07 meteohub kernel: 338 pages mapped
Aug 19 06:36:07 meteohub kernel: 1337 pages slab
Aug 19 06:36:08 meteohub kernel: 546 pages pagetables
Aug 19 06:36:08 meteohub kernel: Out of memory: kill process 1980 (loggerd) score 466 or a child
Aug 19 06:36:08 meteohub kernel: Out of memory: kill process 1980 (loggerd) score 466 or a child
Aug 19 06:36:08 meteohub kernel: Killed process 1981 (loggerd)
Aug 19 06:36:08 meteohub kernel: Killed process 1981 (loggerd)
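
For anyone digging through a similar log, here is a minimal sketch for pulling out just the processes the OOM killer actually killed. It assumes log lines in the format shown above; the function name oom_victims and the log path in the usage comment are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: list the processes the OOM killer actually killed,
# one "PID NAME" pair per line, with duplicated log lines collapsed.
# Reads the files given as arguments, or standard input if there are none.
oom_victims() {
    # Matches lines such as:
    #   Aug 19 06:36:08 meteohub kernel: Killed process 1981 (loggerd)
    sed -n 's/.*kernel: Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p' "$@" | sort -u
}

# Usage (the log path is an assumption and differs between systems):
#   oom_victims /var/log/syslog
```

Run against the excerpt above, this would report only loggerd (PID 1981), since the "kill process 1980 ... or a child" line is the kernel's selection and not the kill itself.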

Re:Out of Memory error on ebox-3300 after 30 days running

Posted: Wed Aug 19, 2009 10:40 pm
by cbhiii
Thank goodness for the self monitoring data fields!

Check this out.

The number of processes on Meteohub doubled from around 80 to 182 at 6:33am.

Processor load jumped up to 36.97 at 6:31am!
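
To narrow down what starts around 6:30am, one option is to log the process count and the load average once a minute. This is a minimal sketch, assuming a Linux /proc filesystem; the function name sample_load and the log path in the usage comment are made up for illustration and are not part of Meteohub.

```shell
#!/bin/bash
# Hypothetical helper: print one sample line "TIMESTAMP procs=N load1=X".
sample_load() {
    local procs=0 load1 dir
    # Every running process has a numeric directory under /proc.
    for dir in /proc/[0-9]*; do
        [ -d "$dir" ] && procs=$((procs + 1))
    done
    # The first field of /proc/loadavg is the 1-minute load average.
    read -r load1 _ < /proc/loadavg
    printf '%s procs=%d load1=%s\n' "$(date +'%Y-%m-%d %H:%M:%S')" "$procs" "$load1"
}

# Usage, e.g. once a minute from cron (the log path is an assumption):
#   sample_load >> /data/log/load-samples.txt
```

Comparing the samples just before and after 6:30am should show whether the jump in process count comes first or follows the load spike.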

Re: Out of Memory error on ebox-3300 after 30 days running

Posted: Sun Nov 21, 2010 7:20 pm
by wfpost
My meteohub system on an ebox 4300 has similar problems.

I have had an IP power switch controlling the Meteohub system for about half a year.
If Meteohub does not respond to pings for longer than 6 minutes, the switch power-cycles the eBox and reboots it.
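
The switch does this on its own, but the rule it applies can be sketched in a few lines of shell. This is only an illustration of the timeout logic, not the switch's actual implementation; the function name should_power_cycle, the host address and the power_cycle command in the usage comment are all hypothetical.

```shell
#!/bin/bash
# Timeout before a power cycle: 6 minutes, as described above.
TIMEOUT_SECONDS=360

# should_power_cycle LAST_OK NOW
# Both arguments are Unix timestamps in seconds. Succeeds (exit status 0)
# when the last successful ping is more than TIMEOUT_SECONDS old.
should_power_cycle() {
    local last_ok=$1 now=$2
    [ $((now - last_ok)) -gt "$TIMEOUT_SECONDS" ]
}

# Usage sketch (host address and power_cycle are placeholders):
#   last_ok=$(date +%s)
#   while sleep 30; do
#       if ping -c 1 -W 2 192.0.2.10 > /dev/null; then
#           last_ok=$(date +%s)
#       elif should_power_cycle "$last_ok" "$(date +%s)"; then
#           power_cycle
#           last_ok=$(date +%s)
#       fi
#   done
```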

Since moving to 4.6r (Build 4577) the situation has somehow got worse.
The system does not run for longer than about 10 days at a time.
The peaks in the orange line show the freeze, reboot and recalculation episodes, and they have been appearing more often since updating to 4.6r.

Is there anything that can be done?
Thanks,
Wolfgang


Re: Out of Memory error on ebox-3300 after 30 days running

Posted: Sun Nov 21, 2010 7:59 pm
by admin
I don't think this is the same situation as the one described in this thread in summer 2009. To my knowledge, the bug that filled RAM until the Linux kernel had to kill processes randomly to get some RAM back was fixed last year.

Do the system data sensors report anything interesting when the eBox has hung?
Did you try lowering the clock speed in the BIOS (as far as I remember, this fixed a lot of stability issues with some eBoxes)?
When a lock-up occurs, is the box totally frozen (LEDs no longer blinking, no activity)? Is neither SSH nor HTTP login possible?

I don't know of any memory leaks in the running processes. Therefore, it is hard to imagine which part of the Meteohub software would be capable of bringing the complete system down.

Re: Out of Memory error on ebox-3300 after 30 days running

Posted: Sun Nov 21, 2010 8:14 pm
by wfpost
I was on a clock speed of 5 and changed it to 4 a week ago.

When the system freezes there's no access via SSH or the web GUI, and no response to pings. It seems the whole network stack is down.
The hard disk activity LED stays lit.

I have a keyboard connected permanently, and as far as I remember the box does not respond to local keystrokes either.

Since I installed ImageMagick for my webcam shots, the system partition has been at 93%.
Could this be a problem?
BTW: The additional load from ImageMagick every 10 minutes is not the problem, because the system also fails at night when IM is not in use; that part only runs until sunset.

Strange, I know!
Disk space
Swap: 2MB of 196MB used (1%)
System: 709MB of 755MB used (93%)
Data: 2450MB of 6576MB used (37%)
top - 19:28:04 up 1 day, 4:10, 1 user, load average: 1.05, 1.73, 1.75
Tasks: 62 total, 2 running, 60 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.5%us, 1.9%sy, 69.7%ni, 16.9%id, 0.9%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 483892k total, 475736k used, 8156k free, 1796k buffers
Swap: 200804k total, 3032k used, 197772k free, 454276k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13813 root 30 10 3248 1288 524 R 96.4 0.3 0:01.84 wmr928eval
1 root 20 0 1940 628 536 S 0.0 0.1 0:05.26 init
2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 15 -5 0 0 0 S 0.0 0.0 0:00.16 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.24 watchdog/0
5 root 15 -5 0 0 0 S 0.0 0.0 0:12.94 events/0
6 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 khelper
41 root 15 -5 0 0 0 S 0.0 0.0 0:18.18 kblockd/0
44 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid
45 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kacpi_notify
119 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
151 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
152 root 20 0 0 0 0 S 0.0 0.0 0:04.32 pdflush
153 root 15 -5 0 0 0 S 0.0 0.0 0:39.42 kswapd0
154 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
554 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ksuspend_usbd
570 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
662 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ata/0
668 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ata_aux
889 root 15 -5 0 0 0 S 0.0 0.0 0:00.40 kjournald
1032 root 16 -4 2176 720 448 S 0.0 0.1 0:01.00 udevd
1326 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kpsmoused
1650 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ksnapd
1719 root 15 -5 0 0 0 S 0.0 0.0 0:06.00 kjournald
1843 daemon 20 0 1680 368 268 S 0.0 0.1 0:00.00 portmap
2097 root 20 0 3664 996 736 S 0.0 0.2 0:00.08 meteoschedule
2133 root 20 0 1620 608 508 S 0.0 0.1 0:05.18 syslogd
2139 root 20 0 1572 376 304 S 0.0 0.1 0:00.02 klogd
2214 root 20 0 1568 556 476 S 0.0 0.1 0:00.00 acpid
2218 root 20 0 4164 1460 880 S 0.0 0.3 40:16.40 meteonet
2230 root 20 0 1964 548 432 S 0.0 0.1 0:00.00 rsync
2232 root 20 0 5868 1340 912 S 0.0 0.3 0:01.80 nmbd
2234 root 20 0 8940 2480 1736 S 0.0 0.5 0:00.06 smbd
2243 root 20 0 8940 1064 328 S 0.0 0.2 0:00.00 smbd
2247 root 20 0 4924 1080 748 S 0.0 0.2 0:00.00 sshd
2250 root 5 -15 2184 1100 620 S 0.0 0.2 0:10.04 thttpd
2252 root 5 -15 2144 812 556 S 0.0 0.2 0:12.12 thttpdbackup
2283 statd 20 0 1752 732 624 S 0.0 0.2 0:00.00 rpc.statd
2298 root 20 0 4120 1292 992 S 0.0 0.3 0:08.14 ntpd
2308 root 20 0 2188 780 596 S 0.0 0.2 0:04.92 cron
2316 root -2 0 1616 1616 1348 S 0.0 0.3 0:05.32 watchdog
2327 root 10 -10 3360 352 272 S 0.0 0.1 0:00.00 loggerd
2328 root 10 -10 5524 1316 764 S 0.0 0.3 5:32.78 loggerd
2358 root 20 0 1568 492 420 S 0.0 0.1 0:00.00 getty
2359 root 20 0 1568 492 420 S 0.0 0.1 0:00.00 getty
2361 root 20 0 1568 484 420 S 0.0 0.1 0:00.00 getty
2363 root 20 0 1568 488 420 S 0.0 0.1 0:00.00 getty
2365 root 20 0 1568 488 420 S 0.0 0.1 0:00.00 getty
2367 root 20 0 1568 488 420 S 0.0 0.1 0:00.00 getty
2459 root 10 -10 1704 468 392 S 0.0 0.1 0:08.54 meteosys
12689 root 20 0 7696 2356 1920 S 0.0 0.5 0:00.68 sshd
12692 root 20 0 5760 1744 1312 S 0.0 0.4 0:00.04 bash
13787 root 20 0 2524 876 664 S 0.0 0.2 0:00.00 cron
13788 root 20 0 2400 1052 884 S 0.0 0.2 0:00.00 sh
13789 root 20 0 1556 396 332 S 0.0 0.1 0:00.00 sleep
13790 root 20 0 2524 876 664 S 0.0 0.2 0:00.00 cron
13792 root 20 0 2400 1040 876 S 0.0 0.2 0:00.00 sh
13795 root 20 0 1556 396 332 S 0.0 0.1 0:00.00 sleep
13802 root 20 0 2524 876 664 S 0.0 0.2 0:00.00 cron
13806 root 20 0 2400 1032 868 S 0.0 0.2 0:00.00 sh
13810 root 30 10 2396 1044 884 S 0.0 0.2 0:00.00 histeval0
13818 root 20 0 2220 984 752 R 0.0 0.2 0:00.02 top

Re: Out of Memory error on ebox-3300 after 30 days running

Posted: Sun Nov 21, 2010 8:22 pm
by admin
I experienced occasional freezes when I had USB webcams connected to my eBoxes. It looks like the gspca low-level drivers, which run as kernel modules, kill the system from time to time. Therefore, I no longer recommend using USB cams on Meteohubs.

Re: Out of Memory error on ebox-3300 after 30 days running

Posted: Sun Nov 21, 2010 8:32 pm
by wfpost
I don't have a USB cam connected. Only VP2 and RFX.

Meteohub downloads a webcam shot every 10 minutes from my IP-webcam and manipulates it with imagemagick and stores it on the meteohub system.

Re: Out of Memory error on ebox-3300 after 30 days running

Posted: Sun Nov 21, 2010 8:47 pm
by admin
OK, that shouldn't be a problem. Your process list shows that a lot of RAM is in use, but none of the listed processes seems to account for it. Is there a chance I could log in?
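
One thing worth checking in the numbers: the top header above lists 454276k as cached, and Linux counts page cache as "used" even though it can be reclaimed. A minimal sketch of the arithmetic, using the figures from the posted top output; the function name used_without_cache is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: "used" memory minus buffers and page cache, in kB.
# Arguments are the used, buffers and cached figures as printed by top.
used_without_cache() {
    local used_kb=$1 buffers_kb=$2 cached_kb=$3
    echo $((used_kb - buffers_kb - cached_kb))
}

# Figures from the top output earlier in the thread:
used_without_cache 475736 1796 454276   # prints 19664
```

On those figures only about 19 MB of the 475 MB "used" is held by processes and the kernel itself, so this snapshot alone does not show a leak; a sample taken shortly before a freeze would be more telling.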

Re: Out of Memory error on ebox-3300 after 30 days running

Posted: Sun Nov 21, 2010 9:02 pm
by wfpost
Just sent you the login details.

Re: Out of Memory error on ebox-3300 after 30 days running

Posted: Tue Nov 23, 2010 10:08 pm
by HeinrichH
wfpost wrote: Meteohub downloads a webcam shot every 10 minutes from my IP-webcam and manipulates it with imagemagick and stores it on the meteohub system.
How do you set up Meteohub to download a shot from an IP camera?

Re: Out of Memory error on ebox-3300 after 30 days running

Posted: Tue Nov 23, 2010 10:27 pm
by wfpost
It depends on the webcam. My Level One FCS-1060 allows FTP download.
The ncftpget line in the script is the command that downloads the image from the webcam.
The rest grabs some current values from the VP2 and prints them into the picture with ImageMagick.

If you click on "Heute" you can see the final result ...
http://honsolgen.de/
#!/bin/bash
export LANG=de_DE.UTF-8

# Download the current snapshot from the IP webcam via FTP.
/usr/bin/ncftpget -C -u root -p password ftp://192.168.1.141/snap_00.jpg /data/myweb/webcam.jpg

cd /data/myweb || exit 1

# Dump the current sensor values from the local port 5558.
nc localhost 5558 > sensordata.txt

# Temperature line, prefixed with today's date.
awk '/actual_th0_temp_c/' sensordata.txt > sensoractual.txt
CAUSSEN=$(sed -e 's/actual_th0_temp_c/Temperatur: /g' < sensoractual.txt)
DATUM=$(date +'%d/%m/%Y')
CAUSSEN=" $DATUM | $CAUSSEN"

# Wind speed and wind direction.
awk '/actual_wind0_speed_kmh/' sensordata.txt > sensorwind.txt
WINDSPEED=$(sed -e 's/actual_wind0_speed_kmh/Wind: /g' < sensorwind.txt)
awk '/actual_wind0_dir_de/' sensordata.txt > sensorwinddir.txt
WINDDIR=$(awk 'NR==1 { print $2 }' sensorwinddir.txt)
WINDRICHTUNG=$(awk 'NR==2 { print $2 }' sensorwinddir.txt)

# Values read by line number from the sensor dump
# (day/night flag, twilight time and current hour, going by the variable names).
DAYNIGHT=$(awk 'NR==52 { print $2 }' sensordata.txt)
TWILIGHT=$(awk 'NR==51 { print $2 }' sensordata.txt)
CT=${TWILIGHT%:*}
# 10# forces base 10 so that an hour such as "08" is not parsed as octal.
CIVILTWILIGHT=$((10#${CT:-0} - 1))
CURRENTHOUR=$(awk 'NR==20 { print $2 }' sensordata.txt)

# Print the values into the picture, then crop it.
OUTPUT=webcam.jpg
convert webcam.jpg -font /usr/share/fonts/truetype/DejaVuSans-Bold.ttf -pointsize 12 -fill yellow -undercolor '#000000' -gravity None -annotate +156+12 " | $CAUSSEN\°\C | $WINDSPEED km/h - $WINDRICHTUNG - $WINDDIR\° " "$OUTPUT"
convert webcam.jpg -crop 640x270+0+0 "$OUTPUT"

# Keep a copy named after the file's modification time, e.g. 2010-11-23_22-20_webcam.jpg.
cp -p -- webcam.jpg "$(date -r webcam.jpg +'%Y-%m-%d_%H-%M')_webcam.jpg"