Sorting data and averaging multiple and uneven numbers of rows
Hi Folks, I have a pressing problem. I have a large data table with about 11000 rows. It represents the hourly wind speed and direction data for a weather station. There are several things that I need to do with this data.
1. Overall I would like to obtain an average the daily data for Wind speed
2. Similarly for Wind secondly direction if possible.
The complication is that the data sequences appear to be disrupted or inconsistent. If the data were single entries for each hour, there should be 8724 rows as 2012 was a leap year.
a) However, thre are actually 10970 rows, so there may be duplicate entries. How do I clean this up? to remove duplicates
b) Some days have entries of data every hour, others have missed hours, i.e. every other hour. How do I average according to the date only (note teh entries are logged by a date and time field)
The table looks like this at teh start:
ob_time | version_num | wind_direction | wind_speed/knots | wind speed ms-1 |
01/01/2012 00:00 | 1 | 220 | 17 | 8.75 |
01/01/2012 01:00 | 1 | 210 | 19 | 9.77 |
01/01/2012 02:00 | 1 | 240 | 17 | 8.75 |
01/01/2012 03:00 | 1 | 230 | 15 | 7.72 |
01/01/2012 04:00 | 1 | 230 | 13 | 6.69 |
01/01/2012 05:00 | 1 | 230 | 14 | 7.20 |
01/01/2012 06:00 | 1 | 220 | 16 | 8.23 |
01/01/2012 07:00 | 1 | 230 | 17 | 8.75 |
01/01/2012 08:00 | 1 | 220 | 16 | 8.23 |
01/01/2012 09:00 | 1 | 220 | 14 | 7.20 |
01/01/2012 10:00 | 1 | 220 | 13 | 6.69 |
01/01/2012 11:00 | 1 | 220 | 16 | 8.23 |
01/01/2012 12:00 | 1 | 230 | 17 | 8.75 |
01/01/2012 13:00 | 1 | 230 | 13 | 6.69 |
01/01/2012 14:00 | 1 | 230 | 13 | 6.69 |
01/01/2012 15:00 | 1 | 230 | 11 | 5.66 |
01/01/2012 16:00 | 1 | 220 | 12 | 6.17 |
01/01/2012 17:00 | 1 | 220 | 14 | 7.20 |
01/01/2012 18:00 | 1 | 210 | 12 | 6.17 |
01/01/2012 19:00 | 1 | 240 | 8 | 4.12 |
01/01/2012 20:00 | 1 | 230 | 12 | 6.17 |
01/01/2012 21:00 | 1 | 240 | 13 | 6.69 |
01/01/2012 22:00 | 1 | 240 | 17 | 8.75 |
01/01/2012 23:00 | 1 | 230 | 14 | 7.20 |
but at the end, it looks like this...
29/12/2012 20:00 | 1 | 210 | 17 | 8.75 |
29/12/2012 22:00 | 1 | 220 | 17 | 8.75 |
30/12/2012 00:00 | 1 | 230 | 16 | 8.23 |
30/12/2012 02:00 | 1 | 220 | 16 | 8.23 |
30/12/2012 04:00 | 1 | 230 | 12 | 6.17 |
30/12/2012 06:00 | 1 | 230 | 14 | 7.20 |
30/12/2012 08:00 | 1 | 220 | 15 | 7.72 |
30/12/2012 10:00 | 1 | 220 | 16 | 8.23 |
30/12/2012 12:00 | 1 | 230 | 15 | 7.72 |
30/12/2012 14:00 | 1 | 240 | 18 | 9.26 |
30/12/2012 16:00 | 1 | 240 | 20 | 10.29 |
30/12/2012 18:00 | 1 | 230 | 17 | 8.75 |
30/12/2012 19:00 | 1 | 230 | 17 | 8.75 |
30/12/2012 20:00 | 1 | 230 | 14 | 7.20 |
30/12/2012 22:00 | 1 | 230 | 17 | 8.75 |
31/12/2012 00:00 | 1 | 220 | 20 | 10.29 |
31/12/2012 02:00 | 1 | 220 | 23 | 11.83 |
31/12/2012 04:00 | 1 | 210 | 21 | 10.80 |
31/12/2012 06:00 | 1 | 220 | 17 | 8.75 |
31/12/2012 08:00 | 1 | 210 | 21 | 10.80 |
31/12/2012 11:00 | 1 | 220 | 17 | 8.75 |
31/12/2012 13:00 | 1 | 250 | 17 | 8.75 |
31/12/2012 15:00 | 1 | 260 | 15 | 7.72 |
31/12/2012 17:00 | 1 | 240 | 12 | 6.17 |
31/12/2012 19:00 | 1 | 220 | 9 | 4.63 |
31/12/2012 21:00 | 1 | 250 | 10 | 5.14 |
31/12/2012 23:00 | 1 | 250 | 12 | 6.17 |
sorry - it seems to have lost the formatting - not sure why, but the point is to show that the later entries are different to those at teh top - note the observation times.
I know this is a tough one, but woudl appreciate any help that could be given.
cheers
John
MacBook Pro (13-inch, Mid 2012), OS X Mountain Lion (10.8.2)