The technique I used to remove the duplicates is similar to Badunit's. I'll give you my approach:
Add a column to the left of the date/time column
In row 2 of the new column, write:
=OFFSET(A2, -1,1)=OFFSET(A2, 0,1)
Copy this cell and Paste to the rest of the column.
All the duplicate rows will be identified with TRUE.
Copy the entire aux column and Command-C.
Edit > Paste Values.
Sort the comparison column ascending.
Delete the rows containing TRUE.
Delete the Aux column.
I took a stab at this before reading the rest of the thread.
Looking at the results of copying your sample data from the thread, then pasting it into a Numbers table, my impression is that the Date and Time values are each in a single cell. Gven that, I'm surprised at the frequency of duplicates identified by Badunit's formula.
An issue with the gaps in the data is that dayswith measurements at one hour intervals will outweigh days with data at two hour intervals if all non-duplicate data is included in a simple average calculation.
My approach was to calculate an average for each date, then use that daily average in the calculation of an annual average. Example below uses on the sample data you provided, but will also work with the full table.
Column A has been added to contain the date of each sample, with the time of day for all samples on that date set to 00:00:00.
On the separate, small table, averages have been calculated for each date using AVERAGEIF. In a Footer row, the 'annual' average has been calculated using the Daily average data.
A2 and filled down (to A25): =DATEVALUE(B2)
A26 and filled down: =DATEVALUE(LEFT(B26,10))
The two formulas were necessary in my tables as Numbers recognized the first rows as date and time values, but saw the December values as text. This might be an artifact of my custom date format. You may need only the first formula.
Column A contains entered dates. these are Date and Time values, with the time set (by Numbers) to 00:00;00, used to match the D&T values in column A of the large table.
B2, filled down to B5: =AVERAGEIF(Table 2 :: $A,A2,Table 2 :: $F)
This calculates the average wind speed (in m/s) for each date in column A, giving a single value for that date.
This calculates the annual average, with equal weight given to each date on which measurements were recorded.
Row 6 is a Footer Row.
Thanks for joining in. The reason for averaging the daily wind data, is to allow a comparison with the average daily electricity consumption from a different data source. I cannot obtain consumption data with any greater resolution than daily, but the wind data (and ultimately the energy that can be derived from the wind via a turbine), has been supplied in hourly format.
For the purposes of comparing supply with demand a daily average will be adequate, as the aim of the work is to consider the surplus / deficit and mitigation measures, i.e. export to grid, storage capacity required or additional electical energy to be purchased. The assignment is to consider the financial viability of a community owned wind turbine.
Hope that is sufficient. I will try to mail the data to you separately, I am not clear on whether I can attach it here directly.
Once you have a handle on the data, you can do lots of analysis. The daily averages can be calculated using AVERAGEIFS. Here's an example:
The trend line is a 7-day moving average, so following that takes out some of the noise and gives you an idea of the trends by week.
I'd just like to extend a warm thank you to Jerrold, Barry and Badunit for your excellent help on this problem and your explanations of how you have worked through the solutions. It has been a busy couple of days, so I have just had the chance to look through the latest posts today and will work through your methodologies over the next day or so to understand the solutions for future use (I expect additional similar problems in the coming weeks).
Thanks once again and best regards