
Sorting data and averaging multiple and uneven numbers of rows

575 Views 20 Replies Latest reply: Feb 20, 2013 7:12 AM by diveo

diveo

Feb 18, 2013 7:59 AM

Hi Folks, I have a pressing problem. I have a large data table with about 11000 rows. It represents the hourly wind speed and direction data for a weather station. There are several things that I need to do with this data.

 

1. Obtain a daily average of the wind speed data.

2. Do the same for wind direction, if possible.

 

The complication is that the data sequences appear to be disrupted or inconsistent. If there were a single entry for each hour, there should be 8784 rows, as 2012 was a leap year (366 days × 24 hours).

 

a) However, there are actually 10970 rows, so there may be duplicate entries. How do I clean this up to remove the duplicates?

 

b) Some days have an entry for every hour, while others have missing hours, e.g. only every other hour. How do I average by the date alone (note the entries are logged by a combined date-and-time field)?

 

The table looks like this at the start:

 

 

ob_time           version_num   wind_direction   wind_speed/knots   wind speed ms-1
01/01/2012 00:00  1             220              17                 8.75
01/01/2012 01:00  1             210              19                 9.77
01/01/2012 02:00  1             240              17                 8.75
01/01/2012 03:00  1             230              15                 7.72
01/01/2012 04:00  1             230              13                 6.69
01/01/2012 05:00  1             230              14                 7.20
01/01/2012 06:00  1             220              16                 8.23
01/01/2012 07:00  1             230              17                 8.75
01/01/2012 08:00  1             220              16                 8.23
01/01/2012 09:00  1             220              14                 7.20
01/01/2012 10:00  1             220              13                 6.69
01/01/2012 11:00  1             220              16                 8.23
01/01/2012 12:00  1             230              17                 8.75
01/01/2012 13:00  1             230              13                 6.69
01/01/2012 14:00  1             230              13                 6.69
01/01/2012 15:00  1             230              11                 5.66
01/01/2012 16:00  1             220              12                 6.17
01/01/2012 17:00  1             220              14                 7.20
01/01/2012 18:00  1             210              12                 6.17
01/01/2012 19:00  1             240              8                  4.12
01/01/2012 20:00  1             230              12                 6.17
01/01/2012 21:00  1             240              13                 6.69
01/01/2012 22:00  1             240              17                 8.75
01/01/2012 23:00  1             230              14                 7.20

 

but at the end, it looks like this...

 

 

 

ob_time           version_num   wind_direction   wind_speed/knots   wind speed ms-1
29/12/2012 20:00  1             210              17                 8.75
29/12/2012 22:00  1             220              17                 8.75
30/12/2012 00:00  1             230              16                 8.23
30/12/2012 02:00  1             220              16                 8.23
30/12/2012 04:00  1             230              12                 6.17
30/12/2012 06:00  1             230              14                 7.20
30/12/2012 08:00  1             220              15                 7.72
30/12/2012 10:00  1             220              16                 8.23
30/12/2012 12:00  1             230              15                 7.72
30/12/2012 14:00  1             240              18                 9.26
30/12/2012 16:00  1             240              20                 10.29
30/12/2012 18:00  1             230              17                 8.75
30/12/2012 19:00  1             230              17                 8.75
30/12/2012 20:00  1             230              14                 7.20
30/12/2012 22:00  1             230              17                 8.75
31/12/2012 00:00  1             220              20                 10.29
31/12/2012 02:00  1             220              23                 11.83
31/12/2012 04:00  1             210              21                 10.80
31/12/2012 06:00  1             220              17                 8.75
31/12/2012 08:00  1             210              21                 10.80
31/12/2012 11:00  1             220              17                 8.75
31/12/2012 13:00  1             250              17                 8.75
31/12/2012 15:00  1             260              15                 7.72
31/12/2012 17:00  1             240              12                 6.17
31/12/2012 19:00  1             220              9                  4.63
31/12/2012 21:00  1             250              10                 5.14
31/12/2012 23:00  1             250              12                 6.17

Sorry, the table seems to have lost its formatting when posted. The point is to show that the later entries are different from those at the top; note the observation times.

 

I know this is a tough one, but I would appreciate any help that could be given.

 

cheers

 

John

MacBook Pro (13-inch, Mid 2012), OS X Mountain Lion (10.8.2)
  • Badunit Level 6 (10,775 points)

    If you sort by date (column A) first, you can use an IF formula in a new column to flag duplicates:

     

    =IF(A2=A1, "Duplicate","")

     

    You can copy/paste that to the entire column, but you must sort the table by column A before creating this column of formulas. Once the new column is complete, sort by it to group all the duplicates together so you can delete them in one go. One caution: if your criterion for a duplicate is based solely on the date & time (as in the formula above), first look through the table and check whether the flagged rows really are duplicates rather than two different data points that happen to share a date and time (I'm not sure how that could happen, but I'd still recommend you look).
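    The same sort-then-compare-with-previous-row idea can be sketched outside Numbers. A minimal Python illustration (the sample rows are made up, and real dd/mm/yyyy timestamps would need proper date parsing before sorting):

```python
# Sort-then-flag duplicate removal: after sorting by timestamp, any row whose
# timestamp equals the previous row's is flagged as a duplicate.
rows = [
    ("01/01/2012 01:00", 210, 19),
    ("01/01/2012 00:00", 220, 17),
    ("01/01/2012 00:00", 220, 17),  # duplicate entry
]

# String sort works here only because all samples share one day;
# real data should be parsed into datetime objects first.
rows.sort(key=lambda r: r[0])

# Equivalent of =IF(A2=A1, "Duplicate","") filled down the column.
flags = [""] + ["Duplicate" if rows[i][0] == rows[i - 1][0] else ""
                for i in range(1, len(rows))]

# Keep only the unflagged rows.
deduped = [r for r, f in zip(rows, flags) if f != "Duplicate"]
```

    As in the spreadsheet version, it is worth eyeballing the flagged rows before deleting: two rows with the same timestamp but different readings are a data problem, not a simple duplicate.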

     

    To get an average of an unspecified number of items, use SUMIF/COUNTIF.  We can provide more detail if that isn't enough to get you going.
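    If it helps to see the SUMIF/COUNTIF idea outside a spreadsheet, here is a minimal Python sketch of the same logic: sum the wind speeds per day, count the entries per day, then divide (sample timestamps are illustrative):

```python
from collections import defaultdict

# SUMIF/COUNTIF equivalent: accumulate a sum and a count per date,
# then divide to get the daily average, however many rows each day has.
observations = [
    ("01/01/2012 00:00", 17),
    ("01/01/2012 01:00", 19),
    ("02/01/2012 00:00", 13),
]

sums, counts = defaultdict(float), defaultdict(int)
for timestamp, speed in observations:
    day = timestamp.split(" ")[0]  # date part of the combined date/time field
    sums[day] += speed
    counts[day] += 1

daily_avg = {day: sums[day] / counts[day] for day in sums}
```

    Because each day keeps its own count, days with missing hours are averaged over however many readings they actually have.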

     

    There is another method for flagging duplicates using the COUNTIF function but I suspect it will be very very slow on a table of 11000 entries.

  • Badunit Level 6 (10,775 points)

    That is strange.

    • What is the highlighting on the table in your screenshot?
    • Take a look and tell me what the formula is on the row for 01/01/2012 02:00 (3rd row down). What row is that entry in?
    • Is ob_time entirely in column A, rather than the date in column A and the time in column B? I can't tell from your screenshots.
  • Jerrold Green1 Level 7 (28,290 points)

    John,

     

    I'm curious how you want to average the wind data. Do you want to average the vector wind velocity, or just the scalar values?

     

    If you want to mail me a copy of the data, I'll take a look at why Badunit's duplicate finding expression isn't working.

     

    Jerry
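    The vector-versus-scalar distinction matters because a plain arithmetic mean of compass bearings misbehaves near north: 350° and 10° average to 180°, the opposite direction. A vector mean handles the wrap-around. A minimal sketch (equal weighting assumed; a true velocity average would weight each bearing by its wind speed):

```python
import math

def vector_mean_direction(bearings_deg):
    """Average compass bearings by summing unit vectors,
    avoiding the 350/10 wrap-around problem of a scalar mean."""
    x = sum(math.cos(math.radians(b)) for b in bearings_deg)
    y = sum(math.sin(math.radians(b)) for b in bearings_deg)
    return math.degrees(math.atan2(y, x)) % 360
```

    For bearings clustered well away from north (like the 210-250° range in this data) the scalar and vector means are close, but the vector form is safe in general.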

  • Badunit Level 6 (10,775 points)

    I am mystified. You can email it to me, too. Click on my name to go to my profile to get my email address.

     

    I was suggesting the formula be slightly different. Row 3 would have =IF(A3=A2, "Duplicate","") . This way you can fill/paste the formula down to the bottom row of the table. With your version of it, the last row will be an error because it will be trying to reference a row that does not exist (the non-existent row below the last row). But that's a different problem.

     

    The two values (A3 and A4) are obviously different when you look at them. I'm baffled as to why the formula would not work. How about this as another test, using three cells in that new column:

    1. In a cell put the formula =A4 . What's the result? Looks exactly like the value in A4?
    2. In another cell put the formula =A3 . What's the result? Looks exactly like the value in A3?
    3. In a third cell put =A4-A3 . Is the result 0d ?
  • Badunit Level 6 (10,775 points)

    Regarding the average, if you have regularly spaced data (like hourly), the average is simple to do.  If you have missing data or unevenly spaced data, an average for the day might not be completely accurate.

     

    Time     Data

    1:00     10

    2:00     12

    3:00     13

    4:00     18

     

    • Average of all four = (10+12+13+18)/4 = 13.25
    • If missing the 3:00 data, the average = (10 +12+18)/3 = 13.33
    • If you try to interpolate to get the 3:00 data point you would get a different "average"
    • If you summed windspeed x "hours until next data point taken" & divided by the total number of hours you would get a different "average". With the missing 3:00 data it would be =(10*1 + 12*2 + 18*1)/4 = 13.00

     

    My suggestion is to do a simple average of the data points you have. If data points are missing, no method will be absolutely correct.
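    The comparison above can be checked numerically. The hour-weighted version below assumes each reading holds until the next one and that the final reading covers one hour, matching the worked example:

```python
# Readings at 1:00, 2:00 and 4:00 (the 3:00 reading is missing).
times = [1, 2, 4]      # hours
speeds = [10, 12, 18]

# Simple average of the points we have: (10 + 12 + 18) / 3
simple_avg = sum(speeds) / len(speeds)

# Hour-weighted: each reading counts for the hours until the next reading,
# and the last reading is assumed to cover one hour.
hours = [times[i + 1] - times[i] for i in range(len(times) - 1)] + [1]
weighted_avg = sum(s * h for s, h in zip(speeds, hours)) / sum(hours)
```

    The two results differ (13.33 versus 13.00 here), which is the point of the comparison: with gaps in the record, the choice of method changes the answer, and none of them recovers the missing information.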

  • Jerrold Green1 Level 7 (28,290 points)

    John,

     

    The most significant issue with your plan, I think, is the fact that a simple average of wind speed will have little bearing on the power produced over a 24-hour period, especially if the speed deviation is significant compared to the average.

     

    Here's a typical output curve (Power generated vs. Wind Speed)

     

    [Image: Screen Shot 2013-02-18 at 2.11.14 pm.png]

    Notice that at low wind speeds you have a cube law effect, and at higher speeds the output is clipped because of limitations of the generator and pitch controls.

     

    Jerry
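    The shape Jerry describes can be sketched as a toy model: below the rated speed, power grows roughly with the cube of wind speed; above it, output is clipped by the generator and pitch controls. All the numbers here are illustrative, not from a real turbine:

```python
def turbine_power(v, cut_in=3.0, rated_speed=12.0, rated_power=2000.0):
    """Toy cube-law power curve (kW): zero below cut-in,
    cubic growth up to rated speed, clipped at rated power above it."""
    if v < cut_in:
        return 0.0                  # too little wind to generate
    if v >= rated_speed:
        return rated_power          # clipped by generator/pitch limits
    # Cube law, scaled so the curve reaches rated_power at rated_speed.
    return rated_power * (v / rated_speed) ** 3
```

    This is also why a daily mean speed is a poor proxy for energy: because of the cube, the power at the average of two speeds is not the average of the two powers, so gusty days produce more energy than their mean speed suggests.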

  • Jerrold Green1 Level 7 (28,290 points)

    John,

     

    You've got double entries from the first of the year to the end of March. After that, there are just a few duplicates.

     

    I'll return your file with the dups removed.

     

    Jerry

  • Jerrold Green1 Level 7 (28,290 points)

    [Image: Screen Shot 2013-02-18 at 7.18.45 pm.png]

    This sort of presentation might be helpful in making your argument. This is your data with the duplicates removed.

     

    Jerry
