Looks like no one’s replied in a while. To start the conversation again, simply ask a new question.

How to traverse excel files/sheets within a folder to delete specified columns.

I am working on Excel. We have a little over 25,000 excel WORKBOOKS that need 2 columns deleted from each of their sheets within. (Meaning if all 25k had 1 sheet, we should be deleting 50k columns.) But most of them have multiple sheets in them.


So far this is what I have...


set sourcefilepath to (choose folder) as alias

tell application "Finder" to set sourcefiles to ¬
	(every file in the folder sourcefilepath whose ¬
		name ends with ".xlsx") as alias list

tell application "Finder"
	set filecount to count files in the entire contents of sourcefilepath
	tell application "Microsoft Excel" to open the sourcefiles
	repeat filecount times
		set myValues to "Prod Alpha, Warehouse No"
		tell application "Microsoft Excel"
			activate
			set mysheets to every sheet of active workbook
			repeat with asheet in mysheets
				tell asheet
				set columnCount to (((count of columns of used range of active sheet)) + 1)
repeat with i from columnCount to 1 by -1
						set cellValue to value of row 1 of column i
						if cellValue is in myValues then
							delete column i
						end if
					end repeat
				end tell
			end repeat
			close active workbook
		end tell
	end repeat
end tell


The problem is when it opens all the sheets, everything (for obvious reasons) slows down. Then it gets about half way through the files ands stops. Leaving 1/2 still open with no changes.


Is there a reason? and is there a better why I can structure this to maybe open 1, do the operations, save it, close it, then move to another?

Posted on Jan 19, 2021 7:14 AM

Reply
Question marked as Best reply

Posted on Jan 19, 2021 10:20 AM

TechnicalSupportK wrote:


Is there a reason? and is there a better why I can structure this to maybe open 1, do the operations, save it, close it, then move to another?


Opening 25,000 files at once sounds like a nightmare. As you suggest, it would be much better to open them one at a time. I don't have Excel so I can't test this, but give it a try (perhaps on a smaller sample). It assumes that all the files are in the same folder (rather than nested in subfolders).


tell application "Finder" to set sourcefiles to ¬
	(every file of (choose folder) whose ¬
		name extension is "xlsx") as alias list


	set filecount to count sourcefiles
	repeat with each_file from 1 to filecount
	
	
	tell application "Microsoft Excel" 
	activate
	open file each_file of sourcefiles -- (this might need to be item each_file of sourcefiles)
	set myValues to "Prod Alpha, Warehouse No"
			set mysheets to every sheet of active workbook
			repeat with asheet in mysheets
				tell asheet
				set columnCount to (((count of columns of used range of active sheet)) + 1)
                   repeat with i from columnCount to 1 by -1
						set cellValue to value of row 1 of column i
						if cellValue is in myValues then
							delete column i
						end if
					end repeat
				end tell
			end repeat
			close active workbook with saving
		end tell
	end repeat

5 replies
Question marked as Best reply

Jan 19, 2021 10:20 AM in response to TechnicalSupportK

TechnicalSupportK wrote:


Is there a reason? and is there a better why I can structure this to maybe open 1, do the operations, save it, close it, then move to another?


Opening 25,000 files at once sounds like a nightmare. As you suggest, it would be much better to open them one at a time. I don't have Excel so I can't test this, but give it a try (perhaps on a smaller sample). It assumes that all the files are in the same folder (rather than nested in subfolders).


tell application "Finder" to set sourcefiles to ¬
	(every file of (choose folder) whose ¬
		name extension is "xlsx") as alias list


	set filecount to count sourcefiles
	repeat with each_file from 1 to filecount
	
	
	tell application "Microsoft Excel" 
	activate
	open file each_file of sourcefiles -- (this might need to be item each_file of sourcefiles)
	set myValues to "Prod Alpha, Warehouse No"
			set mysheets to every sheet of active workbook
			repeat with asheet in mysheets
				tell asheet
				set columnCount to (((count of columns of used range of active sheet)) + 1)
                   repeat with i from columnCount to 1 by -1
						set cellValue to value of row 1 of column i
						if cellValue is in myValues then
							delete column i
						end if
					end repeat
				end tell
			end repeat
			close active workbook with saving
		end tell
	end repeat

Jan 19, 2021 12:47 PM in response to TechnicalSupportK

HD is right in showing you how to open and process the files serially, rather than trying to open all 25k files at once, so that's a start, but I think there's still a logic flaw you need to address.


Specifically the line where you setup the column names to look for:


	set myValues to "Prod Alpha, Warehouse No"


and you subsequently check to see if the first cell in each column matches:


						if cellValue is in myValues then


This will only match on a single cell value of "Prod Alpha, Warehouse No", whereas I think you want it to match either column "Prod Alpha" OR "Warehouse No". On that basis you should define myValues as a list:


	set myValues to {"Prod Alpha", "Warehouse No"}


Now the script will match on either column name.

Jan 19, 2021 2:58 PM in response to TechnicalSupportK

One more thing that might speed things up.


open file each_file of sourcefiles
	set myValues to {"Prod Alpha", "Warehouse No"} -- © Camelot
			set mysheets to every sheet of active workbook
			repeat with asheet in mysheets
				tell asheet
				set columnCount to (((count of columns of used range of active sheet)) + 1)
                   set delete_count to 0
				   repeat with i from columnCount to 1 by -1
						set cellValue to value of row 1 of column i
						if cellValue is in myValues then
							delete column i
							set delete_count to delete_count +1
						end if
						if delete_count = 2 then
						set delete_count to 0
						exit repeat
						end if
					end repeat
				end tell
			end repeat
			close active workbook with saving



Again, I can't test it, but if your sheets have a large number of columns and the ones you want to delete are towards the right-hand edge, this should call off the search immediately after the second deletion and move on to the next file.

Jan 20, 2021 1:44 PM in response to HD

While we're looking for optimizations, two more come to mind.


If there are a large number of columns, then iterating through the top cell in each column may take a while. It *might* be faster to have Excel search for a cell in the top row with the relevant header and use that as a reference for which column to delete - I have to assume that Excel's search is faster than AppleScript getting each cell.


Secondly, if there are any workbooks that don't have the columns on any sheet, there's no point in saving the workbook after no changes were made. Therefore it may be beneficial to keep a flag as to whether any columns were deleted, and use that flag to determine whether to save the workbook or not.


Just a couple of ideas. :)

How to traverse excel files/sheets within a folder to delete specified columns.

Welcome to Apple Support Community
A forum where Apple customers help each other with their products. Get started with your Apple ID.