How to split a CSV file, keep the header, and add a file extension - using Terminal

In Terminal, the script below splits a CSV file and includes the header row in each file. BUT... I also need to add ".csv" (dot csv) to each file name. Right now I get files like 'xaa', 'xab', 'xac', etc. I want 'xaa.csv', 'xab.csv', 'xac.csv', and so on. What needs to be added below, and where does it need to be added, please? Thank you, Dora


#!/bin/bash

FILENAME=split.csv
HDR=$(head -1 $FILENAME)
split -l 20000 $FILENAME xyz
n=1
for f in xyz*
do
    if [ $n -gt 1 ]; then
        echo $HDR > Part${n}.csv
    fi
    cat $f >> Part${n}.csv
    rm $f
    ((n++))
done

iMac 27″, macOS 10.15

Posted on Dec 13, 2020 4:28 PM

8 replies

Dec 17, 2020 7:32 PM in response to dtarver13

I think I found the answer. FINALLY! :D The following splits the file, adds the header into each split and adds the file extension to each file. Whew! I tested it. Regards, ~Dora


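#!/bin/bash

FILENAME=data.csv

# Save the header row so it can be added to every part after the first.
HDR="$(head -n 1 "$FILENAME")"

# Split into 20000-line pieces named xyzaa, xyzab, ...
split -l 20000 "$FILENAME" xyz

n=1
for f in xyz*
do
    # The first piece already starts with the header row.
    if [ $n -gt 1 ]; then
        echo "$HDR" > "Part${n}.csv"
    fi
    cat "$f" >> "Part${n}.csv"
    rm "$f"
    ((n++))
done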
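For anyone adapting the script above, here is a minimal sketch of a variation, not what was posted in the thread. It removes the header before splitting, so every part gets the header plus the same maximum number of data rows (in the script above, Part1.csv holds the header plus 19999 data rows). The function name split_csv_with_header and the default "Part" prefix are made up for illustration, and it assumes mktemp -d and a split that reads standard input when given "-".

```bash
#!/bin/bash
# Sketch only: split_csv_with_header is a hypothetical helper name.
# Usage: split_csv_with_header FILE LINES [PREFIX]
split_csv_with_header() {
    local file="$1" lines="$2" prefix="${3:-Part}"
    local hdr n=1 chunk tmpdir

    hdr="$(head -n 1 "$file")"
    tmpdir="$(mktemp -d)" || return 1

    # Split only the data rows (everything after line 1) into temporary chunks.
    tail -n +2 "$file" | split -l "$lines" - "$tmpdir/chunk_"

    for chunk in "$tmpdir"/chunk_*; do
        [ -e "$chunk" ] || break          # the file had no data rows
        printf '%s\n' "$hdr" > "${prefix}${n}.csv"
        cat "$chunk" >> "${prefix}${n}.csv"
        n=$((n + 1))
    done

    rm -rf "$tmpdir"
}
```

Calling split_csv_with_header data.csv 20000 would write Part1.csv, Part2.csv, and so on into the current folder, overwriting any files that already have those names.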

Dec 13, 2020 8:23 PM in response to dtarver13

It looks like you already have ".csv" on your file names. Do you see these file extensions when using the "ls" command to see the files in the Terminal? Or are the file extensions only hidden when viewing the files in the Finder? The Finder hides file extensions by default. I know that some macOS GUI apps will have an option to "show" the file extension when saving a file. However, I don't know what these GUI apps do to make the file extension visible in the Finder. The Finder also includes the ability to show all file extensions, but I find this annoying as it shows it for the Applications as well.


In the future please use the "Code Insertion" tool indicated by the "<>" icon when writing a post. The Code Insertion tool will allow better formatting & readability of posted code snippets.

Dec 15, 2020 6:45 PM in response to dtarver13

You should enclose your paths in double quotes in case any folder or the filename itself includes spaces. You should also enclose the command substitution on the HDR line in double quotes, and quote "$HDR" wherever you use it later, so that spaces in the header are preserved:

HDR="$(head -n 1 "$FILENAME")"
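To illustrate why the quoting matters, here is a small sketch using a made-up header that contains two consecutive spaces. The header value is an assumption for the demo, not data from the original file.

```bash
#!/bin/bash
# Hypothetical header containing a double space inside one field.
HDR="id,full  name,city"

unquoted=$(echo $HDR)      # word splitting collapses the double space
quoted=$(echo "$HDR")      # quoting keeps the header exactly as read

echo "unquoted: $unquoted"   # prints: unquoted: id,full name,city
echo "quoted:   $quoted"     # prints: quoted:   id,full  name,city
```

The same applies to the echo line inside the loop: echo "$HDR" keeps the header intact, while echo $HDR may not.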


I guess I misunderstood your request when I responded previously. You need to modify the "split" command to add the file extension, but I don't know if the version of "split" included with macOS supports the option (it is a GNU "split" option), which is:

 --additional-suffix=".csv"


So the "split" command becomes:

split -l 20000 --additional-suffix=".csv" "$FILENAME" xyz


If the macOS "split" doesn't support this option, then you will need to rename the "xyz" files yourself by inserting this loop into your script immediately after the "split" line:

for i in xyz*
do
    mv "$i" "$i.csv"
done
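The two approaches above can be combined so the script works either way. This is only a sketch: split_with_csv_suffix is a made-up helper name, and it assumes that a "split" without the GNU option exits with an error and writes no files when it sees it.

```bash
#!/bin/bash
# Sketch only: split_with_csv_suffix is a hypothetical helper name.
# Usage: split_with_csv_suffix FILE LINES PREFIX
split_with_csv_suffix() {
    local file="$1" lines="$2" prefix="$3" f

    # First try the GNU option; a split that lacks it is assumed to fail here.
    if split -l "$lines" --additional-suffix=".csv" "$file" "$prefix" 2>/dev/null; then
        return 0
    fi

    # Fallback: plain split, then rename each piece that has no extension yet.
    split -l "$lines" "$file" "$prefix" || return 1
    for f in "$prefix"*; do
        case "$f" in
            *.csv) ;;                  # already has the extension, leave it
            *) mv "$f" "$f.csv" ;;
        esac
    done
}
```

For example, split_with_csv_suffix split.csv 20000 xyz should leave xyzaa.csv, xyzab.csv, and so on. Note that the fallback renames anything in the folder whose name starts with the prefix, so it is best run in a folder that has no other "xyz" files.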

Dec 17, 2020 11:06 AM in response to HWTech

Hi

I tried the following but still got files with no .csv extension. The header was also missing from each file. The "split" command in the macOS Terminal does not support the --additional-suffix=".csv" option. Hmm... still looking for a solution.



#!/bin/bash

FILENAME=split.csv
HDR="$(head -1 $FILENAME)"
split -l 20000 "$FILENAME" xyz

for i in xyz*
do
    mv "$i" "$i.csv"
done

n=1
for f in xyz*
do
    if [ $n -gt 1 ]; then
        echo $HDR > Part${n}.csv
    fi
    cat $f >> Part${n}.csv
    rm $f
    ((n++))
done



Dec 17, 2020 7:55 PM in response to dtarver13

I tried both options I suggested in my earlier post on my Linux system. Both the xyz* files and the Part* files had .csv extensions. I cannot see why macOS wouldn't run this script correctly. The only difference between my script and yours is that I used a path to the files, as I didn't want to risk accidentally creating or deleting files in an unexpected location. Here is my script for reference, which works on Linux:


#!/bin/bash
FILENAME=split.csv
HDR="$(head -n 1 "$FILENAME")"
#split -l 3 --additional-suffix=".csv" "$FILENAME" xyz

split -l 3 "$FILENAME" xyz

for i in xyz*
do
    printf 'Renaming %s to %s.csv\n' "$i" "$i"
    mv ./"$i" ./"$i.csv"
done


n=1
for f in xyz*
do
    if [ $n -gt 1 ]; then
        echo "$HDR" > ./Part"$n".csv
    fi
    cat "$f" >> ./Part"$n".csv
    rm -i "$f"
    ((n++))
done


Here is what happens when I run the script on Linux (I declined to delete the temporary files):

Files in folder before running sript....
split.csv
split-script.sh
****************************************
Running script.....
Renaming xyzaa to xyzaa.csv
Renaming xyzab to xyzab.csv
****************************************
Files in folder after running script....
Part1.csv
Part2.csv
split.csv
split-script.sh
xyzaa.csv
xyzab.csv


Here are the contents of the original "split.csv" file:

This is the header line.....
This is line# 1
This is line# 2
This is line# 3
This is line# 4
This is line# 5


Here are the contents of the "Part2.csv" file:

This is the header line.....
This is line# 3
This is line# 4
This is line# 5
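Since a missing header was one of the problems reported earlier in the thread, a quick check like the following may help confirm the result. It is a sketch only; check_headers is a made-up helper name and is not part of any of the scripts above.

```bash
#!/bin/bash
# Sketch only: check_headers is a hypothetical helper name.
# Usage: check_headers "EXPECTED HEADER" FILE...
# Prints each file whose first line differs from the expected header
# and returns non-zero if any such file was found.
check_headers() {
    local expected="$1" f status=0
    shift
    for f in "$@"; do
        if [ "$(head -n 1 "$f")" != "$expected" ]; then
            echo "Header missing or different in: $f"
            status=1
        fi
    done
    return "$status"
}
```

It could be run after the split as check_headers "$(head -n 1 split.csv)" Part*.csv, which prints nothing when every part starts with the header.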

