Reading/Writing to CSV in Matlab

When doing data analysis, many times your input will come in a .csv file, and you'll also want output in a .csv file.  Unfortunately, matlab's built-in csvread/csvwrite functions are exceptionally frustrating to use for anything than pure numeric input/output.  Luckily, writing your own code to read /csv files (especially with some knowledge of your desired input/output) is pretty easy.

Reading CSV

Say I have a file Data.csv that looks like:

FirstName, LastName, Age, GPA, Height
'John',    'Smith',  27,  1.4, 5.5
'Jane',    'Doe',    21,  3.5, 5.4
'John',    'Doe',    23,  3.3, 6.0
'Mike',    'Soltys', NA,  3.9, 5.5

And I want to load it into MATLAB.  A couple things to notice:

  1. The .csv has one header row.
  2. The .csv has mixed data types: Strings, Integers, and doubles
  3. The .csv has missing data (NA in fourth row). We'll need a special way to handle this. I like to read NA's as NaN's.

We can read in the data like so:

filename = 'Data.csv';
fid = fopen(filename,'rt');
[data]=textscan(fid, '%s %s %d %f %f',...
       'headerlines', 1,...
       'delimiter',',',...
       'TreatAsEmpty','NA',...
       'EmptyValue', NaN);                              
fclose(fid);

So whats going on here? I'm opening and closing the file using fopen and fclose. Next I'm using textscan to read the file, specified by
fid.

I'm giving a specifier to tell textscan what data types to look for (in this case, string string int float float). Finally I'm giving the code special instructions on what the file looks like (one header row and separated by commas), as well as how to handle missing data (in this case, I'm plugging them in as NaN).

If you'd like to convert to an array, you need to have all similar data types. In this example, I'll want to skip the students names using the '*' symbol, and read everything else as a float:

[data]=textscan(fid, '%*s %*s %f %f %f',...
       'headerlines', 1,...
       'delimiter',',',...
       'TreatAsEmpty','NA',...
       'EmptyValue', NaN);   
m = cell2mat(data)

The resulting output looks like:

m =
   27.0000    1.4000    5.5000
   21.0000    3.5000    5.4000
   23.0000    3.3000    6.0000
       NaN    3.9000    5.5000

Writing a CSV

Writing a CSV follows the same process.  You'll want to specify what you want your output to look like and covert data to a sell.  Continuing from the example above, say I want to rank the students, and output a csv with their last name, first name, GPA, and class rank. I also want a header row describing each column.

We'll start by opening the file and writing the header row:

fid = fopen('Output.csv','w');
C={'LastName','FirstName','GPA','Rank'};
fprintf(fileID,'%s,%s,%s,%s\n',C{:});

It's important to note that in the format specifier, I'm including the comma-delimiter between the fields and a newline command '\n' at the end. Next I need to define what my output will look like. Here's where you can specify the precision of numbers, and any leading zeros to make your CSV match your required specs. Here I want two strings, a 2 decimal GPA, and a 2 digit rank with leading zeros for 1 digit numbers

formatSpec = '%s,%s,%0.2f,%02d\n';
for row = 1:Len
  C={data{1,2}{row},data{1,1}{row},data{1,4}(row),rank(row)};
  fprintf(fileID,formatSpec,C{:});
end
fclose(fileID);

And that's all!