SQL Server Bulk insert of CSV file with inconsistent quotes
Is it possible to BULK INSERT (SQL Server) a CSV file in which the fields are only OCCASSIONALLY surrounded by quotes? Specifically, quotes only surround those fields that contain a “,”.
In other words, I have data that looks like this (the first row contain headers):
id, company, rep, employees
729216,INGRAM MICRO INC.,"Stuart, Becky",523
729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114
721177,GEORGE WESTON BAKERIES INC,"Hogan, Meg",253
Because the quotes aren’t consistent, I can’t use ‘”,”‘ as a delimiter, and I don’t know how to create a format file that accounts for this.
I tried using ‘,’ as a delimter and loading it into a temporary table where every column is a varchar, then using some kludgy processing to strip out the quotes, but that doesn’t work either, because the fields that contain ‘,’ are split into multiple columns.
Unfortunately, I don’t have the ability to manipulate the CSV file beforehand.
Is this hopeless?
Many thanks in advance for any advice.
By the way, i saw this post SQL bulk import from csv, but in that case, EVERY field was consistently wrapped in quotes. So, in that case, he could use ‘,’ as a delimiter, then strip out the quotes afterwards.


xequnsruuh on Mar 05, 2013
You are going to need to preprocess the file, period.
If you really really need to do this, here is the code. I wrote this because I absolutely had no choice. It is utility code and I’m not proud of it, but it works. The approach is not to get SQL to understand quoted fields, but instead manipulate the file to use an entirely different delimiter.
EDIT: Here is the code in a github repo. It’s been improved and now comes with unit tests! https://github.com/chrisclark/Redelim-it
This function takes an input file and will replace all field-delimiting commas (NOT commas inside quoted-text fields, just the actual delimiting ones) with a new delimiter. You can then tell sql server to use the new field delimiter instead of a comma. In the version of the function here, the placeholder is <TMP> (I feel confident this will not appear in the original csv – if it does, brace for explosions).
Therefore after running this function you import in sql by doing something like:
And without further ado, the terrible, awful function that I apologize in advance for inflicting on you (edit – I’ve posted a working program that does this instead of just the function on my blog here):
rajesh on Mar 05, 2013
It isn’t possible to do a bulk insert for this file, from MSDN:
To be usable as a data file for bulk import, a CSV file must comply with the following restrictions:
(http://msdn.microsoft.com/en-us/library/ms188609.aspx)
Some simple text processing should be all that’s required to get the file ready for import. Alternatively your users could be required to either format the file according to the se guidelines or use something other than a comma as a delimiter (e.g |)
pharme86 on Mar 05, 2013
I have also created a function to convert a CSV to a usable format for Bulk Insert. I used the answered post by Chris Clark as a starting point to create the following C# function.
I ended up using a regular expression to find the fields. I then recreated the file line by line, writing it to a new file as I went, thus avoiding having the entire file loaded into memory.
fernando-correia on Mar 05, 2013
An alternate method–assuming you don’t have a load of fields or expect a quote to appear in the data itself would be to use the REPLACE function.
I have used it. I can’t make any claims regarding performance. It is just a quick and dirty way to get around the problem.
harry on Mar 05, 2013
This might be more complicated or involved than what your willing to use, but …
If you can implement the logic for parsing the lines into fields in VB or C#, you can do this using a CLR table valued function (TVF).
A CLR TVF can be a good performing way to read data in from external source when you want to have some C# or VB code separate the data into columns and/or adjust the values.
You have to be willing to add a CLR assembly to your database (and one that allows external or unsafe operations so it can open files). This can get a bit complicated or involved, but might be worth it for the flexibility you get.
I had some large files that needed to be regularly loaded to tables as fast as possible, but certain code translations needed to be performed on some columns and special handling was needed to load values that would have otherwise caused datatype errors with a plain bulk insert.
In short, a CLR TVF lets you run C# or VB code against each line of the file with bulk insert like performance (although you may need to worry about logging). The example in the SQL Server documentation lets you create a TVF to read from the event log that you could use as a starting point.
Note that the code in the CLR TVF can only access the database in an init stage before the first row is processed (eg. no lookups for each row – you use a normal TVF on top of this to do such things). You don’t appear to need this based on your question.
Also note, each CLR TVF must have its output columns explicitly specified, so you can’t write a generic one that is reusable for each different csv file you might have.
You could write one CLR TVF to read whole lines from the file, returning a one column result set, then use normal TVFs to read from that for each type of file. This requires the code to parse each line to be written in T-SQL, but avoids having to write many CLR TVFs.
johan-offer on Mar 05, 2013
I found the answer by Chris very helpful, but I wanted to run it from within SQL Server using T-SQL (and not using CLR), so I converted his code to T-SQL code. But then I took it one step further by wrapping everything up in a stored procedure that did the following:
For my needs, I further cleaned up the lines by removing quotes around values and converting two double quotes to one double quote (I think that’s the correct method).
The stored procedure makes use of this helper function that parses a string into a table (thanks Erland Sommarskog!):
Here’s how I call it from T-SQL. In this case, I’m inserting the results into a temp table, so I create the temp table first:
I haven’t tested the performance much, but it works well for what I need – importing CSV files with less than 1000 rows. However, it might choke on really large files.
Hopefully someone else also finds it useful.
Cheers!
arora on Mar 05, 2013
I found few issues while having ‘,’ inside our fields like Mike,”456 2nd St, Apt 5″.
Solution to this issue is @ http://crazzycoding.blogspot.com/2010/11/import-csv-file-into-sql-server-using.html
Thanks,
- Ashish
johan-offer on Mar 05, 2013
Chris,
Thanks a bunch for this!! You saved my biscuits!! I could not believe that bulk loader wouldn’t handle this case when XL does such a nice job..don’t these guys see eachother in the halls???
Anyway…I needed a ConsoleApplication version so here is what I hacked together. It’s down and dirty but it works like a champ! I hardcoded the delimiter and commented out the header as they were not needed for my app.
I wish I could also paste a nice big beer in here for ya too.
Geeze, I have no idea why the End Module and Public Class are outside the code block…srry!
rajesh on Mar 05, 2013
You should be able to specifiy not only the field separator, which should be [,] but also the text qualifier, which in this case would be ["]. Using [] to enclose that so there’s no confusion with “.