Wednesday, March 23, 2011

Reading and Writing Files in SQL Server using T-SQL

SQL Server has never been short of ways to read from and write to files, and it is always better to use the standard techniques provided by SQL Server where possible. However, most of them are really designed for reading and writing tabular data, and aren't always trouble-free when used with large strings or relatively unstructured data.
For reading tabular data from a file, whether character-delimited or binary, there is nothing that replaces the hoary old Bulk Copy Program (BCP), which underlies more esoteric methods such as Bulk Insert. It is possible to read text-based delimited files with ODBC, simple files can be read and written to using xp_cmdshell, and you will find that OSQL is wonderful for writing results to a file, but occasionally I've found I need to do more than this.
Thankfully, when armed with OLE Automation and the FileSystemObject (FSO), all sorts of things are possible. The FileSystemObject was introduced into Windows to provide a single common file-system COM interface for scripting languages. It provides a number of handy services that can be accessed from T-SQL. In this article, I provide examples of stored procedures and functions that use this interface to allow you to:
  • Read lines of text from a file
  • Read a file into a SQL Server data type
  • Write out to a file
  • Get file details
  • Tidy up XML, XHTML or HTML code
I'll provide a few details on the FSO along the way, but let's start with examples of some of these procedures in action. You'll need to enable OLE Automation on your test server in order to follow along.
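Enabling OLE Automation is a server-level setting. As a minimal sketch, assuming SQL Server 2005 or later and sysadmin rights on a test instance, the 'Ole Automation Procedures' option can be switched on with sp_configure:

```sql
-- Expose the advanced options so that 'Ole Automation Procedures' can be changed
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Allow the sp_OA* procedures (sp_OACreate, sp_OAMethod and so on) to run
EXEC sp_configure 'Ole Automation Procedures', 1;
RECONFIGURE;
```

Setting the option back to 0 in the same way turns the feature off again once you have finished experimenting.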

Reading lines from a file

I have often pined for a simple function that will read information a line at a time and present it to me as a 'fake table' where each line of text is a row, with a primary key based on the line number. With such a function, one can then do one's own parsing and checking of data.
Well, here it is. Create the uftReadFileAsTable table-valued function in your test database, and try it out with something like:
 
Select line
from dbo.uftReadfileAsTable('MyPath','MyFileName')
where line not like '#%' -- skip lines that begin with a hash
Just replace 'MyPath' and 'MyFileName' with the path and name of an existing file that you wish to read, and away you go.
This is a method I use for reading web logs and gathering usage statistics. It is also useful where the data feed has to be validated before one can parse it into the final SQL data format.
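To give a feel for what happens underneath, here is a minimal sketch of reading a file line by line through the FSO with the sp_OA* procedures. It is not the uftReadfileAsTable function itself, only an illustration of the general pattern; the file path is a hypothetical placeholder, and it assumes that OLE Automation is enabled and that no line is longer than 8000 characters.

```sql
-- Sketch only: read a text file line by line through the FileSystemObject.
-- 'C:\temp\MyFileName.txt' is a hypothetical placeholder path.
DECLARE @hr int, @fso int, @stream int, @eof int, @line varchar(8000);
DECLARE @lines TABLE (line_no int IDENTITY(1,1) PRIMARY KEY, line varchar(8000));

-- Create the COM object, then open the file (1 = ForReading)
EXEC @hr = sp_OACreate 'Scripting.FileSystemObject', @fso OUT;
IF @hr = 0
    EXEC @hr = sp_OAMethod @fso, 'OpenTextFile', @stream OUT, 'C:\temp\MyFileName.txt', 1;
IF @hr = 0
    EXEC @hr = sp_OAGetProperty @stream, 'AtEndOfStream', @eof OUT;

-- Keep reading until the stream reports end-of-file or a call fails
WHILE @hr = 0 AND @eof = 0
BEGIN
    EXEC @hr = sp_OAMethod @stream, 'ReadLine', @line OUT;
    IF @hr = 0
    BEGIN
        INSERT INTO @lines (line) VALUES (@line);
        EXEC @hr = sp_OAGetProperty @stream, 'AtEndOfStream', @eof OUT;
    END
END

-- Release the COM objects whether or not the read succeeded
IF @stream IS NOT NULL
BEGIN
    EXEC sp_OAMethod @stream, 'Close';
    EXEC sp_OADestroy @stream;
END
IF @fso IS NOT NULL
    EXEC sp_OADestroy @fso;

-- Each line is now a row, keyed by its line number
SELECT line_no, line FROM @lines WHERE line NOT LIKE '#%' ORDER BY line_no;
```

A non-zero @hr at the end means one of the OLE calls failed; sp_OAGetErrorInfo can be called with the object token to find out why.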

Reading a file into a SQL Server data type

This is all very well, but how about something that reads a file in one gulp into a varchar or XML datatype? Perhaps you need to extract data from HTML, XHTML or some other format. Create the ufsReadfileAsString function and try something like:
Select dbo.ufsReadfileAsString ('MyPath','MyFileName')

Writing out a file

No problem – just create spWriteStringToFile and try:
execute spWriteStringToFile 'This article describes how to fully access the
local filesystem from SQL Server. It shows a
way of reading and writing data to file, and
accessing the details of the server's
filesystem using OLE Automation to access
the filesystem object'
, 'MyPath','MyFileName'
In this example, the path you use in place of 'MyPath' must already exist.
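The same OLE Automation pattern works in the other direction. The following is a minimal sketch of writing a string to a file through the FSO; it is not the spWriteStringToFile procedure itself. The path and the text are hypothetical placeholders, and it assumes that OLE Automation is enabled and that the SQL Server service account can write to the folder.

```sql
-- Sketch only: write a string to a file through the FileSystemObject.
-- The path and text are hypothetical placeholders; the folder must already exist.
DECLARE @hr int, @fso int, @stream int, @text varchar(8000);
SET @text = 'Some text to write to the file';

EXEC @hr = sp_OACreate 'Scripting.FileSystemObject', @fso OUT;
-- 2 = ForWriting; the final 1 (True) creates the file if it is missing
IF @hr = 0
    EXEC @hr = sp_OAMethod @fso, 'OpenTextFile', @stream OUT, 'C:\temp\MyFileName.txt', 2, 1;
-- Write returns nothing, so NULL is passed in the return-value position
IF @hr = 0
    EXEC @hr = sp_OAMethod @stream, 'Write', NULL, @text;

-- Release the COM objects whether or not the write succeeded
IF @stream IS NOT NULL
BEGIN
    EXEC sp_OAMethod @stream, 'Close';
    EXEC sp_OADestroy @stream;
END
IF @fso IS NOT NULL
    EXEC sp_OADestroy @fso;
```

Opening with ForWriting replaces any existing contents; 8 (ForAppending) in place of 2 would add to the end of the file instead.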

Getting file details

If you need to find out the attributes of a particular file, then try out spFileDetails: