Friday, July 21, 2017

Read all files of SQL Server's drive in a Tree using recursive CTE

In this blog I'll use undocumented in SQL Server extended stored procedure "xp_dirtree".
That procedure is well described in Patrick Keisler's blog.

Briefly, "xp_dirtree" extended procedure returns tree of sub-directories from given folder and has three parameters:
1. Starting or root folder
2. Depth level - determines how deep SQL Server will dig into a sub-directory structure. If it set to zero or omitted whole tree will be returned.
3. If omitted or set to zero will return only folders .If set to "1" will return files as well.

The easiest examples would be like this:
EXEC master.sys.xp_dirtree 'C:\';
EXEC master.sys.xp_dirtree 'C:\',0,1; EXEC master.sys.xp_dirtree 'C:\TEST';

The Result might look like something like this and be not very useful.

So, in order to make it more useful we have to read drive content into a table and then build a directory tree from it:
CREATE TABLE #dirtree (
 Dir VARCHAR(256), 
 Depth TinyInt, 
 IsFile Bit, 
 ID INT IDENTITY(1,1) PRIMARY KEY);
GO
CREATE TABLE #fulltree (
 Dir VARCHAR(256), 
 Depth TinyInt, 
 IsFile Bit, 
 Parent INT, 
 ID INT PRIMARY KEY);
GO
INSERT INTO #dirtree(Dir, Depth, IsFile) 
EXEC master.sys.xp_dirtree 'C:\',0,1;
GO
INSERT INTO #fulltree(Dir, Depth, IsFile, Parent, ID)
SELECT Dir, Depth, IsFile, p.MID as Parent, ID
FROM #dirtree as d WITH (NOLOCK)
OUTER APPLY (SELECT MAX(ID) as MID FROM #dirtree as i WITH (NOLOCK)
 WHERE i.Depth + 1 = d.Depth and i.ID < d.ID ) as p;
GO
CREATE NONCLUSTERED INDEX #fulltree_parent ON #fulltree([Parent])
GO
;WITH RecCTE as (
 SELECT CAST('C:\' + Dir as VARCHAR(MAX)) as Dir
  , Depth, ID, IsFile, Dir as FName
 FROM #fulltree WHERE Parent is Null
 UNION ALL
 SELECT CAST(p.Dir + '\' + d.Dir as VARCHAR(MAX))
  , d.Depth, d.ID, d.IsFile, d.Dir as FName
 FROM #fulltree as d INNER JOIN RecCTE as p ON d.Parent = p.ID
) SELECT Dir, Depth, IsFile, FName FROM RecCTE ORDER BY ID
The script is using two temporary tables. First is for data extraction and second for data manipulation.
In case of very small tree and small number of files it is possible to simplify query by using only one table and avoiding intermediate update, but if you extract structure of a regular drive it might take a while to produce a result.

In my case the result looks like this. More than 200K items and most of them in "Windows" folder:

To test my script in the extreme conditions I've created test folder and bunch of underlying sub-folders and then run following script against test folder:
DROP TABLE #dirtree 
GO
CREATE TABLE #dirtree (
 Dir VARCHAR(256), 
 Depth TinyInt, 
 Parent INT, 
 ID INT IDENTITY(1,1) PRIMARY KEY);
GO
INSERT INTO #dirtree(Dir, Depth) 
EXEC master.sys.xp_dirtree 'C:\T\';
GO
UPDATE d SET Parent = p.MID
FROM #dirtree as d WITH (NOLOCK)
OUTER APPLY (SELECT MAX(ID) as MID FROM #dirtree as i WITH (NOLOCK)
 WHERE i.Depth + 1 = d.Depth and i.ID < d.ID ) as p;
GO
;WITH RecCTE as (
 SELECT CAST('C:\T\' + Dir as VARCHAR(MAX)) as Dir, Depth, ID
 FROM #dirtree WHERE Parent is Null
 UNION ALL
 SELECT CAST(p.Dir + '\' + d.Dir as VARCHAR(MAX)), d.Depth, d.ID
 FROM #dirtree as d INNER JOIN RecCTE as p ON d.Parent = p.ID
) SELECT Dir, Depth FROM RecCTE ORDER BY ID

I was not surprised when I've got following error message:
Msg 530, Level 16, State 1, Line 60
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
I've built too many sub-folders that my recursive CTE couldn't handle so many levels of depth.

To fix this problem I've had to increase maximum recursion level by adding one more statement to my last query:
OPTION (MAXRECURSION 125)
And I've got following results: