Another option is a CLR function with regular expressions, which is available if you are using SQL Server 2005 or later. I wrote a simple CLR regex class that takes a source string and a regular expression and returns all matched groups.
For your example, pass in your string and the regular expression `\^(\d+)\|`.
As mentioned, the function returns all matched groups. The expression contains one capturing group, so in the result group 0 is the whole matched string and group 1 is the ID we need.
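To make the group numbering concrete, here is a minimal console sketch that runs the same pattern against the sample string outside SQL Server. The class name `GroupDemo` and the helper `ExtractIds` are made up for illustration and are not part of the CLR class.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: collects group 1 (the digits) of every match.
    public static List<string> ExtractIds(string source)
    {
        Regex r = new Regex(@"\^(\d+)\|");
        List<string> ids = new List<string>();
        foreach (Match m in r.Matches(source))
        {
            // Groups[0] is the whole match, e.g. "^1988415|".
            // Groups[1] is the first capturing group, e.g. "1988415".
            ids.Add(m.Groups[1].Value);
        }
        return ids;
    }

    public static void Main()
    {
        string source = "2|200911_s^3^1988415|20091452_s^3^1988411|2009152_s^3^1988455|";
        foreach (string id in ExtractIds(source))
        {
            Console.WriteLine(id);
        }
    }
}
```

Note that `^3^` does not match, because the digit is followed by `^` rather than `|`, so only the three trailing IDs are captured.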
The regex function is general-purpose, and it stays fast even when the result is filtered afterwards: I tried it with an input string of about 2 MB that produced nearly 100,000 result rows, and the runtime was still around 1 second.
With some optimization, the result could be even better.
Edit: Simply changing the output value from `nvarchar(4000)` to `nvarchar(10)` gave noticeably better CPU time and total runtime than the results below, both for the 2 MB string (about 31,000 replications) and for the roughly 15,000 replications.

    using System.Collections; // needed for the non-generic IEnumerable

    public class SQLRegEx
    {
        // One output row: a running row number, the group index, and the group text.
        private class RegExRow
        {
            public RegExRow(int rowId, int groupID, string value)
            {
                RowId = rowId;
                GroupID = groupID;
                Value = value;
            }

            public int RowId;
            public int GroupID;
            public string Value;
        }

        [Microsoft.SqlServer.Server.SqlFunction(FillRowMethodName = "FillRegExRow")]
        public static IEnumerable RegExMatch(string source, string pattern)
        {
            System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(pattern);
            int rowId = 0;

            foreach (System.Text.RegularExpressions.Match m in r.Matches(source))
            {
                // rowId advances once per group (not per match), starting with group 0.
                for (int i = 0; i < m.Groups.Count; i++)
                {
                    yield return new RegExRow(++rowId, i, m.Groups[i].Value);
                }
            }
        }

        // Called by SQL Server for each yielded object to fill the output columns.
        public static void FillRegExRow(Object obj, out int rowId, out int groupID, out System.Data.SqlTypes.SqlChars value)
        {
            RegExRow r = (RegExRow)obj;
            rowId = r.RowId;
            groupID = r.GroupID;
            value = new System.Data.SqlTypes.SqlChars(r.Value);
        }
    }

I then registered it in the database and tested it.

    CREATE ASSEMBLY [SQLRegEx]
    AUTHORIZATION [dbo]
    FROM 'C:\CLR\SQLRegEx.dll'
    WITH PERMISSION_SET = SAFE
    GO

    CREATE FUNCTION dbo.fn_RegExMatch(@sourceString nvarchar(max), @pattern nvarchar(4000))
    RETURNS TABLE
    (
        rowId int,
        groupId int,
        value nvarchar(4000)
    )
    AS
    EXTERNAL NAME [SQLRegEx].[SQLRegEx].RegExMatch
    GO

Executing it returns exactly what you expect.

    SELECT
        *
    FROM dbo.fn_RegExMatch('2|200911_s^3^1988415|20091452_s^3^1988411|2009152_s^3^1988455|', '\^(\d+)\|')
    WHERE groupId = 1
    GO

    rowId       groupId     value
    ----------- ----------- --------
    2           1           1988415
    4           1           1988411
    6           1           1988455

With the roughly 2 MB string (31,192 replications of the original string), the results were:

    (93576 row(s) affected)
    SQL Server Execution Times:
       CPU time = 640 ms, elapsed time = 1126 ms.

For 15,340 replications of the string, the result is:

    (46020 row(s) affected)
    SQL Server Execution Times:
       CPU time = 344 ms, elapsed time = 616 ms.

With `nvarchar(10)` as output:

    (46020 row(s) affected)
    SQL Server Execution Times:
       CPU time = 140 ms, elapsed time = 585 ms.
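
One idea for further optimization, which I have not measured, is to emit only the group you need rather than returning every group and filtering with `WHERE groupId = 1`. The sketch below shows just the iteration logic as a plain console program. `SingleGroupSketch`, `MatchGroup`, and the `groupIndex` parameter are hypothetical names; turning this into a CLR function would still need the `SqlFunction` attribute and a fill-row method like the ones above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SingleGroupSketch
{
    // Hypothetical variant: yields only the requested group of each match,
    // so the caller does not have to filter out the other groups afterwards.
    public static IEnumerable<string> MatchGroup(string source, string pattern, int groupIndex)
    {
        Regex r = new Regex(pattern);
        foreach (Match m in r.Matches(source))
        {
            Group g = m.Groups[groupIndex];
            // Success is false if the group did not take part in the match
            // or if groupIndex does not exist in the pattern.
            if (g.Success)
            {
                yield return g.Value;
            }
        }
    }

    public static void Main()
    {
        string source = "2|200911_s^3^1988415|20091452_s^3^1988411|2009152_s^3^1988455|";
        foreach (string id in MatchGroup(source, @"\^(\d+)\|", 1))
        {
            Console.WriteLine(id);
        }
    }
}
```

Whether this actually helps depends on how much of the runtime goes into producing and discarding the extra rows, so treat it as something to benchmark rather than a guaranteed gain.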