Channel: Answers for "Find substring saved between 2 characters"

Answer by Pavel Pawlowski

Another solution could be a CLR function with RegEx, if you are using SQL Server 2005 or above. I wrote a simple CLR class for regex matching. It takes a source string and a regular expression and returns all matched groups.

For your example, you pass your string and the regular expression `\^(\d+)\|`, as mentioned. The function returns all matched groups. The expression contains one capturing group, so in the result group 0 is the whole matched string and group 1 is the ID we need.

The regex function is general-purpose, and it performs well even when the result is filtered: I tried it with an input string of about 2 MB that produced nearly 100,000 result rows, and even then the runtime was around 1 second. With some optimizations, the result could be even better.

Edit: When I simply changed the output column from `nvarchar(4000)` to `nvarchar(10)`, the CPU time and total runtime were much better than the results below, both for the 2 MB string (about 31,000 replications) and for the 15,000 replications.

    using System.Collections;
    using System.Data.SqlTypes;
    using System.Text.RegularExpressions;
    using Microsoft.SqlServer.Server;

    public class SQLRegEx
    {
        // One output row: a single group of a single match.
        private class RegExRow
        {
            public RegExRow(int rowId, int groupID, string value)
            {
                RowId = rowId;
                GroupID = groupID;
                Value = value;
            }

            public int RowId;
            public int GroupID;
            public string Value;
        }

        // Table-valued function: returns one row per group of every match.
        [SqlFunction(FillRowMethodName = "FillRegExRow")]
        public static IEnumerable RegExMatch(string source, string pattern)
        {
            Regex r = new Regex(pattern);
            int rowId = 0;

            foreach (Match m in r.Matches(source))
            {
                // Group 0 is the whole match; groups 1..n are the capturing groups.
                for (int i = 0; i < m.Groups.Count; i++)
                {
                    yield return new RegExRow(++rowId, i, m.Groups[i].Value);
                }
            }
        }

        // Called by SQL Server for each returned object to fill the output columns.
        public static void FillRegExRow(object obj, out int rowId, out int groupID, out SqlChars value)
        {
            RegExRow r = (RegExRow)obj;
            rowId = r.RowId;
            groupID = r.GroupID;
            value = new SqlChars(r.Value);
        }
    }

I then registered it in the database and tested it.

    CREATE ASSEMBLY [SQLRegEx]
    AUTHORIZATION [dbo]
    FROM 'C:\CLR\SQLRegEx.dll'
    WITH PERMISSION_SET = SAFE
    GO

    CREATE FUNCTION dbo.fn_RegExMatch(@sourceString nvarchar(max), @pattern nvarchar(4000))
    RETURNS TABLE (
        rowId int,
        groupId int,
        value nvarchar(4000)
    )
    AS EXTERNAL NAME [SQLRegEx].[SQLRegEx].RegExMatch
    GO

When you execute it, you get exactly what you expect.

    SELECT *
    FROM dbo.fn_RegExMatch('2|200911_s^3^1988415|20091452_s^3^1988411|2009152_s^3^1988455|', '\^(\d+)\|')
    WHERE groupId = 1
    GO

    rowId       groupId     value
    ----------- ----------- --------
    2           1           1988415
    4           1           1988411
    6           1           1988455

Only even `rowId` values appear because every match produces two rows (group 0 and group 1), and the `WHERE` clause filters out the group 0 rows.

With the roughly 2 MB string (31,192 replications of the original string) the results were:

    (93576 row(s) affected)
    SQL Server Execution Times: CPU time = 640 ms, elapsed time = 1126 ms.

For 15,340 replications of the string the result is:

    (46020 row(s) affected)
    SQL Server Execution Times: CPU time = 344 ms, elapsed time = 616 ms.

With `nvarchar(10)` as output:

    (46020 row(s) affected)
    SQL Server Execution Times: CPU time = 140 ms, elapsed time = 585 ms.
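
The post does not show how the large test string was built or how the timings were captured. Here is a minimal sketch of one way to reproduce the measurement, assuming the test input is simply the sample string repeated 31,192 times and that the timings come from `SET STATISTICS TIME`. The variable name `@source` is made up for illustration.

```sql
-- Assumption: the test input is the sample string repeated 31,192 times.
DECLARE @source nvarchar(max);

-- REPLICATE truncates its result at 8,000 bytes unless the input is a
-- (max) type, so the sample string is cast to nvarchar(max) first.
SET @source = REPLICATE(
    CAST(N'2|200911_s^3^1988415|20091452_s^3^1988411|2009152_s^3^1988455|' AS nvarchar(max)),
    31192);

-- Report CPU and elapsed time for the statements that follow.
SET STATISTICS TIME ON;

-- Keep only group 1, which holds the captured ID.
SELECT rowId, groupId, value
FROM dbo.fn_RegExMatch(@source, N'\^(\d+)\|')
WHERE groupId = 1;

SET STATISTICS TIME OFF;
```

With three matches per copy of the sample string, this should return 3 × 31,192 = 93,576 rows, which agrees with the row count reported for the 2 MB test. The timings themselves will vary with hardware and server load.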
