score:1

I don't think there is an easy answer in your case, but you have several options. A regex can be a simple solution if you only need to avoid matching the tags themselves. A more thorough but more complex approach would be to use a package like HtmlAgilityPack to parse your mail.

Here is an example using a regex:

var searchWord = "strong";
var mail = "<strong>blablabla</strong><p>blabla strong blabla</p>";
// Lookbehind (?<!</?) and lookahead (?!>): matches strong, but not <strong>, </strong>, <strong or strong>
var rgx = new Regex($"(?<!</?){Regex.Escape(searchWord)}(?!>)");
if (rgx.IsMatch(mail))
{
    // Do what you want
}

score:-1

You can strip the tags with a regex replace, as I do below (JavaScript):

"<div>Hello</div><span>World</span>".replace(/<[^>]*>/g, '')
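
To tie this back to the search in the question, here is a minimal sketch that strips the tags and then checks the remaining text. The helper names stripTags and textContains are made up for illustration, and the case-insensitive comparison is an assumption about what the search should do. Note that this approach does not decode entities such as &amp;, can merge words from adjacent tags, and breaks if an attribute value contains a > character.

```javascript
// Hypothetical helpers for illustration; the names are not from the original answer.
function stripTags(html) {
  // Remove anything that looks like a tag: '<', any run of non-'>' characters, then '>'.
  return html.replace(/<[^>]*>/g, '');
}

function textContains(html, searchText) {
  // Assumption: the search should be case-insensitive and look at visible text only.
  return stripTags(html).toLowerCase().includes(searchText.toLowerCase());
}

const mail = '<strong>blablabla</strong><p>blabla strong blabla</p>';
console.log(stripTags(mail));                               // blablablablabla strong blabla
console.log(textContains(mail, 'strong'));                  // true (found in the paragraph text)
console.log(textContains('<strong>hi</strong>', 'strong')); // false (only present as a tag name)
```

The text from the two elements is joined with nothing in between, so a search term could accidentally match across an element boundary. Replacing tags with a space instead of an empty string is one way to avoid that.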

score:1

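See if you can use the free, open-source HtmlAgilityPack: first convert the HTML to plain text, then apply the search criteria to the result.

E.g.:

// html holds the HTML string; HtmlUtilities is assumed to be a helper class built on HtmlAgilityPack (it may not ship with the package itself)
var plainTextResult = HtmlUtilities.ConvertToPlainText(html);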

if(!string.IsNullOrWhiteSpace(searchText))
{
    bool containsResult = plainTextResult.Contains(searchText);
}

score:1

Thanks to @Amine and @lollmbaowtfidgafgtfoohwtbs, I figured out how to do this.

First, I created a SQL function in my database that strips the HTML tags from a given text:


CREATE FUNCTION [dbo].[ufnStripHTML] (@HTMLText NVARCHAR(MAX))
RETURNS NVARCHAR(MAX) AS
BEGIN
    DECLARE @Start INT
    DECLARE @End INT
    DECLARE @Length INT
    SET @Start = CHARINDEX('<',@HTMLText)
    SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
    SET @Length = (@End - @Start) + 1
    WHILE @Start > 0 AND @End > 0 AND @Length > 0
    BEGIN
        SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
        SET @Start = CHARINDEX('<',@HTMLText)
        SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
        SET @Length = (@End - @Start) + 1
    END
    RETURN LTRIM(RTRIM(@HTMLText))
END
GO

Then I added a reference to that function in my DbContext:

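        [DbFunction("ufnStripHTML")]
        public static string StripHTML(string text)
        {
            // Placeholder only: inside a query the call is translated to the SQL function, so this body should never run.
            throw new NotSupportedException("StripHTML can only be used inside a database query.");
        }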

and now I can use it in my LINQ query:


if (!string.IsNullOrWhiteSpace(searchText))
{
    query = query.Where(ent => TGDbContext.StripHTML(ent.Contents).Contains(searchText));
}
