Skip to content

Instantly share code, notes, and snippets.

@ErikEJ
Last active March 30, 2025 14:29
Show Gist options
  • Save ErikEJ/6ab62e8b9c226ecacf02a5e5713ff7bd to your computer and use it in GitHub Desktop.
Save ErikEJ/6ab62e8b9c226ecacf02a5e5713ff7bd to your computer and use it in GitHub Desktop.
Replacement for EF Core .Contains, that avoids SQL Server plan cache pollution
using System.Linq.Expressions;
namespace Microsoft.EntityFrameworkCore
{
public static class IQueryableExtensions
{
public static IQueryable<TQuery> In<TKey, TQuery>(
this IQueryable<TQuery> queryable,
IEnumerable<TKey> values,
Expression<Func<TQuery, TKey>> keySelector)
{
ArgumentNullException.ThrowIfNull(values);
ArgumentNullException.ThrowIfNull(keySelector);
if (!values.Any())
{
return queryable.Take(0);
}
var distinctValues = Bucketize(values);
if (distinctValues.Length > 2048)
{
throw new ArgumentException("Too many parameters for SQL Server, reduce the number of parameters", nameof(keySelector));
}
var expr = CreateBalancedORExpression(distinctValues, keySelector.Body, 0, distinctValues.Length - 1);
var clause = Expression.Lambda<Func<TQuery, bool>>(expr, keySelector.Parameters);
return queryable.Where(clause);
}
private static BinaryExpression CreateBalancedORExpression<TKey>(TKey[] values, Expression keySelectorBody, int start, int end)
{
if (start == end)
{
var v1 = values[start];
return Expression.Equal(keySelectorBody, ((Expression<Func<TKey>>)(() => v1)).Body);
}
else if (start + 1 == end)
{
var v1 = values[start];
var v2 = values[end];
return Expression.OrElse(
Expression.Equal(keySelectorBody, ((Expression<Func<TKey>>)(() => v1)).Body),
Expression.Equal(keySelectorBody, ((Expression<Func<TKey>>)(() => v2)).Body));
}
else
{
int mid = (start + end) / 2;
return Expression.OrElse(
CreateBalancedORExpression(values, keySelectorBody, start, mid),
CreateBalancedORExpression(values, keySelectorBody, mid + 1, end));
}
}
private static TKey[] Bucketize<TKey>(IEnumerable<TKey> values)
{
var distinctValues = new HashSet<TKey>(values).ToArray();
var originalLength = distinctValues.Length;
int bucket = (int)Math.Pow(2, Math.Ceiling(Math.Log(originalLength, 2)));
if (originalLength == bucket) return distinctValues;
var lastValue = distinctValues[originalLength - 1];
Array.Resize(ref distinctValues, bucket);
distinctValues.AsSpan().Slice(originalLength).Fill(lastValue);
return distinctValues;
}
}
}
@ErikEJ
Copy link
Author

ErikEJ commented Feb 2, 2024

@juliowh I think I answered that question already?

@juliowh
Copy link

juliowh commented Feb 2, 2024

I read that answer, but did not understand why a bucket size of 32 performs better than one of size 20.
A sql plan is always "created" in sizes of multiple of 2?

@ErikEJ
Copy link
Author

ErikEJ commented Feb 2, 2024

@juliowh It is the number of unique query plans that matter, and a factor 2 seemed like suitable bucket sizes

@clement911
Copy link

That's very cool @ErikEJ !

We use a lot of composite keys and I wonder if would be possible to create another overload that works with composite keys?

So I guess the signature might be something like this:

public static IQueryable<TQuery> In<TKey, TQuery>(
this IQueryable<TQuery> queryable,
IEnumerable<Tuple<TKey1, TKey2>> values,
Expression<Func<TQuery, Tuple<TKey1, TKey2>>> keySelector)

And we might use it like this:

[PrimaryKey(nameof(State), nameof(LicensePlate))]
internal class Car
{
    public string State { get; set; }
    public string LicensePlate { get; set; }

    public string Make { get; set; }
    public string Model { get; set; }
}

var keys = new[] { ("state1", "license1"), ("state2", "license2"), etc... }
var cars = context.Cars.In(keys, c => (c.State, c.LicensePlate));

It's a bit more complicated because the predicate needs to operate on two separate columns to generate something like this:

SELECT ...
FROM ...
WHERE (State = @pState1 AND LicensePlate = @pLicensePlate1)
OR    (State = @pState2 AND LicensePlate = @pLicensePlate2)
OR ...

@clement911
Copy link

@ErikEJ
Copy link
Author

ErikEJ commented Feb 23, 2024

@clement911 Feel free to do with this snippet whatever you want. It is just a sample.

@julienFlexsoft
Copy link

@ErikEJ : Thank you for sharing this snippet!
You made this in 2021, do you consider it still up to date now that we're a few EF Core versions further? Performance wise I mean. I see that the EF Core team might want to make bucketization the default behaviour for collections in a future version, but I'm looking into you're snippet to avoid having to wait another year.
The current OPENJSON behavior is tanking our performance and the old constant behavior (without parameters at all) makes monitoring heavy queries difficult (since they're all unique we can't see them as aggregated recurring heavy queries).

@ErikEJ
Copy link
Author

ErikEJ commented Mar 5, 2025

@julienFlexsoft Thanks. It was updated in 2023, so I think it will work with current EF Core.

@julienFlexsoft
Copy link

I simplified the CreateBalancedORExpression like this

private static BinaryExpression CreateBalancedORExpression<TKey>(TKey[] values, Expression keySelectorBody)
{
    return values
        .Select(p => Expression.Equal(keySelectorBody, ((Expression<Func<TKey>>)(() => p)).Body))
        .Aggregate(Expression.OrElse);
}

The resulting SQL looks almost identical, but it makes the parameter names a bit clearer. No real added value except the shorter code.

-- Old SQL
[Parameters=[@__v1_0='1', @__v2_1='2', @__v1_2='3', @__v2_3='3']]
SELECT [a].[Id]
FROM [ActivityEntities] AS [a]
WHERE [a].[Id] = @__v1_0 OR [a].[Id] = @__v2_1 OR [a].[Id] = @__v1_2 OR [a].[Id] = @__v2_3

-- New SQL
[Parameters=[@__p_0='1', @__p_1='2', @__p_2='3', @__p_3='3']]
SELECT [a].[Id]
FROM [ActivityEntities] AS [a]
WHERE [a].[Id] = @__p_0 OR [a].[Id] = @__p_1 OR [a].[Id] = @__p_2 OR [a].[Id] = @__p_3

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment