-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Static typed Estimator/Transformer/Data #778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
19 commits
Select commit
Hold shift + click to select a range
f7ff0a9
Work in progress and demo of "PIpelines with Generic Static TYpes" (P…
TomFinley 5bd5607
Beginnings of data reader estimator static typing
TomFinley c9babed
Move Pigsty from Core to Data
TomFinley db24ddd
Generalization of the reader and transform typed estimator analysis
TomFinley 9c1f9e1
New test project, beginning of analyzer stub
TomFinley a8c1d5f
Restructure the analyzer, put reference to it in test project
TomFinley 5e98822
Renamings and header
TomFinley a9c48a4
Addition of new wrapping statically-typed classes, and wrapping methods
TomFinley f13105c
Working static pipelines for TextLoader
TomFinley 94219e5
PR comments
TomFinley 4d88a5c
Put image analytics into the PiGSTy
TomFinley 864a6e9
* NormVector added
TomFinley 91e3708
Code review and refinement
TomFinley df23c3d
Movement and refinement of the analyzer placement
TomFinley 7854b8d
More checking of the static vs. dynamic types
TomFinley 8ac20f0
Analyzer for types meant as schema shapes
TomFinley 9a46981
Senja comments
TomFinley 5f41433
Introduce delegate estimator. Now can avoid name collisions where out…
TomFinley 170c647
Senja comments
TomFinley File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
<Project Sdk="Microsoft.NET.Sdk"> | ||
|
||
<PropertyGroup> | ||
<TargetFramework>netstandard1.3</TargetFramework> | ||
</PropertyGroup> | ||
|
||
<ItemGroup> | ||
<PackageReference Include="Microsoft.CodeAnalysis.CSharp.Workspaces" Version="$(MicrosoftCodeAnalysisCSharpVersion)" /> | ||
<PackageReference Include="Microsoft.CSharp" Version="$(MicrosoftCSharpVersion)" /> | ||
<PackageReference Include="System.Composition" Version="$(SystemCompositionVersion)" /> | ||
</ItemGroup> | ||
|
||
</Project> |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
||
using System.Collections.Immutable; | ||
using System.Linq; | ||
using Microsoft.CodeAnalysis; | ||
using Microsoft.CodeAnalysis.CSharp; | ||
using Microsoft.CodeAnalysis.CSharp.Syntax; | ||
using Microsoft.CodeAnalysis.Diagnostics; | ||
|
||
namespace Microsoft.ML.Analyzer | ||
{ | ||
[DiagnosticAnalyzer(LanguageNames.CSharp)] | ||
public sealed class TypeIsSchemaShapeAnalyzer : DiagnosticAnalyzer | ||
{ | ||
internal static class ShapeDiagnostic | ||
{ | ||
private const string Category = "Type Check"; | ||
public const string Id = "MSML_TypeShouldBeSchemaShape"; | ||
private const string Title = "The type is not a schema shape"; | ||
private const string Format = "Type{0} is neither a PipelineColumn nor a ValueTuple."; | ||
internal const string Description = | ||
"Within statically typed pipeline elements of ML.NET, the shape of the schema is determined by a type. " + | ||
"A valid type is either an instance of one of the PipelineColumn subclasses (e.g., Scalar<bool> " + | ||
"or something like that), or a ValueTuple containing only valid types. (So, ValueTuples containing " + | ||
"other value tuples are fine, so long as they terminate in a PipelineColumn subclass.)"; | ||
|
||
internal static DiagnosticDescriptor Rule = | ||
new DiagnosticDescriptor(Id, Title, Format, Category, | ||
DiagnosticSeverity.Error, isEnabledByDefault: true, description: Description); | ||
} | ||
|
||
internal static class ShapeParameterDiagnostic | ||
{ | ||
private const string Category = "Type Check"; | ||
public const string Id = "MSML_TypeParameterShouldBeSchemaShape"; | ||
private const string Title = "The type is not a schema shape"; | ||
private const string Format = "Type parameter {0} is not marked with [IsShape] or appropriate type constraints."; | ||
internal const string Description = ShapeDiagnostic.Description + " " + | ||
"If using type parameters when interacting with the statically typed pipelines, the type parameter ought to be " + | ||
"constrained in such a way that it, either by applying the [IsShape] attribute or by having type constraints to " + | ||
"indicate that it is valid, e.g., constraining the type to descend from PipelineColumn."; | ||
|
||
internal static DiagnosticDescriptor Rule = | ||
new DiagnosticDescriptor(Id, Title, Format, Category, | ||
DiagnosticSeverity.Error, isEnabledByDefault: true, description: Description); | ||
} | ||
|
||
private const string AttributeName = "Microsoft.ML.Data.StaticPipe.IsShapeAttribute"; | ||
private const string LeafTypeName = "Microsoft.ML.Data.StaticPipe.Runtime.PipelineColumn"; | ||
|
||
public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics => | ||
ImmutableArray.Create(ShapeDiagnostic.Rule, ShapeParameterDiagnostic.Rule); | ||
|
||
public override void Initialize(AnalysisContext context) | ||
{ | ||
context.RegisterSemanticModelAction(Analyze); | ||
} | ||
|
||
private void Analyze(SemanticModelAnalysisContext context) | ||
{ | ||
// We start with the model, then do the the method invocations. | ||
// We could have phrased it as RegisterSyntaxNodeAction(Analyze, SyntaxKind.InvocationExpression), | ||
// but this seemed more inefficient since getting the model and fetching the type symbols every | ||
// single time seems to incur significant cost. The following invocation is somewhat more awkward | ||
// since we must iterate over the invocation syntaxes ourselves, but this seems to be worthwhile. | ||
var model = context.SemanticModel; | ||
var comp = model.Compilation; | ||
|
||
// Get the symbols of the key types we are analyzing. If we can't find any of them there is | ||
// no point in going further. | ||
var attrType = comp.GetTypeByMetadataName(AttributeName); | ||
if (attrType == null) | ||
return; | ||
var leafType = comp.GetTypeByMetadataName(LeafTypeName); | ||
if (leafType == null) | ||
return; | ||
|
||
// This internal helper method recursively determines whether an attributed type parameter | ||
// has a valid type. It is called externally from the loop over invocations. | ||
bool CheckType(ITypeSymbol type, out string path, out ITypeSymbol problematicType) | ||
{ | ||
if (type.TypeKind == TypeKind.TypeParameter) | ||
{ | ||
var typeParam = (ITypeParameterSymbol)type; | ||
path = null; | ||
problematicType = null; | ||
// Does the type parameter have the attribute that triggers a check? | ||
if (type.GetAttributes().Any(attr => attr.AttributeClass == attrType)) | ||
return true; | ||
// Are any of the declared constraint types OK? | ||
if (typeParam.ConstraintTypes.Any(ct => CheckType(ct, out string ctPath, out var ctProb))) | ||
return true; | ||
// Well, probably not good then. Let's call it a day. | ||
problematicType = typeParam; | ||
return false; | ||
} | ||
else if (type.IsTupleType) | ||
{ | ||
INamedTypeSymbol nameType = (INamedTypeSymbol)type; | ||
var tupleElems = nameType.TupleElements; | ||
|
||
for (int i = 0; i < tupleElems.Length; ++i) | ||
{ | ||
var e = tupleElems[i]; | ||
if (!CheckType(e.Type, out string innerPath, out problematicType)) | ||
{ | ||
path = e.Name ?? $"Item{i + 1}"; | ||
if (innerPath != null) | ||
path += "." + innerPath; | ||
return false; | ||
} | ||
} | ||
path = null; | ||
problematicType = null; | ||
return true; | ||
} | ||
else | ||
{ | ||
for (var rt = type; rt != null; rt = rt.BaseType) | ||
{ | ||
if (rt == leafType) | ||
{ | ||
path = null; | ||
problematicType = null; | ||
return true; | ||
} | ||
} | ||
path = null; | ||
problematicType = type; | ||
return false; | ||
} | ||
} | ||
|
||
foreach (var invocation in model.SyntaxTree.GetRoot().DescendantNodes().OfType<InvocationExpressionSyntax>()) | ||
{ | ||
var symbolInfo = model.GetSymbolInfo(invocation); | ||
if (!(symbolInfo.Symbol is IMethodSymbol methodSymbol)) | ||
{ | ||
// Should we perhaps skip when there is a method resolution failure? This is often but not always a sign of another problem. | ||
if (symbolInfo.CandidateReason != CandidateReason.OverloadResolutionFailure || symbolInfo.CandidateSymbols.Length == 0) | ||
continue; | ||
methodSymbol = symbolInfo.CandidateSymbols[0] as IMethodSymbol; | ||
if (methodSymbol == null) | ||
continue; | ||
} | ||
// Analysis only applies to generic methods. | ||
if (!methodSymbol.IsGenericMethod) | ||
continue; | ||
// Scan the type parameters for one that has our target attribute. | ||
for (int i = 0; i < methodSymbol.TypeParameters.Length; ++i) | ||
{ | ||
var par = methodSymbol.TypeParameters[i]; | ||
var attr = par.GetAttributes(); | ||
if (attr.Length == 0) | ||
continue; | ||
if (!attr.Any(a => a.AttributeClass == attrType)) | ||
continue; | ||
// We've found it. Check the type argument to ensure it is of the appropriate type. | ||
var p = methodSymbol.TypeArguments[i]; | ||
if (CheckType(p, out string path, out ITypeSymbol problematicType)) | ||
continue; | ||
|
||
if (problematicType.Kind == SymbolKind.TypeParameter) | ||
{ | ||
var diagnostic = Diagnostic.Create(ShapeParameterDiagnostic.Rule, invocation.GetLocation(), problematicType.Name); | ||
context.ReportDiagnostic(diagnostic); | ||
} | ||
else | ||
{ | ||
path = path == null ? "" : " of item " + path; | ||
var diagnostic = Diagnostic.Create(ShapeDiagnostic.Rule, invocation.GetLocation(), path); | ||
context.ReportDiagnostic(diagnostic); | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to get this analyzer in our
Microsoft.ML
NuGet package, we will have to do some build authoring. Today our infrastructure only supports putting things in thebuild
,lib
andruntimes
folders. It shouldn't be much work, but someone will have to do it.Let me know if you want me to pick the NuGet authoring up, or if you want to handle it yourself (I can give you some tips if necessary).