Ranking Sample - Hotel Search Results (changed to Bing Search Engine … #549

nicolehaugen · 2019-07-08T14:12:36Z

…result ranking) (#533)

Created ranking sample
removed todo
Fixed wording in ReadMe
Fixed typos
Modified RankingMetric code
Incorporated Justin's feedback
Fixed minor inconsistencies
Converted to new dataset
Changed code to download dataset since its zip is too large
fixed using statement
Removed unneeded license info for dataset
Renamed solution and minor changes

…result ranking) (#533) * Created ranking sample * removed todo * Fixed wording in ReadMe * Fixed typos * Modified RankingMetric code * Incorporated Justin's feedback * Fixed minor inconsistencies * Converted to new dataset * Changed code to download dataset since its zip is too large * fixed using statement * Removed unneeded license info for dataset * Renamed solution and minor changes

eerhardt · 2019-07-08T14:34:40Z

samples/csharp/getting-started/Ranking_Web/WebRanking/Program.cs

+            // Training the model is a process of running the chosen algorithm on the given data. To perform training you need to call the Fit() method.
+            ITransformer model = trainerPipeline.Fit(trainData);
+            IDataView transformedTrainData = model.Transform(trainData);
+;


Spurious semi-colon.

eerhardt · 2019-07-08T14:37:01Z

samples/csharp/getting-started/Ranking_Web/WebRanking/Common/ConsoleHelper.cs

+
+        // Performs evaluation with the truncation level set up to 10 search results within a query.
+        // This is a temporary workaround for this issue: https://github.com/dotnet/machinelearning/issues/2728.
+        public static void EvaluateMetrics(MLContext mlContext, IDataView scoredData, int truncationLevel)


(nit) It would be nice to have the parameter names be consistent between this method and the above method.

IDataView predictions
vs.
IDataView scoredData

eerhardt · 2019-07-08T14:39:46Z

samples/csharp/getting-started/Ranking_Web/WebRanking/Common/ConsoleHelper.cs

+            }
+
+            //  Uses reflection to set the truncation level before calling evaluate.
+            var mlAssembly = AppDomain.CurrentDomain.GetAssemblies().Where(a => a.FullName.Contains("Microsoft.ML.Data")).First();


You can just use a public type that you know is in Microsoft.ML.Data.

ex.

var mlAssembly =typeof(TextLoader).Assembly;

That way we don't happen to find other assemblies, like Microsoft.ML.DataView.

eerhardt

justinormont · 2019-07-08T16:43:43Z

samples/csharp/getting-started/Ranking_Web/README.md

+  timestamp = {Mon, 01 Jul 2013 20:31:25 +0200},
+  biburl    = {https://dblp.uni-trier.de/rec/bib/journals/corr/QinL13},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}


You'll want the dataset reference in a code block (```). Currently when the markdown is displayed as HTML, the block looses formatting, and gets displayed as a single line.

justinormont · 2019-07-08T16:48:11Z

samples/csharp/getting-started/Ranking_Web/README.md

+
+    * The features are basically extracted by us (e.g. Microsoft), and are those widely used in the research community.
+
+    In the data files, each row corresponds to a query-url pair. The first column is relevance label of the pair, the second column is query id, and the following columns are features. The larger value the relevance label has, the more relevant the query-url pair is. A query-url pair is represented by a 136-dimensional feature vector.


The extra spaces at the front of these lines create a code block, which doesn't wrap its contents.

I think you want a quote block instead:

> The datasets are machine learning data, in which queries and urls are represented by IDs. The datasets consist of feature vectors extracted from query-url pairs along with relevance judgment labels: > > * The relevance judgments are obtained from a retired labeling set of a commercial web search engine (Microsoft Bing), which take 5 values from 0 (irrelevant) to 4 (perfectly relevant). > * The features are basically extracted by us (e.g. Microsoft), and are those widely used in the research community. > > In the data files, each row corresponds to a query-url pair. The first column is relevance label of the pair, the second column is query id, and the following columns are features. The larger value the relevance label has, the more relevant the query-url pair is. A query-url pair is represented by a 136-dimensional feature vector.

Good catch - #resolved

justinormont · 2019-07-08T16:59:30Z

samples/csharp/getting-started/Ranking_Web/README.md

+## ML Task - Ranking
+As previously mentioned, this sample uses the LightGBM LambdaRank algorithm which is applied using a supervised learning technique known as [**Learning to Rank**](https://en.wikipedia.org/wiki/Learning_to_rank).  This technique requires that train\test datasets contain groups of data instances that are each labeled with their relevance score (e.g. relevance judgment label).  The label is a numerical\ordinal value, such as {0, 1, 2, 3, 4}.  The process for labeling these data instances with their relevance scores can be done manually by subject matter experts.  Or, the labels can be determined using other metrics, such as the number of clicks on a given search result. 
+
+It is expected that the dataset will have many more "Bad" relevance scores than "Perfect".  This helps to avoid converting a ranked list directly into equally sized bins of {0, 1, 2, 3, 4}.  The relevance scores are also reused so that you will have many items **per group** that are labeled 0, which means the result is "Bad".  And, only one or a few labeled 4, which means that the result is "Perfect". 


Might be nice to show a bar / pie chart of label values from this MSLR dataset. This would exemplify the expected distribution of labels for ranking.

The breakdown for this demo's MSLR dataset:

Label 0 -- 624,263

Label 1 -- 386,280

Label 2 -- 159,451

Label 3 -- 21,317

Label 4 -- 8,881

This would be nice in graph form, but just the numeric table form is slightly useful, as the user could calculate that there are 70x as many 0 "Bad", then 4 "Perfect" labels in the dataset.

I added the breakdown in text format - #resolved

justinormont · 2019-07-08T17:07:09Z

samples/csharp/getting-started/Ranking_Web/README.md

+    In the data files, each row corresponds to a query-url pair. The first column is relevance label of the pair, the second column is query id, and the following columns are features. The larger value the relevance label has, the more relevant the query-url pair is. A query-url pair is represented by a 136-dimensional feature vector.
+
+## ML Task - Ranking
+As previously mentioned, this sample uses the LightGBM LambdaRank algorithm which is applied using a supervised learning technique known as [**Learning to Rank**](https://en.wikipedia.org/wiki/Learning_to_rank).  This technique requires that train\test datasets contain groups of data instances that are each labeled with their relevance score (e.g. relevance judgment label).  The label is a numerical\ordinal value, such as {0, 1, 2, 3, 4}.  The process for labeling these data instances with their relevance scores can be done manually by subject matter experts.  Or, the labels can be determined using other metrics, such as the number of clicks on a given search result. 


There's a handful of places in the doc with uncommon backslashes: (used in code a lot; generally unused in English)

Suggested change

As previously mentioned, this sample uses the LightGBM LambdaRank algorithm which is applied using a supervised learning technique known as [**Learning to Rank**](https://en.wikipedia.org/wiki/Learning_to_rank). This technique requires that train\test datasets contain groups of data instances that are each labeled with their relevance score (e.g. relevance judgment label). The label is a numerical\ordinal value, such as {0, 1, 2, 3, 4}. The process for labeling these data instances with their relevance scores can be done manually by subject matter experts. Or, the labels can be determined using other metrics, such as the number of clicks on a given search result.

As previously mentioned, this sample uses the LightGBM LambdaRank algorithm which is applied using a supervised learning technique known as [**Learning to Rank**](https://en.wikipedia.org/wiki/Learning_to_rank). This technique requires that train/test datasets contain groups of data instances that are each labeled with their relevance score (e.g. relevance judgment label). The label is a numerical/ordinal value, such as {0, 1, 2, 3, 4}. The process for labeling these data instances with their relevance scores can be done manually by subject matter experts. Or, the labels can be determined using other metrics, such as the number of clicks on a given search result.

justinormont · 2019-07-08T17:08:12Z

samples/csharp/getting-started/Ranking_Web/README.md

+
+It is expected that the dataset will have many more "Bad" relevance scores than "Perfect".  This helps to avoid converting a ranked list directly into equally sized bins of {0, 1, 2, 3, 4}.  The relevance scores are also reused so that you will have many items **per group** that are labeled 0, which means the result is "Bad".  And, only one or a few labeled 4, which means that the result is "Perfect". 
+
+Once the train\test datasets are labeled with relevance scores, the model (e.g. ranker) can then be trained and tested using this data.  Through the model training process, the ranker learns how to score each data instance within a group based on their label value.  The resulting score of an individual data instance by itself isn't important -- instead, the scores should be compared against one another to determine the relative ordering of a group's data instances.  The higher the score a data instance has, the more relevant and more highly ranked it is within its group.      


back => forward slash:

Suggested change

Once the train\test datasets are labeled with relevance scores, the model (e.g. ranker) can then be trained and tested using this data. Through the model training process, the ranker learns how to score each data instance within a group based on their label value. The resulting score of an individual data instance by itself isn't important -- instead, the scores should be compared against one another to determine the relative ordering of a group's data instances. The higher the score a data instance has, the more relevant and more highly ranked it is within its group.

Once the train/test datasets are labeled with relevance scores, the model (e.g. ranker) can then be trained and tested using this data. Through the model training process, the ranker learns how to score each data instance within a group based on their label value. The resulting score of an individual data instance by itself isn't important -- instead, the scores should be compared against one another to determine the relative ordering of a group's data instances. The higher the score a data instance has, the more relevant and more highly ranked it is within its group.

justinormont · 2019-07-08T17:09:00Z

samples/csharp/getting-started/Ranking_Web/README.md

+
+This sample performs the following high-level steps to rank the search engine results:
+1. The model is **trained** using the train dataset with LightGBM LambdaRank.  
+2. The model is **tested** using the test dataset.  This results in a **prediction** that includes a **score** for each search engine result.  The score is used to determine the ranking relative to other results within the same query (e.g. group).  The predictions are then **evaluated** by examining metrics; specifically the [Normalized Discounted Cumulative Gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain)(NDCG).


Missing space:

Suggested change

2. The model is **tested** using the test dataset. This results in a **prediction** that includes a **score** for each search engine result. The score is used to determine the ranking relative to other results within the same query (e.g. group). The predictions are then **evaluated** by examining metrics; specifically the [Normalized Discounted Cumulative Gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain)(NDCG).

2. The model is **tested** using the test dataset. This results in a **prediction** that includes a **score** for each search engine result. The score is used to determine the ranking relative to other results within the same query (e.g. group). The predictions are then **evaluated** by examining metrics; specifically the [Normalized Discounted Cumulative Gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) (NDCG).

justinormont · 2019-07-08T17:14:22Z

samples/csharp/getting-started/Ranking_Web/README.md

+This sample performs the following high-level steps to rank the search engine results:
+1. The model is **trained** using the train dataset with LightGBM LambdaRank.  
+2. The model is **tested** using the test dataset.  This results in a **prediction** that includes a **score** for each search engine result.  The score is used to determine the ranking relative to other results within the same query (e.g. group).  The predictions are then **evaluated** by examining metrics; specifically the [Normalized Discounted Cumulative Gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain)(NDCG).
+3. The final step is to **consume** the model to perform ranking predictions for new incoming searches.


We would be better describing the full data science process of training a model. Mainly we (our sample code) should push the need of refitting on all available data when creating the final model, and that train/test or CV splits are only for getting metrics.

Quoted from: #533 (comment)

...let users experiment on the dataset by iterating using the scores on the validation set, then testing their final solution, exactly once, on the test dataset.

Generally the DS pattern for ML should look like:

Train on training set, get metrics from validation set

(iterate many times until happy with pipeline -- goto 1)

Train on found pipeline on combined train+validate set, get metrics on test set (exactly once) -- this is your final metrics

Retrain pipeline on all data train+validate+test set. Send this newly created model to production.

The final estimate of how well your model will do in production is the metrics from step (3). The final model for production, trained on all available data, is trained at step (4).

We could also demonstrate this in our code.

@justinormont - I modified the ReadMe and the code to follow the steps that you outlined above. This resulted in quite a few changes. Can you take another look to make sure that I understood everything correctly? Thanks!

justinormont · 2019-07-08T17:18:23Z

samples/csharp/getting-started/Ranking_Web/README.md

+
+## Problem
+The ability to perform ranking is a common problem faced by search engines since users expect query results to be ranked\sorted according to their relevance.  This problem extends beyond the needs of search engines to include a variety of business scenarios where personalized sorting is key to the user experience.  Here are a few specific examples:
+* Travel Agency - Provide a list of hotels with those that are most likely to be purchased\booked by the user positioned highest in the list.


Forward slash:

Suggested change

* Travel Agency - Provide a list of hotels with those that are most likely to be purchased\booked by the user positioned highest in the list.

* Travel Agency - Provide a list of hotels with those that are most likely to be purchased/booked by the user positioned highest in the list.

justinormont · 2019-07-08T17:18:40Z

samples/csharp/getting-started/Ranking_Web/README.md

+This introductory sample shows how to use ML.NET to predict the the best order to display search engine results. In the world of machine learning, this type of prediction is known as ranking.
+
+## Problem
+The ability to perform ranking is a common problem faced by search engines since users expect query results to be ranked\sorted according to their relevance.  This problem extends beyond the needs of search engines to include a variety of business scenarios where personalized sorting is key to the user experience.  Here are a few specific examples:


Forward slash:

Suggested change

The ability to perform ranking is a common problem faced by search engines since users expect query results to be ranked\sorted according to their relevance. This problem extends beyond the needs of search engines to include a variety of business scenarios where personalized sorting is key to the user experience. Here are a few specific examples:

The ability to perform ranking is a common problem faced by search engines since users expect query results to be ranked/sorted according to their relevance. This problem extends beyond the needs of search engines to include a variety of business scenarios where personalized sorting is key to the user experience. Here are a few specific examples:

justinormont · 2019-07-08T17:20:09Z

samples/csharp/getting-started/Ranking_Web/README.md

+
+* Group Id - Column that contains the group id for each data instance.  Data instances are contained in logical groupings representing all candidate results in a single query and each group has an identifier known as the group id.  In the case of the search engine dataset, search results are grouped by their corresponding query where the group id corresponds to the query id.  The input group id data type must be [key type](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.data.keydataviewtype). 
+* Label - Column that contains the relevance label of each data instance where higher values indicate higher relevance.  The input label data type must be [key type](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.data.keydataviewtype) or [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single). 
+* Features - The columns that are influential in determining the relevance\rank of a data instance.  The input feature data must be a fixed size vector of type [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single).


Forward slash:

Suggested change

* Features - The columns that are influential in determining the relevance\rank of a data instance. The input feature data must be a fixed size vector of type [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single).

* Features - The columns that are influential in determining the relevance/rank of a data instance. The input feature data must be a fixed size vector of type [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single).

justinormont

LGTM.

Minor items mentioned, and would be nice to codify in the sample the end-to-end data science/ML process: https://github.com/dotnet/machinelearning-samples/pull/549/files#r301207345

justinormont · 2019-07-15T21:28:47Z

samples/csharp/getting-started/Ranking_Web/README.md

+In the data files, each row corresponds to a query-url pair. The first column is relevance label of the pair, the second column is query id, and the following columns are features. The larger value the relevance label has, the more relevant the query-url pair is. A query-url pair is represented by a 136-dimensional feature vector.
+
+## ML Task - Ranking
+As previously mentioned, this sample uses the LightGBM LambdaRank algorithm which is applied using a supervised learning technique known as [**Learning to Rank**](https://en.wikipedia.org/wiki/Learning_to_rank).  This technique requires that train/validation/test datasets contain groups of data instances that are each labeled with their relevance score (e.g. relevance judgment label).  The label is a numerical/ordinal value, such as {0, 1, 2, 3, 4}.  The process for labeling these data instances with their relevance scores can be done manually by subject matter experts.  Or, the labels can be determined using other metrics, such as the number of clicks on a given search result. 


May want to a search/replace for ".<space><space>" with ".<space>". I think our docs generally have one space after a sentence. This markdown file has a mix of single & double.

Suggested change

As previously mentioned, this sample uses the LightGBM LambdaRank algorithm which is applied using a supervised learning technique known as [**Learning to Rank**](https://en.wikipedia.org/wiki/Learning_to_rank). This technique requires that train/validation/test datasets contain groups of data instances that are each labeled with their relevance score (e.g. relevance judgment label). The label is a numerical/ordinal value, such as {0, 1, 2, 3, 4}. The process for labeling these data instances with their relevance scores can be done manually by subject matter experts. Or, the labels can be determined using other metrics, such as the number of clicks on a given search result.

As previously mentioned, this sample uses the LightGBM LambdaRank algorithm which is applied using a supervised learning technique known as [**Learning to Rank**](https://en.wikipedia.org/wiki/Learning_to_rank). This technique requires that train/validation/test datasets contain groups of data instances that are each labeled with their relevance score (e.g. relevance judgment label). The label is a numerical/ordinal value, such as {0, 1, 2, 3, 4}. The process for labeling these data instances with their relevance scores can be done manually by subject matter experts. Or, the labels can be determined using other metrics, such as the number of clicks on a given search result.

CESARDELATORRE

👍

eerhardt reviewed Jul 8, 2019

View reviewed changes

eerhardt approved these changes Jul 8, 2019

View reviewed changes

minor fixes

a11cc88

justinormont reviewed Jul 8, 2019

View reviewed changes

justinormont approved these changes Jul 8, 2019

View reviewed changes

Justin's feedback for PR into master

3cf88cc

nicolehaugen mentioned this pull request Jul 15, 2019

Provide a way to append\concatentate multiple IDataViews dotnet/machinelearning#4005

Open

justinormont reviewed Jul 15, 2019

View reviewed changes

fixed period and spacing inconsistencies

0330fc3

CESARDELATORRE self-requested a review July 16, 2019 17:47

CESARDELATORRE approved these changes Jul 16, 2019

View reviewed changes

CESARDELATORRE merged commit f50535f into master Jul 16, 2019

CESARDELATORRE deleted the features/ranking-sample branch August 23, 2019 17:06

justinormont mentioned this pull request Dec 9, 2019

MN.Net Ranking Project #648

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ranking Sample - Hotel Search Results (changed to Bing Search Engine … #549

Ranking Sample - Hotel Search Results (changed to Bing Search Engine … #549

nicolehaugen commented Jul 8, 2019

eerhardt Jul 8, 2019

nicolehaugen Jul 8, 2019

eerhardt Jul 8, 2019

nicolehaugen Jul 8, 2019

eerhardt Jul 8, 2019

nicolehaugen Jul 8, 2019

eerhardt left a comment

justinormont Jul 8, 2019 •

edited

Loading

nicolehaugen Jul 15, 2019

justinormont Jul 8, 2019

nicolehaugen Jul 15, 2019

justinormont Jul 8, 2019

nicolehaugen Jul 15, 2019

justinormont Jul 8, 2019

nicolehaugen Jul 15, 2019

justinormont Jul 8, 2019

nicolehaugen Jul 15, 2019

justinormont Jul 8, 2019

nicolehaugen Jul 15, 2019

justinormont Jul 8, 2019 •

edited

Loading

nicolehaugen Jul 15, 2019

nicolehaugen Jul 16, 2019

justinormont Jul 8, 2019

nicolehaugen Jul 15, 2019

justinormont Jul 8, 2019

nicolehaugen Jul 15, 2019

justinormont Jul 8, 2019

nicolehaugen Jul 15, 2019

justinormont left a comment

justinormont Jul 15, 2019

nicolehaugen Jul 16, 2019

CESARDELATORRE left a comment


		* The features are basically extracted by us (e.g. Microsoft), and are those widely used in the research community.

		In the data files, each row corresponds to a query-url pair. The first column is relevance label of the pair, the second column is query id, and the following columns are features. The larger value the relevance label has, the more relevant the query-url pair is. A query-url pair is represented by a 136-dimensional feature vector.


		It is expected that the dataset will have many more "Bad" relevance scores than "Perfect". This helps to avoid converting a ranked list directly into equally sized bins of {0, 1, 2, 3, 4}. The relevance scores are also reused so that you will have many items per group that are labeled 0, which means the result is "Bad". And, only one or a few labeled 4, which means that the result is "Perfect".

		Once the train\test datasets are labeled with relevance scores, the model (e.g. ranker) can then be trained and tested using this data. Through the model training process, the ranker learns how to score each data instance within a group based on their label value. The resulting score of an individual data instance by itself isn't important -- instead, the scores should be compared against one another to determine the relative ordering of a group's data instances. The higher the score a data instance has, the more relevant and more highly ranked it is within its group.

	* Travel Agency - Provide a list of hotels with those that are most likely to be purchased\booked by the user positioned highest in the list.
	* Travel Agency - Provide a list of hotels with those that are most likely to be purchased/booked by the user positioned highest in the list.

	The ability to perform ranking is a common problem faced by search engines since users expect query results to be ranked\sorted according to their relevance. This problem extends beyond the needs of search engines to include a variety of business scenarios where personalized sorting is key to the user experience. Here are a few specific examples:
	The ability to perform ranking is a common problem faced by search engines since users expect query results to be ranked/sorted according to their relevance. This problem extends beyond the needs of search engines to include a variety of business scenarios where personalized sorting is key to the user experience. Here are a few specific examples:

	* Features - The columns that are influential in determining the relevance\rank of a data instance. The input feature data must be a fixed size vector of type [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single).
	* Features - The columns that are influential in determining the relevance/rank of a data instance. The input feature data must be a fixed size vector of type [Single](https://docs.microsoft.com/en-us/dotnet/api/system.single).

Ranking Sample - Hotel Search Results (changed to Bing Search Engine … #549

Ranking Sample - Hotel Search Results (changed to Bing Search Engine … #549

Conversation

nicolehaugen commented Jul 8, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eerhardt left a comment

Choose a reason for hiding this comment

justinormont Jul 8, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

justinormont Jul 8, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

justinormont left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CESARDELATORRE left a comment

Choose a reason for hiding this comment

justinormont Jul 8, 2019 •

edited

Loading

justinormont Jul 8, 2019 •

edited

Loading