The Chair's 2016 Master's internship defenses


The five Master 2 students who completed their internships this year within the Machine Learning for Big Data Chair presented their work on Thursday 27 October before an audience of around forty people: faculty members of Télécom ParisTech, the Chair's industrial partners, representatives of the students' home institutions, and others. Four of them will continue their work with the Chair as PhD students at Télécom ParisTech. The presentations were followed by a talk on the mission carried out this summer at New York University by Nicolas Goix on the development of the Local Outlier Factor (LOF), a new algorithm for Scikit-learn. The summaries of the young researchers' presentations below give an idea of the richness, variety and ambition of the work conducted within the Chair.
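
On that last point, here is a minimal usage sketch of the Local Outlier Factor estimator as it is exposed in scikit-learn (sklearn.neighbors.LocalOutlierFactor). The data and parameter values are purely illustrative.

```python
# Minimal usage sketch of the Local Outlier Factor estimator in scikit-learn.
# The simulated data and the n_neighbors value are illustrative choices.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X_inliers = rng.randn(200, 2)                           # dense cluster of normal points
X_outliers = rng.uniform(low=-6, high=6, size=(10, 2))  # scattered anomalies
X = np.vstack([X_inliers, X_outliers])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # +1 for inliers, -1 for detected outliers
scores = lof.negative_outlier_factor_   # the lower the score, the more abnormal the point

print("points flagged as outliers:", np.sum(labels == -1))
```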

Tighter bounds and multi-armed bandit formulation for extreme bandits

Mastane Achab, research Master 2 in MVA (Mathématiques / Vision / Apprentissage), ENS Cachan, continuing with a PhD at Télécom ParisTech

We consider the problem of sequentially allocating resources to different sources in order to detect extreme values. Sequential allocation problems are well studied in bandit theory, but the best source is usually defined as the one with the maximum mean reward. Here, we are interested in detecting the source providing the most extreme values. In this work we significantly improve the upper bound for the ExtremeHunter algorithm (Carpentier and Valko, 2014) and we propose a lower bound for a nearly optimal strategy. Finally, we show that this problem can be cast and solved as a classic multi-armed bandit problem.
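
To make the objective concrete, the toy sketch below rewards the learner for the largest value it has observed rather than for its average reward, on simulated heavy-tailed sources. The Pareto tail indices and the naive "max-greedy" allocation rule are illustrative only; this is not the ExtremeHunter algorithm.

```python
# Toy illustration of the extreme bandit objective: the learner is judged on the
# largest value observed so far, not on its mean reward. The "max-greedy" rule
# below is a naive baseline for illustration, not ExtremeHunter.
import numpy as np

rng = np.random.RandomState(0)
n_arms, horizon = 3, 2000
tail_indices = np.array([2.5, 1.8, 1.3])    # heavier tail => more extreme values

def pull(arm):
    # Pareto-distributed reward with an arm-dependent tail index.
    return (1.0 - rng.rand()) ** (-1.0 / tail_indices[arm])

best_seen = np.zeros(n_arms)                # running maximum per arm
for t in range(horizon):
    if t < n_arms:
        arm = t                             # play each arm once
    else:
        arm = int(np.argmax(best_seen))     # then exploit the arm with the largest maximum so far
    best_seen[arm] = max(best_seen[arm], pull(arm))

print("largest value found per arm:", best_seen)
print("overall extreme value collected:", best_seen.max())
```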

Differential inclusion for sparse linear regression

Evgenii Chzen, research Master 2 in Mathématiques et Applications, Université Paris-Est, continuing with a PhD at Université Paris-Est Marne-la-Vallée

The high-dimensional setting of linear regression has attracted tremendous attention over the past decade. A popular approach penalizes the loss function so as to induce sparsity. One of the main drawbacks of any convex regularization method is the bias of the resulting estimators; in particular, the Lasso estimator underestimates the amplitude of the large coefficients. In this talk we describe a differential inclusion approach to the linear regression problem, which may recover an oracle estimator of the unknown vector of coefficients, and briefly cover results developed in previous works. We finally study the performance of this approach on an orthogonal design matrix and compare it with hard/soft thresholding rules.
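
On an orthogonal design the Lasso reduces to soft thresholding of the least-squares coefficients, which makes its bias explicit. The sketch below, with illustrative signal, noise level and threshold, contrasts the soft and hard thresholding rules mentioned above.

```python
# Soft vs hard thresholding on an orthogonal design, where the Lasso reduces to
# soft thresholding of the ordinary least-squares coefficients.
# The sparse signal, noise level and threshold are illustrative choices.
import numpy as np

rng = np.random.RandomState(0)
n = 64
X = np.linalg.qr(rng.randn(n, n))[0]        # orthogonal design: X.T @ X = I
beta = np.zeros(n)
beta[:5] = [6.0, -5.0, 4.0, 3.0, -2.5]      # sparse vector of true coefficients
y = X @ beta + 0.5 * rng.randn(n)

z = X.T @ y                                 # OLS coefficients under orthogonality
lam = 1.0                                   # threshold level (illustrative)

soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)   # Lasso / soft thresholding
hard = z * (np.abs(z) > lam)                           # hard thresholding

# Soft thresholding shrinks the large coefficients by lam (the bias of the Lasso),
# while hard thresholding leaves them untouched.
print("true coefficients:", beta[:5])
print("soft thresholding:", np.round(soft[:5], 2))
print("hard thresholding:", np.round(hard[:5], 2))
```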

Introducing Bias into Stochastic Gradient Descent

Pierre Laforgue, research Master 2 in MASH (Mathématiques, Apprentissage et Sciences Humaines), Université Paris-Dauphine, continuing with a PhD at Télécom ParisTech

Many machine learning problems, such as logistic regression, penalized regressions or Support Vector Machines, can be seen as M-estimation problems. One of the most popular ways to tackle these problems is the gradient descent algorithm. Very often, stochastic versions of this algorithm use unbiased estimators of the actual gradient. This approach is motivated by many well-documented results in the literature, providing both asymptotic convergence and non-asymptotic bounds. However, this unbiasedness constraint could be relaxed in order to reduce the variance of the new estimator, and possibly make the estimation error decrease. The goal of this research internship was to assess the validity of such an approach, through both theoretical considerations and experiments.
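
As a toy illustration of this trade-off, the sketch below compares a standard unbiased stochastic gradient with a clipped, hence biased but lower-variance, gradient on a least-squares problem. The clipping rule, the simulated data and the step size are illustrative choices, not the estimator actually studied during the internship.

```python
# Toy contrast between an unbiased stochastic gradient and a clipped (biased,
# lower-variance) stochastic gradient on a least-squares problem.
import numpy as np

rng = np.random.RandomState(1)
n, d = 1000, 10
X = rng.randn(n, d)
theta_star = rng.randn(d)
y = X @ theta_star + 0.1 * rng.randn(n)

def sgd(clip=None, n_steps=5000, step=0.01):
    theta = np.zeros(d)
    for _ in range(n_steps):
        i = rng.randint(n)
        grad = (X[i] @ theta - y[i]) * X[i]    # unbiased estimate of the full gradient
        if clip is not None:
            grad = np.clip(grad, -clip, clip)  # clipping biases the estimate but caps its variance
        theta -= step * grad
    return np.linalg.norm(theta - theta_star)

print("unbiased SGD, final error:", sgd())
print("clipped (biased) SGD, final error:", sgd(clip=1.0))
```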

Structured prediction of argumentative structures from User-Generated Web Discourse

Alexandre Garcia, research Master 2 in MVA (Mathématiques / Vision / Apprentissage), ENS Cachan, continuing with a PhD at Télécom ParisTech

Online debate platforms like idebate.org or convinceme.net have recently been exploited by researchers to analyze the properties of the arguments proposed by web users. Argumentation mining deals with the automatic retrieval and understanding of arguments contained in raw text and can be seen as an extension of opinion mining, where the goal is to detect the reasons that lead a speaker to adopt a position on an issue. Applications range from automatic summarization of online reviews to argument summarization for controversies. In this work, we focus on a supervised learning problem where the inputs are the sequences of sentences written by web users and the outputs are the sequences of tags describing the argumentativeness of each sentence. We take advantage of the Web Discourse Corpus (Habernal and Gurevych, 2015) to compare different structured prediction methods on the problem of argument type prediction. We show empirically that the difficult problem of argument graph prediction can, in our case, be cast as a sequence labeling problem, to which structured prediction methods such as Conditional Random Fields and their variants can be applied. We provide an analysis of the results obtained for different feature sets and of the contribution of the structured prediction approach to this problem.
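
To illustrate the sequence-labeling formulation, the sketch below trains a linear-chain CRF with the sklearn-crfsuite package on a single toy document. The feature functions and the simplified tag set ("claim", "premise", "none") are stand-ins for illustration, not the corpus annotation scheme or the features used in this work.

```python
# Sketch of the sequence-labeling formulation with a linear-chain CRF
# (sklearn-crfsuite). Features and tags are simplified, illustrative stand-ins.
import sklearn_crfsuite

def sentence_features(sentence, position, n_sentences):
    # Hand-crafted features for one sentence of a web comment (illustrative).
    return {
        "n_words": len(sentence.split()),
        "has_because": "because" in sentence.lower(),
        "is_first": position == 0,
        "is_last": position == n_sentences - 1,
    }

documents = [
    ["Smoking should be banned in parks.", "It harms children nearby.", "Thanks for reading."],
]
tags = [
    ["claim", "premise", "none"],
]

X = [[sentence_features(s, i, len(doc)) for i, s in enumerate(doc)] for doc in documents]
y = tags

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)                     # one tag sequence per document
print(crf.predict(X))
```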

Similarity Ranking and Performance Enhancement in Biometric Identification

Robin Vogel, research Master 2 in Data Sciences, Université Paris-Saclay, continuing with a PhD at Télécom ParisTech

The similarity ranking setup consists in ranking pairs of observations using a similarity function. We expect the similarity function to rank the most similar pairs at the top and the most dissimilar ones at the bottom. Each observation is associated with a class: a pair of observations is considered similar if both elements originate from the same class, and dissimilar otherwise. This problem is hence close to bipartite ranking, although it differs from that setup in that it ranks pairs of observations. The estimators considered in this problem are therefore U-statistics.
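
Here is a minimal sketch of this pairwise construction, assuming Gaussian class-conditional data and a negative-Euclidean-distance similarity for illustration: each pair is labeled by class agreement and the similarity function is evaluated through the pairwise ROC AUC.

```python
# Pairwise setup: labeled observations are turned into labeled pairs (similar if
# the two elements share a class), scored by a similarity function, and evaluated
# with the pairwise ROC AUC. Data and similarity are illustrative choices.
from itertools import combinations
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2), rng.randn(30, 2) + 3.0])   # two classes
classes = np.array([0] * 30 + [1] * 30)

pair_labels, pair_scores = [], []
for i, j in combinations(range(len(X)), 2):
    pair_labels.append(int(classes[i] == classes[j]))        # 1 = similar pair
    pair_scores.append(-np.linalg.norm(X[i] - X[j]))          # higher score = more similar

print("pairwise ROC AUC:", roc_auc_score(pair_labels, pair_scores))
```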

The internship first focused on extending guarantees derived in the bipartite ranking setting to the similarity ranking setup. These guarantees include the consistency and asymptotic normality of the ROC curve, as well as the validity of bootstrap confidence interval estimation procedures for the ROC curve. We then tackled the problem of estimating a similarity function by optimizing a convex surrogate of the Neyman-Pearson problem associated with a single point of the ROC curve, using simulated data. Other techniques could be used to derive a similarity function, such as optimizing criteria that favour the best instances, a typical approach in bipartite ranking, or applying structured prediction techniques to the learning of rankings.

The aim of the upcoming thesis will be to design statistical learning algorithms that learn similarity ranking rules by optimizing empirical performance criteria, and to derive a framework guaranteeing the generalization properties of such rules.