Defense of the Chaire's 2017 Master internships

Published on

The five Master 2 students who carried out their internships this year within the Chaire Machine Learning for Big Data presented their work on Friday 29 September before an audience of about thirty people: faculty members of Télécom ParisTech, industrial partners of the Chaire, representatives of the students' home institutions, and others. The presentations were followed by talks on the research internships carried out this summer in California by the PhD students Claire Vernade (at Adobe Research) and Guillaume Papa (at Stanford). The abstracts below give an overview of the richness, variety and ambition of the work conducted within the Chaire.

Extreme multi-class classification: graphs and sparsity

Simon Amar, Master 2, Université Paris-Saclay

Bayesian Model Selection for Matrix and Tensor Factorization Models

Thanh Huy, Master 2 DataScience, Université Paris-Saclay

Matrix and tensor factorization provides a unifying view of a broad spectrum of techniques in machine learning and signal processing; topic modeling, multitask learning, transfer learning, multiple kernel learning, dimensionality reduction, matrix completion, source separation, network analysis and certain deep learning problems can all be framed as some kind of factorization problem. Thanks to this generic nature, these models have proven very successful in several application fields such as audio signal processing, text processing, bioinformatics, computer vision, social media analysis, and network traffic analysis. The performance of matrix and tensor factorization models heavily relies on the "rank" of the factorization. Estimating this rank is a challenging problem and an active research question. The main goal of this research internship is to investigate and develop new Bayesian model selection techniques for rank estimation in matrix and tensor factorization models.
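As a toy illustration of the rank-estimation problem (not the Bayesian techniques developed in the internship), the sketch below selects the rank of a noisy low-rank matrix with a BIC-style penalized criterion applied to truncated SVD approximations; the score function and parameter count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic low-rank matrix: true rank 3, observed with additive noise.
n, m, true_rank = 50, 40, 3
X = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, m))
X += 0.1 * rng.normal(size=(n, m))

def bic_score(X, r):
    """BIC-style score for a rank-r SVD approximation (illustrative criterion)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]          # best rank-r approximation
    rss = np.sum((X - Xr) ** 2)               # residual sum of squares
    n_obs = X.size
    n_params = r * (X.shape[0] + X.shape[1])  # rough parameter count of the factors
    return n_obs * np.log(rss / n_obs) + n_params * np.log(n_obs)

scores = {r: bic_score(X, r) for r in range(1, 8)}
best_rank = min(scores, key=scores.get)
print(best_rank)  # → 3
```

A fully Bayesian treatment would instead compare (approximate) marginal likelihoods across ranks, but the trade-off is the same: fit improves with rank while the penalty for extra parameters grows.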

Online anomaly detection algorithm for IoT data streams

Safa Boudabous, Master 2 Data & Knowledge, Université Paris-Saclay

Extending the current Internet to connected devices ("Things") and their virtual representations has been a growing trend for decades and has enabled many new applications.
Internet of Things developments imply that environments, cities, buildings, vehicles, clothing, portable devices and other objects carry more and more information. These devices have the ability to sense, communicate, network and produce new information. The power of the Internet of Things paradigm is its ability to provide real-time data from many distributed sources to other machines or entities for a variety of services. Machine learning plays an essential role in the IoT for handling this huge amount of real-time data [1]. To address the challenge of high-volume, high-velocity data, several stream processing engines have emerged, including Spark Streaming based on Spark [2], Storm [3], Flink [4] and Samza [5]. Moreover, innovative IoT cloudification architectures have been proposed, with ML algorithms running partly on IoT devices, gateways and dynamic virtual machines; these architectures depend on the resources of the IoT network devices. The goal of this research internship is to focus on applications of machine learning to IoT energy efficiency [6] and node localization, and to investigate which algorithms, architectures and streaming platforms are suited to such applications.
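To make the streaming setting concrete, here is a minimal sketch of a one-pass anomaly detector for a single sensor stream: it maintains a running mean and variance with Welford's algorithm and flags readings whose z-score exceeds a threshold. This is an illustrative baseline, not the method studied in the internship; a real IoT pipeline would run one such state per sensor inside an engine like Spark Streaming or Flink.

```python
class OnlineAnomalyDetector:
    """Streaming z-score detector using Welford's online mean/variance."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def update(self, x):
        """Ingest one reading; return True if it looks anomalous."""
        anomalous = False
        if self.n >= 2:
            std = (self.m2 / (self.n - 1)) ** 0.5
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford's one-pass update of mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

det = OnlineAnomalyDetector()
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 25.0, 10.1]
flags = [det.update(x) for x in stream]
print(flags)  # only the reading 25.0 is flagged
```

The detector uses O(1) memory per stream, which is the property that matters on resource-constrained IoT devices and gateways.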

Spatio-temporal series modeling and epidemic detection

Camille Jandot, Master 2 DataScience, Université Paris-Saclay

In many fields such as health monitoring, epidemiology, transport or energy, massive and complex real-world datasets are accumulated through time and space, giving rise to spatial time series. A recurrent critical task consists in forecasting outbreaks or rare events from spatial time series. Although much work has been done in this area, the problem still raises many issues, such as dealing with incomplete measurements, data acquired at different time scales, heterogeneity of time series (sensor measurements, sales, text messages, search engine queries, ...) and the inherent group structure of the data. Finally, the very large scale of the data requires adapting forecasting tools to be efficient in inference and learning. The main goal of the internship is to build upon multi-task autoregressive models and graph-based approaches to provide a general framework for outbreak/event detection in spatial time series. Approaches based on operator-valued kernel autoregressive models will be especially studied and extended, while paying attention to the general context of Markov random fields. Implementation will be based on the home-grown open-source library Operalib.
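As a minimal point of comparison for the multi-task autoregressive setting (and not the operator-valued kernel models of Operalib), the sketch below fits a plain vector autoregressive VAR(1) model, Y_t = A Y_{t-1} + noise, across a few "locations" by ridge least squares; the data, dimensions and regularization constant are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "spatial" series: 4 locations coupled by VAR(1) dynamics.
d, T = 4, 500
A_true = 0.5 * np.eye(d) + 0.1 * rng.normal(size=(d, d))  # stable transition matrix
Y = np.zeros((T, d))
for t in range(1, T):
    Y[t] = Y[t - 1] @ A_true.T + 0.05 * rng.normal(size=d)

# Ridge least-squares estimate of the transition matrix A from lagged pairs.
X_past, X_next = Y[:-1], Y[1:]
lam = 1e-3
A_hat = np.linalg.solve(X_past.T @ X_past + lam * np.eye(d), X_past.T @ X_next).T

print(np.max(np.abs(A_hat - A_true)))  # small estimation error
```

Operator-valued kernel models generalize this picture by replacing the linear map A with a function-valued predictor, which lets the autoregression capture nonlinear and structured dependencies between locations.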

Towards efficient learning of function-valued functions

Alex Lambert, Master 2 DataScience, Université Paris-Saclay, poursuit en thèse à Télécom ParisTech