 October 5 - 3:30 p.m.
            Statistical Learning with Large Numbers of Predictor Variables
            Many present-day applications of statistical learning involve large 
            numbers of predictor variables. Often that number is much larger than 
            the number of cases or observations available to train the learning 
            algorithm. In such situations, traditional methods fail. Recently, new 
            techniques based on regularization have been developed that can often 
            produce accurate learning models in these settings. This talk will 
            describe the basic principles underlying the method of regularization and 
            then focus on those methods exploiting the sparsity of the predicting 
            model. The potential merits of these methods are then explored by 
            example.
            
 October 6 - 11:00 a.m.
              
              Predictive Learning via Rule Ensembles
             General regression and classification models are constructed as 
              linear combinations of simple rules derived from the data. Each 
              rule consists of a conjunction of a small number of simple statements 
              concerning the values of individual input variables. These rule 
              ensembles are shown to produce predictive accuracy comparable to 
              the best methods. However, their principal advantage lies in interpretation. 
              Because of its simple form, each rule is easy to understand, as 
              is its influence on individual predictions, selected subsets of 
              predictions, or globally over the entire space of joint input variable 
              values. Similarly, the
              degree of relevance of the respective input variables can be assessed 
              globally, locally in different regions of the input space, or at 
              individual prediction points. Techniques are presented for automatically 
              identifying those variables that are involved in interactions with 
              other variables, the strength and degree of those interactions, 
              as well as the identities of the other variables with which they 
              interact. Graphical representations are used to visualize both main 
              and interaction effects.
              ----------------------------------------- 
            Dr. Friedman is one of the world's leading researchers in statistics 
              and data mining. He has been a Professor of Statistics at Stanford 
              University for nearly 20 years and has published on a wide range 
              of data-mining topics including nearest neighbor classification, 
              logistic regression, and high-dimensional data analysis. His 
              primary research interest is in the area of machine learning. 
            
            The Distinguished Lecture Series in Statistical Science was 
            established in 2000 and takes place annually. It consists of two lectures 
            by a prominent statistical scientist. The first lecture is intended 
            for a broad mathematical sciences audience. The series occasionally 
            takes place at a member university and is tied to any current thematic 
            program related to statistical science; in the absence of such a program 
            the speaker is chosen independently of current activity at the Institute. 
            A nominating committee of representatives from the member universities 
            solicits nominations from the Canadian statistical community and makes 
            a recommendation to the Fields Scientific Advisory Panel, which is 
            responsible for the selection of speakers. 
            