Note that I am not at Yale any more, but I am continuing with similar work. To disseminate papers, information, etc., this old page will be maintained.

Michael Mahoney

Assistant Professor, Department of Mathematics
Research Affiliate, Department of Computer Science
Yale University



By popular demand ... we are running the MMDS Workshop on "Algorithms for Modern Massive Data Sets" again! As with the 2006 meeting, MMDS 2008 will address algorithmic, statistical, and mathematical challenges in large-scale statistical data analysis.



Click here for the 2006 Stanford/Yahoo "Workshop on Algorithms for Modern Massive Data Sets (MMDS 2006)" that I organized (with G. Golub, P. Drineas, and L-H. Lim) and that took place on June 21-24, 2006, or click here for an article in SIAM News about the meeting.

Click here for a ppt of the SIAM-SDM06 2006 tutorial, or click here for the overheads for several other talks.


Research interests

  • Design and analysis of algorithms for very large matrix, graph, and regression problems.
  • Applications to the statistical data analysis of massive scientific and Internet data.
  • Monte Carlo methods in computer science, statistical physics, and scientific computation.

I work on developing and analyzing algorithms for large matrix, graph, and regression problems, as well as applications of these tools to the statistical analysis of extremely large scientific and Internet data sets. Current work includes applying these and related algorithmic techniques to problems arising in Internet applications, and past work includes the application of these techniques to DNA single nucleotide polymorphism data. I have also worked on developing and analyzing Monte Carlo algorithms for performing useful computations on extremely large matrices, e.g., the additive-error and relative-error CUR matrix decompositions. Past research has also included work in computational statistical mechanics on the development and analysis of the TIP5P model of liquid water. Current and planned future work includes developing novel algorithmic tools and applying them to novel statistical data analysis problems.


Publications

  • Unsupervised Feature Selection for Principal Components Analysis
  • C. Boutsidis, M. W. Mahoney, and P. Drineas
    Proc. 14-th Annual SIGKDD, 61-69 (2008)
  • Statistical properties of community structure in large social and information networks
  • J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney,
    Proc. 17-th International WWW, 695-704 (2008)
    Journal version submitted for publication.
  • Learning with Spectral Kernels and Heavy-Tailed Data
  • M. W. Mahoney and H. Narayanan,
    Submitted for publication.
  • Faster Least Squares Approximation
  • P. Drineas, M. W. Mahoney, S. Muthukrishnan, and T. Sarlos,
    Technical Report, Preprint: arXiv:0710.1435 (2007) (arXiv),
    Journal version submitted for publication.
  • Relative-Error CUR Matrix Decompositions
  • P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
    Technical Report, Preprint: arXiv:0708.3696 (2007) (arXiv),
  • Sampling Algorithms and Coresets for Lp Regression
  • A. Dasgupta, P. Drineas, B. Harb, R. Kumar, and M. W. Mahoney,
    Technical Report, Preprint: arXiv:0707.1714 (2007) (arXiv),
    Proc. 19-th Annual SODA, 932-941 (2008) (ps, pdf).
    Journal version submitted for publication.
  • Feature Selection Methods for Text Classification
  • A. Dasgupta, P. Drineas, B. Harb, V. Josifovski, and M. W. Mahoney,
    Proc. 13-th Annual SIGKDD, 230-239 (2007) (ps, pdf).
  • PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations
  • P. Paschou, E. Ziv, E. G. Burchard, S. Choudhry, W. Rodriguez-Cintron, M. W. Mahoney, and P. Drineas,
    PLOS Genetics, 3, 1672-1686 (2007) (ps, pdf).
  • Intra- and interpopulation genotype reconstruction from tagging SNPs,
  • P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, and P. Drineas,
    Genome Research, 17(1), 96-107 (2007) (ps, pdf).
  • Bridging the Gap Between Numerical Linear Algebra, Theoretical Computer Science, and Data Applications
  • G. H. Golub, M. W. Mahoney, P. Drineas, and L.-H. Lim,
    SIAM News 39:8 October 2006 (ps, pdf).
  • Randomized Algorithms for Matrices and Massive Data Sets,
  • P. Drineas and M. W. Mahoney,
    Proc. 32-nd Annual VLDB, 1269 (2006) (ps, pdf).
  • Subspace Sampling and Relative-Error Matrix Approximation: Column-Row-Based Methods,
  • P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
    Proc. 14-th Annual ESA, 304-314 (2006) (ps, pdf).
  • Subspace Sampling and Relative-Error Matrix Approximation: Column-Based Methods,
  • P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
    Proc. 10-th Annual RANDOM, 316-326 (2006) (ps, pdf).
  • Tensor-CUR Decompositions For Tensor-Based Data,
  • M. W. Mahoney, M. Maggioni, and P. Drineas,
    Proc. 12-th Annual SIGKDD, 327-336 (2006) (ps, pdf),
  • Polynomial Time Algorithm for Column-Row-Based Relative-Error Low-Rank Matrix Approximation,
  • P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
    Technical Report, DIMACS TR 2006-04 March 2006 (ps, pdf).
  • Sampling Algorithms for L2 Regression and Applications,
  • P. Drineas, M. W. Mahoney, and S. Muthukrishnan,
    Proc. 17-th Annual SODA, 1127-1136 (2006) (ps, pdf).
  • A Randomized Algorithm for a Tensor-Based Generalization of the Singular Value Decomposition,
  • P. Drineas and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1327, June 2005 (ps, pdf),
    Linear Algebra and its Applications, 420, 553-571 (2007) (ps, pdf).
  • On the Nystrom Method for Approximating a Gram Matrix for Improved Kernel-Based Learning,
  • P. Drineas and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1319, April 2005 (ps, pdf),
    Proc. 18-th Annual COLT, 323-337 (2005) (ps, pdf),
    J. Machine Learning Research, 6, 2153-2175 (2005). (ps, pdf).
  • Sampling Sub-problems of Heterogeneous Max-Cut Problems and Approximation Algorithms,
  • P. Drineas, R. Kannan, and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1283, April 2004 (ps, pdf),
    Proc. 22-nd Annual STACS, 57-68 (2005) (ps, pdf),
    Accepted for publication, Random Structures and Algorithms.
  • Fast Monte Carlo Algorithms for Matrices III: Computing an Efficient Approximate Decomposition of a Matrix,
  • P. Drineas, R. Kannan, and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1271, February 2004 (ps, pdf),
    SIAM J. Computing, 36, 184-206 (2006) (ps, pdf).
  • Fast Monte Carlo Algorithms for Matrices II: Computing Low-Rank Approximations to a Matrix,
  • P. Drineas, R. Kannan, and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1270, February 2004 (ps, pdf),
    SIAM J. Computing, 36, 158-183 (2006) (ps, pdf).
  • Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication,
  • P. Drineas, R. Kannan, and M. W. Mahoney,
    Technical Report, YALEU/DCS/TR-1269, February 2004 (ps, pdf),
    SIAM J. Computing, 36, 132-157 (2006) (ps, pdf).
  • Rapid Mixing of Several Markov Chains for a Hard-Core Model,
  • R. Kannan, M. W. Mahoney, and R. Montenegro,
    Proc. 14-th Annual ISAAC, 663-675 (2003) (ps, pdf).
  • Quantum, intramolecular flexibility, and polarizability effects on the reproduction of the density anomaly of liquid water by simple potential functions,
  • M. W. Mahoney and W. L. Jorgensen,
    J. Chem. Phys., 115, 10758-10768 (2001) (ps, pdf).
  • Rapid estimation of electronic degrees of freedom in Monte Carlo calculations for polarizable models of liquid water,
  • M. W. Mahoney and W. L. Jorgensen,
    J. Chem. Phys., 114, 9337-9349 (2001) (ps, pdf).
  • Diffusion constant of the TIP5P model of liquid water,
  • M. W. Mahoney and W. L. Jorgensen,
    J. Chem. Phys., 114, 363-366 (2001) (ps, pdf).
  • A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions,
  • M. W. Mahoney and W. L. Jorgensen,
    J. Chem. Phys., 112, 8910-8922 (2000) (ps, pdf).
  • Repression and activation of promoter-bound RNA polymerase activity by Gal repressor,
  • H. E. Choy, R. R. Hanger, T. Aki, M. Mahoney, K. Murakami, A. Ishihama, and S. Adhya,
    J. Mol. Biol. 272: 293-300, 1997 (ps, pdf).
  • Discrete representations of the protein C-alpha chain,
  • X. F. de la Cruz, M. W. Mahoney, and B. K. Lee,
    Fold. & Des. 2: 223-234, 1997 (ps, pdf).