# Publications

## Selected Recent Publications

### 2022

Oguz Selvitopi, Saliya Ekanayake, Giulia Guidi, Muaaz G. Awan, Georgios A. Pavlopoulos, Ariful Azad, Nikos Kyrpides, Leonid Oliker, Katherine Yelick, and Aydın Buluç. "Extreme-Scale Many-against-Many Protein Similarity Search." In *2022 SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (SC)*, pp. 1-12. IEEE Computer Society, 2022. [ IEEE ] (ACM Gordon Bell Prize Finalist)

Yuxin Chen, Benjamin Brock, Serban Porumbescu, Aydın Buluç, Katherine Yelick, and John D. Owens. "Scalable Irregular Parallelism with GPUs: Getting CPUs Out of the Way." In *2022 SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (SC)*, pp. 708-723. IEEE Computer Society, 2022. [ IEEE ]

Can Kızılkale, Farid Rashidi Mehrabadi, Erfan Sadeqi Azer, Eva Pérez-Guijarro, Kerrie L. Marie, Maxwell P. Lee, Chi-Ping Day, Glenn Merlino, Funda Ergün, Aydın Buluç, S. Cenk Sahinalp, and Salem Malikić. Fast intratumor heterogeneity inference from single-cell sequencing data. *Nature Computational Science*, *2*(9), pp. 577-583, 2022. [ DOI ]

Giulia Guidi*, Gabriel Raulet*, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, Aydın Buluç, "Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly", In Proceedings of the International Conference on Parallel Processing (ICPP), 2022 [ arXiv | video ] (*: joint first authors)

Srđan Milaković, Oguz Selvitopi, Israt Nisa, Zoran Budimlić, Aydın Buluç, "Parallel Algorithms for Masked Sparse Matrix-Matrix Products", In Proceedings of the International Conference on Parallel Processing (ICPP), 2022 [ arXiv | video ]

Yuxin Chen, Benjamin Brock, Serban Porumbescu, Aydın Buluç, Katherine Yelick, John Owens, "Atos: A Task-Parallel GPU Scheduler for Graph Analytics", In Proceedings of the International Conference on Parallel Processing (ICPP), 2022 [ arXiv | video ]

Vivek Bharadwaj, Aydin Buluç, James Demmel, "Distributed Memory Sparse Kernels for Machine Learning," IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2022 [arXiv]

Ariful Azad, Oguz Selvitopi, Md Taufique Hussain, John Gilbert, and Aydin Buluç. Combinatorial BLAS 2.0: Scaling combinatorial algorithms on distributed-memory systems. *IEEE Transactions on Parallel and Distributed Systems*, vol. 33, no. 4, pp. 989-1001, 2022. [ bib | DOI | arXiv ]

Carl Yang, Aydin Buluç, and John D. Owens. GraphBLAST: A high-performance linear algebra-based graph framework on the GPU. *ACM Transactions on Mathematical Software (TOMS)*, vol. 48, no. 1, pp. 1-51, 2022. [ bib | arXiv | DOI (open access) ]

### 2021

Muaaz Awan, Steven Hofmeyr, Rob Egan, Nan Ding, Aydin Buluç, Jack Deslippe, Leonid Oliker, and Katherine Yelick. Accelerating large scale de novo metagenome assembly using GPUs. In *International Conference for High Performance Computing, Networking, Storage and Analysis*, SC'21, 2021. [ bib | DOI (open access) ]

Oguz Selvitopi, Benjamin Brock, Israt Nisa, Alok Tripathy, Katherine Yelick, Aydın Buluç. Distributed-Memory Parallel Algorithms for Sparse Times Tall-Skinny-Dense Matrix Multiplication. ICS’21. [ DOI ]

Prashant Pandey, Brian Wheatman, Helen Xu, Aydin Buluç. Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs. SIGMOD 2021. [ bib | code | pdf ]

Giulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine Yelick, Aydin Buluç. Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly. In *Proceedings of the IPDPS*, 2021. [ code | arXiv ]

Israt Nisa, Prashant Pandey, Marquita Ellis, Leonid Oliker, Aydin Buluç, Katherine Yelick. Distributed-Memory k-mer Counting on GPUs. *Proceedings of the IPDPS*, 2021. [ pdf ]

Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad. Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale. *Proceedings of the IPDPS*, 2021. [ arXiv ]

Giulia Guidi, Marquita Ellis, Aydın Buluç, Katherine Yelick, David Culler. 10 years later: cloud computing is closing the performance gap. *Companion of the ACM/SPEC International Conference on Performance Engineering (ICPE Companion)*, 2021. [ arXiv | video ]

Seher Acer, Ariful Azad, Erik Boman, Aydin Buluç, Karen Devine, Nitin Gawande, Sayan Ghosh, Mahantesh Halappanavar, Arif Khan, Ananth Kalyanaraman, Marco Minutoli, Alex Pothen, Sivasankaran Rajamanickam, Oguz Selvitopi, Nathan Tallent, and Antonino Tumeo. ExaGraph: Graph and combinatorial methods for enabling exascale applications. *International Journal of High Performance Computing Applications (IJHPCA)*, 2021. [ DOI ]

### 2020

Alok Tripathy, Katherine Yelick, and Aydin Buluç. Reducing communication in graph neural network training. In *Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis* (SC), 2020. [ bib | code | arXiv | video ]

Oguz Selvitopi*, Saliya Ekanayake*, Giulia Guidi, Georgios Pavlopoulos, Ariful Azad, and Aydin Buluç. Distributed many-to-many protein sequence alignment using sparse matrices. In *Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis* (SC), 2020. [ bib | code | .pdf | video ] (*: joint first authors)

Ariful Azad, Aydin Buluç, Xiaoye S Li, Xinliang Wang, and Johannes Langguth. A distributed-memory algorithm for computing a heavy-weight perfect matching on bipartite graphs. *SIAM Journal on Scientific Computing*, 42(4):C143-C168, 2020. [ bib | arXiv | .pdf ]

Yu-Hang Tang, Oguz Selvitopi, Doru Popovici, and Aydin Buluç. A high-throughput solver for marginalized graph kernels on GPU. In *Proceedings of the IPDPS*, 2020. [ bib | arXiv | .pdf | video ]

Oguz Selvitopi, Md Taufique Hussain, Ariful Azad, and Aydin Buluç. Optimizing high performance Markov clustering for pre-exascale architectures. In *Proceedings of the IPDPS*, 2020. [ bib | code | arXiv | .pdf | video ]

Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D. Santambrogio, Steven Hofmeyr, Aydin Buluç, Leonid Oliker, and Katherine Yelick. LOGAN: High-performance GPU-based X-drop long-read alignment. In *Proceedings of the IPDPS*, 2020. [ bib | code | arXiv | .pdf | video ]

Nicolas Swenson, Aditi S. Krishnapriyan, Aydin Buluç, Dmitriy Morozov, Katherine A. Yelick. PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction. Learning Meaningful Representations of Life (LMRL) workshop at NeurIPS, 2020. [ arXiv | video ]

Yongzhe Zhang, Ariful Azad, and Aydin Buluç. Parallel algorithms for finding connected components using linear algebra. *Journal of Parallel and Distributed Computing*, 2020. [ bib | DOI | .pdf ]

Katherine Yelick, Aydin Buluç, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, et al. The parallelism motifs of genomic data analysis. *Philosophical Transactions of the Royal Society A*, 378(2166):20190394, 2020. [ bib | DOI | arXiv ]

### 2019

Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, and Aydin Buluç. Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors. *Parallel Computing*, 2019. [ DOI | code | .pdf ]

Ben Brock, Aydin Buluç, and Katherine Yelick. BCL: A cross-platform distributed data structures library. In *48th International Conference on Parallel Processing (ICPP)*, Kyoto, Japan, 2019. [ bib | code | arXiv | .pdf ]

Marquita Ellis, Giulia Guidi, Aydin Buluç, Leonid Oliker, and Katherine Yelick. diBELLA: Distributed long read to long read alignment. In *48th International Conference on Parallel Processing (ICPP)*, Kyoto, Japan, 2019. [ bib | .pdf ]

Ariful Azad and Aydin Buluç. LACC: a linear-algebraic algorithm for finding connected components in distributed memory. In *Proceedings of the IPDPS*, Rio de Janeiro, Brazil, 2019. [ bib | code | .pdf ]

Giulia Guidi, Marquita Ellis, Daniel Rokhsar, Katherine Yelick, and Aydin Buluç. BELLA: Berkeley efficient long-read to long-read aligner and overlapper. *bioRxiv preprint*, page 464420, 2019 (v4). [ bib | code | bioRxiv | .pdf ]

### 2018

Evangelos Georganas, Rob Egan, Steven Hofmeyr, Eugene Goltsman, Bill Arndt, Andrew Tritt, Aydin Buluç, Leonid Oliker, and Katherine Yelick. Extreme Scale De Novo Metagenome Assembly. In *Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'18)*, 2018. **Best Paper Nominee.** [ arXiv ]

Amir Gholami, Ariful Azad, Peter Jin, Kurt Keutzer, and Aydin Buluç. Integrated model, batch, and domain parallelism in training neural networks. In *SPAA'18: 30th ACM Symposium on Parallelism in Algorithms and Architectures*, 2018. [ bib | arXiv | .pdf ]

Carl Yang, Aydin Buluç, and John D Owens. Implementing push-pull efficiently in GraphBLAS. In *47th International Conference on Parallel Processing (ICPP)*, 2018. [ bib | code | arXiv | .pdf ]

Carl Yang, Aydin Buluç, and John D Owens. Design principles for sparse matrix multiplication on the GPU. In *24th International European Conference on Parallel and Distributed Computing (Euro-Par)*, 2018. **Distinguished Paper.** [ bib | code | arXiv | .pdf ]

Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, and Aydin Buluç. High-performance sparse matrix-matrix products on Intel KNL and multicore architectures. In *47th International Conference on Parallel Processing Workshops (ICPPW)*, 2018. [ bib | code | arXiv | .pdf ]

Ariful Azad, Aydin Buluç, Xiaoye S Li, Xinliang Wang, and Johannes Langguth. A distributed-memory approximation algorithm for maximum weight perfect bipartite matching. *arXiv preprint arXiv:1801.09809*, 2018. [ bib | arXiv ]

Ariful Azad, Georgios A. Pavlopoulos, Christos A. Ouzounis, Nikos C. Kyrpides, and Aydin Buluç. HipMCL: A high-performance parallel implementation of the Markov clustering algorithm for large-scale networks. *Nucleic Acids Research (NAR)*, 2018. [ bib | DOI | code ]

Penporn Koanantakool, Alnur Ali, Ariful Azad, Aydin Buluç, Dmitriy Morozov, Sang-Yun Oh, Leonid Oliker, and Katherine Yelick. Communication-avoiding optimization methods for distributed massive-scale sparse inverse covariance estimation. In *The 21st International Conference on Artificial Intelligence and Statistics (AISTATS)*, 2018. [ bib | code | supplementary | .pdf ]

### 2017

Yang You, Aydin Buluç, and James Demmel. Scaling deep learning on GPU and Knights Landing clusters. In *Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17)*, 2017. [ bib | DOI | code | .pdf ]

Ariful Azad and Aydin Buluç. A work-efficient parallel sparse matrix-sparse vector multiplication algorithm. In *Proceedings of the IPDPS*, 2017. [ bib | code | .pdf ]

Aydin Buluç, Timothy Mattson, Scott McMillan, Jose Moreira, and Carl Yang. Design of the GraphBLAS API for C. In *IEEE Workshop on Graph Algorithm Building Blocks, IPDPSW*, 2017. [ bib | .pdf ]

Ariful Azad, Mathias Jacquelin, Aydin Buluç, and Esmond G. Ng. The reverse Cuthill-McKee algorithm in distributed-memory. In *Proceedings of the IPDPS*, 2017. [ bib | code | .pdf ]

Marquita Ellis, Evangelos Georganas, Rob Egan, Steven Hofmeyr, Aydin Buluç, Brandon Cook, Leonid Oliker, and Katherine Yelick. Performance characterization of de novo genome assembly on leading parallel systems. In *Europar - International European Conference on Parallel and Distributed Computing*, 2017. [ bib | .pdf ]

Ariful Azad, Aydin Buluç, and Alex Pothen. Computing maximum cardinality matchings in parallel on bipartite graphs via tree-grafting. *IEEE Transactions on Parallel and Distributed Systems (TPDS)*, 28(1):44-59, 2017. [ bib | DOI | code | .pdf ]

### 2016

Ariful Azad, Grey Ballard, Aydin Buluç, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, and Samuel Williams. Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication. *SIAM Journal on Scientific Computing (SISC)*, 38(6):C624-C651, 2016. [ bib | DOI | code | arXiv | .pdf ]

Ariful Azad and Aydin Buluç. A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs. *Parallel Computing*, 2016. [ bib | DOI | code ]

Ariful Azad and Aydin Buluç. Distributed-memory algorithms for maximum cardinality matching in bipartite graphs. In *Proceedings of the IPDPS*. IEEE, 2016. [ bib | code | .pdf ]

Penporn Koanantakool, Ariful Azad, Aydin Buluç, Dmitriy Morozov, Sang-Yun Oh, Leonid Oliker, and Katherine Yelick. Communication-avoiding parallel sparse-dense matrix-matrix multiplication. In *Proceedings of the IPDPS*, 2016. [ bib | .pdf ]

Jeremy Kepner, Peter Aaltonen, David Bader, Aydin Buluç, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, José Moreira, John Owens, Carl Yang, Marcin Zalewski, and Timothy Mattson. Mathematical foundations of the GraphBLAS. In *IEEE High Performance Extreme Computing (HPEC)*, 2016. [ bib | .pdf ]

Aydin Buluç, Henning Meyerhenke, Ilya Safro, Peter Sanders, and Christian Schulz. Recent advances in graph partitioning. In *Algorithm Engineering - Selected Results and Surveys*, volume 9220. Lecture Notes in Computer Science, 2016. [ bib | DOI | arXiv ]

### 2015

Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Steven Hofmeyr, Chaitanya Aluru, Rob Egan, Leonid Oliker, Daniel Rokhsar, and Katherine Yelick. HipMer: An extreme-scale de novo genome assembler. In *International Conference for High Performance Computing, Networking, Storage and Analysis (SC'15)*, 2015. [ bib | .pdf ]

Veronika Strnadova-Neeley, Aydin Buluç, Joseph Gonzalez, Jarrod Chapman, John Gilbert, and Leonid Oliker. Efficient data reduction for large-scale genetic mapping. In *ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB)*, 2015. [ bib | .pdf ]

Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, and Katherine Yelick. meraligner: A fully parallel sequence aligner. In *Proceedings of the IPDPS*, 2015. [ bib | .pdf ]

Ariful Azad, Aydin Buluç, and John R Gilbert. Parallel triangle counting and enumeration using matrix algebra. In *Proceedings of the IPDPSW, Workshop on Graph Algorithm Building Blocks (GABB)*, pages 804-811, 2015. [ bib | DOI | .pdf ]

Jarrod Chapman, Martin Mascher, Aydin Buluç, Kerrie Barry, Evangelos Georganas, Adam Session, Veronika Strnadova, Jerry Jenkins, Sunish Sehgal, Leonid Oliker, Jeremy Schmutz, Katherine Yelick, Uwe Scholz, Robbie Waugh, Jesse Poland, Gary Muehlbauer, Nils Stein, and Daniel Rokhsar. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome. *Genome Biology*, 16(26), 2015. [ bib | DOI | .pdf ]

Adam Lugowski, Shoaib Kamil, Aydin Buluç, Samuel Williams, Erika Duriakova, Leonid Oliker, Armando Fox, and John Gilbert. Parallel processing of filtered queries in attributed semantic graphs. *Journal of Parallel and Distributed Computing (JPDC)*, 79-80:115-131, 2015. [ bib | DOI | .pdf ]

J. Kepner, D. Bader, A. Buluç, J. Gilbert, T. Mattson, and H. Meyerhenke. Graphs, matrices, and the GraphBLAS: Seven good reasons. In *The International Conference on Computational Science (ICCS)*, 2015. [ bib | DOI | arXiv ]

### 2014

Evangelos Georganas, Aydin Buluç, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, and Katherine Yelick. Parallel de Bruijn graph construction and traversal for de novo genome assembly. In *Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'14)*, 2014. [ bib | .pdf ]

V. Strnadova, A. Buluç, J. Gonzalez, S. Jegelka, J. Chapman, J. Gilbert, D. Rokhsar, and L. Oliker. Efficient and accurate clustering for large-scale genetic mapping. In *The IEEE International Conference on Bioinformatics and Biomedicine (BIBM'14)*, 2014. [ bib | .pdf ]

H. Metin Aktulga, Aydin Buluç, Samuel Williams, and Chao Yang. Optimizing sparse matrix-multiple vectors multiplication for nuclear configuration interaction calculations. In *Proceedings of the IPDPS*. IEEE Computer Society, 2014. [ bib | .pdf ]

### 2013

T. Mattson, D. Bader, J. Berry, A. Buluç, J. Dongarra, C. Faloutsos, J. Feo, J. Gilbert, J. Gonzalez, B. Hendrickson, J. Kepner, C. Leiserson, A. Lumsdaine, D. Padua, S. Poole, S. Reinhardt, M. Stonebraker, S. Wallach, and A. Yoo. Standards for graph algorithm primitives. In *High Performance Extreme Computing Conference (HPEC '13)*. IEEE, 2013. (position paper). [ bib | .pdf ]

Grey Ballard, Aydin Buluç, James Demmel, Laura Grigori, Benjamin Lipshitz, Oded Schwartz, and Sivan Toledo. Communication optimal parallel multiplication of sparse random matrices. In *SPAA 2013: The 25th ACM Symposium on Parallelism in Algorithms and Architectures*, Montreal, Canada, 2013. [ bib | DOI | .pdf ]

Aydin Buluç, Erika Duriakova, Armando Fox, John Gilbert, Shoaib Kamil, Adam Lugowski, Leonid Oliker, and Samuel Williams. High-productivity and high-performance analysis of filtered semantic graphs. In *Proceedings of the IPDPS*. IEEE Computer Society, 2013. [ bib | DOI | .pdf ]

Edgar Solomonik, Aydin Buluç, and James Demmel. Minimizing communication in all-pairs shortest paths. In *Proceedings of the IPDPS*. IEEE Computer Society, 2013. [ bib | DOI | .pdf ]

Scott Beamer, Aydin Buluç, Krste Asanović, and David Patterson. Distributed memory breadth-first search revisited: Enabling bottom-up search. In *Workshop on Multithreaded Architectures and Applications (MTAAP), in conjunction with IPDPS*. IEEE Computer Society, 2013. [ bib | DOI | .pdf ]

Aydin Buluç and Kamesh Madduri. Graph partitioning for scalable distributed graph computations. In *Graph Partitioning and Graph Clustering (Proc. 10th DIMACS Implementation Challenge)*, volume 588 of *Contemporary Mathematics*, pages 83-101. AMS, 2013. [ bib | .pdf ]

### 2012

Adam Lugowski, David Alber, Aydin Buluç, John Gilbert, Steve Reinhardt, Yun Teng, and Andrew Waranis. A flexible open-source toolbox for scalable complex graph analysis. In SIAM Conference on Data Mining (SDM), 2012. [ bib | .pdf ]

Aydin Buluç and John R. Gilbert. Parallel sparse matrix-matrix multiplication and indexing: Implementation and experiments. *SIAM Journal on Scientific Computing (SISC)*, 34(4):170-191, 2012. [ bib | DOI | arXiv | .pdf ]

### 2011

Aydin Buluç and Kamesh Madduri. Parallel breadth-first search on distributed memory systems. In *International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11)*, 2011. [ bib | DOI | arXiv | .pdf ]

Aydin Buluç, Samuel Williams, Leonid Oliker, and James Demmel. Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication. In *Proceedings of the IPDPS*, 2011. [ bib | .pdf ]

Aydin Buluç and John R. Gilbert. The Combinatorial BLAS: Design, implementation, and applications. *The International Journal of High Performance Computing Applications*, 25(4):496-509, 2011. [ bib | DOI | .pdf ]