Luis Gravano
Alma mater: Stanford University (MS, PhD); Escuela Superior Latinoamericana de Informática (BS)
Known for: Snowball (information extraction); k-Shape algorithm
Awards: ACM SIGMOD Test of Time Award (2025); National Science Foundation CAREER Award (1998)
Scientific career
Fields: Computer science; database systems; information retrieval; web search; information extraction
Institutions: Columbia University; Google
Thesis: Querying Multiple Document Collections across the Internet (1997)
Doctoral advisor: Hector Garcia-Molina

Luis Gravano is a computer scientist and professor of computer science at Columbia University whose research spans database systems, information retrieval, web search, and information extraction. He is known for co-developing the Snowball relation-extraction system and, with his doctoral student John Paparrizos, the k-Shape algorithm for time-series clustering, which received the 2025 ACM SIGMOD Test of Time Award.[1][2]

Beginning in 2012, Gravano led the development of an automated system, deployed by the New York City Department of Health and Mental Hygiene, that detects unreported foodborne illness outbreaks by analyzing Yelp restaurant reviews. The system has identified thousands of complaints and multiple confirmed outbreaks not reported through the city's traditional channels, with results published by the Centers for Disease Control and Prevention and in the Journal of the American Medical Informatics Association.[3][4][5][6]

According to Google Scholar, his publications have received more than 18,000 citations, with an h-index of 58.[7]

Education

Gravano received his Licenciatura en Informática (equivalent to a Bachelor of Science in Computer Science) from the Escuela Superior Latinoamericana de Informática (ESLAI) in Argentina in 1991.[8] He subsequently moved to the United States for graduate studies at Stanford University, where he earned his Master of Science degree in 1994 and his Ph.D. in 1997 under the supervision of Hector Garcia-Molina.[9] His doctoral dissertation, Querying Multiple Document Collections across the Internet, addressed the problem of efficiently routing queries across distributed databases and digital libraries.[9]

Career

Columbia University

Gravano joined the Department of Computer Science at Columbia University as an assistant professor in 1997.[8] He was promoted to associate professor with tenure in 2002 and to full professor in 2013.[8]

At Columbia, he has led research on information extraction, metasearch systems, top-k query processing, and time-series analysis. He received the Distinguished Teacher Award from the Columbia Computer Science Department in 2011 and the Distinguished Faculty Teaching Award from the Columbia Engineering Alumni Association in 2012.[8]

Industry experience

Gravano has maintained connections with the technology industry throughout his academic career. He served as a Senior Research Scientist at Google in 2001 and returned as a Visiting Faculty Researcher in 2018–19.[8]

Research

Information extraction (Snowball)

In the late 1990s, Gravano and his doctoral student Eugene Agichtein developed the Snowball system, which uses a bootstrapping approach to extract structured relations from unstructured text.[10] The system begins with a small set of seed examples, such as known company-headquarters pairs, and iteratively discovers extraction patterns and new entity pairs. A central contribution was the development of confidence-estimation methods to prevent semantic drift, the phenomenon where errors compound through successive iterations.[10]
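
The bootstrapping loop described above can be illustrated with a minimal sketch. It is not the Snowball implementation: the published system works on named-entity-tagged text and weighted context vectors, whereas this sketch assumes the text has already been reduced to (organization, context, location) triples and treats each exact context string as a pattern. The function name `bootstrap`, the threshold `min_conf`, and the toy data are all hypothetical.

```python
from collections import defaultdict

def bootstrap(occurrences, seeds, min_conf=0.8, rounds=2):
    """Toy bootstrapping over (org, context, loc) triples; seeds maps org -> loc."""
    known = dict(seeds)
    for _ in range(rounds):
        # 1. A context becomes a pattern if it links at least one known pair.
        patterns = {ctx for org, ctx, loc in occurrences if known.get(org) == loc}

        # 2. Pattern confidence: share of its matches on known organizations
        #    that agree with the location already trusted for them.
        pos, neg = defaultdict(int), defaultdict(int)
        for org, ctx, loc in occurrences:
            if ctx in patterns and org in known:
                if known[org] == loc:
                    pos[ctx] += 1
                else:
                    neg[ctx] += 1
        conf = {p: pos[p] / (pos[p] + neg[p]) for p in patterns}

        # 3. Candidate-pair confidence: 1 - product of (1 - pattern confidence)
        #    over the patterns that produced the pair.
        miss = defaultdict(lambda: 1.0)
        for org, ctx, loc in occurrences:
            if ctx in conf and org not in known:
                miss[(org, loc)] *= 1.0 - conf[ctx]

        # 4. Accept only confident pairs (best location per organization),
        #    which is what limits semantic drift in this sketch.
        best = {}
        for (org, loc), m in miss.items():
            score = 1.0 - m
            if score >= min_conf and score > best.get(org, ("", 0.0))[1]:
                best[org] = (loc, score)
        for org, (loc, _) in best.items():
            known[org] = loc
    return known

# Toy data for illustration only (not a factual source).
occurrences = [
    ("Microsoft", "is headquartered in", "Redmond"),
    ("Exxon", "is headquartered in", "Irving"),
    ("Boeing", "is headquartered in", "Seattle"),
    ("Microsoft", "opened an office in", "London"),
    ("Exxon", "opened an office in", "Irving"),
    ("Intel", "opened an office in", "Dublin"),
]
seeds = {"Microsoft": "Redmond", "Exxon": "Irving"}
result = bootstrap(occurrences, seeds)
```

In this toy run the context "is headquartered in" agrees with every seed and so admits the Boeing pair, while "opened an office in" contradicts one seed, scores 0.5, and its Intel candidate is rejected rather than being allowed to contaminate later rounds.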

Metasearch and the deep web

Gravano's research on metasearching addressed the challenge of querying multiple autonomous databases through a unified interface. He contributed to the development of the SDARTS protocol, combining the SDLIP and STARTS standards, which enabled translation of queries across different database systems.[11] His work on database selection algorithms allowed metasearch systems to route queries to relevant databases based on statistical profiles of their content. Subsequent work modeled how database content summaries change over time, applying survival analysis techniques to determine optimal update schedules; the resulting paper received the IEEE ICDE Best Paper Award in 2005.[12][8]
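
As a rough illustration of database selection from statistical profiles, the sketch below ranks databases by an estimated number of matching documents. It assumes each content summary holds only a document count and per-term document frequencies, and that query terms occur independently; this is one simple scoring rule chosen for illustration, not necessarily the one used in the cited work. The names `estimated_matches`, `select_databases`, and the two example databases are hypothetical.

```python
def estimated_matches(summary, query_terms):
    """Estimate how many documents contain all query terms.

    summary: {"size": number of documents, "df": {term: document frequency}}.
    Assumes terms are independent, so the per-term fractions are multiplied.
    """
    size = summary["size"]
    estimate = float(size)
    for term in query_terms:
        estimate *= summary["df"].get(term, 0) / size
    return estimate

def select_databases(summaries, query_terms, top_n=1):
    """Return the names of the top_n databases with the highest estimate."""
    ranked = sorted(
        summaries,
        key=lambda name: estimated_matches(summaries[name], query_terms),
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical content summaries for two databases.
summaries = {
    "health_db": {"size": 1000, "df": {"cancer": 400, "therapy": 300}},
    "cs_db": {"size": 2000, "df": {"cancer": 20, "database": 900}},
}
chosen = select_databases(summaries, ["cancer", "therapy"])
```

A metasearcher built this way would forward the query only to the databases in `chosen`, which avoids contacting sources whose summaries suggest they hold no matching documents.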

Top-k query processing

Working with collaborators including Nicolás Bruno and Surajit Chaudhuri, Gravano developed algorithms for efficiently retrieving the top-k results from databases without scanning all records.[13] This work bridged information retrieval ranking techniques with relational database query optimization. A related line of research, on query optimization for text-centric tasks, received the ACM SIGMOD Best Paper Award in 2006.[14][8]

Time-series clustering (k-Shape)

In 2015, Gravano and his doctoral student John Paparrizos introduced k-Shape, an algorithm for clustering time-series data.[15] The algorithm uses a shape-based distance measure derived from normalized cross-correlation, making it invariant to phase shifts and amplitude scaling. k-Shape achieves accuracy comparable to methods based on dynamic time warping while offering significantly better computational efficiency.[15]
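
A cross-correlation-based shape distance of this kind can be sketched as follows. The sketch assumes equal-length, non-constant series, z-normalizes each one, and takes one minus the peak of the normalized cross-correlation over all shifts. It uses NumPy's direct `np.correlate` for readability instead of a faster FFT-based computation, and it covers only the distance measure, not the clustering procedure. The helper names `z_normalize` and `sbd` are hypothetical.

```python
import numpy as np

def z_normalize(x):
    """Rescale a series to zero mean and unit variance (assumes it is not constant)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def sbd(x, y):
    """Shape-based distance: 1 minus the peak normalized cross-correlation."""
    x, y = z_normalize(x), z_normalize(y)
    cc = np.correlate(x, y, mode="full")            # one value per relative shift
    ncc = cc / (np.linalg.norm(x) * np.linalg.norm(y))
    return 1.0 - ncc.max()                          # near 0 means same shape

pulse = np.array([0, 0, 1, 2, 1, 0, 0, 0])
shifted = np.array([0, 0, 0, 0, 1, 2, 1, 0])        # same pulse, two steps later
ramp = np.arange(8)                                 # a different shape

d_scaled = sbd(pulse, 3 * pulse + 5)                # rescaled and offset copy
d_shifted = sbd(pulse, shifted)
d_ramp = sbd(pulse, ramp)
```

Here the rescaled copy is at distance zero up to rounding, and the shifted pulse is closer to the original than the ramp is, which is the behavior that lets such a measure group series by shape regardless of amplitude or timing offsets.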

The paper has been cited over 1,000 times according to Google Scholar and has been adopted in domains including healthcare, finance, and the Internet of Things.[1][16] A decade after its introduction, the paper received the 2025 ACM SIGMOD Test of Time Award, which recognizes the SIGMOD paper from 10–12 years prior judged to have had the greatest impact over the intervening decade.[2] The award citation recognized the work for advancing time-series clustering through a shape-based approach grounded in cross-correlation, combining accuracy, efficiency, and broad applicability.[2][1]

Public health applications

Beginning in 2012, Gravano led a collaboration with the New York City Department of Health and Mental Hygiene to apply text mining techniques to public health surveillance. His team developed an automated classifier that analyzes Yelp restaurant reviews and identifies those likely to describe foodborne illness, transmitting flagged reviews to health department epidemiologists for follow-up investigation.[3][17]

A pilot evaluation conducted between July 2012 and March 2013 found that only about 3 percent of the foodborne illness incidents identified through online reviews had previously been reported through the city's established complaint channels, demonstrating the value of social-media surveillance as a complement to traditional reporting.[4] By 2018, the system had identified 8,523 complaints of foodborne illness in New York City and ten confirmed outbreaks that were detected solely through automated review analysis.[3][18] The findings were published in the Centers for Disease Control and Prevention's Morbidity and Mortality Weekly Report and the Journal of the American Medical Informatics Association,[4][5] and the project was covered by Consumer Reports.[6]

Subsequent work extended the approach to multilingual reviews and to additional social-media platforms including Twitter.[19]

Awards and honors

Year | Award | Organization
2025 | Test of Time Award (for k-Shape) | ACM SIGMOD[2][1]
2012 | Distinguished Faculty Teaching Award | Columbia Engineering Alumni Association[8]
2011 | Distinguished Teacher Award | Columbia Computer Science Department[8]
2006 | Best Paper Award (for "To search or to crawl?: towards a query optimizer for text-centric tasks") | ACM SIGMOD[14][8]
2005 | Best Paper Award (for "Modeling and Managing Content Changes in Text Databases") | IEEE ICDE[12][8]
2003 | Best Student Paper Award | IEEE ICDE[8]
1998 | NSF CAREER Award | National Science Foundation[8]

Selected publications

  • Agichtein, Eugene; Gravano, Luis (2000). Snowball: Extracting Relations from Large Plain-Text Collections. Proceedings of the Fifth ACM Conference on Digital Libraries. pp. 85–94. doi:10.1145/336597.336644.
  • Bruno, Nicolas; Gravano, Luis; Marian, Amélie (2002). Evaluating Top-k Queries over Web-Accessible Databases. Proceedings of the 18th IEEE International Conference on Data Engineering.
  • Paparrizos, John; Gravano, Luis (2015). k-Shape: Efficient and Accurate Clustering of Time Series. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. pp. 1855–1870. doi:10.1145/2723372.2737793.
  • Effland, Thomas; Lawson, Anna; Balter, Sharon; Devinney, Katelynn; Reddy, Vasudha; Waechter, HaeNa; Gravano, Luis; Hsu, Daniel (2018). "Discovering foodborne illness in online restaurant reviews". Journal of the American Medical Informatics Association. 25 (12): 1586–1592. doi:10.1093/jamia/ocx093. PMC 7647154. PMID 29329402.

References

  1. "John Paparrizos & his doctoral advisor, Luis Gravano receive ACM SIGMOD Test of Time Award". Ohio State University Department of Computer Science and Engineering. June 20, 2025. Retrieved May 8, 2026.
  2. "SIGMOD 2025 Test-of-Time Award". ACM SIGMOD. Retrieved May 8, 2026.
  3. "Health Department IDs 10 outbreaks of foodborne illness using Yelp reviews since 2012". ScienceDaily. January 14, 2018. Retrieved May 8, 2026.
  4. Harrison, C.; Jorder, M.; Stern, H.; Stavinsky, F.; Reddy, V.; Hanson, H.; Waechter, H.; Lowe, L.; Gravano, L.; Balter, S.; Centers for Disease Control and Prevention (CDC) (2014). "Using Online Reviews by Restaurant Patrons to Identify Unreported Cases of Foodborne Illness — New York City, 2012–2013". Morbidity and Mortality Weekly Report. 63 (20). Centers for Disease Control and Prevention: 441–445. PMC 4584915. PMID 24848215.
  5. Effland, Thomas; Lawson, Anna; Balter, Sharon; Devinney, Katelynn; Reddy, Vasudha; Waechter, HaeNa; Gravano, Luis; Hsu, Daniel (2018). "Discovering foodborne illness in online restaurant reviews". Journal of the American Medical Informatics Association. 25 (12): 1586–1592. doi:10.1093/jamia/ocx093. PMC 7647154. PMID 29329402.
  6. Calderone, Julia (January 11, 2018). "Can Yelp Help You Avoid Food Poisoning?". Consumer Reports. Retrieved May 8, 2026.
  7. "Luis Gravano". Google Scholar. Retrieved May 8, 2026.
  8. "Luis Gravano". Columbia Engineering. June 20, 2024. Retrieved May 8, 2026.
  9. "Luis Gravano's Curriculum Vitae". Columbia University Department of Computer Science. Retrieved May 8, 2026.
  10. Agichtein, Eugene; Gravano, Luis (2000). Snowball: Extracting Relations from Large Plain-Text Collections. Proceedings of the Fifth ACM Conference on Digital Libraries. pp. 85–94. doi:10.1145/336597.336644.
  11. Gravano, Luis; Ipeirotis, Panagiotis; Sahami, Mehran (2001). SDLIP + STARTS = SDARTS: A Protocol and Toolkit for Metasearching (PDF). Proceedings of the 2001 ACM/IEEE Joint Conference on Digital Libraries.
  12. Ipeirotis, Panagiotis G.; Ntoulas, Alexandros; Cho, Junghoo; Gravano, Luis (2005). Modeling and Managing Content Changes in Text Databases. Proceedings of the 21st IEEE International Conference on Data Engineering. pp. 606–617. doi:10.1109/ICDE.2005.91.
  13. Bruno, Nicolas; Gravano, Luis; Chaudhuri, Surajit (2002). "Top-k Selection Queries over Relational Databases: Mapping Strategies and Performance Evaluation" (PDF). ACM Transactions on Database Systems. doi:10.1145/568518.568519.
  14. Ipeirotis, Panagiotis G.; Agichtein, Eugene; Jain, Pranay; Gravano, Luis (2006). To search or to crawl?: towards a query optimizer for text-centric tasks. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. pp. 265–276.
  15. Paparrizos, John; Gravano, Luis (2015). k-Shape: Efficient and Accurate Clustering of Time Series (PDF). Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. pp. 1855–1870. doi:10.1145/2723372.2737793.
  16. "John Paparrizos receives 2025 ACM SIGMOD Test-of-Time Award for Influential Contributions to Time-Series Clustering". Aristotle University of Thessaloniki Data Lab. June 20, 2025. Retrieved May 8, 2026.
  17. "Yelp If You've Got Food Poisoning". Columbia Magazine. Retrieved May 8, 2026.
  18. "Health Department IDs 10 outbreaks of foodborne illness using Yelp reviews since 2012". EurekAlert!. American Association for the Advancement of Science. January 10, 2018. Retrieved May 8, 2026.
  19. "Machine learning system detects 10 outbreaks of foodborne illness from Yelp reviews". Health Exec. Retrieved May 8, 2026.