Submission declined on 7 May 2026 by Nighfidelity (talk).
This draft has been resubmitted and is currently awaiting re-review.
| Luis Gravano | |
|---|---|
| Alma mater | Stanford University (MS, PhD); Escuela Superior Latinoamericana de Informática (BS) |
| Known for | Snowball (information extraction); k-Shape algorithm |
| Awards | ACM SIGMOD Test of Time Award (2025); National Science Foundation CAREER Award (1998) |
| Scientific career | |
| Fields | Computer science; database systems; information retrieval; web search; information extraction |
| Institutions | Columbia University |
| Thesis | Querying Multiple Document Collections across the Internet (1997) |
| Doctoral advisor | Hector Garcia-Molina |
Luis Gravano is a computer scientist and professor of computer science at Columbia University whose research spans database systems, information retrieval, web search, and information extraction. He is known for co-developing the Snowball relation-extraction system with Eugene Agichtein and, with his doctoral student John Paparrizos, the k-Shape algorithm for time-series clustering; the k-Shape paper received the 2025 ACM SIGMOD Test of Time Award.[1][2]
Beginning in 2012, Gravano led the development of an automated system, deployed by the New York City Department of Health and Mental Hygiene, that flags Yelp restaurant reviews likely to describe foodborne illness so that outbreaks missed by traditional reporting can be investigated. By 2018, the system had identified thousands of foodborne illness complaints and ten confirmed outbreaks that had not been reported through the city's traditional channels. The results were published in the Centers for Disease Control and Prevention's Morbidity and Mortality Weekly Report and in the Journal of the American Medical Informatics Association.[3][4][5][6]
As of May 2026, Google Scholar lists more than 18,000 citations of his publications and an h-index of 58.[7]
Education
Gravano received his Licenciatura en Informática (equivalent to a Bachelor of Science in Computer Science) from the Escuela Superior Latinoamericana de Informática (ESLAI) in Argentina in 1991.[8] He subsequently moved to the United States for graduate studies at Stanford University, where he earned his Master of Science degree in 1994 and his Ph.D. in 1997 under the supervision of Hector Garcia-Molina.[9] His doctoral dissertation, Querying Multiple Document Collections across the Internet, addressed the problem of efficiently routing queries across distributed databases and digital libraries.[9]
Career
Columbia University
Gravano joined the Department of Computer Science at Columbia University as an assistant professor in 1997.[8] He was promoted to associate professor with tenure in 2002 and to full professor in 2013.[8]
At Columbia, he has led research on information extraction, metasearch systems, top-k query processing, and time-series analysis. He received the Distinguished Teacher Award from the Columbia Computer Science Department in 2011 and the Distinguished Faculty Teaching Award from the Columbia Engineering Alumni Association in 2012.[8]
Industry experience
Research
Information extraction (Snowball)
In the late 1990s, Gravano and his doctoral student Eugene Agichtein developed the Snowball system, which uses a bootstrapping approach to extract structured relations from unstructured text.[10] The system begins with a small set of seed examples, such as known company-headquarters pairs, and iteratively discovers extraction patterns and new entity pairs. A central contribution was a set of confidence-estimation methods that limit semantic drift, in which errors compound through successive iterations.[10]
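The bootstrapping loop described above can be illustrated with a toy Python sketch. It is a heavy simplification and not Snowball itself: here a "pattern" is just the literal text between two capitalized words, rather than Snowball's richer context representation, and all function names and example sentences are hypothetical. A pattern's confidence is taken as the share of its extractions that agree with already-known pairs, in the spirit of Snowball's positive/negative pattern scoring, and only patterns above a threshold may contribute new pairs.

```python
import re

def learn_patterns(sentences, known):
    """Collect the text between a known (org, location) pair in each sentence."""
    patterns = set()
    for sentence in sentences:
        for org, loc in known.items():
            match = re.search(re.escape(org) + r"(.+?)" + re.escape(loc), sentence)
            if match:
                patterns.add(match.group(1))
    return patterns

def apply_pattern(sentences, middle):
    """Extract (org, location) candidates: two capitalized words around `middle`."""
    regex = re.compile(r"([A-Z]\w+)" + re.escape(middle) + r"([A-Z]\w+)")
    pairs = []
    for sentence in sentences:
        pairs.extend(regex.findall(sentence))
    return pairs

def pattern_confidence(pairs, known):
    """Fraction of a pattern's extractions that agree with already-known pairs."""
    positive = sum(1 for org, loc in pairs if known.get(org) == loc)
    negative = sum(1 for org, loc in pairs if org in known and known[org] != loc)
    return positive / (positive + negative) if positive + negative else 0.0

def bootstrap(sentences, seeds, iterations=2, min_confidence=0.8):
    """Grow a table of org -> location pairs from a few seed examples."""
    known = dict(seeds)
    for _ in range(iterations):
        new_pairs = {}
        for middle in sorted(learn_patterns(sentences, known)):
            pairs = apply_pattern(sentences, middle)
            # Low-confidence patterns are discarded so that their errors do not
            # become seeds for the next round (the semantic-drift problem).
            if pattern_confidence(pairs, known) >= min_confidence:
                for org, loc in pairs:
                    if org not in known:
                        new_pairs.setdefault(org, loc)
        known.update(new_pairs)
    return known

SENTENCES = [
    "Microsoft is headquartered in Redmond.",
    "Exxon is headquartered in Irving.",
    "Boeing is headquartered in Arlington.",
    "Microsoft employees love Redmond.",
    "Microsoft employees love Seattle.",
    "Google employees love Zurich.",
]
SEEDS = {"Microsoft": "Redmond", "Exxon": "Irving"}

print(bootstrap(SENTENCES, SEEDS))
# {'Microsoft': 'Redmond', 'Exxon': 'Irving', 'Boeing': 'Arlington'}
```

The noisy pattern " employees love " extracts one pair that agrees with the seeds and one that contradicts them, so its confidence of 0.5 keeps the spurious "Google, Zurich" pair out of the table.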
Metasearch and the deep web
Gravano's research on metasearching addressed the challenge of querying multiple autonomous databases through a unified interface. He contributed to the development of SDARTS, a protocol and toolkit for metasearching that combines the SDLIP and STARTS protocols and translates queries across heterogeneous database systems.[11] His work on database selection algorithms allowed metasearch systems to route each query to the databases most likely to hold relevant documents, based on statistical summaries of their content. Subsequent work modeled how these content summaries change over time, applying survival analysis to determine when each summary should be updated; the resulting paper received the IEEE ICDE Best Paper Award in 2005.[12][8]
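As an illustration of database selection driven by content summaries, the Python sketch below ranks databases by the estimated number of documents that contain every query term. The estimator, which assumes that query terms occur independently of one another, the hypothetical ContentSummary type, and the example statistics are assumptions for illustration, not the specific algorithms from Gravano's papers.

```python
from dataclasses import dataclass, field

@dataclass
class ContentSummary:
    """Hypothetical statistical profile of one autonomous text database."""
    name: str
    num_docs: int
    doc_freq: dict[str, int] = field(default_factory=dict)  # term -> docs containing it

def estimated_matches(summary, query_terms):
    """Estimate how many documents contain all query terms, assuming independence."""
    if summary.num_docs == 0:
        return 0.0
    estimate = float(summary.num_docs)
    for term in query_terms:
        estimate *= summary.doc_freq.get(term, 0) / summary.num_docs
    return estimate

def select_databases(summaries, query_terms, k=1):
    """Return the names of up to k databases with the highest nonzero estimates."""
    scored = sorted(
        ((estimated_matches(s, query_terms), s.name) for s in summaries),
        reverse=True,
    )
    return [name for score, name in scored[:k] if score > 0]

summaries = [
    ContentSummary("medline", 1000, {"cancer": 200, "gene": 100}),
    ContentSummary("news", 5000, {"cancer": 50, "gene": 10, "election": 900}),
    ContentSummary("patents", 2000, {"gene": 300}),
]
print(select_databases(summaries, ["cancer", "gene"], k=2))  # ['medline', 'news']
```

Only the summaries are consulted, so the metasearcher can skip databases (here "patents") that are unlikely to return anything for the query.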
Top-k query processing
Working with collaborators including Nicolás Bruno and Surajit Chaudhuri, Gravano developed algorithms for efficiently retrieving the top-k (best-matching) results from databases without examining every record.[13] This work bridged information retrieval ranking techniques with relational database query optimization. A related line of research on query optimization for text-centric tasks, with Panagiotis Ipeirotis, Eugene Agichtein, and Pranay Jain, produced a paper that received the ACM SIGMOD Best Paper Award in 2006.[14][8]
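The idea of answering a top-k query without a full scan can be sketched by translating it into a range query around the target value and widening the range whenever it returns fewer than k tuples. This Python sketch is a simplified, single-attribute illustration under stated assumptions: a sorted list stands in for a database index, and the fixed starting radius that doubles on each retry is a hypothetical stand-in for the more careful, statistics-driven choice of search range studied in the cited work.

```python
import bisect

def top_k_nearest(sorted_values, target, k, radius=1.0):
    """Return the k values closest to `target`, probing with range queries.

    `sorted_values` stands in for an index on one attribute: each probe
    reads only the values inside [target - radius, target + radius].
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    k = min(k, len(sorted_values))
    if k <= 0:
        return []
    while True:
        lo = bisect.bisect_left(sorted_values, target - radius)
        hi = bisect.bisect_right(sorted_values, target + radius)
        candidates = sorted_values[lo:hi]
        if len(candidates) >= k:
            # Any value outside the range is farther than `radius` from the
            # target, so the k nearest values must all lie inside it.
            return sorted(candidates, key=lambda v: abs(v - target))[:k]
        radius *= 2  # too few tuples retrieved: restart with a wider range

prices = [1, 3, 4, 7, 10, 15]
print(top_k_nearest(prices, target=6, k=3))  # [7, 4, 3]
```

Doubling keeps the number of restarts small, at the cost of sometimes reading more tuples than strictly necessary on the final probe.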
Time-series clustering (k-Shape)
In 2015, Gravano and his doctoral student John Paparrizos introduced k-Shape, an algorithm for clustering time series data.[15] The algorithm uses a shape-based distance measure based on normalized cross-correlation, which makes comparisons invariant to shifts in time and to scaling of amplitude. According to the paper, k-Shape achieves accuracy comparable to methods based on dynamic time warping while being substantially more computationally efficient.[15]
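The shape-based distance (SBD) at the core of k-Shape can be sketched in a few lines of Python. Both series are z-normalized, their cross-correlation is computed at every shift and divided by the product of their Euclidean norms, and the distance is one minus the largest such value. For clarity this sketch computes the cross-correlation directly in quadratic time, whereas the paper uses the fast Fourier transform. It covers only the distance, not k-Shape's centroid computation or clustering loop, and the helper names and the all-zero-series convention are assumptions of this sketch.

```python
import math

def z_normalize(series):
    """Rescale to zero mean and unit variance, removing offset and amplitude."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    if std == 0:
        return [0.0] * len(series)
    return [(v - mean) / std for v in series]

def cross_correlation(x, y, shift):
    """Sum of x[i] * y[i - shift] over the indices where both are defined."""
    m = len(x)
    return sum(x[i] * y[i - shift] for i in range(max(0, shift), min(m, m + shift)))

def shape_based_distance(x, y):
    """SBD = 1 - max over shifts of the norm-scaled cross-correlation (in [0, 2])."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0:
        return 1.0  # convention for this sketch: an all-zero series is uncorrelated
    m = len(x)
    best = max(cross_correlation(x, y, s) for s in range(-(m - 1), m))
    return 1.0 - best / norm

a = z_normalize([0, 1, 2, 3, 2, 1, 0, 0])
shifted = z_normalize([0, 0, 1, 2, 3, 2, 1, 0])    # same shape, one step later
scaled = z_normalize([5, 8, 11, 14, 11, 8, 5, 5])  # 3 * original + 5
alternating = z_normalize([3, 0, 3, 0, 3, 0, 3, 0])
print(shape_based_distance(a, shifted) < shape_based_distance(a, alternating))  # True
```

The scaled copy is at distance zero (up to rounding) after z-normalization, and the shifted copy stays close because the maximum is taken over all alignments.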
The paper has been cited over 1,000 times according to Google Scholar and has been adopted in domains including healthcare, finance, and the Internet of Things.[1][16] A decade after its introduction, the paper received the 2025 ACM SIGMOD Test of Time Award, which recognizes the SIGMOD paper from 10–12 years prior judged to have had the greatest impact over the intervening decade.[2] The award citation recognized the work for advancing time-series clustering through a shape-based approach grounded in cross-correlation, combining accuracy, efficiency, and broad applicability.[2][1]
Public health applications
Beginning in 2012, Gravano led a collaboration with the New York City Department of Health and Mental Hygiene to apply text mining techniques to public health surveillance. His team developed an automated classifier that analyzes Yelp restaurant reviews and identifies those likely to describe foodborne illness, transmitting flagged reviews to health department epidemiologists for follow-up investigation.[3][17]
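To make the flag-and-forward step concrete, the Python sketch below trains a toy multinomial naive Bayes classifier on a few invented reviews and flags new reviews whose estimated probability of describing illness meets a threshold. Naive Bayes, the hypothetical ReviewFlagger class and its method names, the 0.5 threshold, and the training examples are all assumptions for illustration; the deployed system's actual models are described in the cited publications and are not reproduced here.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class ReviewFlagger:
    """Hypothetical multinomial naive Bayes flagger with add-one smoothing."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, reviews, labels):
        if set(labels) != {True, False}:
            raise ValueError("need both illness (True) and other (False) examples")
        self.doc_counts = Counter(labels)
        self.word_counts = {True: Counter(), False: Counter()}
        for text, label in zip(reviews, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def prob_illness(self, text):
        """Posterior probability that `text` describes foodborne illness."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denominator)
            log_scores[label] = score
        # Turn the two log scores into a probability without underflow.
        top = max(log_scores.values())
        ill = math.exp(log_scores[True] - top)
        other = math.exp(log_scores[False] - top)
        return ill / (ill + other)

    def flag(self, reviews):
        """Reviews to forward to epidemiologists for follow-up."""
        return [r for r in reviews if self.prob_illness(r) >= self.threshold]

flagger = ReviewFlagger().fit(
    [
        "got food poisoning after the chicken, vomiting all night",
        "my friend was sick with diarrhea the next day",
        "great pasta and friendly staff",
        "lovely ambience, the dessert was amazing",
    ],
    [True, True, False, False],
)
print(flagger.flag([
    "I was vomiting all night after eating here",
    "amazing pasta and friendly service",
]))
# ['I was vomiting all night after eating here']
```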
A pilot evaluation conducted between July 2012 and March 2013 found that only about 3 percent of the foodborne illness incidents identified through online reviews had previously been reported through the city's established complaint channels, demonstrating the value of social-media surveillance as a complement to traditional reporting.[4] By 2018, the system had identified 8,523 complaints of foodborne illness in New York City and ten confirmed outbreaks that were detected solely through automated review analysis.[3][18] The findings were published in the Centers for Disease Control and Prevention's Morbidity and Mortality Weekly Report and the Journal of the American Medical Informatics Association,[4][5] and the project was covered by Consumer Reports.[6]
Subsequent work extended the approach to multilingual reviews and to additional social-media platforms including Twitter.[19]
Awards and honors
| Year | Award | Organization |
|---|---|---|
| 2025 | Test of Time Award (for k-Shape) | ACM SIGMOD[2][1] |
| 2012 | Distinguished Faculty Teaching Award | Columbia Engineering Alumni Association[8] |
| 2011 | Distinguished Teacher Award | Columbia Computer Science Department[8] |
| 2006 | Best Paper Award (for "To search or to crawl?: towards a query optimizer for text-centric tasks") | ACM SIGMOD[14][8] |
| 2005 | Best Paper Award (for "Modeling and Managing Content Changes in Text Databases") | IEEE ICDE[12][8] |
| 2003 | Best Student Paper Award | IEEE ICDE[8] |
| 1998 | NSF CAREER Award | National Science Foundation[8] |
Selected publications
- Agichtein, Eugene; Gravano, Luis (2000). "Snowball: Extracting Relations from Large Plain-Text Collections". Proceedings of the Fifth ACM Conference on Digital Libraries. pp. 85–94. doi:10.1145/336597.336644.
- Bruno, Nicolas; Gravano, Luis; Marian, Amélie (2002). "Evaluating Top-k Queries over Web-Accessible Databases". Proceedings of the 18th IEEE International Conference on Data Engineering.
- Paparrizos, John; Gravano, Luis (2015). "k-Shape: Efficient and Accurate Clustering of Time Series". Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. pp. 1855–1870. doi:10.1145/2723372.2737793.
- Effland, Thomas; Lawson, Anna; Balter, Sharon; Devinney, Katelynn; Reddy, Vasudha; Waechter, HaeNa; Gravano, Luis; Hsu, Daniel (2018). "Discovering foodborne illness in online restaurant reviews". Journal of the American Medical Informatics Association. 25 (12): 1586–1592. doi:10.1093/jamia/ocx093. PMC 7647154. PMID 29329402.
References
1. "John Paparrizos & his doctoral advisor, Luis Gravano receive ACM SIGMOD Test of Time Award". Ohio State University Department of Computer Science and Engineering. June 20, 2025. Retrieved May 8, 2026.
2. "SIGMOD 2025 Test-of-Time Award". ACM SIGMOD. Retrieved May 8, 2026.
3. "Health Department IDs 10 outbreaks of foodborne illness using Yelp reviews since 2012". ScienceDaily. January 14, 2018. Retrieved May 8, 2026.
4. Harrison, C.; Jorder, M.; Stern, H.; Stavinsky, F.; Reddy, V.; Hanson, H.; Waechter, H.; Lowe, L.; Gravano, L.; Balter, S.; Centers for Disease Control and Prevention (CDC) (2014). "Using Online Reviews by Restaurant Patrons to Identify Unreported Cases of Foodborne Illness — New York City, 2012–2013". Morbidity and Mortality Weekly Report. 63 (20). Centers for Disease Control and Prevention: 441–445. PMC 4584915. PMID 24848215.
5. Effland, Thomas; Lawson, Anna; Balter, Sharon; Devinney, Katelynn; Reddy, Vasudha; Waechter, HaeNa; Gravano, Luis; Hsu, Daniel (2018). "Discovering foodborne illness in online restaurant reviews". Journal of the American Medical Informatics Association. 25 (12): 1586–1592. doi:10.1093/jamia/ocx093. PMC 7647154. PMID 29329402.
6. Calderone, Julia (January 11, 2018). "Can Yelp Help You Avoid Food Poisoning?". Consumer Reports. Retrieved May 8, 2026.
7. "Luis Gravano". Google Scholar. Retrieved May 8, 2026.
8. "Luis Gravano". Columbia Engineering. June 20, 2024. Retrieved May 8, 2026.
9. "Luis Gravano's Curriculum Vitae". Columbia University Department of Computer Science. Retrieved May 8, 2026.
10. Agichtein, Eugene; Gravano, Luis (2000). "Snowball: Extracting Relations from Large Plain-Text Collections". Proceedings of the Fifth ACM Conference on Digital Libraries. pp. 85–94. doi:10.1145/336597.336644.
11. Gravano, Luis; Ipeirotis, Panagiotis; Sahami, Mehran (2001). "SDLIP + STARTS = SDARTS: A Protocol and Toolkit for Metasearching" (PDF). Proceedings of the 2001 ACM/IEEE Joint Conference on Digital Libraries.
12. Ipeirotis, Panagiotis G.; Ntoulas, Alexandros; Cho, Junghoo; Gravano, Luis (2005). "Modeling and Managing Content Changes in Text Databases". Proceedings of the 21st IEEE International Conference on Data Engineering. pp. 606–617. doi:10.1109/ICDE.2005.91.
13. Bruno, Nicolas; Gravano, Luis; Chaudhuri, Surajit (2002). "Top-k Selection Queries over Relational Databases: Mapping Strategies and Performance Evaluation" (PDF). ACM Transactions on Database Systems. doi:10.1145/568518.568519.
14. Ipeirotis, Panagiotis G.; Agichtein, Eugene; Jain, Pranay; Gravano, Luis (2006). "To search or to crawl?: towards a query optimizer for text-centric tasks". Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. pp. 265–276.
15. Paparrizos, John; Gravano, Luis (2015). "k-Shape: Efficient and Accurate Clustering of Time Series" (PDF). Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. pp. 1855–1870. doi:10.1145/2723372.2737793.
16. "John Paparrizos receives 2025 ACM SIGMOD Test-of-Time Award for Influential Contributions to Time-Series Clustering". Aristotle University of Thessaloniki Data Lab. June 20, 2025. Retrieved May 8, 2026.
17. "Yelp If You've Got Food Poisoning". Columbia Magazine. Retrieved May 8, 2026.
18. "Health Department IDs 10 outbreaks of foodborne illness using Yelp reviews since 2012". EurekAlert!. American Association for the Advancement of Science. January 10, 2018. Retrieved May 8, 2026.
19. "Machine learning system detects 10 outbreaks of foodborne illness from Yelp reviews". Health Exec. Retrieved May 8, 2026.
External links
- Luis Gravano's homepage at Columbia University
- Luis Gravano publications indexed by Google Scholar
Category:Living people Category:Argentine computer scientists Category:American computer scientists Category:Columbia University faculty Category:Stanford University alumni Category:Database researchers Category:Information retrieval researchers

