Draft:Luis Gravano

Luis Gravano
Alma mater	Stanford University (MS, PhD); Escuela Superior Latinoamericana de Informática (BS)
Known for	Snowball (information extraction); k-Shape algorithm
Awards	ACM SIGMOD Test of Time Award (2025); National Science Foundation CAREER Award (1998)
	Scientific career
Fields	Computer science; Database systems; Information retrieval; Web search; Information extraction
Institutions	Columbia University; Google
Thesis	Querying Multiple Document Collections across the Internet (1997)
Doctoral advisor	Hector Garcia-Molina

Luis Gravano is a computer scientist and professor of computer science at Columbia University whose research spans database systems, information retrieval, web search, and information extraction. He is known for the Snowball relation-extraction system, developed with his doctoral student Eugene Agichtein, and for the k-Shape algorithm for time-series clustering, developed with his doctoral student John Paparrizos; the 2015 paper introducing k-Shape received the 2025 ACM SIGMOD Test of Time Award.^[1]^[2]

Beginning in 2012, Gravano led the development of an automated system, deployed by the New York City Department of Health and Mental Hygiene, that detects unreported foodborne illness outbreaks by analyzing Yelp restaurant reviews. By 2018 the system had identified 8,523 complaints of foodborne illness and ten confirmed outbreaks that had not been reported through the city's traditional channels. The results were published in the Centers for Disease Control and Prevention's Morbidity and Mortality Weekly Report and in the Journal of the American Medical Informatics Association.^[3]^[4]^[5]^[6]

As of May 2026, according to Google Scholar, his publications had received more than 18,000 citations, and he had an h-index of 58.^[7]

Education

Gravano received his Licenciatura en Informática (equivalent to a Bachelor of Science in Computer Science) from the Escuela Superior Latinoamericana de Informática (ESLAI) in Argentina in 1991.^[8] He subsequently moved to the United States for graduate studies at Stanford University, where he earned his Master of Science degree in 1994 and his Ph.D. in 1997 under the supervision of Hector Garcia-Molina.^[9] His doctoral dissertation, Querying Multiple Document Collections across the Internet, addressed the problem of efficiently routing queries across distributed databases and digital libraries.^[9]

Career

Columbia University

Gravano joined the Department of Computer Science at Columbia University as an assistant professor in 1997.^[8] He was promoted to associate professor with tenure in 2002 and to full professor in 2013.^[8]

At Columbia, he has led research on information extraction, metasearch systems, top-k query processing, and time-series analysis. He received the Distinguished Teacher Award from the Columbia Computer Science Department in 2011 and the Distinguished Faculty Teaching Award from the Columbia Engineering Alumni Association in 2012.^[8]

Industry experience

In addition to his academic work, Gravano has worked at Google, first as a Senior Research Scientist in 2001 and later as a Visiting Faculty Researcher in 2018–19.^[8]

Research

Information extraction (Snowball)

In a 2000 paper, Gravano and his doctoral student Eugene Agichtein introduced Snowball, a system that uses bootstrapping to extract structured relations from unstructured text.^[10] The system starts from a small set of seed examples, such as known pairs of companies and their headquarters locations, and iteratively discovers new extraction patterns and new entity pairs. A central contribution was a set of methods for estimating the confidence of both patterns and extracted tuples, which limits semantic drift, the tendency of errors to compound over successive iterations.^[10]
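
The bootstrapping loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Snowball implementation: the input is assumed to be pre-tagged (organization, middle text, location) occurrences, patterns are exact middle strings rather than Snowball's weighted term vectors with similarity matching, and the function name `snowball`, its thresholds, and the example sentences are hypothetical. The two confidence scores are modeled on those in the Snowball paper: a pattern's confidence is the fraction of its matches that agree with the current table, and a tuple's confidence is one minus the product of (1 − pattern confidence) over the trusted patterns that produced it, with the match score fixed at 1 because matching here is exact.

```python
from collections import defaultdict


def snowball(occurrences, seeds, iterations=5, min_pattern_conf=0.7, min_tuple_conf=0.7):
    """Toy Snowball-style bootstrapping (hypothetical simplification).

    occurrences: (organization, middle_text, location) triples, assumed to come
        from sentences already tagged by a named-entity recognizer.
    seeds: dict organization -> location; the organization is treated as a key,
        so each organization has at most one location.
    """
    known = dict(seeds)
    for _ in range(iterations):
        # 1. Induce patterns: middle contexts of occurrences of known tuples.
        patterns = {mid for org, mid, loc in occurrences if known.get(org) == loc}

        # 2. Pattern confidence = positive / (positive + negative). A match is
        #    negative when it pairs a known organization with another location.
        pattern_conf = {}
        for p in patterns:
            pos = neg = 0
            for org, mid, loc in occurrences:
                if mid == p and org in known:
                    if known[org] == loc:
                        pos += 1
                    else:
                        neg += 1
            pattern_conf[p] = pos / (pos + neg)  # pos >= 1 by construction
        trusted = {p: c for p, c in pattern_conf.items() if c >= min_pattern_conf}

        # 3. Tuple confidence = 1 - prod(1 - Conf(P)) over trusted producers.
        producers = defaultdict(set)
        for org, mid, loc in occurrences:
            if mid in trusted:
                producers[(org, loc)].add(mid)

        added = False
        for (org, loc), pats in producers.items():
            miss = 1.0
            for p in pats:
                miss *= 1.0 - trusted[p]
            if org not in known and 1.0 - miss >= min_tuple_conf:
                known[org] = loc
                added = True
        if not added:  # no new tuples, so bootstrapping has converged
            break
    return known


occurrences = [
    ("Microsoft", "'s headquarters in", "Redmond"),
    ("Exxon", "'s headquarters in", "Irving"),
    ("IBM", "'s headquarters in", "Armonk"),
    ("Microsoft", ", based in", "Redmond"),
    ("Boeing", ", based in", "Seattle"),
    ("Exxon", "opened an office in", "Irving"),
    ("Microsoft", "opened an office in", "Dublin"),
    ("Oracle", "opened an office in", "London"),
]
table = snowball(occurrences, {"Microsoft": "Redmond", "Exxon": "Irving"})
print(table)
```

On this toy input the two seeds are extended with IBM and Boeing, while "opened an office in" is rejected because half of its matches contradict a known headquarters, so the Oracle tuple is never added.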

Metasearch and the deep web

Gravano's research on metasearching addressed the challenge of querying multiple autonomous databases through a single interface. He co-developed the SDARTS protocol and toolkit, which combines the SDLIP and STARTS protocols and enables queries to be translated across different database systems.^[11] His work on database selection algorithms allowed metasearch systems to route each query to the databases most likely to be relevant, based on statistical summaries of their content. Subsequent work modeled how these content summaries change over time, applying survival analysis techniques to decide when each summary should be updated; the resulting paper received the IEEE ICDE Best Paper Award in 2005.^[12]^[8]
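
Two ideas from this paragraph, database selection from content summaries and scheduling summary updates with a survival model, can be illustrated with a short Python sketch. Both parts are simplified stand-ins chosen for brevity, not the published algorithms, and all function names and the summary format are hypothetical. The selection estimate assumes query terms occur independently, so the expected number of documents containing every term is the database size times the product of each term's document fraction. The update rule assumes the simplest survival model, a constant change rate λ (changes arriving as a Poisson process), under which the probability that a summary is out of date t days after it was built is 1 − exp(−λt); the cited work uses more detailed models.

```python
import math


def estimated_matches(summary, query_terms):
    """Expected number of documents containing all query terms, assuming
    terms occur independently. summary = {"size": N, "df": {term: doc_freq}}."""
    size = summary["size"]
    if size == 0:
        return 0.0
    estimate = float(size)
    for term in query_terms:
        estimate *= summary["df"].get(term, 0) / size
    return estimate


def select_databases(summaries, query_terms, k=1):
    """Names of the k databases with the largest estimated match counts."""
    return sorted(
        summaries,
        key=lambda name: estimated_matches(summaries[name], query_terms),
        reverse=True,
    )[:k]


def estimate_change_rate(num_changes, days_observed):
    """Maximum-likelihood rate (changes per day) for a Poisson change process."""
    return num_changes / days_observed


def days_until_update(change_rate, max_stale_prob):
    """Days until P(summary out of date) = 1 - exp(-rate * t) reaches max_stale_prob."""
    if change_rate <= 0:
        return math.inf  # a database that never changes never needs refreshing
    return -math.log(1.0 - max_stale_prob) / change_rate


summaries = {
    "medline": {"size": 1000, "df": {"cancer": 300, "gene": 200}},
    "news": {"size": 5000, "df": {"cancer": 100, "gene": 10}},
}
print(select_databases(summaries, ["cancer", "gene"], k=2))  # ['medline', 'news']

rate = estimate_change_rate(3, 90)  # 3 observed changes in 90 days
print(round(days_until_update(rate, 0.5), 1))  # 20.8
```

The smaller "medline" database ranks first because its estimate (about 60 matching documents) far exceeds that of "news" (about 0.2), showing why raw database size alone is a poor routing signal.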

Top-k query processing

Working with collaborators including Nicolás Bruno and Surajit Chaudhuri, Gravano developed algorithms that retrieve the top-k results of a query, the k records that best match it, from a database without scanning every record.^[13] This work bridged information retrieval ranking techniques with relational database query optimization. A paper from a related line of research, on query optimization for text-centric tasks, received the ACM SIGMOD Best Paper Award in 2006.^[14]^[8]
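
One way to answer a top-k query without scoring every record is to map it onto a range query that an ordinary relational engine can evaluate with its indexes, restarting with a wider range if too few rows come back. The Python sketch below illustrates that idea under illustrative assumptions: rows are numeric tuples in a list that stands in for a table, closeness is measured by Max (L∞) distance, under which the rows within distance r of the target are exactly those inside an axis-aligned box, and the function name, starting radius, and doubling rule are hypothetical choices rather than the strategies evaluated in the cited paper.

```python
def top_k_max_distance(rows, query, k, initial_radius):
    """Return the k rows closest to `query` under Max (L-infinity) distance,
    found by issuing range queries instead of scoring every row.

    Under Max distance, the rows within radius r of the query are exactly
    the rows inside the box query[i] - r <= row[i] <= query[i] + r, which an
    ordinary multi-attribute range predicate can select.
    """
    if initial_radius <= 0:
        raise ValueError("initial_radius must be positive")
    k = min(k, len(rows))  # guarantees the loop below terminates

    def distance(row):
        return max(abs(a - q) for a, q in zip(row, query))

    radius = initial_radius
    while True:
        # Stand-in for a range query such as
        #   SELECT * FROM R WHERE a1 BETWEEN q1 - r AND q1 + r AND ...
        hits = [row for row in rows
                if all(q - radius <= a <= q + radius for a, q in zip(row, query))]
        if len(hits) >= k:
            # Only the (usually small) set of hits is scored and sorted.
            return sorted(hits, key=distance)[:k]
        radius *= 2  # too few rows: restart with a larger range


rows = [(1, 1), (2, 2), (5, 5), (10, 10), (0, 3)]
print(top_k_max_distance(rows, (0, 0), k=3, initial_radius=1))  # [(1, 1), (2, 2), (0, 3)]
```

The answer is exact: every row within the final radius is among the hits and every other row is farther away, so the k closest hits are the global top k. The cost depends on choosing a radius that returns slightly more than k rows on the first try.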

Time-series clustering (k-Shape)

In 2015, Gravano and his doctoral student John Paparrizos introduced k-Shape, an algorithm for clustering time-series data.^[15] k-Shape compares series with a shape-based distance (SBD) derived from normalized cross-correlation, which, applied to z-normalized series, makes the comparison invariant to amplitude scaling and to phase shifts. k-Shape achieves accuracy comparable to methods based on dynamic time warping while being substantially more computationally efficient.^[15]
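
The shape-based distance can be written out directly. The Python sketch below computes the cross-correlation of two equal-length series at every shift and takes SBD as one minus the largest value divided by the product of the two series' norms; the function names are illustrative. It uses a plain O(m²) loop for readability, whereas k-Shape computes the cross-correlation with the fast Fourier transform, and it covers only the distance and alignment step, not k-Shape's centroid computation or clustering loop.

```python
import math


def z_normalize(x):
    """Rescale to zero mean and unit variance, removing offset and amplitude."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        raise ValueError("constant series cannot be z-normalized")
    return [(v - mean) / std for v in x]


def cross_correlation(x, y):
    """CC[s] = sum_j x[j] * y[j - s] for every shift s in -(m-1)..(m-1).
    Direct O(m^2) computation; only overlapping positions contribute."""
    m = len(x)
    return {s: sum(x[j] * y[j - s] for j in range(max(0, s), min(m, m + s)))
            for s in range(-(m - 1), m)}


def sbd(x, y):
    """Shape-based distance 1 - max_s CC[s] / (||x|| * ||y||), in [0, 2].
    Returns (distance, best_shift)."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    if norm == 0:
        raise ValueError("SBD is undefined for an all-zero series")
    cc = cross_correlation(x, y)
    shift = max(cc, key=cc.get)
    return 1.0 - cc[shift] / norm, shift


def align(y, shift):
    """Shift y right by `shift` positions (left if negative), zero-padding."""
    m = len(y)
    if shift >= 0:
        return [0.0] * shift + list(y[:m - shift])
    return list(y[-shift:]) + [0.0] * (-shift)


x = [0, 0, 1, 0]
y = [0, 1, 0, 0]
d, s = sbd(x, y)
print(d, s, align(y, s))  # 0.0 1 [0.0, 0, 1, 0]
```

Because the maximum is taken over all shifts, a series and a phase-shifted copy of it are at distance 0, and the best shift is also what is needed to align series before computing a cluster centroid.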

The paper has been cited over 1,000 times according to Google Scholar and has been adopted in domains including healthcare, finance, and the Internet of Things.^[1]^[16] A decade after its introduction, the paper received the 2025 ACM SIGMOD Test of Time Award, which recognizes the SIGMOD paper from 10–12 years prior judged to have had the greatest impact over the intervening decade.^[2] The award citation recognized the work for advancing time-series clustering through a shape-based approach grounded in cross-correlation, combining accuracy, efficiency, and broad applicability.^[2]^[1]

Public health applications

Beginning in 2012, Gravano led a collaboration with the New York City Department of Health and Mental Hygiene to apply text mining techniques to public health surveillance. His team developed an automated classifier that analyzes Yelp restaurant reviews and identifies those likely to describe foodborne illness, transmitting flagged reviews to health department epidemiologists for follow-up investigation.^[3]^[17]
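
The flag-and-forward workflow described above can be sketched as a text classifier that labels each review and passes the positive ones on for follow-up. The deployed system's features and models are described in the cited papers and are not reproduced here; this Python sketch swaps in a multinomial naive Bayes classifier with add-one smoothing purely because it fits in a few lines, and the class name, labels, and training reviews are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word in tokenize(doc):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[c][word] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)


def flag_reviews(model, reviews, label="sick"):
    """Reviews the model assigns to `label`, i.e. those to forward for follow-up."""
    return [r for r in reviews if model.predict(r) == label]


train = [
    ("I got food poisoning after eating here, vomiting all night", "sick"),
    ("my friend and I were sick the next day, terrible stomach cramps", "sick"),
    ("diarrhea and nausea hours after the chicken", "sick"),
    ("great pizza and friendly staff", "ok"),
    ("the pasta was delicious, will come back", "ok"),
    ("slow service but tasty food", "ok"),
]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
new_reviews = ["vomiting and stomach cramps after dinner",
               "delicious pizza and friendly service"]
print(flag_reviews(model, new_reviews))  # ['vomiting and stomach cramps after dinner']
```

A real deployment would need far more labeled data and a decision threshold tuned to the number of reviews epidemiologists can follow up on.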

A pilot evaluation conducted between July 2012 and March 2013 found that only about 3 percent of the foodborne illness incidents identified through online reviews had previously been reported through the city's established complaint channels, suggesting that review-based surveillance can complement traditional reporting.^[4] By 2018, the system had identified 8,523 complaints of foodborne illness in New York City and ten confirmed outbreaks that were detected solely through automated review analysis.^[3]^[18] The findings were published in the Centers for Disease Control and Prevention's Morbidity and Mortality Weekly Report and the Journal of the American Medical Informatics Association,^[4]^[5] and the project was covered by Consumer Reports.^[6]

Subsequent work extended the approach to multilingual reviews and to additional social-media platforms including Twitter.^[19]

Awards and honors

Year	Award	Organization
2025	Test of Time Award (for k-Shape)	ACM SIGMOD^[2]^[1]
2012	Distinguished Faculty Teaching Award	Columbia Engineering Alumni Association^[8]
2011	Distinguished Teacher Award	Columbia Computer Science Department^[8]
2006	Best Paper Award (for "To search or to crawl?: towards a query optimizer for text-centric tasks")	ACM SIGMOD^[14]^[8]
2005	Best Paper Award (for "Modeling and Managing Content Changes in Text Databases")	IEEE ICDE^[12]^[8]
2003	Best Student Paper Award	IEEE ICDE^[8]
1998	NSF CAREER Award	National Science Foundation^[8]

Selected publications

Agichtein, Eugene; Gravano, Luis (2000). Snowball: Extracting Relations from Large Plain-Text Collections. Proceedings of the Fifth ACM Conference on Digital Libraries. pp. 85–94. doi:10.1145/336597.336644.
Bruno, Nicolas; Gravano, Luis; Marian, Amélie (2002). Evaluating Top-k Queries over Web-Accessible Databases. Proceedings of the 18th IEEE International Conference on Data Engineering.
Paparrizos, John; Gravano, Luis (2015). k-Shape: Efficient and Accurate Clustering of Time Series. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. pp. 1855–1870. doi:10.1145/2723372.2737793.
Effland, Thomas; Lawson, Anna; Balter, Sharon; Devinney, Katelynn; Reddy, Vasudha; Waechter, HaeNa; Gravano, Luis; Hsu, Daniel (2018). "Discovering foodborne illness in online restaurant reviews". Journal of the American Medical Informatics Association. 25 (12): 1586–1592. doi:10.1093/jamia/ocx093. PMC 7647154. PMID 29329402.

References

[1] "John Paparrizos & his doctoral advisor, Luis Gravano receive ACM SIGMOD Test of Time Award". Ohio State University Department of Computer Science and Engineering. June 20, 2025. Retrieved May 8, 2026.
[2] "SIGMOD 2025 Test-of-Time Award". ACM SIGMOD. Retrieved May 8, 2026.
[3] "Health Department IDs 10 outbreaks of foodborne illness using Yelp reviews since 2012". ScienceDaily. January 14, 2018. Retrieved May 8, 2026.
[4] Harrison, C.; Jorder, M.; Stern, H.; Stavinsky, F.; Reddy, V.; Hanson, H.; Waechter, H.; Lowe, L.; Gravano, L.; Balter, S.; Centers for Disease Control and Prevention (CDC) (2014). "Using Online Reviews by Restaurant Patrons to Identify Unreported Cases of Foodborne Illness — New York City, 2012–2013". Morbidity and Mortality Weekly Report. 63 (20). Centers for Disease Control and Prevention: 441–445. PMC 4584915. PMID 24848215.
[5] Effland, Thomas; Lawson, Anna; Balter, Sharon; Devinney, Katelynn; Reddy, Vasudha; Waechter, HaeNa; Gravano, Luis; Hsu, Daniel (2018). "Discovering foodborne illness in online restaurant reviews". Journal of the American Medical Informatics Association. 25 (12): 1586–1592. doi:10.1093/jamia/ocx093. PMC 7647154. PMID 29329402.
[6] Calderone, Julia (January 11, 2018). "Can Yelp Help You Avoid Food Poisoning?". Consumer Reports. Retrieved May 8, 2026.
[7] "Luis Gravano". Google Scholar. Retrieved May 8, 2026.
[8] "Luis Gravano". Columbia Engineering. June 20, 2024. Retrieved May 8, 2026.
[9] "Luis Gravano's Curriculum Vitae". Columbia University Department of Computer Science. Retrieved May 8, 2026.
[10] Agichtein, Eugene; Gravano, Luis (2000). Snowball: Extracting Relations from Large Plain-Text Collections. Proceedings of the Fifth ACM Conference on Digital Libraries. pp. 85–94. doi:10.1145/336597.336644.
[11] Gravano, Luis; Ipeirotis, Panagiotis; Sahami, Mehran (2001). SDLIP + STARTS = SDARTS: A Protocol and Toolkit for Metasearching (PDF). Proceedings of the 2001 ACM/IEEE Joint Conference on Digital Libraries.
[12] Ipeirotis, Panagiotis G.; Ntoulas, Alexandros; Cho, Junghoo; Gravano, Luis (2005). Modeling and Managing Content Changes in Text Databases. Proceedings of the 21st IEEE International Conference on Data Engineering. pp. 606–617. doi:10.1109/ICDE.2005.91.
[13] Bruno, Nicolas; Gravano, Luis; Chaudhuri, Surajit (2002). "Top-k Selection Queries over Relational Databases: Mapping Strategies and Performance Evaluation" (PDF). ACM Transactions on Database Systems. doi:10.1145/568518.568519.
[14] Ipeirotis, Panagiotis G.; Agichtein, Eugene; Jain, Pranay; Gravano, Luis (2006). To search or to crawl?: towards a query optimizer for text-centric tasks. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. pp. 265–276.
[15] Paparrizos, John; Gravano, Luis (2015). k-Shape: Efficient and Accurate Clustering of Time Series (PDF). Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. pp. 1855–1870. doi:10.1145/2723372.2737793.
[16] "John Paparrizos receives 2025 ACM SIGMOD Test-of-Time Award for Influential Contributions to Time-Series Clustering". Aristotle University of Thessaloniki Data Lab. June 20, 2025. Retrieved May 8, 2026.
[17] "Yelp If You've Got Food Poisoning". Columbia Magazine. Retrieved May 8, 2026.
[18] "Health Department IDs 10 outbreaks of foodborne illness using Yelp reviews since 2012". EurekAlert!. American Association for the Advancement of Science. January 10, 2018. Retrieved May 8, 2026.
[19] "Machine learning system detects 10 outbreaks of foodborne illness from Yelp reviews". Health Exec. Retrieved May 8, 2026.

External links

Luis Gravano's homepage at Columbia University
Luis Gravano publications indexed by Google Scholar

Category:Living people Category:Argentine computer scientists Category:American computer scientists Category:Columbia University faculty Category:Stanford University alumni Category:Database researchers Category:Information retrieval researchers
