32 datasets found
  1. Yelp Open Dataset

    • live.european-language-grid.eu
    json
    Updated Dec 30, 2015
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yelp (2015). Yelp Open Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/5179
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Dec 30, 2015
    Dataset authored and provided by
    Yelphttp://yelp.com/
    License

    https://s3-media0.fl.yelpcdn.com/assets/srv0/engineering_pages/bea5c1e92bf3/assets/vendor/yelp-dataset-agreement.pdfhttps://s3-media0.fl.yelpcdn.com/assets/srv0/engineering_pages/bea5c1e92bf3/assets/vendor/yelp-dataset-agreement.pdf

    Description

    Dataset containing millions of reviews on Yelp. In addition it contains business data including location data, attributes, and categories.

  2. yelp_review_full

    • huggingface.co
    Updated Mar 6, 2012
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yelp (2012). yelp_review_full [Dataset]. https://huggingface.co/datasets/Yelp/yelp_review_full
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 6, 2012
    Dataset authored and provided by
    Yelphttp://yelp.com/
    License

    https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/

    Description

    Dataset Card for YelpReviewFull

      Dataset Summary
    

    The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data.

      Supported Tasks and Leaderboards
    

    text-classification, sentiment-classification: The dataset is mainly used for text classification: given the text, predict the sentiment.

      Languages
    

    The reviews were mainly written in english.

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    A… See the full description on the dataset page: https://huggingface.co/datasets/Yelp/yelp_review_full.

  3. h

    yelp-open-dataset-checkin

    • huggingface.co
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yash Raizada, yelp-open-dataset-checkin [Dataset]. https://huggingface.co/datasets/yashraizada/yelp-open-dataset-checkin
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Yash Raizada
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    yashraizada/yelp-open-dataset-checkin dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. data extracted from Yelp Open Dataset

    • figshare.com
    txt
    Updated Jun 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sven Gedicke (2023). data extracted from Yelp Open Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.20318538.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    figshare
    Authors
    Sven Gedicke
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The different data sets of point features related to food that we used for the different tasks in our user study. The data was extracted from the Yelp Open Dataset.

  5. h

    yelp-open-dataset-top-businesses

    • huggingface.co
    Updated Jan 25, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yash Raizada (2024). yelp-open-dataset-top-businesses [Dataset]. https://huggingface.co/datasets/yashraizad/yelp-open-dataset-top-businesses
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 25, 2024
    Authors
    Yash Raizada
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    yashraizad/yelp-open-dataset-top-businesses dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Z

    The Yelp Collaborative Knowledge Graph

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 17, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Heede, Thomas (2023). The Yelp Collaborative Knowledge Graph [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7878446
    Explore at:
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    Heede, Thomas
    Nielsen, Christian Filip Pinderup
    Corfixen, Mads
    Olesen, Magnus
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the The Yelp Collaborative Knowledge Graph (YCKG) - a transformation of the Yelp Open Dataset into RDF format using Y2KG.

    Paper Abstract

    The Yelp Open Dataset (YOD) contains data about businesses, reviews, and users from the Yelp website and is available for research purposes. This dataset has been widely used to develop and test Recommender Systems (RS), especially those using Knowledge Graphs (KGs), e.g., integrating taxonomies, product categories, business locations, and social network information. Unfortunately, researchers applied naive or wrong mappings while converting YOD in KGs, consequently obtaining unrealistic results. Among the various issues, the conversion processes usually do not follow state-of-the-art methodologies, fail to properly link to other KGs and reuse existing vocabularies. In this work, we overcome these issues by introducing Y2KG, a utility to convert the Yelp dataset into a KG. Y2KG consists of two components. The first is a dataset including (1) a vocabulary that extends Schema.org with properties to describe the concepts in YOD and (2) mappings between the Yelp entities and Wikidata. The second component is a set of scripts to transform YOD in RDF and obtain the Yelp Collaborative Knowledge Graph (YCKG). The design of Y2KG was driven by 16 core competency questions. YCKG includes 150k businesses and 16.9M reviews from 1.9M distinct real users, resulting in over 244 million triples (with 144 distinct predicates) for about 72 million resources, with an average in-degree and out-degree of 3.3 and 12.2, respectively.

    Links

    Latest GitHub release: https://github.com/MadsCorfixen/The-Yelp-Collaborative-Knowledge-Graph/releases/latest

    PURL domain: https://purl.archive.org/domain/yckg

    Files

    Graph Data Triple Files

    One sample file for each of the Yelp domains (Businesses, Users, Reviews, Tips and Checkins), each containing 20 entities.

    yelp_schema_mappings.nt.gz containing the mappings from Yelp categories to Schema things.

    schema_hierarchy.nt.gz containing the full hierarchy of the mapped Schema things.

    yelp_wiki_mappings.nt.gz containing the mappings from Yelp categories to Wikidata entities.

    wikidata_location_mappings.nt.gz containing the mappings from Yelp locations to Wikidata entities.

    Graph Metadata Triple Files

    yelp_categories.ttl contains metadata for all Yelp categories.

    yelp_entities.ttl contains metadata regarding the dataset

    yelp_vocabulary.ttl contains metadata on the created Yelp vocabulary and properties.

    Utility Files

    yelp_category_schema_mappings.csv. This file contains the 310 mappings from Yelp categories to Schema types. These mappings have been manually verified to be correct.

    yelp_predicate_schema_mappings.csv. This file contains the 14 mappings from Yelp attributes to Schema properties. These mappings are manually found.

    ground_truth_yelp_category_schema_mappings.csv. This file contains the ground truth, based on 200 manually verified mappings from Yelp categories to Schema things. The ground truth mappings were used to calculate precision and recall for the semantic mappings.

    manually_split_categories.csv. This file contains all Yelp categories containing either a & or /, and their manually split versions. The split versions have been used in the semantic mappings to Schema things.

  7. Yelp dataset

    • kaggle.com
    Updated Feb 14, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    MyrnaMFL (2020). Yelp dataset [Dataset]. https://www.kaggle.com/datasets/fireballbyedimyrnmom/yelp-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 14, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    MyrnaMFL
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by PrivacyMatters

    Released under Database: Open Database, Contents: © Original Authors

    Contents

  8. Yelp Dataset

    • kaggle.com
    zip
    Updated Mar 17, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yelp, Inc. (2022). Yelp Dataset [Dataset]. https://www.kaggle.com/yelp-dataset/yelp-dataset
    Explore at:
    zip(4374983563 bytes)Available download formats
    Dataset updated
    Mar 17, 2022
    Dataset provided by
    Yelphttp://yelp.com/
    Authors
    Yelp, Inc.
    Description

    Context

    This dataset is a subset of Yelp's businesses, reviews, and user data. It was originally put together for the Yelp Dataset Challenge which is a chance for students to conduct research or analysis on Yelp's data and share their discoveries. In the most recent dataset you'll find information about businesses across 8 metropolitan areas in the USA and Canada.

    Content

    This dataset contains five JSON files and the user agreement. More information about those files can be found here.

    Code snippet to read the files

    in Python, you can read the JSON files like this (using the json and pandas libraries):

    import json
    import pandas as pd
    data_file = open("yelp_academic_dataset_checkin.json")
    data = []
    for line in data_file:
     data.append(json.loads(line))
    checkin_df = pd.DataFrame(data)
    data_file.close()
    
    
  9. T

    yelp_polarity_reviews

    • tensorflow.org
    Updated Dec 6, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2022). yelp_polarity_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/yelp_polarity_reviews
    Explore at:
    Dataset updated
    Dec 6, 2022
    Description

    Large Yelp Review Dataset. This is a dataset for binary sentiment classification. We provide a set of 560,000 highly polar yelp reviews for training, and 38,000 for testing. ORIGIN The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data. For more information, please refer to http://www.yelp.com/dataset

    The Yelp reviews polarity dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the above dataset. It is first used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

    DESCRIPTION

    The Yelp reviews polarity dataset is constructed by considering stars 1 and 2 negative, and 3 and 4 positive. For each polarity 280,000 training samples and 19,000 testing samples are take randomly. In total there are 560,000 trainig samples and 38,000 testing samples. Negative polarity is class 1, and positive class 2.

    The files train.csv and test.csv contain all the training samples as comma-sparated values. There are 2 columns in them, corresponding to class index (1 and 2) and review text. The review texts are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed with an "n" character, that is " ".

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('yelp_polarity_reviews', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more informations on tensorflow_datasets.

  10. D

    SYNERGY - Open machine learning dataset on study selection in systematic...

    • dataverse.nl
    csv, json, txt, zip
    Updated Apr 24, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jonathan De Bruin; Jonathan De Bruin; Yongchao Ma; Yongchao Ma; Gerbrich Ferdinands; Gerbrich Ferdinands; Jelle Teijema; Jelle Teijema; Rens Van de Schoot; Rens Van de Schoot (2023). SYNERGY - Open machine learning dataset on study selection in systematic reviews [Dataset]. http://doi.org/10.34894/HE6NAQ
    Explore at:
    txt(212), json(702), zip(16028323), json(19426), txt(263), zip(3560967), txt(305), json(470), txt(279), zip(2355371), json(23201), csv(460956), txt(200), json(685), json(546), csv(63996), zip(2989015), zip(5749455), txt(331), txt(315), json(691), json(23775), csv(672721), json(468), txt(415), json(22778), csv(31919), csv(746832), json(18392), zip(62992826), csv(234822), txt(283), zip(34788857), json(475), txt(242), json(533), csv(42227), json(24548), zip(738232), json(22477), json(25491), zip(11463283), json(17741), csv(490660), json(19662), json(578), csv(19786), zip(14708207), zip(24619707), zip(2404439), json(713), json(27224), json(679), json(26426), txt(185), json(906), zip(18534723), json(23550), txt(266), txt(317), zip(6019723), json(33943), txt(436), csv(388378), json(469), zip(2106498), txt(320), csv(451336), txt(338), zip(19428163), json(14326), json(31652), txt(299), csv(96153), txt(220), csv(114789), json(15452), csv(5372708), json(908), csv(317928), csv(150923), json(465), csv(535584), json(26090), zip(8164831), json(19633), txt(316), json(23494), csv(133950), json(18638), csv(3944082), json(15345), json(473), zip(4411063), zip(10396095), zip(835096), txt(255), json(699), csv(654705), txt(294), csv(989865), zip(1028035), txt(322), zip(15085090), txt(237), txt(310), json(756), json(30628), json(19490), json(25908), txt(401), json(701), zip(5543909), json(29397), zip(14007470), json(30058), zip(58869042), csv(852937), json(35711), csv(298011), csv(187163), txt(258), zip(3526740), json(568), json(21552), zip(66466788), csv(215250), json(577), csv(103010), txt(306), zip(11840006)Available download formats
    Dataset updated
    Apr 24, 2023
    Dataset provided by
    DataverseNL
    Authors
    Jonathan De Bruin; Jonathan De Bruin; Yongchao Ma; Yongchao Ma; Gerbrich Ferdinands; Gerbrich Ferdinands; Jelle Teijema; Jelle Teijema; Rens Van de Schoot; Rens Van de Schoot
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    SYNERGY is a free and open dataset on study selection in systematic reviews, comprising 169,288 academic works from 26 systematic reviews. Only 2,834 (1.67%) of the academic works in the binary classified dataset are included in the systematic reviews. This makes the SYNERGY dataset a unique dataset for the development of information retrieval algorithms, especially for sparse labels. Due to the many available variables available per record (i.e. titles, abstracts, authors, references, topics), this dataset is useful for researchers in NLP, machine learning, network analysis, and more. In total, the dataset contains 82,668,134 trainable data points. The easiest way to get the SYNERGY dataset is via the synergy-dataset Python package. See https://github.com/asreview/synergy-dataset for all information.

  11. h

    yelp-open-dataset-top-users

    • huggingface.co
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yash Raizada, yelp-open-dataset-top-users [Dataset]. https://huggingface.co/datasets/yashraizad/yelp-open-dataset-top-users
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Yash Raizada
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    yashraizad/yelp-open-dataset-top-users dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. w

    Dataset of opening price of stocks over time for YELP

    • workwithdata.com
    Updated May 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Work With Data (2025). Dataset of opening price of stocks over time for YELP [Dataset]. https://www.workwithdata.com/datasets/stocks-daily?col=date%2Copening_price%2Cstock&f=1&fcol0=stock&fop0=%3D&fval0=YELP
    Explore at:
    Dataset updated
    May 6, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about stocks per day. It has 3,313 rows and is filtered where the stock is YELP. It features 3 columns: stock, and opening price.

  13. u

    Goodreads Book Reviews

    • cseweb.ucsd.edu
    json
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UCSD CSE Research Project, Goodreads Book Reviews [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    jsonAvailable download formats
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    These datasets contain reviews from the Goodreads book review website, and a variety of attributes describing the items. Critically, these datasets have multiple levels of user interaction, raging from adding to a shelf, rating, and reading.

    Metadata includes

    • reviews

    • add-to-shelf, read, review actions

    • book attributes: title, isbn

    • graph of similar books

    Basic Statistics:

    • Items: 1,561,465

    • Users: 808,749

    • Interactions: 225,394,930

  14. YELP Yelp Inc. Common Stock (Forecast)

    • kappasignal.com
    Updated Jun 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    KappaSignal (2023). YELP Yelp Inc. Common Stock (Forecast) [Dataset]. https://www.kappasignal.com/2023/06/yelp-yelp-inc-common-stock.html
    Explore at:
    Dataset updated
    Jun 3, 2023
    Dataset authored and provided by
    KappaSignal
    License

    https://www.kappasignal.com/p/legal-disclaimer.htmlhttps://www.kappasignal.com/p/legal-disclaimer.html

    Description

    This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

    YELP Yelp Inc. Common Stock

    Financial data:

    • Historical daily stock prices (open, high, low, close, volume)

    • Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

    • Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)

    Machine learning features:

    • Feature engineering based on financial data and technical indicators

    • Sentiment analysis data from social media and news articles

    • Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

    Potential Applications:

    • Stock price prediction

    • Portfolio optimization

    • Algorithmic trading

    • Market sentiment analysis

    • Risk management

    Use Cases:

    • Researchers investigating the effectiveness of machine learning in stock market prediction

    • Analysts developing quantitative trading Buy/Sell strategies

    • Individuals interested in building their own stock market prediction models

    • Students learning about machine learning and financial applications

    Additional Notes:

    • The dataset may include different levels of granularity (e.g., daily, hourly)

    • Data cleaning and preprocessing are essential before model training

    • Regular updates are recommended to maintain the accuracy and relevance of the data

  15. O

    Food inspection fails 2015-2016

    • data.montgomerycountymd.gov
    • data.wu.ac.at
    Updated Mar 1, 2017
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2017). Food inspection fails 2015-2016 [Dataset]. https://data.montgomerycountymd.gov/w/vztm-qm9u/tdqt-sri3?cur=-2i5id2emQs&from=lIGex4_mSCs
    Explore at:
    xml, csv, application/rdfxml, application/geo+json, tsv, kml, application/rssxml, kmzAvailable download formats
    Dataset updated
    Mar 1, 2017
    Description

    Current food Inspection dataset published using LIVES data standard.

  16. Yelp (YELP) Stock Forecast: A Bite of Growth (Forecast)

    • kappasignal.com
    Updated Jul 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    KappaSignal (2024). Yelp (YELP) Stock Forecast: A Bite of Growth (Forecast) [Dataset]. https://www.kappasignal.com/2024/07/yelp-yelp-stock-forecast-bite-of-growth.html
    Explore at:
    Dataset updated
    Jul 3, 2024
    Dataset authored and provided by
    KappaSignal
    License

    https://www.kappasignal.com/p/legal-disclaimer.htmlhttps://www.kappasignal.com/p/legal-disclaimer.html

    Description

    This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

    Yelp (YELP) Stock Forecast: A Bite of Growth

    Financial data:

    • Historical daily stock prices (open, high, low, close, volume)

    • Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

    • Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)

    Machine learning features:

    • Feature engineering based on financial data and technical indicators

    • Sentiment analysis data from social media and news articles

    • Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

    Potential Applications:

    • Stock price prediction

    • Portfolio optimization

    • Algorithmic trading

    • Market sentiment analysis

    • Risk management

    Use Cases:

    • Researchers investigating the effectiveness of machine learning in stock market prediction

    • Analysts developing quantitative trading Buy/Sell strategies

    • Individuals interested in building their own stock market prediction models

    • Students learning about machine learning and financial applications

    Additional Notes:

    • The dataset may include different levels of granularity (e.g., daily, hourly)

    • Data cleaning and preprocessing are essential before model training

    • Regular updates are recommended to maintain the accuracy and relevance of the data

  17. H

    Food Inspection Violations in Boston, MA

    • dataverse.harvard.edu
    Updated Aug 27, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bidisha Das; Alina Ristea; Daniel T. O'Brien (2020). Food Inspection Violations in Boston, MA [Dataset]. http://doi.org/10.7910/DVN/6MUQKX
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 27, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Bidisha Das; Alina Ristea; Daniel T. O'Brien
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Massachusetts, Boston
    Description

    These datasets include Food Establishment Inspections processed from the city’s open data initiative (data.boston.gov), and the connection with data scraped from Yelp (namely, Yelp reviews) scraped by BARI. The data is within the city of Boston. The Food Inspections dataset is released by the Health Division of the Department of Inspectional Services of Boston which ensures that all food establishments in the City of Boston meet relevant sanitary codes and standards. The data scraped from Yelp pages includes information about restaurants in Boston that were reviewed on yelp.com. Thus, the data includes two files: Food.Inspections.Records.csv contains information about food inspections at record level (i.e. each record for each restaurant is included). Food.Inspections.Yelp.Restaurant.csv contains information about food inspections at the restaurant level plus information from Yelp reviews also at the restaurant level.

  18. SF Restaurant Scores - LIVES Standard

    • kaggle.com
    zip
    Updated Nov 30, 2019
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    City of San Francisco (2019). SF Restaurant Scores - LIVES Standard [Dataset]. https://www.kaggle.com/san-francisco/sf-restaurant-scores-lives-standard
    Explore at:
    zip(2734400 bytes)Available download formats
    Dataset updated
    Nov 30, 2019
    Dataset authored and provided by
    City of San Francisco
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Area covered
    San Francisco
    Description

    Content

    The Health Department has developed an inspection report and scoring system. After conducting an inspection of the facility, the Health Inspector calculates a score based on the violations observed. Violations can fall into:high risk category: records specific violations that directly relate to the transmission of food borne illnesses, the adulteration of food products and the contamination of food-contact surfaces.moderate risk category: records specific violations that are of a moderate risk to the public health and safety.low risk category: records violations that are low risk or have no immediate risk to the public health and safety.The score card that will be issued by the inspector is maintained at the food establishment and is available to the public in this dataset. San Francisco's LIVES restaurant inspection data leverages the LIVES Flattened Schema (https://goo.gl/c3nNvr), which is based on LIVES version 2.0, cited on Yelp's website (http://www.yelp.com/healthscores).

    Context

    This is a dataset hosted by the city of San Francisco. The organization has an open data platform found here and they update their information according the amount of data that is brought in. Explore San Francisco's Data using Kaggle and all of the data sources available through the San Francisco organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.

    Cover photo by Autumn Goodman on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  19. g

    Amazon review data 2018

    • nijianmo.github.io
    • cseweb.ucsd.edu
    • +1more
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://nijianmo.github.io/amazon/
    Explore at:
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    Context

    This Dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

    • More reviews:

      • The total number of reviews is 233.1 million (142.8 million in 2014).
    • New reviews:

      • Current data includes reviews in the range May 1996 - Oct 2018.
    • Metadata: - We have added transaction metadata for each review shown on the review page.

      • Added more detailed metadata of the product landing page.

    Acknowledgements

    If you publish articles based on this dataset, please cite the following paper:

    • Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fined-grained aspects. EMNLP, 2019.
  20. Geotagged Digital Traces

    • zenodo.org
    • ekoizpen-zientifikoa.ehu.eus
    • +1more
    application/gzip
    Updated May 19, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ricardo Munoz-Cancino; Ricardo Munoz-Cancino; Sebastián A. Ríos; Manuel Graña; Manuel Graña; Sebastián A. Ríos (2023). Geotagged Digital Traces [Dataset]. http://doi.org/10.5281/zenodo.7949307
    Explore at:
    application/gzipAvailable download formats
    Dataset updated
    May 19, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Ricardo Munoz-Cancino; Ricardo Munoz-Cancino; Sebastián A. Ríos; Manuel Graña; Manuel Graña; Sebastián A. Ríos
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset, divided into files by city, contains geotagged digital traces collected from different social media platforms, detailed below.

    • Tweets - Cheng et al. [1]

    • Gowalla [2]

    • Tweets - Lamsal [3]

    • YELP[4]

    • Tweets - Kejriwal et al. [5]

    • Geotagged Tweets [6]

    • UrbanActivity, [7]

    • Brightkite [8]

    • Weeplaces [8]

    • Flickr [9]

    • Foursquare [10]

    Each file is named according to the city to which the digital traces were associated and contains the columns:

    • Source: contains the name of the source platform
    • Event_date: contains the date associated with the digital trace
    • Lat: latitude of the digital trace
    • Lng: length of the digital trace

    The definition of city/town used is provided by Simplemaps [11], which considers a city/town any inhabited place as determined by U.S. government agencies. The location of cities and their respective centers were obtained from the World Cities Database provided by the same company.

    A specific group of these cities was utilized for the research presented in the article submitted to Sensors Journal:

    Muñoz-Cancino, R., Rios, S. A., & Graña, M. (2023). Clustering cities over features extracted from multiple virtual sensors measuring micro-level activity patterns allows to discriminate large-scale city characteristics. Sensors, Under Review.

    Comprehensive guidelines and the selection criteria can be found in the abovementioned article.

    References

    [1] Zhiyuan Cheng, James Caverlee, and Kyumin Lee. You are where you tweet: A content-based approach to geo-locating twitter users. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM '10, page 759{768, New York, NY, USA, 2010. Association for Computing Machinery.
    [2] Eunjoon Cho, Seth A. Myers, and Jure Leskovec. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, page 1082{1090, New York, NY, USA, 2011. Association for Computing Machinery.
    [3] Yunhe Feng and Wenjun Zhou. Is working from home the new norm? an observational study based on a large geo-tagged covid-19 twitter dataset, 2020.
    [4] Yelp Inc. Yelp Open Dataset, 2021. Retrieved from https://www.yelp.com/dataset. Accessed October 26, 2021.
    [5] Mayank Kejriwal and Sara Melotte. A Geo-Tagged COVID-19 Twitter Dataset for 10 North American Metropolitan Areas, January 2021.
    [6] Rabindra Lamsal. Design and analysis of a large-scale covid-19 tweets dataset. Applied Intelligence, 51(5):2790{2804, 2021.
    [7] Geraud Le Falher, Aristides Gionis, and Michael Mathioudakis. Where is the Soho of Rome? Measures and algorithms for finding similar neighborhoods in cities. In 9th AAAI Conference on Web and Social Media - ICWSM 2015, Oxford, United Kingdom, May 2015.
    [8] Yong Liu, WeiWei, Aixin Sun, and Chunyan Miao. Exploiting geographical neighborhood characteristics for location recommendation. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM '14, page 739{748, New York, NY,USA, 2014. Association for Computing Machinery.
    [9] Hatem Mousselly-Sergieh, Daniel Watzinger, Bastian Huber, Mario Doller, Elood Egyed-Zsigmond, and Harald Kosch. World-wide scale geotagged image dataset for automatic image annotation and reverse geotagging. In Proceedings of the 5th ACM Multimedia Systems Conference, MMSys '14, page 47{52, New York, NY, USA, 2014. Association for Computing Machinery.
    [10] Dingqi Yang, Daqing Zhang, Vincent W. Zheng, and Zhiyong Yu. Modeling user activity preference by leveraging user spatial temporal characteristics in lbsns. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(1):129{142, 2015.
    [11] Simple Maps. Basic World Cities Database, 2021. Retrieved from https://simplemaps.com/data/world-cities. Accessed September 3, 2021.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Yelp (2015). Yelp Open Dataset [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/5179
Organization logo

Yelp Open Dataset

Explore at:
468 scholarly articles cite this dataset (View in Google Scholar)
jsonAvailable download formats
Dataset updated
Dec 30, 2015
Dataset authored and provided by
Yelphttp://yelp.com/
License

https://s3-media0.fl.yelpcdn.com/assets/srv0/engineering_pages/bea5c1e92bf3/assets/vendor/yelp-dataset-agreement.pdfhttps://s3-media0.fl.yelpcdn.com/assets/srv0/engineering_pages/bea5c1e92bf3/assets/vendor/yelp-dataset-agreement.pdf

Description

Dataset containing millions of reviews on Yelp. In addition it contains business data including location data, attributes, and categories.

Search
Clear search
Close search
Google apps
Main menu