210 data points. Meaning of each Excel sheet: IN - input variable values for each data point (one row per data point); TARGET - target variable values for each data point (one row per data point); VARS - the units used for each input (independent) and output/target (dependent) variable; TARGET vs OUTPUT - the 210 expected (experimental) values alongside those obtained by the proposed ANN. Reference (to be added when the paper is published): https://www.researchgate.net/publication/329932699_Neural_Networks_-_Shear_Strength_-_Corrugated_Web_Girders
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of datasets and Python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.

1. Datasets

The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.

1.1 CSV format

The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" file contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name. The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure; see the section below). A minimal Python reading sketch follows the column table:
Label Data type Description
isogramy int The order of isogramy, e.g. "2" is a second order isogram
length int The length of the word in letters
word text The actual word/isogram in ASCII
source_pos text The Part of Speech tag from the original corpus
count int Token count (total number of occurrences)
vol_count int Volume count (number of different sources which contain the word)
count_per_million int Token count per million words
vol_count_as_percent int Volume count as percentage of the total number of volumes
is_palindrome bool Whether the word is a palindrome (1) or not (0)
is_tautonym bool Whether the word is a tautonym (1) or not (0)
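For quick programmatic access, here is a minimal sketch (not part of the distributed scripts) of reading one of the ".csv" files in Python, using the column order documented above; the filename matches the naming used in section 2.4 below:

import csv

# Column order as documented above; the files have no header row.
COLUMNS = ["isogramy", "length", "word", "source_pos", "count",
           "vol_count", "count_per_million", "vol_count_as_percent",
           "is_palindrome", "is_tautonym"]

with open("ngrams-isograms.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter="\t")  # columns are separated by a single tab stop
    for row in reader:
        record = dict(zip(COLUMNS, row))
        # Example: print second-order isograms that are also palindromes.
        if record["isogramy"] == "2" and record["is_palindrome"] == "1":
            print(record["word"], record["count"])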
The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:
Label Data type Description
!total_1grams int The total number of words in the corpus
!total_volumes int The total number of volumes (individual sources) in the corpus
!total_isograms int The total number of isograms found in the corpus (before compacting)
!total_palindromes int How many of the isograms found are palindromes
!total_tautonyms int How many of the isograms found are tautonyms
The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.

1.2 SQLite database format

The SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:
• Compacted versions of each dataset, where identical headwords are combined into a single entry.
• A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
• An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.
The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database (a minimal Python query sketch is also given at the end of this description).

2. Scripts

There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second using SQLite 3 from the command line, and the third in R/RStudio (R version 3).

2.1 Source data

The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files, for BNC the direct path to the *.gz file.

2.2 Data preparation

Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:
python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
python isograms.py --bnc --indir=INFILE --outfile=OUTFILE
Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.

2.3 Isogram extraction

After preparing the data as above, isograms can be extracted by running the following command on the reformatted and tidied files:
python isograms.py --batch --infile=INFILE --outfile=OUTFILE
Here INFILE should refer to the output from the previous data cleaning step. Please note that the script will actually write two output files: one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.

2.4 Creating a SQLite3 database

The output data from the above step can easily be collated into a SQLite3 database, which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:
1. Make sure the files with the Ngrams and BNC data are named "ngrams-isograms.csv" and "bnc-isograms.csv" respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
2. Copy the "create-database.sql" script into the same directory as the two data files.
3. On the command line, go to the directory where the files and the SQL script are.
4. Type: sqlite3 isograms.db
5. This will create a database called "isograms.db".
See section 1 for a basic description of the output data and how to work with the database.

2.5 Statistical processing

The repository includes an R script (R version 3) named "statistics.r" that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
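As a complement to the R script, here is a minimal sketch of querying the database from Python; note that the table name used below is an assumption for illustration, since the actual table names are defined in "create-database.sql":

import sqlite3

conn = sqlite3.connect("isograms.db")
# NOTE: "ngrams_isograms" is a hypothetical table name; check create-database.sql
# for the table names actually created before running this query.
query = """
    SELECT word, length, count
    FROM ngrams_isograms
    WHERE isogramy = 2 AND is_palindrome = 1
    ORDER BY count DESC
    LIMIT 10;
"""
for word, length, count in conn.execute(query):
    print(word, length, count)
conn.close()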
This dataset provides information about the number of properties, residents, and average property values for Limroth Row cross streets in Point Pleasant Beach, NJ.
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Census Bureau includes landmarks in the MTDB for locating special features and to help enumerators during field operations. Some of the more common landmark types include area landmarks such as airports, cemeteries, parks, mountain peaks/summits, schools, and churches and other religious institutions. The Census Bureau has added landmark features to MTDB on an as-needed basis and made no attempt to ensure that all instances of a particular feature were included. The presence or absence of a landmark such as a hospital or prison does not mean that the living quarters associated with that landmark were geocoded to that census tabulation block or excluded from the census enumeration.
These are Rights-of-Ways (ROW) on Idaho BLM land (and some other Federal agency land) as shown on Bureau of Land Management (BLM) Master Title Plats (MTP). Every GIS ROW feature has a "CASEFILE" value, also known as the serial number of the ROW. This corresponds to the LR2000 database, which is a national BLM database for federal lands information. This GIS ROW feature class can be joined or related to exported information from LR2000 using the "CASEFILE" (GIS) and "SERIAL_NR_FULL" (LR2000) fields. NOTE: the LR2000 information is only available to internal BLM users and is not available to the public as it contains sensitive information. This ROW data for any given area may not be complete due to new ROW activity or because of missed or coincident ROW features during the initial data creation. It is recommended that a thorough inventory of all ROWs in a specific project area be obtained (an LR2000 report can provide this) and the GIS ROW data be checked before using this data for projects needing utmost ROW accuracy. The ROW data that was digitized is what was present on the MTP at the time of the digitizing done for that township. The project was performed over several years. Therefore, the "early" townships digitized are more out of date regarding ROWs compared to the ones more recently digitized. Unfortunately, there is no attribute that indicates the digitizing sequence. Any updates to this ROW feature class should be sent to the BLM Idaho State Office GIS staff for incorporation into the statewide GIS ROW feature classes for improvement over time. For more information contact us at blm_id_stateoffice@blm.gov.
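As an illustration of this join, here is a minimal pandas sketch; the filenames and the assumption that both tables have been exported to CSV are hypothetical, since LR2000 exports are only available to internal BLM users:

import pandas as pd

# Hypothetical exports: the GIS ROW feature attribute table and an LR2000 report,
# both assumed to have been saved as CSV beforehand.
row_gis = pd.read_csv("idaho_blm_row_features.csv")  # contains the CASEFILE field
lr2000 = pd.read_csv("lr2000_report.csv")            # contains the SERIAL_NR_FULL field

# Relate each GIS ROW feature to its LR2000 case information via the serial number.
joined = row_gis.merge(lr2000, left_on="CASEFILE", right_on="SERIAL_NR_FULL", how="left")
print(joined.head())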
This dataset provides information about the number of properties, residents, and average property values for Rivers Point Row cross streets in Charleston, SC.
Quadrant provides insightful, accurate, and reliable mobile location data.
Our privacy-first mobile location data unveils hidden patterns and opportunities, provides actionable insights, and fuels data-driven decision-making at the world's biggest companies.
These companies rely on our privacy-first Mobile Location and Points-of-Interest Data to unveil hidden patterns and opportunities, provide actionable insights, and fuel data-driven decision-making. They build better AI models, uncover business insights, and enable location-based services using our robust and reliable real-world data.
We conduct stringent evaluations of data providers to ensure authenticity and quality. Our proprietary algorithms detect and cleanse corrupted and duplicated data points, allowing you to leverage our datasets rapidly with minimal processing or cleaning. During the ingestion process, our proprietary Data Filtering Algorithms remove events based on a number of qualitative factors, as well as latency and other integrity variables, to provide more efficient data delivery. The deduplication algorithm focuses on a combination of four important attributes: Device ID, Latitude, Longitude, and Timestamp. This algorithm scours our data and identifies rows that contain the same combination of these four attributes. Post-identification, it retains a single copy and eliminates duplicate values to ensure our customers only receive complete and unique datasets.
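A minimal sketch of this style of deduplication in pandas, assuming a delivery file with the four attributes above as columns (the filename and column names are illustrative, not Quadrant's actual schema):

import pandas as pd

events = pd.read_csv("location_events.csv")  # hypothetical delivery file

# Keep a single copy of each (Device ID, Latitude, Longitude, Timestamp) combination.
deduped = events.drop_duplicates(
    subset=["device_id", "latitude", "longitude", "timestamp"],
    keep="first",
)
print(f"Removed {len(events) - len(deduped)} duplicate rows")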
We actively identify overlapping values at the provider level to determine the value each offers. Our data science team has developed a sophisticated overlap analysis model that helps us maintain a high-quality data feed by qualifying providers based on unique data values rather than volumes alone – measures that provide significant benefit to our end-use partners.
Quadrant mobility data contains all standard attributes such as Device ID, Latitude, Longitude, Timestamp, Horizontal Accuracy, and IP Address, and non-standard attributes such as Geohash and H3. In addition, we have historical data available back through 2022.
Through our in-house data science team, we offer sophisticated technical documentation, location data algorithms, and queries that help data buyers get a head start on their analyses. Our goal is to provide you with data that is “fit for purpose”.
The data is a synthetic univariate time series.
This data set is designed for testing indexing schemes in time series databases. The data appears highly periodic, but never exactly repeats itself. This feature is designed to challenge the indexing tasks.
This data set is designed for testing indexing schemes in time series databases. It is a much larger dataset than has been used in any published study that we are currently aware of. It contains one million data points. The data have been split into 10 sections to facilitate testing (see below). We recommend building the index with 9 of the 100,000-datapoint sections and randomly extracting a query shape from the 10th section. (Some previously published work seems to have used queries that were also used to build the indexing structure, which will produce optimistic results.) The data are interesting because they have structure at different resolutions. Each of the 10 sections was generated by an independent invocation of a pseudo-periodic generating function (provided as an equation image in the original dataset listing), where rand(x) produces a random integer between zero and x. The data appear highly periodic, but never exactly repeat themselves; this feature is designed to challenge the indexing structure.
The data is stored in one ASCII file. There are 10 columns, 100,000 rows. All data points are in the range -0.5 to +0.5. Rows are separated by carriage returns, columns by spaces.
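A minimal sketch of loading the file and following the recommended build/query split with NumPy; the filename is illustrative:

import numpy as np

data = np.loadtxt("pseudo_periodic_synthetic.txt")  # hypothetical filename; space-separated values
assert data.shape == (100_000, 10)                  # 10 sections of 100,000 data points each

build_sections = data[:, :9]   # build the index from 9 of the 10 sections
held_out = data[:, 9]          # extract query shapes from the held-out 10th section

# Example: draw a random query subsequence of length 256 from the held-out section.
rng = np.random.default_rng(0)
start = rng.integers(0, len(held_out) - 256)
query = held_out[start:start + 256]
print(query.min(), query.max())  # all values lie in the range -0.5 to +0.5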
Acknowledgements, Copyright Information, and Availability: Freely available for research use.
https://spdx.org/licenses/CC0-1.0.html
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data for institutional repositories. The data here are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2017. For a description of the data collection, processing, and output methods, please see the "methods" section below.
Methods
RAMP Data Documentation – January 1, 2017 through August 18, 2018
Data Collection
RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).
Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
Following the data processing described below, an additional field, citableContent, is added to the page-level data on ingest into RAMP.
Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.
More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en
Data Processing
Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
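A minimal sketch of this kind of flagging, assuming a simple extension match on the URL path (the extension list is illustrative, not RAMP's actual rule set):

from urllib.parse import urlparse

# Illustrative set of non-HTML content-file extensions.
CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip"}

def citable_content(url: str) -> str:
    """Return "Yes" if the URL appears to point to a content file, otherwise "No"."""
    path = urlparse(url).path.lower()
    return "Yes" if any(path.endswith(ext) for ext in CONTENT_EXTENSIONS) else "No"

print(citable_content("https://example.edu/handle/123/456/thesis.pdf"))  # Yes
print(citable_content("https://example.edu/handle/123/456"))             # No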
Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.
About Citable Content Downloads
Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.
CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).
For any specified date range, the steps to calculate CCD are (a minimal sketch follows these steps):
Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.
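A minimal pandas sketch of this calculation, using the field names documented in this description and the published monthly CSV naming format described under "Output to CSV" below:

import pandas as pd

ramp = pd.read_csv("2017-01_RAMP_all.csv", parse_dates=["date"])

# Optionally restrict to a date range of interest.
in_range = (ramp["date"] >= "2017-01-01") & (ramp["date"] <= "2017-01-31")

# CCD = sum of clicks on rows flagged as citable content.
ccd = ramp.loc[in_range & (ramp["citableContent"] == "Yes"), "clicks"].sum()
print("Citable content downloads:", ccd)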
Output to CSV
Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.
The data in these CSV files include the following fields:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
Filenames for files containing these data follow the format 2017-01_RAMP_all.csv. Using this example, the file 2017-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2017.
References
Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.
This is the Extended Golf Play Dataset, a rich and detailed collection designed to expand upon the classic golf dataset [1]. It incorporates a wide array of features suitable for various data science applications and is especially valuable for teaching purposes [1]. The dataset is organised in a long format, where each row represents a single observation and often includes textual data, such as player reviews or comments [2]. It contains a special set of mini datasets, each tailored to a specific teaching point, for example, demonstrating data cleaning or combining datasets [1]. These are ideal for beginners to practise with real examples and are complemented by notebooks with step-by-step guides [1].
The dataset features a variety of columns, including core, extra, and text-based attributes:
* ID: A unique identifying number for each player [1].
* Date: The specific day the data was recorded or the golf session took place [1, 2].
* Weekday: The day of the week, represented numerically (e.g., 0 for Sunday, 1 for Monday) [1, 3].
* Holiday: Indicates whether the day was a special holiday (Yes/No), specifically noted for holidays in Japan (1 for yes, 0 for no) [1, 3].
* Month: The month in which golf was played [3].
* Season: The time of year, such as spring, summer, autumn, or winter [1, 3].
* Outlook: Describes the weather conditions during the session (e.g., sunny, cloudy, rainy, snowy) [1, 3].
* Temperature: The ambient temperature during the golf session, recorded in Celsius [1, 3].
* Humidity: The percentage of moisture in the air [1, 3].
* Windy: A boolean indicator (True/False or 1 for yes, 0 for no) if it was windy [1, 3].
* Crowded-ness: A measure of how busy the golf course was, ranging from 0 to 1 [1, 4].
* PlayTime-Hour: The duration for which people played golf, in hours [1].
* Play: Indicates whether golf was played or not (Yes/No) [1].
* Review: Textual feedback from players about their day at golf [1].
* EmailCampaign: Text content of emails sent daily by the golf place [1].
* MaintenanceTasks: Descriptions of work carried out to maintain the golf course [1].
This dataset is organised in a long format, meaning each row represents a single observation [2]. Data files are typically in CSV format, with sample files updated separately to the platform [5]. Specific numbers for rows or records are not currently available within the provided sources. The dataset also includes a special collection of mini datasets within its structure [1].
This dataset is highly versatile and ideal for learning and applying various data science skills:
* Data Visualisation: Learn to create graphs and identify patterns within the data [1].
* Predictive Modelling: Discover which data points are useful for predicting if golf will be played [1].
* Data Cleaning: Practise spotting and managing data that appears incorrect or inconsistent [1].
* Time Series Analysis: Understand how various factors change over time, such as daily or monthly trends [1, 2].
* Data Grouping: Learn to combine similar days or observations together [1].
* Text Analysis: Extract insights from textual features like player reviews, potentially for sentiment analysis or thematic extraction [1, 2].
* Recommendation Systems: Develop models to suggest optimal times to play golf based on historical data [1].
* Data Management: Gain experience in managing and analysing data structured in a long format, which is common for repeated measures [2].
The dataset's regional coverage is global [6]. While the Date column records the day the data was captured or the session occurred, no specific time range for the collected data is stated beyond the listing date of 11/06/2025 [1, 6]. Demographic scope includes unique player IDs [1], but no specific demographic details or data availability notes for particular groups or years are provided.
CC-BY
This dataset is designed for a broad audience:
* New Learners: It is easy to understand and comes with guides to aid the learning process [1].
* Teachers: An excellent resource for conducting classes on data visualisation and interpretation [1].
* Researchers: Suitable for testing novel data analysis methodologies [1].
* Students: Can acquire a wide range of skills, from making graphs to understanding textual data and building recommendation systems [1].
Original Data Source: ⛳️ Golf Play Dataset Extended
PLAD is a dataset where sparse depth is provided by line-based visual SLAM to verify StructMDC.
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Census Bureau includes landmarks in the MTDB for locating special features and to help enumerators during field operations. Some of the more common landmark types include area landmarks such as airports, cemeteries, parks, mountain peaks/summits, schools, and churches and other religious institutions. The Census Bureau has added landmark features to MTDB on an as-needed basis and made no attempt to ensure that all instances of a particular feature were included. The presence or absence of a landmark such as a hospital or prison does not mean that the living quarters associated with that landmark were geocoded to that census tabulation block or excluded from the census enumeration.
This dataset provides information about the number of properties, residents, and average property values for Zilai Row cross streets in Point Pleasant Beach, NJ.
Right-of-way permits associated with a specific street address. A permit is expected to have only one GIS feature (line or point), but the data may have some anomalies. A permit is required to perform any construction work within the public right-of-way or any construction work outside of the public right-of-way that will cut, break, or otherwise damage the public right-of-way. The authoritative source for permit information is the Right-of-Way Management System (https://rowmanagement.dallascityhall.com/Login.aspx).
Licensees and Registrants Fleet Information dataset contains information about active vehicles of approved BIC licensees and registrants. This data is partially collected from the application submitted to the commission by licensees and registrants. The majority of data points are collected from licensees and registrants via the Vehicle Management Portal. Business Integrity Commission maintains this dataset, updating it quarterly. Each row of data contains information about an active vehicle registered at BIC, including BIC plate number, vehicle year, vehicle model, vehicle make, engine year and so on.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes data for NB-IoT and 5G networks as collected in two cities: Oslo, Norway (NB-IoT only) and Rome, Italy (both NB-IoT and 5G).
Data were collected using the Rohde & Schwarz TSMA6 mobile network scanner. 7 measurement campaigns are provided for Oslo, and 6 for Rome. Additional data collected in Rome are provided in the following large-scale dataset, focusing on the two major mobile network operators: https://ieee-dataport.org/documents/large-scale-dataset-4g-nb-iot-and-5g-non-standalone-network-measurements
The dataset includes a metadata file providing the following information for each campaign:
date of collection;
start time and end time of collection;
length;
type (walking/driving).
Two additional metadata files are provided: two .kml files, one for each city, allowing the import of coordinates of data points organized by campaign in a GIS engine, such as Google Earth, for interactive visualization.
The dataset contains the following data for NB-IoT:
Raw data for each campaign, stored in two .csv files (a minimal loading sketch in Python is given after this list). For a generic campaign C, the files are:
NB-IoT_coverage_C.csv including a geo-tagged data entry in each row. Each entry provides information on a Narrowband Physical Cell Identifier (NPCI), with data related to the time stamp the NPCI was detected, GPS information, network (NPCI, Operator, Country Code, eNodeB-ID) and RF signal (RSSI, SINR, RSRP and RSRQ values);
NB-IoT_RefSig_cir_C.csv, also including a geo-tagged data entry in each row. Each entry provides information on a NPCI, with data related to the time stamp the NPCI was detected, GPS information, network (NPCI, Operator ID, Country Code, eNodeB-ID) and Channel Impulse Response (CIR) statistics, including the maximum delay.
Processed data, stored in a Matlab workspace (.mat) file for each city: data are grouped in data points, identified by pairs. Each data point provides RF and CIR maximum delay measurements for each unique combination detected at the coordinates of the data point.
Estimated positions of eNodeBs, stored in a csv file for each city;
A matlab script and a function to extract and generate processed data from the raw data for each city.
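As mentioned above, a minimal sketch of loading one raw NB-IoT coverage file with pandas follows; the exact column headers are not listed in this description, so the column names used below are assumptions to be checked against the file header:

import pandas as pd

# "C" stands for the campaign identifier used in the actual filenames.
coverage = pd.read_csv("NB-IoT_coverage_C.csv")

# Assumed column names: NPCI, RSRP, RSRQ, SINR (verify against the file header).
# Example: strongest measurement per detected NPCI, ranked by RSRP.
strongest = (
    coverage.sort_values("RSRP", ascending=False)
            .groupby("NPCI", as_index=False)
            .first()
)
print(strongest[["NPCI", "RSRP", "RSRQ", "SINR"]].head())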
The dataset contains the following data for 5G:
Raw data for each campaign, stored in two files. For a generic campaign C, the files are:
5G_coverage_C.xslx including a geo-tagged data entry in each row. Each entry provides information on a Physical Cell Identifier (PCI), with data related to the time stamp the PCI was detected, GPS information, network (PCI, Beamforming Index, Operator, Country Code) and RF data (SSB-RSSI, SSS-SINR, SSS-RSRP and SSS-RSRQ values, and similar information for the PBCH signal);
5G_RefSig_cir_C.csv, also including a geo-tagged data entry in each row. Each entry provides information on a PCI, with data related to the time stamp the PCI was detected, GPS information, network (PCI, Beamforming Index, Operator ID, Country Code) and Channel Impulse Response (CIR) statistics, including the maximum delay.
Processed data, stored in a Matlab workspace (.mat) file: data are grouped in data points, identified by pairs. Each data point provides RF and CIR maximum delay measurements for each unique combination detected at the coordinates of the data point.
A matlab script and a supporting function to extract and generate processed data from the raw data.
In addition, for the Rome data, further Matlab workspaces are provided, containing data interpolated in the feature dimensions according to two different approaches:
A campaign-by-campaign linear interpolation (both NB-IoT and 5G);
A bidimensional interpolation on all campaigns combined (NB-IoT only).
A function to interpolate missing data in the original data according to the first approach is also provided for each technology. The interpolation rationale and procedure for the first approach is detailed in:
L. De Nardis, G. Caso, Ö. Alay, U. Ali, M. Neri, A. Brunstrom and M.-G. Di Benedetto, "Positioning by Multicell Fingerprinting in Urban NB-IoT networks," Sensors, Volume 23, Issue 9, Article ID 4266, April 2023. DOI: 10.3390/s23094266.
The second interpolation approach is instead introduced and described in:
L. De Nardis, M. Savelli, G. Caso, F. Ferretti, L. Tonelli, N. Bouzar, A. Brunstrom, O. Alay, M. Neri, F. Elbahhar and M.-G. Di Benedetto, " Range-free Positioning in NB-IoT Networks by Machine Learning: beyond WkNN", under major revision in IEEE Journal of Indoor and Seamless Positioning and Navigation.
Positioning using the 5G data was further investigated in:
K. Kousias, M. Rajiullah, G. Caso, U. Ali, Ö. Alay, A. Brunstrom, L. De Nardis, M. Neri, and M.-G. Di Benedetto, "A Large-Scale Dataset of 4G, NB-IoT, and 5G Non-Standalone Network Measurements," IEEE Communications Magazine, Volume 62, Issue 5, pp. 44-49, May 2024. DOI: 10.1109/MCOM.011.2200707.
G. Caso, M. Rajiullah, K. Kousias, U. Ali, N. Bouzar, L. De Nardis, A. Brunstrom, Ö. Alay, M. Neri and M.-G. Di Benedetto,"The Chronicles of 5G Non-Standalone: An Empirical Analysis of Performance and Service Evolution", IEEE Open Journal of the Communications Society, Volume 5, pp. 7380 - 7399, 2024. DOI: 10.1109/OJCOMS.2024.3499370.
Please refer to the above publications when using and citing the dataset.
TIGER road data for the MSA. When compared to high-resolution imagery and other transportation datasets, positional inaccuracies were observed; as a result, caution should be taken when using this dataset. TIGER, TIGER/Line, and Census TIGER are registered trademarks of the U.S. Census Bureau. ZCTA is a trademark of the U.S. Census Bureau. The Census 2000 TIGER/Line files are an extract of selected geographic and cartographic information from the Census TIGER data base. The geographic coverage for a single TIGER/Line file is a county or statistical equivalent entity, with the coverage area based on January 1, 2000 legal boundaries. A complete set of Census 2000 TIGER/Line files includes all counties and statistically equivalent entities in the United States, Puerto Rico, and the Island Areas. The Census TIGER data base represents a seamless national file with no overlaps or gaps between parts. However, each county-based TIGER/Line file is designed to stand alone as an independent data set, or the files can be combined to cover the whole Nation. The Census 2000 TIGER/Line files consist of line segments representing physical features and governmental and statistical boundaries. The boundary information in the TIGER/Line files is for statistical data collection and tabulation purposes only; their depiction and designation for statistical purposes does not constitute a determination of jurisdictional authority or rights of ownership or entitlement. The Census 2000 TIGER/Line files do NOT contain the Census 2000 urban areas, which have not yet been delineated. The files contain information distributed over a series of record types for the spatial objects of a county. There are 17 record types, including the basic data record, the shape coordinate points, and geographic codes that can be used with appropriate software to prepare maps. Other geographic information contained in the files includes attributes such as feature identifiers/census feature class codes (CFCC) used to differentiate feature types, address ranges and ZIP Codes, codes for legal and statistical entities, latitude/longitude coordinates of linear and point features, landmark point features, area landmarks, key geographic features, and area boundaries. The Census 2000 TIGER/Line data dictionary contains a complete list of all the fields in the 17 record types. This is part of a collection of 221 Baltimore Ecosystem Study metadata records that point to a geodatabase. The geodatabase is available online and is considerably large; upon request, and under certain arrangements, it can be shipped on media such as a USB hard drive. The geodatabase is roughly 51.4 GB in size, consisting of 4,914 files in 160 folders. Although this metadata record and the others like it are not rich with attributes, it is nonetheless made available because the data that it represents could indeed be useful.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises 4 fire experiments (repeated 3 times) and 3 nuisance experiments (Ethanol: repeated 3 times, Deodorant: repeated 2 times, Hairspray: repeated 1 time), with various background sequences interspersed between the conducted experiments. All experiments were carried out in random order to reduce the influence of prehistory. It consists of a total of 305,304 rows and 16 columns, structured as a continuous multivariate time series. Each row represents the sensor measurements (CO2, CO, H2, humidity, particulate matter of different sizes, air temperature, and UV) from a unique sensor node position in the EN54 test room at a specific timestamp. The columns correspond to the sensor measurements and include additional labels: a scenario-specific label ("scenario_label"), a binary label ("anomaly_label") distinguishing between "Normal" (background) and "Anomaly" (fire or nuisance scenario), a ternary label ("ternary_label") categorizing the data as "Nuisance," "Fire," or "Background," and a progress label ("progress_label") that allows for dividing the event sequences into sub-sequences based on ongoing physical sub-processes. The dataset comprises 82.98% background data points and 17.02% anomaly data points, which can be further divided into 12.50% fire anomaly data points and 4.52% nuisance anomaly data points. The "Sensor_ID" column can be utilized to access data from different sensor node positions.
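A minimal pandas sketch of slicing the data by sensor node and label, using the column names documented above (the filename is illustrative):

import pandas as pd

df = pd.read_csv("en54_sensor_timeseries.csv")  # hypothetical filename

# Select fire-related measurements from a single sensor node position.
node_id = df["Sensor_ID"].iloc[0]
node = df[df["Sensor_ID"] == node_id]
fires = node[node["ternary_label"] == "Fire"]
print(fires["scenario_label"].value_counts())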
Quadrant provides insightful, accurate, and reliable mobile location data.
Our privacy-first mobile location data unveils hidden patterns and opportunities, provides actionable insights, and fuels data-driven decision-making at the world's biggest companies.
These companies rely on our privacy-first Mobile Location and Points-of-Interest Data to unveil hidden patterns and opportunities, provide actionable insights, and fuel data-driven decision-making. They build better AI models, uncover business insights, and enable location-based services using our robust and reliable real-world data.
We conduct stringent evaluations of data providers to ensure authenticity and quality. Our proprietary algorithms detect and cleanse corrupted and duplicated data points, allowing you to leverage our datasets rapidly with minimal processing or cleaning. During the ingestion process, our proprietary Data Filtering Algorithms remove events based on a number of qualitative factors, as well as latency and other integrity variables, to provide more efficient data delivery. The deduplication algorithm focuses on a combination of four important attributes: Device ID, Latitude, Longitude, and Timestamp. This algorithm scours our data and identifies rows that contain the same combination of these four attributes. Post-identification, it retains a single copy and eliminates duplicate values to ensure our customers only receive complete and unique datasets.
We actively identify overlapping values at the provider level to determine the value each offers. Our data science team has developed a sophisticated overlap analysis model that helps us maintain a high-quality data feed by qualifying providers based on unique data values rather than volumes alone – measures that provide significant benefit to our end-use partners.
Quadrant mobility data contains all standard attributes such as Device ID, Latitude, Longitude, Timestamp, Horizontal Accuracy, and IP Address, and non-standard attributes such as Geohash and H3. In addition, we have historical data available back through 2022.
Through our in-house data science team, we offer sophisticated technical documentation, location data algorithms, and queries that help data buyers get a head start on their analyses. Our goal is to provide you with data that is “fit for purpose”.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The goal of this sampling effort is to describe the vegetation response to treatments. Data were collected following the line-point intercept method (Herrick et al. 2009). Although the original LPI data set was in multivariate form with separate columns for canopy layers and soil surface, this data set has been transposed into vertical form, implementing a "layer" variable, so that all species and soil surface codes appear in one column. Within each exclosure, 4837 points were sampled, with the following exceptions:
year exclosure total_points_sampled
1996 5 4825
1996 7 4836
1996 9 4836
1996 10 4836
1997 1 4830
1997 2 4830
1997 3 4830
1997 4 4830
1997 5 4830
1997 6 4830
1997 7 4830
1997 8 4830
1997 9 4830
1997 10 4830
1997 11 4830
1997 12 4830
1997 13 4830
1997 14 4830
1997 15 4830
1997 16 4830
1997 17 4830
1997 18 4830
2002 12 4835