210 data points. Meaning of each Excel sheet: IN - input variable values for each data point (one row per data point); TARGET - target variable values for each data point (one row per data point); VARS - the units used for each input (independent) and output/target (dependent) variable; TARGET vs OUTPUT - the 210 expected (experimental) values alongside those obtained by the proposed ANN. Reference (to be added when the paper is published): https://www.researchgate.net/publication/329932699_Neural_Networks_-_Shear_Strength_-_Corrugated_Web_Girders
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of datasets and Python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.

1. Datasets

The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.

1.1 CSV format

The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" file contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name. The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure; see the section below). A minimal Python reading sketch follows the column table:
Label Data type Description
isogramy int The order of isogramy, e.g. "2" is a second order isogram
length int The length of the word in letters
word text The actual word/isogram in ASCII
source_pos text The Part of Speech tag from the original corpus
count int Token count (total number of occurrences)
vol_count int Volume count (number of different sources which contain the word)
count_per_million int Token count per million words
vol_count_as_percent int Volume count as percentage of the total number of volumes
is_palindrome bool Whether the word is a palindrome (1) or not (0)
is_tautonym bool Whether the word is a tautonym (1) or not (0)
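For quick programmatic access, here is a minimal sketch (not part of the distributed scripts) of reading one of the ".csv" files in Python, using the column order documented above; the filename matches the naming used in section 2.4 below:

import csv

# Column order as documented above; the files have no header row.
COLUMNS = ["isogramy", "length", "word", "source_pos", "count",
           "vol_count", "count_per_million", "vol_count_as_percent",
           "is_palindrome", "is_tautonym"]

with open("ngrams-isograms.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter="\t")  # columns are separated by a single tab stop
    for row in reader:
        record = dict(zip(COLUMNS, row))
        # Example: print second-order isograms that are also palindromes.
        if record["isogramy"] == "2" and record["is_palindrome"] == "1":
            print(record["word"], record["count"])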
The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:
Label Data type Description
!total_1grams int The total number of words in the corpus
!total_volumes int The total number of volumes (individual sources) in the corpus
!total_isograms int The total number of isograms found in the corpus (before compacting)
!total_palindromes int How many of the isograms found are palindromes
!total_tautonyms int How many of the isograms found are tautonyms
The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.

1.2 SQLite database format

The SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:
• Compacted versions of each dataset, where identical headwords are combined into a single entry.
• A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
• An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.
The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database (a minimal Python query sketch is also given at the end of this description).

2. Scripts

There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second using SQLite 3 from the command line, and the third in R/RStudio (R version 3).

2.1 Source data

The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files, for BNC the direct path to the *.gz file.

2.2 Data preparation

Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:
python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
python isograms.py --bnc --indir=INFILE --outfile=OUTFILE
Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.

2.3 Isogram extraction

After preparing the data as above, isograms can be extracted by running the following command on the reformatted and tidied files:
python isograms.py --batch --infile=INFILE --outfile=OUTFILE
Here INFILE should refer to the output from the previous data cleaning step. Please note that the script will actually write two output files: one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.

2.4 Creating a SQLite3 database

The output data from the above step can easily be collated into a SQLite3 database, which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:
1. Make sure the files with the Ngrams and BNC data are named "ngrams-isograms.csv" and "bnc-isograms.csv" respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
2. Copy the "create-database.sql" script into the same directory as the two data files.
3. On the command line, go to the directory where the files and the SQL script are.
4. Type: sqlite3 isograms.db
5. This will create a database called "isograms.db".
See section 1 for a basic description of the output data and how to work with the database.

2.5 Statistical processing

The repository includes an R script (R version 3) named "statistics.r" that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
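As a complement to the R script, here is a minimal sketch of querying the database from Python; note that the table name used below is an assumption for illustration, since the actual table names are defined in "create-database.sql":

import sqlite3

conn = sqlite3.connect("isograms.db")
# NOTE: "ngrams_isograms" is a hypothetical table name; check create-database.sql
# for the table names actually created before running this query.
query = """
    SELECT word, length, count
    FROM ngrams_isograms
    WHERE isogramy = 2 AND is_palindrome = 1
    ORDER BY count DESC
    LIMIT 10;
"""
for word, length, count in conn.execute(query):
    print(word, length, count)
conn.close()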
This dataset provides information about the number of properties, residents, and average property values for Limroth Row cross streets in Point Pleasant Beach, NJ.
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Census Bureau includes landmarks in the MTDB for locating special features and to help enumerators during field operations. Some of the more common landmark types include area landmarks such as airports, cemeteries, parks, mountain peaks/summits, schools, and churches and other religious institutions. The Census Bureau has added landmark features to MTDB on an as-needed basis and made no attempt to ensure that all instances of a particular feature were included. The presence or absence of a landmark such as a hospital or prison does not mean that the living quarters associated with that landmark were geocoded to that census tabulation block or excluded from the census enumeration.
These are Rights-of-Ways (ROW) on Idaho BLM land (and some other Federal agency land) as shown on Bureau of Land Management (BLM) Master Title Plats (MTP). Every GIS ROW feature has a "CASEFILE" value, also known as the serial number of the ROW. This corresponds to the LR2000 database, which is a national BLM database for federal lands information. This GIS ROW feature class can be joined or related to exported information from LR2000 using the "CASEFILE" (GIS) and "SERIAL_NR_FULL" (LR2000) fields. NOTE: the LR2000 information is only available to internal BLM users and is not available to the public as it contains sensitive information. This ROW data for any given area may not be complete due to new ROW activity or because of missed or coincident ROW features during the initial data creation. It is recommended that a thorough inventory of all ROWs in a specific project area be obtained (an LR2000 report can provide this) and the GIS ROW data be checked before using this data for projects needing utmost ROW accuracy. The ROW data that was digitized is what was present on the MTP at the time of the digitizing done for that township. The project was performed over several years. Therefore, the "early" townships digitized are more out of date regarding ROWs compared to the ones more recently digitized. Unfortunately, there is no attribute that indicates the digitizing sequence. Any updates to this ROW feature class should be sent to the BLM Idaho State Office GIS staff for incorporation into the statewide GIS ROW feature classes for improvement over time. For more information contact us at blm_id_stateoffice@blm.gov.
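As an illustration of this join, here is a minimal pandas sketch; the filenames and the assumption that both tables have been exported to CSV are hypothetical, since LR2000 exports are only available to internal BLM users:

import pandas as pd

# Hypothetical exports: the GIS ROW feature attribute table and an LR2000 report,
# both assumed to have been saved as CSV beforehand.
row_gis = pd.read_csv("idaho_blm_row_features.csv")  # contains the CASEFILE field
lr2000 = pd.read_csv("lr2000_report.csv")            # contains the SERIAL_NR_FULL field

# Relate each GIS ROW feature to its LR2000 case information via the serial number.
joined = row_gis.merge(lr2000, left_on="CASEFILE", right_on="SERIAL_NR_FULL", how="left")
print(joined.head())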
This dataset provides information about the number of properties, residents, and average property values for Rivers Point Row cross streets in Charleston, SC.
Quadrant provides insightful, accurate, and reliable mobile location data.
Our privacy-first mobile location data unveils hidden patterns and opportunities, provides actionable insights, and fuels data-driven decision-making at the world's biggest companies.
These companies rely on our privacy-first Mobile Location and Points-of-Interest Data to unveil hidden patterns and opportunities, provide actionable insights, and fuel data-driven decision-making. They build better AI models, uncover business insights, and enable location-based services using our robust and reliable real-world data.
We conduct stringent evaluations of data providers to ensure authenticity and quality. Our proprietary algorithms detect and cleanse corrupted and duplicated data points, allowing you to leverage our datasets rapidly with minimal processing or cleaning. During the ingestion process, our proprietary Data Filtering Algorithms remove events based on a number of qualitative factors, as well as latency and other integrity variables, to provide more efficient data delivery. The deduplication algorithm focuses on a combination of four important attributes: Device ID, Latitude, Longitude, and Timestamp. This algorithm scours our data and identifies rows that contain the same combination of these four attributes. Post-identification, it retains a single copy and eliminates duplicate values to ensure our customers only receive complete and unique datasets.
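A minimal sketch of this style of deduplication in pandas, assuming a delivery file with the four attributes above as columns (the filename and column names are illustrative, not Quadrant's actual schema):

import pandas as pd

events = pd.read_csv("location_events.csv")  # hypothetical delivery file

# Keep a single copy of each (Device ID, Latitude, Longitude, Timestamp) combination.
deduped = events.drop_duplicates(
    subset=["device_id", "latitude", "longitude", "timestamp"],
    keep="first",
)
print(f"Removed {len(events) - len(deduped)} duplicate rows")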
We actively identify overlapping values at the provider level to determine the value each offers. Our data science team has developed a sophisticated overlap analysis model that helps us maintain a high-quality data feed by qualifying providers based on unique data values rather than volumes alone – measures that provide significant benefit to our end-use partners.
Quadrant mobility data contains all standard attributes such as Device ID, Latitude, Longitude, Timestamp, Horizontal Accuracy, and IP Address, and non-standard attributes such as Geohash and H3. In addition, we have historical data available back through 2022.
Through our in-house data science team, we offer sophisticated technical documentation, location data algorithms, and queries that help data buyers get a head start on their analyses. Our goal is to provide you with data that is “fit for purpose”.
The data is a synthetic univariate time series.
This data set is designed for testing indexing schemes in time series databases. The data appears highly periodic, but never exactly repeats itself. This feature is designed to challenge the indexing tasks.
This data set is designed for testing indexing schemes in time series databases. It is a much larger dataset than has been used in any published study that we are currently aware of. It contains one million data points. The data have been split into 10 sections to facilitate testing (see below). We recommend building the index with 9 of the 100,000-datapoint sections and randomly extracting a query shape from the 10th section. (Some previously published work seems to have used queries that were also used to build the indexing structure, which will produce optimistic results.) The data are interesting because they have structure at different resolutions. Each of the 10 sections was generated by an independent invocation of a pseudo-periodic generating function (provided as an equation image in the original dataset listing), where rand(x) produces a random integer between zero and x. The data appear highly periodic, but never exactly repeat themselves; this feature is designed to challenge the indexing structure.
The data is stored in one ASCII file. There are 10 columns, 100,000 rows. All data points are in the range -0.5 to +0.5. Rows are separated by carriage returns, columns by spaces.
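A minimal sketch of loading the file and following the recommended build/query split with NumPy; the filename is illustrative:

import numpy as np

data = np.loadtxt("pseudo_periodic_synthetic.txt")  # hypothetical filename; space-separated values
assert data.shape == (100_000, 10)                  # 10 sections of 100,000 data points each

build_sections = data[:, :9]   # build the index from 9 of the 10 sections
held_out = data[:, 9]          # extract query shapes from the held-out 10th section

# Example: draw a random query subsequence of length 256 from the held-out section.
rng = np.random.default_rng(0)
start = rng.integers(0, len(held_out) - 256)
query = held_out[start:start + 256]
print(query.min(), query.max())  # all values lie in the range -0.5 to +0.5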
Acknowledgements, Copyright Information, and Availability: Freely available for research use.
https://spdx.org/licenses/CC0-1.0.html
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data for institutional repositories. The data here are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2017. For a description of the data collection, processing, and output methods, please see the "methods" section below.
Methods
RAMP Data Documentation – January 1, 2017 through August 18, 2018
Data Collection
RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).
Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
Following the data processing described below, an additional field, citableContent, is added to the page-level data on ingest into RAMP.
Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.
More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en
Data Processing
Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
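A minimal sketch of this kind of flagging, assuming a simple extension match on the URL path (the extension list is illustrative, not RAMP's actual rule set):

from urllib.parse import urlparse

# Illustrative set of non-HTML content-file extensions.
CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip"}

def citable_content(url: str) -> str:
    """Return "Yes" if the URL appears to point to a content file, otherwise "No"."""
    path = urlparse(url).path.lower()
    return "Yes" if any(path.endswith(ext) for ext in CONTENT_EXTENSIONS) else "No"

print(citable_content("https://example.edu/handle/123/456/thesis.pdf"))  # Yes
print(citable_content("https://example.edu/handle/123/456"))             # No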
Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.
About Citable Content Downloads
Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.
CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).
For any specified date range, the steps to calculate CCD are (a minimal sketch follows these steps):
Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.
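A minimal pandas sketch of this calculation, using the field names documented in this description and the published monthly CSV naming format described under "Output to CSV" below:

import pandas as pd

ramp = pd.read_csv("2017-01_RAMP_all.csv", parse_dates=["date"])

# Optionally restrict to a date range of interest.
in_range = (ramp["date"] >= "2017-01-01") & (ramp["date"] <= "2017-01-31")

# CCD = sum of clicks on rows flagged as citable content.
ccd = ramp.loc[in_range & (ramp["citableContent"] == "Yes"), "clicks"].sum()
print("Citable content downloads:", ccd)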
Output to CSV
Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.
The data in these CSV files include the following fields:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
Filenames for files containing these data follow the format 2017-01_RAMP_all.csv. Using this example, the file 2017-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2017.
References
Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.
This is the Extended Golf Play Dataset, a rich and detailed collection designed to expand upon the classic golf dataset [1]. It incorporates a wide array of features suitable for various data science applications and is especially valuable for teaching purposes [1]. The dataset is organised in a long format, where each row represents a single observation and often includes textual data, such as player reviews or comments [2]. It contains a special set of mini datasets, each tailored to a specific teaching point, for example, demonstrating data cleaning or combining datasets [1]. These are ideal for beginners to practise with real examples and are complemented by notebooks with step-by-step guides [1].
The dataset features a variety of columns, including core, extra, and text-based attributes:
* ID: A unique identifying number for each player [1].
* Date: The specific day the data was recorded or the golf session took place [1, 2].
* Weekday: The day of the week, represented numerically (e.g., 0 for Sunday, 1 for Monday) [1, 3].
* Holiday: Indicates whether the day was a special holiday (Yes/No), specifically noted for holidays in Japan (1 for yes, 0 for no) [1, 3].
* Month: The month in which golf was played [3].
* Season: The time of year, such as spring, summer, autumn, or winter [1, 3].
* Outlook: Describes the weather conditions during the session (e.g., sunny, cloudy, rainy, snowy) [1, 3].
* Temperature: The ambient temperature during the golf session, recorded in Celsius [1, 3].
* Humidity: The percentage of moisture in the air [1, 3].
* Windy: A boolean indicator (True/False or 1 for yes, 0 for no) if it was windy [1, 3].
* Crowded-ness: A measure of how busy the golf course was, ranging from 0 to 1 [1, 4].
* PlayTime-Hour: The duration for which people played golf, in hours [1].
* Play: Indicates whether golf was played or not (Yes/No) [1].
* Review: Textual feedback from players about their day at golf [1].
* EmailCampaign: Text content of emails sent daily by the golf place [1].
* MaintenanceTasks: Descriptions of work carried out to maintain the golf course [1].
This dataset is organised in a long format, meaning each row represents a single observation [2]. Data files are typically in CSV format, with sample files updated separately to the platform [5]. Specific numbers for rows or records are not currently available within the provided sources. The dataset also includes a special collection of mini datasets within its structure [1].
This dataset is highly versatile and ideal for learning and applying various data science skills:
* Data Visualisation: Learn to create graphs and identify patterns within the data [1].
* Predictive Modelling: Discover which data points are useful for predicting if golf will be played [1].
* Data Cleaning: Practise spotting and managing data that appears incorrect or inconsistent [1].
* Time Series Analysis: Understand how various factors change over time, such as daily or monthly trends [1, 2].
* Data Grouping: Learn to combine similar days or observations together [1].
* Text Analysis: Extract insights from textual features like player reviews, potentially for sentiment analysis or thematic extraction [1, 2].
* Recommendation Systems: Develop models to suggest optimal times to play golf based on historical data [1].
* Data Management: Gain experience in managing and analysing data structured in a long format, which is common for repeated measures [2].
The dataset's regional coverage is global [6]. While the Date column records the day the data was captured or the session occurred, no specific time range for the collected data is stated beyond the listing date of 11/06/2025 [1, 6]. Demographic scope includes unique player IDs [1], but no specific demographic details or data availability notes for particular groups or years are provided.
CC-BY
This dataset is designed for a broad audience:
* New Learners: It is easy to understand and comes with guides to aid the learning process [1].
* Teachers: An excellent resource for conducting classes on data visualisation and interpretation [1].
* Researchers: Suitable for testing novel data analysis methodologies [1].
* Students: Can acquire a wide range of skills, from making graphs to understanding textual data and building recommendation systems [1].
Original Data Source: ⛳️ Golf Play Dataset Extended
PLAD is a dataset where sparse depth is provided by line-based visual SLAM to verify StructMDC.
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Census Bureau includes landmarks in the MTDB for locating special features and to help enumerators during field operations. Some of the more common landmark types include area landmarks such as airports, cemeteries, parks, mountain peaks/summits, schools, and churches and other religious institutions. The Census Bureau has added landmark features to MTDB on an as-needed basis and made no attempt to ensure that all instances of a particular feature were included. The presence or absence of a landmark such as a hospital or prison does not mean that the living quarters associated with that landmark were geocoded to that census tabulation block or excluded from the census enumeration.
This dataset provides information about the number of properties, residents, and average property values for Zilai Row cross streets in Point Pleasant Beach, NJ.
Right-of-way permits associated with a specific street address. A permit is expected to have only one GIS feature (line or point), but the data may have some anomalies. A permit is required to perform any construction work within the public right-of-way or any construction work outside of the public right-of-way that will cut, break, or otherwise damage the public right-of-way. The authoritative source for permit information is the Right-of-Way Management System (https://rowmanagement.dallascityhall.com/Login.aspx).
Licensees and Registrants Fleet Information dataset contains information about active vehicles of approved BIC licensees and registrants. This data is partially collected from the application submitted to the commission by licensees and registrants. The majority of data points are collected from licensees and registrants via the Vehicle Management Portal. Business Integrity Commission maintains this dataset, updating it quarterly. Each row of data contains information about an active vehicle registered at BIC, including BIC plate number, vehicle year, vehicle model, vehicle make, engine year and so on.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes data for NB-IoT and 5G networks as collected in two cities: Oslo, Norway (NB-IoT only) and Rome, Italy (both NB-IoT and 5G).
Data were collected using the Rohde & Schwarz TSMA6 mobile network scanner. 7 measurement campaigns are provided for Oslo, and 6 for Rome. Additional data collected in Rome are provided in the following large-scale dataset, focusing on the two major mobile network operators: https://ieee-dataport.org/documents/large-scale-dataset-4g-nb-iot-and-5g-non-standalone-network-measurements
The dataset includes a metadata file providing the following information for each campaign:
date of collection;
start time and end time of collection;
length;
type (walking/driving).
Two additional metadata files are provided: two .kml files, one for each city, allowing the import of coordinates of data points organized by campaign in a GIS engine, such as Google Earth, for interactive visualization.
The dataset contains the following data for NB-IoT:
Raw data for each campaign, stored in two .csv files (a minimal loading sketch in Python is given after this list). For a generic campaign C, the files are:
NB-IoT_coverage_C.csv including a geo-tagged data entry in each row. Each entry provides information on a Narrowband Physical Cell Identifier (NPCI), with data related to the time stamp the NPCI was detected, GPS information, network (NPCI, Operator, Country Code, eNodeB-ID) and RF signal (RSSI, SINR, RSRP and RSRQ values);
NB-IoT_RefSig_cir_C.csv, also including a geo-tagged data entry in each row. Each entry provides information on a NPCI, with data related to the time stamp the NPCI was detected, GPS information, network (NPCI, Operator ID, Country Code, eNodeB-ID) and Channel Impulse Response (CIR) statistics, including the maximum delay.
Processed data, stored in a Matlab workspace (.mat) file for each city: data are grouped in data points, identified by pairs. Each data point provides RF and CIR maximum delay measurements for each unique combination detected at the coordinates of the data point.
Estimated positions of eNodeBs, stored in a csv file for each city;
A matlab script and a function to extract and generate processed data from the raw data for each city.
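As mentioned above, a minimal sketch of loading one raw NB-IoT coverage file with pandas follows; the exact column headers are not listed in this description, so the column names used below are assumptions to be checked against the file header:

import pandas as pd

# "C" stands for the campaign identifier used in the actual filenames.
coverage = pd.read_csv("NB-IoT_coverage_C.csv")

# Assumed column names: NPCI, RSRP, RSRQ, SINR (verify against the file header).
# Example: strongest measurement per detected NPCI, ranked by RSRP.
strongest = (
    coverage.sort_values("RSRP", ascending=False)
            .groupby("NPCI", as_index=False)
            .first()
)
print(strongest[["NPCI", "RSRP", "RSRQ", "SINR"]].head())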
The dataset contains the following data for 5G:
Raw data for each campaign, stored in two files. For a generic campaign C, the files are:
5G_coverage_C.xslx including a geo-tagged data entry in each row. Each entry provides information on a Physical Cell Identifier (PCI), with data related to the time stamp the PCI was detected, GPS information, network (PCI, Beamforming Index, Operator, Country Code) and RF data (SSB-RSSI, SSS-SINR, SSS-RSRP and SSS-RSRQ values, and similar information for the PBCH signal);
5G_RefSig_cir_C.csv, also including a geo-tagged data entry in each row. Each entry provides information on a PCI, with data related to the time stamp the PCI was detected, GPS information, network (PCI, Beamforming Index, Operator ID, Country Code) and Channel Impulse Response (CIR) statistics, including the maximum delay.
Processed data, stored in a Matlab workspace (.mat) file: data are grouped in data points, identified by pairs. Each data point provides RF and CIR maximum delay measurements for each unique combination detected at the coordinates of the data point.
A matlab script and a supporting function to extract and generate processed data from the raw data.
In addition, for the Rome data, further Matlab workspaces are provided, containing data interpolated in the feature dimensions according to two different approaches:
A campaign-by-campaign linear interpolation (both NB-IoT and 5G);
A bidimensional interpolation on all campaigns combined (NB-IoT only).
A function to interpolate missing data in the original data according to the first approach is also provided for each technology. The interpolation rationale and procedure for the first approach is detailed in:
L. De Nardis, G. Caso, Ö. Alay, U. Ali, M. Neri, A. Brunstrom and M.-G. Di Benedetto, "Positioning by Multicell Fingerprinting in Urban NB-IoT networks," Sensors, Volume 23, Issue 9, Article ID 4266, April 2023. DOI: 10.3390/s23094266.
The second interpolation approach is instead introduced and described in:
L. De Nardis, M. Savelli, G. Caso, F. Ferretti, L. Tonelli, N. Bouzar, A. Brunstrom, O. Alay, M. Neri, F. Elbahhar and M.-G. Di Benedetto, " Range-free Positioning in NB-IoT Networks by Machine Learning: beyond WkNN", under major revision in IEEE Journal of Indoor and Seamless Positioning and Navigation.
Positioning using the 5G data was further investigated in:
K. Kousias, M. Rajiullah, G. Caso, U. Ali, Ö. Alay, A. Brunstrom, L. De Nardis, M. Neri, and M.-G. Di Benedetto, "A Large-Scale Dataset of 4G, NB-IoT, and 5G Non-Standalone Network Measurements," IEEE Communications Magazine, Volume 62, Issue 5, pp. 44-49, May 2024. DOI: 10.1109/MCOM.011.2200707.
G. Caso, M. Rajiullah, K. Kousias, U. Ali, N. Bouzar, L. De Nardis, A. Brunstrom, Ö. Alay, M. Neri and M.-G. Di Benedetto,"The Chronicles of 5G Non-Standalone: An Empirical Analysis of Performance and Service Evolution", IEEE Open Journal of the Communications Society, Volume 5, pp. 7380 - 7399, 2024. DOI: 10.1109/OJCOMS.2024.3499370.
Please refer to the above publications when using and citing the dataset.
TIGER road data for the MSA. When compared to high-resolution imagery and other transportation datasets, positional inaccuracies were observed; as a result, caution should be taken when using this dataset. TIGER, TIGER/Line, and Census TIGER are registered trademarks of the U.S. Census Bureau. ZCTA is a trademark of the U.S. Census Bureau. The Census 2000 TIGER/Line files are an extract of selected geographic and cartographic information from the Census TIGER data base. The geographic coverage for a single TIGER/Line file is a county or statistical equivalent entity, with the coverage area based on January 1, 2000 legal boundaries. A complete set of Census 2000 TIGER/Line files includes all counties and statistically equivalent entities in the United States, Puerto Rico, and the Island Areas. The Census TIGER data base represents a seamless national file with no overlaps or gaps between parts. However, each county-based TIGER/Line file is designed to stand alone as an independent data set, or the files can be combined to cover the whole Nation. The Census 2000 TIGER/Line files consist of line segments representing physical features and governmental and statistical boundaries. The boundary information in the TIGER/Line files is for statistical data collection and tabulation purposes only; their depiction and designation for statistical purposes does not constitute a determination of jurisdictional authority or rights of ownership or entitlement. The Census 2000 TIGER/Line files do NOT contain the Census 2000 urban areas, which have not yet been delineated. The files contain information distributed over a series of record types for the spatial objects of a county. There are 17 record types, including the basic data record, the shape coordinate points, and geographic codes that can be used with appropriate software to prepare maps. Other geographic information contained in the files includes attributes such as feature identifiers/census feature class codes (CFCC) used to differentiate feature types, address ranges and ZIP Codes, codes for legal and statistical entities, latitude/longitude coordinates of linear and point features, landmark point features, area landmarks, key geographic features, and area boundaries. The Census 2000 TIGER/Line data dictionary contains a complete list of all the fields in the 17 record types. This is part of a collection of 221 Baltimore Ecosystem Study metadata records that point to a geodatabase. The geodatabase is available online and is considerably large; upon request, and under certain arrangements, it can be shipped on media such as a USB hard drive. The geodatabase is roughly 51.4 GB in size, consisting of 4,914 files in 160 folders. Although this metadata record and the others like it are not rich with attributes, it is nonetheless made available because the data that it represents could indeed be useful.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises 4 fire experiments (repeated 3 times) and 3 nuisance experiments (Ethanol: repeated 3 times, Deodorant: repeated 2 times, Hairspray: repeated 1 time), with various background sequences interspersed between the conducted experiments. All experiments were carried out in random order to reduce the influence of prehistory. It consists of a total of 305,304 rows and 16 columns, structured as a continuous multivariate time series. Each row represents the sensor measurements (CO2, CO, H2, humidity, particulate matter of different sizes, air temperature, and UV) from a unique sensor node position in the EN54 test room at a specific timestamp. The columns correspond to the sensor measurements and include additional labels: a scenario-specific label ("scenario_label"), a binary label ("anomaly_label") distinguishing between "Normal" (background) and "Anomaly" (fire or nuisance scenario), a ternary label ("ternary_label") categorizing the data as "Nuisance," "Fire," or "Background," and a progress label ("progress_label") that allows for dividing the event sequences into sub-sequences based on ongoing physical sub-processes. The dataset comprises 82.98% background data points and 17.02% anomaly data points, which can be further divided into 12.50% fire anomaly data points and 4.52% nuisance anomaly data points. The "Sensor_ID" column can be utilized to access data from different sensor node positions.
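A minimal pandas sketch of slicing the data by sensor node and label, using the column names documented above (the filename is illustrative):

import pandas as pd

df = pd.read_csv("en54_sensor_timeseries.csv")  # hypothetical filename

# Select fire-related measurements from a single sensor node position.
node_id = df["Sensor_ID"].iloc[0]
node = df[df["Sensor_ID"] == node_id]
fires = node[node["ternary_label"] == "Fire"]
print(fires["scenario_label"].value_counts())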
Quadrant provides insightful, accurate, and reliable mobile location data.
Our privacy-first mobile location data unveils hidden patterns and opportunities, provides actionable insights, and fuels data-driven decision-making at the world's biggest companies.
These companies rely on our privacy-first Mobile Location and Points-of-Interest Data to unveil hidden patterns and opportunities, provide actionable insights, and fuel data-driven decision-making. They build better AI models, uncover business insights, and enable location-based services using our robust and reliable real-world data.
We conduct stringent evaluations of data providers to ensure authenticity and quality. Our proprietary algorithms detect and cleanse corrupted and duplicated data points, allowing you to leverage our datasets rapidly with minimal processing or cleaning. During the ingestion process, our proprietary Data Filtering Algorithms remove events based on a number of qualitative factors, as well as latency and other integrity variables, to provide more efficient data delivery. The deduplication algorithm focuses on a combination of four important attributes: Device ID, Latitude, Longitude, and Timestamp. This algorithm scours our data and identifies rows that contain the same combination of these four attributes. Post-identification, it retains a single copy and eliminates duplicate values to ensure our customers only receive complete and unique datasets.
We actively identify overlapping values at the provider level to determine the value each offers. Our data science team has developed a sophisticated overlap analysis model that helps us maintain a high-quality data feed by qualifying providers based on unique data values rather than volumes alone – measures that provide significant benefit to our end-use partners.
Quadrant mobility data contains all standard attributes such as Device ID, Latitude, Longitude, Timestamp, Horizontal Accuracy, and IP Address, and non-standard attributes such as Geohash and H3. In addition, we have historical data available back through 2022.
Through our in-house data science team, we offer sophisticated technical documentation, location data algorithms, and queries that help data buyers get a head start on their analyses. Our goal is to provide you with data that is “fit for purpose”.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The goal of this sampling effort is to describe the vegetation response to treatments. Data were collected following the line-point intercept method (Herrick et al. 2009). Although the original LPI data set was in multivariate form with separate columns for canopy layers and soil surface, this data set has been transposed into vertical form, implementing a "layer" variable, so that all species and soil surface codes appear in one column. Within each exclosure, 4837 points were sampled, with the following exceptions:
year exclosure total_points_sampled
1996 5 4825
1996 7 4836
1996 9 4836
1996 10 4836
1997 1 4830
1997 2 4830
1997 3 4830
1997 4 4830
1997 5 4830
1997 6 4830
1997 7 4830
1997 8 4830
1997 9 4830
1997 10 4830
1997 11 4830
1997 12 4830
1997 13 4830
1997 14 4830
1997 15 4830
1997 16 4830
1997 17 4830
1997 18 4830
2002 12 4835