100+ datasets found
  1. Data Merge Up 2 Dataset

    • universe.roboflow.com
    zip
    Updated Apr 24, 2025
    Cite
    Senarios (2025). Data Merge Up 2 Dataset [Dataset]. https://universe.roboflow.com/senarios-k3093/data-merge-up-2
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 24, 2025
    Dataset authored and provided by
    Senarios
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cardboard H3Yh Masks
    Description

    Data Merge Up 2

    ## Overview
    
    Data Merge Up 2 is a dataset for semantic segmentation tasks - it contains Cardboard H3Yh annotations for 3,453 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
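A minimal sketch of a programmatic download with the `roboflow` Python package; the API key and version number are placeholders, not values from this page:

```python
from roboflow import Roboflow

# Workspace and project slugs follow from the dataset URL above;
# the API key and version number are placeholders.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("senarios-k3093").project("data-merge-up-2")
dataset = project.version(1).download("coco")  # choose the export format your tooling expects
```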
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  2. Merge 2 Projects Dataset

    • universe.roboflow.com
    zip
    Updated Apr 25, 2024
    Cite
    school (2024). Merge 2 Projects Dataset [Dataset]. https://universe.roboflow.com/school-x9hrn/merge-2-projects
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 25, 2024
    Dataset authored and provided by
    school
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Stop TurnRight PkFs Bounding Boxes
    Description

    Merge 2 Projects

    ## Overview
    
    Merge 2 Projects is a dataset for object detection tasks - it contains Stop TurnRight PkFs annotations for 1,220 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. pop_dhs: Merging multiple DHS files

    • dataverse.harvard.edu
    Updated May 22, 2014
    Cite
    Günther Fink (2014). pop_dhs: Merging multiple DHS files [Dataset]. http://doi.org/10.7910/DVN/26098
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 22, 2014
    Dataset provided by
    Harvard Dataverse
    Authors
    Günther Fink
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

This .do file merges multiple DHS surveys and keeps memory requirements to a minimum; what it does is quite simple:

1. Loops through all available survey data files (and counts them)
2. For each file, generates variables with missing values if they are not in the file
3. Generates a partial file for each variable extraction
4. Merges all partial files and deletes them

What the user needs to do is:

- paste the names of all surveys at the top of the file (after local survey_list)
- paste the list of all DHS variables needed into the file (after global myvars)
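For readers working outside Stata, a rough pandas sketch of the same fill-missing-then-combine logic might look like this (file names and the variable list are hypothetical placeholders):

```python
import pandas as pd

survey_list = ["survey_a.dta", "survey_b.dta"]  # analogous to `local survey_list`
myvars = ["caseid", "v012", "v190"]             # analogous to `global myvars`

partials = []
for path in survey_list:
    df = pd.read_stata(path)
    # Step 2: generate variables with missing values if absent from this file.
    for var in myvars:
        if var not in df.columns:
            df[var] = pd.NA
    # Step 3: keep a partial extract with a uniform schema.
    partials.append(df[myvars])

# Step 4: combine all partial extracts into a single dataset.
merged = pd.concat(partials, ignore_index=True)
merged.to_stata("pop_dhs_merged.dta", write_index=False)
```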

  4. Statistical Analysis of Individual Participant Data Meta-Analyses: A...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    tiff
    Updated Jun 8, 2023
    Cite
    Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart (2023). Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice [Dataset]. http://doi.org/10.1371/journal.pone.0046042
    Explore at:
    Available download formats: tiff
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Gavin B. Stewart; Douglas G. Altman; Lisa M. Askie; Lelia Duley; Mark C. Simmonds; Lesley A. Stewart
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Background: Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way.

Methods and Findings: We included data from 24 randomised controlled trials, evaluating antiplatelet agents, for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using antiplatelets (relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model.

Conclusions: For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across-study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials.
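As a purely illustrative aside, the “two-stage” idea reduces to computing one effect estimate per trial and then pooling the estimates with inverse-variance weights. A minimal fixed-effect sketch with made-up trial counts (not the authors' actual models or data):

```python
import numpy as np

# (events_treated, n_treated, events_control, n_control) per trial -- made-up numbers
trials = [(30, 200, 40, 200), (15, 150, 22, 150), (50, 400, 60, 400)]

log_rrs, variances = [], []
for a, n1, c, n0 in trials:
    # Stage 1: within-trial summary (log relative risk and its variance).
    log_rrs.append(np.log((a / n1) / (c / n0)))
    variances.append(1 / a - 1 / n1 + 1 / c - 1 / n0)

# Stage 2: fixed-effect inverse-variance pooling of the trial summaries.
w = 1 / np.array(variances)
pooled = np.sum(w * np.array(log_rrs)) / np.sum(w)
se = np.sqrt(1 / np.sum(w))
lo, hi = np.exp(pooled - 1.96 * se), np.exp(pooled + 1.96 * se)
print(f"pooled RR = {np.exp(pooled):.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```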

  5. Football Merge 2 Dataset

    • universe.roboflow.com
    zip
    Updated Nov 10, 2024
    Cite
    Tes (2024). Football Merge 2 Dataset [Dataset]. https://universe.roboflow.com/tes-2f60k/football-merge-2
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 10, 2024
    Dataset authored and provided by
    Tes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Referee Player Ball Football Players Detection Referee Player Ball Bounding Boxes
    Description

    Football Merge 2

    ## Overview
    
    Football Merge 2 is a dataset for object detection tasks - it contains Referee Player Ball Football Players Detection Referee Player Ball annotations for 1,396 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. Citation data of arXiv eprints and the associated...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv
    Updated Aug 30, 2025
    Cite
    Keisuke Okamura; Hitoshi Koshiba (2025). Citation data of arXiv eprints and the associated quantitatively-and-temporally normalised impact metrics [Dataset]. http://doi.org/10.5281/zenodo.5803962
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    Aug 30, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Keisuke Okamura; Hitoshi Koshiba
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 25, 2021
    Description

    Data collection

    This dataset contains information on the eprints posted on arXiv from its launch in 1991 until the end of 2019 (1,589,006 unique eprints), plus the data on their citations and the associated impact metrics. Here, eprints include preprints, conference proceedings, book chapters, data sets and commentary, i.e. all electronic material that has been posted on arXiv.

    The content and metadata of the arXiv eprints were retrieved from the arXiv API (https://arxiv.org/help/api/) as of 21st January 2020, where the metadata included data of the eprint’s title, author, abstract, subject category and the arXiv ID (the arXiv’s original eprint identifier). In addition, the associated citation data were derived from the Semantic Scholar API (https://api.semanticscholar.org/) from 24th January 2020 to 7th February 2020, containing the citation information in and out of the arXiv eprints and their published versions (if applicable). Here, whether an eprint has been published in a journal or other means is assumed to be inferrable, albeit indirectly, from the status of the digital object identifier (DOI) assignment. It is also assumed that if an arXiv eprint received c_pre and c_pub citations until the data retrieval date (7th February 2020) before and after it is assigned a DOI, respectively, then the citation count of this eprint is recorded in the Semantic Scholar dataset as c_pre + c_pub. Both the arXiv API and the Semantic Scholar datasets contained the arXiv ID as metadata, which served as a key variable to merge the two datasets.
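The key-based merge described above amounts to a join on the arXiv ID; a toy pandas sketch, with two tiny hypothetical frames standing in for the real API extracts:

```python
import pandas as pd

# Hypothetical stand-ins for the arXiv API and Semantic Scholar extracts.
arxiv_meta = pd.DataFrame({"arxiv_id": ["2101.00001", "2101.00002"],
                           "category": ["math", "hep"]})
citations = pd.DataFrame({"arxiv_id": ["2101.00001", "2101.00002"],
                          "c_pre": [3, 1], "c_pub": [7, 0]})

# Join on the shared key, mirroring the merge described above.
merged = arxiv_meta.merge(citations, on="arxiv_id", how="left")
merged["c_tot"] = merged["c_pre"] + merged["c_pub"]  # c_pre + c_pub
print(merged)
```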

    The classification of research disciplines is based on that described in the arXiv.org website (https://arxiv.org/help/stats/2020_by_area/). There, the arXiv subject categories are aggregated into several disciplines, of which we restrict our attention to the following six disciplines: Astrophysics (‘astro-ph’), Computer Science (‘comp-sci’), Condensed Matter Physics (‘cond-mat’), High Energy Physics (‘hep’), Mathematics (‘math’) and Other Physics (‘oth-phys’), which collectively accounted for 98% of all the eprints. Those eprints tagged to multiple arXiv disciplines were counted independently for each discipline. Due to this overlapping feature, the current dataset contains a cumulative total of 2,011,216 eprints.

    Some general statistics and visualisations per research discipline are provided in the original article (Okamura, 2022), where the validity and limitations associated with the dataset are also discussed.

    Description of columns (variables)

    • arxiv_id : arXiv ID
    • category : Research discipline
    • pre_year : Year of posting v1 on arXiv
    • pub_year : Year of DOI acquisition
    • c_tot : No. of citations acquired during 1991–2019
    • c_pre : No. of citations acquired before and including the year of DOI acquisition
    • c_pub : No. of citations acquired after the year of DOI acquisition
    • c_yyyy : No. of citations acquired in the year yyyy (with ‘yyyy’ running from 1991 to 2019)
    • gamma : The quantitatively-and-temporally normalised citation index
    • gamma_star : The quantitatively-and-temporally standardised citation index

    Note: The definition of the quantitatively-and-temporally normalised citation index (γ; ‘gamma’) and that of the standardised citation index (γ*; ‘gamma_star’) are provided in the original article (Okamura, 2022). Both indices can be used to compare the citational impact of papers/eprints published in different research disciplines at different times.

    Data files

    A comma-separated values file (‘arXiv_impact.csv’) and a Stata file (‘arXiv_impact.dta’) are provided, both containing the same information.

  7. ymcki_gemma-2-2b-ORPO-jpn-it-abliterated-18-merge-details

    • huggingface.co
    Updated Jul 30, 2025
    Cite
    Open LLM Leaderboard (2025). ymcki_gemma-2-2b-ORPO-jpn-it-abliterated-18-merge-details [Dataset]. https://huggingface.co/datasets/open-llm-leaderboard/ymcki_gemma-2-2b-ORPO-jpn-it-abliterated-18-merge-details
    Explore at:
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Open LLM Leaderboard
    Description

    Dataset Card for Evaluation run of ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18-merge

    Dataset automatically created during the evaluation run of model ymcki/gemma-2-2b-ORPO-jpn-it-abliterated-18-merge. The dataset is composed of 38 configuration(s), each corresponding to one of the evaluated tasks. The dataset has been created from 4 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split is… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/ymcki_gemma-2-2b-ORPO-jpn-it-abliterated-18-merge-details.
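A sketch of loading one configuration with the Hugging Face `datasets` library; the configuration name below is a placeholder, as the 38 real names are listed on the dataset page:

```python
from datasets import load_dataset

details = load_dataset(
    "open-llm-leaderboard/ymcki_gemma-2-2b-ORPO-jpn-it-abliterated-18-merge-details",
    "some_task_config",  # placeholder: substitute one of the 38 config names
    split="train",
)
print(details)
```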

  8. Hydrologic Unit (HUC8) Boundaries for Alaskan Watersheds

    • dataone.org
    • search.dataone.org
    Updated Mar 20, 2019
    + more versions
    Cite
    Jared Kibele (2019). Hydrologic Unit (HUC8) Boundaries for Alaskan Watersheds [Dataset]. http://doi.org/10.5063/F14T6GM3
    Explore at:
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    Knowledge Network for Biocomplexity
    Authors
    Jared Kibele
    Time period covered
    Jan 19, 2018
    Variables measured
    name, source, id_numeric, id_original
    Description

    The United States is divided and sub-divided into successively smaller hydrologic units which are classified into four levels: regions, sub-regions, accounting units, and cataloging units. The hydrologic units are arranged or nested within each other, from the largest geographic area (regions) to the smallest geographic area (cataloging units). Each hydrologic unit is identified by a unique hydrologic unit code (HUC) consisting of two to eight digits based on the four levels of classification in the hydrologic unit system. A shapefile (or geodatabase) of watersheds for the state of Alaska and parts of western Canada was created by merging two datasets: the U.S. Watershed Boundary Dataset (WBD) and the Government of Canada's National Hydro Network (NHN). Since many rivers in Alaska are transboundary, the NHN data is necessary to capture their watersheds. The WBD data can be found at https://catalog.data.gov/dataset/usgs-national-watershed-boundary-dataset-wbd-downloadable-data-collection-national-geospatial- and the NHN data can be found here: https://open.canada.ca/data/en/dataset/a4b190fe-e090-4e6d-881e-b87956c07977. The included python script was used to subset and merge the two datasets into the single dataset, archived here.
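The subset-and-merge step could be reproduced along these lines with GeoPandas (a sketch; the local file names are hypothetical, and the exact filtering logic lives in the archived script):

```python
import geopandas as gpd
import pandas as pd

wbd = gpd.read_file("wbd_huc8.shp")       # U.S. Watershed Boundary Dataset (hypothetical file name)
nhn = gpd.read_file("nhn_workunits.shp")  # Canadian National Hydro Network (hypothetical file name)

# Reproject to a common CRS before combining.
nhn = nhn.to_crs(wbd.crs)

# Subset to Alaska-relevant units here (the column to filter on depends on the
# source schema), then stack the two sources into one layer.
merged = gpd.GeoDataFrame(pd.concat([wbd, nhn], ignore_index=True), crs=wbd.crs)
merged.to_file("alaska_huc8_merged.shp")
```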

  9. merge-with-keys-2

    • huggingface.co
    Updated Jun 1, 2025
    + more versions
    Cite
    Pierre-Louis B (2025). merge-with-keys-2 [Dataset]. https://huggingface.co/datasets/PLB/merge-with-keys-2
    Explore at:
    Dataset updated
    Jun 1, 2025
    Authors
    Pierre-Louis B
    Description

    merge-with-keys-2

    This dataset was generated using a phospho starter pack. This dataset contains a series of episodes recorded with a robot and multiple cameras. It can be directly used to train a policy using imitation learning. It's compatible with LeRobot and RLDS.

  10. VT Road Centerline

    • geodata1-vcgi.opendata.arcgis.com
    • geodata.vermont.gov
    • +3 more
    Updated Jun 1, 2021
    Cite
    VTrans (2021). VT Road Centerline [Dataset]. https://geodata1-vcgi.opendata.arcgis.com/datasets/VTrans::vt-road-centerline
    Explore at:
    Dataset updated
    Jun 1, 2021
    Dataset authored and provided by
    VTrans
    Description

    (Link to Metadata) (User Guide) (Symbology layer files: aotclass_only.lyr, aotclass_surfacetyp.lyr) The Vermont Road Centerline data layer (TransRoad_RDS) contains all town and state highways, as well as many private roads. The centerlines were originally developed under contract by Greenhorne and O'Mara under the guidance of VCGI (1992). VCGI was the original steward of the road centerline data between 1992 and 2004. The Vermont Agency of Transportation (VTrans) is now considered the steward of this data layer and has taken over its update and maintenance, revising the layer to match "Official" highway mileage. Updates have been performed over the years by VCGI, RPCs, and VTrans. NOTE: TransRoad_RDS meets the requirements articulated in the VGIS Road Centerline Data Standard (https://vcgi.vermont.gov/resources/standards). This layer is the most reliable source for official VTrans road class (AOTCLASS) information. However, it may not include every private road, and its road name information may not match perfectly with the EmergencyE911_RDS data layer. The EmergencyE911_RDS road centerline layer, maintained by VT's E911 Board, has the most up-to-date road name information. It was originally based on TransRoad_RDS and is therefore very similar; however, it includes all private roads and generally more reliable road name and address range information. There was a significant change in the schema in the June 2013 release as part of the effort between VTrans and E911 to merge their two datasets. The data layer includes the field structure agreed to by both entities, but most of the fields that are E911's have not been populated in this release. Stewards: Information Technology; Data Owner: Mapping Unit

  11. Data from: A dataset to model Levantine landcover and land-use change...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Dec 16, 2023
    Cite
    Michael Kempf; Michael Kempf (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.10396148
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Kempf; Michael Kempf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 16, 2023
    Area covered
    Levant
    Description

    Overview

    This dataset is the repository for the following paper submitted to Data in Brief:

    Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

    The Data in Brief article contains the supplement information and is the related data paper to:

    Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

    Description/abstract

    The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which strained neighbouring countries like Jordan due to the influx of Syrian refugees and increases population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

    Folder structure

    The main folder after download contains all data; the following subfolders are stored as zipped files:

    “code” stores the above described 9 code chunks to read, extract, process, analyse, and visualize the data.

    “MODIS_merged” contains the 16-day, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

    “mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

    “yield_productivity” contains .csv files of yield information for all countries listed above.

    “population” contains two files with the same name but different format. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

    “GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.

    “built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders which contain the raw data and the already processed data. “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5 year intervals, e.g., “Levant_built_up_1975.tif”.

    Code structure

    1_MODIS_NDVI_hdf_file_extraction.R


    This is the first code chunk, which covers the extraction of MODIS data from the .hdf file format. The following packages must be installed, and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download MODIS data, after registration, from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9th of October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three spatially distinct time series and merge them later. Note that the time series are temporally consistent.


    2_MERGE_MODIS_tiles.R


    In this code, we load and merge the three different stacks to produce a large and consistent time series of NDVI imagery across the study area. We further use the gtools package to load the files in numerical order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, from which we merge the first two (stack 1, stack 2) and store them. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
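The repository's code is R (terra), but the merge step for one date's three tiles can be sketched in Python with rasterio; the per-tile file names here are hypothetical:

```python
import glob
import rasterio
from rasterio.merge import merge

# Open the three tiles for one 16-day composite (file names are hypothetical).
tiles = [rasterio.open(p) for p in sorted(glob.glob("NDVI_h2*v0*_001.tif"))]
mosaic, transform = merge(tiles)

profile = tiles[0].profile
profile.update(height=mosaic.shape[1], width=mosaic.shape[2],
               count=mosaic.shape[0], transform=transform)
with rasterio.open("NDVI_final_001.tif", "w", **profile) as dst:
    dst.write(mosaic)
```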


    3_CROP_MODIS_merged_tiles.R


    Now we want to crop the derived MODIS tiles to our study area. We are using a mask, which is provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced single cropped NDVI time series data from MODIS.
    The repository provides the already clipped and merged NDVI datasets.
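The cropping step, again sketched with rasterio/geopandas rather than the repository's terra code (the merged input file name is hypothetical; the mask ships with the repository):

```python
import geopandas as gpd
import rasterio
from rasterio.mask import mask

levant = gpd.read_file("MERGED_LEVANT.shp")  # mask provided in the repository

with rasterio.open("NDVI_final_001.tif") as src:  # hypothetical merged file
    clipped, transform = mask(src, levant.to_crs(src.crs).geometry, crop=True)
    profile = src.profile
    profile.update(height=clipped.shape[1], width=clipped.shape[2],
                   transform=transform)

with rasterio.open("NDVI_merged_clip_001.tif", "w", **profile) as dst:
    dst.write(clipped)
```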


    4_TREND_analysis_NDVI.R


    Now, we want to perform trend analysis on the derived data. The data we load are tricky, as they contain a 16-day return period across a year for a period of 22 years. Growing season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and characterize all values with a high confidence level (0.05). Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3.
    To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
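The z-score step itself is a one-liner; a sketch with placeholder annual values:

```python
import numpy as np

series = np.array([0.41, 0.44, 0.39, 0.47, 0.50])  # placeholder annual sums
z = (series - series.mean()) / series.std(ddof=1)  # deviation from the mean in SD units
print(z.round(2))
```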


    5_BUILT_UP_change_raster.R


    Let us look at the landcover changes now. We are working with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3rd of March 2023, 100 m resolution, global coverage). One can download the temporal coverage that is aimed for and reclassify it using the code after cropping to the individual study area. Here, I summed up different rasters to characterize the built-up change in continuous values between 1975 and 2022.


    6_POPULATION_numbers_plot.R


    For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.


    7_YIELD_plot.R


    In this section, we are using the country productivity data from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single-country yield datasets is plotted with ggplot and combined using the patchwork package in R.


    8_GLDAS_read_extract_trend


    The last code provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9th of October 2023). The raw data come in .nc file format, and various variables can be extracted using the [“^a variable name”] command on the spatraster collection. Each time you run the code, this variable name must be adjusted to the variable required (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 9th of October 2023; or the respective code chunk when reading a .nc file with the ncdf4 package in R), or run print(nc) from the code, or use names() on the spatraster collection.
    Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
    From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. Regarding, e.g., rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for growing seasons, e.g., March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year).
    From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and 95% confidence level values are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe thanks to the availability of the GLDAS variables.

  12. Happily unmarried survey - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Apr 27, 2023
    Cite
    (2023). Happily unmarried survey - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/125bd8e6-db34-5bcc-902b-ab2d3c721dec
    Explore at:
    Dataset updated
    Apr 27, 2023
    Description

The data set comprises responses to a questionnaire survey with a wide range of items concerning finances, legal and relationship issues in non-traditional, cohabiting heterosexual couples. It is in the format of an SPSS .sav file with an N of 235 (having excluded a small number of respondents who did not meet the study criteria). The bulk of the data were obtained by means of an on-line survey, with the remaining few obtained from a paper version of the questionnaire. Owing to the format imposed by the software used for the on-line version (PHPSurveyor), there are some instances where the precise response format for the items differed between the two versions. In order to merge the data into SPSS, some minor adjustments had to be made to make them compatible, such as combining 2 separate items in the on-line version on cohabitation length to obtain a single measure. To clarify, in the on-line version, two separate responses asked for the number of years of cohabitation and the number of months. These were combined in the final SPSS file into a single measure of overall cohabitation length in months. Thus, a respondent who had cohabited for 2 years and 6 months would receive a value in the final data set of 30 months’ cohabitation length.

To indicate in full detail how some of the items have been combined for certain measures, an Excel file has been provided. At the top of the Excel file are the actual question items from the hard copy version of the questionnaire. Underneath, in the purple band, are the respective variable labels as they appear in the SPSS file. Below this, in blue, can be found the labels for the composite or recoded items that combine information from more than one of the original variables (for example, ‘household income’ combines information from the items asking for respondent’s own and partner’s income). The labels for the variable values can be found in the SPSS file in the conventional way.

Studies of the monetary practices of (mainly) married couples have revealed gender-associated asymmetries in access to household resources. However, theory development has been restricted because gender issues are easily confounded with the ideological meaning(s) of ‘marriage’. In other words, is it being a ‘wife’ or ‘husband’ that produces such asymmetries, rather than gender per se? The proposed project aims to disentangle this conflation of ‘gender’ with ‘marriage’ by focusing on money management in non-traditional (i.e. unmarried cohabiting or non-cohabiting) heterosexual couples. The research will be in two phases: (1) in-depth qualitative interviews with individual partners in 15 non-traditional heterosexual (NTH) couples, including some that have specifically rejected the notion of marriage on ideological grounds, and (2) a larger-scale survey of 300 NTH couples. Our main aims are to: (1) provide a detailed analysis of how NTH couples organise their finances and compare this with existing data on married couples; (2) develop theories of household financial management that are grounded in a more inclusive definition of ‘household’ or ‘family’; (3) explore NTH couples’ understandings of their financial rights and responsibilities and consider the implications of their financial management practices for the proposed law reform governing financial provision on cohabitation breakdown. Data was collected mainly via on-line questionnaire with 267 individual respondents; the majority (235) were cohabiting and the rest married.

  13. Township Range Section & Rancho Boundaries

    • hub.arcgis.com
    Updated Nov 1, 2013
    + more versions
    Cite
    County of Los Angeles (2013). Township Range Section & Rancho Boundaries [Dataset]. https://hub.arcgis.com/datasets/5d7dcae143c84535b9ba0eb005f791b0
    Explore at:
    Dataset updated
    Nov 1, 2013
    Dataset authored and provided by
    County of Los Angeles
    Description

This dataset contains the boundaries of the Public Land Survey System (PLSS) – Township Range Section boundaries – as well as the boundaries of the Ranchos and Landgrants that pre-dated the PLSS. In general these match the USGS topographic Quad Sheets from the US Geological Survey. Note – some boundaries may not match parcel boundaries where that is logical – we haven’t had the time to complete that movement. These are historic boundaries that still have impacts upon the names and geography of Los Angeles County today. For example, where does the name “Verdugo Mountains” come from? It comes from the Rancho established there.

Background

PLSS (from Wikipedia): The Public Land Survey System (PLSS) is the surveying method used historically over the largest fraction of the United States to survey and spatially identify land parcels before designation of eventual ownership, particularly for rural, wild or undeveloped land. It is sometimes referred to as the rectangular survey system (although non-rectangular methods such as meandering can also be used).

Ranchos: The Spanish and, later, Mexican governments encouraged settlement of territory now known as California by the establishment of large land grants called ranchos, from which the English word ranch is derived. Land-grant titles (concessions) were government-issued, permanent, unencumbered property-ownership rights to land called ranchos.

Why this dataset?

This dataset was created in order to integrate the boundaries from two different datasets – a Rancho Boundary file created by Mike McDaniel of El Segundo, and a parcel-accurate Township Range Section file created by the LA County Department of Public Works. Thanks to both of those entities for creating those valuable source files. There are many sources of this data out there, but the ranchos are holes in the PLSS datasets and the TRS is a hole in the rancho files. This combines both of those.

Method of conflation

To merge these two datasets together, they were combined, and then any holes and overlaps were conflated to match the Rancho boundaries that were created by Mike McDaniel. When there were questions I used the USGS topographic quad sheets to verify numbering and naming. We have not (yet) attempted to snap the boundaries to parcels where they most likely should be.

Fields

These fields contain information about ranchos/landgrants:

• LANDGRANT - name of the land grant
• NAME_2 - secondary name of the land grant
• GRANTEE_P - the grantee name
• PATENTEE_ - secondary grantee name
• GRANT_NO - grant number
• GRANTED - date granted
• PATENT_DAT - date patented
• SURVEYOR - surveyor name
• SURVEY_DAT - survey month and date
• CO - county
• TYPE - Rancho (Ro) or Pueblo
• ACRES_1 - number of acres of the grant

These fields contain information about the PLSS:

• SECTION - section number
• MERIDIAN - the meridian
• TOWNSHIP - township number
• RANGE - range number
• NOTES - notes that show changes in information
• FeatType - if this is a TRS or a Rancho

  14. California Overlapping Cities and Counties and Identifiers

    • gis.data.ca.gov
    • data.ca.gov
    • +1 more
    Updated Sep 16, 2024
    + more versions
    Cite
    California Department of Technology (2024). California Overlapping Cities and Counties and Identifiers [Dataset]. https://gis.data.ca.gov/datasets/california-overlapping-cities-and-counties-and-identifiers/about
    Explore at:
    Dataset updated
    Sep 16, 2024
    Dataset authored and provided by
    California Department of Technology
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

WARNING: This is a pre-release dataset and its field names and data structures are subject to change. It should be considered pre-release until the end of 2024. Expected changes:

• Metadata is missing or incomplete for some layers at this time and will be continuously improved.
• We expect to update this layer roughly in line with CDTFA at some point, but will increase the update cadence over time as we are able to automate the final pieces of the process.

This dataset is continuously updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.

Purpose

County and incorporated place (city) boundaries along with third party identifiers used to join in external data. Boundaries are from the authoritative source, the California Department of Tax and Fee Administration (CDTFA), altered to show the counties as one polygon. This layer displays the city polygons on top of the county polygons so the area isn't interrupted. The GEOID attribute information is added from the US Census. GEOID is based on merged State and County FIPS codes for the counties. Abbreviations for counties and cities were added from Caltrans Division of Local Assistance (DLA) data. Place Type was populated with information extracted from the Census. Names and IDs from the US Board on Geographic Names (BGN), the authoritative source of place names as published in the Geographic Name Information System (GNIS), are attached as well. Finally, coastal buffers are removed, leaving the land-based portions of jurisdictions. This feature layer is for public use.

Related Layers

This dataset is part of a grouping of many datasets:

• Cities: Only the city boundaries and attributes, without any unincorporated areas (With Coastal Buffers / Without Coastal Buffers)
• Counties: Full county boundaries and attributes, including all cities within as a single polygon (With Coastal Buffers / Without Coastal Buffers)
• Cities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service. (With Coastal Buffers / Without Coastal Buffers – this dataset)
• Place Abbreviations
• Unincorporated Areas (Coming Soon)
• Census Designated Places (Coming Soon)
• Cartographic Coastline (Polygon / Line source (Coming Soon))

Working with Coastal Buffers

The dataset you are currently viewing includes the coastal buffers for cities and counties that have them in the authoritative source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating if it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers on all the fields except COASTAL, Area_SqMi, Shape_Area, and Shape_Length to get a version with the correct identifiers (see the sketch at the end of this description).

Point of Contact

California Department of Technology, Office of Digital Services, odsdataservices@state.ca.gov

Field and Abbreviation Definitions

• COPRI: county number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system
• Place Name: CDTFA incorporated (city) or county name
• County: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.
• Legal Place Name: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information System
• GNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier.
• GEOID: numeric geographic identifiers from the US Census Bureau
• Place Type: Board on Geographic Names authorized nomenclature for boundary type published in the Geographic Name Information System
• Place Abbr: CalTrans Division of Local Assistance abbreviations of incorporated area names
• CNTY Abbr: CalTrans Division of Local Assistance abbreviations of county names
• Area_SqMi: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers.
• COASTAL: Indicates if the polygon is a coastal buffer. Null for land polygons. Additional values include "ocean" and "bay".
• GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID, instead.

Accuracy

CDTFA's source data notes the following about accuracy: City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates. The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations. COUNTY = county name; CITY = city name or unincorporated territory; COPRI = county number followed by the 3-digit city primary number used in the California State Board of Equalization's 6-digit tax rate area numbering system (for the purpose of this map, unincorporated areas are assigned 000 to indicate that the area is not within a city).

Boundary Processing

These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean-facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties. In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas), remain the same between the polygons for this purpose.

Slivers

In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm. That is, when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, in addition to others. More information on this algorithm will be provided soon.

Coastline Caveats

Some cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and San Diego and surrounding cities that extend into San Diego Bay, which our shoreline encloses. If you have feedback on the exclusion of these items, or others, from the shoreline cuts, please reach out using the contact information above.

Offline Use

This service is fully enabled for sync and export using Esri Field Maps or other similar tools. Importantly, the GlobalID field exists only to support that use case and should not be used for any other purpose (see note in field descriptions).

Updates and Date of Processing

Concurrent with CDTFA updates, approximately every two weeks. Last Processed: 12/17/2024 by Nick Santos using code path at https://github.com/CDT-ODS-DevSecOps/cdt-ods-gis-city-county/ at commit 0bf269d24464c14c9cf4f7dea876aa562984db63. It incorporates updates from CDTFA as of 12/12/2024. Future updates will include improvements to metadata and update frequency.
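A sketch of the Dissolve described under "Working with Coastal Buffers", assuming GeoPandas and a hypothetical local export of the buffered layer:

```python
import geopandas as gpd

gdf = gpd.read_file("cities_counties_with_buffers.geojson")  # hypothetical export

# Dissolve on every field except COASTAL and the geometry-derived fields,
# collapsing each jurisdiction's land and buffer polygons into one feature.
keep = [c for c in gdf.columns
        if c not in ("COASTAL", "Area_SqMi", "Shape_Area", "Shape_Length", "geometry")]
dissolved = gdf.dissolve(by=keep, as_index=False)
dissolved.to_file("cities_counties_dissolved.geojson")
```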

  15. California County Boundaries and Identifiers

    • catalog.data.gov
    • data.ca.gov
    • +1 more
    Updated Jul 24, 2025
    + more versions
    Cite
    California Department of Technology (2025). California County Boundaries and Identifiers [Dataset]. https://catalog.data.gov/dataset/california-county-boundaries-and-identifiers
    Explore at:
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    California Department of Technology
    Area covered
    California
    Description

WARNING: This is a pre-release dataset and its field names and data structures are subject to change. It should be considered pre-release until the end of March 2025. The schema changed in February 2025 - please see below. We will post a roadmap of upcoming changes, but service URLs and schema are now stable. For deployment status of new services in February 2025, see https://gis.data.ca.gov/pages/city-and-county-boundary-data-status. Additional roadmap and status links at the bottom of this metadata.

This dataset is continuously updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.

Purpose

County boundaries along with third party identifiers used to join in external data. Boundaries are from the California Department of Tax and Fee Administration (CDTFA). These boundaries are the best available statewide data source in that CDTFA receives changes in incorporation and boundary lines from the Board of Equalization, who receives them from local jurisdictions for tax purposes. Boundary accuracy is not guaranteed, and though CDTFA works to align boundaries based on historical records and local changes, errors will exist. If you require a legal assessment of boundary location, contact a licensed surveyor.

This dataset joins in multiple attributes and identifiers from the US Census Bureau and Board on Geographic Names to facilitate adding additional third party data sources. In addition, we attach attributes of our own to ease and reduce common processing needs and questions. Finally, coastal buffers are separated into separate polygons, leaving the land-based portions of jurisdictions and coastal buffers in adjacent polygons. This layer removes the coastal buffer polygons. This feature layer is for public use.

Related Layers

This dataset is part of a grouping of many datasets:

• Cities: Only the city boundaries and attributes, without any unincorporated areas (With Coastal Buffers / Without Coastal Buffers)
• Counties: Full county boundaries and attributes, including all cities within as a single polygon (With Coastal Buffers / Without Coastal Buffers – this dataset)
• Cities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service. (With Coastal Buffers / Without Coastal Buffers)
• City and County Abbreviations
• Unincorporated Areas (Coming Soon)
• Census Designated Places
• Cartographic Coastline (Polygon / Line source (Coming Soon))

Working with Coastal Buffers

The dataset you are currently viewing excludes the coastal buffers for cities and counties that have them in the source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating if it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers on all the fields except OFFSHORE and AREA_SQMI to get a version with the correct identifiers.

Point of Contact

California Department of Technology, Office of Digital Services, odsdataservices@state.ca.gov

Field and Abbreviation Definitions

• CDTFA_COUNTY: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.
• CDTFA_COPRI: county number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system. The boundary data originate with CDTFA's teams managing tax rate information, so this field is preserved and flows into this dataset.
• CENSUS_GEOID: numeric geographic identifiers from the US Census Bureau
• CENSUS_PLACE_TYPE: City, County, or Town, stripped off the census name for identification purposes.
• GNIS_PLACE_NAME: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information System
• GNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier.
• CDT_COUNTY_ABBR: Abbreviations of county names – originally derived from CalTrans Division of Local Assistance and now managed by CDT. Abbreviations are 3 characters.
• CDT_NAME_SHORT: The name of the jurisdiction (city or county) with the word "City" or "County" stripped off the end. Some changes may come to how we process this value to make it more consistent.
• AREA_SQMI: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers.
• OFFSHORE: Indicates if the polygon is a coastal buffer. Null for land polygons. Additional values include "ocean" and "bay".
• PRIMARY_DOMAIN: Currently empty/null for all records. Placeholder field for official URL of the city or county.
• CENSUS_POPULATION: Currently null for all records. In the future, it will include the most recent US Census population estimate for the jurisdiction.
• GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID, instead.

Boundary Accuracy

County boundaries were originally derived from a 1:24,000 accuracy dataset, with improvements made in some places to boundary alignments based on research into historical records and boundary changes as CDTFA learns of them. City boundary data are derived from pre-GIS tax maps, digitized at BOE and CDTFA, with adjustments made directly in GIS for new annexations, detachments, and corrections. Boundary accuracy within the dataset varies. While CDTFA strives to correctly include or exclude parcels from jurisdictions for accurate tax assessment, this dataset does not guarantee that a parcel is placed in the correct jurisdiction. When a parcel is in the correct jurisdiction, this dataset cannot guarantee accurate placement of boundary lines within or between parcels or rights of way. This dataset also provides no information on parcel boundaries. For exact jurisdictional or parcel boundary locations, please consult the county assessor's office and a licensed surveyor.

CDTFA's data is used as the best available source because BOE and CDTFA receive information about changes in jurisdictions which otherwise need to be collected independently by an agency or company to compile into usable map boundaries. CDTFA maintains the best available statewide boundary information.

CDTFA's source data notes the following about accuracy: City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates. The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations.

Boundary Processing

These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean-facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties. In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas), remain the same between the polygons for this purpose.

Slivers

In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm. That is, when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, in addition to others. More information on this algorithm will be provided soon.

Coastline Caveats

Some cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and

  16. Phase 2 Merge + Combine Overgrowth Dataset

    • universe.roboflow.com
    zip
    Updated Jan 3, 2024
    Cite
    Tour de Chicago (2024). Phase 2 Merge + Combine Overgrowth Dataset [Dataset]. https://universe.roboflow.com/tour-de-chicago/phase-2-merge-combine-overgrowth
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 3, 2024
    Dataset authored and provided by
    Tour de Chicago
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Instances 5pnE House Attributes TcoT Ivh3 Bounding Boxes
    Description

    Phase 2 Merge + Combine Overgrowth

    ## Overview
    
    Phase 2 Merge + Combine Overgrowth is a dataset for object detection tasks - it contains Instances 5pnE House Attributes TcoT Ivh3 annotations for 892 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  17. Student Performance Data Set

    • kaggle.com
    Updated Mar 27, 2020
    + more versions
    Cite
    Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
    Explore at:
    Croissant - a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data-Science Sean
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If this Data Set is useful, an upvote is appreciated. These data describe student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).
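    To illustrate the grade-correlation note above, here is a minimal sketch, assuming pandas and scikit-learn are installed; the file name student-por.csv and the ';' separator follow the original UCI distribution and may differ in this Kaggle copy:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    # File name and separator follow the UCI original; adjust for this Kaggle copy.
    df = pd.read_csv("student-por.csv", sep=";")

    # G3 (final grade) correlates strongly with the period grades G1 and G2.
    print(df[["G1", "G2", "G3"]].corr())

    # Rough check of how much harder G3 is to predict without G1/G2
    # (numeric columns only, for brevity; default scoring for regressors is R^2).
    X_all = df.select_dtypes("number").drop(columns=["G3"])
    X_hard = X_all.drop(columns=["G1", "G2"])
    for label, X in [("with G1/G2", X_all), ("without G1/G2", X_hard)]:
        r2 = cross_val_score(LinearRegression(), X, df["G3"], cv=5).mean()
        print(label, round(r2, 3))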

  18. Data from: Redocking the PDB

    • data.niaid.nih.gov
    Updated Dec 6, 2023
    Cite
    Flachsenberg, Florian; Ehrt, Christiane; Gutermuth, Torben; Rarey, Matthias (2023). Redocking the PDB [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7579501
    Explore at:
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany
    Authors
    Flachsenberg, Florian; Ehrt, Christiane; Gutermuth, Torben; Rarey, Matthias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains supplementary data to the journal article 'Redocking the PDB' by Flachsenberg et al. (https://doi.org/10.1021/acs.jcim.3c01573)[1]. In this paper, we described two datasets: the PDBScan22 dataset with a large set of 322,051 macromolecule–ligand binding sites generally suitable for redocking, and the PDBScan22-HQ dataset with 21,355 binding sites passing different structure quality filters. These datasets were further characterized by calculating properties of the ligand (e.g., molecular weight), properties of the binding site (e.g., volume), and structure quality descriptors (e.g., crystal structure resolution). Additionally, we performed redocking experiments with our novel JAMDA structure preparation and docking workflow[1] and with AutoDock Vina[2,3]. Details for all these experiments and the dataset composition can be found in the journal article[1]. Here, we provide all the datasets, i.e., the PDBScan22 and PDBScan22-HQ datasets as well as the docking results and the additionally calculated properties (for the ligand, the binding sites, and structure quality descriptors). Furthermore, we give a detailed description of their content (i.e., the data types and a description of the column values). All datasets consist of CSV files with the actual data and associated metadata JSON files describing their content. The CSV/JSON files are compliant with the CSV on the web standard (https://csvw.org/).

    General hints

    - All docking experiment results consist of two CSV files: one with general information about the docking run (e.g., whether it was successful) and one with individual pose results (i.e., score and RMSD to the crystal structure).
    - All files (except the docking pose tables) can be indexed uniquely by the column tuple (pdb, name), containing the PDB code of the complex (e.g., 1gm8) and the ligand name (in the format '<ligand>_<chain>_<residue number>', e.g., 'SOX_B_1559').
    - All files (except the docking pose tables) have exactly the same number of rows as the dataset they were calculated on (e.g., PDBScan22 or PDBScan22-HQ). However, some CSV files may have missing values (see also the JSON metadata files) in some or even all columns (except for 'pdb' and 'name').
    - The docking pose tables also contain the 'pdb' and 'name' columns; however, these are unique only together with the 'rank' column (i.e., there might be multiple poses for each docking run, or none).

    Example usage

    Using the pandas library (https://pandas.pydata.org/) in Python, we can calculate the number of protein-ligand complexes in the PDBScan22-HQ dataset with a top-ranked pose RMSD to the crystal structure ≤ 2.0 Å in the JAMDA redocking experiment and a molecular weight between 100 Da and 200 Da:

    import pandas as pd

    # Load the dataset table, the JAMDA pose results, and the ligand properties.
    df = pd.read_csv('PDBScan22-HQ.csv')
    df_poses = pd.read_csv('PDBScan22-HQ_JAMDA_NL_NR_poses.csv')
    df_properties = pd.read_csv('PDBScan22_ligand_properties.csv')

    # Attach ligand properties, filter to 100-200 Da, and join the top-ranked poses.
    merged = df.merge(df_properties, how='left', on=['pdb', 'name'])
    merged = merged[(merged['MW'] >= 100) & (merged['MW'] <= 200)].merge(
        df_poses[df_poses['rank'] == 1], how='left', on=['pdb', 'name'])

    # Complexes whose top-ranked pose is within 2.0 Å of the crystal structure,
    # and complexes for which no top-ranked pose exists.
    nof_successful_top_ranked = (merged['rmsd_ai'] <= 2.0).sum()
    nof_no_top_ranked = merged['rmsd_ai'].isna().sum()
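    The keying described under "General hints" can be sanity-checked the same way; a small sketch, assuming the same files are present locally:

    import pandas as pd

    df = pd.read_csv('PDBScan22-HQ.csv')
    poses = pd.read_csv('PDBScan22-HQ_JAMDA_NL_NR_poses.csv')

    # Dataset tables are keyed by (pdb, name); pose tables need (pdb, name, rank).
    assert not df.duplicated(subset=['pdb', 'name']).any()
    assert not poses.duplicated(subset=['pdb', 'name', 'rank']).any()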

    Datasets

    - PDBScan22.csv: The PDBScan22 dataset[1]. This dataset was derived from the PDB[4]. It contains macromolecule–ligand binding sites (defined by PDB code and ligand identifier) that can be read by the NAOMI library[5,6] and pass basic consistency filters.
    - PDBScan22-HQ.csv: The PDBScan22-HQ dataset[1]. It contains macromolecule–ligand binding sites from the PDBScan22 dataset that pass certain structure quality filters described in our publication[1].
    - PDBScan22-HQ-ADV-Success.csv: A subset of the PDBScan22-HQ dataset without the 336 binding sites where AutoDock Vina[2,3] fails.
    - PDBScan22-HQ-Macrocycles.csv: A subset of the PDBScan22-HQ dataset without the 336 binding sites where AutoDock Vina[2,3] fails, restricted to molecules containing macrocycles with at least ten atoms.

    Properties for PDBScan22

    - PDBScan22_ligand_properties.csv: Conformation-independent properties of all ligand molecules in the PDBScan22 dataset. Properties were calculated using an in-house tool developed with the NAOMI library[5,6].
    - PDBScan22_StructureProfiler_quality_descriptors.csv: Structure quality descriptors for the binding sites in the PDBScan22 dataset, calculated using the StructureProfiler tool[7].
    - PDBScan22_basic_complex_properties.csv: Simple properties of the binding sites in the PDBScan22 dataset. Properties were calculated using an in-house tool developed with the NAOMI library[5,6].

    Properties for PDBScan22-HQ

    - PDBScan22-HQ_DoGSite3_pocket_descriptors.csv: Binding site descriptors calculated for the binding sites in the PDBScan22-HQ dataset using the DoGSite3 tool[8].
    - PDBScan22-HQ_molecule_types.csv: Assignment of ligands in the PDBScan22-HQ dataset (without the 336 binding sites where AutoDock Vina fails) to different molecular classes (i.e., drug-like, fragment-like, oligosaccharide, oligopeptide, cofactor, macrocyclic). A detailed description of the assignment can be found in our publication[1].

    Docking results on PDBScan22

    - PDBScan22_JAMDA_NL_NR.csv: Docking results of JAMDA[1] on the PDBScan22 dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22_JAMDA_NL_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    - PDBScan22_JAMDA_NL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22 dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

    Docking results on PDBScan22-HQ

    - PDBScan22-HQ_JAMDA_NL_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NL_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    - PDBScan22-HQ_JAMDA_NL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    - PDBScan22-HQ_JAMDA_NL_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NL_WR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.
    - PDBScan22-HQ_JAMDA_NL_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.
    - PDBScan22-HQ_JAMDA_NW_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    - PDBScan22-HQ_JAMDA_NW_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    - PDBScan22-HQ_JAMDA_NW_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_WR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.
    - PDBScan22-HQ_JAMDA_NW_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.
    - PDBScan22-HQ_JAMDA_WL_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_WL_NR_poses.csv'. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
    - PDBScan22-HQ_JAMDA_WL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.
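    The file-name scheme above (NL/WL/NW for how the binding site was preprocessed, NR/WR for the restriction mode) makes side-by-side comparisons straightforward. A minimal sketch, assuming the files listed above sit in the working directory and reusing the 2.0 Å top-rank criterion from the earlier example:

    import pandas as pd

    df = pd.read_csv('PDBScan22-HQ.csv')

    # Top-ranked redocking success rate for each JAMDA configuration listed above;
    # binding sites without any returned pose count as failures.
    for cfg in ['NL_NR', 'NL_WR', 'NW_NR', 'NW_WR', 'WL_NR']:
        poses = pd.read_csv(f'PDBScan22-HQ_JAMDA_{cfg}_poses.csv')
        top = poses[poses['rank'] == 1]
        merged = df.merge(top, how='left', on=['pdb', 'name'])
        print(cfg, f"{(merged['rmsd_ai'] <= 2.0).mean():.1%}")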

  19. California County Boundaries and Identifiers with Coastal Buffers

    • hub.arcgis.com
    • data.ca.gov
    • +1 more
    Updated Oct 24, 2024
    + more versions
    Cite
    California Department of Technology (2024). California County Boundaries and Identifiers with Coastal Buffers [Dataset]. https://hub.arcgis.com/datasets/28c9f9dd8c3d4eb5a534cb30ddb3ce39
    Explore at:
    Dataset updated
    Oct 24, 2024
    Dataset authored and provided by
    California Department of Technology
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    California
    Description

    Note: The schema changed in February 2025 - please see below. We will post a roadmap of upcoming changes, but service URLs and schema are now stable. For deployment status of new services beginning in February 2025, see https://gis.data.ca.gov/pages/city-and-county-boundary-data-status. Additional roadmap and status links are at the bottom of this metadata. This dataset is regularly updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.

    Purpose
    County boundaries along with third-party identifiers used to join in external data. Boundaries are from the California Department of Tax and Fee Administration (CDTFA). These boundaries are the best available statewide data source in that CDTFA receives changes in incorporation and boundary lines from the Board of Equalization, which receives them from local jurisdictions for tax purposes. Boundary accuracy is not guaranteed, and though CDTFA works to align boundaries based on historical records and local changes, errors will exist. If you require a legal assessment of boundary location, contact a licensed surveyor. This dataset joins in multiple attributes and identifiers from the US Census Bureau and Board on Geographic Names to facilitate adding additional third-party data sources. In addition, we attach attributes of our own to ease and reduce common processing needs and questions. Finally, coastal buffers are separated into separate polygons, leaving the land-based portions of jurisdictions and coastal buffers in adjacent polygons. This feature layer is for public use.

    Related Layers
    This dataset is part of a grouping of many datasets:
    - Cities: Only the city boundaries and attributes, without any unincorporated areas
      - With Coastal Buffers
      - Without Coastal Buffers
    - Counties: Full county boundaries and attributes, including all cities within as a single polygon
      - With Coastal Buffers (this dataset)
      - Without Coastal Buffers
    - Cities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service.
      - With Coastal Buffers
      - Without Coastal Buffers
    - City and County Abbreviations
    - Unincorporated Areas (Coming Soon)
    - Census Designated Places
    - Cartographic Coastline
      - Polygon
      - Line source (Coming Soon)

    Working with Coastal Buffers
    The dataset you are currently viewing includes the coastal buffers for cities and counties that have them in the source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating if it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers on all the fields except OFFSHORE and AREA_SQMI to get a version with the correct identifiers (see the sketch at the end of this description).

    Point of Contact
    California Department of Technology, Office of Digital Services, gis@state.ca.gov

    Field and Abbreviation Definitions
    CDTFA_COUNTY: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.
    CDTFA_COPRI: County number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system. The boundary data originate with CDTFA's teams managing tax rate information, so this field is preserved and flows into this dataset.
    CENSUS_GEOID: Numeric geographic identifiers from the US Census Bureau.
    CENSUS_PLACE_TYPE: City, County, or Town, stripped from the census name for identification purposes.
    GNIS_PLACE_NAME: Board on Geographic Names authorized nomenclature for area names published in the Geographic Names Information System.
    GNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets that use this identifier.
    CDT_COUNTY_ABBR: Three-character abbreviations of county names, originally derived from the CalTrans Division of Local Assistance and now managed by CDT.
    CDT_NAME_SHORT: The name of the jurisdiction (city or county) with the word "City" or "County" stripped from the end. Some changes may come to how we process this value to make it more consistent.
    AREA_SQMI: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 (California Teale Albers).
    OFFSHORE: Indicates whether the polygon is a coastal buffer. Null for land polygons; other values are "ocean" and "bay".
    PRIMARY_DOMAIN: Currently empty/null for all records. Placeholder field for the official URL of the city or county.
    CENSUS_POPULATION: Currently null for all records. In the future, it will include the most recent US Census population estimate for the jurisdiction.
    GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but it is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID, instead.

    Boundary Accuracy
    County boundaries were originally derived from a 1:24,000-accuracy dataset, with improvements made in some places to boundary alignments based on research into historical records and boundary changes as CDTFA learns of them. City boundary data are derived from pre-GIS tax maps, digitized at BOE and CDTFA, with adjustments made directly in GIS for new annexations, detachments, and corrections. Boundary accuracy within the dataset varies. While CDTFA strives to correctly include or exclude parcels from jurisdictions for accurate tax assessment, this dataset does not guarantee that a parcel is placed in the correct jurisdiction. Even when a parcel is in the correct jurisdiction, this dataset cannot guarantee accurate placement of boundary lines within or between parcels or rights of way. This dataset also provides no information on parcel boundaries. For exact jurisdictional or parcel boundary locations, please consult the county assessor's office and a licensed surveyor.
    CDTFA's data is used as the best available source because BOE and CDTFA receive information about changes in jurisdictions that would otherwise need to be collected independently by an agency or company and compiled into usable map boundaries. CDTFA maintains the best available statewide boundary information. CDTFA's source data notes the following about accuracy: city boundary changes and county boundary line adjustments are filed with the Board of Equalization per Government Code 54900.
    This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates. The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations.

    Boundary Processing
    These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries that end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean-facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties. In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas), remain the same between the polygons for this purpose.

    Slivers
    In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm. That is, when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, among others. More information on this algorithm will be provided soon.

    Coastline Caveats
    Some cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and San Diego and surrounding cities that extend into San Diego Bay, which our shoreline encloses. If you have feedback on the exclusion of these items, or others, from the shoreline cuts,
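    A minimal sketch of the dissolve step described under "Working with Coastal Buffers", assuming a recent geopandas and a local export of the with-buffers layer saved as counties_with_buffers.geojson (a hypothetical file name):

    import geopandas as gpd

    # Hypothetical local export of the with-coastal-buffers layer.
    gdf = gpd.read_file("counties_with_buffers.geojson")

    # Dissolve on every attribute except OFFSHORE and the geometry-derived AREA_SQMI,
    # merging each jurisdiction's land polygon with its coastal buffer(s). We also
    # drop GlobalID, which the metadata above warns is per-feature and not persistent.
    # dropna=False keeps groups whose key columns are null (e.g., PRIMARY_DOMAIN).
    drop = ("geometry", "OFFSHORE", "AREA_SQMI", "GlobalID")
    keep = [c for c in gdf.columns if c not in drop]
    dissolved = gdf.dissolve(by=keep, as_index=False, dropna=False)

    # Recompute AREA_SQMI on the merged geometry (EPSG 3310; sq m to sq mi).
    dissolved["AREA_SQMI"] = dissolved.to_crs(epsg=3310).geometry.area / 2_589_988.11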

  20. Scripts for Analysis

    • figshare.com
    txt
    Updated Jul 18, 2018
    Cite
    Sneddon Lab UCSF (2018). Scripts for Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6783569.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 18, 2018
    Dataset provided by
    figshare
    Authors
    Sneddon Lab UCSF
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Scripts used for analysis of V1 and V2 datasets.
    - seurat_v1.R - initialize a Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, and tSNE visualization. Used for v1 datasets.
    - merge_seurat.R - merge two or more Seurat objects into one Seurat object. Performs linear regression to remove batch effects from separate objects. Used for v1 datasets.
    - subcluster_seurat_v1.R - subcluster clusters of interest from a Seurat object. Determines variable genes, performs regression and PCA. Used for v1 datasets.
    - seurat_v2.R - initialize a Seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
    - clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
    - subcluster_seurat_v2.R - subcluster clusters of interest from a Seurat object. Determines variable genes, performs regression and PCA analysis. Used for v2 datasets.
    - seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for the Seurat object created by seurat_v1.R or seurat_v2.R.
    - merge_clusters.R - merge clusters that do not meet a gene threshold. Used for both v1 and v2 datasets.
    - prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling, in order to input normalized, regressed values into Monocle with monocle_seurat_input_v1.R.
    - monocle_seurat_input_v1.R - Monocle script using Seurat batch-corrected values as input for v1 merged timecourse datasets.
    - monocle_lineage_trace.R - Monocle script using nUMI as input for the v2 lineage-traced dataset.
    - monocle_object_analysis.R - downstream analysis for the Monocle object - BEAM and plotting.
    - CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
    - CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.
