Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
51WORLD Synthetic Dataset Usage Documentation
1 Introduction
The 51WORLD synthetic dataset mainly contains camera sensor-related data and LiDAR sensor-related data generated by 51Sim-One. Camera sensor-related data mainly includes images and corresponding semantic segmentation, instance segmentation, depth annotation, and object detection annotation; LiDAR sensor-related data mainly includes laser point clouds and annotation of 3D bboxes, semantic segmentation annotation… See the full description on the dataset page: https://huggingface.co/datasets/51WORLD/DataOne-synthetic-nuscenes-v1.1-sample.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A biodiversity dataset graph: DataONE
The intended use of this archive is to facilitate (meta-)analysis of the Data Observation Network for Earth (DataONE). DataONE is a distributed infrastructure that provides information about earth observation data.
This dataset provides versioned snapshots of the DataONE network as tracked by Preston [2] between 2018-11-06 and 2020-05-07 using "preston update -u https://dataone.org".
The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what and where the DataONE content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .
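As a minimal sketch of how a download could be checked for completeness, the Python snippet below enumerates the 256 part names implied by the preston-[00-ff].tar.gz pattern and reports which ones are absent from a local folder. The function names and the idea of a single download folder are assumptions made for illustration; they are not part of preston.

```python
from pathlib import Path


def expected_parts():
    """Return the 256 part names preston-00.tar.gz .. preston-ff.tar.gz."""
    # Two lowercase hex digits cover 00..ff, i.e. 256 values.
    return [f"preston-{i:02x}.tar.gz" for i in range(256)]


def missing_parts(download_dir):
    """List expected part names that are not present in download_dir."""
    present = {p.name for p in Path(download_dir).glob("preston-*.tar.gz")}
    return [name for name in expected_parts() if name not in present]
```

Calling missing_parts("downloads") on a hypothetical folder named "downloads" returns an empty list once every part has been fetched.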
To retrieve and verify the downloaded DataONE biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:
$ java -jar preston.jar clone --remote https://zenodo.org/record/3849494/files
After that, verify the index of the archive by reproducing the following provenance log history:
$ java -jar preston.jar history
<0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion>
To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" resembles the lines shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.
$ java -jar preston.jar verify
hash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945 file:/home/preston/preston-dataone/data/e5/5c/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945 OK CONTENT_PRESENT_VALID_HASH 21580 hash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945
hash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f file:/home/preston/preston-dataone/data/d0/dd/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f OK CONTENT_PRESENT_VALID_HASH 2035 hash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f
hash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53 file:/home/preston/preston-dataone/data/47/2d/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53 OK CONTENT_PRESENT_VALID_HASH 1935 hash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53
hash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687 file:/home/preston/preston-dataone/data/b2/98/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687 OK CONTENT_PRESENT_VALID_HASH 1553 hash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687
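The verify output above can also be checked programmatically. The following Python sketch is an illustration only: the six-field layout (expected hash, file location, status, state, size, computed hash) is inferred from the sample lines rather than taken from preston's documentation, the interpretation of the numeric field as a size in bytes is an assumption, and all function names are hypothetical. It also assumes file paths contain no whitespace.

```python
import hashlib
from pathlib import Path

PREFIX = "hash://sha256/"


def parse_verify_line(line):
    """Split one verify output line into named fields (layout inferred from the sample)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    expected, location, status, state, size, actual = fields
    return {
        "expected": expected,   # hash the content is supposed to have
        "location": location,   # file: URI of the stored content
        "status": status,       # e.g. OK
        "state": state,         # e.g. CONTENT_PRESENT_VALID_HASH
        "size": int(size),      # assumed to be the content size in bytes
        "actual": actual,       # hash reported for the stored content
    }


def is_valid(record):
    """True when the state is CONTENT_PRESENT_VALID_HASH and both hashes agree."""
    return (record["state"] == "CONTENT_PRESENT_VALID_HASH"
            and record["expected"] == record["actual"])


def content_path(hash_uri, data_dir="data"):
    """Map a hash URI to data/<first 2 hex chars>/<next 2 hex chars>/<full hash>."""
    hex_digest = hash_uri[len(PREFIX):]
    return Path(data_dir) / hex_digest[:2] / hex_digest[2:4] / hex_digest


def sha256_uri(path):
    """Recompute the SHA-256 of a file and return it as a hash://sha256/ URI."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return PREFIX + digest.hexdigest()
```

A possible use is to save the output of "preston verify" to a text file, pass each line through parse_verify_line, and list the records for which is_valid returns False. For an independent spot check, sha256_uri(content_path(record["expected"])) can be compared with record["expected"].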
Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".
Files in this data publication:
--- start of file descriptions ---
-- description of archive and its contents (this file) --
README
-- executable java jar containing preston[2] v0.1.15. --
preston.jar
-- preston archives containing DataONE data files, associated provenance logs and a provenance index --
preston-[00-ff].tar.gz
-- individual provenance index files --
2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
2aecaf289def0e23a27058bf7715f226ef9189905f0be13228174825633125cf
2f65ae542401d4c2daf1bca70de640211da6749188f67d28ea71acd7d8ba070b
35eb1e17e2bf3e71212cde35bdb03e8a6545a57483ea3c1633929257b70cf637
3d38b70198e448674be6a63d14b9817f3a956f48bba7418fa7baa086a56c05b7
66ad3e5e904740f1e835ac6718dda4279e0c24b204ea0d1113cda1352a5072ba
7466a35e42dea7e2be068060ec0c926f9a8686388ed504ef5c6c990c1ba4e8d0
81161d9746c2a5823641c436e773fb4508516b055da85f4494b38c545349da39
8bf062872ce958545d361e9d53a552ffb025ac29ab875caad1157c0995d34f66
a90eed8d70c54c8e554f2dfde4fceb434eda162d9615d62de96ded2344f88a78
c33ef5e29100b323412f1f3bc66908c8e01e4f0d1db4ea3685d2fffc47981dd6
c84dffef20fec958255e759db6445fc469d73695674a33ae6f7e567a088c9fe0
d362d599d72000c4feb464db5a669b12e15fc3ca1a49b1e7d4d6f7d6d5d15411
d9378616636be3686bbabd5bf29d50f0ef0e5ceb5ddd7dfce47f7e755b596b7d
da26fa6e7371385ed3f61af9a766221c833060d59dfd4869bbd7110f95f288db
e4103a75627857de3ee2e317429108611c244fc448c01d1d7bf652115c3b8a55
eb368fedb8f100210dd968edcf80f4d13cab3dd64135a6ab744102cf15e68c94
f13ab4bca04f894ae8eabb51fa01b4dfbc69f717eabc9896c728e2ba39c4db27
f493baf276892a199a0b0d078359f64a38fe8ad3f807921f8d41ef73f7343b1f
ff92b6c06ae5286bd2f1db679e0fcc4da294acb9bc01b2e9522378d99218c2e3
--- end of file descriptions ---
References
[1] Data Observation Network for Earth (DataONE, https://dataone.org) accessed from 2018-11-06 to 2020-05-07 with provenance hash://sha256/2b5c445f0b7b918c14a50de36e29a32854ed55f00d8639e09f58f049b85e50e3.
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .
This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.
DataONE has consistently focused on interoperability among data repositories to enable seamless access to well-described data on the Earth and the environment. Our existing services promote data discovery and access through harmonization of the diverse metadata specifications used across communities, and through our integrated data search portal and services. In terms of the FAIR principles, we have done a good job at Findable and Accessible, while as a community we have placed less emphasis on Interoperable and Reusable. We present new DataONE services for quantitatively assessing metadata completeness and effectiveness relative to the FAIR principles. The services produce guidance for FAIRness at both the level of an individual data set and trends through time for repository, user, and funder data collections. These analytical results regarding conformance to FAIR principles are preliminary and based on proposed quantitative assessment metrics for FAIR, which will be revised with input from the community. Thus, these results should not be viewed as conclusive about the data sets presented, but rather as an illustration of the types of quantitative comparisons that will become possible once the FAIR metrics at DataONE have been finalized.
Data is an indispensable part of research, but it isn’t recognized as an important component of a researcher’s scholarly output. The Public Library of Science (PLOS), in partnership with the California Digital Library (CDL) and DataONE, has undertaken a project called Make Data Count (http://articlemetrics.github.io/MDC/) to develop data-level metrics (DLM). This 12-month NSF-funded project is aimed at piloting a suite of metrics that track and measure data use so that it can be shared with funders, tenure & promotion committees, and other stakeholders. The first phase of this project is to gather information about the needs of researchers: how do they want to get credit for the data they produce? What do they want to know about how their data is used? What do they want to know about others’ data to evaluate quality? We connected with the community to determine requirements and understand use cases. In November and December of 2014, we conducted a pair of online surveys of researche...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ESA ignite talk 2014. Data Citations. Here is the dataset I collected for slide 19 of the talk to examine the relative efficacy of data citations to capture usage of datasets my colleagues and I have published. Outcome: Do not capture usage. Abstract: For better or worse, citations are here to stay. Citations have the capacity to serve as a proxy estimate of uptake or use by the community of one's products. Fortunately, the range of acceptable scientific products is rapidly expanding, datasets in many forms continue to serve as pivotal resources, and big data syntheses are reshaping the standards for acceptable derived evidence. Data citations are defined, general rules provided, and the unique elements of datasets described such as versioning and persistent identifiers. The cultural and scientific discovery implications of data citations are also described focusing on emerging linked-data futures.
Survey data collected in Canada, 2019 (n = 1539), using age, Facebook use, and meme understanding to determine differences between demographics in relation to Instagram use.
This is the second report in the series based on an online survey of 1,500 Canadians. Building on the first report that provides a snapshot of the social media usage trends in Canada, this second report analyzes social media users’ privacy perceptions and expectations.
test data - do not use. Visit https://dataone.org/datasets/doi%3A10.5063%2FAA%2Fnceas.967.1 for complete metadata about this dataset.
Natural regeneration is less expensive than tree planting, but determining what species will arrive and establish to serve as templates for tropical forest restoration remains poorly investigated in eastern Africa. This study summarises seedling recruitment under 29 isolated legacy trees (14 trees comprised of three exotic species and 15 trees comprised of seven native species) in tea plantations in the East Usambara Mountains, Tanzania. Among the findings were that pioneer recruits were very abundant whereas non-pioneers were disproportionately fewer. Importantly, 98% of all recruits were animal-dispersed. The size of legacy trees, driven mostly by the exotic Grevillea robusta, and to some extent, the native Milicia excelsa, explained abundance of recruits. The distribution of bird-dispersed recruits suggested that some bird species use all types of legacy trees equally in this fragmented landscape. In contrast, the distribution of bat-dispersed recruits provided strong evidence that seedling composition differed under native versus exotic legacy trees likely due to fruit bats showing more preference for native legacy trees. Native, as compared to exotic legacy trees, had almost two times more non-pioneer recruits, with Ficus and Milicia excelsa driving this trend. Implications of our findings regarding restoration in the tropics are numerous for the movement of native animal-dispersed tree species in fragmented and disturbed tropical forests surrounded by farmland. Isolated native trees that bear fleshy fruits can attract more frugivores, resulting not only in high recruitment under them, but depending on the dispersal mode of the legacy trees, also different suites of recruited species. When selecting tree species for plantings, to maximize visitation by different dispersal agents and to enhance seedling recruit diversity, bat-dispersed Milicia excelsa and Ficus species are recommended. [Study published here: https://doi.org/10.1371/journal.pone.0250859]
No description is available. Visit https://dataone.org/datasets/ess-dive-ed0df00af2baf17-20210429T234427355308 for complete metadata about this dataset.
A distributed framework and cyberinfrastructure for open, persistent, and secure access to Earth observational data. It ensures the preservation, access, use and reuse of multi-scale, multi-discipline, and multi-national science data via three primary cyberinfrastructure elements and a broad education and outreach program. The DataONE Investigator Toolkit is a collection of software tools for finding, using, and contributing data in DataONE. DataONE currently hosts three Coordinating Nodes that provide network-wide services to enhance interoperability of the Member Nodes and support indexing and replication services. Coordinating Nodes provide a replicated catalog of Member Node holdings and make it easy for scientists to discover data wherever they reside, also enabling data repositories to make their data and services more broadly available to the international community. DataONE Coordinating Nodes are located at the University of New Mexico, the University of California Santa Barbara and at the University of Tennessee (in collaboration with Oak Ridge National Laboratory). DataONE comprises a distributed network of data centers, science networks or organizations. These organizations can expose their data within the DataONE network through the implementation of the DataONE Member Node service interface. In addition to scientific data, Member Nodes can provide computing resources, or services such as data replication, to the DataONE community.
Contains data and code used in analyses. Visit https://dataone.org/datasets/sha256%3A13f85e813a6a08c6cb2f7a02f12e2f4d6ff13f16c78dae55cda7962b8ef37d80 for complete metadata about this dataset.
Replication Data for: Science as a public good. Visit https://dataone.org/datasets/sha256%3A7855da2320b88e952012e50c545beb05e098bc15da4394ee826db50735bdccfc for complete metadata about this dataset.
No description is available. Visit https://dataone.org/datasets/de8286116302f62495a59df51f1e5cf6 for complete metadata about this dataset.
This dataset summarizes characteristics of 11 land use efficiency visualization tools that address vehicle miles traveled, gentrification, and equity. Summary characteristics include the tools' purpose, year of data or publication, data sources, methods used, units of analysis, and evaluation of the tool and ease of use. Links to tools and documentation are included.
This data set is a subset of data acquired from the United States Geological Survey's (USGS) National Land Cover Dataset (NLCD). The NLCD is comprised of Landsat data collected during 2001 that are representative of the land cover conditions present in the Georgia, USA regional study area during the Soil Moisture Experiment of 2003 (SMEX03).
This dataset contains data, documentation, and code files associated with studies performed on snapshots of the contents of Harvard Dataverse taken on 28 and 29 October 2019.
The 2018 Canadian Internet Use Survey (CIUS) measures access to the Internet and the online behaviours of individual residents of Canada 15 years of age and over, living in the provinces. The survey is built off the previous iteration of the CIUS, last conducted in 2012. The 2018 iteration has been redesigned and modernized to increase international comparability, answer government policy-relevant questions, and measure a wider range of online activities, given the rapid pace at which the Internet has evolved. The 2018 CIUS aims to measure the impact of digital technologies on the lives of Canadians. Information gathered will help to better understand how individuals use the Internet, including intensity of use, demand for online activities and online interactions. The CIUS examines use of online government services, use of social networking websites or apps, smartphone use, digital skills, e-commerce, online work, and security, privacy and trust as they relate to the Internet.
Teaching undergraduate political methodology courses is a challenging task, yet has garnered little pedagogical discussion within the discipline. With the growing use of technology in the classroom, as well as the growing demand for data science and data literacy in our society, better understanding how we use statistical software in these courses is warranted. In this short paper, we shed light on current practices in teaching political methodology courses, with a particular emphasis on the use of statistical software. Combining an analysis of 93 course syllabi with a quantitative survey of research method instructors, we provide key information on the structure of these courses and how they incorporate statistical software. Our results reflect the growing importance of data literacy within the discipline, and suggest that more intentional discussions of research method pedagogy are needed in the future.
ESS-DIVE’s (Environmental Systems Science Data Infrastructure for a Virtual Ecosystem) dataset metadata reporting format is intended to compile information about a dataset (e.g., title, description, funding sources) that can enable reuse of data submitted to the ESS-DIVE data repository. The files contained in this dataset include instructions (dataset_metadata_guide.md and README.md) that can be used to understand the types of metadata ESS-DIVE collects. The data dictionary (dd.csv) follows ESS-DIVE’s file-level metadata reporting format and includes brief descriptions about each element of the dataset metadata reporting format. This dataset also includes a terminology crosswalk (dataset_metadata_crosswalk.csv) that shows how ESS-DIVE’s metadata reporting format maps onto other existing metadata standards and reporting formats. Data contributors to ESS-DIVE can provide this metadata by manual entry using a web form or programmatically via ESS-DIVE’s API (Application Programming Interface). A metadata template (dataset_metadata_template.docx or dataset_metadata_template.pdf) can be used to collaboratively compile metadata before providing it to ESS-DIVE. Since being incorporated into ESS-DIVE’s data submission user interface, ESS-DIVE’s dataset metadata reporting format has enabled features like automated metadata quality checks and dissemination of ESS-DIVE datasets onto other data platforms including Google Dataset Search and DataCite.