24 datasets found

Data from: Data reuse and the open data citation advantage
datadryad.org
search.dataone.org
zip
Updated Oct 1, 2013
Cite
Heather A. Piwowar; Todd J. Vision (2013). Data reuse and the open data citation advantage [Dataset]. http://doi.org/10.5061/dryad.781pv
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.781pv
Dataset updated
Oct 1, 2013
Dataset provided by
Dryad
Authors
Heather A. Piwowar; Todd J. Vision
Time period covered
2013
Description
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations th...
Statistical information of data set D2.
plos.figshare.com
figshare.com
xls
Updated Jun 1, 2023
Cite
Shahzad Nazir; Muhammad Asif; Shahbaz Ahmad; Faisal Bukhari; Muhammad Tanvir Afzal; Hanan Aljuaid (2023). Statistical information of data set D2. [Dataset]. http://doi.org/10.1371/journal.pone.0228885.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0228885.t003
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Shahzad Nazir; Muhammad Asif; Shahbaz Ahmad; Faisal Bukhari; Muhammad Tanvir Afzal; Hanan Aljuaid
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistical information of data set D2.
Data from: Open Data engages Citation and Reuse: A Follow-up Study on...
datacatalogue.cessda.eu
ssh.datastations.nl
Updated Apr 11, 2023
Cite
D. Farace (2023). Open Data engages Citation and Reuse: A Follow-up Study on Enhanced Publication [Dataset]. http://doi.org/10.17026/dans-zy8-fcjw
Unique identifier
https://doi.org/10.17026/dans-zy8-fcjw
Dataset updated
Apr 11, 2023
Dataset provided by
GreyNet International
Authors
D. Farace
Description
In 2011, GreyNet embarked on an Enhanced Publications Project (EPP) in order to link its collection of full text conference papers with accompanying research data. The initial phase in the study dealt with the design and implementation of an online questionnaire among authors, who were published in the International Conference Series on Grey Literature. From 2012 onwards, subsequent phases in the project dealt with the acquisition, submission, indexing, and archiving of GreyNet’s collection of published datasets now housed in the DANS EASY data archive.
In 2017, GreyNet’s Enhanced Publications Project was further broadened to include a Data Papers Project. Here, emphasis focused on describing the data rather than analyzing it. As such, the data paper signals data sharing and in this way promotes both data citation and the potential reuse of research data in line with the FAIR Guiding Principles for scientific data management and stewardship.
Available results from the Data Papers Project, presented last year at GL19, conclude where this study commences. Here, we seek to demonstrate the reuse of survey data collected in 2011 combined with survey data that will be newly collected via an online questionnaire. The survey population will be drawn from GreyNet’s author base, and a selection of questions from the 2011 Survey will be joined by newly formulated questions in constructing the questionnaire. Furthermore, GreyNet, relying upon available use and usage statistics compiled from various sources, will seek to provide evidence of data citation and referencing.
The results of this study are expected to demonstrate an increased willingness among GreyNet authors to share their research data, due in part to GreyNet’s program of enhanced publication embedded in its workflow over the past six years. The study will provide an example of the reuse and further comparison of survey results, which can be incorporated in GreyNet’s program of training and instruction. However, statistics on data citation and referencing are less likely to provide indicative results.

Date: Survey of 2018
Statistical Abstract of the United States, 2011
dataverse-staging.rdmc.unc.edu
Updated Oct 28, 2011
Cite
UNC Dataverse (2011). Statistical Abstract of the United States, 2011 [Dataset]. https://dataverse-staging.rdmc.unc.edu/dataset.xhtml?persistentId=hdl:1902.29/CD-10849
Dataset updated
Oct 28, 2011
Dataset provided by
UNC Dataverse
License
https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=hdl:1902.29/CD-10849
Description
"The Statistical Abstract of the United States, published since 1878, is the standard summary of statistics on the social, political, and economic organization of the United States. It is designed to serve as a convenient volume for statistical reference and as a guide to other statistical publications and sources. The latter function is served by the introductory text to each section, the source note appearing below each table, and Appendix I, which comprises the Guide to Sources of Statistics, the Guide to State Statistical Abstracts, and the Guide to Foreign Statistical Abstracts. The Statistical Abstract sections and tables are compiled into one Adobe PDF named StatAbstract2009.pdf. This PDF is bookmarked by section and by table and can be searched using the Acrobat Search feature. The Statistical Abstract on CD-ROM is best viewed using Adobe Acrobat 5, or any subsequent version of Acrobat or Acrobat Reader. The Statistical Abstract tables and the metropolitan areas tables from Appendix II are available as Excel (.xls or .xlw) spreadsheets. In most cases, these spreadsheet files offer the user direct access to more data than are shown either in the publication or Adobe Acrobat. These files usually contain more years of data, more geographic areas, and/or more categories of subjects than those shown in the Acrobat version. The extensive selection of statistics is provided for the United States, with selected data for regions, divisions, states, metropolitan areas, cities, and foreign countries from reports and records of government and private agencies. Software on the disc can be used to perform full-text searches, view official statistics, open tables as Lotus worksheets or Excel workbooks, and link directly to source agencies and organizations for supporting information. Except as indicated, figures are for the United States as presently constituted.
Although emphasis in the Statistical Abstract is primarily given to national data, many tables present data for regions and individual states and a smaller number for metropolitan areas and cities. Statistics for the Commonwealth of Puerto Rico and for island areas of the United States are included in many state tables and are supplemented by information in Section 29. Additional information for states, cities, counties, metropolitan areas, and other small units, as well as more historical data are available in various supplements to the Abstract. Statistics in this edition are generally for the most recent year or period available by summer 2006. Each year over 1,400 tables and charts are reviewed and evaluated; new tables and charts of current interest are added, continuing series are updated, and less timely data are condensed or eliminated. Text notes and appendices are revised as appropriate. This year we have introduced 72 new tables covering a wide range of subject areas. These cover a variety of topics including: learning disability for children, people impacted by the hurricanes in the Gulf Coast area, employees with alternative work arrangements, adult computer and Internet users by selected characteristics, North America cruise industry, women- and minority-owned businesses, and the percentage of the adult population considered to be obese.
Some of the annually surveyed topics are population; vital statistics; health and nutrition; education; law enforcement, courts and prison; geography and environment; elections; state and local government; federal government finances and employment; national defense and veterans affairs; social insurance and human services; labor force, employment, and earnings; income, expenditures, and wealth; prices; business enterprise; science and technology; agriculture; natural resources; energy; construction and housing; manufactures; domestic trade and services; transportation; information and communication; banking, finance, and insurance; arts, entertainment, and recreation; accommodation, food services, and other services; foreign commerce and aid; outlying areas; and comparative international statistics." Note to Users: This CD is part of a collection located in the Data Archive of the Odum Institute for Research in Social Science, at the University of North Carolina at Chapel Hill. The collection is located in Room 10, Manning Hall. Users may check the CDs out subscribing to the honor system. Items can be checked out for a period of two weeks. Loan forms are located adjacent to the collection.
English Monograph OCR Dataset (Preprocessed) 📄🔍
kaggle.com
Updated Mar 21, 2025
Cite
Arjav 007 (2025). English Monograph OCR Dataset (Preprocessed) 📄🔍 [Dataset]. https://www.kaggle.com/datasets/arjav007/icdar-eng
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Mar 21, 2025
Dataset provided by
Kaggle, http://kaggle.com/
Authors
Arjav 007
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset is a preprocessed version of the English Monograph subset from the ICDAR 2017 OCR Post-Correction competition. It contains OCR-generated text alongside its corresponding aligned ground truth, making it useful for OCR error detection and correction tasks.

📌 About the Dataset

The dataset consists of historical English texts that were processed using OCR technology. Due to OCR errors, the text contains misrecognized characters, missing words, and other inaccuracies. This dataset provides both raw OCR output and gold-standard corrected text.

🚀 Use Cases

This dataset is ideal for:
- OCR Error Detection & Correction 📝
- Training Character-Based Machine Translation Models 🔠
- Natural Language Processing (NLP) on Historical Texts 📜

📊 Dataset Statistics

Total Entries: 724

Character-Level OCR Error Rate: ~1.79%

Common OCR Errors Observed:

1 → I

tbe → the

tho → the

aud → and
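A character-level error rate such as the one quoted above is commonly computed as the edit distance between the OCR output and the ground truth, divided by the ground-truth length. The sketch below illustrates that common definition in Python; how the dataset authors actually computed their ~1.79% figure is not stated, and the function names and the sample sentence are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance normalised by ground-truth length (assumed definition)."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


# Hypothetical sample built from the error patterns listed above.
sample_ocr = "tbe cat aud tho dog"
sample_truth = "the cat and the dog"
sample_distance = levenshtein(sample_ocr, sample_truth)
sample_rate = char_error_rate(sample_ocr, sample_truth)
```

On this sample the three substitutions give a distance of 3 over 19 ground-truth characters.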

📜 Citation

If you use this dataset, please cite the original ICDAR 2017 OCR Post-Correction paper:

Chiron, G., Doucet, A., Coustaty, M., Moreux, J.P. (2017). ICDAR 2017 Competition on Post-OCR Text Correction.
Statistics of dataset D2.
figshare.com
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Shahzad Nazir; Muhammad Asif; Shahbaz Ahmad; Faisal Bukhari; Muhammad Tanvir Afzal; Hanan Aljuaid (2023). Statistics of dataset D2. [Dataset]. http://doi.org/10.1371/journal.pone.0228885.t008
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0228885.t008
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Shahzad Nazir; Muhammad Asif; Shahbaz Ahmad; Faisal Bukhari; Muhammad Tanvir Afzal; Hanan Aljuaid
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistics of dataset D2.
Statistical daily streamflow estimates at GAGES-II non-reference streamgages...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Statistical daily streamflow estimates at GAGES-II non-reference streamgages in the conterminous United States, Water Years 1981-2017 [Dataset]. https://catalog.data.gov/dataset/statistical-daily-streamflow-estimates-at-gages-ii-non-reference-streamgages-in-the-c-1981
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey, http://www.usgs.gov/
Area covered
Contiguous United States, United States
Description
This data release contains daily time series estimates of natural streamflow at 5,439 GAGES-II non-reference streamgages in 19 study regions across the conterminous United States from October 1, 1980 through September 30, 2017, using five statistical techniques: nearest-neighbor drainage area ratio (NNDAR), map-correlation drainage area ratio (MCDAR), nearest-neighbor nonlinear spatial interpolation using flow duration curves (NNQPPQ), map-correlation nonlinear spatial interpolation using flow duration curves (MCQPPQ), and ordinary kriging of the logarithms of discharge per unit area (OKDAR). NNDAR, MCDAR, NNQPPQ, and MCQPPQ estimates were computed following methods described in Farmer and others (2014), with updates to the flow-duration curve modeling which is described in Over and others (2018). OKDAR estimates were computed using pooled variograms for each study region following methods described in Farmer (2016). Daily streamflow estimation was conducted by study region (hydrologic unit code level-2 regions as defined in Falcone, 2011) by building statistical models using 1,385 GAGES-II reference streamgages from mostly undisturbed watersheds as index gages (Russell and others, 2020). Estimates were then made at GAGES-II non-reference streamgages. Location information and basin characteristics for study gages were obtained from the GAGES-II dataset (Falcone, 2011). Observed daily streamflow data were retrieved from the National Water Information System (USGS, 2019). This data release contains 19 separate zip files; one for each study region. Each zip file contains an individual tab-delimited text file for each non-reference streamgage in the study region. A text file summarizing period of record information for each non-reference streamgage is provided (non-reference_gages_summary.csv). 
This data release also contains a text file (Model_info.csv) of regional regression equations for 27 flow quantiles that were developed in each study region in order to implement the QPPQ methods and a text file (BC_transformations.csv) describing transformations made to the GAGES-II derived basin characteristics prior to use in the regression equations. The five sets of streamflow estimates represent expected natural streamflow conditions with minimal disturbance by human activities, in other words, without the effects of regulation, diversion, land development, or other anthropogenic activities. The observed streamflow records at the non-reference streamgages were compared to the five simulated streamflow records. These performance metrics are provided at each gage for all five statistical methods (NonRef_PMs_byStation.csv) and as summaries by region (NonRef_PM_summaries_byRegion.csv).
References cited:
Falcone, J.A., 2011, GAGES-II: Geospatial Attributes of Gages for Evaluating Streamflow [digital spatial dataset]: U.S. Geological Survey Water Resources NSDI Node web page, https://water.usgs.gov/lookup/getspatial?gagesII_Sept2011.
Farmer, W.H., Archfield, S.A., Over, T.M., Hay, L.E., LaFontaine, J.H., and Kiang, J.E., 2014, A comparison of methods to predict historical daily streamflow time series in the southeastern United States: U.S. Geological Survey Scientific Investigations Report 2014–5231, 34 p., http://dx.doi.org/10.3133/sir20145231.
Farmer, W.H., 2016, Ordinary kriging as a tool to estimate historical daily streamflow records, Hydrology and Earth System Sciences, 20, 2721-2735, https://doi.org/10.5194/hess-20-2721-2016.
Over, T.M., Farmer, W.H., and Russell, A.M., 2018, Refinement of a regression-based method for prediction of flow-duration curves of daily streamflow in the conterminous United States: U.S. Geological Survey Scientific Investigations Report 2018–5072, 34 p., https://doi.org/10.3133/sir20185072.
Russell, A.M., Over, T.M., and Farmer, W.H., 2020, Cross-validation results for five statistical methods of daily streamflow estimation at 1,385 reference streamgages in the conterminous United States, Water Years 1981-2017: U.S. Geological Survey data release, https://doi.org/10.5066/P9XT4WSP.
U.S. Geological Survey, 2019, National Water Information System data available on the World Wide Web (USGS Water Data for the Nation), accessed 07/08/2019, at http://dx.doi.org/10.5066/F7P55KJN.
Data from: Reference Site Condition Datasets for Floating Wind Arrays in the...
data.openei.org
osti.gov
archive
Updated Aug 2, 2024
Cite
Biglu; Hall; Lozon; Housner (2024). Reference Site Condition Datasets for Floating Wind Arrays in the United States [Dataset]. https://data.openei.org/submissions/8289
Available download formats: archive
Dataset updated
Aug 2, 2024
Dataset provided by
United States Department of Energy, http://energy.gov/
National Renewable Energy Laboratory
Open Energy Data Initiative (OEDI)
Authors
Biglu; Hall; Lozon; Housner
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Floating offshore wind farm design is highly site-specific, requiring detailed information about the specific conditions of a project area for realistic design studies. Unfortunately, publicly available site condition data for potential floating offshore wind project sites in the United States is scarce. To support U.S. offshore wind research, we developed reference site condition datasets, including metocean and seabed information, for four potential floating wind project areas in the U.S.: Humboldt Bay, Morro Bay, the Gulf of Maine, and the Gulf of Mexico. These datasets were compiled using publicly available data. Our metocean analysis, covering wind, waves, and surface currents, utilized measurement data from 2000 to 2020. Sources included the National Renewable Energy Laboratory’s National Offshore Wind Dataset for wind data, National Data Buoy Center buoys for wave data, and the High Frequency Radar Network for surface currents. These data were integrated into hourly time series used to compute extreme return periods up to 500 years, monthly statistics, and joint probability clusters for fatigue analysis. Soil conditions were evaluated using the usSEABED database and bathymetry grids were interpolated from the NCEI Digital Elevation Model Global Mosaic.
Further information on the datasets and how they were created can be found in: Biglu, Michael, Matthew Hall, Ericka Lozon, and Stein Housner. 2024. Reference Site Conditions for Floating Wind Arrays in the United States. Golden, CO: National Renewable Energy Laboratory. NREL/TP-5000-89897. https://www.nrel.gov/docs/fy24osti/89897.pdf
The data are also available at: https://github.com/FloatingArrayDesign/SiteConditions
The content of each dataset is as follows:
- _NOW23_wind.txt: Hourly NOW-23 wind data up to a height of 400 meters.
- _metocean_1hr.txt: Hourly time series including wind, wave, surface current and temperature data.
- _Summary.xlsx: Metocean data, including extreme values, joint probability distributions and monthly statistics.
- _usSEABED_soil.csv: Extract of the usSEABED database for this specific site.
- _bathymetry_200m.txt (and 500m, 1000m): Gridded seabed depth data.
Portable pseudo-random reference sequences with Mersenne Twister using GNU...
figshare.com
png
Updated Jun 1, 2023
Cite
Daniele de Rigo (2023). Portable pseudo-random reference sequences with Mersenne Twister using GNU Octave. Mastrave project technical report [Dataset]. http://doi.org/10.6084/m9.figshare.94593.v10
Available download formats: png
Unique identifier
https://doi.org/10.6084/m9.figshare.94593.v10
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare, http://figshare.com/
Authors
Daniele de Rigo
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
de Rigo, D. (2012). Portable pseudo-random reference sequences with Mersenne Twister using GNU Octave. Mastrave project technical report. FigShare Digital Science. doi: 10.6084/m9.figshare.94593

Portable pseudo-random reference sequences with Mersenne Twister using GNU Octave

Mastrave project technical report

Daniele de Rigo

Abstract: Computationally intensive numerical tasks such as those involving statistical resampling, evolutionary techniques or Monte Carlo based applications are known to require robust algorithms for generating large sequences of pseudo-random numbers (PRN). While several languages, libraries and computing environments offer suitable PRN generators, the underlying algorithms and parametrization widely differ. Therefore, easily replicating a certain PRN sequence generally implies forcing researchers to use a very specific language or computing environment, also paying attention to its version, possible critical dependencies or even operating system and computer architecture. Although awareness of the benefits of reproducible research is rapidly growing, the very definition of “reproducibility” for PRN based applications may lead to diverging interpretations and expectations. Where the cardinality of PRN sequences needed for data to be processed is relatively moderate, the paradigm of reproducible research is in principle suitable to be applied not only to algorithms, free software, data and metadata (classic reproducible research), but also to the involved pseudo-random sequences themselves (deep reproducible research). This would allow not only the “typical” scientific results to be reproducible “except for PRN-related statistical fluctuations”, but also the exact results published by a research team to be independently reproduced by other scientists - without of course preventing sensitivity analysis with different PRN sequences, as even classic reproducible research should easily allow. However, finding reference sequences of pseudo random numbers suitable to enable such a deep reproducibility may be surprisingly difficult. Here, sequences eligible to be used as reference dataset of uniformly distributed pseudo-random numbers are presented.
The dataset of sequences has been generated using Mersenne Twister with a period of 2^19937-1, as implemented in GNU Octave (version 3.6.1) with the Mastrave modelling library. The sequences are available in plain text format and also in the format MATLAB version 7, which is portable in both GNU Octave and MATLAB computing environments. The plain text format uses a fixed number of characters per each PRN so allowing random access to sparse PRNs to be easily done in constant time without needing a whole file to be loaded. This straightforward solution is language neutral, with the advantage of enabling wide and immediate portability for the presented reference PRN dataset, irrespective of the language, libraries, computing environment of choice for the users.

Naming conventions: Each file pseudorand_seq_[N].[ext] contains a sequence of N pseudo-random numbers, uniformly distributed (generated using a Mersenne Twister with a period of 2^19937-1, as implemented in GNU Octave version 3.6.1). The extension may be “txt” for the pure text sequence of PRN (35 characters – including the endline one – and one PRN per line) or “mat” for the corresponding format MATLAB version 7 (containing a structure with two fields: the field “values” contains N numerical PRN in double precision; the field “string” contains a matrix of characters with N rows and 24 columns – the endline character being omitted – the last one fulfilling the constraint to only contain digits whose value is “0” ).
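Because every line of a “txt” file has the same length, the k-th number can be fetched by seeking to byte offset k × 35 instead of reading the whole file. The Python sketch below illustrates that idea under stated assumptions: the record length of 35 bytes is taken from the naming conventions above, the newline is assumed to be a single byte, and the demo file is synthetic (written by the hypothetical helper `write_demo_file`), not one of the published sequences.

```python
import os
import tempfile

RECORD_LEN = 35  # assumed bytes per record, newline included


def read_prn(path, index, record_len=RECORD_LEN):
    """Return the PRN at zero-based `index` by seeking straight to its record."""
    if index < 0:
        raise IndexError("index must be non-negative")
    with open(path, "rb") as fh:
        fh.seek(index * record_len)
        raw = fh.read(record_len)
    if len(raw) < record_len:
        raise IndexError(f"no complete record at index {index}")
    return float(raw.decode("ascii").strip())


def write_demo_file(values):
    """Write a synthetic fixed-width file (not the published data)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as fh:
        for v in values:
            # "0." plus 32 digits is 34 characters; the newline makes 35.
            fh.write(f"{v:.32f}\n".encode("ascii"))
    return path


demo_path = write_demo_file([0.125, 0.5, 0.75])
first_value = read_prn(demo_path, 0)
third_value = read_prn(demo_path, 2)
os.remove(demo_path)
```

The exact column layout of the published files (padding, number of digits) is not described here, so the parser only relies on the fixed record length and strips whitespace before converting.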

Download (permanent URLs aside from the ones provided at http://dx.doi.org/10.6084/m9.figshare.94593):
http://mastrave.org/doc/refdata/fs94593/10.txt pseudorand_seq_10.txt
http://mastrave.org/doc/refdata/fs94593/100.txt pseudorand_seq_100.txt
http://mastrave.org/doc/refdata/fs94593/1000.txt pseudorand_seq_1000.txt
http://mastrave.org/doc/refdata/fs94593/10000.txt pseudorand_seq_10000.txt
http://mastrave.org/doc/refdata/fs94593/100000.txt pseudorand_seq_100000.txt
http://mastrave.org/doc/refdata/fs94593/1000000.txt pseudorand_seq_1000000.txt
http://mastrave.org/doc/refdata/fs94593/10.mat pseudorand_seq_10.mat
http://mastrave.org/doc/refdata/fs94593/100.mat pseudorand_seq_100.mat
http://mastrave.org/doc/refdata/fs94593/1000.mat pseudorand_seq_1000.mat
http://mastrave.org/doc/refdata/fs94593/10000.mat pseudorand_seq_10000.mat
http://mastrave.org/doc/refdata/fs94593/100000.mat pseudorand_seq_100000.mat
http://mastrave.org/doc/refdata/fs94593/1000000.mat pseudorand_seq_1000000.mat

MD5 checksums ( http://mastrave.org/doc/refdata/fs94593/md5 ):
d25fd2747eea3c1ab0aa81ef64aa7769 pseudorand_seq_10.txt
a26711dfc2fafa7fac3dc0d1cc6472cd pseudorand_seq_100.txt
eb54b41e8dd7c946a9799342c27f4c90 pseudorand_seq_1000.txt
bdf6b4afd237ccf0fe4a97bf5c847f1d pseudorand_seq_10000.txt
20b8398775a93e59533e9f14ba402caa pseudorand_seq_100000.txt
00b1d615be143d5dd093ba6bd066a833 pseudorand_seq_1000000.txt
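A downloaded file can be compared against a published checksum with a few lines of Python. This is a minimal sketch using the standard `hashlib` module; the helper names `file_digest` and `matches_published` are hypothetical, and the self-check runs on a throwaway file containing `abc` (whose digests are standard test vectors), not on the reference sequences themselves.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hex digest of a file, read in chunks so large files stay out of memory."""
    digest = hashlib.new(algorithm)  # any hashlib name, e.g. "md5" or "sha1"
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex, algorithm="md5"):
    """True when the file's digest equals the published hexadecimal value."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Self-check on a temporary file containing b"abc".
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(b"abc")
demo_md5 = file_digest(demo_path, "md5")
demo_sha1 = file_digest(demo_path, "sha1")
demo_ok = matches_published(demo_path, "900150983CD24FB0D6963F7D28E17F72")
os.remove(demo_path)
```

For an intact local copy of pseudorand_seq_10.txt, `matches_published("pseudorand_seq_10.txt", "d25fd2747eea3c1ab0aa81ef64aa7769")` would be expected to return True.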

SHA1 checksums ( http://mastrave.org/doc/refdata/fs94593/sha1 ):
837e0f2793d1b0767f6eb03868a7e081ea530073 pseudorand_seq_10.txt
2bb59e7cc5369eb896df86061e720baaa4da1d96 pseudorand_seq_100.txt
79ae593da0b052bf11713155cd6e4f7d3906baff pseudorand_seq_1000.txt
e23bd2e157f5447096f36e704a84aa77224ab9ae pseudorand_seq_10000.txt
3b4a088752cebee4158a8cebfdd88193c4a03872 pseudorand_seq_100000.txt
aa9fb09d59c9b331fbba2360c345e47ede032b0e pseudorand_seq_1000000.txt

The Mastrave modelling library offers the module rprand (acronym for reproducible pseudo-random generator) which allows all the PRN reference sequences to be exactly reproduced in a variety of versions of GNU Octave and MATLAB computing environments, by silently downloading and importing the corresponding published reference files. The PRN sequences can also be easily imported and used in a variety of languages, operating systems and computer architectures (see Appendix B for a further discussion). This is straightforward to do by directly importing the plain text files listing the PRN reference sequences in csv format. Therefore, users and applications of the published PRN reference sequences may not be directly interested in their exact numerical reproducibility. The exact numerical reproducibility by re-generation of the PRN sequences (deep reproducible research) is the subject of the Appendix A.

Appendix A: reproducing the PRN reference sequences within GNU Octave

The proposed sequences can be numerically reproduced within the GNU Octave computing environment. The GNU Octave version with which the sequences have been generated is the version 3.6.1. Alternative strategies for generating reference PRN sequences could have implied dedicated source code to be released as an autonomous free software package. This possible strategy was discarded due to the need for the PRN sequences to be as reliable as possible so to offer long-term general reusability.
Extensive testing of both PRN generators' algorithms and implementations is of obvious importance. While systematic, exhaustive testing of all aspects of nontrivial code is almost impossible, linking this phase to already established free software numerical packages with broad diffusion and endorsement among computational scientists is perhaps the safest way for mitigating the risk of generating unexpectedly biased PRN sequences. GNU Octave is a well-established and widely used environment for computational science applications. It is also free software and part of the GNU project, which is one of the most relevant free software projects. GNU Octave belongs to the GNU list of high priority projects, where its development is supported as high-level language for numerical computations. This further corroborates the selection of GNU Octave as a suitable free software environment for ensuring the long-term availability to run the PRN sequences' source code.

The following codelet generates the double precision floating-point PRN reference sequences (GNU Octave language):

    assert( exist( 'OCTAVE_VERSION' ) )
    assert( isequal( OCTAVE_VERSION, '3.6.1' ) )
    % Initializing the Mersenne Twister PRN generator
    rand( 'seed', pi )
    rand( 1000, 1000 );
    % Generating the reference sequences
    pseudorand_seq_10      = rand( 10 , 1 );
    pseudorand_seq_100     = rand( 1e2, 1 );
    pseudorand_seq_1000    = rand( 1e3, 1 );
    pseudorand_seq_10000   = rand( 1e4, 1 );
    pseudorand_seq_100000  = rand( 1e5, 1 );
    pseudorand_seq_1000000 = rand( 1e6, 1 );

Compliancy checks (GNU Octave language: the Mastrave modelling library is also required):

    assert( rprand( 10 , 10 , 0 ) == pseudorand_seq_10 )
    assert( rprand( 1e2, 1e2, 0 ) == pseudorand_seq_100 )
    assert( rprand( 1e3, 1e3, 0 ) == pseudorand_seq_1000 )
    assert( rprand( 1e4, 1e4, 0 ) == pseudorand_seq_10000 )
    assert( rprand( 1e5, 1e5, 0 ) == pseudorand_seq_100000 )
    assert( rprand( 1e6, 1e6, 0 ) == pseudorand_seq_1000000 )

Appendix B: low-level memory representation of the PRN seed

The seed of the PRN generator is the IEEE754 double precision floating-point value approximating the mathematical constant π. The following codelets are expressed in GNU Octave language (the assertions check necessary but not sufficient conditions):

    assert( pi == 3.1415926535897932 )
    pi_bytecode = double( typecast( pi, 'uint8' ) ) ;
    pi_hexcode = dec2hex( double( typecast( pi, 'uint8' ) ) );

where pi_bytecode value is (unsigned integers):

    24 45 68 84 251 33 9 64

and pi_hexcode value is (hexadecimal values):

    18 2D 44 54 FB 21 09 40

which can be reverted to the original double format by means of the codelet:

    typecast( uint8( hex2dec( pi_hexcode ) ), 'double' )

so to
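The byte-level representation of the seed described above can be cross-checked outside GNU Octave. The Python sketch below packs π as an IEEE 754 double and lists its bytes; it assumes little-endian byte order, which is the order the listed values correspond to, and it reuses the names `pi_bytecode` and `pi_hexcode` from the description purely for readability.

```python
import math
import struct

# Pack pi as a little-endian IEEE 754 double (8 bytes).
pi_bytes = struct.pack("<d", math.pi)

# Unsigned integer view of the bytes, analogous to typecast( pi, 'uint8' ).
pi_bytecode = list(pi_bytes)

# Two-digit uppercase hexadecimal view, analogous to dec2hex on each byte.
pi_hexcode = [f"{b:02X}" for b in pi_bytes]

# Round trip: rebuild the double from its hexadecimal bytes.
restored = struct.unpack("<d", bytes(int(h, 16) for h in pi_hexcode))[0]

print(pi_bytecode)
print(" ".join(pi_hexcode))
print(restored == math.pi)
```

On a big-endian platform the native byte order would be reversed, so the format string would need to be adapted if native order is what is being compared.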

‘Port Statistical Area’ analyzed by Analyst-2

analyst-2.ai

Updated Jan 26, 2022

Cite

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Port Statistical Area’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-port-statistical-area-acd1/latest

Dataset updated

Jan 26, 2022

Dataset authored and provided by

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Analysis of ‘Port Statistical Area’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/6b5fc628-4632-4495-81c3-5f8e2414f374 on 26 January 2022.

--- Dataset description provided by original source is as follows ---

Per Engineering Regulation 1130-2-520, USACE’s NDC and WCSC are responsible for collecting, compiling, printing, and distributing all domestic waterborne commerce statistics for which the USACE has responsibility. Per a 1998 Office of Management and Budget (OMB) memorandum, the WCSC inherited the requirement to include foreign waterborne commerce formerly executed by the U.S. Census Bureau. Performance of this work is in accordance with the Rivers and Harbors Appropriation Act of 1922 (33 USC 555).

Engineering Regulation 1130-2-520 defines a port as:

(1) Port limits defined by legislative enactments of state, county, or city governments.

(2) The corporate limits of a municipality.

At minimum, the feature class includes the following attribution:

Attribute Name | Definition | Data Type | Length
featureDescription | The narrative describing the feature. This attribute column will describe how the statistical port boundary was generated using GIS. It can include the legislative description, a note that the U.S. Census Bureau municipal limit was used, or other details. | String | Max
featureName | The common name of the feature. This will be the port name as defined by the legislative enactment or the municipality. Each name should include the State(s) in which the port is located (e.g. Louisville-Jefferson County Riverport Authority, KY). | String |
installationId | The codes assigned by the DoD Component used to identify the site or group of sites that make up an installation. This field will remain empty, as the project focus is not on military installations. | String |
mediaId | Used to link the record to associated multimedia records that reference the data. The number used in this column will reference a related “mediaId” table that will store the source document for appropriate legislation or municipality limit reference. | String |
metadataId | Used to represent or link to feature level metadata. For this project, a common code for the port area geometry source will be employed. | |

L	Legislative Enactment
M	Municipal Limits
O

Statistics of dataset D1.
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Shahzad Nazir; Muhammad Asif; Shahbaz Ahmad; Faisal Bukhari; Muhammad Tanvir Afzal; Hanan Aljuaid (2023). Statistics of dataset D1. [Dataset]. http://doi.org/10.1371/journal.pone.0228885.t007
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0228885.t007
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Shahzad Nazir; Muhammad Asif; Shahbaz Ahmad; Faisal Bukhari; Muhammad Tanvir Afzal; Hanan Aljuaid
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistics of dataset D1.
Statistical Abstract of the United States, 2002
dataverse-staging.rdmc.unc.edu
Updated Nov 30, 2007
+ more versions
Cite
UNC Dataverse (2007). Statistical Abstract of the United States, 2002 [Dataset]. https://dataverse-staging.rdmc.unc.edu/dataset.xhtml?persistentId=hdl:1902.29/CD-0175
Dataset updated
Nov 30, 2007
Dataset provided by
UNC Dataverse
License
https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=hdl:1902.29/CD-0175
Area covered
United States
Description
"The Statistical Abstract is the nation's best known and most popular single source of statistics on the social, political, and economic organization of the country. The print version has been published since 1878, and a compact disc version has been available since 1993. Both are designed to serve as a convenient, easy-to-use statistical reference source and guide to statistical publications and sources. The extensive selection of statistics is provided for the United States, with selected data for regions, divisions, states, metropolitan areas, cities, and foreign countries from reports and records of government and private agencies. Software on the disc can be used to perform full-text searches, view official statistics, open tables as Lotus worksheets or Excel workbooks, and link directly to source agencies and organizations for supporting information. The disc contains over 1,500 tables from over 250 different governmental, private, and international organizations. Some of the topics are population; vital statistics; health and nutrition; education; law enforcement, courts and prison; geography and environment; elections; state and local government; federal government finances and employment; national defense and veterans affairs; social insurance and human services; labor force, employment, and earnings; income, expenditures, and wealth; prices; business enterprise; science and technology; agriculture; natural resources; energy; construction and housing; manufactures; domestic trade and services; transportation; information and communication; banking, finance, and insurance; arts, entertainment, and recreation; accommodation, food services, and other services; foreign commerce and aid; outlying areas; and comparative international statistics.
Significant changes in the 2002 data include new data from the 2000 census and new tables that include data covering resident population's migration status, educational attainment, disability status, ancestry, place of birth, and language spoken at home as well as household income, poverty, and selected housing characteristics from the sample portion of the 2000 census. New tables cover topics such as unmarried households, state children's health insurance programs, limitation of activity level caused by chronic conditions, characteristics of homeschooled children, firearm-use offenders, home-based work and flexible work by workers, computer use in the workplace, employee benefits, and computer and Internet use." Note to Users: This CD is part of a collection located in the Data Archive of the Odum Institute for Research in Social Science, at the University of North Carolina at Chapel Hill. The collection is located in Room 10, Manning Hall. Users may check the CDs out subscribing to the honor system. Items can be checked out for a period of two weeks. Loan forms are located adjacent to the collection.
Data used in the manuscript - A Hierarchical Approach for Evaluating Athlete...
data.niaid.nih.gov
zenodo.org
Updated Jun 20, 2023
Cite
Thiago de Paula Oliveira (2023). Data used in the manuscript - A Hierarchical Approach for Evaluating Athlete Performance with an Application in Elite Basketball [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8056756
Dataset updated
Jun 20, 2023
Dataset authored and provided by
Thiago de Paula Oliveira
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The database contains several datasets and files with NBA statistical data spanning four seasons (2015-2016 to 2018-2019). These datasets were procured from the Basketball Reference database (https://www.basketball-reference.com/), a publicly accessible source of NBA data.

The main file, dat.cleaned.csv, includes the Win/Loss records for all thirty NBA teams, along with box scores and advanced statistics. The data captured over the four seasons correspond to about 4,920 regular-season games. A distinguishing feature of this dataset is the repeated measurements per player within a team across the seasons. However, it's important to note that these repeated measurements are not independent, necessitating the use of hierarchical modelling to properly handle the data.
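
The figure of about 4,920 games can be sanity-checked with simple arithmetic. The sketch below is illustrative only and assumes the standard 82-game regular-season schedule per team, which the description does not state explicitly; the constant names are introduced here.

```python
TEAMS = 30            # from the description: all thirty NBA teams
GAMES_PER_TEAM = 82   # assumption: standard regular-season schedule
SEASONS = 4           # 2015-2016 to 2018-2019

# Each game involves two teams, so team-level game counts are halved.
games_per_season = TEAMS * GAMES_PER_TEAM // 2
total_games = games_per_season * SEASONS

print(games_per_season, total_games)
```

Under these assumptions the total matches the stated figure, which suggests the count refers to distinct games rather than team-level records.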

Two sets of additional text files (per_2017.txt, per_2018.txt, rpm_2017.txt, rpm_2018.txt) provide specific metrics for player performance. The 'PER' files contain the Player Efficiency Rating (PER) for the years 2017 and 2018. The 'RPM' files contain the ESPN-developed score called Real Plus-Minus (RPM) for the same years.

However, potential biases or limitations within the datasets should be acknowledged. For instance, the Basketball Reference website might not include data from some matches or may exclude certain variables, potentially affecting the quality and accuracy of the dataset.
Data from: The genomics and evolution of inter-sexual mimicry and...
zenodo.org
explore.openaire.eu
application/gzip, bin +2
Updated Sep 20, 2023
Cite
Beatriz Willink; Kalle Tunström; Sofie Nilén; Rayan Chikhi; Téo Lemane; Michihiko Takahashi; Yuma Takahashi; Erik I. Svensson; Chris W. Wheat (2023). Data from: The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies [Dataset]. http://doi.org/10.5281/zenodo.8304153
Available download formats: application/gzip, bin, csv, txt
Unique identifier
https://doi.org/10.5281/zenodo.8304153
Dataset updated
Sep 20, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Beatriz Willink; Kalle Tunström; Sofie Nilén; Rayan Chikhi; Téo Lemane; Michihiko Takahashi; Yuma Takahashi; Erik I. Svensson; Chris W. Wheat
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains intermediate output files required to reproduce the figures in the main text and Supporting Material of Willink et al. 2023. The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies.

FILE OVERVIEW:

1. Morph-specific assemblies
A. File names: Afem_1354_ragtag.fasta.gz, Ifem_1049_ragtag.fa.gz, Ofem_0081_ragtag.fa.gz, O054_Shasta_run2.PMDV.HAP1.purged.fasta.gz, A059_Shasta_run1.PMDV.HAP1.purged.fa.gz
B. Description: genome assemblies for different morphs of Ischnura elegans (Afem_1354, Ifem_1049, and Ofem_0081) and Ischnura senegalensis (A059 and O054), generated in this study from long-read Nanopore data using Shasta v 0.7.0 (https://github.com/paoloshasta/shasta).

2. Assembly statistics
A. File names: Assembly_statistics.csv, Assembly_statistics_sen.csv
B. Description: Completeness and quality metrics for de novo genome assemblies of I. elegans and I. senegalensis female morphs. See Fig. S1-S2.

3. Repetitive content annotation
A. File names: A1354_ragtag_RED.bed.repeats.bed.gz, Afem_Shasta1_polished_ragtag_UPPER.fa.out.gz, Ifem_Shasta2_polished_ragtag_UPPER.fa.out.gz, ioIscEleg1.1.primary_UPPER.fa.out.gz, ToL_RED.repeats.bed.gz
B. Description: Annotation of repetitive sequences in morph-specific assemblies. All morph assemblies (A, I and Darwin Tree of Life assemblies) were annotated using RepeatModeler v 2.0.1 and RepeatMasker v 1.0.93 (http://www.repeatmasker.org). The A morph and DToL assemblies were additionally annotated using Red v 0.0.1 (https://github.com/BioinformaticsToolsmith/Red). RepeatMasker annotations were then used to estimate TE coverage. See Extended Data Fig. 4 and Fig. S7.

4. GWAS output
A. File names: A1354_ragtag_AvI.assoc_filtered.txt.gz, A1354_ragtag_AvO.assoc_filtered.txt.gz, A1354_ragtag_IvO.assoc_filtered.txt.gz, ToL_AvI.assoc_filtered.txt.gz, ToL_AvO.assoc_filtered.txt.gz, ToL_IvO.assoc_filtered.txt.gz
B. Description: filtered SNPs in pairwise association tests between morphs (n = 19 resequencing samples per morph) of I. elegans. Analyses were conducted in PLINK v 1.9 (http://pngu.mgh.harvard.edu/purcell/plink/), using either the A morph assembly (Fig. 2a-b), or the Darwin Tree of Life (DToL) reference assembly (Extended Data Figure 8a-b) as mapping reference.

5. Population statistics
A. File names: Afem_pixy_30K_fst.txt.gz, A1354_30kb.Tajima.D.gz, Afem_pi_30K_pi.txt.gz, ToL_30K_fst.txt.gz, ToL_30kb.Tajima.D.gz, ToL_30K_onepop_pi.txt.gz
B. Description: Genetic differentiation (fst) between morphs, Tajima's D statistics, and nucleotide diversity across 30 kb windows of the I. elegans genome. Population statistics were computed using either the A morph assembly (Fig. 2c-e), or the DToL reference assembly (Extended Data Figure 8c-e) as mapping reference.

6. k-mer based GWAS
A. File names: AvI_kmers.fa.gz, AvO_kmers.fa.gz, OvAI_kmers.fa.gz, AvI_kmers.fa_v_A1354_Shasta_run1_table.tsv.gz, AvO_kmers.fa_v_A1354_Shasta_run1_table.tsv.gz, OvAI_kmers.fa_v_A1354_Shasta_run1_table.tsv.gz, OvAI_kmers.fa_v_Ifem_1049_ragtag_table.tsv.gz
B. Description: List of significant k-mers (in fasta format) in three k-mer based association analyses (n = 19 resequencing samples per morph) between morphs of I. elegans. Significant k-mers were then mapped to morph-specific assemblies using Blast v 2.22.28 (https://blast.ncbi.nlm.nih.gov/Blast.cgi) for short sequences. We include mapping results shown in Fig. 3a-b.

7. Read-depth coverage
A. File names: reseq_coverage_norepeat_500_window.bed.gz, nano_coverage_norepeat_500_window.bed.gz, Ifem_nano_coverage_norepeat_500_window.bed.gz, Ifem_reseq_coverage_norepeat_500_window_15Mb.bed.gz, poolseq_coverage_norepeat_500_window.bed.gz, morph_coverage_norepeat_diff_500.tsv.gz, SwD_popmap
B. Description: Read depth coverage of the morph locus and a 15 Mb region used to estimate baseline read depths. 19 Illumina resequencing samples, and one long-read Nanopore sample of each morph of I. elegans were mapped to both the A and I assemblies to estimate read depth. Two poolseq samples (each pool consisting of 30 females of each morph) of I. senegalensis were mapped to the A assembly of I. elegans to estimate read depth. Read depth was estimated in mosdepth v 0.2.8 (https://github.com/brentp/mosdepth) across 500 bp windows after filtering windows with more than 10% repetitive content. For poolseq samples, the difference in coverage values between the A and O pools was computed across the entire genome. Sample information for resequencing samples is recorded in the file SwD_popmap. See Fig. 3c-d, 5b, and S8.

8. Assembly alignment
A. File names: nucmer_aln_Ifem_1049_ragtag_Afem_1354_ragtag.qr1_filter.reformat.coords.gz, nucmer_aln_Ofem_0081_ragtag_Afem_1354_ragtag.qr1_filter.reformat.coords.gz, nucmer_aln_Afem_Isen_Afem_Iele.qr1_filter.reformat.coords.gz, nucmer_aln_Ofem_Isen_Afem_Iele.qr1_filter.reformat.coords.gz, karyotype_AI_RagTag.csv, karyotype_AO_RagTag.csv, karyotype_AIsen_AIele.cs, karyotype_OIsen_AIele.csv
B. Description: Assembly alignments using nucmer v 4.0.0 (https://github.com/mummer4/mummer) and contig synteny for plotting using RIdeogram v 0.2.2 (https://cran.r-project.org/web/packages/RIdeogram/vignettes/RIdeogram.html) in R v 4.2.2 (https://www.r-project.org/). The A morph assembly of I. elegans was aligned to the I and O morph assemblies of I. elegans and to the A and O-like assemblies of I. senegalensis. See Fig. 4a, 5c.

9. Genotyping the Darwin Tree of Life assembly
A. File names: nucmer_aln_Afem_ragtag_ToL-haplotigs.qr1_filter.reformat.coords.gz, nucmer_aln_Afem_ragtag_ToL-primary.qr1_filter.reformat.coords.gz, ToL_500_norepeat.regions.bed.gz, karyotype_AToL_13_unloc_RagTag.csv, karyotype_AToL_RagTag_haplotigs.csv
B. Description: To genotype the DToL reference assembly of I. elegans, we estimated read-depth coverage of the DToL long-read PacBio data mapped to the A morph assembly of I. elegans generated in this study, and aligned the A morph assembly to both the primary DToL assembly and to the purged haplotigs. Read depth was estimated in mosdepth v 0.2.8 (https://github.com/brentp/mosdepth) and assembly alignments were conducted using nucmer v 4.0.0 (https://github.com/mummer4/mummer). See Fig. S3.

10. SV calling
A. File names: A_to_A.bam, A_to_A.bam.bai, A_to_I.bam, A_to_I.bam.bai, A_to_O.bam, A_to_O.bam.bai, A_to_ToL_2mb.bam, A_to_ToL_2mb.bam.bai, I_to_A.bam, I_to_A.bam.bai, I_to_I.bam, I_to_I.bam.bai, I_to_O.bam, I_to_O.bam.bai, I_to_ToL_2mb.bam, I_to_ToL_2mb.bam.bai, O_to_A.bam, O_to_A.bam.bai, O_to_I.bam, O_to_I.bam.bai, O_to_O.bam, O_to_O.bam.bai, O_to_ToL_2mb.bam, O_to_ToL_2mb.bam.bai
B. Description: Merged alignments of resequencing samples (n = 19 per morph) to alternative reference assemblies (A, I, O, and DToL) for I. elegans. The alignments have been filtered by quality and to contain only the unlocalized scaffold 2 of chromosome 13, which includes the morph locus. These files were used to call morph-specific structural variants using samplot v 1.3.0 (https://github.com/ryanlayer/samplot). See Extended Data Figs 2, 7, and Fig. S5-S6.

11. Mapping of inversion breakpoint reads
A. File names: AvO_3K.tsv.gz, AvO_22K.tsv.gz, AvO_sen_3K.tsv.gz, AvO_sen_22K.tsv.gz, IvO_3K.tsv.gz
B. Description: Signatures of an inversion with breakpoints at ~ 3 kb and ~ 22 kb of the unlocalized scaffold 2 of chromosome 13 on the O assembly were found in A and I resequencing samples of I. elegans and in poolseq samples of A females of I. senegalensis. We queried the reads mapping to the inversion breakpoints and then tabulated their mapping locations on the A morph assembly of I. elegans (Fig. 6 and Extended Data Fig. 3, 7b-c). For the first inversion breakpoint, we also mapped reads on the I morph assembly of I. elegans (Fig. S12).

12. Evidence of translocation in I
A. File names: Ifem_nano_SUPER_13_unloc_2.bam, Ifem_nano_SUPER_13_unloc_2.bam.bai
B. Description: Long-read Nanopore data of an I morph female of I. elegans mapped to the A morph of I. elegans and filtered to contain the entire unlocalized scaffold 2 of chromosome 13. Read mapping was conducted in minimap2 v 2.22-r1110 (https://github.com/lh3/minimap2) and used to identify a translocation signature in the I morph, relative to the A morph of I. elegans. See Extended Data Fig. 6.

13. PCA output
A. File names: A1354_all.eigenval, A1354_all.eigenvec, I1049_all.eigenval, I1049_all.eigenvec
B. Description: Eigenvectors and eigenvalues of PCA analyses of population structure between morphs of I. elegans. PCA analyses were conducted on the morph locus, using either the A morph or the I morph assembly as mapping reference in PLINK v 1.9 (http://pngu.mgh.harvard.edu/purcell/plink/). See Fig. S4.

14. Linkage disequilibrium
A. File names: A1354_SUPER_1_allr.ld.gz, A1354_SUPER_2_allr.ld.gz, A1354_SUPER_3_allr.ld.gz, A1354_SUPER_4_allr.ld.gz, A1354_SUPER_5_allr.ld.gz, A1354_SUPER_6_allr.ld.gz, A1354_SUPER_7_allr.ld.gz, A1354_SUPER_8_allr.ld.gz, A1354_SUPER_9_allr.ld.gz, A1354_SUPER_10_allr.ld.gz, A1354_SUPER_11_allr.ld.gz, A1354_SUPER_12_allr.ld.gz, A1354_SUPER_13_allr.ld.gz, A1354_SUPER_13_unloc_1_allr.ld.gz, A1354_SUPER_13_unloc_2_allr.ld.gz, A1354_SUPER_13_unloc_3_allr.ld.gz, A1354_SUPER_13_unloc_4_allr.ld.gz, A1354_SUPER_X_allr.ld.gz
B. Description: Estimates of recombination rate (R2) between SNPs across the first 15 Mb of each chromosome and unlocalized segments of chromosome 13 of
105-year crime situation and its analysis - 2016 crime trend key report
data.gov.tw
json
Updated Jun 1, 2025
Cite
Academy for the Judiciary, MOJ (2025). 105-year crime situation and its analysis - 2016 crime trend key report [Dataset]. https://data.gov.tw/en/datasets/80191
Available download formats: json
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Academy for the Judiciary, MOJ
License
https://data.gov.tw/license
Description
Since 1973, the Ministry of Justice's "Criminal Research Center" has annually compiled the book "Crime Status and Its Analysis," which consolidates important statistical data on the government's handling of criminal cases and provides explanatory text. Due to its long history and detailed content, it has been an important reference for academia in the study of criminal policy and criminology, as well as a crucial reference for the practical understanding of the overall crime issues within the country and the formulation of relevant crime prevention strategies. In order to enhance the depth and breadth of research and analysis in "Crime Status and Its Analysis," it has gradually aligned with international crime prevention research. This study takes into account the statistical systems and content of advanced countries to address the crime situation in Taiwan in 2016 from the perspective of criminal policy and criminology. Through systematic collection and analysis of government statistical data, the study aims to achieve four main objectives: (1) strengthen the international orientation and communication aspect; (2) deepen the depth of research and analysis, in line with societal needs; (3) enhance data and chart interpretation tools to promote research and analysis functions; (4) propose specific policy recommendations as references for government administration.
Corona-virus disease (COVID-19) Data-set with Improved Measurement Errors of...
data.mendeley.com
Updated May 4, 2020
+ more versions
Cite
Afshin Ashofteh (2020). Corona-virus disease (COVID-19) Data-set with Improved Measurement Errors of Referenced Official Data Sources [Dataset]. http://doi.org/10.17632/nw5m4hs3jr.2
Unique identifier
https://doi.org/10.17632/nw5m4hs3jr.2
Dataset updated
May 4, 2020
Authors
Afshin Ashofteh
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is the result of a study on the quality of official datasets available for COVID-19. We used comparative statistical analysis to evaluate the accuracy of data collection by one national organisation (Chinese Center for Disease Control and Prevention) and two international organisations (World Health Organization; European Centre for Disease Prevention and Control), based on the value of systematic measurement errors. The data were collected using text mining techniques and by reviewing reports, metadata, and reference data. The combined dataset includes complete spatial data such as country area, standard country codes (M49 code), Alpha-2 codes, Alpha-3 codes, latitude, and longitude, and some additional attributes such as population. The data for China are presented in more detail in another sheet, extracted from the reports attached to the main page of the CCDC website. Additionally, the dataset benefits from major corrections to the referenced datasets and official reports, such as adjusting the report dates (which suffered from lags of one or two days), removing four negative values, detecting unreasonable changes to historical data in new reports (revealed by comparing the daily reports), and finally correcting systematic measurement errors (which grew as the number of infected countries increased). An aggregated root mean square error was used to identify the main problematic parts of the datasets, in addition to comparative statistical analysis to evaluate the errors. The result is a combined dataset with reduced systematic measurement errors and with some new attributes in addition to the usual attributes of SARS-CoV-2 and coronavirus disease, such as daily mortality and fatality rates. This dataset could be considered a comprehensive and reliable source of COVID-19 data for further studies.
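
As an illustration of how a root mean square error can flag disagreement between two reporting sources, here is a minimal Python sketch. It is not the author's procedure: the description does not define the aggregation, so the per-country grouping, the function names `rmse` and `rmse_by_country`, and the sample counts are all hypothetical.

```python
import math


def rmse(series_a, series_b):
    """Root mean square error between two equally long series of counts."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(series_a, series_b)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_by_country(source_a, source_b):
    """RMSE per country, for countries reported by both sources (assumed grouping)."""
    shared = sorted(source_a.keys() & source_b.keys())
    return {country: rmse(source_a[country], source_b[country]) for country in shared}


# Hypothetical daily case counts from two sources, for illustration only.
source_a = {"X": [10, 12, 15], "Y": [0, 1, 3]}
source_b = {"X": [10, 12, 15], "Y": [0, 4, 7]}

print(rmse_by_country(source_a, source_b))
```

Countries with a large value are the ones where the two sources diverge most, which is the kind of signal the description says was used to locate problematic parts of the data.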
Text patterns considered as PDB URLs.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Yi-Hung Huang; Peter W. Rose; Chun-Nan Hsu (2023). Text patterns considered as PDB URLs. [Dataset]. http://doi.org/10.1371/journal.pone.0136631.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0136631.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Yi-Hung Huang; Peter W. Rose; Chun-Nan Hsu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Text patterns considered as PDB URLs.
Tutorial Data Bundle for PyPSA-Eur: An Open Optimisation Model of the...
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
+ more versions
Cite
David Schlachtberger (2020). Tutorial Data Bundle for PyPSA-Eur: An Open Optimisation Model of the European Transmission System [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3517920
Dataset updated
Jan 24, 2020
Dataset provided by
Jonas Hörsch
Fabian Hofmann
David Schlachtberger
Tom Brown
Fabian Neumann
Description
PyPSA-Eur is an open model dataset of the European power system at the transmission network level that covers the full ENTSO-E area. It can be built using the code provided at https://github.com/PyPSA/PyPSA-eur.

It contains alternating current lines at and above 220 kV voltage level and all high voltage direct current lines, substations, an open database of conventional power plants, time series for electrical demand and variable renewable generator availability, and geographic potentials for the expansion of wind and solar power.

Not all data dependencies are shipped with the code repository, since git is not suited for handling large changing files. Instead we provide separate data bundles to be downloaded and extracted as noted in the documentation.

This is the lightweight data bundle to be used for the PyPSA-Eur tutorial. It excludes large bathymetry and natural protection area datasets.

While the code in PyPSA-Eur is released as free software under the GPLv3, different licenses and terms of use apply to the various input data, which are summarised and linked below:

corine/*

CORINE Land Cover (CLC) database

Source: https://land.copernicus.eu/pan-european/corine-land-cover/clc-2012/

Extract from Terms of Use:

Access to data is based on a principle of full, open and free access as established by the Copernicus data and information policy Regulation (EU) No 1159/2013 of 12 July 2013. This regulation establishes registration and licensing conditions for GMES/Copernicus users and can be found here. Free, full and open access to this data set is made on the conditions that:

When distributing or communicating Copernicus dedicated data and Copernicus service information to the public, users shall inform the public of the source of that data and information.

Users shall make sure not to convey the impression to the public that the user's activities are officially endorsed by the Union.

Where that data or information has been adapted or modified, the user shall clearly state this.

The data remain the sole property of the European Union. Any information and data produced in the framework of the action shall be the sole property of the European Union. Any communication and publication by the beneficiary shall acknowledge that the data were produced “with funding by the European Union”.

https://land.copernicus.eu/pan-european/corine-land-cover/clc-2012?tab=metadata

eez/*

World exclusive economic zones (EEZ)

Source: http://www.marineregions.org/sources.php#unioneezcountry

Extract from Terms of Use:

Marine Regions’ products are licensed under CC-BY-NC-SA. Please contact us for other uses of the Licensed Material beyond license terms. We kindly request our users not to make our products available for download elsewhere and to always refer to marineregions.org for the most up-to-date products and services.

http://www.marineregions.org/disclaimer.php

naturalearth/*

World country shapes

Source: https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-countries/

Extract from Terms of Use:

All versions of Natural Earth raster + vector map data found on this website are in the public domain. You may use the maps in any manner, including modifying the content and design, electronic dissemination, and offset printing. The primary authors, Tom Patterson and Nathaniel Vaughn Kelso, and all other contributors renounce all financial claim to the maps and invites you to use them for personal, educational, and commercial purposes.

No permission is needed to use Natural Earth. Crediting the authors is unnecessary.

http://www.naturalearthdata.com/about/terms-of-use/

NUTS_2013_60M_SH/*

Europe NUTS3 regions

Source: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units

Extract from Terms of Use:

In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:

The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").

The permission to use the data is granted on condition that: the data will not be used for commercial purposes; the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page.

https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units

https://ec.europa.eu/eurostat/about/policies/copyright

ch_cantons.csv

Mapping between Swiss Cantons and NUTS3 regions

Source: https://en.wikipedia.org/wiki/Data_codes_for_Switzerland

Extract from Terms of Use:

Creative Commons Attribution-ShareAlike 3.0 Unported License

https://en.wikipedia.org/wiki/Data_codes_for_Switzerland

EIA_hydro_generation_2000_2014.csv

Hydroelectricity generation per country and year

Source: https://www.eia.gov/beta/international/data/browser/#/?pa=000000000000000000000000000000g&c=1028i008006gg6168g80a4k000e0ag00gg0004g800ho00g8&ct=0&ug=8&tl_id=2-A&vs=INTL.33-12-ALB-BKWH.A&cy=2014&vo=0&v=H&start=2000&end=2016

Extract from Terms of Use:

Public domain and use of EIA content: U.S. government publications are in the public domain and are not subject to copyright protection. You may use and/or distribute any of our data, files, databases, reports, graphs, charts, and other information products that are on our website or that you receive through our email distribution service. However, if you use or reproduce any of our information products, you should use an acknowledgment, which includes the publication date, such as: "Source: U.S. Energy Information Administration (Oct 2008)."

https://www.eia.gov/about/copyrights_reuse.php

hydro_capacities.csv

Hydroelectricity generation and storage capacities

Source:

A. Kies, K. Chattopadhyay, L. von Bremen, E. Lorenz, D. Heinemann, RESTORE 2050 Work Package Report D12: Simulation of renewable feed-in for power system studies., Tech. rep., RESTORE 2050 (2016).

B. Pfluger, F. Sensfuß, G. Schubert, J. Leisentritt, Tangible ways towards climate protection in the European Union (EU Long-term scenarios 2050), Fraunhofer ISI. https://www.isi.fraunhofer.de/content/dam/isi/dokumente/ccx/2011/Final_Report_EU-Long-term-scenarios-2050.pdf

je-e-21.03.02.xls

Population and GDP data for Swiss Cantons

Source: https://www.bfs.admin.ch/bfs/en/home/news/whats-new.assetdetail.7786557.html

Extract from Terms of Use:

Information on the websites of the Federal Authorities is accessible to the public. Downloading, copying or integrating content (texts, tables, graphics, maps, photos or any other data) does not entail any transfer of rights to the content.

Copyright and any other rights relating to content available on the websites of the Federal Authorities are the exclusive property of the Federal Authorities or of any other expressly mentioned owners.

Any reproduction requires the prior written consent of the copyright holder. The source of the content (statistical results) should always be given. Anyone who intends on using statistical results for commercial purposes or gain must obtain an authorisation pursuant to Art. 13 of the Fee Ordinance and is liable to pay an indemnity. Please contact the FSO for this purpose.

https://www.bfs.admin.ch/bfs/en/home/fso/swiss-federal-statistical-office/terms-of-use.html

https://www.bfs.admin.ch/bfs/de/home/bfs/oeffentliche-statistik/copyright.html

nama_10r_3gdp.tsv.gz

Gross domestic product (GDP) by NUTS3 region

Source: http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=nama_10r_3gdp&lang=

Extract from Terms of Use:

Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:

the source is indicated as Eurostat;

when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information.

Exceptions

The permission granted above does not extend to any material whose copyright is identified as belonging to a third-party, such as photos or illustrations from copyright holders other than the European Union. In these circumstances, authorisation must be obtained from the relevant copyright holder(s).

Logos and trademarks are excluded from the above mentioned general permission, except if they are redistributed as an integral part of a Eurostat publication and if the publication is redistributed unchanged.

When reuse involves translations of publications or modifications to the data or text, this must be stated clearly to the end user of the information. A disclaimer regarding the non-responsibility of Eurostat shall be included.

https://ec.europa.eu/eurostat/about/policies/copyright

nama_10r_3popgdp.tsv.gz

Population by NUTS3 region

Source: http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=nama_10r_3popgdp&lang=en

Extract from Terms of Use:

Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided
Datasets and code tables for research project: Metacognition-Intensive...
figshare.com
xlsx
Updated Jul 24, 2018
Cite
Oleg Gorfinkel; María Sobeida Leticia Blázquez-Morales; Roberto Lagunes-Córdoba (2018). Datasets and code tables for research project: Metacognition-Intensive Mindfulness training delivers superior clinical and quality-of-life outcomes via enhanced mindful self-regulation [Dataset]. http://doi.org/10.6084/m9.figshare.5929228.v8
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.5929228.v8
Dataset updated
Jul 24, 2018
Dataset provided by
figshare
Authors
Oleg Gorfinkel; María Sobeida Leticia Blázquez-Morales; Roberto Lagunes-Córdoba
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The two provided Excel books contain:
1. The anonymized, processed dataset supporting the main findings of the study (Sheet 1), along with its corresponding code table (Sheet 2). To create this dataset, the overall outcome measure data were subsetted to include only the values actually employed in the statistical analysis: global pretest, posttest and difference scores for each of the main measures (mindfulness, wellbeing, emotional dysregulation, psychological distress and interpersonal communication performance). The procedures used for obtaining these summatories are described in the Methods/Data Processing and Statistical Analysis section of the manuscript, as well as the Measures subsection under Materials and Methods in the online supplement. For those variables that had undergone outlier adjustment, their original, non-winsorized values are included at the end of the dataset for reference. The reliability-corrected pretest values used for the ANCOVA and MANCOVA analyses are also provided. Complete, unprocessed pretest and posttest measures data are available from the corresponding author upon reasonable request.
2. The anonymized, processed dataset containing responses to the Early Termination Questionnaire (Sheet 1), along with its corresponding code table (Sheet 2). The ETQ survey was administered to those participants who did not complete the mindfulness intervention, to assess how interesting and useful they had found the program, whether they would like to retake and complete it in the future, and their main reason for dropping out early.
NOTE: The text for any relevant survey items and coded answers appears in English. Any free text responses by the participants, however, are given in their original Spanish version.

NSW Administrative Boundaries Theme - ABS Regional Boundaries Local...

data.nsw.gov.au

arcgis rest service

Updated Jul 11, 2025

Cite

Spatial Services (DCS) (2025). NSW Administrative Boundaries Theme - ABS Regional Boundaries Local Government Area [Dataset]. https://data.nsw.gov.au/data/dataset/1-5c16aa3bdf944b6d81e04f423696c158

Explore at:

Available download formats: arcgis rest service

Dataset updated

Jul 11, 2025

Dataset provided by

Spatial Services (DCS)

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

New South Wales

Description

Export Data Access API

NSW Administrative Boundaries Theme – Australian Bureau of Statistics Regional Boundaries – Local Government Area

Please Note

WGS 84 service aligned to GDA94
This dataset has spatial reference [WGS 84 ≈ GDA94] which may result in misalignments when viewed in GDA2020 environments. A similar service with a ‘multiCRS’ suffix is available which can support GDA2020, GDA94 and WGS 84 ≈ GDA2020 environments.

In due course, and allowing time for user feedback and testing, it is intended that the original service name will adopt the new multiCRS functionality.

Metadata Portal Metadata Information

Content Title	NSW Administrative Boundaries Theme - ABS Regional Boundaries Local Government Area
Content Type	Hosted Feature Layer
Description	Australian Bureau of Statistics (ABS) Statistical Geographical Standard Boundaries Suburb divides an area of interest throughout the state of NSW on which statistics are collected for purposes under the Census and Statistics Act 1905 (Cth). The Australian Statistical Geography Standard (ASGS) brings together in one framework all of the regions which the Australian Bureau of Statistics (ABS) and many other organisations use to collect, release and analyse geographically classified statistics. The ASGS ensures that these statistics are comparable and geospatially integrated and provides users with a coherent set of standard regions so that they can access, visualise, analyse and understand statistics. The 2016 ASGS will be used for the 2016 Census of Population and Housing and progressively introduced into other ABS data collections. The ABS encourages the use of the ASGS by other organisations to improve the comparability and usefulness of statistics generally, and in analysis and visualisation of statistical and other data. The ABS Structures are a hierarchy of regions developed for the release of ABS statistical information. The main components are as follows: Statistical Areas Level 1; Statistical Areas Level 2; Statistical Areas Level 3; Statistical Areas Level 4; Regional Boundaries (Local Government Area, Suburb). The Australian Bureau of Statistics Geographical Standard Boundaries - Statistical Areas are used to define geographical areas to support statistical and socio-economic analysis at a state and regional scale. They are useful for analytical purposes within statistical boundaries through the aggregation of a wide swath of data and information. The ABS maintains the Australian Statistical Geography Standard (ASGS) and the Australian Standard Geographical Classification (ASGC) for pre-2011 census information.
In addition to the NSW Administrative Boundaries Theme, the Australian Bureau of Statistics also provides this data via a web service direct from ABS. Further standards, specifications and classifications can be found at: Australian Bureau of Statistics Standards; Australian Bureau of Statistics Classifications. The regions defined in the ABS Structures will not change until the next Census in 2021. The Non-ABS Structures are updated only when the ABS considers that there are major changes to the administrative boundaries they represent.
Initial Publication Date	05/02/2020
Data Currency	01/01/3000
Data Update Frequency	Other
Content Source	API
File Type	Map Feature Service
Attribution	© State of New South Wales (Spatial Services, a business unit of the Department of Customer Service NSW). For current information go to spatial.nsw.gov.au.
Data Theme, Classification or Relationship to other Datasets	NSW Administrative Boundaries Theme of the Foundation Spatial Data Framework (FSDF)
Accuracy	The dataset maintains a positional relationship to, and alignment with, the Lot and Property digital datasets. This dataset was captured by digitising the best available cadastral mapping at a variety of scales and accuracies, ranging from 1:500 to 1:250 000 according to the National Mapping Council of Australia, Standards of Map Accuracy (1975). Therefore, the position of the feature instance will be within 0.5mm at map scale for 90% of the well-defined points. That is, 1:500 = 0.25m, 1:2000 = 1m, 1:4000 = 2m, 1:25000 = 12.5m, 1:50000 = 25m and 1:100000 = 50m. A program to upgrade the spatial location and accuracy of data is ongoing.
Spatial Reference System (dataset)	GDA94
Spatial Reference System (web service)	EPSG:3857
WGS84 Equivalent To	GDA94
Spatial Extent	Full State
Content Lineage	For additional
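
The Accuracy entry in the table above converts a fixed 0.5 mm tolerance at map scale into a ground distance for each capture scale. A minimal sketch of that arithmetic follows; the function and constant names are hypothetical and are not part of the NSW dataset or any Spatial Services API.

```python
# Illustrative only: names below are assumptions made for this sketch.
MAP_ERROR_MM = 0.5  # tolerance at map scale quoted in the Accuracy entry


def ground_accuracy_m(scale_denominator: int, map_error_mm: float = MAP_ERROR_MM) -> float:
    """Convert a tolerance in millimetres on the map to metres on the ground.

    For a 1:N map, 1 mm on the map corresponds to N mm on the ground,
    so the ground tolerance is map_error_mm * N millimetres.
    """
    return map_error_mm * scale_denominator / 1000.0


if __name__ == "__main__":
    # Scales listed in the Accuracy entry.
    for denominator in (500, 2000, 4000, 25000, 50000, 100000):
        print(f"1:{denominator} = {ground_accuracy_m(denominator):g} m")
```

The same function applies to any other capture scale in the stated 1:500 to 1:250 000 range.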

Cite

Heather A. Piwowar; Todd J. Vision (2013). Data reuse and the open data citation advantage [Dataset]. http://doi.org/10.5061/dryad.781pv

Data from: Data reuse and the open data citation advantage

Explore at:

3 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip

Unique identifier

https://doi.org/10.5061/dryad.781pv

Dataset updated

Oct 1, 2013

Dataset provided by

Dryad

Authors

Heather A. Piwowar; Todd J. Vision

Time period covered

2013

Description

Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations th...

Data from: Data reuse and the open data citation advantage

Statistical information of data set D2.

Data from: Open Data engages Citation and Reuse: A Follow-up Study on...

Statistical Abstract of the United States, 2011

English Monograph OCR Dataset (Preprocessed) 📄🔍

📌 About the Dataset

🚀 Use Cases

📊 Dataset Statistics

📜 Citation

Statistics of dataset D2.

Statistical daily streamflow estimates at GAGES-II non-reference streamgages...

Data from: Reference Site Condition Datasets for Floating Wind Arrays in the...

Portable pseudo-random reference sequences with Mersenne Twister using GNU...

‘Port Statistical Area’ analyzed by Analyst-2

Statistics of dataset D1.

Statistical Abstract of the United States, 2002

Data used in the manuscript - A Hierarchical Approach for Evaluating Athlete...

Data from: The genomics and evolution of inter-sexual mimicry and...

105-year crime situation and its analysis - 2016 crime trend key report

Corona-virus disease (COVID-19) Data-set with Improved Measurement Errors of...

Text patterns considered as PDB URLs.

Tutorial Data Bundle for PyPSA-Eur: An Open Optimisation Model of the...

Datasets and code tables for research project: Metacognition-Intensive...

NSW Administrative Boundaries Theme - ABS Regional Boundaries Local...
