Excel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the column in the date set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk , D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee , and M. Plocher. Plant reproduction is altered by simulated herbicide drift toconstructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data outputs 1-18 Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 3. Common differentially expressed genes between training and test set samples the microarray dataset. This data was generated based on the results of AML microarray data analysis. Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study. Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones. Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs. Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on a PubMed database-based literature mining. Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell. Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In "Sample Student Data", there are 6 sheets. There are three sheets with sample datasets, one for each of the three different exercise protocols described (CrP Sample Dataset, Glycolytic Dataset, Oxidative Dataset). Additionally, there are three sheets with sample graphs created using one of the three datasets (CrP Sample Graph, Glycolytic Graph, Oxidative Graph). Each dataset and graph pairs are from different subjects. · CrP Sample Dataset and CrP Sample Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the creatine phosphate system. Here, the subject was a track and field athlete who threw the shot put for the DeSales University track team. The NIRS monitor was placed on the right triceps muscle, and the student threw the shot put six times with a minute rest in between throws. Data was collected telemetrically by the NIRS device and then downloaded after the student had completed the protocol. · Glycolytic Dataset and Glycolytic Graph: This is an example of a dataset and graph created from an exercise protocol designed to stress the glycolytic energy system. In this example, the subject performed continuous squat jumps for 30 seconds, followed by a 90 second rest period, for a total of three exercise bouts. The NIRS monitor was place on the left gastrocnemius muscle. Here again, data was collected telemetrically by the NIRS device and then downloaded after he had completed the protocol. · Oxidative Dataset and Oxidative Graph: In this example, the dataset and graph are from an exercise protocol designed to stress the oxidative system. Here, the student held a sustained, light-intensity, isometric biceps contraction (pushing against a table). The NIRS monitor was attached to the left biceps muscle belly. Here, data was collected by a student observing the SmO2 values displayed on a secondary device; specifically, a smartphone with the IPSensorMan APP displaying data. The recorder student observed and recorded the data on an Excel Spreadsheet, and marked the times that exercise began and ended on the Spreadsheet.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.
The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
Sample survey data [ssd]
The sample for Slovenia ES 2009, 2013, 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.
Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.
For the Slovenia 2009 ES, industry stratification was designed in the way that follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries sample sizes were inflated by about 17% to account for potential non-response cases when requesting sensitive financial data and also because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals) sample sizes were inflated by about 12% to account for under sampling in firms in service industries.
For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).
Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).
For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.
For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.
Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).
Computer Assisted Personal Interview [capi]
Questionnaires have common questions (core module) and respectfully additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.
For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in the Slovenia may be selection bias and not frame inaccuracy.
For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.
Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second full of pandemics), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated and the change (increase) in indicators such as profitability and profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. Dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible data that can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Due to the fact that the data in the dataset are not ready-made numbers, but formulas, when adding and / or changing the values in the original table at the beginning of the dataset, most of the subsequent tables will be automatically recalculated and the graphs will be updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Peer-to-peer (P2P) energy sharing involves novel technologies and business models at the demand-side of power systems, which is able to manage the increasing connection of distributed energy resources (DERs). In P2P energy sharing, prosumers directly trade energy with each other to achieve a win-win outcome. A research paper titled "Evaluation of peer-to-peer energy sharing mechanisms based on a multiagent simulation framework" has been published on Applied Energy regarding this topic. In the paper, a general multiagent framework was established to simulate P2P energy sharing, with two original techniques proposed to facilitate simulation convergence. Furthermore, a systematic index system was established to evaluate P2P energy sharing mechanisms from both economic and technical perspectives.In case studies of the paper, two sets of cases were conducted to validate the proposed simulation and evaluation methods and to give some practical implications on applying P2P energy sharing in Great Britain (GB) at present and in the future. The household demand dataset and electric vehicle (EV) dataset used in the paper has been provided for researchers to reproduce the results in the paper or to conduct further related studies. Also, the original numerical data of the results in the case studies of the paper have been provided, for researchers to better understand the results or to use the results for other purposes.The whole dataset includes 9 excel files in total. The detailed description for them are presented as follows:1. “CREST_Demand_Model_v2.2 (Great Britain).xlsm” is a high-resolution stochastic integrated thermal-electrical domestic demand simulation tool developed by Centre for Renewable Energy Systems Technology (CREST) of Loughborough University (refering to http://www.lboro.ac.uk/research/crest/demand-model/). It contains a lot of sheets and VBA codes, which are used to generate “fake” demand curves of domestic customers sampled from statistical distributions that are based on real-life data. In the “Main Sheet”, input parameters like “day of month”, “month of year”, “latitude”, “longitude”, etc. can be entered, and then the “Run simulation” button can be clicked to start the simulation. After the simulation, daily curves like “occupancy and activity”, “total electrical demand”, “total gas demand”, etc. are generated and visualized, with very high time resolution.2. “Electric_Vehicle_Dataset (Great Britain).xlsx” is a dataset based on the research conducted jointly by Centre for Integrated Renewable Energy of Cardiff University and Key Laboratory of Smart Grid of Ministry of Education of Tianjin University (referring to https://doi.org/10.1016/j.apenergy.2015.10.159). It contains two sheets, which provide the parameters of 1000 typical electric vehicles of Great Britain respectively. For each electric vehicle, the parameters include: (1) “Time starting charging / returning home (hour)”, (2) “Time finishing charging / leaving home (hour)”, (3) “Battery capacity (kWh)”, (4) “Energy consumption due to travel (measured by SOC)”, (5) “Lowerlimit of SOC”, (6) “Upperlimit of SOC”, (7) “Maximum charging/discharging power”, (8) “Charging efficiency”, and (9) “Discharging efficiency”.3. “Numerical results and figures _ Case 1-1.xlsx” provides the numerical results of Case 1-1 of the paper. It contains three sheets, providing the data behind Fig. 6, Fig. 7 and Fig. 8 of the paper respectively. In the “Fig. 6” sheet, the “Total Net Consumption (kWh)” and “Total PV Generation (kWh)” under “SDR mechanism” and “conventional paradigm” are provided. In the “Fig. 7” sheet, the “Net energy cost under SDR mechanism (£)” and “Net energy cost under conventional paradigm (£)” of each prosumer are provided. In the “Fig. 8” sheet, the “Internal selling price (£/MWh)”, “Internal buying price (£/MWh)” and “Total Net Energy Cost (£)” of each iteration are provided.4. “Numerical results and figures _ Case 1-2.xlsx” provides the numerical results of Case 1-2 of the paper. It contains two sheets, providing the data behind Fig. 9, Fig. 10 and Fig. 11 of the paper. In the “Fig. 9 and 10” sheet, for Fig. 9, the “The iteration at which the simulation stopped” given different ramping rates are provided; for Fig. 10, the “Overall Performance Index” with different ramping rates given different demand profiles are provided. In the “Fig. 11” sheet, the “Total net energy cost (ramping rate = 0.3) (£)” and “Total Net Energy Cost (ramping rate = 0.6) (£)” at each iteration are provided.5. “Numerical results and figures _ Case 1-3.xlsx” provides the numerical results of Case 1-3 of the paper. It contains only one sheets, providing the data behind Fig. 12 of the paper. In the “Fig. 12” sheet, the “Overall Performance Index” with different learning rates given different demand profiles are provided.6. “Numerical results and figures _ Case 1-4.xlsx” provides the numerical results of Case 1-4 of the paper. It contains two sheets, providing the data behind Fig. 13 and Fig. 14 of the paper. In the “Fig. 13” sheet, the “Overall Performance Index” with different ramping rates given different initial values are provided. In the “Fig. 14” sheet, the “Overall Performance Index” with different learning rates given different initial values are provided.7. “Numerical results and figures _ Case 1-5.xlsx” provides the numerical results of Case 1-5 of the paper. It contains only one sheet, providing the data behind Fig. 15 and Fig. 16 of the paper. In the “Fig. 15 and 16” sheet, for Fig. 15, the number of iterations when the simulation stopped given different maximum number of iterations and ramping rates are provided; for Fig. 16, the overall performance given different maximum number of iterations and ramping rates are provided.8. “Numerical results and figures _ Case 2-2.xlsx” provides the numerical results of Case 2-2 of the paper. It contains only one sheet, providing the data behind Fig. 17 of the paper. In the “Fig. 17” sheet, the overall performance scores of the three mechanisms (SDR, MMR and BS) and conventional paradigm in scenarios with different PV and EV penetration levels are provided.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was derived by the Bioregional Assessment Programme. The parent datasets are identified in the Lineage statement in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
This dataset comprises of interpreted elevation surfaces and contours for the major Triassic and Upper Permian units of the Galilee Geological Basin.
This dataset was created to provide formation extents for aquifers in the Galilee geological basin
A Quality Assurance (QA) and validation process was conducted on the original well and bore data to choose wells/bores that are within 25 kilometres of the BA Galilee Region extent.
The QA/Validation process is as follows:
Well data
a. Obtained excel file "QPED_July_2013_galilee.xlsx" from GA
b. Based on stratigraphic information in "BH_costrat" tab formation names were regularised and simplified based on current naming conventions.
c. Simplified names added to QPED_July_2013_galileet.xlsx as "Steve_geo" and "Steve_group"
d. Produced new file "GSQ_Geology.xlsx" contained decimal latitude and longitude, KB elevation, top of unit in metres from KB, top of unit in metres AHD, bottom of unit in metres from KB, bottom of unit in metres AHD, original geology, simplified geology, simplified Group geology.
i. KB obtained from "BH_wellhist"
ii. Where no KB information was available ie KB=0, sample the 1S DEM at the well's location to obtain height. KB=DEM+10. Marked well as having lower reliability.
iii. Calculated Top_m_AHD = KB - Top_m_KB
iv. Calculated Bottom_m_AHD = KB - Bottom_m_KB
e. Brought GSQ_Geology.xlsx into ArcGIS
f. Selected wells based on "Steve_geo" field for each model layer to produce a geodatabase for each layer.
i. GSQ_basement_wells
ii. GSQ_top_joe_joe_group
iii. GSQ_top_bandanna_merge
iv. GSQ_rewan_group
v. GSQ_clematis
vi. GSQ_moolyember
g. Additional wells and reinterpreted tops added to appropriate geodatabase based on well completion reports
h. Additional wells added to coverages to help model building process
i. Well_name listed as Fake
ii. Exception being GSQ_top_basement_fake which was created as a separate geodatabase
Bore data
a. Obtained QLD_DNRM_GroundwaterDatabaseExtract_20131111 from GA
b. Used files REGISTRATIONS.txt, ELEVATIONS.txt and AQUIFER.txt to build GW_stratigraphy.xlsx
i. Based on RN
ii. Latitude from GIS_LAT (REGISTRATIONS.txt)
iii. Longitude from GIS_LNG (REGISTRATIONS.txt)
iv. Elevation from (ELEVATIONS.txt)
v. FORM_DESC from (AQUIFER.txt)
vi. Top from (AQUIFER.txt)
vii. Bottom from (AQUIFER.txt)
c. Brought GW_stratigraphy.xlsx into ArcGIS
d. Created gw_bores_galilee_dem
i. Sampled 1S DEM to obtain ground level elevation column RASTERVALU
ii. Created column top_m_AHD by RASTERVALU - Top
e. Selected bores based on "FORM_DESC" field for each model layer to produce a geodatabase for each layer.
i. Gw_basement
ii. GW_bores_joe_joe_group
iii. GW_bores_bandanna
iv. Gw_bores_rewan
v. Gw_bores_clematis
vi. Gw_bores_moolyember
Georectified seismic surfaces
a. Extracted interpreted seismic surfaces for base Permian (interpreted as basement) and top Bandanna (in time) from the following seismic surveys
i. Y80A, W81A, Carmichael, Pendine, T81A, Quilpie, Ward and Powell Creek seismic survey downloaded https://qdexguest.deedi.qld.gov.au/portal/site/qdex/search?searchType=general
ii. Brought TIF images into ArcGIS and georectified
iii. Digitised shape of contours and faults into geodatabase
1. Basement_contours and basement_faults
2. bandanna_contours_new_data and bandanna_faults
iv. Added field "contour" to geodatabase
v. Converted contours to depth in "contour" field based on well and bore data (top_m_AHD) and contour progression
vi. Use the shape and depth derived from OZ SEEBASE to help to add additional contours and faults to basement and bandanna datasets
Additional contour and fault surfaces were built derived from underlying surfaces and wells/bore data
a. Joejoe_contours and joejoe)faults
b. Rewan_contour_clip (used bandanna_faults as fault coverage)
c. Clematis_contour and clematis_faults
d. Moolyember_contour (used clematis_faults as fault coverage)
Surface geology
a. Extracted surface geology from QUEENSLAND GEOLOGY_AUGUST_2012 using Galilee BA region boundary with 25 kilometre boundary to form geodatabase QLD_geology_galilee
b. Selected relevant surface geology from QLD_geology_galilee based on field "Name" for each model layer and created new geodatabase layers
i. Basement_geology: Argentine Metamorphics,Running River Metamorphics,Charters Towers Metamorphics; Bimurra Volcanics, Foyle Volcanics, Mount Wyatt Formation, Saint Anns Formation, Silver Hills Volcanics, Stones Creek Volcanics; Bulliwallah Formation, Ducabrook Formation, Mount Rankin Formation, Natal Formation, Star of Hope Formation; Cape River Metamorphics; Einasleigh Metamorphics; Gem Park Granite; Macrossan Province Cambrian-Ordovician intrusives; Macrossan Province Ordovician-Silurian intrusives; Macrossan Province Ordovician intrusives; Mount Formartine, unnamed plutonic units; Pama Province Silurian-Devonian intrusives; Seventy Mile Range Group; and Kirk River beds, Les Jumelles beds.
ii. Joe_joe_geology: Joe Joe Group
iii. Galilee_permian_geology: Back Creek Group, Betts Creek Group, Blackwater Group
iv. Rewan_geology: Rewan Group
1. Later also made dunda_beds_geology to be included in Rewan model: Dunda beds
v. Clematis_geology: Clematis Group
1. Later also made warang_sandstone_geology to be included in Clematis model: Warang Sandstone
vi. Moolyember_surface_geology: Moolyember Formation
DEM for each model layer
a. Using surface geology geodatabase extent extract grid from dem_s_1s to represent the top of the model layer at the surface
i. Basement_dem
ii. Joejoe_dem
iii. Bandanna_dem
iv. Rewan_dem and dunda_dem
v. Clematis_dem and warang_dem
vi. Moolyember_surface_dem
b. Used Contour tool in ArcGIS to obtain a 25 metre contour geodatabase from the relevant model DEM
i. Basement_dem_contours
ii. Joejoe_dem_contours
iii. Bandanna_dem_contours
iv. Rewan_dem_contours and dunda_dem_contours
v. Clematis_dem_contours and warang_dem_contours
vi. Moolyember_dem_contours
c. For the purpose of guiding the model building process additional fields were added to each DEM contour geodatabase was added based on average thickness derived from groundwater bores and petroleum wells.
i. Basement_dem_contours: Joejoe, bandanna, rewan, clematis, moolyember
ii. Joejoe_dem_contours: basement, bandanna
iii. Bandanna_dem_contours: joejoe, rewan
iv. Rewan_dem_contours and dunda_dem_contours: clematis, rewan
v. Clematis_dem_contours and warang_dem_contours: moolyember, rewan
vi. Moolyember_dem_contours: clematis
The model building process is as follows:
Used the tope to raster tool to create surface based on the following rules
a. Environment
i. Extent
1. Top: -19.7012030024424
2. Right: 148.891511819054
3. Bottom: -27.5812030024424
4. Left: 139.141511819054
ii. Output cell size: 0.01 degrees
iii. Drainage enforcement: No_enforce
b. Input
i. Basement
1. Basement_dem_contour; field - contour; type - contour
2. Joejoe_dem_contour; field - basement; type - contour
3. Basement_contour; field - contour; type - contour
4. GSQ_basement_wells; field - top_m_AHD; type - point elevation
5. GW_basement; field - top_m_AHDl type - point elevation
6. GSQ_top_basement_fake; field - top_m_AHDl type - point elevation
7. Basement_faults; type - cliff
ii. Joe Joe Group
1. Joejoe_dem_contour; field - basement; type - contour
2. Basement_dem_contour; field - joejoe; type - contour
3. permian_dem_contour; field - joejoe, type - contour
4. joejoe_contour; field - joejoe; type - contour
5. GSQ_top_joejoe_group; field - top_m_AHD; type - point elevation
6. GW_bores_joe_joe_group; field - top_m_AHDl type - point elevation
7. joejoe_faults; type - cliff
iii. Bandanna Group
1. Permian_dem_contour; field - contour; type - contour
2. Joejoe_dem_contour; field - bandanna; type - contour
3. Rewan_dem_contour: field - bandanna; type - contour
4. Dunda_dem_contour; field - bandanna; type - contour
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Dataset contains the results of both the requirement prioritization survey, as well as the explanation quality evaluation.
We also publish all the prompts we used to generate the explanations for the quality evaluation, as well as the excel file used to generate the dummy data.
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.
Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items, has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.
There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Heterogenous Big dataset is presented in this proposed work: electrocardiogram (ECG) signal, blood pressure signal, oxygen saturation (SpO2) signal, and the text input. This work is an extension version for our relevant formulating of dataset that presented in [1] and a trustworthy and relevant medical dataset library (PhysioNet [2]) was used to acquire these signals. The dataset includes medical features from heterogenous sources (sensory data and non-sensory). Firstly, ECG sensor’s signals which contains QRS width, ST elevation, peak numbers, and cycle interval. Secondly: SpO2 level from SpO2 sensor’s signals. Third, blood pressure sensors’ signals which contain high (systolic) and low (diastolic) values and finally text input which consider non-sensory data. The text inputs were formulated based on doctors diagnosing procedures for heart chronic diseases. Python software environment was used, and the simulated big data is presented along with analyses.
*** Fake News on Twitter ***
These 5 datasets are the results of an empirical study on the spreading process of newly fake news on Twitter. Particularly, we have focused on those fake news which have given rise to a truth spreading simultaneously against them. The story of each fake news is as follow:
1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.
2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."
3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.
4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.
5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.
The data collection has been done in two stages that each provided a new dataset: 1- attaining Dataset of Diffusion (DD) that includes information of fake news/truth tweets and retweets 2- Query of neighbors for spreaders of tweets that provides us with Dataset of Graph (DG).
DD
DD for each fake news story is an excel file, named FNx_DD where x is the number of fake news, and has the following structure:
The structure of excel files for each dataset is as follow:
Each row belongs to one captured tweet/retweet related to the rumor, and each column of the dataset presents a specific information about the tweet/retweet. These columns from left to right present the following information about the tweet/retweet:
User ID (user who has posted the current tweet/retweet)
The description sentence in the profile of the user who has published the tweet/retweet
The number of published tweet/retweet by the user at the time of posting the current tweet/retweet
Date and time of creation of the account by which the current tweet/retweet has been posted
Language of the tweet/retweet
Number of followers
Number of followings (friends)
Date and time of posting the current tweet/retweet
Number of like (favorite) the current tweet had been acquired before crawling it
Number of times the current tweet had been retweeted before crawling it
Is there any other tweet inside of the current tweet/retweet (for example this happens when the current tweet is a quote or reply or retweet)
The source (OS) of device by which the current tweet/retweet was posted
Tweet/Retweet ID
Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)
Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)
Reply ID (if the post is a reply then this feature gives the ID of the tweet that is replied by the current post)
Frequency of tweet occurrences which means the number of times the current tweet is repeated in the dataset (for example the number of times that a tweet exists in the dataset in the form of retweet posted by others)
State of the tweet which can be one of the following forms (achieved by an agreement between the annotators):
r : The tweet/retweet is a fake news post
a : The tweet/retweet is a truth post
q : The tweet/retweet is a question about the fake news, however neither confirm nor deny it
n : The tweet/retweet is not related to the fake news (even though it contains the queries related to the rumor, but does not refer to the given fake news)
DG
DG for each fake news contains two files:
A file in graph format (.graph) which includes the information of graph such as who is linked to whom. (This file named FNx_DG.graph, where x is the number of fake news)
A file in Jsonl format (.jsonl) which includes the real user IDs of nodes in the graph file. (This file named FNx_Labels.jsonl, where x is the number of fake news)
Because in the graph file, the label of each node is the number of its entrance in the graph. For example if node with user ID 12345637 be the first node which has been entered into the graph file then its label in the graph is 0 and its real ID (12345637) would be at the row number 1 (because the row number 0 belongs to column labels) in the jsonl file and so on other node IDs would be at the next rows of the file (each row corresponds to 1 user id). Therefore, if we want to know for example what the user id of node 200 (labeled 200 in the graph) is, then in jsonl file we should look at row number 202.
The user IDs of spreaders in DG (those who have had a post in DD) would be available in DD to get extra information about them and their tweet/retweet. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
The data sets below provide selected information extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL).
This dataset contains a list of sales and movement data by item and department appended monthly. Update Frequency : Monthly
The 2003 Agriculture Sample Census was designed to meet the data needs of a wide range of users down to district level including policy makers at local, regional and national levels, rural development agencies, funding institutions, researchers, NGOs, farmer organisations, etc. As a result the dataset is both more numerous in its sample and detailed in its scope compared to previous censuses and surveys. To date this is the most detailed Agricultural Census carried out in Africa.
The census was carried out in order to: · Identify structural changes if any, in the size of farm household holdings, crop and livestock production, farm input and implement use. It also seeks to determine if there are any improvements in rural infrastructure and in the level of agriculture household living conditions; · Provide benchmark data on productivity, production and agricultural practices in relation to policies and interventions promoted by the Ministry of Agriculture and Food Security and other stake holders. · Establish baseline data for the measurement of the impact of high level objectives of the Agriculture Sector Development Programme (ASDP), National Strategy for Growth and Reduction of Poverty (NSGRP) and other rural development programs and projects. · Obtain benchmark data that will be used to address specific issues such as: food security, rural poverty, gender, agro-processing, marketing, service delivery, etc.
Tanzania Mainland and Zanzibar
Large scale, small scale and community farms.
Census/enumeration data [cen]
The Mainland sample consisted of 3,221 villages. These villages were drawn from the National Master Sample (NMS) developed by the National Bureau of Statistics (NBS) to serve as a national framework for the conduct of household based surveys in the country. The National Master Sample was developed from the 2002 Population and Housing Census. The total Mainland sample was 48,315 agricultural households. In Zanzibar a total of 317 enumeration areas (EAs) were selected and 4,755 agriculture households were covered. Nationwide, all regions and districts were sampled with the exception of three urban districts (two from Mainland and one from Zanzibar).
In both Mainland and Zanzibar, a stratified two stage sample was used. The number of villages/EAs selected for the first stage was based on a probability proportional to the number of villages in each district. In the second stage, 15 households were selected from a list of farming households in each selected Village/EA, using systematic random sampling, with the village chairpersons assisting to locate the selected households.
Face-to-face [f2f]
The census covered agriculture in detail as well as many other aspects of rural development and was conducted using three different questionnaires: • Small scale questionnaire • Community level questionnaire • Large scale farm questionnaire
The small scale farm questionnaire was the main census instrument and it includes questions related to crop and livestock production and practices; population demographics; access to services, resources and infrastructure; and issues on poverty, gender and subsistence versus profit making production unit.
The community level questionnaire was designed to collect village level data such as access and use of common resources, community tree plantation and seasonal farm gate prices.
The large scale farm questionnaire was administered to large farms either privately or corporately managed.
Questionnaire Design The questionnaires were designed following user meetings to ensure that the questions asked were in line with users data needs. Several features were incorporated into the design of the questionnaires to increase the accuracy of the data: • Where feasible all variables were extensively coded to reduce post enumeration coding error. • The definitions for each section were printed on the opposite page so that the enumerator could easily refer to the instructions whilst interviewing the farmer. • The responses to all questions were placed in boxes printed on the questionnaire, with one box per character. This feature made it possible to use scanning and Intelligent Character Recognition (ICR) technologies for data entry. • Skip patterns were used to reduce unnecessary and incorrect coding of sections which do not apply to the respondent. • Each section was clearly numbered, which facilitated the use of skip patterns and provided a reference for data type coding for the programming of CSPro, SPSS and the dissemination applications.
Data processing consisted of the following processes: · Data entry · Data structure formatting · Batch validation · Tabulation
Data Entry Scanning and ICR data capture technology for the small holder questionnaire were used on the Mainland. This not only increased the speed of data entry, it also increased the accuracy due to the reduction of keystroke errors. Interactive validation routines were incorporated into the ICR software to track errors during the verification process. The scanning operation was so successful that it is highly recommended for adoption in future censuses/surveys. In Zanzibar all data was entered manually using CSPro.
Prior to scanning, all questionnaires underwent a manual cleaning exercise. This involved checking that the questionnaire had a full set of pages, correct identification and good handwriting. A score was given to each questionnaire based on the legibility and the completeness of enumeration. This score will be used to assess the quality of enumeration and supervision in order to select the best field staff for future censuses/surveys.
CSPro was used for data entry of all Large Scale Farm and community based questionnaires due to the relatively small number of questionnaires. It was also used to enter data from the 2,880 small holder questionnaires that were rejected by the ICR extraction application.
Data Structure Formatting A program was developed in visual basic to automatically alter the structure of the output from the scanning/extraction process in order to harmonise it with the manually entered data. The program automatically checked and changed the number of digits for each variable, the record type code, the number of questionnaires in the village, the consistency of the Village ID Code and saved the data of one village in a file named after the village code.
Batch Validation A batch validation program was developed in order to identify inconsistencies within a questionnaire. This is in addition to the interactive validation during the ICR extraction process. The procedures varied from simple range checking within each variable to the more complex checking between variables. It took six months to screen, edit and validate the data from the smallholder questionnaires. After the long process of data cleaning, tabulations were prepared based on a pre-designed tabulation plan.
Tabulations Statistical Package for Social Sciences (SPSS) was used to produce the Census tabulations and Microsoft Excel was used to organize the tables and compute additional indicators. Excel was also used to produce charts while ArcView and Freehand were used for the maps.
Analysis and Report Preparation The analysis in this report focuses on regional comparisons, time series and national production estimates. Microsoft Excel was used to produce charts; ArcView and Freehand were used for maps, whereas Microsoft Word was used to compile the report.
Data Quality A great deal of emphasis was placed on data quality throughout the whole exercise from planning, questionnaire design, training, supervision, data entry, validation and cleaning/editing. As a result of this, it is believed that the census is highly accurate and representative of what was experienced at field level during the Census year. With very few exceptions, the variables in the questionnaire are within the norms for Tanzania and they follow expected time series trends when compared to historical data. Standard Errors and Coefficients of Variation for the main variables are presented in the Technical Report (Volume I).
The Sampling Error found on page (21) up to page (22) in the Technical Report for Agriculture Sample Census Survey 2002-2003
This component contains the data and syntax code used to conduct the Exploratory Factor Analysis and compute Velicer’s minimum average partial test in sample 1
ODC Public Domain Dedication and Licence (PDDL) v1.0http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
ISO 3166-1-alpha-2 English country names and code elements. This list states the country names (official short names in English) in alphabetical order as given in ISO 3166-1 and the corresponding ISO 3166-1-alpha-2 code elements.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
John Ioannidis and co-authors [1] created a publicly available database of top-cited scientists in the world. This database, intended to address the misuse of citation metrics, has generated a lot of interest among the scientific community, institutions, and media. Many institutions used this as a yardstick to assess the quality of researchers. At the same time, some people look at this list with skepticism citing problems with the methodology used. Two separate databases are created based on career-long and, single recent year impact. This database is created using Scopus data from Elsevier[1-3]. The Scientists included in this database are classified into 22 scientific fields and 174 sub-fields. The parameters considered for this analysis are total citations from 1996 to 2022 (nc9622), h index in 2022 (h22), c-score, and world rank based on c-score (Rank ns). Citations without self-cites are considered in all cases (indicated as ns). In the case of a single-year case, citations during 2022 (nc2222) instead of Nc9622 are considered.
To evaluate the robustness of c-score-based ranking, I have done a detailed analysis of the matrix parameters of the last 25 years (1998-2022) of Nobel laureates of Physics, chemistry, and medicine, and compared them with the top 100 rank holders in the list. The latest career-long and single-year-based databases (2022) were used for this analysis. The details of the analysis are presented below:
Though the article says the selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field, the actual career-based ranking list has 204644 names[1]. The single-year database contains 210199 names. So, the list published contains ~ the top 4% of scientists. In the career-based rank list, for the person with the lowest rank of 4809825, the nc9622, h22, and c-score were 41, 3, and 1.3632, respectively. Whereas for the person with the No.1 rank in the list, the nc9622, h22, and c-score were 345061, 264, and 5.5927, respectively. Three people on the list had less than 100 citations during 96-2022, 1155 people had an h22 less than 10, and 6 people had a C-score less than 2.
In the single year-based rank list, for the person with the lowest rank (6547764), the nc2222, h22, and c-score were 1, 1, and 0. 6, respectively. Whereas for the person with the No.1 rank, the nc9622, h22, and c-score were 34582, 68, and 5.3368, respectively. 4463 people on the list had less than 100 citations in 2022, 71512 people had an h22 less than 10, and 313 people had a C-score less than 2. The entry of many authors having single digit H index and a very meager total number of citations indicates serious shortcomings of the c-score-based ranking methodology. These results indicate shortcomings in the ranking methodology.
https://brightdata.com/licensehttps://brightdata.com/license
The Myntra Products Dataset serves as a comprehensive resource empowering businesses, researchers, and analysts to gain a comprehensive understanding of the Myntra fashion and lifestyle platform. Whether your aim is to conduct market analysis, refine pricing strategies, decipher customer behavior, or evaluate competitors, this dataset provides indispensable information to drive informed decision-making and excel in the dynamic realm of Myntra. At its foundation, this dataset includes crucial attributes such as product ID, title, ratings, reviews, pricing details, and seller information, among others. These fundamental data elements offer insights into product performance, customer sentiment, and seller reliability, enabling a thorough examination of Myntra's fashion landscape.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Excel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the column in the date set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk , D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee , and M. Plocher. Plant reproduction is altered by simulated herbicide drift toconstructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).