Facebook
TwitterSpatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip") and Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allow for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note, the PAD-US inventory is now considered functionally complete with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data organization for the figures in the document: Figure 3A LineOutWithSun_SSAzi_135to225_green_Correct_ROI5_INFO.xls Figure 3b LineOutWithSun_SSAzi_m45to45_green_Correct_ROI5_INFO.xls Figure 4 fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Correct_ROI5_INFO.xls fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Sim_Correct_ROI5_INFO.xls Figure 5a LineOut_Camera_Elevation_SqAzi_m180to0_green_Sim_Correct_ROI5_INFO.xls LineOut_Camera_Elevation_SqAzi_m180to0_green_Correct_ROI5_INFO.xls Figure 5b LineOut_Camera_Elevation_SqAzi_0to180_green_Correct_ROI5_INFO.xls LineOut_Camera_Elevation_SqAzi_0to180_green_Sim_Correct_ROI5_INFO.xls Figure 6a LineOutColor_SqAzi_m180to0_CP_20to50_Correct_ROI5_INFO.xls Figure 6b LineOutROI_SqAzi_m180to0_CP_20to50_green_Correct_INFO.xls Figure 7 fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Correct_ROI5_INFO.xls LineOut_MeshAoPDif_Camera_Elevation_SqAzi_0to180_green_Correct_ROI5_INFO.xls LineOut_MeshAoPDif_Camera_Elevation_SqAzi_m180to0_green_Correct_ROI5_INFO.xls
Facebook
TwitterAttribution-NoDerivs 4.0 (CC BY-ND 4.0)https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
I used Excel to analyze bank loan data, focusing on regional borrowing trends. The project included cleaning and organizing data, creating visualizations, and identifying areas with high loan activity. The findings helped in understanding loan patterns and making better decisions.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
PROJECT OBJECTIVE
We are a part of XYZ Co Pvt Ltd company who is in the business of organizing the sports events at international level. Countries nominate sportsmen from different departments and our team has been given the responsibility to systematize the membership roster and generate different reports as per business requirements.
Questions (KPIs)
TASK 1: STANDARDIZING THE DATASET
TASK 2: DATA FORMATING
TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1) • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3,with the following details:
TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a SUMMARY table in the worksheet ANALYSIS,starting at cell G4, with the following details:
TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:
Process
Facebook
TwitterThe harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.
----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:
Iraq is considered a leader in household expenditure and income surveys where the first was conducted in 1946 followed by surveys in 1954 and 1961. After the establishment of Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years in (1971/ 1972, 1976, 1979, 1984/ 1985, 1988, 1993, 2002 / 2007). Implementing the cooperation between CSO and WB, Central Statistical Organization (CSO) and Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
The survey has six main objectives. These objectives are:
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
1- Household/family. 2- Individual/person.
The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
Sample survey data [ssd]
----> Design:
Sample size was (25488) household for the whole Iraq, 216 households for each district of 118 districts, 2832 clusters each of which includes 9 households distributed on districts and governorates for rural and urban.
----> Sample frame:
Listing and numbering results of 2009-2010 Population and Housing Survey were adopted in all the governorates including Kurdistan Region as a frame to select households, the sample was selected in two stages: Stage 1: Primary sampling unit (blocks) within each stratum (district) for urban and rural were systematically selected with probability proportional to size to reach 2832 units (cluster). Stage two: 9 households from each primary sampling unit were selected to create a cluster, thus the sample size of total survey clusters was 25488 households distributed on the governorates, 216 households in each district.
----> Sampling Stages:
In each district, the sample was selected in two stages: Stage 1: based on 2010 listing and numbering frame 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown urban and rural and geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames of each stages can be developed based on 2010 building listing and numbering without updating household lists. In some small districts, random selection processes of primary sampling may lead to select less than 24 units therefore a sampling unit is selected more than once , the selection may reach two cluster or more from the same enumeration unit when it is necessary.
Face-to-face [f2f]
----> Preparation:
The questionnaire of 2006 survey was adopted in designing the questionnaire of 2012 survey on which many revisions were made. Two rounds of pre-test were carried out. Revision were made based on the feedback of field work team, World Bank consultants and others, other revisions were made before final version was implemented in a pilot survey in September 2011. After the pilot survey implemented, other revisions were made in based on the challenges and feedbacks emerged during the implementation to implement the final version in the actual survey.
----> Questionnaire Parts:
The questionnaire consists of four parts each with several sections: Part 1: Socio – Economic Data: - Section 1: Household Roster - Section 2: Emigration - Section 3: Food Rations - Section 4: housing - Section 5: education - Section 6: health - Section 7: Physical measurements - Section 8: job seeking and previous job
Part 2: Monthly, Quarterly and Annual Expenditures: - Section 9: Expenditures on Non – Food Commodities and Services (past 30 days). - Section 10 : Expenditures on Non – Food Commodities and Services (past 90 days). - Section 11: Expenditures on Non – Food Commodities and Services (past 12 months). - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days). - Section 12, Table 1: Meals Had Within the Residential Unit. - Section 12, table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.
Part 3: Income and Other Data: - Section 13: Job - Section 14: paid jobs - Section 15: Agriculture, forestry and fishing - Section 16: Household non – agricultural projects - Section 17: Income from ownership and transfers - Section 18: Durable goods - Section 19: Loans, advances and subsidies - Section 20: Shocks and strategy of dealing in the households - Section 21: Time use - Section 22: Justice - Section 23: Satisfaction in life - Section 24: Food consumption during past 7 days
Part 4: Diary of Daily Expenditures: Diary of expenditure is an essential component of this survey. It is left at the household to record all the daily purchases such as expenditures on food and frequent non-food items such as gasoline, newspapers…etc. during 7 days. Two pages were allocated for recording the expenditures of each day, thus the roster will be consists of 14 pages.
----> Raw Data:
Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages: 1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct. 2. Local Supervisor: Checks to make sure that questions has been correctly completed. 3. Statistical analysis: After exporting data files from excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values in addition to auditing some variables. 4. World Bank consultants in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.
----> Harmonized Data:
Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. Number of households refused to response was 305, response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%) while the lowest rates were in Sulaimaniya (92%).
Facebook
TwitterISO is an independent advisory organization that collects information on a community's building-code adoption and enforcement services in order to provide a ranking for insurance companies. ISO assigns a Building Code Effectiveness Classification from 1 to 10 based on the data collected. Class 1 represents exemplary commitment to building-code enforcement.Municipalities with better rankings are lower risk, and their residents' insurance rates can reflect that. The prospect of minimizing catastrophe-related damage and ultimately lowering insurance costs gives communities an incentive to enforce their building codes rigorously.This page provides data for the Insurance Services Organization (ISO) performance measure. This data includes residential and commercial building code enforcement ratings for the City of Tempe.The performance measure dashboard is available at 1.15 Insurance Services Organization (ISO) RatingAdditional InformationSource: Insurance Service Organization RatingContact: Chris ThompsonContact E-Mail: Christopher_Thompson@tempe.govData Source Type: ExcelPreparation Method: Information added to Excel spreadsheet from rating reportPublish Frequency: Every 5 YearsPublish Method: ManualData Dictionary
Facebook
TwitterGSA, the nation's largest public real estate organization, provides workspace for over one million federal workers. These employees, along with government property, are housed in space owned by the federal government and in leased properties including buildings, land, antenna sites, etc. across the country.
Facebook
TwitterThe Ontario government, generates and maintains thousands of datasets. Since 2012, we have shared data with Ontarians via a data catalogue. Open data is data that is shared with the public. Click here to learn more about open data and why Ontario releases it. Ontario’s Open Data Directive states that all data must be open, unless there is good reason for it to remain confidential. Ontario’s Chief Digital and Data Officer also has the authority to make certain datasets available publicly. Datasets listed in the catalogue that are not open will have one of the following labels: If you want to use data you find in the catalogue, that data must have a licence – a set of rules that describes how you can use it. A licence: Most of the data available in the catalogue is released under Ontario’s Open Government Licence. However, each dataset may be shared with the public under other kinds of licences or no licence at all. If a dataset doesn’t have a licence, you don’t have the right to use the data. If you have questions about how you can use a specific dataset, please contact us. The Ontario Data Catalogue endeavors to publish open data in a machine readable format. For machine readable datasets, you can simply retrieve the file you need using the file URL. The Ontario Data Catalogue is built on CKAN, which means the catalogue has the following features you can use when building applications. APIs (Application programming interfaces) let software applications communicate directly with each other. If you are using the catalogue in a software application, you might want to extract data from the catalogue through the catalogue API. Note: All Datastore API requests to the Ontario Data Catalogue must be made server-side. The catalogue's collection of dataset metadata (and dataset files) is searchable through the CKAN API. The Ontario Data Catalogue has more than just CKAN's documented search fields. You can also search these custom fields. You can also use the CKAN API to retrieve metadata about a particular dataset and check for updated files. Read the complete documentation for CKAN's API. Some of the open data in the Ontario Data Catalogue is available through the Datastore API. You can also search and access the machine-readable open data that is available in the catalogue. How to use the API feature: Read the complete documentation for CKAN's Datastore API. The Ontario Data Catalogue contains a record for each dataset that the Government of Ontario possesses. Some of these datasets will be available to you as open data. Others will not be available to you. This is because the Government of Ontario is unable to share data that would break the law or put someone's safety at risk. You can search for a dataset with a word that might describe a dataset or topic. Use words like “taxes” or “hospital locations” to discover what datasets the catalogue contains. You can search for a dataset from 3 spots on the catalogue: the homepage, the dataset search page, or the menu bar available across the catalogue. On the dataset search page, you can also filter your search results. You can select filters on the left hand side of the page to limit your search for datasets with your favourite file format, datasets that are updated weekly, datasets released by a particular organization, or datasets that are released under a specific licence. Go to the dataset search page to see the filters that are available to make your search easier. You can also do a quick search by selecting one of the catalogue’s categories on the homepage. These categories can help you see the types of data we have on key topic areas. When you find the dataset you are looking for, click on it to go to the dataset record. Each dataset record will tell you whether the data is available, and, if so, tell you about the data available. An open dataset might contain several data files. These files might represent different periods of time, different sub-sets of the dataset, different regions, language translations, or other breakdowns. You can select a file and either download it or preview it. Make sure to read the licence agreement to make sure you have permission to use it the way you want. Read more about previewing data. A non-open dataset may be not available for many reasons. Read more about non-open data. Read more about restricted data. Data that is non-open may still be subject to freedom of information requests. The catalogue has tools that enable all users to visualize the data in the catalogue without leaving the catalogue – no additional software needed. Have a look at our walk-through of how to make a chart in the catalogue. Get automatic notifications when datasets are updated. You can choose to get notifications for individual datasets, an organization’s datasets or the full catalogue. You don’t have to provide and personal information – just subscribe to our feeds using any feed reader you like using the corresponding notification web addresses. Copy those addresses and paste them into your reader. Your feed reader will let you know when the catalogue has been updated. The catalogue provides open data in several file formats (e.g., spreadsheets, geospatial data, etc). Learn about each format and how you can access and use the data each file contains. A file that has a list of items and values separated by commas without formatting (e.g. colours, italics, etc.) or extra visual features. This format provides just the data that you would display in a table. XLSX (Excel) files may be converted to CSV so they can be opened in a text editor. How to access the data: Open with any spreadsheet software application (e.g., Open Office Calc, Microsoft Excel) or text editor. Note: This format is considered machine-readable, it can be easily processed and used by a computer. Files that have visual formatting (e.g. bolded headers and colour-coded rows) can be hard for machines to understand, these elements make a file more human-readable and less machine-readable. A file that provides information without formatted text or extra visual features that may not follow a pattern of separated values like a CSV. How to access the data: Open with any word processor or text editor available on your device (e.g., Microsoft Word, Notepad). A spreadsheet file that may also include charts, graphs, and formatting. How to access the data: Open with a spreadsheet software application that supports this format (e.g., Open Office Calc, Microsoft Excel). Data can be converted to a CSV for a non-proprietary format of the same data without formatted text or extra visual features. A shapefile provides geographic information that can be used to create a map or perform geospatial analysis based on location, points/lines and other data about the shape and features of the area. It includes required files (.shp, .shx, .dbt) and might include corresponding files (e.g., .prj). How to access the data: Open with a geographic information system (GIS) software program (e.g., QGIS). A package of files and folders. The package can contain any number of different file types. How to access the data: Open with an unzipping software application (e.g., WinZIP, 7Zip). Note: If a ZIP file contains .shp, .shx, and .dbt file types, it is an ArcGIS ZIP: a package of shapefiles which provide information to create maps or perform geospatial analysis that can be opened with ArcGIS (a geographic information system software program). A file that provides information related to a geographic area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open using a GIS software application to create a map or do geospatial analysis. It can also be opened with a text editor to view raw information. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format for sharing data in a machine-readable way that can store data with more unconventional structures such as complex lists. How to access the data: Open with any text editor (e.g., Notepad) or access through a browser. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format to store and organize data in a machine-readable way that can store data with more unconventional structures (not just data organized in tables). How to access the data: Open with any text editor (e.g., Notepad). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A file that provides information related to an area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open with a geospatial software application that supports the KML format (e.g., Google Earth). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. This format contains files with data from tables used for statistical analysis and data visualization of Statistics Canada census data. How to access the data: Open with the Beyond 20/20 application. A database which links and combines data from different files or applications (including HTML, XML, Excel, etc.). The database file can be converted to a CSV/TXT to make the data machine-readable, but human-readable formatting will be lost. How to access the data: Open with Microsoft Office Access (a database management system used to develop application software). A file that keeps the original layout and
Facebook
TwitterData Collecting
Historical Data on the Global Coffee Trade https://www.ico.org/new_historical.asp
International Coffee Organization(ICO) provides data on the coffee industry over 30 years between 1990-2019. I select 9 .xlsx files: production, domestic consumption, gross opening stocks, exports, imports, re-exports, price to growers, retail price, and consumption recorded for different countries and years. 5 of them --production, domestic consumption, gross opening stocks, exports, and price to growers-- includes data from the exporting countries and 4 of them have data from importing countries.
Data Cleaning
Each excel file should be a feature in the data frame in pandas. Our goal is to combine the tables into one, using 'pd.melt'and 'pd.merge'. At the same time, we deal with missing values and redundant values. Also, messy string data is cleaned.
Structure: assign a proper data type Quality: drop empty rows, unified format, remove unnessarily aggregated rows
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Users can find data on a range of global health topics like mortality, the burden of disease, infectious diseases, risk factors and health expenditures. Background The Global Health Observatory (GHO) database is the World Health Organization's main health statistics repository. Data is available for 193 World Health Organization member states on topics including but not limited to: Health related millennium goals, mortality, immunization, nutrition, infectious disease, non- communicable disease, tobacco control, violence, injuries, alcohol, HIV/AIDS, tuberculosis, malaria, water and sanitation, maternal and reproductive health, cho lera, child health, child nutrition, and road safety. User FunctionalityUsers can generate tables and charts according to country or region, health indicator, and time period. Data can also be compared across countries. Data can be filtered, tabulated, charted, and downloaded into Excel statistical software. These data are also published in statistical reports covering topics including: Alcohol and health, Child health, Cholera, HIV/AIDS, Malaria, Maternal and reproductive heal th, Non-communicable diseases, Public health and environment, Road safety, Tuberculosis, Tobacco control. Data Notes Data are derived from surveillance and household surveys. Years in which data were collected is indicated with these health statistics. Information is available for each WHO member country and international region. The most recent data is available from 2009.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General description
This dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions.
Is the research article an Open Access publication?
Does the research article have a Creative Common license or a similar license?
Does the research article contain a data availability statement?
Did the authors submit data of their study to a repository such as EMBL, Genbank, Protein Data Bank PDB, Cambridge Crystallographic Data Centre CCDC, Dryad or a similar repository?
Does the research article contain supplementary data?
Do the supplementary data have a persistent identifier that makes them citable as a defined research output?
Variables
The data were compiled in a Microsoft Excel 365 document that includes the following variables.
DOI URL of research article
Year of publication
Research article published with Open Access
License for research article
Data availability statement in article
Supplementary data added to article
Persistent identifier for supplementary data
Authors submitted data to NCBI or EMBL or PDB or Dryad or CCDC
Visualization
Parts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications during a year, the number of publications that is published with open access and the number of publications that contain a data availability statement (Figure 1). The second figure shows the number of publication sper year and how many publications contain supplementary data. This figure also shows how many of the supplementary datasets have a persistent identifier (Figure 2).
File formats and software
The file formats used in this dataset are:
.csv (Text file) .docx (Microsoft Word 365 file) .jpg (JPEG image file) .pdf/A (Portable Document Format for archiving) .png (Portable Network Graphics image file) .pptx (Microsoft Power Point 365 file) .txt (Text file) .xlsx (Microsoft Excel 365 file)
All files can be opened with Microsoft Office 365 and work likely also with the older versions Office 2019 and 2016.
MD5 checksums
Here is a list of all files of this dataset and of their MD5 checksums.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The GAPs Data Repository provides a comprehensive overview of available qualitative and quantitative data on national return regimes, now accessible through an advanced web interface at https://data.returnmigration.eu/.
This updated guideline outlines the complete process, starting from the initial data collection for the return migration data repository to the development of a comprehensive web-based platform. Through iterative development, participatory approaches, and rigorous quality checks, we have ensured a systematic representation of return migration data at both national and comparative levels.
The Repository organizes data into five main categories, covering diverse aspects and offering a holistic view of return regimes: country profiles, legislation, infrastructure, international cooperation, and descriptive statistics. These categories, further divided into subcategories, are based on insights from a literature review, existing datasets, and empirical data collection from 14 countries. The selection of categories prioritizes relevance for understanding return and readmission policies and practices, data accessibility, reliability, clarity, and comparability. Raw data is meticulously collected by the national experts.
The transition to a web-based interface builds upon the Repository’s original structure, which was initially developed using REDCap (Research Electronic Data Capture). It is a secure web application for building and managing online surveys and databases.The REDCAP ensures systematic data entries and store them on Uppsala University’s servers while significantly improving accessibility and usability as well as data security. It also enables users to export any or all data from the Project when granted full data export privileges. Data can be exported in various ways and formats, including Microsoft Excel, SAS, Stata, R, or SPSS for analysis. At this stage, the Data Repository design team also converted tailored records of available data into public reports accessible to anyone with a unique URL, without the need to log in to REDCap or obtain permission to access the GAPs Project Data Repository. Public reports can be used to share information with stakeholders or external partners without granting them access to the Project or requiring them to set up a personal account. Currently, all public report links inserted in this report are also available on the Repository’s webpage, allowing users to export original data.
This report also includes a detailed codebook to help users understand the structure, variables, and methodologies used in data collection and organization. This addition ensures transparency and provides a comprehensive framework for researchers and practitioners to effectively interpret the data.
The GAPs Data Repository is committed to providing accessible, well-organized, and reliable data by moving to a centralized web platform and incorporating advanced visuals. This Repository aims to contribute inputs for research, policy analysis, and evidence-based decision-making in the return and readmission field.
Explore the GAPs Data Repository at https://data.returnmigration.eu/.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
**Introduction ** This case study will be based on Cyclistic, a bike sharing company in Chicago. I will perform tasks of a junior data analyst to answer business questions. I will do this by following a process that includes the following phases: ask, prepare, process, analyze, share and act.
Background Cyclistic is a bike sharing company that operates 5828 bikes within 692 docking stations. The company has been around since 2016 and separates itself from the competition due to the fact that they offer a variety of bike services including assistive options. Lily Moreno is the director of the marketing team and will be the person to receive these insights from this analysis.
Case Study and business task Lily Morenos perspective on how to generate more income by marketing Cyclistics services correctly includes converting casual riders (one day passes and/or pay per ride customers) into annual riders with a membership. Annual riders are more profitable than casual riders according to the finance analysts. She would rather see a campaign targeting casual riders into annual riders, instead of launching campaigns targeting new costumers. So her strategy as the manager of the marketing team is simply to maximize the amount of annual riders by converting casual riders.
In order to make a data driven decision, Moreno needs the following insights: - A better understanding of how casual riders and annual riders differ - Why would a casual rider become an annual one - How digital media can affect the marketing tactics
Moreno has directed me to the first question - how do casual riders and annual riders differ?
Stakeholders Lily Moreno, manager of the marketing team Cyclistic Marketing team Executive team
Data sources and organization Data used in this report is made available and is licensed by Motivate International Inc. Personal data is hidden to protect personal information. Data used is from the past 12 months (01/04/2021 – 31/03/2022) of bike share dataset.
By merging all 12 monthly bike share data provided, an extensive amount of data with 5,400,000 rows were returned and included in this analysis.
Data security and limitations: Personal information is secured and hidden to prevent unlawful use. Original files are backed up in folders and subfolders.
Tools and documentation of cleaning process The tools used for data verification and data cleaning are Microsoft Excel and R programming. The original files made accessible by Motivate International Inc. are backed up in their original format and in separate files.
Microsoft Excel is used to generally look through the dataset and get a overview of the content. I performed simple checks of the data by filtering, sorting, formatting and standardizing the data to make it easily mergeable.. In Excel, I also changed data type to have the right format, removed unnecessary data if its incomplete or incorrect, created new columns to subtract and reformat existing columns and deleting empty cells. These tasks are easily done in spreadsheets and provides an initial cleaning process of the data.
R will be used to perform queries of bigger datasets such as this one. R will also be used to create visualizations to answer the question at hand.
Limitations Microsoft Excel has a limitation of 1,048,576 rows while the data of the 12 months combined are over 5,500,000 rows. When combining the 12 months of data into one table/sheet, Excel is no longer efficient and I switched over to R programming.
Facebook
TwitterAI Generated Summary: The Ontario Data Catalogue is a data portal providing access to open datasets generated and maintained by the Ontario government. It allows users to search, access, visualize, and download data in various machine-readable formats, often through APIs, while also indicating licensing terms and data update frequencies. The catalogue also provides tools for data visualization and notifications for dataset updates. About: The Ontario government generates and maintains thousands of datasets. Since 2012, we have shared data with Ontarians via a data catalogue. Open data is data that is shared with the public. Click here to learn more about open data and why Ontario releases it. Ontario’s Digital and Data Directive states that all data must be open, unless there is good reason for it to remain confidential. Ontario’s Chief Digital and Data Officer also has the authority to make certain datasets available publicly. Datasets listed in the catalogue that are not open will have one of the following labels: If you want to use data you find in the catalogue, that data must have a licence – a set of rules that describes how you can use it. A licence: Most of the data available in the catalogue is released under Ontario’s Open Government Licence. However, each dataset may be shared with the public under other kinds of licences or no licence at all. If a dataset doesn’t have a licence, you don’t have the right to use the data. If you have questions about how you can use a specific dataset, please contact us. The Ontario Data Catalogue endeavors to publish open data in a machine readable format. For machine readable datasets, you can simply retrieve the file you need using the file URL. The Ontario Data Catalogue is built on CKAN, which means the catalogue has the following features you can use when building applications. APIs (Application programming interfaces) let software applications communicate directly with each other. If you are using the catalogue in a software application, you might want to extract data from the catalogue through the catalogue API. Note: All Datastore API requests to the Ontario Data Catalogue must be made server-side. The catalogue's collection of dataset metadata (and dataset files) is searchable through the CKAN API. The Ontario Data Catalogue has more than just CKAN's documented search fields. You can also search these custom fields. You can also use the CKAN API to retrieve metadata about a particular dataset and check for updated files. Read the complete documentation for CKAN's API. Some of the open data in the Ontario Data Catalogue is available through the Datastore API. You can also search and access the machine-readable open data that is available in the catalogue. How to use the API feature: Read the complete documentation for CKAN's Datastore API. The Ontario Data Catalogue contains a record for each dataset that the Government of Ontario possesses. Some of these datasets will be available to you as open data. Others will not be available to you. This is because the Government of Ontario is unable to share data that would break the law or put someone's safety at risk. You can search for a dataset with a word that might describe a dataset or topic. Use words like “taxes” or “hospital locations” to discover what datasets the catalogue contains. You can search for a dataset from 3 spots on the catalogue: the homepage, the dataset search page, or the menu bar available across the catalogue. On the dataset search page, you can also filter your search results. You can select filters on the left hand side of the page to limit your search for datasets with your favourite file format, datasets that are updated weekly, datasets released by a particular ministry, or datasets that are released under a specific licence. Go to the dataset search page to see the filters that are available to make your search easier. You can also do a quick search by selecting one of the catalogue’s categories on the homepage. These categories can help you see the types of data we have on key topic areas. When you find the dataset you are looking for, click on it to go to the dataset record. Each dataset record will tell you whether the data is available, and, if so, tell you about the data available. An open dataset might contain several data files. These files might represent different periods of time, different sub-sets of the dataset, different regions, language translations, or other breakdowns. You can select a file and either download it or preview it. Make sure to read the licence agreement to make sure you have permission to use it the way you want. A non-open dataset may be not available for many reasons. Read more about non-open data. Read more about restricted data. Data that is non-open may still be subject to freedom of information requests. The catalogue has tools that enable all users to visualize the data in the catalogue without leaving the catalogue – no additional software needed. Get automatic notifications when datasets are updated. You can choose to get notifications for individual datasets, an organization’s datasets or the full catalogue. You don’t have to provide and personal information – just subscribe to our feeds using any feed reader you like using the corresponding notification web addresses. Copy those addresses and paste them into your reader. Your feed reader will let you know when the catalogue has been updated. The catalogue provides open data in several file formats (e.g., spreadsheets, geospatial data, etc). Learn about each format and how you can access and use the data each file contains. A file that has a list of items and values separated by commas without formatting (e.g. colours, italics, etc.) or extra visual features. This format provides just the data that you would display in a table. XLSX (Excel) files may be converted to CSV so they can be opened in a text editor. How to access the data: Open with any spreadsheet software application (e.g., Open Office Calc, Microsoft Excel) or text editor. Note: This format is considered machine-readable, it can be easily processed and used by a computer. Files that have visual formatting (e.g. bolded headers and colour-coded rows) can be hard for machines to understand, these elements make a file more human-readable and less machine-readable. A file that provides information without formatted text or extra visual features that may not follow a pattern of separated values like a CSV. How to access the data: Open with any word processor or text editor available on your device (e.g., Microsoft Word, Notepad). A spreadsheet file that may also include charts, graphs, and formatting. How to access the data: Open with a spreadsheet software application that supports this format (e.g., Open Office Calc, Microsoft Excel). Data can be converted to a CSV for a non-proprietary format of the same data without formatted text or extra visual features. A shapefile provides geographic information that can be used to create a map or perform geospatial analysis based on location, points/lines and other data about the shape and features of the area. It includes required files (.shp, .shx, .dbt) and might include corresponding files (e.g., .prj). How to access the data: Open with a geographic information system (GIS) software program (e.g., QGIS). A package of files and folders. The package can contain any number of different file types. How to access the data: Open with an unzipping software application (e.g., WinZIP, 7Zip). Note: If a ZIP file contains .shp, .shx, and .dbt file types, it is an ArcGIS ZIP: a package of shapefiles which provide information to create maps or perform geospatial analysis that can be opened with ArcGIS (a geographic information system software program). A file that provides information related to a geographic area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open using a GIS software application to create a map or do geospatial analysis. It can also be opened with a text editor to view raw information. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format for sharing data in a machine-readable way that can store data with more unconventional structures such as complex lists. How to access the data: Open with any text editor (e.g., Notepad) or access through a browser. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format to store and organize data in a machine-readable way that can store data with more unconventional structures (not just data organized in tables). How to access the data: Open with any text editor (e.g., Notepad). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A file that provides information related to an area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open with a geospatial software application that supports the KML format (e.g., Google Earth). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. This format contains files with data from tables used for statistical analysis and data visualization of Statistics Canada census data. How to access the data: Open with the Beyond 20/20 application. A database which links and combines data from different files or
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Macros for organizing raw data from LabView software that logs timestamps for every rotation of a flight mill arms and then summarizes the time of individual flight bouts.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides an in-depth look at the League of Legends Champions Korea (LCK) Spring 2024 season. It includes detailed metrics for players, champions, and matches, meticulously cleaned and organized for easy analysis and modeling.
The data was collected using a combination of manual efforts and automated web scraping tools. Specifically:
Source: Data was gathered from Gol.gg, a well-known platform for League of Legends statistics. Automation: Web scraping was performed using Python libraries like BeautifulSoup and Selenium to extract information on players, matches, and champions efficiently. Focus: The scripts were designed to capture relevant performance metrics for each player and champion used during the Spring 2024 split.
The raw data obtained from web scraping required significant preprocessing to ensure its usability. The following steps were taken:
Extracted key performance indicators like KDA, Win Rate, Games Played, and Match Durations from the source. Normalized inconsistent formats for metrics such as win rates (e.g., removing %) and durations (e.g., converting MM:SS to total seconds).
Removed duplicate rows and ensured no missing values. Fixed inconsistencies in player and champion names to maintain uniformity. Checked for outliers in numerical metrics (e.g., unrealistically high KDA values).
Created three separate tables for better data management:
Player Statistics: General player performance metrics like KDA, win rates, and average kills. Champion Statistics: Data on games played, win rates, and KDA for each champion. Match List: Details of each match, including players, champions, and results. Added sequential Player IDs to connect the three datasets, facilitating relational analysis. Date Formatting: Converted all date fields to the DD/MM/YYYY format for consistency. Removed irrelevant time data to focus solely on match dates.
The following tools were used throughout the project:
Python: Libraries: Pandas, NumPy for data manipulation; BeautifulSoup, Selenium for web scraping. Visualization: Matplotlib, Seaborn, Plotly for potential analysis. Excel: Consolidated final datasets into a structured Excel file with multiple sheets. Data Validation: Used Python scripts to check for missing data, validate numerical columns, and ensure data consistency. Kaggle Integration: Cleaned datasets and a comprehensive README file were prepared for direct upload to Kaggle.
This dataset is ready for use in: Exploratory Data Analysis (EDA): Visualize player and champion performance trends across matches. Machine Learning: Develop models to predict match outcomes based on player and champion statistics. Sports Analytics: Gain insights into champion picks, win rates, and individual player strategies.
This dataset was made possible by the extensive statistics available on Gol.gg and the use of Python-based web scraping and data cleaning methodologies. It is shared under the CC BY 4.0 License to encourage reuse and collaboration.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Facebook
TwitterThe documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.
The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
Sample survey data [ssd]
The sample for Slovenia ES 2009, 2013, 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for Slovenia 2009 ES and for Slovenia 2013 ES, and in the Sampling Note for 2019 Slovenia ES.
Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.
For the Slovenia 2009 ES, industry stratification was designed in the way that follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries sample sizes were inflated by about 17% to account for potential non-response cases when requesting sensitive financial data and also because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals) sample sizes were inflated by about 12% to account for under sampling in firms in service industries.
For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).
Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).
For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.
For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.
Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).
Computer Assisted Personal Interview [capi]
Questionnaires have common questions (core module) and respectfully additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.
For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in the Slovenia may be selection bias and not frame inaccuracy.
For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.
Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Content and data sourceThis dataset contains the results of a manual analysis of Open Science markers in the publications of the Swedish Metabolomics Centre (SMC) between 2016 and 2024. It contains similar variables as the data of the "Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data" (Kieselbach, 2023).
The sample of these publications was fetched from SciLifeLab on 5 May 2025 at the URL: https://publications.scilifelab.se/label/Swedish Metabolomics Centre (SMC)
It contains 285 articles that are the source data for the work to create this dataset. Every publication was manually visited at its DOI URL and checked for 23 variables.
Questions studiedSome of the questions that were addressed in the collection of the data are:
Does the article have an open license and what kind of license does it have?
Does the article contain research data that may have restricted access such as personal data and health data?
Does the article contain a data availability statement?
Does the article contain supplementary material that the authors added to it?
Does the supplementary material contain research data?
Does the supplementary material contain metabolomics data such as, for instance, summaries and visualizations?
Did the authors submit metabolomics data to MetaboLights at the EBI or to other repsoitories?
Did the authors submit other data to other repositories?
Is data available on request from the authors?
Visualization of dataThe data was compiled and visualized using Microsoft Excel 365. The visualization includes one table that gives a general overview of the dataset, and four figures that show some results of the analysis.
Figure 1. Percentage of publications between 2016 and 2024 with an Open Access License and with a data availability statement.
Figure 2. Submissions to repositories between 2016 and 2024.
Figure 3. Percentage of publications that contained supplementary material and if this supplementary material contained research data and metabolomics data.
Figure 4. Repositories used by the authors between 2016 and 2024.
List of variables1. Year of Publication (answer: year)
Date of Publication (answer: date)
DOI (answer: DOI)
DOI URL (answer: DOI URL)
Research article (answer: Yes or No)
Access to article without paywall (answer: Yes or No)
License for research article (answer: Name of the license or No)
Data with restricted access (answer: Yes or No)
Data availability statement in article (answer: Yes or No)
Supplementary material added to article (answer: Yes or No)
Access to supplementary material without paywall (answer: Yes or No)
Supplementary material contains research data (answer: Yes or No)
Supplementary data contains metabolomics data (answer: Yes or No)
Persistent identifier for supplementary data (answer: Yes or No)
Source data added to the article (answer: Yes or No)
Source data contain metabolomics data (answer: Yes or No)
Authors submitted metabolomics data to MetaboLights (answer: Yes or No)
Authors submitted metabolomics data to another repository (answer: name of the repository or No)
Authors submitted other data to a repository (answer: name of the repository or No)
Authors submitted other data to a second repository (answer: name of the repository or No)
Authors submitted other data to a third repository (answer: name of the repository or No)
Authors submitted code to a repository (answer: name of the repository or No)
Data available on request from the authors (answer: Yes or No)
Variables that are available in the source data1. Title of article
Authors
Journal
Year
(Date) Published
(Date) E-published
Volume
Issue
Pages
DOI
PMID
Labels
Qualifiers
IUID
URL
DOI URL of research article
PubMed URL of research article
File formats and softwareThe file formats used in this dataset are:
.csv (Text file)
.jpg (JPEG image file)
.pdf/A (Portable Document Format for archiving)
.txt (Text file)
.xlsx (Microsoft Excel 365 file)
All files can be opened with Microsoft Office 365.
ReferenceKieselbach, Theresa (2023). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data. Umeå University. Dataset. https://doi.org/10.17044/scilifelab.23641749.v1
AbbreviationsCC BY 4.0: Creative Commons Attribution 4.0 International Public License
CC BY-NC 4.0: Creative Commons Attribution-NonCommercial 4.0 International Public License
CC BY-NC 3.0: Creative Commons Attribution-NonCommercial 3.0 International Public License
CC BY-NC-ND 4.0: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License
DOI: Digital Object Identifier
EBI: European Bioinformatics Institute
EBI-ArrayExpress: The ArrayExpress collection of functional genomics data at the EBI
EBI-ENA: European Nucleotide Archive at the EBI
EBI-Pride: Proteomics Identification Database at the EBI
e!DAL: electronic Data Archive Library at the Leibniz Institute for Plant Genetics and Crop Plant Research
IUID: Item Unique identification
LUDC: Lund University Diabetes Centre
LUDC repository: data repository at the Lund University Diabetes Centre
NCBI: National Center for Biotechnology Information
NCBI-GEO: The Gene Expression Omnibus database repository at the NCBI
NCBI-SRA: The Sequence Read Archive at the NCBI
PMID: Pubmed Identifier
URL: Uniform Resource Locator
MD5 Checksums of the filesManifest.txt (2 KB): 89f32a728fb74ebecef0aef4633130b0
README.txt (6 KB): 34ea4ad9cb9bdea54755fa87f2d0b913
Analysis_SMC_publications_2016_2024_Open_Access_publication_and_access_to_data_status_2025_06_24.csv (46 KB): 9719df26381901bc6aabfd34fdbfab81
Analysis_SMC_publications_2016_2024_Open_Access_publication_and_access_to_data_status_2025_06_24.xlsx (49 KB): 1ec95dc29262645240e7d8714967bcfc
Table_1_Overview_SMC_publications_2016_2024_status_2025_06_11.csv (391 Bytes): 1fd723dc6f52f18251d41c0d343a4f0f
Table_1_Overview_SMC_publications_2016_2024_status_2025_06_11.xlsx (9 KB): 38622a9681c6f1057a6e1a4be56b0285
Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.csv (468 Bytes): 9f9156f8d52603ccdec968f626bc002a
Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.jpg (119 KB): dc9a4d7de4c789e8aea46ce66e007301
Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.xlsx (15 KB): 6527d1ebd0069ef3757bd1b049f0fc74
Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.csv (300 Bytes): 5abc4a0fcf776f8dc4745f41deddacbc
Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.jpg (126 KB): e03e5bf4ba2d942c3b022aebb0a59033
Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.xlsx (15 KB): a80f977c051d4798db221b07733c694b
Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.csv (670 Bytes): a694a3defa98aa52fcdec8ff9e9e3316
Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.jpg(153 KB): 3928bdc1f046ca9b6f66bdbcdf936ca8
Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.xlsx (15 KB): 46dfda56b116b571b4bf8e3674b44512
Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.csv (498 Bytes): 8963a412cc9e458ced2e80883bb93e1a
Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.jpg (137 KB): c9ba447225e99431f24732128a754b7e
Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.xlsx (16 KB): 1e2813d3ccb0ee14991b276947c21b8a
Materials_and_methods_SMC_publications_2016_2024.docx (19 KB): 71776ffc1e530e1b40255763403b2f40
Materials_and_methods_SMC_publications_2016_2024.txt (4 KB): 26c4b91b958b9e33d93d13dc52b25da9
Materials_and_methods_SMC_publications_2026_2024.pdf (172 KB): eee564f452ef4f3cf57bb81a6874fcd4
SMC_publications_2016_2024_status_2025_05_05.csv (143 KB): 5e61d09244ca90b1e5b057a7afdfe5e7
SMC_publications_2016_2024_status_2025_05_05.xlsx (106 KB): 6977fbcac21ff5a12763e40de90c0a91
Facebook
TwitterThis dataset was generated from a set of Excel spreadsheets from an Information and Communication Technology Services (ICTS) administrative database on student applications to the University of Cape Town (UCT). This database contains information on applications to UCT between the January 2006 and December 2014. In the original form received by DataFirst the data were ill suited to research purposes. This dataset represents an attempt at cleaning and organizing these data into a more tractable format. To ensure data confidentiality direct identifiers have been removed from the data and the data is only made available to accredited researchers through DataFirst's Secure Data Service.
The dataset was separated into the following data files:
Applications, individuals
Administrative records [adm]
Other [oth]
The data files were made available to DataFirst as a group of Excel spreadsheet documents from an SQL database managed by the University of Cape Town's Information and Communication Technology Services . The process of combining these original data files to create a research-ready dataset is summarised in a document entitled "Notes on preparing the UCT Student Application Data 2006-2014" accompanying the data.
Facebook
TwitterSpatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip") and Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allow for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note, the PAD-US inventory is now considered functionally complete with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.