68 datasets found

Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis...
catalog.data.gov
data.usgs.gov
Updated Oct 22, 2025
Cite
U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-vector-analysis-and-summary-stati
Dataset updated
Oct 22, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
United States
Description
Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. 
Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note that the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data.
While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
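The overlap prioritization described above can be sketched in a few lines of Python. This is only an illustration of the stated ordering (lower GAP Status Code first, then public access in the order Closed, Restricted, Open, Unknown, then geodatabase load order). It is not the released "PADUS3_0_CreateVectorAnalysisFileScript.zip" script, and the field names `gap_status`, `public_access` and `load_order` are hypothetical.

```python
# Hypothetical field names; the released USGS script may differ.
ACCESS_RANK = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}


def priority_key(record):
    """Sort key: GAP status, then public access rank, then load order."""
    return (
        record["gap_status"],
        ACCESS_RANK[record["public_access"]],
        record["load_order"],
    )


def resolve_overlap(records):
    """Return the record that wins where several designations overlap."""
    return min(records, key=priority_key)


overlapping = [
    {"name": "National Forest", "gap_status": 3, "public_access": "Open", "load_order": 1},
    {"name": "Wilderness", "gap_status": 1, "public_access": "Open", "load_order": 2},
]
winner = resolve_overlap(overlapping)
print(winner["name"])
```

Using a tuple as the sort key means each later criterion only breaks ties left by the earlier ones, which matches the ordering given in the description.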
Excel files containing data for Figures
dataverse.tdl.org
xls
Updated Aug 24, 2020
Cite
Parrish Brady; Parrish Brady (2020). Excel files containing data for Figures [Dataset]. http://doi.org/10.18738/T8/EGV2TV
Explore at:
Available download formats: xls(22016), xls(71680), xls(9728), xls(13824), xls(529920), xls(339968), xls(26112), xls(17920), xls(67584)
Unique identifier
https://doi.org/10.18738/T8/EGV2TV
Dataset updated
Aug 24, 2020
Dataset provided by
Texas Data Repository
Authors
Parrish Brady; Parrish Brady
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Data organization for the figures in the document:
Figure 3A: LineOutWithSun_SSAzi_135to225_green_Correct_ROI5_INFO.xls
Figure 3b: LineOutWithSun_SSAzi_m45to45_green_Correct_ROI5_INFO.xls
Figure 4: fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Correct_ROI5_INFO.xls, fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Sim_Correct_ROI5_INFO.xls
Figure 5a: LineOut_Camera_Elevation_SqAzi_m180to0_green_Sim_Correct_ROI5_INFO.xls, LineOut_Camera_Elevation_SqAzi_m180to0_green_Correct_ROI5_INFO.xls
Figure 5b: LineOut_Camera_Elevation_SqAzi_0to180_green_Correct_ROI5_INFO.xls, LineOut_Camera_Elevation_SqAzi_0to180_green_Sim_Correct_ROI5_INFO.xls
Figure 6a: LineOutColor_SqAzi_m180to0_CP_20to50_Correct_ROI5_INFO.xls
Figure 6b: LineOutROI_SqAzi_m180to0_CP_20to50_green_Correct_INFO.xls
Figure 7: fulllinear_inDic_SqAzi_m180to0_CP_20to50_green_Correct_ROI5_INFO.xls, LineOut_MeshAoPDif_Camera_Elevation_SqAzi_0to180_green_Correct_ROI5_INFO.xls, LineOut_MeshAoPDif_Camera_Elevation_SqAzi_m180to0_green_Correct_ROI5_INFO.xls
Bank Loan Analysis Project Using Excel
kaggle.com
zip
Updated Aug 16, 2024
Cite
raka (2024). Bank Loan Analysis Project Using Excel [Dataset]. https://www.kaggle.com/datasets/connectraka/bank-loan-analysis-project-using-excel
Explore at:
Available download formats: zip (41141339 bytes)
Dataset updated
Aug 16, 2024
Authors
raka
License
Attribution-NoDerivs 4.0 (CC BY-ND 4.0) (https://creativecommons.org/licenses/by-nd/4.0/)
License information was derived automatically
Description
I used Excel to analyze bank loan data, focusing on regional borrowing trends. The project included cleaning and organizing data, creating visualizations, and identifying areas with high loan activity. The findings helped in understanding loan patterns and making better decisions.
SPORTS_DATA_ANALYSIS_ON_EXCEL
kaggle.com
zip
Updated Dec 12, 2024
Cite
Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel
Explore at:
Available download formats: zip (1203633 bytes)
Dataset updated
Dec 12, 2024
Authors
Nil kamal Saha
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
PROJECT OBJECTIVE

We are part of XYZ Co Pvt Ltd, a company that organizes sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility to systematize the membership roster and generate different reports as per business requirements.

Questions (KPIs)

TASK 1: STANDARDIZING THE DATASET

Populate the FULLNAME consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME (Note: All UPPERCASE)

Get the COUNTRY NAME to which these sportsmen belong. Make use of the LOCATION sheet to get the required data

Populate the LANGUAGE_SPOKEN by the sportsmen. Make use of the LOCATION sheet to get the required data

Generate the EMAIL ADDRESS for those members who speak English in the prescribed format lastname.firstname@xyz.org (Note: All lowercase), and for all other members use the format lastname.firstname@xyz.com (Note: All lowercase)

Populate the SPORT LOCATION of the sport played by each player. Make use of SPORT sheet to get the required data
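The TASK 1 rules for FULLNAME and EMAIL ADDRESS can be sketched in Python as a cross-check on the Excel formulas. This is an illustrative sketch, not the project's workbook logic. It assumes the English-speaker pattern is lastname.firstname@xyz.org and that every other member gets lastname.firstname@xyz.com. The function names are hypothetical.

```python
def full_name(prefix: str, first_name: str, last_name: str) -> str:
    """Build 'PREFIX FIRSTNAME LASTNAME' in upper case."""
    parts = (prefix, first_name, last_name)
    return " ".join(part.strip() for part in parts).upper()


def email_address(first_name: str, last_name: str, language: str) -> str:
    """English speakers get @xyz.org, everyone else @xyz.com (all lower case)."""
    is_english = language.strip().lower() == "english"
    domain = "xyz.org" if is_english else "xyz.com"
    return f"{last_name.strip()}.{first_name.strip()}@{domain}".lower()


print(full_name("Mr.", "John", "Smith"))
print(email_address("John", "Smith", "English"))
print(email_address("Marie", "Dupont", "French"))
```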

TASK 2: DATA FORMATTING

Display MEMBER ID as always a 3-digit number (Note: 001, 002, ..., 020, ... etc.)

Format the BIRTHDATE as dd mmm'yyyy (Prescribed format example: 09 May' 1986)

Display the units for the WEIGHT column (Prescribed format example: 80 kg)

Format the SALARY to show the data in thousands. If SALARY is less than 100,000, display the data with 2 decimal places; otherwise display it with one decimal place. In both cases the units should be thousands (k), e.g. 87670 -> 87.67 k and 12 250 -> 123.2 k
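The TASK 2 formatting rules translate directly into string formatting. The sketch below is a Python illustration of those rules rather than the Excel number formats used in the project. The function names are hypothetical, and the birthdate follows the dd mmm'yyyy pattern with no space after the apostrophe.

```python
from datetime import date


def format_member_id(member_id: int) -> str:
    """Zero-pad the member ID to three digits."""
    return f"{member_id:03d}"


def format_birthdate(birthdate: date) -> str:
    """Render a date as dd mmm'yyyy."""
    return birthdate.strftime("%d %b'%Y")


def format_weight(weight_kg: int) -> str:
    """Append the unit to the weight."""
    return f"{weight_kg} kg"


def format_salary(salary: float) -> str:
    """Show salary in thousands: 2 decimals below 100,000, otherwise 1."""
    decimals = 2 if salary < 100_000 else 1
    return f"{salary / 1000:.{decimals}f} k"


print(format_member_id(2))
print(format_birthdate(date(1986, 5, 9)))
print(format_weight(80))
print(format_salary(87670))
```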

TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

• Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:

In COLUMNS; Group : GENDER.

In ROWS; Group : COUNTRY (Note: use COUNTRY NAMES).

In VALUES; calculate the count of candidates from each COUNTRY and GENDER type. Remove GRAND TOTALs.
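The pivot described here is a count cross-tabulated by COUNTRY (rows) and GENDER (columns) without grand totals. A minimal Python sketch of the same aggregation, using only the standard library, can help verify the Excel result. The sample rows are made up for illustration and the function name is hypothetical.

```python
from collections import Counter


def cross_tab(rows):
    """Count rows per (COUNTRY, GENDER); returns column labels and a nested dict."""
    counts = Counter((row["COUNTRY"], row["GENDER"]) for row in rows)
    countries = sorted({country for country, _ in counts})
    genders = sorted({gender for _, gender in counts})
    table = {
        country: {gender: counts.get((country, gender), 0) for gender in genders}
        for country in countries
    }
    return genders, table


# Illustrative rows only, not taken from the dataset.
sample = [
    {"COUNTRY": "India", "GENDER": "F"},
    {"COUNTRY": "India", "GENDER": "M"},
    {"COUNTRY": "India", "GENDER": "M"},
    {"COUNTRY": "Kenya", "GENDER": "F"},
]
genders, table = cross_tab(sample)
for country, row in table.items():
    print(country, [row[g] for g in genders])
```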

TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

• Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:

Starting from cell H4; get the distinct GENDER. Use remove duplicates option and transpose the data.

Starting from cell G5; get the distinct COUNTRY (Note: use COUNTRY NAMES).

In the cross table, get the count of candidates from each COUNTRY and GENDER type.

TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

• Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

Change the report layout to TABULAR form.

Remove expand and collapse buttons.

Remove GRAND TOTALs.

Allow user to filter the data by SPORT LOCATION.

Process

Verified the data for missing values and anomalies, and resolved them.

Made sure the data is consistent and clean with respect to data type, data format and values used.

Created pivot tables according to the questions asked.
Household Health Survey 2012-2013, Economic Research Forum (ERF)...
datacatalog.ihsn.org
catalog.ihsn.org
Updated Jun 26, 2017
Cite
Kurdistan Regional Statistics Office (KRSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://datacatalog.ihsn.org/catalog/6937
Dataset updated
Jun 26, 2017
Dataset provided by
Kurdistan Regional Statistics Office (KRSO)
Central Statistical Organization (CSO)
Economic Research Forum
Time period covered
2012 - 2013
Area covered
Iraq
Description
Abstract

The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). Implementing the cooperation between CSO and WB, the Central Statistical Organization (CSO) and Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

The survey has six main objectives. These objectives are:

Provide data for poverty analysis and measurement, and monitor, evaluate and update the implementation of the Poverty Reduction National Strategy issued in 2009.

Provide a comprehensive data system to assess household social and economic conditions and prepare the indicators related to human development.

Provide data that meet the needs and requirements of national accounts.

Provide detailed indicators on consumption expenditure that serve decision making related to production, consumption, export and import.

Provide detailed indicators on the sources of household and individual income.

Provide data necessary for formulation of a new consumer price index number.

The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.

Geographic coverage

National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

Analysis unit

1- Household/family. 2- Individual/person.

Universe

The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

Kind of data

Sample survey data [ssd]

Sampling procedure

----> Design:

The sample size was 25488 households for the whole of Iraq: 216 households for each of the 118 districts, in 2832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.

----> Sample frame:

The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all the governorates, including Kurdistan Region, as a frame to select households. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district) for urban and rural areas were systematically selected with probability proportional to size to reach 2832 units (clusters). Stage 2: 9 households from each primary sampling unit were selected to create a cluster; thus the total sample size was 25488 households distributed across the governorates, 216 households in each district.

----> Sampling Stages:

In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown by urban and rural and the geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames for each stage can be developed based on the 2010 building listing and numbering without updating household lists. In some small districts, random selection of primary sampling units may yield fewer than 24 units; in that case a sampling unit is selected more than once, and two or more clusters may be drawn from the same enumeration unit when necessary.
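The two-stage design above can be illustrated with a short Python sketch. It shows textbook systematic sampling with probability proportional to size (a random start, then fixed steps along the cumulative size totals) and systematic equal-probability selection of households. It is not the survey team's actual procedure. The function names and block sizes are hypothetical, and only the counts 118, 24 and 9 come from the text.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def systematic_pps(sizes, n_points, rng):
    """Stage 1: pick n_points units with probability proportional to size."""
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n_points
    start = rng.random() * interval  # random start in [0, interval)
    picks = []
    for k in range(n_points):
        index = bisect_right(cumulative, start + k * interval)
        picks.append(min(index, len(sizes) - 1))  # guard against float rounding
    return picks


def systematic_equal(n_households, k, rng):
    """Stage 2: pick k households at equal steps from a random start."""
    step = n_households / k
    start = rng.random() * step
    return [int(start + j * step) for j in range(k)]


# Totals stated in the survey description.
DISTRICTS, POINTS_PER_DISTRICT, HOUSEHOLDS_PER_POINT = 118, 24, 9
CLUSTERS = DISTRICTS * POINTS_PER_DISTRICT
SAMPLE_SIZE = CLUSTERS * HOUSEHOLDS_PER_POINT

rng = random.Random(0)
block_sizes = [10, 500, 10]  # hypothetical household counts per block
print(systematic_pps(block_sizes, 4, rng))
print(systematic_equal(100, HOUSEHOLDS_PER_POINT, rng))
print(CLUSTERS, SAMPLE_SIZE)
```

A block larger than the sampling interval is hit by more than one selection point, which is how the same enumeration unit can contribute two or more clusters in a small district.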

Mode of data collection

Face-to-face [f2f]

Research instrument

----> Preparation:

The questionnaire of the 2006 survey was adopted in designing the questionnaire of the 2012 survey, to which many revisions were made. Two rounds of pre-tests were carried out. Revisions were made based on the feedback of the field work team, World Bank consultants and others; further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey was implemented, other revisions were made based on the challenges and feedback that emerged during implementation, in order to implement the final version in the actual survey.

----> Questionnaire Parts:

The questionnaire consists of four parts, each with several sections:

Part 1: Socio-Economic Data:
- Section 1: Household Roster
- Section 2: Emigration
- Section 3: Food Rations
- Section 4: Housing
- Section 5: Education
- Section 6: Health
- Section 7: Physical measurements
- Section 8: Job seeking and previous job

Part 2: Monthly, Quarterly and Annual Expenditures:
- Section 9: Expenditures on Non-Food Commodities and Services (past 30 days).
- Section 10: Expenditures on Non-Food Commodities and Services (past 90 days).
- Section 11: Expenditures on Non-Food Commodities and Services (past 12 months).
- Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days).
- Section 12, Table 1: Meals Had Within the Residential Unit.
- Section 12, Table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.

Part 3: Income and Other Data:
- Section 13: Job
- Section 14: Paid jobs
- Section 15: Agriculture, forestry and fishing
- Section 16: Household non-agricultural projects
- Section 17: Income from ownership and transfers
- Section 18: Durable goods
- Section 19: Loans, advances and subsidies
- Section 20: Shocks and strategy of dealing in the households
- Section 21: Time use
- Section 22: Justice
- Section 23: Satisfaction in life
- Section 24: Food consumption during past 7 days

Part 4: Diary of Daily Expenditures: The diary of expenditure is an essential component of this survey. It is left at the household to record all the daily purchases, such as expenditures on food and frequent non-food items such as gasoline, newspapers, etc., during 7 days. Two pages were allocated for recording the expenditures of each day; thus the roster consists of 14 pages.

Cleaning operations

----> Raw Data:

Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct.
2. Local Supervisor: Checks to make sure that questions have been correctly completed.
3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.
4. World Bank consultants in coordination with the CSO data management team: The World Bank technical consultants use additional programs in SPSS and STATA to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.

----> Harmonized Data:

The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.

The harmonization process starts with raw data files received from the Statistical Office.

A program is generated for each dataset to create harmonized variables.

Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.

Response rate

The Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).
1.15 Insurance Services Organization (summary)
catalog.data.gov
performance.tempe.gov
Updated Aug 11, 2025
Cite
City of Tempe (2025). 1.15 Insurance Services Organization (summary) [Dataset]. https://catalog.data.gov/dataset/1-15-insurance-services-organization-summary-b621c
Dataset updated
Aug 11, 2025
Dataset provided by
City of Tempe
Description
ISO is an independent advisory organization that collects information on a community's building-code adoption and enforcement services in order to provide a ranking for insurance companies. ISO assigns a Building Code Effectiveness Classification from 1 to 10 based on the data collected. Class 1 represents exemplary commitment to building-code enforcement.

Municipalities with better rankings are lower risk, and their residents' insurance rates can reflect that. The prospect of minimizing catastrophe-related damage and ultimately lowering insurance costs gives communities an incentive to enforce their building codes rigorously.

This page provides data for the Insurance Services Organization (ISO) performance measure. This data includes residential and commercial building code enforcement ratings for the City of Tempe.

The performance measure dashboard is available at 1.15 Insurance Services Organization (ISO) Rating

Additional Information
Source: Insurance Service Organization Rating
Contact: Chris Thompson
Contact E-Mail: Christopher_Thompson@tempe.gov
Data Source Type: Excel
Preparation Method: Information added to Excel spreadsheet from rating report
Publish Frequency: Every 5 Years
Publish Method: Manual
Data Dictionary
Lease Inventory Excel Spreadsheet
catalog.data.gov
s.cnmilf.com
Updated May 6, 2025
Cite
Public Buildings Service (2025). Lease Inventory Excel Spreadsheet [Dataset]. https://catalog.data.gov/dataset/lease-inventory-excel-spreadsheet
Dataset updated
May 6, 2025
Dataset provided by
Public Buildings Service
Description
GSA, the nation's largest public real estate organization, provides workspace for over one million federal workers. These employees, along with government property, are housed in space owned by the federal government and in leased properties including buildings, land, antenna sites, etc. across the country.
Niagara Open Data
catalog.civicdataecosystem.org
Cite
Niagara Open Data [Dataset]. https://catalog.civicdataecosystem.org/dataset/niagara-open-data
Description
The Ontario government generates and maintains thousands of datasets. Since 2012, we have shared data with Ontarians via a data catalogue. Open data is data that is shared with the public. Click here to learn more about open data and why Ontario releases it. Ontario’s Open Data Directive states that all data must be open, unless there is good reason for it to remain confidential. Ontario’s Chief Digital and Data Officer also has the authority to make certain datasets available publicly. Datasets listed in the catalogue that are not open will have one of the following labels: If you want to use data you find in the catalogue, that data must have a licence – a set of rules that describes how you can use it. A licence: Most of the data available in the catalogue is released under Ontario’s Open Government Licence. However, each dataset may be shared with the public under other kinds of licences or no licence at all. If a dataset doesn’t have a licence, you don’t have the right to use the data. If you have questions about how you can use a specific dataset, please contact us. The Ontario Data Catalogue endeavors to publish open data in a machine readable format. For machine readable datasets, you can simply retrieve the file you need using the file URL. The Ontario Data Catalogue is built on CKAN, which means the catalogue has the following features you can use when building applications. APIs (Application programming interfaces) let software applications communicate directly with each other. If you are using the catalogue in a software application, you might want to extract data from the catalogue through the catalogue API. Note: All Datastore API requests to the Ontario Data Catalogue must be made server-side. The catalogue's collection of dataset metadata (and dataset files) is searchable through the CKAN API. The Ontario Data Catalogue has more than just CKAN's documented search fields. You can also search these custom fields.
You can also use the CKAN API to retrieve metadata about a particular dataset and check for updated files. Read the complete documentation for CKAN's API. Some of the open data in the Ontario Data Catalogue is available through the Datastore API. You can also search and access the machine-readable open data that is available in the catalogue. How to use the API feature: Read the complete documentation for CKAN's Datastore API. The Ontario Data Catalogue contains a record for each dataset that the Government of Ontario possesses. Some of these datasets will be available to you as open data. Others will not be available to you. This is because the Government of Ontario is unable to share data that would break the law or put someone's safety at risk. You can search for a dataset with a word that might describe a dataset or topic. Use words like “taxes” or “hospital locations” to discover what datasets the catalogue contains. You can search for a dataset from 3 spots on the catalogue: the homepage, the dataset search page, or the menu bar available across the catalogue. On the dataset search page, you can also filter your search results. You can select filters on the left hand side of the page to limit your search for datasets with your favourite file format, datasets that are updated weekly, datasets released by a particular organization, or datasets that are released under a specific licence. Go to the dataset search page to see the filters that are available to make your search easier. You can also do a quick search by selecting one of the catalogue’s categories on the homepage. These categories can help you see the types of data we have on key topic areas. When you find the dataset you are looking for, click on it to go to the dataset record. Each dataset record will tell you whether the data is available, and, if so, tell you about the data available. An open dataset might contain several data files. 
These files might represent different periods of time, different sub-sets of the dataset, different regions, language translations, or other breakdowns. You can select a file and either download it or preview it. Make sure to read the licence agreement to make sure you have permission to use it the way you want. Read more about previewing data. A non-open dataset may not be available for many reasons. Read more about non-open data. Read more about restricted data. Data that is non-open may still be subject to freedom of information requests. The catalogue has tools that enable all users to visualize the data in the catalogue without leaving the catalogue – no additional software needed. Have a look at our walk-through of how to make a chart in the catalogue. Get automatic notifications when datasets are updated. You can choose to get notifications for individual datasets, an organization’s datasets or the full catalogue. You don’t have to provide any personal information – just subscribe to our feeds using any feed reader you like using the corresponding notification web addresses. Copy those addresses and paste them into your reader. Your feed reader will let you know when the catalogue has been updated. The catalogue provides open data in several file formats (e.g., spreadsheets, geospatial data, etc.). Learn about each format and how you can access and use the data each file contains. A file that has a list of items and values separated by commas without formatting (e.g. colours, italics, etc.) or extra visual features. This format provides just the data that you would display in a table. XLSX (Excel) files may be converted to CSV so they can be opened in a text editor. How to access the data: Open with any spreadsheet software application (e.g., Open Office Calc, Microsoft Excel) or text editor. Note: This format is considered machine-readable; it can be easily processed and used by a computer. Files that have visual formatting (e.g.
bolded headers and colour-coded rows) can be hard for machines to understand; these elements make a file more human-readable and less machine-readable. A file that provides information without formatted text or extra visual features that may not follow a pattern of separated values like a CSV. How to access the data: Open with any word processor or text editor available on your device (e.g., Microsoft Word, Notepad). A spreadsheet file that may also include charts, graphs, and formatting. How to access the data: Open with a spreadsheet software application that supports this format (e.g., Open Office Calc, Microsoft Excel). Data can be converted to a CSV for a non-proprietary format of the same data without formatted text or extra visual features. A shapefile provides geographic information that can be used to create a map or perform geospatial analysis based on location, points/lines and other data about the shape and features of the area. It includes required files (.shp, .shx, .dbf) and might include corresponding files (e.g., .prj). How to access the data: Open with a geographic information system (GIS) software program (e.g., QGIS). A package of files and folders. The package can contain any number of different file types. How to access the data: Open with an unzipping software application (e.g., WinZIP, 7Zip). Note: If a ZIP file contains .shp, .shx, and .dbf file types, it is an ArcGIS ZIP: a package of shapefiles which provide information to create maps or perform geospatial analysis that can be opened with ArcGIS (a geographic information system software program). A file that provides information related to a geographic area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open using a GIS software application to create a map or do geospatial analysis. It can also be opened with a text editor to view raw information.
Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format for sharing data in a machine-readable way that can store data with more unconventional structures such as complex lists. How to access the data: Open with any text editor (e.g., Notepad) or access through a browser. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format to store and organize data in a machine-readable way that can store data with more unconventional structures (not just data organized in tables). How to access the data: Open with any text editor (e.g., Notepad). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A file that provides information related to an area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open with a geospatial software application that supports the KML format (e.g., Google Earth). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. This format contains files with data from tables used for statistical analysis and data visualization of Statistics Canada census data. How to access the data: Open with the Beyond 20/20 application. A database which links and combines data from different files or applications (including HTML, XML, Excel, etc.). The database file can be converted to a CSV/TXT to make the data machine-readable, but human-readable formatting will be lost. 
How to access the data: Open with Microsoft Office Access (a database management system used to develop application software). A file that keeps the original layout and
Historical Coffee Trading Data
kaggle.com
zip
Updated Apr 19, 2023
Cite
mathjkim (2023). Historical Coffee Trading Data [Dataset]. https://www.kaggle.com/datasets/mathjkim/historical-coffee-trading-data
Explore at:
zip (37263 bytes)
Dataset updated
Apr 19, 2023
Authors
mathjkim
Description
Data Collecting

Historical Data on the Global Coffee Trade https://www.ico.org/new_historical.asp

The International Coffee Organization (ICO) provides data on the coffee industry covering 30 years, from 1990 to 2019. I selected 9 .xlsx files: production, domestic consumption, gross opening stocks, exports, imports, re-exports, price to growers, retail price, and consumption, recorded for different countries and years. Five of them (production, domestic consumption, gross opening stocks, exports, and price to growers) include data from the exporting countries, and four of them have data from importing countries.

Data Cleaning

Each Excel file should become a feature (column) in a pandas data frame. Our goal is to combine the tables into one, using 'pd.melt' and 'pd.merge'. At the same time, we deal with missing and redundant values and clean messy string data.

Structure: assign a proper data type

Quality: drop empty rows, unify formats, remove unnecessarily aggregated rows
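
The reshaping described above can be sketched roughly as follows. This is a minimal illustration, not the author's notebook: the two small tables, the column names (`country`, `year`, `production`, `exports`) and all values are made up for the example; only `pd.melt` and `pd.merge` come from the description.

```python
import pandas as pd

# Hypothetical wide tables: one row per country, one column per year.
production_wide = pd.DataFrame(
    {"country": ["Brazil", "Colombia"], "1990": [100.0, 50.0], "1991": [110.0, None]}
)
exports_wide = pd.DataFrame(
    {"country": ["Brazil", "Colombia"], "1990": [80.0, 40.0], "1991": [85.0, 42.0]}
)


def to_long(wide: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Turn year columns into rows so each table becomes one feature column."""
    long = pd.melt(wide, id_vars="country", var_name="year", value_name=value_name)
    long["year"] = long["year"].astype(int)  # assign a proper data type
    return long


production = to_long(production_wide, "production")
exports = to_long(exports_wide, "exports")

# Combine the features on their shared keys, then drop rows with missing values.
coffee = pd.merge(production, exports, on=["country", "year"], how="outer")
coffee = coffee.dropna().reset_index(drop=True)
```

Melting first gives every table the same (country, year) key, which is what lets a single merge per file stack the nine features side by side.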

Refer to https://github.com/mathjkim/coffee-crew
Global Health Observatory (GHO)
data.niaid.nih.gov
dataverse.harvard.edu
Updated May 5, 2011
Cite
(2011). Global Health Observatory (GHO) [Dataset]. http://doi.org/10.7910/DVN/JILCZW
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/JILCZW
Dataset updated
May 5, 2011
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Users can find data on a range of global health topics like mortality, the burden of disease, infectious diseases, risk factors and health expenditures.

Background

The Global Health Observatory (GHO) database is the World Health Organization's main health statistics repository. Data is available for 193 World Health Organization member states on topics including but not limited to: health-related millennium goals, mortality, immunization, nutrition, infectious disease, non-communicable disease, tobacco control, violence, injuries, alcohol, HIV/AIDS, tuberculosis, malaria, water and sanitation, maternal and reproductive health, cholera, child health, child nutrition, and road safety.

User Functionality

Users can generate tables and charts according to country or region, health indicator, and time period. Data can also be compared across countries. Data can be filtered, tabulated, charted, and downloaded into Excel statistical software. These data are also published in statistical reports covering topics including: alcohol and health, child health, cholera, HIV/AIDS, malaria, maternal and reproductive health, non-communicable diseases, public health and environment, road safety, tuberculosis, and tobacco control.

Data Notes

Data are derived from surveillance and household surveys. The years in which data were collected are indicated with these health statistics. Information is available for each WHO member country and international region. The most recent data is available from 2009.
Analysis of CBCS publications for Open Access, data availability statements...
researchdata.se
figshare.scilifelab.se
Updated Aug 29, 2023
Cite
Th. Theresa Kieselbach (2023). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data [Dataset]. http://doi.org/10.17044/SCILIFELAB.23641749
Explore at:
Unique identifier
https://doi.org/10.17044/SCILIFELAB.23641749
Dataset updated
Aug 29, 2023
Dataset provided by
Umeå University
Authors
Th. Theresa Kieselbach
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General description

This dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions.

Is the research article an Open Access publication?

Does the research article have a Creative Commons license or a similar license?

Does the research article contain a data availability statement?

Did the authors submit data of their study to a repository such as EMBL, Genbank, Protein Data Bank PDB, Cambridge Crystallographic Data Centre CCDC, Dryad or a similar repository?

Does the research article contain supplementary data?

Do the supplementary data have a persistent identifier that makes them citable as a defined research output?

Variables

The data were compiled in a Microsoft Excel 365 document that includes the following variables.

DOI URL of research article

Year of publication

Research article published with Open Access

License for research article

Data availability statement in article

Supplementary data added to article

Persistent identifier for supplementary data

Authors submitted data to NCBI or EMBL or PDB or Dryad or CCDC

Visualization

Parts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications per year, the number of publications that are published with open access and the number of publications that contain a data availability statement (Figure 1). The second figure shows the number of publications per year and how many publications contain supplementary data. This figure also shows how many of the supplementary datasets have a persistent identifier (Figure 2).

File formats and software

The file formats used in this dataset are:

.csv (Text file)

.docx (Microsoft Word 365 file)

.jpg (JPEG image file)

.pdf/A (Portable Document Format for archiving)

.png (Portable Network Graphics image file)

.pptx (Microsoft Power Point 365 file)

.txt (Text file)

.xlsx (Microsoft Excel 365 file)

All files can be opened with Microsoft Office 365 and likely also work with the older versions Office 2019 and 2016.

MD5 checksums

Here is a list of all files of this dataset and of their MD5 checksums.

Readme.txt (MD5: 795f171be340c13d78ba8608dafb3e76)

Manifest.txt (MD5: 46787888019a87bb9d897effdf719b71)

Materials_and_methods.docx (MD5: 0eedaebf5c88982896bd1e0fe57849c2),

Materials_and_methods.pdf (MD5: d314bf2bdff866f827741d7a746f063b),

Materials_and_methods.txt (MD5: 26e7319de89285fc5c1a503d0b01d08a),

CBCS_publications_until_date_2023_07_05.xlsx (MD5: 532fec0bd177844ac0410b98de13ca7c),

CBCS_publications_until_date_2023_07_05.csv (MD5: 2580410623f79959c488fdfefe8b4c7b),

Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.xlsx (MD5: 9c67dd84a6b56a45e1f50a28419930e5),

Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.csv (MD5: fb3ac69476bfc57a8adc734b4d48ea2b),

Aggregated_data_from_CBCS_publications_until_2023_07_05.xlsx (MD5: 6b6cbf3b9617fa8960ff15834869f793),

Aggregated_data_from_CBCS_publications_until_2023_07_05.csv (MD5: b2b8dd36ba86629ed455ae5ad2489d6e),

Figure_1_CBCS_publications_until_2023_07_05_Open_Access_and_data_availablitiy_statement.xlsx (MD5: 9c0422cf1bbd63ac0709324cb128410e),

Figure_1.pptx (MD5: 55a1d12b2a9a81dca4bb7f333002f7fe),

Image_of_figure_1.jpg (MD5: 5179f69297fbbf2eaaf7b641784617d7),

Image_of_figure_1.png (MD5: 8ec94efc07417d69115200529b359698),

Figure_2_CBCS_publications_until_2023_07_05_supplementary_data_and_PID_for_supplementary_data.xlsx (MD5: f5f0d6e4218e390169c7409870227a0a),

Figure_2.pptx (MD5: 0fd4c622dc0474549df88cf37d0e9d72),

Image_of_figure_2.jpg (MD5: c6c68b63b7320597b239316a1c15e00d),

Image_of_figure_2.png (MD5: 24413cc7d292f468bec0ac60cbaa7809)
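
A downloaded file can be checked against the checksums listed above with a few lines of Python. The sketch below uses only the standard library; the function names `md5_of_file` and `verify` are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_md5: str) -> bool:
    """Compare a file's digest with the value published in the listing."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For an intact download, `verify(Path("Readme.txt"), "795f171be340c13d78ba8608dafb3e76")` should return True; a mismatch suggests the file was altered or truncated in transit.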
GAPs Data Repository on Return: Guideline, Data Samples and Codebook
data.niaid.nih.gov
data.europa.eu
Updated Feb 13, 2025
Cite
Sahin Mencutek, Zeynep; Yılmaz-Elmas, Fatma (2025). GAPs Data Repository on Return: Guideline, Data Samples and Codebook [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10790794
Explore at:
Dataset updated
Feb 13, 2025
Dataset provided by
Istanbul Ozyegin University
Bonn International Center for Conflict Studies
Authors
Sahin Mencutek, Zeynep; Yılmaz-Elmas, Fatma
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The GAPs Data Repository provides a comprehensive overview of available qualitative and quantitative data on national return regimes, now accessible through an advanced web interface at https://data.returnmigration.eu/.

This updated guideline outlines the complete process, starting from the initial data collection for the return migration data repository to the development of a comprehensive web-based platform. Through iterative development, participatory approaches, and rigorous quality checks, we have ensured a systematic representation of return migration data at both national and comparative levels.

The Repository organizes data into five main categories, covering diverse aspects and offering a holistic view of return regimes: country profiles, legislation, infrastructure, international cooperation, and descriptive statistics. These categories, further divided into subcategories, are based on insights from a literature review, existing datasets, and empirical data collection from 14 countries. The selection of categories prioritizes relevance for understanding return and readmission policies and practices, data accessibility, reliability, clarity, and comparability. Raw data is meticulously collected by the national experts.

The transition to a web-based interface builds upon the Repository’s original structure, which was initially developed using REDCap (Research Electronic Data Capture), a secure web application for building and managing online surveys and databases. REDCap ensures systematic data entry and stores the data on Uppsala University’s servers while significantly improving accessibility and usability as well as data security. It also enables users to export any or all data from the Project when granted full data export privileges. Data can be exported in various ways and formats, including Microsoft Excel, SAS, Stata, R, or SPSS for analysis. At this stage, the Data Repository design team also converted tailored records of available data into public reports accessible to anyone with a unique URL, without the need to log in to REDCap or obtain permission to access the GAPs Project Data Repository. Public reports can be used to share information with stakeholders or external partners without granting them access to the Project or requiring them to set up a personal account. Currently, all public report links inserted in this report are also available on the Repository’s webpage, allowing users to export original data.

This report also includes a detailed codebook to help users understand the structure, variables, and methodologies used in data collection and organization. This addition ensures transparency and provides a comprehensive framework for researchers and practitioners to effectively interpret the data.

The GAPs Data Repository is committed to providing accessible, well-organized, and reliable data by moving to a centralized web platform and incorporating advanced visuals. This Repository aims to contribute inputs for research, policy analysis, and evidence-based decision-making in the return and readmission field.

Explore the GAPs Data Repository at https://data.returnmigration.eu/.
Cyclistic
kaggle.com
zip
Updated May 12, 2022
Cite
Salam Ibrahim (2022). Cyclistic [Dataset]. https://www.kaggle.com/datasets/salamibrahim/cyclistic
Explore at:
zip (209748131 bytes)
Dataset updated
May 12, 2022
Authors
Salam Ibrahim
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Introduction

This case study is based on Cyclistic, a bike-sharing company in Chicago. I will perform the tasks of a junior data analyst to answer business questions, following a process with these phases: ask, prepare, process, analyze, share and act.

Background

Cyclistic is a bike-sharing company that operates 5828 bikes across 692 docking stations. The company has been around since 2016 and sets itself apart from the competition by offering a variety of bike services, including assistive options. Lily Moreno is the director of the marketing team and will receive the insights from this analysis.

Case study and business task

Lily Moreno's view on how to generate more income by marketing Cyclistic's services is to convert casual riders (one-day passes and/or pay-per-ride customers) into annual riders with a membership. According to the finance analysts, annual riders are more profitable than casual riders. She would rather see a campaign that converts casual riders into annual riders than campaigns targeting new customers. Her strategy as manager of the marketing team is therefore to maximize the number of annual riders by converting casual riders.

To make a data-driven decision, Moreno needs the following insights:

- A better understanding of how casual riders and annual riders differ
- Why a casual rider would become an annual one
- How digital media can affect the marketing tactics

Moreno has directed me to the first question: how do casual riders and annual riders differ?

Stakeholders

Lily Moreno, manager of the marketing team
Cyclistic marketing team
Executive team

Data sources and organization

The data used in this report is made available and licensed by Motivate International Inc. Personal data is hidden to protect personal information. The data covers the past 12 months (01/04/2021 – 31/03/2022) of the bike share dataset.

Merging all 12 monthly bike share files returned about 5,400,000 rows, all of which are included in this analysis.

Data security and limitations: Personal information is secured and hidden to prevent unlawful use. Original files are backed up in folders and subfolders.

Tools and documentation of cleaning process

The tools used for data verification and data cleaning are Microsoft Excel and R. The original files made accessible by Motivate International Inc. are backed up in their original format and in separate files.

Microsoft Excel is used to look through the dataset and get an overview of the content. I performed simple checks of the data by filtering, sorting, formatting and standardizing it to make it easily mergeable. In Excel, I also changed data types to the right format, removed data that was incomplete or incorrect, created new columns that subtract and reformat existing columns, and deleted empty cells. These tasks are easily done in spreadsheets and provide an initial cleaning of the data.

R will be used to query bigger datasets such as this one. R will also be used to create visualizations to answer the question at hand.

Limitations

Microsoft Excel has a limit of 1,048,576 rows per worksheet, while the 12 months of data combined are over 5,500,000 rows. Once the 12 months of data are combined into one table/sheet, Excel is no longer efficient, so I switched over to R.
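
Whether a set of monthly files still fits in a single worksheet can be checked before merging. The case study itself uses R; the sketch below is a Python illustration of the same arithmetic, and the helper names and the idea of passing a list of CSV paths are assumptions made for the example.

```python
import csv
from pathlib import Path

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit quoted in the text


def count_data_rows(csv_path: Path) -> int:
    """Count rows in one CSV file, excluding the header line."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip header
        return sum(1 for _ in reader)


def fits_in_excel(csv_paths: list[Path]) -> tuple[int, bool]:
    """Return the combined row count and whether it fits in one worksheet."""
    total = sum(count_data_rows(path) for path in csv_paths)
    # One extra row is needed for the header of the combined sheet.
    return total, total + 1 <= EXCEL_MAX_ROWS
```

With roughly 5.5 million combined rows the check fails by a wide margin, which is why the merge has to happen outside a spreadsheet.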
Ontario Data Catalogue (Ontario Data Catalogue)
catalog.civicdataecosystem.org
Updated Nov 24, 2025
Cite
(2025). Ontario Data Catalogue (Ontario Data Catalogue) [Dataset]. https://catalog.civicdataecosystem.org/dataset/ontario-data-catalogue-ontario-data-catalogue
Explore at:
Dataset updated
Nov 24, 2025
Area covered
Ontario
Description
AI Generated Summary: The Ontario Data Catalogue is a data portal providing access to open datasets generated and maintained by the Ontario government. It allows users to search, access, visualize, and download data in various machine-readable formats, often through APIs, while also indicating licensing terms and data update frequencies. The catalogue also provides tools for data visualization and notifications for dataset updates. About: The Ontario government generates and maintains thousands of datasets. Since 2012, we have shared data with Ontarians via a data catalogue. Open data is data that is shared with the public. Click here to learn more about open data and why Ontario releases it. Ontario’s Digital and Data Directive states that all data must be open, unless there is good reason for it to remain confidential. Ontario’s Chief Digital and Data Officer also has the authority to make certain datasets available publicly. Datasets listed in the catalogue that are not open will have one of the following labels: If you want to use data you find in the catalogue, that data must have a licence – a set of rules that describes how you can use it. A licence: Most of the data available in the catalogue is released under Ontario’s Open Government Licence. However, each dataset may be shared with the public under other kinds of licences or no licence at all. If a dataset doesn’t have a licence, you don’t have the right to use the data. If you have questions about how you can use a specific dataset, please contact us. The Ontario Data Catalogue endeavors to publish open data in a machine readable format. For machine readable datasets, you can simply retrieve the file you need using the file URL. The Ontario Data Catalogue is built on CKAN, which means the catalogue has the following features you can use when building applications. APIs (Application programming interfaces) let software applications communicate directly with each other. 
If you are using the catalogue in a software application, you might want to extract data from the catalogue through the catalogue API. Note: All Datastore API requests to the Ontario Data Catalogue must be made server-side. The catalogue's collection of dataset metadata (and dataset files) is searchable through the CKAN API. The Ontario Data Catalogue has more than just CKAN's documented search fields. You can also search these custom fields. You can also use the CKAN API to retrieve metadata about a particular dataset and check for updated files. Read the complete documentation for CKAN's API. Some of the open data in the Ontario Data Catalogue is available through the Datastore API. You can also search and access the machine-readable open data that is available in the catalogue. How to use the API feature: Read the complete documentation for CKAN's Datastore API. The Ontario Data Catalogue contains a record for each dataset that the Government of Ontario possesses. Some of these datasets will be available to you as open data. Others will not be available to you. This is because the Government of Ontario is unable to share data that would break the law or put someone's safety at risk. You can search for a dataset with a word that might describe a dataset or topic. Use words like “taxes” or “hospital locations” to discover what datasets the catalogue contains. You can search for a dataset from 3 spots on the catalogue: the homepage, the dataset search page, or the menu bar available across the catalogue. On the dataset search page, you can also filter your search results. You can select filters on the left hand side of the page to limit your search for datasets with your favourite file format, datasets that are updated weekly, datasets released by a particular ministry, or datasets that are released under a specific licence. Go to the dataset search page to see the filters that are available to make your search easier. 
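
As a rough sketch of how the CKAN API described above might be called, the snippet below builds request URLs for CKAN's `package_search` and `datastore_search` actions and parses a search response. The base URL and the resource id are assumptions for illustration; check the catalogue's own API documentation for the actual values.

```python
import json
from urllib.parse import urlencode

# Assumed base URL for the catalogue; confirm it on the catalogue's API page.
BASE_URL = "https://data.ontario.ca"


def action_url(action: str, **params) -> str:
    """Build a CKAN Action API URL such as .../api/3/action/package_search."""
    query = urlencode(params)
    url = f"{BASE_URL}/api/3/action/{action}"
    return f"{url}?{query}" if query else url


def dataset_titles(response_text: str) -> list[str]:
    """Extract dataset titles from a package_search JSON response body."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [item["title"] for item in payload["result"]["results"]]


search_url = action_url("package_search", q="hospital locations", rows=5)
# The resource id below is a placeholder, not a real catalogue resource.
records_url = action_url("datastore_search", resource_id="<resource-id>", limit=10)
```

The URLs can be fetched with any HTTP client, for example `urllib.request.urlopen(search_url)`. Because the text states that Datastore API requests must be made server-side, a call like this belongs in backend code rather than in a browser script.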
You can also do a quick search by selecting one of the catalogue’s categories on the homepage. These categories can help you see the types of data we have on key topic areas. When you find the dataset you are looking for, click on it to go to the dataset record. Each dataset record will tell you whether the data is available, and, if so, tell you about the data available. An open dataset might contain several data files. These files might represent different periods of time, different sub-sets of the dataset, different regions, language translations, or other breakdowns. You can select a file and either download it or preview it. Make sure to read the licence agreement to make sure you have permission to use it the way you want. A non-open dataset may not be available for many reasons. Read more about non-open data. Read more about restricted data. Data that is non-open may still be subject to freedom of information requests. The catalogue has tools that enable all users to visualize the data in the catalogue without leaving the catalogue – no additional software needed. Get automatic notifications when datasets are updated. You can choose to get notifications for individual datasets, an organization’s datasets or the full catalogue. You don’t have to provide any personal information – just subscribe to our feeds using any feed reader you like using the corresponding notification web addresses. Copy those addresses and paste them into your reader. Your feed reader will let you know when the catalogue has been updated. The catalogue provides open data in several file formats (e.g., spreadsheets, geospatial data, etc). Learn about each format and how you can access and use the data each file contains. A file that has a list of items and values separated by commas without formatting (e.g. colours, italics, etc.) or extra visual features. This format provides just the data that you would display in a table. 
XLSX (Excel) files may be converted to CSV so they can be opened in a text editor. How to access the data: Open with any spreadsheet software application (e.g., Open Office Calc, Microsoft Excel) or text editor. Note: This format is considered machine-readable; it can be easily processed and used by a computer. Files that have visual formatting (e.g. bolded headers and colour-coded rows) can be hard for machines to understand; these elements make a file more human-readable and less machine-readable. A file that provides information without formatted text or extra visual features that may not follow a pattern of separated values like a CSV. How to access the data: Open with any word processor or text editor available on your device (e.g., Microsoft Word, Notepad). A spreadsheet file that may also include charts, graphs, and formatting. How to access the data: Open with a spreadsheet software application that supports this format (e.g., Open Office Calc, Microsoft Excel). Data can be converted to a CSV for a non-proprietary format of the same data without formatted text or extra visual features. A shapefile provides geographic information that can be used to create a map or perform geospatial analysis based on location, points/lines and other data about the shape and features of the area. It includes required files (.shp, .shx, .dbf) and might include corresponding files (e.g., .prj). How to access the data: Open with a geographic information system (GIS) software program (e.g., QGIS). A package of files and folders. The package can contain any number of different file types. How to access the data: Open with an unzipping software application (e.g., WinZIP, 7Zip). Note: If a ZIP file contains .shp, .shx, and .dbf file types, it is an ArcGIS ZIP: a package of shapefiles which provide information to create maps or perform geospatial analysis that can be opened with ArcGIS (a geographic information system software program). 
A file that provides information related to a geographic area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open using a GIS software application to create a map or do geospatial analysis. It can also be opened with a text editor to view raw information. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format for sharing data in a machine-readable way that can store data with more unconventional structures such as complex lists. How to access the data: Open with any text editor (e.g., Notepad) or access through a browser. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format to store and organize data in a machine-readable way that can store data with more unconventional structures (not just data organized in tables). How to access the data: Open with any text editor (e.g., Notepad). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A file that provides information related to an area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open with a geospatial software application that supports the KML format (e.g., Google Earth). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. 
This format contains files with data from tables used for statistical analysis and data visualization of Statistics Canada census data. How to access the data: Open with the Beyond 20/20 application. A database which links and combines data from different files or
Excel macros and data example
figshare.com
txt
Updated Aug 2, 2019
Cite
Eric Clifton (2019). Excel macros and data example [Dataset]. http://doi.org/10.6084/m9.figshare.9235142.v1
Explore at:
txt
Unique identifier
https://doi.org/10.6084/m9.figshare.9235142.v1
Dataset updated
Aug 2, 2019
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
Eric Clifton
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Macros for organizing raw data from LabView software that logs a timestamp for every rotation of a flight mill arm, and for summarizing the duration of individual flight bouts.
LCK Spring 2024 Players Statistics
kaggle.com
zip
Updated Dec 1, 2024
Cite
Lukas Rozado (2024). LCK Spring 2024 Players Statistics [Dataset]. https://www.kaggle.com/datasets/lukasrozado/lck-spring-2024-players-statistics/code
Explore at:
zip (156203 bytes)
Dataset updated
Dec 1, 2024
Authors
Lukas Rozado
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset provides an in-depth look at the League of Legends Champions Korea (LCK) Spring 2024 season. It includes detailed metrics for players, champions, and matches, meticulously cleaned and organized for easy analysis and modeling.

Data Collection

The data was collected using a combination of manual efforts and automated web scraping tools. Specifically:

Source: Data was gathered from Gol.gg, a well-known platform for League of Legends statistics.

Automation: Web scraping was performed using Python libraries like BeautifulSoup and Selenium to extract information on players, matches, and champions efficiently.

Focus: The scripts were designed to capture relevant performance metrics for each player and champion used during the Spring 2024 split.

Data Cleaning and Processing

The raw data obtained from web scraping required significant preprocessing to ensure its usability. The following steps were taken:

Handling Raw Data:

Extracted key performance indicators like KDA, Win Rate, Games Played, and Match Durations from the source.

Normalized inconsistent formats for metrics such as win rates (e.g., removing %) and durations (e.g., converting MM:SS to total seconds).

Data Cleaning:

Removed duplicate rows and ensured no missing values.

Fixed inconsistencies in player and champion names to maintain uniformity.

Checked for outliers in numerical metrics (e.g., unrealistically high KDA values).

Data Organization:

Created three separate tables for better data management:

Player Statistics: General player performance metrics like KDA, win rates, and average kills.

Champion Statistics: Data on games played, win rates, and KDA for each champion.

Match List: Details of each match, including players, champions, and results.

Added sequential Player IDs to connect the three datasets, facilitating relational analysis.

Date Formatting: Converted all date fields to the DD/MM/YYYY format for consistency. Removed irrelevant time data to focus solely on match dates.
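
The normalization steps listed above (stripping the percent sign, converting MM:SS to seconds, reformatting dates) could look roughly like this. The function names and the assumed input date format are illustrative only; the dataset's actual scripts are not shown in the description.

```python
from datetime import datetime


def win_rate_to_float(text: str) -> float:
    """Convert a win rate such as '56.3%' to the number 56.3."""
    return float(text.strip().rstrip("%"))


def duration_to_seconds(text: str) -> int:
    """Convert a match duration written as MM:SS to total seconds."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + int(seconds)


def to_match_date(text: str, source_format: str = "%Y-%m-%d %H:%M") -> str:
    """Drop the time component and format the date as DD/MM/YYYY.

    The default source_format is an assumption about the scraped values.
    """
    return datetime.strptime(text, source_format).strftime("%d/%m/%Y")
```

For example, `duration_to_seconds("32:15")` gives 1935, since 32 × 60 + 15 = 1935.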

Tools and Libraries Used

The following tools were used throughout the project:

Python: Libraries: Pandas, NumPy for data manipulation; BeautifulSoup, Selenium for web scraping. Visualization: Matplotlib, Seaborn, Plotly for potential analysis.

Excel: Consolidated final datasets into a structured Excel file with multiple sheets.

Data Validation: Used Python scripts to check for missing data, validate numerical columns, and ensure data consistency.

Kaggle Integration: Cleaned datasets and a comprehensive README file were prepared for direct upload to Kaggle.

Applications

This dataset is ready for use in:

Exploratory Data Analysis (EDA): Visualize player and champion performance trends across matches.

Machine Learning: Develop models to predict match outcomes based on player and champion statistics.

Sports Analytics: Gain insights into champion picks, win rates, and individual player strategies.

Acknowledgments

This dataset was made possible by the extensive statistics available on Gol.gg and the use of Python-based web scraping and data cleaning methodologies. It is shared under the CC BY 4.0 License to encourage reuse and collaboration.
Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
Enterprise Survey 2009-2019, Panel Data - Slovenia
microdata.worldbank.org
catalog.ihsn.org
Updated Aug 6, 2020
Cite
World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
Explore at:
Dataset updated
Aug 6, 2020
Dataset provided by
World Bank Group (http://www.worldbank.org/)
European Investment Bank (http://eib.org/)
European Bank for Reconstruction and Development (http://ebrd.com/)
Time period covered
2008 - 2019
Area covered
Slovenia
Description
Abstract

The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

Geographic coverage

National

Analysis unit

The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

Universe

As is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

Kind of data

Sample survey data [ssd]

Sampling procedure

The samples for the Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for the Slovenia 2009 ES and the Slovenia 2013 ES, and in the Sampling Note for the 2019 Slovenia ES.

Three levels of stratification were used in this country: industry, establishment size, and oblast (region). For the Slovenia 2009 ES, the original sample designs, with specific information on the industries and regions chosen, are included in the attached Excel file (Sampling Report.xls). For the Slovenia 2013 and 2019 ES, specific information on the industries and regions chosen is described in Appendix E of "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports, respectively.

For the Slovenia 2009 ES, industry stratification was designed as follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries, sample sizes were inflated by about 17% to account for potential non-response when requesting sensitive financial data and for likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals), sample sizes were inflated by about 12% to account for undersampling of firms in service industries.

For the Slovenia 2013 ES, industry stratification was designed as follows: the universe was stratified into one manufacturing industry and two service industries (retail and other services).

Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
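The size strata above can be expressed as a small classification rule. The following Python sketch is only an illustration of the stated thresholds; the function name `size_stratum` is hypothetical, and returning `None` for establishments with fewer than 5 workers is an assumption, since the text does not say how such establishments are treated.

```python
def size_stratum(permanent_full_time_workers):
    """Map a count of permanent full-time workers to an ES size stratum.

    Thresholds follow the text: small (5 to 19), medium (20 to 99),
    large (more than 99). Counts below 5 return None (an assumption).
    """
    if permanent_full_time_workers < 5:
        return None
    if permanent_full_time_workers <= 19:
        return "small"
    if permanent_full_time_workers <= 99:
        return "medium"
    return "large"


for workers in (4, 5, 19, 20, 99, 100):
    print(workers, size_stratum(workers))
```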

For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

Mode of data collection

Computer Assisted Personal Interview [capi]

Research instrument

Questionnaires have common questions (core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing-specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail-specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

Response rate

Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

Item non-response was addressed by two strategies: (a) For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to record a refusal to respond as (-8). (b) Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.
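Because refusals are stored as the numeric code -8, they need to be treated as missing before any averages are computed. The Python sketch below illustrates one way to do that; the helper names and the sample values are hypothetical, and only the -8 code comes from the text.

```python
REFUSAL_CODE = -8  # refusal-to-respond code described in the text


def recode_refusals(values, refusal_code=REFUSAL_CODE):
    """Replace the refusal code with None so it is not averaged as a number."""
    return [None if value == refusal_code else value for value in values]


def item_response_rate(values, refusal_code=REFUSAL_CODE):
    """Share of entries that are neither None nor the refusal code."""
    if not values:
        return 0.0
    answered = sum(
        1 for value in values if value is not None and value != refusal_code
    )
    return answered / len(values)


answers = [3, -8, 0, -8]  # made-up responses to one sensitive question
print(recode_refusals(answers))
print(item_response_rate(answers))
```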

For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates for Slovenia may be selection bias and not frame inaccuracy.

For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.
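The three waves report this figure on different scales: 2009 as contacts per interview, 2013 and 2019 as interviews per contact. Since one is the reciprocal of the other, they can be put on a common footing. The Python sketch below does only that conversion on the numbers quoted above; the function names are hypothetical.

```python
def interviews_per_contact(ratio_contacts_per_interview):
    """Convert contacts per realized interview to interviews per contact."""
    return 1.0 / ratio_contacts_per_interview


def contacts_per_interview(share_interviews_per_contact):
    """Convert interviews per contact (as a fraction) to contacts per interview."""
    return 1.0 / share_interviews_per_contact


# Values quoted in the text, expressed as interviews per contact.
waves = {
    2009: interviews_per_contact(6.18),
    2013: 0.25,
    2019: 0.097,
}

for year, share in waves.items():
    print(
        f"{year}: {share:.1%} interviews per contact, "
        f"{contacts_per_interview(share):.2f} contacts per interview"
    )
```

On this common scale, the 2009 figure corresponds to roughly 16% of contacts yielding an interview, compared with 25% in 2013 and 9.7% in 2019.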
Analysis of publications of the Swedish Metabolomics Centre for Open Access...
demo.researchdata.se
researchdata.se
Updated Sep 19, 2025
Theresa Kieselbach (2025). Analysis of publications of the Swedish Metabolomics Centre for Open Access licenses, data availability statements and access to data [Dataset]. http://doi.org/10.17044/SCILIFELAB.29392007
Unique identifier
https://doi.org/10.17044/SCILIFELAB.29392007
Dataset updated
Sep 19, 2025
Dataset provided by
Umeå University
Authors
Theresa Kieselbach
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Content and data source

This dataset contains the results of a manual analysis of Open Science markers in the publications of the Swedish Metabolomics Centre (SMC) between 2016 and 2024. Its variables are similar to those of the "Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data" dataset (Kieselbach, 2023).

The sample of these publications was fetched from SciLifeLab on 5 May 2025 at the URL: https://publications.scilifelab.se/label/Swedish Metabolomics Centre (SMC)

It contains 285 articles that are the source data for the work to create this dataset. Every publication was manually visited at its DOI URL and checked for 23 variables.

Questions studied

Some of the questions that were addressed in the collection of the data are:

Does the article have an open license and what kind of license does it have?

Does the article contain research data that may have restricted access such as personal data and health data?

Does the article contain a data availability statement?

Does the article contain supplementary material that the authors added to it?

Does the supplementary material contain research data?

Does the supplementary material contain metabolomics data such as, for instance, summaries and visualizations?

Did the authors submit metabolomics data to MetaboLights at the EBI or to other repositories?

Did the authors submit other data to other repositories?

Is data available on request from the authors?

Visualization of data

The data was compiled and visualized using Microsoft Excel 365. The visualization includes one table that gives a general overview of the dataset, and four figures that show some results of the analysis.

Figure 1. Percentage of publications between 2016 and 2024 with an Open Access License and with a data availability statement.

Figure 2. Submissions to repositories between 2016 and 2024.

Figure 3. Percentage of publications that contained supplementary material and if this supplementary material contained research data and metabolomics data.

Figure 4. Repositories used by the authors between 2016 and 2024.

List of variables

1. Year of Publication (answer: year)

2. Date of Publication (answer: date)

3. DOI (answer: DOI)

4. DOI URL (answer: DOI URL)

5. Research article (answer: Yes or No)

6. Access to article without paywall (answer: Yes or No)

7. License for research article (answer: Name of the license or No)

8. Data with restricted access (answer: Yes or No)

9. Data availability statement in article (answer: Yes or No)

10. Supplementary material added to article (answer: Yes or No)

11. Access to supplementary material without paywall (answer: Yes or No)

12. Supplementary material contains research data (answer: Yes or No)

13. Supplementary data contains metabolomics data (answer: Yes or No)

14. Persistent identifier for supplementary data (answer: Yes or No)

15. Source data added to the article (answer: Yes or No)

16. Source data contain metabolomics data (answer: Yes or No)

17. Authors submitted metabolomics data to MetaboLights (answer: Yes or No)

18. Authors submitted metabolomics data to another repository (answer: name of the repository or No)

19. Authors submitted other data to a repository (answer: name of the repository or No)

20. Authors submitted other data to a second repository (answer: name of the repository or No)

21. Authors submitted other data to a third repository (answer: name of the repository or No)

22. Authors submitted code to a repository (answer: name of the repository or No)

23. Data available on request from the authors (answer: Yes or No)
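As an illustration of how the Yes/No variables could be summarized per year (for example, the kind of percentages plotted in Figure 1), here is a minimal Python sketch. It assumes that the CSV column headers match the variable names listed above, which has not been checked against the actual file. The three sample records are made up and are not results from the dataset.

```python
from collections import defaultdict


def share_yes_by_year(records, year_key, flag_key):
    """Return {year: percentage of records whose flag_key value is "Yes"}."""
    totals = defaultdict(int)
    yes_counts = defaultdict(int)
    for record in records:
        year = record[year_key]
        totals[year] += 1
        if record[flag_key].strip().lower() == "yes":
            yes_counts[year] += 1
    return {
        year: 100.0 * yes_counts[year] / totals[year] for year in sorted(totals)
    }


# Made-up records; real rows would come from csv.DictReader on the CSV file.
YEAR = "Year of Publication"                      # assumed header name
DAS = "Data availability statement in article"    # assumed header name
sample = [
    {YEAR: "2016", DAS: "No"},
    {YEAR: "2016", DAS: "Yes"},
    {YEAR: "2024", DAS: "Yes"},
]

shares = share_yes_by_year(sample, YEAR, DAS)
print(shares)
```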

Variables that are available in the source data

1. Title of article

2. Authors

3. Journal

4. Year

5. (Date) Published

6. (Date) E-published

7. Volume

8. Issue

9. Pages

10. DOI

11. PMID

12. Labels

13. Qualifiers

14. IUID

15. URL

16. DOI URL of research article

17. PubMed URL of research article

File formats and software

The file formats used in this dataset are:

.csv (Text file)

.jpg (JPEG image file)

.pdf/A (Portable Document Format for archiving)

.txt (Text file)

.xlsx (Microsoft Excel 365 file)

All files can be opened with Microsoft Office 365.

Reference

Kieselbach, Theresa (2023). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data. Umeå University. Dataset. https://doi.org/10.17044/scilifelab.23641749.v1

Abbreviations

CC BY 4.0: Creative Commons Attribution 4.0 International Public License

CC BY-NC 4.0: Creative Commons Attribution-NonCommercial 4.0 International Public License

CC BY-NC 3.0: Creative Commons Attribution-NonCommercial 3.0 International Public License

CC BY-NC-ND 4.0: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License

DOI: Digital Object Identifier

EBI: European Bioinformatics Institute

EBI-ArrayExpress: The ArrayExpress collection of functional genomics data at the EBI

EBI-ENA: European Nucleotide Archive at the EBI

EBI-Pride: Proteomics Identification Database at the EBI

e!DAL: electronic Data Archive Library at the Leibniz Institute for Plant Genetics and Crop Plant Research

IUID: Item Unique identification

LUDC: Lund University Diabetes Centre

LUDC repository: data repository at the Lund University Diabetes Centre

NCBI: National Center for Biotechnology Information

NCBI-GEO: The Gene Expression Omnibus database repository at the NCBI

NCBI-SRA: The Sequence Read Archive at the NCBI

PMID: PubMed Identifier

URL: Uniform Resource Locator

MD5 checksums of the files

Manifest.txt (2 KB): 89f32a728fb74ebecef0aef4633130b0

README.txt (6 KB): 34ea4ad9cb9bdea54755fa87f2d0b913

Analysis_SMC_publications_2016_2024_Open_Access_publication_and_access_to_data_status_2025_06_24.csv (46 KB): 9719df26381901bc6aabfd34fdbfab81

Analysis_SMC_publications_2016_2024_Open_Access_publication_and_access_to_data_status_2025_06_24.xlsx (49 KB): 1ec95dc29262645240e7d8714967bcfc

Table_1_Overview_SMC_publications_2016_2024_status_2025_06_11.csv (391 Bytes): 1fd723dc6f52f18251d41c0d343a4f0f

Table_1_Overview_SMC_publications_2016_2024_status_2025_06_11.xlsx (9 KB): 38622a9681c6f1057a6e1a4be56b0285

Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.csv (468 Bytes): 9f9156f8d52603ccdec968f626bc002a

Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.jpg (119 KB): dc9a4d7de4c789e8aea46ce66e007301

Figure_1_SMC_publications_2016_2024_open_access_license_and_data_availability_status_2025_06_11.xlsx (15 KB): 6527d1ebd0069ef3757bd1b049f0fc74

Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.csv (300 Bytes): 5abc4a0fcf776f8dc4745f41deddacbc

Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.jpg (126 KB): e03e5bf4ba2d942c3b022aebb0a59033

Figure_2_SMC_publications_2016_2024_metabolomics_data_and_other_data_to_repositories_status_2024_06_12.xlsx (15 KB): a80f977c051d4798db221b07733c694b

Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.csv (670 Bytes): a694a3defa98aa52fcdec8ff9e9e3316

Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.jpg (153 KB): 3928bdc1f046ca9b6f66bdbcdf936ca8

Figure_3_SMC_publications_2016_2024_overview_supplementary_data_status_2025_06_11.xlsx (15 KB): 46dfda56b116b571b4bf8e3674b44512

Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.csv (498 Bytes): 8963a412cc9e458ced2e80883bb93e1a

Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.jpg (137 KB): c9ba447225e99431f24732128a754b7e

Figure_4_SMC_publications_2016_2024_submission_of_data_to_repositories_status_2025_06_12.xlsx (16 KB): 1e2813d3ccb0ee14991b276947c21b8a

Materials_and_methods_SMC_publications_2016_2024.docx (19 KB): 71776ffc1e530e1b40255763403b2f40

Materials_and_methods_SMC_publications_2016_2024.txt (4 KB): 26c4b91b958b9e33d93d13dc52b25da9

Materials_and_methods_SMC_publications_2026_2024.pdf (172 KB): eee564f452ef4f3cf57bb81a6874fcd4

SMC_publications_2016_2024_status_2025_05_05.csv (143 KB): 5e61d09244ca90b1e5b057a7afdfe5e7

SMC_publications_2016_2024_status_2025_05_05.xlsx (106 KB): 6977fbcac21ff5a12763e40de90c0a91
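The checksums listed above can be used to confirm that a downloaded file is intact. The following Python sketch shows one way to do this with the standard `hashlib` module; the helper names `md5_of_file` and `verify` are hypothetical, and the example path in the comment assumes the file was saved under its listed name in the current directory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's MD5 digest matches the published value."""
    return md5_of_file(path) == expected_hex.strip().lower()


# Example call (assumes README.txt was downloaded to the working directory):
# verify("README.txt", "34ea4ad9cb9bdea54755fa87f2d0b913")
```

MD5 is adequate here for detecting accidental corruption during download; it is not a safeguard against deliberate tampering.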
University of Cape Town Student Admissions Data 2006-2014 - South Africa
datafirst.uct.ac.za
Updated Jul 28, 2020
UCT Student Administration (2020). University of Cape Town Student Admissions Data 2006-2014 - South Africa [Dataset]. http://www.datafirst.uct.ac.za/Dataportal/index.php/catalog/556
Dataset updated
Jul 28, 2020
Dataset authored and provided by
UCT Student Administration
Time period covered
2006 - 2014
Area covered
South Africa
Description
Abstract

This dataset was generated from a set of Excel spreadsheets from an Information and Communication Technology Services (ICTS) administrative database on student applications to the University of Cape Town (UCT). This database contains information on applications to UCT between January 2006 and December 2014. In the original form received by DataFirst, the data were ill suited to research purposes. This dataset represents an attempt at cleaning and organizing these data into a more tractable format. To ensure data confidentiality, direct identifiers have been removed from the data, and the data are only made available to accredited researchers through DataFirst's Secure Data Service.

The dataset was separated into the following data files:

Application level information: the "finest" unit of analysis. Individuals may have multiple applications. Uniquely identified by an application ID variable. There are a total of 1,714,669 applications on record.

Individual level information: individuals may have multiple applications. Each individual is uniquely identified by an individual ID variable. Each individual is associated with information on "key subjects" from a separate data file also contained in the database. These key subjects are all separate variables in the individual level data file. There are a total of 285,005 individuals on record.

Secondary Education Information: individuals can also be associated with row entries for each subject. This data file does not have a unique identifier. Instead, each row entry represents a specific secondary school subject for a specific individual. These subjects are quite specific and the data allow the user to distinguish between, for example, higher grade accounting and standard grade accounting. They also allow the user to identify the educational authority issuing the qualification, e.g. Cambridge International Examinations (CIE) versus National Senior Certificate (NSC).

Tertiary Education Information: the smallest of the four data files. There are multiple entries for each individual in this dataset. Each row entry contains information on the year, institution and transcript information and can be associated with individuals.
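The relationship between the application-level and individual-level files (one individual, possibly many applications) can be sketched as a simple grouping. The Python example below is illustrative only: the field names `application_id` and `individual_id` and the sample rows are hypothetical, since the actual variable names are documented only in the restricted-access data. The final calculation uses the two totals quoted above.

```python
from collections import defaultdict


def applications_per_individual(applications):
    """Group application IDs under the individual who submitted them."""
    grouped = defaultdict(list)
    for application in applications:
        grouped[application["individual_id"]].append(
            application["application_id"]
        )
    return dict(grouped)


# Made-up rows standing in for the application-level file.
sample_applications = [
    {"application_id": "A1", "individual_id": "P1"},
    {"application_id": "A2", "individual_id": "P1"},
    {"application_id": "A3", "individual_id": "P2"},
]
grouped = applications_per_individual(sample_applications)
print(grouped)

# Average applications per individual, from the totals given in the text.
average_applications = 1_714_669 / 285_005
print(round(average_applications, 2))
```

The totals in the description imply about six applications per individual on average.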

Analysis unit

Applications, individuals

Kind of data

Administrative records [adm]

Mode of data collection

Other [oth]

Cleaning operations

The data files were made available to DataFirst as a group of Excel spreadsheet documents from an SQL database managed by the University of Cape Town's Information and Communication Technology Services. The process of combining these original data files to create a research-ready dataset is summarised in a document entitled "Notes on preparing the UCT Student Application Data 2006-2014" accompanying the data.


