25 datasets found
  1. Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2 more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

    The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The working group helped develop the survey that is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

    From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.

    Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," the 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months; all other data would be considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

    Resources in this dataset:

    Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked, with the possible responses. The survey was not administered using this PDF; the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

    Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file of raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).

    Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
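
    The per-person storage calculation described above (high end of the reported range divided by the number of individuals covered, G) is simple enough to sketch in code. The snippet below is an illustrative sketch only, not part of the released dataset; the range labels, their assumed high-end values in TB, and the column names are assumptions.

```python
# Hedged sketch of the per-person storage calculation described in the survey write-up.
# Range labels, their assumed high ends (in TB), and column names are illustrative only.
import pandas as pd

RANGE_HIGH_TB = {"<1 TB": 1, "1-10 TB": 10, "10-100 TB": 100, ">100 TB": 100}

responses = pd.DataFrame({
    "reported_range": ["1-10 TB", "10-100 TB", "<1 TB"],
    "group_size": [1, 4, 1],  # G = 1 for an individual response, >1 for a group response
})

responses["high_end_tb"] = responses["reported_range"].map(RANGE_HIGH_TB)
# Per-person storage need = high end of the reported range / number of people covered.
responses["per_person_tb"] = responses["high_end_tb"] / responses["group_size"]
print(responses)
```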

  2. 1.15 Insurance Services Organization (summary)

    • data.tempe.gov
    • covid19.tempe.gov
    • +10 more
    Updated Jan 7, 2020
    Cite
    City of Tempe (2020). 1.15 Insurance Services Organization (summary) [Dataset]. https://data.tempe.gov/datasets/tempegov::1-15-insurance-services-organization-summary
    Explore at:
    Dataset updated
    Jan 7, 2020
    Dataset authored and provided by
    City of Tempe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ISO is an independent advisory organization that collects information on a community's building-code adoption and enforcement services in order to provide a ranking for insurance companies. ISO assigns a Building Code Effectiveness Classification from 1 to 10 based on the data collected. Class 1 represents exemplary commitment to building-code enforcement. Municipalities with better rankings are lower risk, and their residents' insurance rates can reflect that. The prospect of minimizing catastrophe-related damage and ultimately lowering insurance costs gives communities an incentive to enforce their building codes rigorously.

    This page provides data for the Insurance Services Organization (ISO) performance measure. The data includes residential and commercial building code enforcement ratings for the City of Tempe. The performance measure dashboard is available at 1.15 Insurance Services Organization (ISO) Rating.

    Additional Information
    Source: Insurance Service Organization Rating
    Contact: Chris Thompson
    Contact E-Mail: Christopher_Thompson@tempe.gov
    Data Source Type: Excel
    Preparation Method: Information added to Excel spreadsheet from rating report
    Publish Frequency: Every 5 Years
    Publish Method: Manual
    Data Dictionary

  3. Forest data organizer Python tool

    • mostwiedzy.pl
    zip
    Updated Nov 28, 2025
    Cite
    Piotr Krajewski (2025). Forest data organizer Python tool [Dataset]. http://doi.org/10.34808/jks7-0274
    Explore at:
    zip (19521). Available download formats
    Dataset updated
    Nov 28, 2025
    Authors
    Piotr Krajewski
    License

    https://cdn.mostwiedzy.pl/34/ce/29/cb/0_202511281401408119671_FME/mit-no-attribution.txt

    Description

    This dataset contains a Python tool for processing forest inventory data and calculating standard forestry and ecological metrics. The tool reads Excel spreadsheets containing forest inventory data and outputs organized datasets with derived variables. All formulas are based on published literature; see the code documentation for specific citations. The related dataset https://doi.org/10.34808/h628-m476 provides processed data for immediate use, whereas this dataset provides the processing tool for users with their own inventory data or custom analysis requirements.
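
    As a rough illustration of the kind of processing such a tool performs, the sketch below reads an inventory spreadsheet with pandas and derives one standard metric (per-stem basal area from diameter at breast height). The file name and the column names ("dbh_cm", "plot_id") are assumptions, not the published tool's actual interface.

```python
# Illustrative sketch only; not the published tool. Input file and column names are assumed.
import math
import pandas as pd

inventory = pd.read_excel("forest_inventory.xlsx")  # hypothetical input spreadsheet

# Basal area per stem in m^2 from DBH in cm: BA = pi * (DBH / 200)^2
inventory["basal_area_m2"] = math.pi * (inventory["dbh_cm"] / 200) ** 2

# A typical "organized" output: a per-plot summary of the derived variable.
plot_summary = inventory.groupby("plot_id")["basal_area_m2"].sum().reset_index()
plot_summary.to_excel("plot_summary.xlsx", index=False)
```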

  4. Coffee Sales Excel Project

    • kaggle.com
    Updated Nov 13, 2024
    Cite
    Nuha Zahidi (2024). Coffee Sales Excel Project [Dataset]. https://www.kaggle.com/datasets/nuhazahidi/coffee-sales-excel-project
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 13, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nuha Zahidi
    Description

    Tool: Microsoft Excel

    Dataset: Coffee Sales

    Process:

    1. Data Cleaning: remove duplicates and blanks; standardize date and currency formats.

    2. Data Manipulation: use sorting and filtering to work with subsets of interest; use XLOOKUP, INDEX-MATCH and IF formulas for efficient data manipulation, such as retrieving, matching and organising information in the spreadsheets.

    3. Data Analysis: create Pivot Tables and Pivot Charts, with formatting, to visualize trends.

    4. Dashboard Development: insert Slicers, with formatting, for easy filtering and dynamic updates.

    Highlights: This project aims to understand coffee sales trends by country, roast type, and year, which could help identify marketing opportunities and customer segments.
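
    For readers who prefer to reproduce the workflow outside Excel, the sketch below mirrors the cleaning, lookup and pivot steps in pandas. The workbook layout, sheet names and column names are assumptions about the Kaggle file, not its documented structure.

```python
# Hedged pandas analogue of the Excel steps (cleaning, XLOOKUP-style merge, pivot table).
# Sheet and column names are assumptions about the workbook.
import pandas as pd

orders = pd.read_excel("coffee_sales.xlsx", sheet_name="orders")        # hypothetical
customers = pd.read_excel("coffee_sales.xlsx", sheet_name="customers")  # hypothetical

# 1. Data cleaning: drop duplicates and blanks, standardize dates.
orders = orders.drop_duplicates().dropna(subset=["Order Date"])
orders["Order Date"] = pd.to_datetime(orders["Order Date"])

# 2. Data manipulation: the XLOOKUP/INDEX-MATCH lookup becomes a merge.
orders = orders.merge(customers[["Customer ID", "Country"]], on="Customer ID", how="left")

# 3. Data analysis: the pivot table of sales by country, roast type and year.
orders["Year"] = orders["Order Date"].dt.year
pivot = orders.pivot_table(index="Country", columns=["Roast Type", "Year"],
                           values="Sales", aggfunc="sum", fill_value=0)
print(pivot)
```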

  5. Bank Loan Analysis Project Using Excel

    • kaggle.com
    zip
    Updated Aug 16, 2024
    Cite
    raka (2024). Bank Loan Analysis Project Using Excel [Dataset]. https://www.kaggle.com/datasets/connectraka/bank-loan-analysis-project-using-excel
    Explore at:
    zip (41141339 bytes). Available download formats
    Dataset updated
    Aug 16, 2024
    Authors
    raka
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    I used Excel to analyze bank loan data, focusing on regional borrowing trends. The project included cleaning and organizing data, creating visualizations, and identifying areas with high loan activity. The findings helped in understanding loan patterns and making better decisions.

  6. 1.15 Insurance Services Organization (summary) | gimi9.com

    • gimi9.com
    Cite
    1.15 Insurance Services Organization (summary) | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_1-15-insurance-services-organization-summary-b621c
    Explore at:
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Municipalities with better rankings are lower risk, and their residents' insurance rates can reflect that. The prospect of minimizing catastrophe-related damage and ultimately lowering insurance costs gives communities an incentive to enforce their building codes rigorously.

    This page provides data for the Insurance Services Organization (ISO) performance measure. The data includes residential and commercial building code enforcement ratings for the City of Tempe. The performance measure dashboard is available at 1.15 Insurance Services Organization (ISO) Rating.

    Additional Information
    Source: Insurance Service Organization Rating
    Contact: Chris Thompson
    Contact E-Mail: Christopher_Thompson@tempe.gov
    Data Source Type: Excel
    Preparation Method: Information added to Excel spreadsheet from rating report
    Publish Frequency: Every 5 Years

  7. Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Oct 22, 2025
    Cite
    U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-vector-analysis-and-summary-stati
    Explore at:
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and supporting user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee-owned lands loaded before overlapping management designations and easements).

    The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip"), an associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ), was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (an associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ).

    Note, the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed more to improving the completeness and accuracy of the spatial data than to actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
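
    The overlap-removal logic described above (higher-priority designations "win" over lower-priority ones) can be approximated with open-source tools. The sketch below is a simplified geopandas illustration under assumed file and field names (GAP_Sts and Access follow PAD-US naming conventions but are not verified here); it is not the USGS script contained in "PADUS3_0_CreateVectorAnalysisFileScript.zip".

```python
# Simplified, hedged illustration of priority-based overlap removal (not the USGS script).
# File path and attribute names are assumptions.
import geopandas as gpd

padus = gpd.read_file("PADUS3_0_Combined.gpkg")  # hypothetical export of the combined layer

access_rank = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}
padus["gap_rank"] = padus["GAP_Sts"].astype(int)        # GAP Status Code 1 outranks 2, etc.
padus["access_rank"] = padus["Access"].map(access_rank)
# Load order is preserved by row order, so a stable sort keeps it as the tie-breaker.
padus = padus.sort_values(["gap_rank", "access_rank"], kind="stable").reset_index(drop=True)

flattened, covered = [], None
for _, row in padus.iterrows():
    geom = row.geometry if covered is None else row.geometry.difference(covered)
    if geom.is_empty:
        continue                       # fully covered by higher-priority records
    rec = row.to_dict()
    rec["geometry"] = geom             # keep only the non-overlapping remainder
    flattened.append(rec)
    covered = geom if covered is None else covered.union(geom)

vector_analysis = gpd.GeoDataFrame(flattened, crs=padus.crs)
vector_analysis.to_file("vector_analysis_flattened.gpkg", driver="GPKG")
```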

  8. Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Jun 26, 2017
    Cite
    Kurdistan Regional Statistics Office (KRSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://datacatalog.ihsn.org/catalog/6937
    Explore at:
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Economic Research Forum
    Kurdistan Regional Statistics Office (KRSO)
    Central Statistical Organization (CSO)
    Time period covered
    2012 - 2013
    Area covered
    Iraq
    Description

    Abstract

    The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

    ----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

    Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). Implementing the cooperation between CSO and the World Bank, the Central Statistical Organization (CSO) and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    The survey has six main objectives. These objectives are:

    1. Provide data for poverty analysis and measurement, and to monitor, evaluate and update the implementation of the Poverty Reduction National Strategy issued in 2009.
    2. Provide a comprehensive data system to assess household social and economic conditions and prepare indicators related to human development.
    3. Provide data that meet the needs and requirements of national accounts.
    4. Provide detailed indicators on consumption expenditure that serve decision-making related to production, consumption, export and import.
    5. Provide detailed indicators on the sources of household and individual income.
    6. Provide the data necessary for formulation of a new consumer price index number.

    The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.

    Geographic coverage

    National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    ----> Design:

    The sample size was 25,488 households for the whole of Iraq: 216 households in each of 118 districts, organized into 2,832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.

    ----> Sample frame:

    Listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all the governorates, including Kurdistan Region, as a frame to select households. The sample was selected in two stages. Stage 1: primary sampling units (blocks) within each stratum (district), for urban and rural, were systematically selected with probability proportional to size, to reach 2,832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to create a cluster, so the total sample was 25,488 households distributed across the governorates, 216 households in each district.

    ----> Sampling Stages:

    In each district, the sample was selected in two stages. Stage 1: based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, respecting the implicit urban/rural breakdown and the geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: using households as secondary sampling units, 9 households were selected from each sample point using systematic equal-probability sampling. The sampling frames for each stage could be developed from the 2010 building listing and numbering without updating household lists. In some small districts the random selection of primary sampling units may yield fewer than 24 units; in such cases a sampling unit is selected more than once, and two or more clusters may be drawn from the same enumeration unit when necessary.
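
    The two-stage design described above (systematic PPS selection of sample points, then systematic equal-probability selection of 9 households per point) can be sketched as follows. The frame below is a toy example, not the actual 2010 listing and numbering frame.

```python
# Hedged sketch of the two-stage selection: systematic PPS for blocks, then
# systematic equal-probability selection of 9 households per selected block.
import numpy as np

rng = np.random.default_rng(0)

def systematic_pps(sizes, n_select):
    """Systematic probability-proportional-to-size selection; returns block indices."""
    cum = np.cumsum(sizes)
    interval = cum[-1] / n_select
    start = rng.uniform(0, interval)
    points = start + interval * np.arange(n_select)
    return np.searchsorted(cum, points)

block_sizes = rng.integers(50, 400, size=120)          # households per block (toy frame)
selected_blocks = systematic_pps(block_sizes, 24)      # 24 sample points per stratum

clusters = []
for b in selected_blocks:
    step = block_sizes[b] / 9                          # stage 2: 9 households per block
    start = rng.uniform(0, step)
    households = (start + step * np.arange(9)).astype(int)
    clusters.append((int(b), households.tolist()))
```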

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    ----> Preparation:

    The questionnaire of the 2006 survey was adopted in designing the questionnaire of the 2012 survey, on which many revisions were made. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during implementation, to produce the final version used in the actual survey.

    ----> Questionnaire Parts:

    The questionnaire consists of four parts each with several sections: Part 1: Socio – Economic Data: - Section 1: Household Roster - Section 2: Emigration - Section 3: Food Rations - Section 4: housing - Section 5: education - Section 6: health - Section 7: Physical measurements - Section 8: job seeking and previous job

    Part 2: Monthly, Quarterly and Annual Expenditures: - Section 9: Expenditures on Non – Food Commodities and Services (past 30 days). - Section 10 : Expenditures on Non – Food Commodities and Services (past 90 days). - Section 11: Expenditures on Non – Food Commodities and Services (past 12 months). - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days). - Section 12, Table 1: Meals Had Within the Residential Unit. - Section 12, table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.

    Part 3: Income and Other Data: - Section 13: Job - Section 14: paid jobs - Section 15: Agriculture, forestry and fishing - Section 16: Household non – agricultural projects - Section 17: Income from ownership and transfers - Section 18: Durable goods - Section 19: Loans, advances and subsidies - Section 20: Shocks and strategy of dealing in the households - Section 21: Time use - Section 22: Justice - Section 23: Satisfaction in life - Section 24: Food consumption during past 7 days

    Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), over 7 days. Two pages were allocated for recording the expenditures of each day, so the diary consists of 14 pages.

    Cleaning operations

    ----> Raw Data:

    Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:

    1. Interviewer: checks all answers on the household questionnaire, confirming that they are clear and correct.
    2. Local Supervisor: checks to make sure that the questions have been correctly completed.
    3. Statistical analysis: after exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.
    4. World Bank consultants, in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and Stata to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameters for each variable.

    ----> Harmonized Data:

    • The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.
    • The harmonization process starts with raw data files received from the Statistical Office.
    • A program is generated for each dataset to create harmonized variables.
    • Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.
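
    The final step above (saving harmonized files in SPSS and converting them to Stata for dissemination) can be reproduced with open-source tooling. The sketch below uses pyreadstat, which is an assumption on our part rather than the ERF team's documented toolchain, and a hypothetical file name.

```python
# Hedged sketch of an SPSS-to-Stata conversion; pyreadstat and the file name are assumptions.
import pyreadstat

# Read the harmonized household-level SPSS file (hypothetical name).
df, meta = pyreadstat.read_sav("ihses2012_household_harmonized.sav")

# Write the same data out as a Stata .dta file for dissemination, keeping variable labels.
pyreadstat.write_dta(df, "ihses2012_household_harmonized.dta",
                     column_labels=meta.column_names_to_labels)
```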

    Response rate

    The Iraq Household Socio Economic Survey (IHSES) reached a total of 25,488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).

  9. SPORTS_DATA_ANALYSIS_ON_EXCEL

    • kaggle.com
    zip
    Updated Dec 12, 2024
    Cite
    Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel
    Explore at:
    zip (1203633 bytes). Available download formats
    Dataset updated
    Dec 12, 2024
    Authors
    Nil kamal Saha
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PROJECT OBJECTIVE

    We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility of systematizing the membership roster and generating different reports as per business requirements.

    Questions (KPIs)

    TASK 1: STANDARDIZING THE DATASET

    • Populate the FULLNAME, consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME (Note: all UPPERCASE).
    • Get the COUNTRY NAME to which these sportsmen belong. Make use of the LOCATION sheet to get the required data.
    • Populate the LANGUAGE SPOKEN by the sportsmen. Make use of the LOCATION sheet to get the required data.
    • Generate the EMAIL ADDRESS for members who speak English in the prescribed format lastname.firstname@xyz.org (Note: all lowercase); for all other members the format should be lastname.firstname@xyz.com (Note: all lowercase).
    • Populate the SPORT LOCATION of the sport played by each player. Make use of the SPORT sheet to get the required data.

    TASK 2: DATA FORMATTING

    • Display MEMBER ID as a 3-digit number (Note: 001, 002, ..., 020, ... etc.).
    • Format the BIRTHDATE as dd mmm'yyyy (prescribed format example: 09 May'1986).
    • Display the units for the WEIGHT column (prescribed format example: 80 kg).
    • Format the SALARY to show the data in thousands. If SALARY is less than 100,000, display the data with 2 decimal places; otherwise display it with one decimal place. In both cases the units should be thousands (k), e.g. 87670 -> 87.67 k and 123250 -> 123.2 k.

    TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:
    • In COLUMNS, group by GENDER.
    • In ROWS, group by COUNTRY (Note: use COUNTRY NAMES).
    • In VALUES, calculate the count of candidates for each COUNTRY and GENDER type. Remove GRAND TOTALs.

    TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:

    • Starting from cell H4, get the distinct GENDER values. Use the Remove Duplicates option and transpose the data.
    • Starting from cell G5, get the distinct COUNTRY values (Note: use COUNTRY NAMES).
    • In the cross table, get the count of candidates for each COUNTRY and GENDER type.

    TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

    • Change the report layout to TABULAR form.
    • Remove expand and collapse buttons.
    • Remove GRAND TOTALs.
    • Allow user to filter the data by SPORT LOCATION.

    Process

    • Verified the data for missing values and anomalies, and sorted them out.
    • Made sure the data is consistent and clean with respect to data type, data format and values used.
    • Created pivot tables according to the questions asked. (See the pandas sketch below for an analogue of Tasks 1 and 2.)
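
    The sketch below is a pandas analogue of Tasks 1 and 2 (the original exercise is done entirely in Excel). Sheet names and column names are assumptions about the workbook's layout.

```python
# Hedged pandas analogue of Tasks 1-2; the workbook layout and column names are assumed.
import pandas as pd

sportsmen = pd.read_excel("sports_data.xlsx", sheet_name="SPORTSMEN")  # hypothetical
location = pd.read_excel("sports_data.xlsx", sheet_name="LOCATION")    # hypothetical

# Task 1: FULLNAME in upper case, country/language lookup, e-mail generation.
sportsmen["FULLNAME"] = (sportsmen["PREFIX"] + " " + sportsmen["FIRSTNAME"]
                         + " " + sportsmen["LASTNAME"]).str.upper()
sportsmen = sportsmen.merge(location[["COUNTRY_CODE", "COUNTRY NAME", "LANGUAGE"]],
                            on="COUNTRY_CODE", how="left")
domain = sportsmen["LANGUAGE"].eq("English").map({True: "xyz.org", False: "xyz.com"})
sportsmen["EMAIL ADDRESS"] = (sportsmen["LASTNAME"] + "."
                              + sportsmen["FIRSTNAME"]).str.lower() + "@" + domain

# Task 2: MEMBER ID as 3 digits, WEIGHT with units, SALARY in thousands (k).
sportsmen["MEMBER ID"] = sportsmen["MEMBER ID"].astype(int).map("{:03d}".format)
sportsmen["WEIGHT"] = sportsmen["WEIGHT"].astype(str) + " kg"
sportsmen["SALARY"] = sportsmen["SALARY"].map(
    lambda s: f"{s / 1000:.2f} k" if s < 100_000 else f"{s / 1000:.1f} k")
```
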
  10. PC-Urban Outdoordataset for 3D Point Cloud semantic segmentation

    • researchdata.edu.au
    Updated 2021
    Cite
    Ajmal Mian; Micheal Wise; Naveed Akhtar; Muhammad Ibrahim; Computer Science and Software Engineering (2021). PC-Urban Outdoordataset for 3D Point Cloud semantic segmentation [Dataset]. http://doi.org/10.21227/FVQD-K603
    Explore at:
    Dataset updated
    2021
    Dataset provided by
    The University of Western Australia
    IEEE DataPort
    Authors
    Ajmal Mian; Micheal Wise; Naveed Akhtar; Muhammad Ibrahim; Computer Science and Software Engineering
    Description

    The proposed dataset, termed PC-Urban (Urban Point Cloud), is captured with an Ouster LiDAR sensor with 64 channels. The sensor is installed on an SUV that drives through the downtown of Perth, Western Australia (WA), Australia. The dataset comprises over 4.3 billion points captured for 66K sensor frames. The labelled data is organized as registered and raw point cloud frames, where the former has a different number of registered consecutive frames. We provide 25 class labels in the dataset covering 23 million points and 5K instances. Labelling is performed with PC-Annotate and can easily be extended by the end-users employing the same tool.

    The data is organized into unlabelled and labelled 3D point clouds. The unlabelled data is provided in .PCAP file format, which is the direct output format of the used Ouster LiDAR sensor. Raw frames are extracted from the recorded .PCAP files in the form of Ply and Excel files using the Ouster Studio software. Labelled 3D point cloud data consists of registered or raw point clouds. A labelled point cloud is a combination of Ply, Excel, Labels and Summary files. A point cloud in a Ply file contains X, Y, Z values along with color information. An Excel file contains the X, Y, Z values, Intensity, Reflectivity, Ring, Noise, and Range of each point. These attributes can be useful in semantic segmentation using deep learning algorithms. The Label and Label Summary files have been explained in the previous section.

    Our one GB of raw data contains nearly 1,300 raw frames, whereas 66,425 frames are provided in the dataset, each comprising 65,536 points. Hence, 4.3 billion points captured with the Ouster LiDAR sensor are provided. Annotation of 25 general outdoor classes is provided, which include car, building, bridge, tree, road, letterbox, traffic signal, light-pole, rubbish bin, cycles, motorcycle, truck, bus, bushes, road sign board, advertising board, road divider, road lane, pedestrians, side-path, wall, bus stop, water, zebra-crossing, and background. With the released data, a total of 143 scenes are annotated, which include both raw and registered frames.
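
    A minimal sketch of loading one labelled frame, assuming the Ply/Excel pairing described above. The file names are placeholders, and the use of open3d is a suggestion rather than part of the dataset's tooling.

```python
# Hedged sketch: read one frame's geometry from Ply and its per-point attributes from Excel.
import numpy as np
import open3d as o3d
import pandas as pd

cloud = o3d.io.read_point_cloud("frame_0001.ply")   # hypothetical frame: X, Y, Z + colour
xyz = np.asarray(cloud.points)
rgb = np.asarray(cloud.colors)

attrs = pd.read_excel("frame_0001.xlsx")            # Intensity, Reflectivity, Ring, Noise, Range
features = attrs[["Intensity", "Reflectivity", "Ring", "Noise", "Range"]].to_numpy()

# Stack geometry, colour and LiDAR attributes into one per-point feature matrix,
# as could feed a semantic segmentation pipeline.
points = np.hstack([xyz, rgb, features])
print(points.shape)
```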

  11. University of Cape Town Student Admissions Data 2006-2014 - South Africa

    • datafirst.uct.ac.za
    Updated Jul 28, 2020
    Cite
    UCT Student Administration (2020). University of Cape Town Student Admissions Data 2006-2014 - South Africa [Dataset]. http://www.datafirst.uct.ac.za/Dataportal/index.php/catalog/556
    Explore at:
    Dataset updated
    Jul 28, 2020
    Dataset authored and provided by
    UCT Student Administration
    Time period covered
    2006 - 2014
    Area covered
    South Africa
    Description

    Abstract

    This dataset was generated from a set of Excel spreadsheets from an Information and Communication Technology Services (ICTS) administrative database on student applications to the University of Cape Town (UCT). The database contains information on applications to UCT between January 2006 and December 2014. In the original form received by DataFirst the data were ill-suited to research purposes. This dataset represents an attempt at cleaning and organizing these data into a more tractable format. To ensure data confidentiality, direct identifiers have been removed from the data and the data is only made available to accredited researchers through DataFirst's Secure Data Service.

    The dataset was separated into the following data files:

    1. Application level information: the "finest" unit of analysis. Individuals may have multiple applications. Uniquely identified by an application ID variable. There are a total of 1,714,669 applications on record.
    2. Individual level information: individuals may have multiple applications. Each individual is uniquely identified by an individual ID variable. Each individual is associated with information on "key subjects" from a separate data file also contained in the database. These key subjects are all separate variables in the individual level data file. There are a total of 285,005 individuals on record.
    3. Secondary Education Information: individuals can also be associated with row entries for each subject. This data file does not have a unique identifier. Instead, each row entry represents a specific secondary school subject for a specific individual. These subjects are quite specific and the data allows the user to distinguish between, for example, higher grade accounting and standard grade accounting. It also allows the user to identify the educational authority issuing the qualification e.g. Cambridge Internal Examinations (CIE) versus National Senior Certificate (NSC).
    4. Tertiary Education Information: the smallest of the four data files. There are multiple entries for each individual in this dataset. Each row entry contains information on the year, institution and transcript information and can be associated with individuals.

    Analysis unit

    Applications, individuals

    Kind of data

    Administrative records [adm]

    Mode of data collection

    Other [oth]

    Cleaning operations

    The data files were made available to DataFirst as a group of Excel spreadsheet documents from an SQL database managed by the University of Cape Town's Information and Communication Technology Services. The process of combining these original data files to create a research-ready dataset is summarised in a document entitled "Notes on preparing the UCT Student Application Data 2006-2014" accompanying the data.
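
    Since individuals may have multiple applications, analysis typically starts by joining the application-level and individual-level files on their ID variables. The sketch below shows this join in pandas under assumed file and column names (the released files are only accessible through DataFirst's Secure Data Service).

```python
# Hedged sketch of linking the application-level and individual-level files; names are assumed.
import pandas as pd

applications = pd.read_csv("applications.csv")  # one row per application, with application and individual IDs
individuals = pd.read_csv("individuals.csv")    # one row per individual, includes key-subject variables

# Attach individual attributes to every application record (many applications per individual).
merged = applications.merge(individuals, on="individual_id", how="left",
                            validate="many_to_one")

print(len(applications), "applications,", individuals["individual_id"].nunique(), "individuals")
```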

  12. Utah FORGE: Well 16A(78)-32 Perforation Images and Raw Data

    • gdr.openei.org
    • data.openei.org
    • +1 more
    archive
    Updated Nov 5, 2024
    Cite
    Souvivk Saha; Souvivk Saha (2024). Utah FORGE: Well 16A(78)-32 Perforation Images and Raw Data [Dataset]. http://doi.org/10.15121/2478485
    Explore at:
    archive. Available download formats
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    Office of Energy Efficiency and Renewable Energy (http://energy.gov/eere)
    Geothermal Data Repository
    Energy and Geoscience Institute at the University of Utah
    Authors
    Souvivk Saha; Souvivk Saha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This archive contains raw data of visual and acoustic mapping of perforations in Utah FORGE well 16A(78)-32 acquired during the August 2024 circulation program. The dataset includes downhole images captured by EV, a downhole visual analytics company, providing visual records of each perforation. Images are organized in two folders: one set with perforation visualization overlays and one without. An included Excel spreadsheet provides the organized raw data.

  13. GAPs Data Repository on Return: Guideline, Data Samples and Codebook

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Feb 13, 2025
    Cite
    Sahin Mencutek, Zeynep; Yılmaz-Elmas, Fatma (2025). GAPs Data Repository on Return: Guideline, Data Samples and Codebook [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10790794
    Explore at:
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Bonn International Center for Conflict Studies
    Istanbul Ozyegin University
    Authors
    Sahin Mencutek, Zeynep; Yılmaz-Elmas, Fatma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The GAPs Data Repository provides a comprehensive overview of available qualitative and quantitative data on national return regimes, now accessible through an advanced web interface at https://data.returnmigration.eu/.

    This updated guideline outlines the complete process, starting from the initial data collection for the return migration data repository to the development of a comprehensive web-based platform. Through iterative development, participatory approaches, and rigorous quality checks, we have ensured a systematic representation of return migration data at both national and comparative levels.

    The Repository organizes data into five main categories, covering diverse aspects and offering a holistic view of return regimes: country profiles, legislation, infrastructure, international cooperation, and descriptive statistics. These categories, further divided into subcategories, are based on insights from a literature review, existing datasets, and empirical data collection from 14 countries. The selection of categories prioritizes relevance for understanding return and readmission policies and practices, data accessibility, reliability, clarity, and comparability. Raw data is meticulously collected by the national experts.

    The transition to a web-based interface builds upon the Repository's original structure, which was initially developed using REDCap (Research Electronic Data Capture), a secure web application for building and managing online surveys and databases. REDCap ensures systematic data entry and stores the data on Uppsala University's servers, while significantly improving accessibility and usability as well as data security. It also enables users to export any or all data from the Project when granted full data export privileges. Data can be exported in various ways and formats, including Microsoft Excel, SAS, Stata, R, or SPSS for analysis. At this stage, the Data Repository design team also converted tailored records of available data into public reports accessible to anyone with a unique URL, without the need to log in to REDCap or obtain permission to access the GAPs Project Data Repository. Public reports can be used to share information with stakeholders or external partners without granting them access to the Project or requiring them to set up a personal account. Currently, all public report links inserted in this report are also available on the Repository's webpage, allowing users to export original data.

    This report also includes a detailed codebook to help users understand the structure, variables, and methodologies used in data collection and organization. This addition ensures transparency and provides a comprehensive framework for researchers and practitioners to effectively interpret the data.

    The GAPs Data Repository is committed to providing accessible, well-organized, and reliable data by moving to a centralized web platform and incorporating advanced visuals. This Repository aims to contribute inputs for research, policy analysis, and evidence-based decision-making in the return and readmission field.

    Explore the GAPs Data Repository at https://data.returnmigration.eu/.

  14. Data from: Thermally controlled intein splicing of engineered DNA polymerases provides a robust and generalizable solution for accurate and sensitive molecular diagnostics

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jun 13, 2023
    Cite
    You Wang; Yuqian Shi; Homme Hellinga; Lorena Beese (2023). Thermally controlled intein splicing of engineered DNA polymerases provides a robust and generalizable solution for accurate and sensitive molecular diagnostics [Dataset]. http://doi.org/10.5061/dryad.5qfttdz9s
    Explore at:
    zip. Available download formats
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Dryad
    Authors
    You Wang; Yuqian Shi; Homme Hellinga; Lorena Beese
    Time period covered
    May 31, 2023
    Description

    The raw data were organized in one Microsoft Excel file, which can be opened using Microsoft Excel. The raw images were organized in one Microsoft Word file, which can be opened using Microsoft Word.

  15. Niagara Open Data

    • catalog.civicdataecosystem.org
    Cite
    Niagara Open Data [Dataset]. https://catalog.civicdataecosystem.org/dataset/niagara-open-data
    Explore at:
    Description

    The Ontario government generates and maintains thousands of datasets. Since 2012, we have shared data with Ontarians via a data catalogue. Open data is data that is shared with the public. Click here to learn more about open data and why Ontario releases it.

    Ontario's Open Data Directive states that all data must be open, unless there is good reason for it to remain confidential. Ontario's Chief Digital and Data Officer also has the authority to make certain datasets available publicly. Datasets listed in the catalogue that are not open will have one of the following labels:

    If you want to use data you find in the catalogue, that data must have a licence – a set of rules that describes how you can use it. A licence: Most of the data available in the catalogue is released under Ontario's Open Government Licence. However, each dataset may be shared with the public under other kinds of licences or no licence at all. If a dataset doesn't have a licence, you don't have the right to use the data. If you have questions about how you can use a specific dataset, please contact us.

    The Ontario Data Catalogue endeavors to publish open data in a machine readable format. For machine readable datasets, you can simply retrieve the file you need using the file URL. The Ontario Data Catalogue is built on CKAN, which means the catalogue has the following features you can use when building applications. APIs (Application programming interfaces) let software applications communicate directly with each other. If you are using the catalogue in a software application, you might want to extract data from the catalogue through the catalogue API. Note: All Datastore API requests to the Ontario Data Catalogue must be made server-side. The catalogue's collection of dataset metadata (and dataset files) is searchable through the CKAN API. The Ontario Data Catalogue has more than just CKAN's documented search fields. You can also search these custom fields. You can also use the CKAN API to retrieve metadata about a particular dataset and check for updated files. Read the complete documentation for CKAN's API. Some of the open data in the Ontario Data Catalogue is available through the Datastore API. You can also search and access the machine-readable open data that is available in the catalogue. How to use the API feature: Read the complete documentation for CKAN's Datastore API.

    The Ontario Data Catalogue contains a record for each dataset that the Government of Ontario possesses. Some of these datasets will be available to you as open data. Others will not be available to you. This is because the Government of Ontario is unable to share data that would break the law or put someone's safety at risk.

    You can search for a dataset with a word that might describe a dataset or topic. Use words like "taxes" or "hospital locations" to discover what datasets the catalogue contains. You can search for a dataset from 3 spots on the catalogue: the homepage, the dataset search page, or the menu bar available across the catalogue. On the dataset search page, you can also filter your search results. You can select filters on the left hand side of the page to limit your search for datasets with your favourite file format, datasets that are updated weekly, datasets released by a particular organization, or datasets that are released under a specific licence. Go to the dataset search page to see the filters that are available to make your search easier. You can also do a quick search by selecting one of the catalogue's categories on the homepage. These categories can help you see the types of data we have on key topic areas.

    When you find the dataset you are looking for, click on it to go to the dataset record. Each dataset record will tell you whether the data is available, and, if so, tell you about the data available. An open dataset might contain several data files. These files might represent different periods of time, different sub-sets of the dataset, different regions, language translations, or other breakdowns. You can select a file and either download it or preview it. Make sure to read the licence agreement to make sure you have permission to use it the way you want. Read more about previewing data. A non-open dataset may not be available for many reasons. Read more about non-open data. Read more about restricted data. Data that is non-open may still be subject to freedom of information requests.

    The catalogue has tools that enable all users to visualize the data in the catalogue without leaving the catalogue – no additional software needed. Have a look at our walk-through of how to make a chart in the catalogue. Get automatic notifications when datasets are updated. You can choose to get notifications for individual datasets, an organization's datasets or the full catalogue. You don't have to provide any personal information – just subscribe to our feeds using any feed reader you like, using the corresponding notification web addresses. Copy those addresses and paste them into your reader. Your feed reader will let you know when the catalogue has been updated.

    The catalogue provides open data in several file formats (e.g., spreadsheets, geospatial data, etc). Learn about each format and how you can access and use the data each file contains.

    A file that has a list of items and values separated by commas without formatting (e.g. colours, italics, etc.) or extra visual features. This format provides just the data that you would display in a table. XLSX (Excel) files may be converted to CSV so they can be opened in a text editor. How to access the data: Open with any spreadsheet software application (e.g., Open Office Calc, Microsoft Excel) or text editor. Note: This format is considered machine-readable; it can be easily processed and used by a computer. Files that have visual formatting (e.g. bolded headers and colour-coded rows) can be hard for machines to understand; these elements make a file more human-readable and less machine-readable.

    A file that provides information without formatted text or extra visual features that may not follow a pattern of separated values like a CSV. How to access the data: Open with any word processor or text editor available on your device (e.g., Microsoft Word, Notepad).

    A spreadsheet file that may also include charts, graphs, and formatting. How to access the data: Open with a spreadsheet software application that supports this format (e.g., Open Office Calc, Microsoft Excel). Data can be converted to a CSV for a non-proprietary format of the same data without formatted text or extra visual features.

    A shapefile provides geographic information that can be used to create a map or perform geospatial analysis based on location, points/lines and other data about the shape and features of the area. It includes required files (.shp, .shx, .dbt) and might include corresponding files (e.g., .prj). How to access the data: Open with a geographic information system (GIS) software program (e.g., QGIS).

    A package of files and folders. The package can contain any number of different file types. How to access the data: Open with an unzipping software application (e.g., WinZIP, 7Zip). Note: If a ZIP file contains .shp, .shx, and .dbt file types, it is an ArcGIS ZIP: a package of shapefiles which provide information to create maps or perform geospatial analysis that can be opened with ArcGIS (a geographic information system software program).

    A file that provides information related to a geographic area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open using a GIS software application to create a map or do geospatial analysis. It can also be opened with a text editor to view raw information. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand.

    A text-based format for sharing data in a machine-readable way that can store data with more unconventional structures such as complex lists. How to access the data: Open with any text editor (e.g., Notepad) or access through a browser. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand.

    A text-based format to store and organize data in a machine-readable way that can store data with more unconventional structures (not just data organized in tables). How to access the data: Open with any text editor (e.g., Notepad). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand.

    A file that provides information related to an area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open with a geospatial software application that supports the KML format (e.g., Google Earth). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand.

    This format contains files with data from tables used for statistical analysis and data visualization of Statistics Canada census data. How to access the data: Open with the Beyond 20/20 application.

    A database which links and combines data from different files or applications (including HTML, XML, Excel, etc.). The database file can be converted to a CSV/TXT to make the data machine-readable, but human-readable formatting will be lost. How to access the data: Open with Microsoft Office Access (a database management system used to develop application software).

    A file that keeps the original layout and
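
    The description above mentions that the catalogue is built on CKAN and exposes the CKAN and Datastore APIs. The sketch below shows a generic CKAN package_search / package_show call with Python requests; the base URL and dataset id are assumptions and should be replaced with the catalogue instance you are actually querying.

```python
# Hedged sketch of querying a CKAN catalogue API; base URL and dataset id are assumptions.
import requests

BASE = "https://data.ontario.ca"  # assumed CKAN instance; substitute your target catalogue

# package_search: full-text search over dataset metadata.
resp = requests.get(f"{BASE}/api/3/action/package_search",
                    params={"q": "hospital locations", "rows": 5})
resp.raise_for_status()
for ds in resp.json()["result"]["results"]:
    print(ds["title"])

# package_show: metadata (including resource file URLs) for a single dataset.
meta = requests.get(f"{BASE}/api/3/action/package_show",
                    params={"id": "some-dataset-name"}).json()  # hypothetical id
```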

  16. Table S1: Raw data and exact values of statistical tests. Excel file with...

    • rs.figshare.com
    xlsx
    Updated Feb 14, 2024
    Cite
    Naoko Toshima; Michael Schleyer (2024). Table S1: Raw data and exact values of statistical tests. Excel file with all underlying data for all experiments of this study. The data of each subfigure are displayed in a separate sheet. Data within each subfigure are organized according to experimental condition (genotype, and where applicable, testing condition). The exact results of all statistical tests are displayed below the data of the respective subfigure. [Dataset]. http://doi.org/10.6084/m9.figshare.25199496.v1
    Explore at:
    xlsx. Available download formats
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    The Royal Society
    Authors
    Naoko Toshima; Michael Schleyer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Learning where to find nutrients while at the same time avoiding toxic food is essential for survival of any animal. Using Drosophila melanogaster larvae as a study case, we investigate the role of gustatory sensory neurons expressing IR76b for associative learning of amino acids, the building blocks of proteins. We found surprising complexity in the neuronal underpinnings of sensing amino acids, and a functional division of sensory neurons. We found that the IR76b receptor is dispensable for amino acid learning, whereas the neurons expressing IR76b are specifically required for the rewarding but not the punishing effect of amino acids. This unexpected dissociation in neuronal processing of amino acids for different behavioural functions provides a study case for functional divisions of labour in gustatory systems.

  17. Digital India Transactions 2023 Dataset

    • kaggle.com
    zip
    Updated Jul 10, 2024
    Cite
    Farha Kousar (2024). Digital India Transactions 2023 Dataset [Dataset]. https://www.kaggle.com/datasets/farhakouser/bayesian-network-dataset-in-excel
    Explore at:
    zip (3665 bytes). Available download formats
    Dataset updated
    Jul 10, 2024
    Authors
    Farha Kousar
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains a well-structured collection of data suitable for building and analyzing Bayesian Networks. The data is organized in an Excel file format, making it easy to manipulate, visualize, and use with various statistical software and programming languages.

    Node Information: each column represents a variable (node) in the Bayesian Network.
    Edge Information: the relationships (edges) between the variables are implicitly defined by the data.
    Observations: each row in the dataset corresponds to an observation or data point, providing the values for each variable.

    Potential Applications:

    • Predictive Modeling: use the dataset to build predictive models that can estimate the probability of certain outcomes based on observed data.
    • Decision Support: develop decision support systems that can suggest optimal actions based on probabilistic reasoning.
    • Educational Purposes: ideal for students and educators to understand and demonstrate the principles of Bayesian Networks.

    Example domains:

    • Healthcare: predict the likelihood of a patient developing a certain condition based on their medical history and symptoms.
    • Finance: model the probability of credit default based on financial indicators and borrower characteristics.
    • Marketing: determine the likelihood of a customer purchasing a product based on their browsing and purchasing history.

    Acknowledgements: this dataset was compiled and organized to support the research and application of Bayesian Networks. We encourage users to explore, analyze, and share their findings.
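
    Because each column is a node and each row an observation, the file can be fed directly to a Bayesian-network library. The sketch below uses pgmpy as one possible choice (not part of the dataset) with a hypothetical file name; API details may vary between pgmpy versions.

```python
# Hedged sketch of learning a Bayesian Network from the Excel data with pgmpy (assumed tooling).
import pandas as pd
from pgmpy.estimators import BayesianEstimator, BicScore, HillClimbSearch
from pgmpy.models import BayesianNetwork

df = pd.read_excel("bayesian_network_dataset.xlsx")  # hypothetical file; each column = one node

# Learn a structure (edges) from the observations, then fit conditional probability tables.
dag = HillClimbSearch(df).estimate(scoring_method=BicScore(df))
model = BayesianNetwork(dag.edges())
model.fit(df, estimator=BayesianEstimator, prior_type="BDeu")
print(model.edges())
```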

  18. Excel spreadsheet containing raw data, organized by figure.

    • plos.figshare.com
    • figshare.com
    xlsx
    Updated Jun 21, 2023
    Cite
    Joel M. Serre; Mark M. Slabodnick; Bob Goldstein; Jeff Hardin (2023). Excel spreadsheet containing raw data, organized by figure. [Dataset]. http://doi.org/10.1371/journal.pgen.1010507.s008
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS Genetics
    Authors
    Joel M. Serre; Mark M. Slabodnick; Bob Goldstein; Jeff Hardin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel spreadsheet containing raw data, organized by figure.

  19. SPROCKET CENTRAL PTY LTD BIKE DATASET

    • kaggle.com
    zip
    Updated Apr 5, 2025
    Cite
    Chikka Masala (2025). SPROCKET CENTRAL PTY LTD BIKE DATASET [Dataset]. https://www.kaggle.com/datasets/chikkamasala/sprocket-central-pty-ltd-bike-dataset
    Explore at:
    Available download formats: zip (1941323 bytes)
    Dataset updated
    Apr 5, 2025
    Authors
    Chikka Masala
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Background Information

    Sprocket Central Pty Ltd (a fictional company) is a medium-sized bikes and cycling accessories organization based in Australia.

    Primarily, Sprocket Central Pty Ltd needs help with its customer and transaction data. The organization has a large dataset relating to its customers, but its team is unsure how to analyze it effectively to help optimize its marketing strategy.

    The client provided KPMG with 2 datasets:

    • Customer Demographic: customer_id, first_name, last_name, gender, past_3_years_bike_related_purchases, DOB, job_title, job_industry_category, wealth_segment, deceased_indicator, default, owns_car, tenure

    • Transactions data in 2017: transaction_id, product_id, customer_id, transaction_date, online_order, order_status, brand, product_line, product_class, product_size, list_price, standard_cost, product_first_sold_date

    This is a project from my KPMG data analytics virtual internship program, where I worked as a Data Analyst and performed three tasks:

    1. Data Quality Assessment: Assessment of data quality and completeness in preparation for analysis.
    2. Data Insights: Targeting high-value customers based on customer demographics and attributes.
    3. Data Insights and Presentation: Using visualizations to present insights.

    Data Quality Assessment

    During this task, an assessment of data quality and completeness was carried out on the provided datasets, and recommendations were made in preparation for analysis using an Excel spreadsheet.

    Some of the data issues (a sketch of how such checks might be run in pandas follows this list):

    • Quality issues

    • Transaction dataset

    The online order, brand, product line, product class, product size, list price, standard cost, and product first sold date fields contain blank cells; the product first sold date field has an inconsistent data type; and the customer ID field contains duplicated values.

    • Customer Demographic dataset

    The last name, DOB, job title, and tenure fields contain blank cells; the gender field contains inconsistent values (U, F, M, Femal); and the DOB field contains incorrect data.
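
    A minimal sketch of how such checks might be run in pandas, where the filenames and exact column names are assumptions based on the fields listed above:

    ```python
    import pandas as pd

    transactions = pd.read_csv("transactions.csv")        # hypothetical export of the 2017 transactions
    customers = pd.read_csv("customer_demographic.csv")   # hypothetical export of the demographics table

    # Blank cells: count missing values per field.
    print(transactions[["online_order", "brand", "product_line", "product_class",
                        "product_size", "list_price", "standard_cost",
                        "product_first_sold_date"]].isna().sum())
    print(customers[["last_name", "DOB", "job_title", "tenure"]].isna().sum())

    # Duplicated customer IDs.
    print(transactions["customer_id"].duplicated().sum())

    # Inconsistent gender coding (e.g. U, F, M, Femal).
    print(customers["gender"].value_counts(dropna=False))

    # Data type actually inferred for the product-first-sold-date field.
    print(transactions["product_first_sold_date"].dtype)
    ```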

    Data Insights

    During this task, the two existing datasets (Customer Demographic and Transactions) were used as labeled data to recommend which of the 1,000 customers in the new customer dataset should be targeted to drive the most value for the organization (a toy modelling sketch follows the list below), following three phases:

    1. Data Exploration
    2. Data Model Development
    3. Data Interpretation
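
    The report does not prescribe a particular model, but one way to realize the labeled-data idea is to derive a target from the 2017 transactions (for example, flagging the top quartile of purchase counts as high value), fit a classifier on shared demographic attributes, and score the new-customer list; a toy sketch with pandas and scikit-learn, in which the filenames, the threshold, and the feature set are all illustrative assumptions rather than the approach used in the internship:

    ```python
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    customers = pd.read_csv("customer_demographic.csv")   # hypothetical export
    transactions = pd.read_csv("transactions.csv")        # hypothetical export
    new_customers = pd.read_csv("new_customer_list.csv")  # hypothetical export

    # Illustrative label: a customer is "high value" if their 2017 purchase
    # count falls in the top quartile.
    counts = transactions.groupby("customer_id").size().reset_index(name="purchases")
    labelled = customers.merge(counts, on="customer_id", how="left").fillna({"purchases": 0})
    labelled["high_value"] = labelled["purchases"] >= labelled["purchases"].quantile(0.75)

    # Illustrative feature set assumed to be present in both customer tables.
    features = ["past_3_years_bike_related_purchases", "tenure"]
    X = labelled[features].fillna(0)
    y = labelled["high_value"]

    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # Rank the 1,000 new customers by predicted probability of being high value.
    scores = model.predict_proba(new_customers[features].fillna(0))[:, 1]
    print(new_customers.assign(score=scores).nlargest(10, "score"))
    ```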

    Data Insights and Presentations

    During this task, a dashboard was created in Excel to communicate key findings from the analysis and to give recommendations to the stakeholders of the organization.

  20. Source data for the Prober et al manuscript: Shifting the conservation...

    • researchdata.edu.au
    • data.csiro.au
    datadownload
    Updated Aug 31, 2018
    Cite
    Fiona Dickson; Kristen Williams; Linda Broadhurst; Veronica Doerr; Suzanne Prober (2018). Source data for the Prober et al manuscript: Shifting the conservation paradigm - a synthesis of options for renovating nature under climate change [Dataset]. http://doi.org/10.25919/5B8861AA54B8B
    Explore at:
    Available download formats: datadownload
    Dataset updated
    Aug 31, 2018
    Dataset provided by
    CSIRO, http://www.csiro.au/
    Authors
    Fiona Dickson; Kristen Williams; Linda Broadhurst; Veronica Doerr; Suzanne Prober
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1992 - Jan 1, 2016
    Description

    This data collection is the source data for the manuscript "Shifting the conservation paradigm - a synthesis of options for renovating nature under climate change" by Suzanne M. Prober, Veronica A. J. Doerr, Linda M. Broadhurst, Kristen J. Williams and Fiona Dickson, published in the peer-reviewed journal "Ecological Monographs" in 2018. The data are provided as an Excel spreadsheet with three sheets. The results are provided in "SourceData_Prober_etal_EcolMono", including the cited references (peer-reviewed journal articles). The full citation of each reference is given in Appendix S1 of the manuscript (and on tab 3).

    Lineage: The data are based on a literature survey of studies that proposed, implemented or tested on-ground options for facilitating persistence or adaptation of species or ecosystems under climate change (hereafter ‘options’). We searched Web of Science (to 31 December 2016) under the topics of ecology, environmental sciences, environmental studies, forestry and biodiversity conservation for publications that included at least one term from each of the following three groups: (1) ecolog*, ecosystem, biodiversity, forest, woodland, rangeland, grassland, shrubland, heathland, rainforest, wetland, mangrove, saltmarsh, shore, tidal, dune, river, stream, freshwater, riparian, desert, dryland, species, nature; (2) adapt*, interven*, restor*, engineer*, revegetation, conserv*; and (3) climate change, warming, CO2, aridif*, changing climate. Additional studies known to the authors were also included, noting we do not expect to have fully captured specialized literature on some topics (e.g. assisted colonization). Our search resulted in 16,753 publications, which reduced on screening of titles and abstracts to 640, then on full text to 473 papers reviewed and scored for this analysis (Appendix S1). Criteria for inclusion included (1) an emphasis on biodiversity or nature conservation (rather than livelihoods); (2) an interest in active on-ground management response rather than solely climate change impacts, policy or social aspects, but including selection of areas for conservation protection or action; and (3) a terrestrial focus (including land-based aquatic but not marine).

    We systematically scored each of the 473 papers for publication date, ecosystems or organisms studied, geographic regions of study and types of inference used (field observations or experiments, reports of implementation, ecological reasoning, conceptual frameworks, modelling and reviews). ‘Ecological reasoning’ included commentaries or reasoning based on impact assessments, stakeholder workshops and surveys, ecological theory and/or earlier literature. Modelling studies involved simulation of outcomes of applying adaptation interventions, and reviews focused on assessment of evidence from past literature for outcomes of interventions. An individual study could be scored more than once in any classification.

    Based on this literature we created a typology of intervention options, derived using an iterative approach. We began by randomly selecting publications from our literature search, extracting information on the interventions they discussed or tested, and trialing organization of the information using insights drawn from overview and synthesis literature. This dual bottom-up and top-down approach allowed us to establish a preliminary, tractable set of ‘intervention option types’ (hereafter option types), and to organize these into preliminary higher level groupings (see results). Whilst recognizing there would be many potential ways to classify the information, we aimed for a simple classification that captured some key concepts relevant to underpinning ecological mechanisms and to practical concerns. We continued to build the set of preliminary option types until adjustments and new categories became infrequent, leading to a preliminary typology after 70 publications. Beginning with this preliminary typology, a single operator then systematically scored intervention options proposed, tested or described in each of the 473 papers, including re-scoring of the initial 70 publications. Refinements were made to the typology where new ideas were encountered, with rescoring of earlier papers if refinements could have affected their scores (e.g. splitting or reconceptualisation of a preliminary category). In total, this process led to 1304 instances of option types scored from the 473 papers.
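
    The three-group search logic is simple to express as a screening filter; a minimal sketch in Python, where the term lists are abbreviated from the groups quoted above and the helper is illustrative rather than code used in the study:

    ```python
    # A record qualifies only if its title/abstract contains at least one term
    # from each group (terms abbreviated from the lists quoted above;
    # wildcard stems treated as simple substrings).
    GROUPS = [
        ["ecolog", "ecosystem", "biodiversity", "forest", "wetland", "species", "nature"],
        ["adapt", "interven", "restor", "engineer", "revegetation", "conserv"],
        ["climate change", "warming", "co2", "aridif", "changing climate"],
    ]

    def matches_search(text: str) -> bool:
        """Return True if the text contains a term from every group."""
        text = text.lower()
        return all(any(term in text for term in group) for group in GROUPS)

    # Example: this abstract hits all three groups, so it would be retained.
    print(matches_search("Restoring woodland ecosystems to aid species adaptation under climate change"))
    ```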
