96 datasets found
  1. Massive Bank dataset ( 1 Million+ rows)

    • kaggle.com
    zip
    Updated Feb 21, 2023
    Cite
    K S ABISHEK (2023). Massive Bank dataset ( 1 Million+ rows) [Dataset]. https://www.kaggle.com/datasets/ksabishek/massive-bank-dataset-1-million-rows
    Explore at:
    zip(32471013 bytes)Available download formats
    Dataset updated
    Feb 21, 2023
    Authors
    K S ABISHEK
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Greetings, fellow analysts!

    (NOTE: This is a random dataset generated using Python. It bears no resemblance to any real entity in the corporate world; any resemblance is purely coincidental.)

    REC-SSEC Bank is a government-aided bank operating in the Indian Peninsula, with regional branches in over 40 regions of the country. You have been provided with a massive Excel sheet containing the transaction details: the total transaction amount, the location, and the total transaction count.

    The dataset columns are described as follows:

    1. Date - The date on which the transactions took place.
    2. Domain - The type of business entity that made the transactions.
    3. Location - Where the data was collected.
    4. Value - Total value of the transactions.
    5. Count - Total count of the transactions.

    For example, the very first row can be read as: "On the first of January 2022, 1932 transactions summing up to INR 365554 were reported from Bhuj." NOTE: There are about 2750 transaction records every single day, and all of them have been given to you.

    The bank wants you to answer the following questions (a pandas sketch follows the list):

    1. What is the average daily transaction value for each domain over the year?
    2. What is the average transaction value for each city/location over the year?
    3. The bank CEO, Mr. Hariharan, wants to promote ease of transaction for the most active domain. If the domains were sorted by priority, what would the priority list be?
    4. What is the average transaction count for each city?
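    A minimal pandas sketch for the four questions above, assuming the workbook is named bankdataset.xlsx and uses the column names Date, Domain, Location, Value, and Count listed earlier; both the file name and the column names should be checked against the actual download:

      # Sketch only: file and column names are assumptions, not confirmed by the dataset page.
      import pandas as pd

      df = pd.read_excel("bankdataset.xlsx", parse_dates=["Date"])

      # Q1: average daily transaction value per domain
      q1 = df.groupby(["Domain", df["Date"].dt.date])["Value"].mean()

      # Q2: average transaction value per city/location over the year
      q2 = df.groupby("Location")["Value"].mean()

      # Q3: domains ranked by total activity, as a priority list
      q3 = df.groupby("Domain")["Count"].sum().sort_values(ascending=False)

      # Q4: average transaction count per city
      q4 = df.groupby("Location")["Count"].mean()

      print(q3.head(10))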
  2. 18 excel spreadsheets by species and year giving reproduction and growth...

    • catalog.data.gov
    • data.wu.ac.at
    Updated Aug 17, 2024
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp
    Explore at:
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Excel spreadsheets by species (the 4-letter code is an abbreviation for the genus and species used in the study; 2010 or 2011 is the year the data were collected; SH indicates data for Science Hub; the date is the date of file preparation). The data in each file are described in a read-me worksheet, which is the first worksheet in the file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read-me description of the columns in the data set for chemical analysis; in this file, one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk, D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee, and M. Plocher. Plant reproduction is altered by simulated herbicide drift to constructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).

  3. Large Truck Crash Causation Study (LTCCS) - File 2 (Excel)

    • data.virginia.gov
    • data.transportation.gov
    • +1more
    xls
    Updated May 24, 2024
    + more versions
    Cite
    U.S Department of Transportation (2024). Large Truck Crash Causation Study (LTCCS) - File 2 (Excel) [Dataset]. https://data.virginia.gov/dataset/large-truck-crash-causation-study-ltccs-file-2-excel
    Explore at:
    xlsAvailable download formats
    Dataset updated
    May 24, 2024
    Dataset provided by
    Federal Motor Carrier Safety Administration (https://www.fmcsa.dot.gov/)
    Authors
    U.S Department of Transportation
    Description

    The Large Truck Crash Causation Study (LTCCS) is based on a three-year data collection project conducted by the Federal Motor Carrier Safety Administration (FMCSA) and the National Highway Traffic Safety Administration (NHTSA) of the U.S. Department of Transportation (DOT). The LTCCS is the first-ever national study to attempt to determine the critical events and associated factors that contribute to serious large truck crashes, allowing DOT and others to implement effective countermeasures to reduce the occurrence and severity of these crashes.

  4. PTB-XL, a large publicly available electrocardiography dataset

    • physionet.org
    • maplerate.net
    Updated Nov 9, 2022
    + more versions
    Cite
    Patrick Wagner; Nils Strodthoff; Ralf-Dieter Bousseljot; Wojciech Samek; Tobias Schaeffter (2022). PTB-XL, a large publicly available electrocardiography dataset [Dataset]. http://doi.org/10.13026/kfzx-aw45
    Explore at:
    Dataset updated
    Nov 9, 2022
    Authors
    Patrick Wagner; Nils Strodthoff; Ralf-Dieter Bousseljot; Wojciech Samek; Tobias Schaeffter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Electrocardiography (ECG) is a key diagnostic tool to assess the cardiac condition of a patient. Automatic ECG interpretation algorithms, as diagnosis support systems, promise significant relief for medical personnel, if only because of the sheer number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures. In our opinion, both aspects are not covered satisfactorily by existing freely accessible ECG datasets.

    The PTB-XL ECG dataset is a large dataset of 21,799 clinical 12-lead ECGs of 10 seconds length from 18,869 patients. The raw waveform data were annotated by up to two cardiologists, who could assign multiple ECG statements to each record. The 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets (a minimal usage sketch follows). In combination with the extensive annotation, this makes the dataset a rich resource for training and evaluating automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements, and annotated signal properties.
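    A minimal sketch of applying the recommended split, assuming the metadata file ptbxl_database.csv with its strat_fold column (folds 1-8 for training, 9 for validation, 10 for testing) as described in the dataset documentation; verify the file and column names against the downloaded copy:

      # Sketch: select the recommended train/validation/test folds from the metadata.
      # Assumption: ptbxl_database.csv with an ecg_id index and a strat_fold column (1-10).
      import pandas as pd

      meta = pd.read_csv("ptbxl_database.csv", index_col="ecg_id")

      train = meta[meta.strat_fold <= 8]   # folds 1-8: training
      val   = meta[meta.strat_fold == 9]   # fold 9: validation
      test  = meta[meta.strat_fold == 10]  # fold 10: test

      print(len(train), len(val), len(test))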

  5. Bulk data files for all years – releases, disposals, transfers and facility...

    • open.canada.ca
    csv, html
    Updated Jul 15, 2025
    + more versions
    Cite
    Environment and Climate Change Canada (2025). Bulk data files for all years – releases, disposals, transfers and facility locations [Dataset]. https://open.canada.ca/data/en/dataset/40e01423-7728-429c-ac9d-2954385ccdfb
    Explore at:
    csv, htmlAvailable download formats
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    Environment and Climate Change Canada (https://www.canada.ca/en/environment-climate-change.html)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Time period covered
    Jan 1, 1993 - Dec 31, 2023
    Description

    The National Pollutant Release Inventory (NPRI) is Canada's public inventory of pollutant releases (to air, water and land), disposals and transfers for recycling. Each file contains data from 1993 to the latest reporting year. These CSV datasets are in normalized or ‘list’ format and are optimized for pivot table analyses (a pandas sketch follows this description). Here is a description of each file:

    • The RELEASES file contains all substance release quantities.
    • The DISPOSALS file contains all on-site and off-site disposal quantities, including tailings and waste rock (TWR).
    • The TRANSFERS file contains all quantities transferred for recycling or treatment prior to disposal.
    • The COMMENTS file contains all the comments provided by facilities about substances included in their report.
    • The GEO LOCATIONS file contains complete geographic information for all facilities that have reported to the NPRI.

    Please consult the following resources to enhance your analysis:

    • Guide on using and interpreting NPRI data: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/using-interpreting-data.html
    • Access additional data from the NPRI, including datasets and mapping products: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/exploredata.html

    Supplemental Information: More NPRI datasets and mapping products are available here: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/access.html

    Supporting Projects: National Pollutant Release Inventory (NPRI)
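    A minimal pandas sketch of a pivot-table style summary over the RELEASES file. The file name and the column names (Reporting_Year, Substance_Name, Quantity) are placeholders; substitute the actual headers from the downloaded CSV and the NPRI data dictionary:

      # Sketch: pivot-table summary of release quantities by substance and year.
      import pandas as pd

      releases = pd.read_csv("NPRI-Releases.csv", encoding="utf-8")  # placeholder file name

      summary = releases.pivot_table(
          index="Substance_Name",      # placeholder column name
          columns="Reporting_Year",    # placeholder column name
          values="Quantity",           # placeholder column name
          aggfunc="sum",
      )
      print(summary.head())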

  6. FHFA Data: Uniform Appraisal Dataset Aggregate Statistics

    • datalumos.org
    • openicpsr.org
    Updated Feb 18, 2025
    + more versions
    Cite
    Federal Housing Finance Agency (2025). FHFA Data: Uniform Appraisal Dataset Aggregate Statistics [Dataset]. http://doi.org/10.3886/E219961V1
    Explore at:
    Dataset updated
    Feb 18, 2025
    Dataset authored and provided by
    Federal Housing Finance Agency (https://www.fhfa.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2013 - 2024
    Area covered
    United States of America
    Description

    The Uniform Appraisal Dataset (UAD) Aggregate Statistics Data File and Dashboards are the nation’s first publicly available datasets of aggregate statistics on appraisal records, giving the public new access to a broad set of data points and trends found in appraisal reports. The UAD Aggregate Statistics for Enterprise Single-Family, Enterprise Condominium, and Federal Housing Administration (FHA) Single-Family appraisals may be grouped by neighborhood characteristics, property characteristics, and different geographic levels.

    Documentation:
    • Overview (10/28/2024)
    • Data Dictionary (10/28/2024)
    • Data File Version History and Suppression Rates (12/18/2024)
    • Dashboard Guide (2/3/2025)

    UAD Aggregate Statistics Dashboards: The Dashboards are the visual front end of the UAD Aggregate Statistics Data File and are designed to provide easy access to customized maps and charts for all levels of users. Access the UAD Aggregate Statistics Dashboards here.

    UAD Aggregate Statistics Datasets - Notes: Some of the data files are relatively large in size and will not open correctly in certain software packages, such as Microsoft Excel. All the files can be opened and used in data analytics software such as SAS, Python, or R (a short pandas sketch follows). All CSV files are zipped.
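    A minimal sketch of reading one of the zipped CSV files with pandas instead of Excel; the file name is a placeholder for whichever archive is downloaded from FHFA:

      # Sketch: pandas can read a CSV directly from a zip archive that contains a single file.
      import pandas as pd

      df = pd.read_csv("uad_aggregate_statistics.zip", compression="zip")  # placeholder file name

      print(df.shape)
      print(df.dtypes.head(20))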

  7. Enterprise Survey 2009-2019, Panel Data - Slovenia

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Aug 6, 2020
    + more versions
    Cite
    World Bank Group (WBG) (2020). Enterprise Survey 2009-2019, Panel Data - Slovenia [Dataset]. https://microdata.worldbank.org/index.php/catalog/3762
    Explore at:
    Dataset updated
    Aug 6, 2020
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    European Investment Bank (http://eib.org/)
    European Bank for Reconstruction and Development (http://ebrd.com/)
    Time period covered
    2008 - 2019
    Area covered
    Slovenia
    Description

    Abstract

    The documentation covers Enterprise Survey panel datasets that were collected in Slovenia in 2009, 2013 and 2019.

    The Slovenia ES 2009 was conducted between 2008 and 2009. The Slovenia ES 2013 was conducted between March 2013 and September 2013. Finally, the Slovenia ES 2019 was conducted between December 2018 and November 2019. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.

    As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.

    Geographic coverage

    National

    Analysis unit

    The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must take its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.

    Universe

    As it is standard for the ES, the Slovenia ES was based on the following size stratification: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The samples for the Slovenia ES 2009, 2013, and 2019 were selected using stratified random sampling, following the methodology explained in the Sampling Manual for the Slovenia 2009 ES and the Slovenia 2013 ES, and in the Sampling Note for the 2019 Slovenia ES.

    Three levels of stratification were used in this country: industry, establishment size, and oblast (region). The original sample designs with specific information of the industries and regions chosen are included in the attached Excel file (Sampling Report.xls.) for Slovenia 2009 ES. For Slovenia 2013 and 2019 ES, specific information of the industries and regions chosen is described in the "The Slovenia 2013 Enterprise Surveys Data Set" and "The Slovenia 2019 Enterprise Surveys Data Set" reports respectively, Appendix E.

    For the Slovenia 2009 ES, industry stratification was designed in the way that follows: the universe was stratified into manufacturing industries, services industries, and one residual (core) sector as defined in the sampling manual. Each industry had a target of 90 interviews. For the manufacturing industries sample sizes were inflated by about 17% to account for potential non-response cases when requesting sensitive financial data and also because of likely attrition in future surveys that would affect the construction of a panel. For the other industries (residuals) sample sizes were inflated by about 12% to account for under sampling in firms in service industries.

    For Slovenia 2013 ES, industry stratification was designed in the way that follows: the universe was stratified into one manufacturing industry, and two service industries (retail, and other services).

    Finally, for Slovenia 2019 ES, three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen is described in "The Slovenia 2019 Enterprise Surveys Data Set" report, Appendix C. Industry stratification was done as follows: Manufacturing – combining all the relevant activities (ISIC Rev. 4.0 codes 10-33), Retail (ISIC 47), and Other Services (ISIC 41-43, 45, 46, 49-53, 55, 56, 58, 61, 62, 79, 95).

    For Slovenia 2009 and 2013 ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This seems to be an appropriate definition of the labor force since seasonal/casual/part-time employment is not a common practice, except in the sectors of construction and agriculture.
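    A minimal sketch of applying this size stratification in code, using the band definitions quoted above; the column name n_permanent_ft is a placeholder for the reported number of permanent full-time workers:

      # Sketch: assign ES size strata (small 5-19, medium 20-99, large 100+ employees).
      import pandas as pd

      frame = pd.DataFrame({"n_permanent_ft": [7, 25, 150, 99, 100]})  # placeholder data

      frame["size_stratum"] = pd.cut(
          frame["n_permanent_ft"],
          bins=[5, 19, 99, float("inf")],
          labels=["small", "medium", "large"],
          include_lowest=True,
      )
      print(frame)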

    For Slovenia 2009 ES, regional stratification was defined in 2 regions. These regions are Vzhodna Slovenija and Zahodna Slovenija. The Slovenia sample contains panel data. The wave 1 panel “Investment Climate Private Enterprise Survey implemented in Slovenia” consisted of 223 establishments interviewed in 2005. A total of 57 establishments have been re-interviewed in the 2008 Business Environment and Enterprise Performance Survey.

    For Slovenia 2013 ES, regional stratification was defined in 2 regions (city and the surrounding business area) throughout Slovenia.

    Finally, for Slovenia 2019 ES, regional stratification was done across two regions: Eastern Slovenia (NUTS code SI03) and Western Slovenia (SI04).

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    Questionnaires have common questions (core module) and, respectively, additional manufacturing- and services-specific questions. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing-specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail-specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.

    Response rate

    Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.

    Item non-response was addressed by two strategies: a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond as (-8). b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response.

    For 2009 and 2013 Slovenia ES, the survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Up to 4 attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.

    For 2009, the number of contacted establishments per realized interview was 6.18. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey), and the quality of the sample frame, as represented by the presence of ineligible units. The relatively low ratio of contacted establishments per realized interview (6.18) suggests that the main source of error in estimates in Slovenia may be selection bias and not frame inaccuracy.

    For 2013, the number of realized interviews per contacted establishment was 25%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The number of rejections per contact was 44%.

    Finally, for 2019, the number of interviews per contacted establishments was 9.7%. This number is the result of two factors: explicit refusals to participate in the survey, as reflected by the rate of rejection (which includes rejections of the screener and the main survey) and the quality of the sample frame, as represented by the presence of ineligible units. The share of rejections per contact was 75.2%.

  8. [Superseded] Intellectual Property Government Open Data 2019

    • data.gov.au
    • researchdata.edu.au
    csv-geo-au, pdf
    Updated Jan 26, 2022
    + more versions
    Cite
    IP Australia (2022). [Superseded] Intellectual Property Government Open Data 2019 [Dataset]. https://data.gov.au/data/dataset/activity/intellectual-property-government-open-data-2019
    Explore at:
    csv-geo-au(59281977), csv-geo-au(680030), csv-geo-au(39873883), csv-geo-au(37247273), csv-geo-au(25433945), csv-geo-au(92768371), pdf(702054), csv-geo-au(208449), csv-geo-au(166844), csv-geo-au(517357734), csv-geo-au(32100526), csv-geo-au(33981694), csv-geo-au(21315), csv-geo-au(6828919), csv-geo-au(86824299), csv-geo-au(359763), csv-geo-au(567412), csv-geo-au(153175), csv-geo-au(165051861), csv-geo-au(115749297), csv-geo-au(79743393), csv-geo-au(55504675), csv-geo-au(221026), csv-geo-au(50760305), csv-geo-au(2867571), csv-geo-au(212907250), csv-geo-au(4352457), csv-geo-au(4843670), csv-geo-au(1032589), csv-geo-au(1163830), csv-geo-au(278689420), csv-geo-au(28585330), csv-geo-au(130674), csv-geo-au(13968748), csv-geo-au(11926959), csv-geo-au(4802733), csv-geo-au(243729054), csv-geo-au(64511181), csv-geo-au(592774239), csv-geo-au(149948862)Available download formats
    Dataset updated
    Jan 26, 2022
    Dataset authored and provided by
    IP Australia (http://ipaustralia.gov.au/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is IPGOD?

    The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD?

    IPGOD is large, with millions of data points across up to 40 tables, making it too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which calls for specialised software for merging (see the sketch below). We recommend that advanced users interact with the IPGOD data using the right tools, with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.
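    A minimal sketch of joining two IPGOD tables outside of Excel. The file names are illustrative, and the join key (ipa_id, mentioned in the data quality notes below) may not apply to every table pair; consult the data dictionary for the real schemas:

      # Sketch: merge two IPGOD tables on a shared applicant identifier.
      import pandas as pd

      applications = pd.read_csv("ipgod_table_a.csv", low_memory=False)  # placeholder file name
      applicants   = pd.read_csv("ipgod_table_b.csv", low_memory=False)  # placeholder file name

      merged = applications.merge(applicants, on="ipa_id", how="left")
      print(merged.shape)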

    IP Data Platform

    IP Australia is also providing free trials to a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software. IP Data Platform

    References

    The following pages can help you gain an understanding of intellectual property administration and processes in Australia, to support your analysis of the dataset: Patents, Trade Marks, Designs, and Plant Breeder's Rights.

    Updates

    Tables and columns

    Due to the changes in our systems, some tables have been affected.

    • We have added IPGOD 225 and IPGOD 325 to the dataset!
    • The IPGOD 206 table is not available this year.
    • Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements

    Data quality has been improved across all tables.

    • Null values are simply empty rather than '31/12/9999'.
    • All date columns are now in ISO format 'yyyy-mm-dd'.
    • All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    • All tables are encoded in UTF-8.
    • All tables use the backslash \ as the escape character.
    • The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.
  9. Dataset of X-ray Micro Computed Tomography Porosity Measurements from Powder...

    • kilthub.cmu.edu
    txt
    Updated Jun 25, 2025
    + more versions
    Cite
    Justin Miner; Sneha Prabha Narra (2025). Dataset of X-ray Micro Computed Tomography Porosity Measurements from Powder Bed Fusion - Laser Beam Ti-6Al-4V [Dataset]. http://doi.org/10.1184/R1/26304175.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    Carnegie Mellon University
    Authors
    Justin Miner; Sneha Prabha Narra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of porosity data in Powder Bed Fusion - Laser Beam of Ti-6Al-4V obtained via X-ray Micro Computed Tomography. This work was conducted on an EOS M290. Contents:

    poredf.csv: A CSV file with pore measurements for each sample scanned.

    parameters.csv: A csv file containing the process parameters and extreme value statistics (EVS) parameters for each sample scanned.

    WARNING: parameters.csv is too large to open in Excel; saving it in Excel will cause data loss. See the sketch below for loading it with pandas instead.
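    A minimal sketch of loading the two CSV files with pandas rather than Excel, so that no rows or precision are silently dropped; no column names are assumed beyond what the description states:

      # Sketch: inspect the two CSVs without opening them in Excel.
      import pandas as pd

      pores  = pd.read_csv("poredf.csv")      # pore measurements per scanned sample
      params = pd.read_csv("parameters.csv")  # process parameters and EVS parameters

      print(pores.shape, params.shape)
      print(params.columns.tolist()[:10])     # peek at the first few column names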

  10. Dust Count Observations March 1933 - August 1933 in College-Fairbanks, AK,...

    • data.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). Dust Count Observations March 1933 - August 1933 in College-Fairbanks, AK, Version 1 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/dust-count-observations-march-1933-august-1933-in-college-fairbanks-ak-version-1-cc9e9
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Area covered
    Fairbanks, Alaska
    Description

    These data are daily dust count observations taken in College-Fairbanks, Alaska from 23 March 1933 to 29 August 1933. The data are part of a larger collection titled "Second International Polar Year Records, 1931-1936, Department of Terrestrial Magnetism, Carnegie Institute of Washington." Within this larger collection, the data are identified as "Series 1: College-Fairbanks IPY Station Records and Data, 1932-1934: Subseries C: Auroral and Meteorological Records and Data, 1932-1933: Dust Count Observations, March 1933 - August 1933." The data are provided in a PDF copy of the handwritten entries (Dust_Count_Observations_March1933_to_August1933.pdf). Two supporting files are also included in this data set: the first is a copy of the handwritten data transcribed to a Microsoft Excel spreadsheet (Dust_Count_Observations_March1933_to_August1933.xls); the second is a PDF document that explains the larger collection (DTM_Collection_Description.pdf). The entries were recorded using an Aitken Dust Counter. Each entry includes up to 10 counts per day with measurements of wind, clouds, and visibility. The handwritten copy has the most complete data, as some of the handwritten notes were not transcribed into the computer spreadsheet; for example, handwritten notes concerning problems with the counter itself were not transcribed. The data are available via FTP. NOAA@NSIDC believes these data to be of value but is unable to provide documentation. If you have information about this data set that others would find useful, please contact NSIDC User Services.

  11. IP Australia - [Superseded] Intellectual Property Government Open Data 2019...

    • gimi9.com
    Updated Jul 20, 2018
    Cite
    (2018). IP Australia - [Superseded] Intellectual Property Government Open Data 2019 | gimi9.com [Dataset]. https://gimi9.com/dataset/au_intellectual-property-government-open-data-2019
    Explore at:
    Dataset updated
    Jul 20, 2018
    Area covered
    Australia
    Description

    What is IPGOD?

    The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.

    How do I use IPGOD?

    IPGOD is large, with millions of data points across up to 40 tables, making it too large to open with Microsoft Excel. Furthermore, analysis often requires information from separate tables, which calls for specialised software for merging. We recommend that advanced users interact with the IPGOD data using the right tools, with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.

    IP Data Platform

    IP Australia is also providing free trials to a cloud-based analytics platform with the capabilities to enable working with large intellectual property datasets, such as the IPGOD, through the web browser, without any installation of software.

    References

    The following pages can help you gain an understanding of intellectual property administration and processes in Australia, to support your analysis of the dataset: Patents, Trade Marks, Designs, and Plant Breeder's Rights.

    Updates

    Tables and columns: Due to the changes in our systems, some tables have been affected.

    • We have added IPGOD 225 and IPGOD 325 to the dataset!
    • The IPGOD 206 table is not available this year.
    • Many tables have been re-built, and as a result may have different columns or different possible values. Please check the data dictionary for each table before use.

    Data quality improvements: Data quality has been improved across all tables.

    • Null values are simply empty rather than '31/12/9999'.
    • All date columns are now in ISO format 'yyyy-mm-dd'.
    • All indicator columns have been converted to Boolean data type (True/False) rather than Yes/No, Y/N, or 1/0.
    • All tables are encoded in UTF-8.
    • All tables use the backslash \ as the escape character.
    • The applicant name cleaning and matching algorithms have been updated. We believe that this year's method improves the accuracy of the matches. Please note that the "ipa_id" generated in IPGOD 2019 will not match with those in previous releases of IPGOD.

  12. City of Chicago Data Portal

    • data.cityofchicago.org
    csv, xlsx, xml
    Updated Dec 2, 2025
    Cite
    City of Chicago (2025). City of Chicago Data Portal [Dataset]. https://data.cityofchicago.org/widgets/qd2y-e669
    Explore at:
    xml, csv, xlsxAvailable download formats
    Dataset updated
    Dec 2, 2025
    Authors
    City of Chicago
    Area covered
    Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. It contains a large number of records/rows and may not be viewable in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu, then open the file in a plain-text editor such as Notepad or WordPad to view and search, or load it programmatically (see the sketch after the field descriptions below).

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.
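    A minimal sketch of loading the exported CSV with pandas and filtering on the two fields described above. The file name and the exact column spellings (APPLICATION TYPE, LICENSE STATUS) are assumptions; check the header row of the export:

      # Sketch: filter the business licenses export by application type and status.
      import pandas as pd

      licenses = pd.read_csv("Business_Licenses.csv", low_memory=False)  # placeholder file name

      issued   = licenses[licenses["LICENSE STATUS"] == "AAI"]        # issued licenses
      renewals = licenses[licenses["APPLICATION TYPE"] == "RENEW"]    # renewal records

      print(len(issued), len(renewals))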

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

  13. Individuals and Households Program - Valid Registrations

    • catalog.data.gov
    Updated Jun 7, 2025
    Cite
    FEMA/Response and Recovery/Recovery Directorate (2025). Individuals and Households Program - Valid Registrations [Dataset]. https://catalog.data.gov/dataset/individuals-and-households-program-valid-registrations-nemis
    Explore at:
    Dataset updated
    Jun 7, 2025
    Dataset provided by
    Federal Emergency Management Agency (http://www.fema.gov/)
    Description

    This dataset contains FEMA applicant-level data for the Individuals and Households Program (IHP). All PII information has been removed. The location is represented by county, city, and zip code. This dataset contains Individual Assistance (IA) applications from DR1439 (declared in 2002) to those declared over 30 days ago. The full data set is refreshed on an annual basis and refreshed weekly to update disasters declared in the last 18 months. This dataset includes all major disasters and includes only valid registrants (applied in a declared county, within the registration period, having damage due to the incident and damage within the incident period). Information about individual data elements and descriptions are listed in the metadata information within the dataset.

    Valid registrants may be eligible for IA assistance, which is intended to meet basic needs and supplement disaster recovery efforts. IA assistance is not intended to return disaster-damaged property to its pre-disaster condition. Disaster damage to secondary or vacation homes does not qualify for IHP assistance.

    Data comes from FEMA's National Emergency Management Information System (NEMIS) with raw, unedited, self-reported content and is subject to a small percentage of human error.

    Any financial information is derived from NEMIS and not FEMA's official financial systems. Due to differences in reporting periods, status of obligations and application of business rules, this financial information may differ slightly from official publication on public websites such as usaspending.gov. This dataset is not intended to be used for any official federal reporting.

    Citation: The Agency's preferred citation for datasets (API usage or file downloads) can be found on the OpenFEMA Terms and Conditions page, Citing Data section: https://www.fema.gov/about/openfema/terms-conditions.

    Due to the size of this file, tools other than a spreadsheet may be required to analyze, visualize, and manipulate the data. MS Excel will not be able to process files this large without data loss. It is recommended that a database (e.g., MS Access, MySQL, PostgreSQL, etc.) be used to store and manipulate the data; a minimal sketch of loading the file into SQLite follows this description. Other programming tools such as R, Apache Spark, and Python can also be used to analyze and visualize data. Further, basic Linux/Unix tools can be used to manipulate, search, and modify large files.

    If you have media inquiries about this dataset, please email the FEMA News Desk at FEMA-News-Desk@fema.dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open Government program, please email the OpenFEMA team at OpenFEMA@fema.dhs.gov.

    This dataset is scheduled to be superseded by Valid Registrations Version 2 by early CY 2024.
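    A minimal sketch of streaming the large CSV into a local SQLite database in chunks, so the whole file never has to fit in memory or in Excel; the file name is a placeholder for the downloaded extract:

      # Sketch: chunked load of a very large OpenFEMA CSV into SQLite.
      import sqlite3
      import pandas as pd

      conn = sqlite3.connect("ihp_valid_registrations.db")

      for chunk in pd.read_csv("ihp_valid_registrations.csv",  # placeholder file name
                               chunksize=100_000, low_memory=False):
          chunk.to_sql("ihp_valid_registrations", conn, if_exists="append", index=False)

      n = conn.execute("SELECT COUNT(*) FROM ihp_valid_registrations").fetchone()[0]
      print(n)
      conn.close()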

  14. New 1000 Sales Records Data 2

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Cite
    Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
    Explore at:
    zip(49305 bytes)Available download formats
    Dataset updated
    Jan 12, 2023
    Authors
    Calvin Oko Mensah
    Description

    This is a dataset downloaded from excelbianalytics.com, created with random VBA logic. I recently performed an extensive exploratory data analysis on it and added new columns, namely: Unit margin, Order year, Order month, Order weekday, and Order_Ship_Days, which I think can help with analysis of the data. I shared it because I thought it was a great dataset for newbies like myself to practice analytical processes on.

  15. Data from: Manufacturing Establishments

    • data.cityofchicago.org
    Updated Dec 2, 2025
    Cite
    City of Chicago (2025). Manufacturing Establishments [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/Manufacturing-Establishments/es3k-j9sz
    Explore at:
    kmz, application/geo+json, xlsx, xml, kml, csvAvailable download formats
    Dataset updated
    Dec 2, 2025
    Authors
    City of Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. It contains a large number of records/rows and may not be viewable in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu, then open the file in a plain-text editor such as Notepad or WordPad to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record. It means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses. It means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

  16. BaiZhi open job big data information

    • opendata.pku.edu.cn
    docx, xls
    Updated Dec 5, 2017
    Cite
    Peking University Open Research Data Platform (2017). BaiZhi open job big data information [Dataset]. http://doi.org/10.18170/DVN/OIMPNJ
    Explore at:
    docx(17299), xls(4234752)Available download formats
    Dataset updated
    Dec 5, 2017
    Dataset provided by
    Peking University Open Research Data Platform
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data description: nationwide public job recruitment data, including job title, job experience requirements, academic requirements, industry, job type, the nature of the company, and other fields. Time range: 2017-01-01 to 2017-10-31. Data volume: 40,000 records (randomly selected within the time range); sources include major recruitment sites, corporate websites, and job forums (BBS). Data format: Excel.

  17. Electrification of Heat Demonstration Project, 2020-2023

    • datacatalogue.ukdataservice.ac.uk
    Updated Dec 19, 2024
    + more versions
    Cite
    Energy Systems Catapult (2024). Electrification of Heat Demonstration Project, 2020-2023 [Dataset]. http://doi.org/10.5255/UKDA-SN-9050-2
    Explore at:
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    Energy Systems Catapult
    Area covered
    United Kingdom
    Description

    The heat pump monitoring datasets are a key output of the Electrification of Heat Demonstration (EoH) project, a government-funded heat pump trial assessing the feasibility of heat pumps across the UK’s diverse housing stock. These datasets are provided in both cleansed and raw form and allow analysis of the initial performance of the heat pumps installed in the trial. From the datasets, insights such as heat pump seasonal performance factor (a measure of the heat pump's efficiency), heat pump performance during the coldest day of the year, and half-hourly performance to inform peak demand can be gleaned.

    For the second edition (December 2024), the data were updated to include cleaned performance data collected between November 2020 and September 2023. The only documentation currently available with the study is the Excel data dictionary. Reports and other contextual information can be found on the Energy Systems Catapult website.

    The EoH project was funded by the Department of Business, Energy and Industrial Strategy. From 2023, it is covered by the new Department for Energy Security and Net Zero.

    Data availability

    This study comprises the open-access cleansed data from the EoH project and a summary dataset, available in four zipped files (see the 'Access Data' tab). Users must download all four zip files to obtain the full set of cleansed data and accompanying documentation.

    When unzipped, the full cleansed data comprises 742 CSV files. Most of the individual CSV files are too large to open in Excel. Users should ensure they have sufficient computing facilities to analyse the data.

    The UKDS also holds an accompanying study, SN 9049 Electrification of Heat Demonstration Project: Heat Pump Performance Raw Data, 2020-2023, which is available only to registered UKDS users. This contains the raw data from the EoH project. Since the data are very large, only the summary dataset is available to download; an order must be placed for FTP delivery of the remaining raw data. Other studies in the set include SN 9209, which comprises 30-minute interval heat pump performance data, and SN 9210, which includes daily heat pump performance data.

    The Python code used to cleanse the raw data and then perform the analysis is accessible via the Energy Systems Catapult GitHub: https://github.com/ES-Catapult/electrification_of_heat
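    As an illustration of the kind of analysis the cleansed data supports, a minimal sketch of a seasonal performance factor (SPF) calculation, i.e. total heat delivered divided by total electrical energy consumed. The file and column names (heat_output_kwh, electricity_consumed_kwh) are hypothetical; map them to the fields defined in the Excel data dictionary:

      # Sketch: SPF = total heat output / total electrical input over the monitoring period.
      import pandas as pd

      readings = pd.read_csv("property_cleansed.csv", parse_dates=["timestamp"])  # placeholder names

      spf = readings["heat_output_kwh"].sum() / readings["electricity_consumed_kwh"].sum()
      print(f"Seasonal performance factor: {spf:.2f}")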

  18. LADOT Parking Meter Occupancy - Archive

    • catalog.data.gov
    • data.lacity.org
    • +1more
    Updated Nov 1, 2025
    Cite
    data.lacity.org (2025). LADOT Parking Meter Occupancy - Archive [Dataset]. https://catalog.data.gov/dataset/ladot-parking-meter-occupancy-archive
    Explore at:
    Dataset updated
    Nov 1, 2025
    Dataset provided by
    data.lacity.org
    Description

    Monthly archive of all parking meter sensor activity over the previous 36 months (3 years). Updated monthly for data 2 months prior (e.g., January data will be published in early March). For best-available current "live" status, see "LADOT Parking Meter Occupancy". For location and parking policy details, see "LADOT Metered Parking Inventory & Policies". This dataset is geared towards database professionals and/or app developers. Each file is extremely large, over 300 MB at minimum. Common applications like Microsoft Excel will not be able to open the file and show all data. For best results, import into a database or use advanced data access methods appropriate for processing large files (see the sketch below).
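    A minimal sketch of processing one 300 MB+ monthly archive in chunks instead of opening it whole. The file name and column name (OccupancyState) are placeholders; check the archive's header row and the related inventory dataset for the real fields:

      # Sketch: chunked aggregation over a large parking meter archive.
      import pandas as pd

      counts = {}
      for chunk in pd.read_csv("ladot_archive_month.csv", chunksize=250_000):  # placeholder file name
          for state, n in chunk["OccupancyState"].value_counts().items():
              counts[state] = counts.get(state, 0) + n

      print(counts)  # e.g. occupied vs. vacant event counts for the month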

  19. Liquor Retail

    • data.cityofchicago.org
    Updated Dec 2, 2025
    Cite
    City of Chicago (2025). Liquor Retail [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/Liquor-Retail/4py5-yxxu
    Explore at:
    xlsx, kml, kmz, application/geo+json, csv, xmlAvailable download formats
    Dataset updated
    Dec 2, 2025
    Authors
    City of Chicago
    Description

    Business licenses issued by the Department of Business Affairs and Consumer Protection in the City of Chicago from 2002 to the present. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: ‘ISSUE’ is the record associated with the initial license application. ‘RENEW’ is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. ‘C_LOC’ is a change of location record. It means the business moved. ‘C_CAPA’ is a change of capacity record. Only a few license types may file this type of application. ‘C_EXPA’ only applies to businesses that have liquor licenses. It means the business location expanded. 'C_SBA' is a change of business activity record. It means that a new business activity was added or an existing business activity was marked as expired.

    LICENSE STATUS: ‘AAI’ means the license was issued. ‘AAC’ means the license was cancelled during its term. ‘REV’ means the license was revoked. 'REA' means the license revocation has been appealed.

    LICENSE STATUS CHANGE DATE: This date corresponds to the date a license was cancelled (AAC), revoked (REV) or appealed (REA).

    Business License Owner information may be accessed at: https://data.cityofchicago.org/dataset/Business-Owners/ezma-pppn. To identify the owner of a business, you will need the account number or legal name, which may be obtained from this Business Licenses dataset.

    Data Owner: Business Affairs and Consumer Protection. Time Period: January 1, 2002 to present. Frequency: Data is updated daily.

  20. Towards a systematic approach to manual annotation of code smells - C#...

    • data.niaid.nih.gov
    Updated May 5, 2022
    Cite
    Nikola Luburić; Simona Prokić; Katarina-Glorija Grujić; Jelena Slivka; Aleksandar Kovačević; Goran Sladić; Dragan Vidaković (2022). Towards a systematic approach to manual annotation of code smells - C# Dataset of Long Method and Large Class code smells [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6520055
    Explore at:
    Dataset updated
    May 5, 2022
    Dataset provided by
    University of Novi Sad, Faculty of Technical Sciences
    Authors
    Nikola Luburić; Simona Prokić; Katarina-Glorija Grujić; Jelena Slivka; Aleksandar Kovačević; Goran Sladić; Dragan Vidaković
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes open-source projects written in the C# programming language, annotated for the presence of the Long Method and God Class code smells. Each instance was manually annotated by at least two annotators. We explain our motivation and methodology for creating this dataset in our preprint:

    Luburić, N., Prokić, S., Grujić, K.G., Slivka, J., Kovačević, A., Sladić, G. and Vidaković, D., 2021. Towards a systematic approach to manual annotation of code smells.

    The dataset contains two excel datasheets:

    DataSet_Large Class.xlsx – C# classes annotated for the Large Class code smell severity.

    DataSet_Long Method.xlsx – C# methods annotated for the Long method code smell severity.

    The columns in the datasheet represent:

    Code Snippet ID – the full name of the code snippet.

    For classes, this is the package/namespace name followed by the class name. The full name of inner classes also contains the names of any outer classes (e.g., namespace.subnamespace.outerclass.innerclass).

    For methods, this is the full name of the class and the method's signature (e.g., namespace.class.method(param1Type, param2Type)).

    Link – The GitHub link to the code snippet, including the commit and the start and end LOC.

    Code Smell – code smell for which the code snippet is examined (Large Class or Long Method).

    Project Link – the link to the version of the code repository that was annotated.

    Metrics – a list of metrics for the code snippet, calculated by our platform. Our dataset provides 25 class-level metrics for Large Class detection and 18 method-level metrics for Long Method detection. The list of metrics and their definitions is available here.

    Final annotation – a single severity score calculated by a majority vote.

    Annotators – each annotator's (1, 2, or 3) assigned severity score.

    To help guide their reasoning for evaluating the presence and the severity of a code smell, three annotators independently annotated whether the considered heuristics apply to an evaluated code snippet. We provide these results in two separate excel datasheets:

    LargeClass_Heuristics.xlsx - C# classes annotated for the presence of heuristics relevant for the Large Class code smell.

    LongMethod_Heuristics.xlsx - C# methods annotated for the presence of heuristics relevant for the Long Method code smell.

    The columns of these two datasheets are:

    Code Snippet ID - the full name of the code snippet (matching the IDs from DataSet_Large Class.xlsx and DataSet_Long Method.xlsx)

    Annotators – heuristics labelled by each of the annotators (1, 2, or 3).

    Heuristics – whether the heuristic is applicable to the examined code snippet or not (Section 1.2.4 lists heuristics relevant for the Large Class detection, and Section 1.2.5 lists the heuristics relevant for the Long Method detection).
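    A minimal sketch of loading one of the annotation datasheets and re-deriving the majority-vote severity from the individual annotator columns. The exact column labels ("Annotator 1" ... "Final annotation") are assumptions based on the description above; adjust them to the real datasheet headers:

      # Sketch: recompute the majority-vote severity and compare it with the recorded final annotation.
      import pandas as pd

      sheet = pd.read_excel("DataSet_Long Method.xlsx")

      annotator_cols = ["Annotator 1", "Annotator 2", "Annotator 3"]  # assumed column labels
      majority = sheet[annotator_cols].mode(axis=1)[0]  # most frequent severity per row

      agreement = (majority == sheet["Final annotation"]).mean()
      print(f"Rows where the recomputed majority matches the final annotation: {agreement:.1%}")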
