100+ datasets found
  1. Exploring E-commerce Trends⭐️⭐️⭐️

    • kaggle.com
    zip
    Updated Jul 8, 2024
    Cite
    Muhammad Roshan Riaz (2024). Exploring E-commerce Trends⭐️⭐️⭐️ [Dataset]. https://www.kaggle.com/datasets/muhammadroshaanriaz/e-commerce-trends-a-guide-to-leveraging-dataset
    Available download formats: zip (51169 bytes)
    Dataset updated
    Jul 8, 2024
    Authors
    Muhammad Roshan Riaz
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Exploring E-commerce Trends: A Guide to Leveraging Dummy Dataset

    Introduction: In the world of e-commerce, data is a powerful asset that can be leveraged to understand customer behavior, improve sales strategies, and enhance overall business performance. This guide explores how to effectively utilize a dummy dataset generated to simulate various aspects of an e-commerce platform. By analyzing this dataset, businesses can gain valuable insights into product trends, customer preferences, and market dynamics.

    1. Dataset Overview: The dummy dataset contains information on 1000 products across different categories such as electronics, clothing, home & kitchen, books, toys & games, and more. Each product is associated with attributes such as price, rating, number of reviews, stock quantity, discounts, sales, and date added to inventory. This comprehensive dataset provides a rich source of information for analysis and exploration.

    2. Data Analysis: Using tools like Pandas, NumPy, and visualization libraries like Matplotlib or Seaborn, businesses can perform in-depth analysis of the dataset. Key insights such as top-selling products, popular product categories, pricing trends, and seasonal variations can be extracted through exploratory data analysis (EDA). Visualization techniques can be employed to create intuitive graphs and charts for better understanding and communication of findings.

    3. Machine Learning Applications: The dataset can be used to train machine learning models for various e-commerce tasks such as product recommendation, sales prediction, customer segmentation, and sentiment analysis. By applying algorithms like linear regression, decision trees, or neural networks, businesses can develop predictive models to optimize inventory management, personalize customer experiences, and drive sales growth.

    4. Testing and Prototyping: Businesses can utilize the dummy dataset to test new algorithms, prototype new features, or conduct A/B testing experiments without impacting real user data. This enables rapid iteration and experimentation to validate hypotheses and refine strategies before implementation in a live environment.

    5. Educational Resources: The dummy dataset serves as an invaluable educational resource for students, researchers, and professionals interested in learning about e-commerce data analysis and machine learning. Tutorials, workshops, and online courses can be developed using the dataset to teach concepts such as data manipulation, statistical analysis, and model training in the context of e-commerce.

    6. Decision Support and Strategy Development: Insights derived from the dataset can inform strategic decision-making processes and guide business strategy development. By understanding customer preferences, market trends, and competitor behavior, businesses can make informed decisions regarding product assortment, pricing strategies, marketing campaigns, and resource allocation.

    Conclusion: The dummy dataset provides a versatile and valuable resource for exploring e-commerce trends, understanding customer behavior, and driving business growth. By leveraging this dataset effectively, businesses can unlock actionable insights, optimize operations, and stay ahead in today's competitive e-commerce landscape.
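
    The exploratory analysis described under "Data Analysis" can be sketched roughly as follows. This is a minimal illustration, not the dataset author's method: the column names (`category`, `price`, `sales`) and the five rows are hypothetical stand-ins, since the description lists the attributes but not the actual CSV headers.

```python
import pandas as pd

# Hypothetical rows and column names; the real CSV headers may differ.
products = pd.DataFrame(
    {
        "category": ["Electronics", "Books", "Electronics", "Clothing", "Books"],
        "price": [199.0, 15.0, 349.0, 40.0, 22.0],
        "sales": [120, 300, 80, 150, 200],
    }
)

# Revenue per product, then total revenue per category, largest first.
products["revenue"] = products["price"] * products["sales"]
by_category = (
    products.groupby("category")["revenue"].sum().sort_values(ascending=False)
)

top_category = by_category.index[0]
print(by_category)
print("Top category by revenue:", top_category)
```

    With the real file, the in-memory frame would presumably be replaced by a `pd.read_csv(...)` call on the extracted archive, and the same grouping would apply once the actual column names are known.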

  2. Data from: Land Cover Trends Dataset, 2000-2011

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 26, 2025
    Cite
    U.S. Geological Survey (2025). Land Cover Trends Dataset, 2000-2011 [Dataset]. https://catalog.data.gov/dataset/land-cover-trends-dataset-2000-2011
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    U.S. Geological Survey scientists, funded by the Climate and Land Use Change Research and Development Program, developed a dataset of 2006 and 2011 land use and land cover (LULC) information for selected 100-km² sample blocks within 29 EPA Level 3 ecoregions across the conterminous United States. The data was collected for validation of new and existing national scale LULC datasets developed from remotely sensed data sources. The data can also be used with the previously published Land Cover Trends Dataset: 1973-2000 (http://pubs.usgs.gov/ds/844/) to assess land-use/land-cover change in selected ecoregions over a 37-year study period. LULC data for 2006 and 2011 was manually delineated using the same sample block classification procedures as the previous Land Cover Trends project.

    The methodology is based on a statistical sampling approach, manual classification of land use and land cover, and post-classification comparisons of land cover across different dates. Landsat Thematic Mapper and Enhanced Thematic Mapper Plus imagery was interpreted using a modified Anderson Level I classification scheme. Landsat data was acquired from the National Land Cover Database (NLCD) collection of images. For the 2006 and 2011 update, ecoregion-specific alterations in the sampling density were made to expedite the completion of manual block interpretations. The data collection process started with the 2000 date from the previous assessment, and any needed corrections were made before interpreting the next two dates of 2006 and 2011 imagery. The 2000 land cover was copied and any changes seen in the 2006 Landsat images were digitized into a new 2006 land cover image. Similarly, the 2011 land cover image was created after completing the 2006 delineation. Results from analysis of these data include ecoregion-based statistical estimates of the amount of LULC change per time period, ranking of the most common types of conversions, rates of change, and percent composition.

    Overall estimated amount of change per ecoregion from 2001 to 2011 ranged from a low of 370 km² in the Northern Basin and Range Ecoregion to a high of 78,782 km² in the Southeastern Plains Ecoregion. The Southeastern Plains Ecoregion continues to encompass the most intense forest harvesting and regrowth in the country. Forest harvesting and regrowth rates in the southeastern U.S. and Pacific Northwest continued at late 20th century levels. The land use and land cover data collected by this study is ideally suited for training, validation, and regional assessments of land use and land cover change in the U.S. because it is collected using manual interpretation techniques of Landsat data aided by high resolution photography.

    The 2001-2011 Land Cover Trends Dataset is provided in an Albers Conical Equal Area projection using the NAD 1983 datum. The sample blocks have a 30-meter resolution, and file names follow a specific naming convention that includes the number of the ecoregion containing the block, the block number, and the Landsat image date. The data files are organized by ecoregion and are available in the ERDAS Imagine (.img) format.

  3. Household Health Survey 2012-2013, Economic Research Forum (ERF)...

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Jun 26, 2017
    + more versions
    Cite
    Kurdistan Regional Statistics Office (KRSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://datacatalog.ihsn.org/catalog/6937
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Kurdistan Regional Statistics Office (KRSO)
    Economic Research Forum
    Central Statistical Organization (CSO)
    Time period covered
    2012 - 2013
    Area covered
    Iraq
    Description

    Abstract

    The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

    ----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

    Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization (CSO), household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank, the CSO and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year, covering all governorates including those in the Kurdistan Region.

    The survey has six main objectives:

    1. Provide data for poverty analysis and measurement, and monitor, evaluate and update the implementation of the Poverty Reduction National Strategy issued in 2009.
    2. Provide a comprehensive data system to assess household social and economic conditions and prepare indicators related to human development.
    3. Provide data that meet the needs and requirements of national accounts.
    4. Provide detailed indicators on consumption expenditure that support decisions related to production, consumption, export and import.
    5. Provide detailed indicators on the sources of household and individual income.
    6. Provide the data necessary for formulating a new consumer price index.

    The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.

    Geographic coverage

    National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    ----> Design:

    The sample size was 25488 households for the whole of Iraq: 216 households in each of 118 districts, organized into 2832 clusters of 9 households each, distributed across districts and governorates for rural and urban areas.

    ----> Sample frame:

    The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: primary sampling units (blocks) within each stratum (district), for urban and rural areas, were systematically selected with probability proportional to size, giving 2832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to form a cluster. The total sample was therefore 25488 households distributed across the governorates, 216 households in each district.
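
    The sample-size figures quoted above are mutually consistent, which a short arithmetic check makes explicit. The variable names are illustrative; the numbers are the ones stated in the design and sample frame descriptions.

```python
# Figures taken from the survey description.
districts = 118
households_per_district = 216
households_per_cluster = 9

# Derived quantities.
clusters_per_district = households_per_district // households_per_cluster
clusters_total = districts * clusters_per_district
households_total = clusters_total * households_per_cluster

print(clusters_per_district)  # 24 clusters in each district
print(clusters_total)         # 2832 clusters overall
print(households_total)       # 25488 households overall
```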

    ----> Sampling Stages:

    In each district, the sample was selected in two stages. Stage 1: based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit urban and rural breakdown and the geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: using households as secondary sampling units, 9 households were selected from each sample point using systematic equal-probability sampling. The sampling frames for each stage can be developed from the 2010 building listing and numbering without updating household lists. In some small districts, random selection of primary sampling units may yield fewer than 24 distinct units, so a sampling unit may be selected more than once; where necessary, two or more clusters may be drawn from the same enumeration unit.
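
    The two-stage selection described above can be sketched as follows. This is a generic textbook version of systematic probability-proportional-to-size (PPS) sampling followed by systematic equal-probability sampling, not the survey team's actual procedure; the block sizes, the number of blocks, and the function names are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(sizes, n, rng):
    """Stage 1: pick n units systematically, with probability proportional to size."""
    cumulative = list(accumulate(sizes))
    step = cumulative[-1] / n      # sampling interval on the cumulative-size scale
    start = rng.random() * step    # random start in [0, step)
    picks = []
    for k in range(n):
        point = start + k * step
        # Index of the first unit whose cumulative size reaches the point.
        idx = min(bisect_left(cumulative, point), len(sizes) - 1)
        picks.append(idx)
    return picks


def systematic_equal(items, n, rng):
    """Stage 2: pick n items systematically with equal probability."""
    step = len(items) / n
    start = rng.random() * step
    return [items[min(int(start + k * step), len(items) - 1)] for k in range(n)]


rng = random.Random(0)
# Hypothetical district: 60 blocks with 20 to 200 listed households each.
block_sizes = [rng.randint(20, 200) for _ in range(60)]

clusters = pps_systematic(block_sizes, 24, rng)
first_block_households = list(range(block_sizes[clusters[0]]))
households = systematic_equal(first_block_households, 9, rng)

print("Selected blocks:", clusters)
print("Households in first selected block:", households)
```

    A block larger than the sampling interval can be hit by more than one selection point, which corresponds to the case noted above where one enumeration unit contributes two or more clusters.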

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    ----> Preparation:

    The questionnaire of the 2006 survey was used as the basis for the 2012 questionnaire, with many revisions. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during its implementation, yielding the final version used in the actual survey.

    ----> Questionnaire Parts:

    The questionnaire consists of four parts, each with several sections.

    Part 1: Socio – Economic Data:
    • Section 1: Household Roster
    • Section 2: Emigration
    • Section 3: Food Rations
    • Section 4: Housing
    • Section 5: Education
    • Section 6: Health
    • Section 7: Physical measurements
    • Section 8: Job seeking and previous job

    Part 2: Monthly, Quarterly and Annual Expenditures:
    • Section 9: Expenditures on Non – Food Commodities and Services (past 30 days)
    • Section 10: Expenditures on Non – Food Commodities and Services (past 90 days)
    • Section 11: Expenditures on Non – Food Commodities and Services (past 12 months)
    • Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days)
    • Section 12, Table 1: Meals Had Within the Residential Unit
    • Section 12, Table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members

    Part 3: Income and Other Data:
    • Section 13: Job
    • Section 14: Paid jobs
    • Section 15: Agriculture, forestry and fishing
    • Section 16: Household non – agricultural projects
    • Section 17: Income from ownership and transfers
    • Section 18: Durable goods
    • Section 19: Loans, advances and subsidies
    • Section 20: Shocks and strategy of dealing in the households
    • Section 21: Time use
    • Section 22: Justice
    • Section 23: Satisfaction in life
    • Section 24: Food consumption during past 7 days

    Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), over 7 days. Two pages are allocated for recording the expenditures of each day, thus the roster consists of 14 pages.

    Cleaning operations

    ----> Raw Data:

    Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
    1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct.
    2. Local Supervisor: Checks that the questions have been correctly completed.
    3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.
    4. World Bank consultants in coordination with the CSO data management team: The World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.

    ----> Harmonized Data:

    • The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.
    • The harmonization process starts with raw data files received from the Statistical Office.
    • A program is generated for each dataset to create harmonized variables.
    • Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.

    Response rate

    The Iraq Household Socio Economic Survey (IHSES) reached a total of 25488 households. 305 households refused to respond, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).

  4. Dataset on the Development of Codes of Conduct in Online Classrooms of...

    • data.mendeley.com
    Updated Feb 26, 2025
    + more versions
    Cite
    Bui Thanh Thuy (2025). Dataset on the Development of Codes of Conduct in Online Classrooms of Vietnamese High School Students [Dataset]. http://doi.org/10.17632/h7y74ygfnw.4
    Dataset updated
    Feb 26, 2025
    Authors
    Bui Thanh Thuy
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The Development of Codes of Conduct in Online Classrooms of Vietnamese High School Students (CCOCVHSS) dataset includes 6 files in different formats (.doc, .csv, .sav), one for each step in the process of developing the CCOCVHSS items:
    1. Initial_Items_Pool.docx: presents 34 items developed by the research team from an overview and analysis of research documents on student behavior in the online learning environment in relation to teachers and other students, covering two main aspects (attitude and behavior), along with codes of conduct for students at general schools for online learning.
    2. Experts_Judge_Results.xlsx: includes 7 columns and 35 rows. The columns correspond to data fields; the rows show each item code, the content of that item, each expert's rating for that item, the total score of that item, and the analysis results of the proportions of the three rating levels.
    3. Questionare_Of_CCOCVHSS.docx: a questionnaire designed for data collection, with three parts: (1) Introduction and declaration of consent; (2) Demographic information; and (3) Questions.
    4. CCOCVHSS _rawdata.csv: the data used for analysis, cleaned from the raw data collected in the online survey.

  5. Population Health (BRFSS: HRQOL)

    • kaggle.com
    zip
    Updated Dec 14, 2022
    Cite
    The Devastator (2022). Population Health (BRFSS: HRQOL) [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlock-population-health-needs-with-brfss-hrqol
    Available download formats: zip (2247473 bytes)
    Dataset updated
    Dec 14, 2022
    Authors
    The Devastator
    Description

    Population Health (BRFSS: HRQOL)

    Examining Trends, Disparities and Determinants of Health in the US Population

    By Health [source]

    About this dataset

    The Behavioral Risk Factor Surveillance System (BRFSS) offers an expansive collection of data on the health-related quality of life (HRQOL) from 1993 to 2010. Over this time period, the Health-Related Quality of Life dataset consists of a comprehensive survey reflecting the health and well-being of non-institutionalized US adults aged 18 years or older. The data collected can help track and identify unmet population health needs, recognize trends, identify disparities in healthcare, determine determinants of public health, inform decision making and policy development, as well as evaluate programs within public healthcare services.

    The HRQOL surveillance system has developed a compact set of HRQOL measures, such as a summary measure of unhealthy days, which have been validated for population health surveillance and widely used in practice since 1993. The dataset includes the year recorded, location abbreviations and descriptions, category and topic overviews, the survey questions asked, the types and units of the data values, sample sizes, and geographic locations.

    How to use the dataset

    This dataset tracks the Health-Related Quality of Life (HRQOL) from 1993 to 2010 using data from the Behavioral Risk Factor Surveillance System (BRFSS). This dataset includes information on the year, location abbreviation, location description, type and unit of data value, sample size, category and topic of survey questions.

    Using this dataset on BRFSS: HRQOL data between 1993-2010 will allow for a variety of analyses related to population health needs. The compact set of HRQOL measures can be used to identify trends in population health needs as well as determine disparities among various locations. Additionally, responses to survey questions can be used to inform decision making and program and policy development in public health initiatives.

    Research Ideas

    • Analyzing trends in HRQOL over the years by location to identify disparities in health outcomes between different populations and develop targeted policy interventions.
    • Developing new models for predicting HRQOL indicators at a regional level, and using this information to inform medical practice and public health implementation efforts.
    • Using the data to understand differences between states in their HRQOL scores and to establish best practices for healthcare provision based on that understanding, including areas such as access to care and availability of preventative care services.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: rows.csv

    | Column name                | Description                               |
    |:---------------------------|:------------------------------------------|
    | Year                       | Year of survey. (Integer)                 |
    | LocationAbbr               | Abbreviation of location. (String)        |
    | LocationDesc               | Description of location. (String)         |
    | Category                   | Category of survey. (String)              |
    | Topic                      | Topic of survey. (String)                 |
    | Question                   | Question asked in survey. (String)        |
    | DataSource                 | Source of data. (String)                  |
    | Data_Value_Unit            | Unit of data value. (String)              |
    | Data_Value_Type            | Type of data value. (String)              |
    | Data_Value_Footnote_Symbol | Footnote symbol for data value. (String)  |
    | Data_Value_Std_Err         | Standard error of the data value. (Float) |
    | Sample_Size                | Sample size used in sample. (Integer)     |
    | Break_Out                  | Break out categories used. (String)       |
    | Break_Out_Category         | Type break out assessed. (String)         |
    | **GeoLocation*...
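
    As a rough illustration of how the listed columns might be used, the sketch below totals `Sample_Size` per `Year` using only the Python standard library. The three data rows are invented placeholders, not values from rows.csv; only the column names come from the listing above.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; real values live in rows.csv.
sample_csv = """Year,LocationAbbr,Topic,Sample_Size
1993,AL,Hypothetical topic,1200
1993,AK,Hypothetical topic,800
2010,AL,Hypothetical topic,1500
"""

# Sum the sample size over all rows that share a survey year.
totals = defaultdict(int)
for row in csv.DictReader(io.StringIO(sample_csv)):
    totals[int(row["Year"])] += int(row["Sample_Size"])

for year in sorted(totals):
    print(year, totals[year])
```

    For the real file, `io.StringIO(sample_csv)` would be swapped for an opened file handle; rows with an empty `Sample_Size` would need to be skipped or handled before the `int(...)` conversion.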

  6. Resilience dataset for the development of healthcare conditions in countries...

    • data.tpdc.ac.cn
    • tpdc.ac.cn
    • +1 more
    zip
    Updated May 19, 2022
    Cite
    Xinliang XU (2022). Resilience dataset for the development of healthcare conditions in countries along the Belt and Road (2000-2019) [Dataset]. http://doi.org/10.11888/HumanNat.tpdc.272237
    Available download formats: zip
    Dataset updated
    May 19, 2022
    Dataset provided by
    TPDC
    Authors
    Xinliang XU
    Area covered
    Description

    This dataset reflects the level of resilience of health care development in countries along the Belt and Road: the higher the value, the stronger the resilience. The World Bank statistical database was used to prepare the health resilience data. Based on year-on-year data for four indicators, and taking into account the year-on-year changes of each indicator, the resilience product for the development of healthcare conditions was prepared through comprehensive diagnosis based on sensitivity and adaptability analysis. The dataset is an important reference for analysing and comparing the current resilience of health care development in each country.

  7. Global - Roads Open Access Data Set - Dataset - ENERGYDATA.INFO

    • energydata.info
    Updated Jul 25, 2018
    + more versions
    Cite
    (2018). Global - Roads Open Access Data Set - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/global-roads-open-access-data-set-2010
    Dataset updated
    Jul 25, 2018
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The Global Roads Open Access Data Set, Version 1 (gROADSv1) was developed under the auspices of the CODATA Global Roads Data Development Task Group. The data set combines the best available roads data by country into a global roads coverage, using the UN Spatial Data Infrastructure Transport (UNSDI-T) version 2 as a common data model. All country road networks have been joined topologically at the borders, and many countries have been edited for internal topology. Source data for each country are provided in the documentation, and users are encouraged to refer to the readme file for use constraints that apply to a small number of countries. Because the data are compiled from multiple sources, the date range for road network representations ranges from the 1980s to 2010 depending on the country (most countries have no confirmed date), and spatial accuracy varies. The baseline global data set was compiled by the Information Technology Outreach Services (ITOS) of the University of Georgia. Updated data for 27 countries and 6 smaller geographic entities were assembled by Columbia University's Center for International Earth Science Information Network (CIESIN), with a focus largely on developing countries with the poorest data coverage.

  8. MAMEM Phase I Dataset - A dataset for multimodal human-computer interaction...

    • figshare.com
    • data.europa.eu
    zip
    Updated May 31, 2023
    + more versions
    Cite
    Spiros Nikolopoulos; Kostas Georgiadis; Fotis Kalaganis; Georgios Liaros; Ioulietta Lazarou; Katerina Adam; Anastasios Papazoglou-Chalikias; Elisavet Chatzilari; Vengerlis P. Oikonomou; Panagiotis C. Petrantonakis; Ioannis Kompatsiaris; Chandan Kumar; Raphael Menges; Steffen Staab; Daniel Muller; Korok Sengupta; Sevasti Bostantjopoulou; Katsarou Zoe; Gabi Zeilig; Meir Plotnik; Amihai Gottlieb; Sofia Fountoukidou; Jaap Ham; Dimitrios Athanasiou; Agnes Mariakaki; Dario Comanducci; Edoardo Sabatini; Wlater Nistico; Markus Plank (2023). MAMEM Phase I Dataset - A dataset for multimodal human-computer interaction using biosignals and eye tracking information [Dataset]. http://doi.org/10.6084/m9.figshare.5231053.v3
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Spiros Nikolopoulos; Kostas Georgiadis; Fotis Kalaganis; Georgios Liaros; Ioulietta Lazarou; Katerina Adam; Anastasios Papazoglou-Chalikias; Elisavet Chatzilari; Vengerlis P. Oikonomou; Panagiotis C. Petrantonakis; Ioannis Kompatsiaris; Chandan Kumar; Raphael Menges; Steffen Staab; Daniel Muller; Korok Sengupta; Sevasti Bostantjopoulou; Katsarou Zoe; Gabi Zeilig; Meir Plotnik; Amihai Gottlieb; Sofia Fountoukidou; Jaap Ham; Dimitrios Athanasiou; Agnes Mariakaki; Dario Comanducci; Edoardo Sabatini; Wlater Nistico; Markus Plank
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset combines multimodal biosignals and eye-tracking information gathered under a human-computer interaction framework. The dataset was developed in the context of the MAMEM project, which aims to give people with motor disabilities the ability to edit and author multimedia content through mental commands and gaze activity. The dataset includes EEG, eye-tracking, and physiological (GSR and heart rate) signals along with demographic, clinical and behavioral data collected from 36 individuals (18 able-bodied and 18 motor-impaired). Data were collected during interaction with a specifically designed interface for web browsing and multimedia content manipulation and during imaginary movement tasks. Alongside these data we also include evaluation reports from both the subjects and the experimenters on the experimental procedure and the collected dataset. We believe that the presented dataset will contribute towards the development and evaluation of modern human-computer interaction systems that would foster the integration of people with severe motor impairments back into society.

    Please use the following citation: Nikolopoulos, Spiros, Georgiadis, Kostas, Kalaganis, Fotis, Liaros, Georgios, Lazarou, Ioulietta, Adam, Katerina, Papazoglou-Chalikias, Anastasios, Chatzilari, Elisavet, Oikonomou, Vangelis P., Petrantonakis, Panagiotis C., Kompatsiaris, Ioannis, Kumar, Chandan, Menges, Raphael, Staab, Steffen, Müller, Daniel, Sengupta, Korok, Bostantjopoulou, Sevasti, Zoe, Katsarou, Zeilig, Gabi, Plotnik, Meir, Gottlieb, Amihai, Fountoukidou, Sofia, Ham, Jaap, Athanasiou, Dimitrios, Mariakaki, Agnes, Comanducci, Dario, Sabatini, Edoardo, Nistico, Walter & Plank, Markus. (2017). The MAMEM Project - A dataset for multimodal human-computer interaction using biosignals and eye tracking information. Zenodo. http://doi.org/10.5281/zenodo.834154

    Read/analyze using the following software: https://github.com/MAMEM/eeg-processing-toolbox

  9. MESINESP: Medical Semantic Indexing in Spanish - Development dataset

    • data-staging.niaid.nih.gov
    • live.european-language-grid.eu
    • +2more
    Updated Nov 5, 2022
    + more versions
    Cite
    Martin Krallinger; Aitor Gonzalez-Agirre; Alejandro Asensio (2022). MESINESP: Medical Semantic Indexing in Spanish - Development dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3746595
    Explore at:
    Dataset updated
    Nov 5, 2022
    Dataset provided by
    Barcelona Supercomputing Center
    Authors
    Martin Krallinger; Aitor Gonzalez-Agirre; Alejandro Asensio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please use the MESINESP2 corpus (the second edition of the shared task), since it has a higher level of curation and quality and is organized by document type (scientific articles, patents and clinical trials).

    Introduction

    The Mesinesp (Spanish BioASQ track, see https://temu.bsc.es/mesinesp) development set has a total of 750 records indexed manually by seven experienced medical literature indexers. Indexing is done using DeCS codes, a sort of Spanish equivalent to MeSH terms. Records were distributed so that each article was annotated by at least two different human indexers.

    The data annotation process consisted of two steps:

    Manual indexing step. DeCS codes were manually assigned to each record following the DeCS manual indexing guidelines.

    Manual validation and consensus. The joint set of manually indexed DeCS codes generated by both indexers was manually revised and corrected.

    Agreement between these annotations was then measured using the Jaccard index.

    Records consisted mainly of medical literature abstracts and titles from the IBECS and LILACS databases.

    Zip structure

    The zip file contains two different development sets:

    Official development set, which has the union of the annotations, with an agreement of macro = 0.6568 and micro = 0.6819. This set is composed of all the different (unique) DeCS codes that have been added by any annotator for each document; and

    Core-descriptors development set, which has the intersection of the annotations, with an agreement of macro = 1.0 and micro = 1.0. This set is composed of the common DeCS codes that have been added by two or more annotators for each document.
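
    The difference between the two development sets, and the role of the Jaccard index, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the annotations are invented, each document is given exactly two indexers, macro agreement is taken to be the mean of the per-document Jaccard scores, and micro agreement is taken to be pooled intersections over pooled unions. The description does not state which definitions the authors used.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|; two empty sets count as full agreement."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical annotations: document id -> one DeCS code set per indexer.
annotations: Dict[str, List[Set[str]]] = {
    "doc1": [{"D001", "D002", "D003"}, {"D002", "D003", "D004"}],
    "doc2": [{"D010"}, {"D010"}],
}

# Official set: every code added by any annotator (union).
official = {doc: set().union(*sets) for doc, sets in annotations.items()}
# Core-descriptors set: codes shared by the annotators (intersection).
core = {doc: set.intersection(*sets) for doc, sets in annotations.items()}

# Macro agreement (assumed definition): mean of per-document Jaccard scores.
per_doc = [jaccard(a, b) for a, b in annotations.values()]
macro = sum(per_doc) / len(per_doc)

# Micro agreement (assumed definition): pooled intersections over pooled unions.
inter_total = sum(len(a & b) for a, b in annotations.values())
union_total = sum(len(a | b) for a, b in annotations.values())
micro = inter_total / union_total

print(official["doc1"], core["doc1"], macro, micro)
```

    Restricting every annotator to the intersection makes all code sets identical, which is why the core-descriptors set reports an agreement of 1.0 under either average.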

    Corpus format

    Each dataset is a JSON object with one single key named "articles", which contains a list of documents. So, the raw format of the file is one line per document plus two additional lines (the first and the last) to enclose that list of documents and the expected type of data is as follows:

    {"articles":[ {"abstractText":str,"db":str,"decsCodes":list,"id":str,"journal":str,"title":str,"year":int}, ... ]}

    To clarify, the order of appearance of the fields in each document is as follows (note that this example is pretty-printed for readability):

    {
      "articles": [
        {
          "abstractText": "Content of the abstract",
          "db": "Name of the source database",
          "decsCodes": [
            "code1",
            "code2",
            "code3"
          ],
          "id": "Id of the document",
          "journal": "Name of the journal",
          "title": "Title of the document",
          "year": 2019
        }
      ]
    }

    Note: The fields "db", "journal" and "year" might be null.
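
    A minimal Python sketch of reading a file in this format, using only the standard `json` module. The helper name `load_articles` and the sample string are hypothetical, and treating the four remaining fields as always present is an assumption inferred from the note above rather than something the description guarantees.

```python
import json
from typing import Any, Dict, List

# Assumption: fields not listed as nullable in the note are always present.
REQUIRED = ("abstractText", "decsCodes", "id", "title")
NULLABLE = ("db", "journal", "year")


def load_articles(raw: str) -> List[Dict[str, Any]]:
    """Parse a Mesinesp-style JSON string and return its list of documents."""
    articles = json.loads(raw)["articles"]
    for doc in articles:
        missing = [key for key in REQUIRED if doc.get(key) is None]
        if missing:
            raise ValueError(f"document {doc.get('id')!r} lacks {missing}")
        # Make sure nullable keys exist so later lookups never raise KeyError.
        for key in NULLABLE:
            doc.setdefault(key, None)
    return articles


# Invented sample record following the documented field order.
sample = """{"articles": [
 {"abstractText": "Content of the abstract", "db": null,
  "decsCodes": ["code1", "code2"], "id": "doc-1", "journal": null,
  "title": "Title of the document", "year": 2019}
]}"""

docs = load_articles(sample)
# JSON null becomes Python None, so filter explicitly before using the value.
with_year = [d for d in docs if d["year"] is not None]
print(len(docs), docs[0]["db"], with_year[0]["year"])
```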

    Copyright (c) 2020 Secretaría de Estado de Digitalización e Inteligencia Artificial

  10. International Datasets

    • kaggle.com
    zip
    Updated Jun 27, 2017
    + more versions
    Cite
    US Census Bureau (2017). International Datasets [Dataset]. https://www.kaggle.com/census/international-data
    Explore at:
    Available download formats: zip (853301245 bytes)
    Dataset updated
    Jun 27, 2017
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    US Census Bureau
    Description

    Content

    The United States Census Bureau’s International Dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the data set includes midyear population figures broken down by age and gender assignment at birth. Additionally, they provide time-series data for attributes including fertility rates, birth rates, death rates, and migration rates.

    The full documentation is available here. For basic field details, please see the data dictionary.

    Note: The U.S. Census Bureau provides estimates and projections for countries and areas that are recognized by the U.S. Department of State that have a population of at least 5,000.

    Acknowledgements

    This dataset was created by the United States Census Bureau.

    Inspiration

    Which countries have made the largest improvements in life expectancy? Based on current trends, how long will it take each country to catch up to today’s best performers?

    Use this dataset with BigQuery

    You can use Kernels to analyze, share, and discuss this data on Kaggle, but if you’re looking for real-time updates and bigger data, check out the data on BigQuery, too: https://cloud.google.com/bigquery/public-data/international-census.

  11. NCDS

    • datacatalogue.ukdataservice.ac.uk
    Updated Nov 14, 2024
    + more versions
    Cite
    University of London, Institute of Education, Centre for Longitudinal Studies (2024). NCDS [Dataset]. http://doi.org/10.5255/UKDA-SN-5578-2
    Explore at:
    Dataset updated
    Nov 14, 2024
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    University of London, Institute of Education, Centre for Longitudinal Studies
    Area covered
    United Kingdom
    Description

    The National Child Development Study (NCDS) is a continuing longitudinal study that seeks to follow the lives of all those living in Great Britain who were born in one particular week in 1958. The aim of the study is to improve understanding of the factors affecting human development over the whole lifespan.

    The NCDS has its origins in the Perinatal Mortality Survey (PMS) (the original PMS study is held at the UK Data Archive under SN 2137). This study was sponsored by the National Birthday Trust Fund and designed to examine the social and obstetric factors associated with stillbirth and death in early infancy among the 17,000 children born in England, Scotland and Wales in that one week. Selected data from the PMS form NCDS sweep 0, held alongside NCDS sweeps 1-3, under SN 5565.

    Survey and Biomeasures Data (GN 33004):

    To date there have been ten attempts to trace all members of the birth cohort in order to monitor their physical, educational and social development. The first three sweeps were carried out by the National Children's Bureau, in 1965, when respondents were aged 7, in 1969, aged 11, and in 1974, aged 16 (these sweeps form NCDS1-3, held together with NCDS0 under SN 5565). The fourth sweep, also carried out by the National Children's Bureau, was conducted in 1981, when respondents were aged 23 (held under SN 5566). In 1985 the NCDS moved to the Social Statistics Research Unit (SSRU) - now known as the Centre for Longitudinal Studies (CLS). The fifth sweep was carried out in 1991, when respondents were aged 33 (held under SN 5567). For the sixth sweep, conducted in 1999-2000, when respondents were aged 42 (NCDS6, held under SN 5578), fieldwork was combined with the 1999-2000 wave of the 1970 Birth Cohort Study (BCS70), which was also conducted by CLS (and held under GN 33229). The seventh sweep was conducted in 2004-2005 when the respondents were aged 46 (held under SN 5579), the eighth sweep was conducted in 2008-2009 when respondents were aged 50 (held under SN 6137), the ninth sweep was conducted in 2013 when respondents were aged 55 (held under SN 7669), and the tenth sweep was conducted in 2020-24 when the respondents were aged 60-64 (held under SN 9412).

    A Secure Access version of the NCDS is available under SN 9413, containing detailed sensitive variables not available under Safeguarded access (currently only sweep 10 data). Variables include uncommon health conditions (including age at diagnosis), full employment codes and income/finance details, and specific life circumstances (e.g. pregnancy details, year/age of emigration from GB).

    Four separate datasets covering responses to NCDS over all sweeps are available. National Child Development Deaths Dataset: Special Licence Access (SN 7717) covers deaths; National Child Development Study Response and Outcomes Dataset (SN 5560) covers all other responses and outcomes; National Child Development Study: Partnership Histories (SN 6940) includes data on live-in relationships; and National Child Development Study: Activity Histories (SN 6942) covers work and non-work activities. Users are advised to order these studies alongside the other waves of NCDS.

    From 2002-2004, a Biomedical Survey was completed and is available under Safeguarded Licence (SN 8731) and Special Licence (SL) (SN 5594). Proteomics analyses of blood samples are available under SL SN 9254.

    Linked Geographical Data (GN 33497):
    A number of geographical variables are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies.

    Linked Administrative Data (GN 33396):
    A number of linked administrative datasets are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies. These include a Deaths dataset (SN 7717) available under SL and the Linked Health Administrative Datasets (SN 8697) available under Secure Access.

    Multi-omics Data and Risk Scores Data (GN 33592)
    Proteomics analyses were run on the blood samples collected from NCDS participants in 2002-2004 and are available under SL SN 9254. Metabolomics analyses were conducted on respondents of sweep 10 and are available under SL SN 9411. Polygenic indices are available under SL SN 9439. Derived summary scores have been created that combine the estimated effects of many different genes on a specific trait or characteristic, such as a person's risk of Alzheimer's disease, asthma, substance abuse, or mental health disorders, for example. These scores can be combined with existing survey data to offer a more nuanced understanding of how cohort members' outcomes may be shaped.

    Additional Sub-Studies (GN 33562):
    In addition to the main NCDS sweeps, further studies have also been conducted on a range of subjects such as parent migration, unemployment, behavioural studies and respondent essays. The full list of NCDS studies available from the UK Data Service can be found on the NCDS series access data webpage.

    How to access genetic and/or bio-medical sample data from a range of longitudinal surveys:
    For information on how to access biomedical data from NCDS that are not held at the UKDS, see the CLS Genetic data and biological samples webpage.

    Further information about the full NCDS series can be found on the Centre for Longitudinal Studies website.

    NCDS6:
    The sixth NCDS sweep took place in 1999-2000, when cohort members were aged 41-42 years. Fieldwork was combined with the 29-year follow-up for the 1970 British Cohort Study (BCS70), also conducted by CLS.

    SN 5578 supersedes the former combined NCDS6/BCS70 1999-2000 dataset, which was held under SN 4396 National Child Development Study and 1970 British Cohort Study (BCS70) Follow-ups, 1999-2000. The Centre for Longitudinal Studies updated the first six waves of NCDS in late 2006, and as part of this work separated the composite NCDS6/BCS70 dataset. Improvements made include further data cleaning and the addition of new documentation. Users who have previously obtained SN 4396 should no longer use it, and should completely replace it with this one. The BCS70 component of SN 4396 is now held separately under SN 5558 1970 British Cohort Study: Twenty-Nine-Year Follow-up, 1999-2000.

    Latest edition information
    For the third edition (November 2024), 14 new variables have been added. These variables correspond to truncated ICD-10 codes, limited to the first letter, derived from free-text responses regarding general health issues, kidney and bladder conditions, and long-standing illnesses. In addition, a small number of variables have been removed as a result of a disclosure review.

  12. Practice Dataset

    • kaggle.com
    zip
    Updated Sep 20, 2025
    Cite
    Seif Hafez (2025). Practice Dataset [Dataset]. https://www.kaggle.com/datasets/seifhafez/practice-dataset
    Explore at:
    Available download formats: zip (13049 bytes)
    Dataset updated
    Sep 20, 2025
    Authors
    Seif Hafez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was created in 2025 by the CATReloaded team in the Data Science Circle at Mansoura University, Faculty of Engineering, Egypt.

    The dataset was originally prepared as the supporting material for a pandas practice notebook. That notebook was designed as a practical task after Corey Schafer’s YouTube pandas course.

    The goal was to create a comprehensive pandas challenge that includes almost every skill you might need when working with pandas. The idea is that you can save the code and revisit it later whenever you need a reference.

    I believe this task could be useful for:

    • Anyone just starting with pandas

    • Learners who want a structured challenge to test and refresh their skills

    • People looking for a practice task they can build on, enhance, or adapt

    📌 The full task is available in the pinned notebook here:

    👉 Link to Notebook: https://www.kaggle.com/code/seifhafez/pandas-exercise/edit

    💡 Notes:

    • The task may contain non-beginner-friendly questions, so don’t worry if they take some time.

    • I plan to provide solutions/answers when I have free time to write them down.

    • If anyone from the community shares model answers, I’ll be very grateful. I will gladly give credit and mention those contributions so others can benefit from them too.

    • You are welcome to design new tasks or variations using this dataset or notebook, as long as credit is kept to the CATReloaded team.

    📖 To explore more of what we do, check out the CATReloaded Roadmap on GitHub.

  13. Major Development Sites - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Sep 16, 2015
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2015). Major Development Sites - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/major-development-sites
    Explore at:
    Dataset updated
    Sep 16, 2015
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Major Development Sites in York. For further information about major development sites please visit the City of York Council website. Please note that the data published within this dataset is a live API link to CYC's GIS server. Any changes made to the master copy of the data will be immediately reflected in the resources of this dataset. The date shown in the "Last Updated" field of each GIS resource reflects when the data was first published.

  14. Dataset of books series that contain Property development

    • workwithdata.com
    Updated Nov 25, 2024
    Cite
    Work With Data (2024). Dataset of books series that contain Property development [Dataset]. https://www.workwithdata.com/datasets/book-series?f=1&fcol0=j0-book&fop0=%3D&fval0=Property+development&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book series. It has 1 row and is filtered to the book Property development. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  15. Dataset of book subjects that contain Understanding development & learning :...

    • workwithdata.com
    Updated Nov 7, 2024
    Cite
    Work With Data (2024). Dataset of book subjects that contain Understanding development & learning : implications for teaching [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Understanding+development+%26+learning+%3A+implications+for+teaching&j=1&j0=books
    Explore at:
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 3 rows and is filtered to the book Understanding development & learning : implications for teaching. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.

  16. Data from: A DICOM dataset for evaluation of medical image de-identification...

    • stage.cancerimagingarchive.net
    • cancerimagingarchive.net
    csv, dicom, n/a
    Cite
    The Cancer Imaging Archive, A DICOM dataset for evaluation of medical image de-identification [Dataset]. http://doi.org/10.7937/s17z-r072
    Explore at:
    Available download formats: dicom, csv, n/a
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Apr 7, 2021
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    Open access or shared research data must comply with HIPAA patient privacy regulations. These regulations require the de-identification of datasets before they can be placed in the public domain. The process of image de-identification is time consuming, requires significant human resources, and is prone to human error. Automated image de-identification algorithms have been developed, but the research community requires some method of evaluation before such tools can be widely accepted. This evaluation requires a robust dataset that can be used as part of an evaluation process for de-identification algorithms.

    We developed a DICOM dataset that can be used to evaluate the performance of de-identification algorithms. DICOM image information objects were selected from datasets published in TCIA. Synthetic Protected Health Information (PHI) was generated and inserted into selected DICOM data elements to mimic typical clinical imaging exams. The evaluation dataset was de-identified by a TCIA curation team using standard TCIA tools and procedures. We are publishing the evaluation dataset (containing synthetic PHI) and de-identified evaluation dataset (result of TCIA curation) in advance of a potential competition, sponsored by the National Cancer Institute (NCI), for de-identification algorithm evaluation, and de-identification of medical image datasets. The evaluation dataset published here is a subset of a larger evaluation dataset that was created under contract for the National Cancer Institute. This subset is being published to allow researchers to test their de-identification algorithms and promote standardized procedures for validating automated de-identification.

  17. Mortality Prediction MIMIC-III

    • scidb.cn
    Updated May 6, 2021
    Cite
    Yanrong Cai (2021). Mortality Prediction MIMIC-III [Dataset]. http://doi.org/10.11922/sciencedb.00787
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 6, 2021
    Dataset provided by
    Science Data Bank
    Authors
    Yanrong Cai
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This new dataset was derived from the MIMIC-III dataset, an openly available database developed by the Laboratory of Computational Physiology at the Massachusetts Institute of Technology (MIT), which consists of data from more than 25,000 patients who were admitted to the Beth Israel Deaconess Medical Center (BIDMC) since 2003 and who have been de-identified for information safety. Here, we identified patients who were diagnosed with pelvic, acetabular, or combined pelvic and acetabular fractures according to ICD-9 codes and who survived at least 72 hours after ICU admission. All data from the first 72 hours following ICU admission were extracted from the MIMIC-III clinical database version 1.4.

  18. Sunflower Growth Stages

    • kaggle.com
    zip
    Updated Dec 2, 2024
    Cite
    Gülay Karakaya (2024). Sunflower Growth Stages [Dataset]. https://www.kaggle.com/datasets/glaykarahanl/sunflower-development-stages
    Explore at:
    Available download formats: zip (1750700034 bytes)
    Dataset updated
    Dec 2, 2024
    Authors
    Gülay Karakaya
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The images in this dataset were taken from different angles and heights with the help of a DJI Phantom 3 model drone from a 30-acre field in Tekirdağ / Köseilyas, which was planted on April 30, 2020 and harvested on August 31, 2020. The images collected at different times of the day, on 43 separate days, at average intervals of 2-3 days, were filed separately. A total of 6465 images with a resolution of 2250 x 4000 were obtained.

    All images collected on different days and filed separately were combined into a single folder in the order of the day they were taken. When the images are examined, the entire development process from the sprouting of the plants can be easily observed. The dimensions of the original images are quite high (2250 x 4000 pixels) and since they are reduced by about one tenth to fit the input of the network during training, a lot of data is lost. In addition, the success of deep convolutional neural networks depends on having a large amount of data. If there is not enough data, the network may overfit and memorize the data instead of learning it. Therefore, it is thought that using the original images divided into 6 equal parts will reduce data loss and increase the size of the data set. In addition, the divided images increase the diversity in the data set due to perspective differences.

    Although the land where the study was conducted has a smooth (homogeneous) structure, there are development differences in some parts. These differences are seen more especially in the early stages and at the beginning of flowering. For example, while plants have started to emerge from the soil in one part of a divided image, no plants may have sprouted in another part. Or, similarly, while no flowers are seen in one part of a divided image, flowering may have started in another part. Therefore, not all parts of an original image are included in the same class, and each class in the data set is created from individually selected parts.

    When all images are divided into 6 parts, a total of 38790 images with a size of 1333x1125 pixels are obtained. From these images, those that can be clearly distinguished by eye are selected and 8 separate classes are created. In this case, a dataset consisting of 1600 images in each class and 12800 images in total was created. All images were reduced to 224x224 pixels to be suitable for the input of CNN networks.
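
    The tile size and image counts quoted above can be checked with a short Python sketch. It assumes each frame is 4000 pixels wide by 2250 pixels high and is cut into a grid of 3 columns by 2 rows; that layout is consistent with the stated 1333x1125 tile size but is not spelled out in the description, and `tile_boxes` is a hypothetical helper name.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_boxes(width: int, height: int, cols: int, rows: int) -> List[Box]:
    """Return crop boxes for an even cols x rows grid (integer division)."""
    tile_w, tile_h = width // cols, height // rows
    return [
        (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


# Assumed layout: 3 columns x 2 rows on a 4000 x 2250 frame.
boxes = tile_boxes(4000, 2250, cols=3, rows=2)
tile_size = (boxes[0][2] - boxes[0][0], boxes[0][3] - boxes[0][1])

total_tiles = 6465 * len(boxes)  # every original image yields 6 tiles
final_images = 8 * 1600          # 8 classes with 1600 selected tiles each

print(tile_size, total_tiles, final_images)
```

    Because 4000 is not divisible by 3, integer division leaves the last pixel column of each frame unused under this layout.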

    The description of each class is as follows.

    1st class: Images were taken from the first emergence from the soil (cotyledon) to the 4-5 leaf stage.

    2nd class: Images were taken from the 5-6 leaf stage to the 10-11 leaf stage. The distances between the plants have decreased, overlapping has begun. The plant rows have begun to become clear.

    3rd class: Images were taken from the 11-12 leaf stage to the formation of the flower head. The soil ground is almost completely covered with plants. Vivid green tones dominate.

    4th class: Flower beds have begun to open. The flowering process in the middle of the bed has begun. When viewed from above, yellow ray flowers have begun to be seen. Flower heads are upright.

    5th class: The flowering process on the bed is complete or close to complete. Flower beds have begun to bend. Yellow ray flowers continue to be seen.

    6th class: Flowering is complete, flower beds are completely bent. Yellow ray flowers have almost completely fallen off. Green leaves have started to fade.

    7th class: The back of the beds have turned light yellow. Plants can be seen separately. Green leaves have completely faded, soil ground has started to be seen.

    8th class: Physiological maturity is complete. Flower beds and bracts have turned brown. Suitable for harvest.

    You can access the details and the article about the study from the link below. https://dergipark.org.tr/tr/pub/gazimmfd/issue/82783/1200615

  19. M-Vet Livestock Dataset

    • data-staging.niaid.nih.gov
    Updated Apr 11, 2025
    Cite
    Mutembesa, Daniel; MVet-Platform (2025). M-Vet Livestock Dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_15194374
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Makerere University
    Authors
    Mutembesa, Daniel; MVet-Platform
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    M-Vet Livestock Dataset is an open-access dataset created by the M-Vet project (www.m-vet.net) based at Makerere University Artificial Intelligence & Data Science Research Lab (www.air.ug) and supported by Lacuna Fund. It is aimed at supporting machine learning models for image classification tasks in Livestock. This dataset contains about 18,000 images of different animal types, including cows, goats, and pigs, collected from various farms and regions and annotated for animal type with corresponding classes. The dataset is designed to facilitate research and development in livestock management, particularly in animal classification tasks using computer vision. It is a valuable resource for researchers, developers, and agricultural stakeholders looking to innovate in animal health monitoring and diagnostics. Available on GitHub for public use.

    The dataset consists of nine subfolders (0001 to 0009), each containing three directories: labels, images, and data. Each image has a corresponding .txt file containing its annotations.

    For example, given the image:

    M-Vet_Livestock-Dataset-main/0001/images/e0b206bf-ee2f-4d6a-bdbe-ea29d70402aa7725714119516389484_jpg.rf.31cf1c70bbd1a46bfb83404ffe9414dc.jpg

    The corresponding label file: M-Vet_Livestock-Dataset-main/0001/labels/e0b206bf-ee2f-4d6a-bdbe-ea29d70402aa7725714119516389484_jpg.rf.31cf1c70bbd1a46bfb83404ffe9414dc.txt

    Contains the following annotation: 0 1 0.07107843124999999 0.311274509375 0.07107843124999999 0.311274509375 0.51992034375 1 0.51992034375 1 0.07107843124999999
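
    The image-to-label mapping in this example can be sketched in Python as follows. The helper names `label_path_for` and `parse_annotation` are hypothetical. Reading the annotation as a class id followed by normalized x, y polygon vertices is an assumption based on the shape of the example line; the description does not document the annotation format.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def label_path_for(image_path: str) -> str:
    """Swap the 'images' directory for 'labels' and the extension for .txt."""
    p = PurePosixPath(image_path)
    return str(p.parent.parent / "labels" / (p.stem + ".txt"))


def parse_annotation(line: str) -> Tuple[int, List[Tuple[float, float]]]:
    """Split a line into (class id, [(x, y), ...]); the layout is assumed."""
    tokens = line.split()
    class_id = int(tokens[0])
    values = [float(t) for t in tokens[1:]]
    if len(values) % 2:
        raise ValueError("expected an even number of coordinate values")
    return class_id, list(zip(values[0::2], values[1::2]))


image = (
    "M-Vet_Livestock-Dataset-main/0001/images/"
    "e0b206bf-ee2f-4d6a-bdbe-ea29d70402aa7725714119516389484"
    "_jpg.rf.31cf1c70bbd1a46bfb83404ffe9414dc.jpg"
)
annotation = (
    "0 1 0.07107843124999999 0.311274509375 0.07107843124999999 "
    "0.311274509375 0.51992034375 1 0.51992034375 1 0.07107843124999999"
)

label = label_path_for(image)
class_id, points = parse_annotation(annotation)
print(label)
print(class_id, points)
```

    Under this reading the example line describes a closed four-sided outline, since its first and last vertices coincide.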

    Acknowledgement: The dataset was created by the M-Vet project (www.m-vet), led by Daniel Mutembesa (www.linkedin.com/in/mutembesa-daniel-447452165), in collaboration with 162 expert and rural-based veterinarians and a network of over 1,500 livestock farmers in Uganda, the National Livestock Resources and Research Institute (https://naro.go.ug/naris/nalirri/), Veterinarians Without Borders (https://vetswithoutbordersus.org), and the Research Consortium on African Swine Fever at Makerere University.

  20. Bengali Character Recognition Dataset (BCRD)

    • kaggle.com
    zip
    Updated Jan 14, 2025
    Cite
    Shuvo Kumar Basak-4004 (2025). Bengali Character Recognition Dataset (BCRD) [Dataset]. https://www.kaggle.com/datasets/shuvokumarbasak4004/bengali-character-recognition-dataset-bcrd
    Explore at:
    Available download formats: zip (73643734 bytes)
    Dataset updated
    Jan 14, 2025
    Authors
    Shuvo Kumar Basak-4004
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Bengali Character Recognition Dataset (BCRD) is a comprehensive collection of images designed for training and evaluating machine learning models focused on the recognition of Bengali characters. The dataset contains 1000 images per character and covers all the basic characters used in the Bengali script, including vowels, consonants, and special symbols. The images in this dataset have been generated using the Noto Sans Bengali font (NotoSansBengali-Regular.ttf), which is a widely used and highly legible font for Bengali text. This ensures that the dataset represents standard, clean text representations in Bengali, making it ideal for character recognition tasks.

    Dataset Details:
    • Total Characters: The dataset includes both vowels and consonants, as well as special symbols from the Bengali alphabet.
    • Number of Images: For each character, there are 1000 images, ensuring a diverse set of images for training deep learning models.
    • Image Format: All images are stored in PNG format for high quality and clarity.
    • Resolution: Each image has a resolution of 200x200 pixels.
    • Font Used: The images are generated using the NotoSansBengali-Regular.ttf font, a standard font for Bengali text representation, which ensures consistency and legibility.
    • Language: Bengali, which is the primary language of the Bengali-speaking community in Bangladesh and India.

    Characters Included: The dataset contains the following categories of Bengali characters:
    1. Vowels (Sorbonno): অ, আ, ই, ঈ, উ, ঊ, ঋ, এ, ঐ, ও, ঔ
    2. Consonants (Bengali Consonants): ক, খ, গ, ঘ, ঙ, চ, ছ, জ, ঝ, ঞ, ট, ঠ, ড, ঢ, ণ, ত, থ, দ, ধ, ন, প, ফ, ব, ভ, ম, য, র, ল, শ, ষ, স, হ, ড়, ঢ়, য়
    3. Special Symbols: ৎ, ং, ঃ, ঁ

    Purpose:
    • The BCRD is designed to enable and enhance the development of models that can automatically recognize Bengali characters in a variety of real-world applications, including but not limited to:
        o Optical Character Recognition (OCR) for Bengali text.
        o Handwriting recognition.
        o Text classification tasks involving Bengali script.
        o Linguistic research on the Bengali writing system.

    Applications:
    • OCR Systems: This dataset is perfect for training OCR models for Bengali text extraction from scanned documents or images.
    • AI & Machine Learning: It can be used for various AI tasks, including supervised learning, classification, and model evaluation.
    • Language Processing: Researchers working on Bengali language processing can use this dataset for training recognition and generation models.

    Dataset Usage:
    • Academic Research: This dataset is an excellent resource for students and researchers working on character recognition and natural language processing in Bengali.
    • Deep Learning Models: The dataset is suitable for training Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), or other state-of-the-art models for Bengali character recognition.
    • Open Source Projects: It can be utilized by developers and open-source projects aiming to build Bengali OCR systems or handwriting recognition tools.

    Note for Researchers Using the Dataset

    This dataset was created by Shuvo Kumar Basak. If you use this dataset for research or academic purposes, please cite it appropriately. If you have published research using this dataset, please share a link to your paper. Good luck.
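
    The character lists in the description imply a fixed set of classes and a total image count. The sketch below works that out and builds a label-to-index mapping of the kind a classifier would need. It assumes the archive contains exactly the characters listed and exactly 1000 images for each; the variable names are illustrative, and the folder layout inside the zip is not described, so no file access is attempted.

```python
# Character classes as listed in the dataset description.
VOWELS = "অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ".split()
CONSONANTS = (
    "ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন "
    "প ফ ব ভ ম য র ল শ ষ স হ ড় ঢ় য়"
).split()
SPECIAL_SYMBOLS = "ৎ ং ঃ ঁ".split()

# Assumption taken from the description: 1000 images per character.
IMAGES_PER_CLASS = 1000

classes = VOWELS + CONSONANTS + SPECIAL_SYMBOLS

# Integer label for each character, in the order listed above.
label_to_index = {ch: i for i, ch in enumerate(classes)}

total_images = len(classes) * IMAGES_PER_CLASS

print(len(VOWELS), len(CONSONANTS), len(SPECIAL_SYMBOLS))
print(len(classes), total_images)
```

    The per-category counts and the total depend only on the lists above, so they are worth checking against the extracted archive before training.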

Muhammad Roshan Riaz (2024). Exploring E-commerce Trends⭐️⭐️⭐️ [Dataset]. https://www.kaggle.com/datasets/muhammadroshaanriaz/e-commerce-trends-a-guide-to-leveraging-dataset

Exploring E-commerce Trends⭐️⭐️⭐️

Exploring E-commerce Trends: A Guide to Leveraging Dummy Dataset

2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (51169 bytes)
Dataset updated
Jul 8, 2024
Authors
Muhammad Roshan Riaz
License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

Exploring E-commerce Trends: A Guide to Leveraging Dummy Dataset

Introduction: In the world of e-commerce, data is a powerful asset that can be leveraged to understand customer behavior, improve sales strategies, and enhance overall business performance. This guide explores how to effectively utilize a dummy dataset generated to simulate various aspects of an e-commerce platform. By analyzing this dataset, businesses can gain valuable insights into product trends, customer preferences, and market dynamics.

  1. Dataset Overview: The dummy dataset contains information on 1000 products across different categories such as electronics, clothing, home & kitchen, books, toys & games, and more. Each product is associated with attributes such as price, rating, number of reviews, stock quantity, discounts, sales, and date added to inventory. This comprehensive dataset provides a rich source of information for analysis and exploration.

  2. Data Analysis: Using tools like Pandas, NumPy, and visualization libraries like Matplotlib or Seaborn, businesses can perform in-depth analysis of the dataset. Key insights such as top-selling products, popular product categories, pricing trends, and seasonal variations can be extracted through exploratory data analysis (EDA). Visualization techniques can be employed to create intuitive graphs and charts for better understanding and communication of findings.

  3. Machine Learning Applications: The dataset can be used to train machine learning models for various e-commerce tasks such as product recommendation, sales prediction, customer segmentation, and sentiment analysis. By applying algorithms like linear regression, decision trees, or neural networks, businesses can develop predictive models to optimize inventory management, personalize customer experiences, and drive sales growth.

  4. Testing and Prototyping: Businesses can utilize the dummy dataset to test new algorithms, prototype new features, or conduct A/B testing experiments without impacting real user data. This enables rapid iteration and experimentation to validate hypotheses and refine strategies before implementation in a live environment.

  5. Educational Resources: The dummy dataset serves as an invaluable educational resource for students, researchers, and professionals interested in learning about e-commerce data analysis and machine learning. Tutorials, workshops, and online courses can be developed using the dataset to teach concepts such as data manipulation, statistical analysis, and model training in the context of e-commerce.

  6. Decision Support and Strategy Development: Insights derived from the dataset can inform strategic decision-making processes and guide business strategy development. By understanding customer preferences, market trends, and competitor behavior, businesses can make informed decisions regarding product assortment, pricing strategies, marketing campaigns, and resource allocation.
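
The exploratory analysis described in item 2 can be sketched in a few lines. This is an illustration only: the column names (`product_id`, `category`, `price`, `rating`, `sales`) and the five sample rows are hypothetical stand-ins, since the description names the attributes but not the actual file name or headers. It uses only the Python standard library; the same aggregations map directly onto a Pandas `groupby`.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the real file's column names and values may differ.
SAMPLE_CSV = """product_id,category,price,rating,sales
1,Electronics,199.99,4.5,120
2,Clothing,29.99,4.0,300
3,Electronics,99.50,3.8,80
4,Books,15.00,4.7,150
5,Clothing,49.99,3.5,100
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with typed numeric fields."""
    rows = []
    for r in csv.DictReader(io.StringIO(text)):
        rows.append({
            "product_id": int(r["product_id"]),
            "category": r["category"],
            "price": float(r["price"]),
            "rating": float(r["rating"]),
            "sales": int(r["sales"]),
        })
    return rows


def sales_by_category(rows):
    """Total units sold per category."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["category"]] += r["sales"]
    return dict(totals)


def mean_price_by_category(rows):
    """Average price per category."""
    prices = defaultdict(list)
    for r in rows:
        prices[r["category"]].append(r["price"])
    return {cat: mean(vals) for cat, vals in prices.items()}


def top_sellers(rows, n=3):
    """The n products with the highest sales, best first."""
    return sorted(rows, key=lambda r: r["sales"], reverse=True)[:n]


rows = load_rows(SAMPLE_CSV)
print(sales_by_category(rows))
print(mean_price_by_category(rows))
print([r["product_id"] for r in top_sellers(rows, 2)])
```

To run this against the downloaded file, replace `SAMPLE_CSV` with the file's contents and adjust the column names to match its header row.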

Conclusion: The dummy dataset provides a versatile resource for exploring e-commerce trends, understanding customer behavior, and driving business growth. By leveraging it effectively, businesses can unlock actionable insights, optimize operations, and stay ahead in today's competitive e-commerce landscape.
