90 datasets found
  1. Microsoft Stock Data 2025

    • kaggle.com
    zip
    Updated Feb 4, 2025
    Cite
    Umer Haddii (2025). Microsoft Stock Data 2025 [Dataset]. https://www.kaggle.com/datasets/umerhaddii/microsoft-stock-data-2025
    Explore at:
    zip(246404 bytes)Available download formats
    Dataset updated
    Feb 4, 2025
    Authors
    Umer Haddii
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Microsoft is an American company that develops and distributes software and services such as: a search engine (Bing), cloud solutions and the computer operating system Windows.

    Market cap

    Market capitalization of Microsoft (MSFT)
    
    Market cap: $3.085 Trillion USD
    

    As of February 2025 Microsoft has a market cap of $3.085 Trillion USD. This makes Microsoft the world's 2nd most valuable company by market cap according to our data. The market capitalization, commonly called market cap, is the total market value of a publicly traded company's outstanding shares and is commonly used to measure how much a company is worth.

    Revenue

    Revenue for Microsoft (MSFT)
    Revenue in 2024 (TTM): $254.19 Billion USD
    

    According to Microsoft's latest financial reports, the company's current revenue (TTM) is $254.19 Billion USD. In 2023 the company made a revenue of $227.58 Billion USD, an increase over its 2022 revenue of $204.09 Billion USD. Revenue is the total amount of income that a company generates by the sale of goods or services. Unlike earnings, no expenses are subtracted.

    Earnings

    Earnings for Microsoft (MSFT)
    Earnings in 2024 (TTM): $110.77 Billion USD
    
    

    According to Microsoft's latest financial reports, the company's current earnings (TTM) are $110.77 Billion USD. In 2023 the company made earnings of $101.21 Billion USD, an increase over its 2022 earnings of $82.58 Billion USD. The earnings displayed on this page are the earnings before interest and taxes, or simply EBIT.
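
    The year-over-year comparisons quoted above can be checked with a few lines of arithmetic. The sketch below uses only the 2022 and 2023 figures stated in the description; the function name `yoy_growth` and the derived EBIT margin are illustrative additions, not part of the dataset.

```python
# Figures quoted in the description above, in billions of USD.
revenue = {2022: 204.09, 2023: 227.58}
earnings = {2022: 82.58, 2023: 101.21}  # EBIT, as stated in the description


def yoy_growth(series, year):
    """Return the fractional change from the previous year to `year`."""
    previous = series[year - 1]
    return (series[year] - previous) / previous


revenue_growth_2023 = yoy_growth(revenue, 2023)
earnings_growth_2023 = yoy_growth(earnings, 2023)
# EBIT margin: earnings divided by revenue for the same year.
ebit_margin_2023 = earnings[2023] / revenue[2023]

print(f"Revenue growth 2022 to 2023:  {revenue_growth_2023:.1%}")
print(f"Earnings growth 2022 to 2023: {earnings_growth_2023:.1%}")
print(f"EBIT margin 2023:             {ebit_margin_2023:.1%}")
```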

    End of Day market cap according to different sources

    On Feb 2nd, 2025 the market cap of Microsoft was reported to be:

    • $3.085 Trillion USD by Nasdaq

    • $3.085 Trillion USD by CompaniesMarketCap

    • $3.085 Trillion USD by Yahoo Finance

    Content

    Geography: USA

    Time period: March 1986 - February 2025

    Unit of analysis: Microsoft Stock Data 2025

    Variables

    • date: date
    • open: The price at market open.
    • high: The highest price for that day.
    • low: The lowest price for that day.
    • close: The price at market close, adjusted for splits.
    • adj_close: The closing price after adjustments for all applicable splits and dividend distributions. Data is adjusted using appropriate split and dividend multipliers, adhering to Center for Research in Security Prices (CRSP) standards.
    • volume: The number of shares traded on that day.
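
    As a rough sketch of how a file with these variables might be read, the snippet below parses CSV text using only the Python standard library. It assumes the download is a CSV whose header matches the variable names listed above; the two sample rows are made up for illustration and are not real MSFT prices.

```python
import csv
import io

# Made-up sample rows (NOT real data) using the variable names listed above.
SAMPLE_CSV = """date,open,high,low,close,adj_close,volume
2025-01-02,425.0,427.5,420.1,422.3,422.3,16000000
2025-01-03,423.0,426.0,421.5,425.8,425.8,15000000
"""

PRICE_COLUMNS = ("open", "high", "low", "close", "adj_close")


def load_rows(handle):
    """Parse rows into dicts with float prices and integer volume."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"date": record["date"], "volume": int(record["volume"])}
        for name in PRICE_COLUMNS:
            row[name] = float(record[name])
        # Basic sanity check: the daily low should not exceed the daily high.
        if row["low"] > row["high"]:
            raise ValueError(f"low > high on {row['date']}")
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["date"], rows[0]["close"])
```

    To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.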

    Acknowledgements

    This dataset belongs to me. I’m sharing it here for free. You may do with it as you wish.


  2. Microsoft Stock Data and Key Affiliated Companies

    • kaggle.com
    zip
    Updated Nov 3, 2024
    Cite
    Zongao Bian (2024). Microsoft Stock Data and Key Affiliated Companies [Dataset]. https://www.kaggle.com/datasets/zongaobian/microsoft-stock-data-and-key-affiliated-companies
    Explore at:
    zip(1453413 bytes)Available download formats
    Dataset updated
    Nov 3, 2024
    Authors
    Zongao Bian
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains daily stock price data for Microsoft and several key companies that have significantly contributed to its growth and success. The dataset includes historical data from 1980 to 2024 for the following companies:

    • Microsoft (MSFT): The core company behind the dataset.
    • Intel (INTC): A vital partner in the PC revolution, providing processors for many Microsoft-powered devices.
    • IBM (IBM): Microsoft's early partnership with IBM, starting with MS-DOS, laid the foundation for Microsoft's dominance in operating systems.
    • Dell Technologies (DELL): Dell’s PCs pre-installed with Windows helped accelerate Microsoft’s growth in the consumer and enterprise markets.
    • Sony (SONY): A competitor in the gaming industry, Sony played a significant role in shaping Microsoft's strategy for its Xbox division.

    Dataset Details:

    • Date Range: 1980-12-11 to 2024-10-31
    • Interval: Daily stock prices
    • Columns: Date, Open, High, Low, Close, Adjusted Close, Volume

    This dataset is ideal for:

    • Financial analysis: Study stock price trends over time and compare performance across companies.
    • Time series forecasting: Predict future stock prices using historical data.
    • Market correlation analysis: Analyze the relationships between Microsoft and its key affiliated companies in different market conditions.
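
    One way to approach the market correlation analysis mentioned above is to compute the Pearson correlation between the daily returns of two tickers. The sketch below is a minimal standard-library version; the closing prices are made-up placeholders, and the variable names `msft_close` and `intc_close` are illustrative only.

```python
import math


def daily_returns(prices):
    """Fractional change between consecutive closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up closing prices on the same five trading days (NOT real data).
msft_close = [100.0, 102.0, 101.0, 104.0, 103.0]
intc_close = [50.0, 50.5, 50.2, 51.5, 51.0]

corr = pearson(daily_returns(msft_close), daily_returns(intc_close))
print(f"Correlation of daily returns: {corr:.3f}")
```

    Correlating returns rather than raw prices avoids the spurious agreement that two upward-trending price series would otherwise show.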

    Feel free to use this dataset for your financial and stock market projects, analysis, or machine learning models!

  3. 💻 2020-2025 Microsoft Stock Price Data

    • kaggle.com
    zip
    Updated Jun 3, 2025
    Cite
    Saman Fatima (2025). 💻 2020-2025 Microsoft Stock Price Data [Dataset]. https://www.kaggle.com/datasets/samanfatima7/microsoft-stock-price-data-last-5-years
    Explore at:
    zip(51842 bytes)Available download formats
    Dataset updated
    Jun 3, 2025
    Authors
    Saman Fatima
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    📁 Dataset 3: Microsoft Corp. (MSFT)

    💻 Microsoft Corp. (MSFT) Stock Price Data – Last 5 Years

    Description:
    This dataset contains daily historical stock price data for Microsoft Corporation (Ticker: MSFT) over the past 5 years. It is sourced from reliable financial market data providers and is well-suited for:

    • 📈 Time series forecasting and trend analysis
    • 📊 Financial dashboards and visualizations
    • 🧠 Machine learning applications in finance
    • 💼 Investment research and strategy backtesting

    Each entry corresponds to a single trading day and includes various price indicators and trading volume.

    Columns Included:

    • Date: The calendar date of the trading session (YYYY-MM-DD).
    • Open: The price of the stock at the beginning of the trading day.
    • High: The highest price reached during the trading day.
    • Low: The lowest price recorded during the trading day.
    • Close: The price of the stock at the end of the trading session.
    • Adj Close: The closing price adjusted for splits and dividend distributions.
    • Volume: The total number of shares traded during that day.

    🔍 Beginner-Friendly Analysis Techniques:

    If you're new to data analysis or finance, here are some simple but powerful techniques you can apply:

    • Line Plots: Use line charts to visualize how the stock price has changed over time.
    • Moving Averages: Calculate simple or exponential moving averages to identify trends and smooth out short-term fluctuations.
    • Daily Returns: Compute percentage change between closing prices to analyze daily gains or losses.
    • Volatility Analysis: Use rolling standard deviation to understand how volatile the stock has been.
    • Volume Trends: Track trading volume to identify periods of high market activity.
    • Candlestick Charts (intermediate): Create candlestick charts to represent daily open-high-low-close patterns visually.
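
    Three of the techniques listed above (moving averages, daily returns, and rolling volatility) can be sketched with the standard library alone. The closing prices below are made up for illustration, and the window sizes are arbitrary choices rather than recommendations.

```python
import statistics


def simple_moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def daily_returns(closes):
    """Fractional change between consecutive closing prices."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]


def rolling_volatility(returns, window):
    """Sample standard deviation of each run of `window` returns (window >= 2)."""
    return [
        statistics.stdev(returns[i - window + 1 : i + 1])
        for i in range(window - 1, len(returns))
    ]


# Made-up closing prices (NOT real data).
closes = [10.0, 11.0, 12.0, 11.0, 13.0]

sma3 = simple_moving_average(closes, 3)
returns = daily_returns(closes)
vol2 = rolling_volatility(returns, 2)

print("SMA(3):", sma3)
print("Daily returns:", [round(r, 4) for r in returns])
print("Rolling volatility (2):", [round(v, 4) for v in vol2])
```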

    Use Cases:
    This dataset can be used to evaluate stock performance trends, calculate technical indicators, simulate investment strategies, or train predictive models on financial data.

  4. Microsoft vs Apple: Stock Performance

    • kaggle.com
    zip
    Updated Oct 24, 2024
    Cite
    Prathamjyot Singh (2024). Microsoft vs Apple: Stock Performance [Dataset]. https://www.kaggle.com/datasets/prathamjyotsingh/microsoft-vs-apple-stock-performance
    Explore at:
    zip(212268 bytes)Available download formats
    Dataset updated
    Oct 24, 2024
    Authors
    Prathamjyot Singh
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Description: Apple vs. Microsoft Price, Split, and Dividends

    This dataset provides a comprehensive analysis of the stock performance of two leading technology companies: Apple Inc. (AAPL) and Microsoft Corporation (MSFT). It includes historical data on stock prices, stock splits, and dividend distributions, enabling a detailed comparison between these industry giants.

    Key Components:

    Daily Stock Prices: This section includes daily open, high, low, close prices, and trading volume for both Apple and Microsoft. It facilitates the examination of market trends and price movements over time.

    Stock Splits: The dataset details any stock splits that have occurred for both companies, including the date of the split and the split ratio. Understanding stock splits is crucial for assessing changes in stock value and investor sentiment.

    Dividends: This section contains information on dividend payments made by both companies, including the payment date, dividend amount, and dividend yield. Analyzing dividends helps investors understand the income-generating potential of each stock.
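
    To illustrate why split data matters when comparing prices over time, the sketch below back-adjusts closing prices for a stock split. It is a simplified illustration under stated assumptions: the split ratio is expressed as new shares per old share, the prices and the 2-for-1 split date are made up, and dividends are ignored.

```python
from datetime import date


def split_adjust(prices, splits):
    """Divide each close recorded before a split date by that split's ratio.

    prices: list of (date, close) tuples
    splits: list of (date, ratio) tuples, ratio = new shares per old share
    """
    adjusted = []
    for day, close in prices:
        factor = 1.0
        for split_day, ratio in splits:
            if day < split_day:
                factor *= ratio
        adjusted.append((day, close / factor))
    return adjusted


# Made-up prices around a hypothetical 2-for-1 split (NOT real data).
prices = [
    (date(2020, 1, 1), 200.0),
    (date(2020, 1, 2), 204.0),
    (date(2020, 1, 3), 102.5),  # first trading day after the split
]
splits = [(date(2020, 1, 3), 2.0)]

adjusted = split_adjust(prices, splits)
for day, close in adjusted:
    print(day.isoformat(), close)
```

    After adjustment the series is continuous across the split, so the apparent halving of the price no longer looks like a loss.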

    Purpose:

    This dataset is designed for financial analysts, investors, and researchers interested in the technology industry and its leading players. By examining historical stock prices, split data, and dividend distributions, users can gain insights into the financial health and market dynamics of Apple and Microsoft, aiding in informed investment decisions and strategic analysis.

    Data Sources:

    Alpha Vantage: Used for fetching daily stock price and dividend data.

    Yahoo Finance (yfinance): Utilized for obtaining stock splits information.

  5. Analysis of CBCS publications for Open Access, data availability statements...

    • figshare.scilifelab.se
    • researchdata.se
    • +2more
    txt
    Updated Jan 15, 2025
    Cite
    Theresa Kieselbach (2025). Analysis of CBCS publications for Open Access, data availability statements and persistent identifiers for supplementary data [Dataset]. http://doi.org/10.17044/scilifelab.23641749.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Umeå University
    Authors
    Theresa Kieselbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General description

    This dataset contains some markers of Open Science in the publications of the Chemical Biology Consortium Sweden (CBCS) between 2010 and July 2023. The sample of CBCS publications during this period consists of 188 articles. Every publication was visited manually at its DOI URL to answer the following questions.

    1. Is the research article an Open Access publication?
    2. Does the research article have a Creative Commons license or a similar license?
    3. Does the research article contain a data availability statement?
    4. Did the authors submit data of their study to a repository such as EMBL, Genbank, Protein Data Bank PDB, Cambridge Crystallographic Data Centre CCDC, Dryad or a similar repository?
    5. Does the research article contain supplementary data?
    6. Do the supplementary data have a persistent identifier that makes them citable as a defined research output?

    Variables

    The data were compiled in a Microsoft Excel 365 document that includes the following variables.

    1. DOI URL of research article
    2. Year of publication
    3. Research article published with Open Access
    4. License for research article
    5. Data availability statement in article
    6. Supplementary data added to article
    7. Persistent identifier for supplementary data
    8. Authors submitted data to NCBI or EMBL or PDB or Dryad or CCDC

    Visualization

    Parts of the data were visualized in two figures as bar diagrams using Microsoft Excel 365. The first figure displays the number of publications during a year, the number of publications published with open access and the number of publications that contain a data availability statement (Figure 1). The second figure shows the number of publications per year and how many publications contain supplementary data. This figure also shows how many of the supplementary datasets have a persistent identifier (Figure 2).

    File formats and software

    The file formats used in this dataset are:

    • .csv (Text file)
    • .docx (Microsoft Word 365 file)
    • .jpg (JPEG image file)
    • .pdf/A (Portable Document Format for archiving)
    • .png (Portable Network Graphics image file)
    • .pptx (Microsoft Power Point 365 file)
    • .txt (Text file)
    • .xlsx (Microsoft Excel 365 file)

    All files can be opened with Microsoft Office 365 and likely also work with the older versions Office 2019 and 2016.

    MD5 checksums

    Here is a list of all files of this dataset and of their MD5 checksums.

    1. Readme.txt (MD5: 795f171be340c13d78ba8608dafb3e76)
    2. Manifest.txt (MD5: 46787888019a87bb9d897effdf719b71)
    3. Materials_and_methods.docx (MD5: 0eedaebf5c88982896bd1e0fe57849c2)
    4. Materials_and_methods.pdf (MD5: d314bf2bdff866f827741d7a746f063b)
    5. Materials_and_methods.txt (MD5: 26e7319de89285fc5c1a503d0b01d08a)
    6. CBCS_publications_until_date_2023_07_05.xlsx (MD5: 532fec0bd177844ac0410b98de13ca7c)
    7. CBCS_publications_until_date_2023_07_05.csv (MD5: 2580410623f79959c488fdfefe8b4c7b)
    8. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.xlsx (MD5: 9c67dd84a6b56a45e1f50a28419930e5)
    9. Data_from_CBCS_publications_until_date_2023_07_05_obtained_by_manual_collection.csv (MD5: fb3ac69476bfc57a8adc734b4d48ea2b)
    10. Aggregated_data_from_CBCS_publications_until_2023_07_05.xlsx (MD5: 6b6cbf3b9617fa8960ff15834869f793)
    11. Aggregated_data_from_CBCS_publications_until_2023_07_05.csv (MD5: b2b8dd36ba86629ed455ae5ad2489d6e)
    12. Figure_1_CBCS_publications_until_2023_07_05_Open_Access_and_data_availablitiy_statement.xlsx (MD5: 9c0422cf1bbd63ac0709324cb128410e)
    13. Figure_1.pptx (MD5: 55a1d12b2a9a81dca4bb7f333002f7fe)
    14. Image_of_figure_1.jpg (MD5: 5179f69297fbbf2eaaf7b641784617d7)
    15. Image_of_figure_1.png (MD5: 8ec94efc07417d69115200529b359698)
    16. Figure_2_CBCS_publications_until_2023_07_05_supplementary_data_and_PID_for_supplementary_data.xlsx (MD5: f5f0d6e4218e390169c7409870227a0a)
    17. Figure_2.pptx (MD5: 0fd4c622dc0474549df88cf37d0e9d72)
    18. Image_of_figure_2.jpg (MD5: c6c68b63b7320597b239316a1c15e00d)
    19. Image_of_figure_2.png (MD5: 24413cc7d292f468bec0ac60cbaa7809)
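
    The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's `hashlib`; the helper names `md5_of_file` and `verify` are illustrative, and the demonstration runs on a throwaway temporary file rather than on a file from this dataset.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected_md5.lower()


# Demonstration on a temporary file with known content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_ok = verify(demo_path, hashlib.md5(b"hello").hexdigest())
os.remove(demo_path)
print("Checksum matches:", demo_ok)
```

    For a real check, one would call, for example, `verify("Readme.txt", "795f171be340c13d78ba8608dafb3e76")` after downloading the file into the working directory.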

  6. Microsoft stock price and financials

    • kaggle.com
    zip
    Updated Sep 4, 2023
    Cite
    Piyush Agrawal2 (2023). Microsoft stock price and financials [Dataset]. https://www.kaggle.com/datasets/piyushagrawal2/microsoft-stock-price-and-financials
    Explore at:
    zip(379059 bytes)Available download formats
    Dataset updated
    Sep 4, 2023
    Authors
    Piyush Agrawal2
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Overview:

    This comprehensive dataset combines Microsoft Corporation's historical stock price data with its annual and quarterly financial statements. It provides a rich source of information for financial analysis, investment research, and data-driven decision-making.

    Content:

    This dataset comprises the following key components:

    • Microsoft Stock Price Data: This section includes historical daily closing prices of Microsoft (MSFT) common stock. The dataset covers a significant time frame, making it suitable for long-term trend analysis and portfolio optimization.

    • Annual Financial Statements:

    Balance Sheets: Microsoft's annual balance sheets, offering insights into the company's financial position, assets, liabilities, and equity.

    Income Statements: Annual income statements presenting revenue, expenses, and profitability metrics.

    Cash Flow Statements: Annual cash flow statements providing details on operating, investing, and financing activities.

    • Quarterly Financial Statements:

    Balance Sheets: Microsoft's quarterly balance sheets for a more granular view of financial changes throughout the year.

    Income Statements: Quarterly income statements offering a closer look at revenue and expense trends.

    Cash Flow Statements: Quarterly cash flow statements for insights into short-term financial dynamics.

    • Use Cases:

    Financial Analysis: Researchers and analysts can use this dataset to perform in-depth financial analysis, including ratio analysis, trend analysis, and performance benchmarking.

    Investment Research: Investors can leverage this data to make informed investment decisions, assess risk, and evaluate Microsoft's financial health.

    Portfolio Management: Portfolio managers can use historical stock price data to optimize their portfolios and monitor the performance of Microsoft within their holdings.

    • Data Sources:

    The financial data in this dataset is collected from the Yahoo Finance API, a reliable and widely-used source of financial data. The stock price data is specifically sourced from this API.

    • Note on Data Quality:

    Efforts have been made to ensure the accuracy and consistency of the data collected from the Yahoo Finance API. However, users are encouraged to verify the information independently for critical applications. As with any financial dataset, it's essential to exercise due diligence in analysis and decision-making.

  7. Microsoft Coco Dataset

    • universe.roboflow.com
    zip
    Updated Jul 23, 2025
    Cite
    Microsoft (2025). Microsoft Coco Dataset [Dataset]. https://universe.roboflow.com/microsoft/coco/model/3
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 23, 2025
    Dataset authored and provided by
    Microsoft
    Variables measured
    Object Bounding Boxes
    Description

    Microsoft Common Objects in Context (COCO) Dataset

    The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations, including object categories, keypoints, and more. This makes it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.

    While COCO is often touted to comprise over 300k images, it's pivotal to understand that this number includes diverse formats like keypoints, among others. Specifically, the labeled dataset for object detection stands at 123,272 images.

    The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. With that said, COCO has not released their test set annotations, meaning the test data doesn't come with labels. Thus, this data is not included in the dataset.

    The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:

  8. llmail-inject-challenge

    • huggingface.co
    Cite
    Microsoft, llmail-inject-challenge [Dataset]. https://huggingface.co/datasets/microsoft/llmail-inject-challenge
    Explore at:
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Summary

    This dataset contains a large number of attack prompts collected as part of the now closed LLMail-Inject: Adaptive Prompt Injection Challenge. We first describe the details of the challenge, and then we provide documentation of the dataset. For the accompanying code, check out: https://github.com/microsoft/llmail-inject-challenge.

      Citation
    

    @article{abdelnabi2025, title = {LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/llmail-inject-challenge.

  9. The ORBIT (Object Recognition for Blind Image Training)-India Dataset

    • data.niaid.nih.gov
    • nde-dev.biothings.io
    • +1more
    Updated Jul 2, 2024
    Cite
    India, Gesu; Grayson, Martin; Massiceti, Daniela; Morrison, Cecily; Robinson, Simon; Pearson, Jennifer; Jones, Matt (2024). The ORBIT (Object Recognition for Blind Image Training)-India Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11394528
    Explore at:
    Dataset updated
    Jul 2, 2024
    Dataset provided by
    Microsoft (http://microsoft.com/)
    Swansea University
    Authors
    India, Gesu; Grayson, Martin; Massiceti, Daniela; Morrison, Cecily; Robinson, Simon; Pearson, Jennifer; Jones, Matt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    The ORBIT (Object Recognition for Blind Image Training) -India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, a home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.

    Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.

    The image dataset is stored in the ‘Dataset’ folder, organized by folders assigned to each data collector (P1, P2, ...P12) who collected them. Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside an ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is True if the object is not present in the image, and the ‘pii_present_issue’ key is True if personally identifiable information (PII) is present in the image. Note that all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; therefore, an unscaled version of the dataset will follow soon.
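
    The annotation layout described above can be read with a few lines of Python. The sketch below parses a per-video JSON string and keeps only frames where the object is present. The sample mirrors the key format quoted in the description, but the `true` flag on the second frame is made up for illustration, and splitting names on "--" assumes that object labels never contain that separator.

```python
import json

# Sample in the format quoted above; the `true` flag on frame 2 is invented.
SAMPLE_ANNOTATIONS = """{
  "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg":
    {"object_not_present_issue": false, "pii_present_issue": false},
  "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg":
    {"object_not_present_issue": true, "pii_present_issue": false}
}"""


def usable_frames(annotations):
    """Return frame names whose object is present in the image."""
    return [
        name
        for name, flags in annotations.items()
        if not flags["object_not_present_issue"]
    ]


def parse_frame_name(name):
    """Split a frame name into its '--'-separated parts (assumed layout)."""
    collector, label, condition, video, frame = name.split("--")
    return {
        "collector": collector,
        "object": label,
        "condition": condition,
        "video": video,
        "frame": frame,
    }


annotations = json.loads(SAMPLE_ANNOTATIONS)
frames = usable_frames(annotations)
info = parse_frame_name(frames[0])
print(len(frames), info["collector"], info["object"], info["condition"])
```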

    This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.

    REFERENCES:

    [1] Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597

    [2] microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset

    [3] Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641

  10. MAG for Heterogeneous Graph Learning

    • data.niaid.nih.gov
    Updated Jul 9, 2021
    Cite
    Diea, Maria-Alexandra (2021). MAG for Heterogeneous Graph Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5055135
    Explore at:
    Dataset updated
    Jul 9, 2021
    Dataset provided by
    University of Amsterdam
    Authors
    Diea, Maria-Alexandra
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    We provide an academic graph based on a snapshot of the Microsoft Academic Graph from 26.05.2021. The Microsoft Academic Graph (MAG) is a large-scale dataset containing information about scientific publication records, their citation relations, as well as authors, affiliations, journals, conferences and fields of study. We acknowledge the Microsoft Academic Graph using the URI https://aka.ms/msracad. For more information regarding schema and the entities present in the original dataset please refer to: MAG schema.

    MAG for Heterogeneous Graph Learning

    We use a recent version of MAG from May 2021 and extract all relevant entities to build a graph that can be directly used for heterogeneous graph learning (node classification, link prediction, etc.). The graph contains all English papers, published after 1900, that have been cited at least 5 times per year since the time of publishing. For fairness, we set a constant citation bound of 100 for papers published before 2000. We further include two smaller subgraphs, one containing computer science papers and one containing medicine papers.
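
    The inclusion rule described above can be written as a small predicate. This is only one possible reading, not the authors' code: it assumes "at least 5 times per year" means total citations of at least 5 times the number of years between publication and the 2021 snapshot, and the function and parameter names are hypothetical.

```python
SNAPSHOT_YEAR = 2021  # the MAG snapshot used is from May 2021


def required_citations(year, per_year=5, pre_2000_bound=100,
                       snapshot_year=SNAPSHOT_YEAR):
    """Citation threshold for a paper published in `year` (assumed reading)."""
    if year < 2000:
        # Constant bound for papers published before 2000.
        return pre_2000_bound
    return per_year * (snapshot_year - year)


def keep_paper(year, citations, language="en"):
    """True if a paper passes the language, year, and citation filters."""
    return (
        language == "en"
        and year > 1900
        and citations >= required_citations(year)
    )


print(keep_paper(2015, 30))   # threshold is 5 * (2021 - 2015) = 30
print(keep_paper(1995, 99))   # below the constant bound of 100
```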

    Nodes and features

    We define the following nodes:

    paper with mag_id, graph_id, normalized title, year of publication, citations and a 128-dimension title embedding built using word2vec No. of papers: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);

    author with mag_id, graph_id, normalized name, citations No. of authors: 6,363,201 (all), 1,797,980 (medicine), 557,078 (computer science);

    field with mag_id, graph_id, level, citations denoting the hierarchical level of the field where 0 is the highest-level (e.g. computer science) No. of fields: 199,457 (all), 83,970 (medicine), 45,454 (computer science);

    affiliation with mag_id, graph_id, citations No. of affiliations: 19,421 (all), 12,103 (medicine), 10,139 (computer science);

    venue with mag_id, graph_id, citations, type denoting whether conference or journal No. of venues: 24,608 (all), 8,514 (medicine), 9,893 (computer science).

    Edges

    We define the following edges:

    author is_affiliated_with affiliation No. of author-affiliation edges: 8,292,253 (all), 2,265,728 (medicine), 665,931 (computer science);

    author is_first/last/other paper No. of author-paper edges: 24,907,473 (all), 5,081,752 (medicine), 1,269,485 (computer science);

    paper has_citation_to paper No. of paper-paper edges: 142,684,074 (all), 16,808,837 (medicine), 4,152,804 (computer science);

    paper conference/journal_published_at venue No. of paper-venue edges: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);

    paper has_field_L0/L1/L2/L3/L4 field No. of paper-field edges: 47,531,366 (all), 9,403,708 (medicine), 3,341,395 (computer science);

    field is_in field No. of field-field edges: 339,036 (all), 138,304 (medicine), 83,245 (computer science);

    We further include a reverse edge for each edge type defined above that is denoted with the prefix rev_ and can be removed based on the downstream task.

    Data structure

    The nodes and their respective features are provided as separate .tsv files where each feature represents a column. The edges are provided as a pickled python dictionary with schema:

    {target_type: {source_type: {edge_type: {target_id: {source_id: {time } } } } } }

    We provide three compressed ZIP archives, one for each subgraph (all, medicine, computer science); however, we split the file for the complete graph into 500 MB chunks. Each archive contains the separate node features and edge dictionary.
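
    The nested edge dictionary described above can be flattened into plain tuples for downstream use. The sketch below is an illustration on a toy dictionary, not the dataset's own loader: it assumes the value stored under each source_id is the time payload and treats it as opaque, and the edge-type name `is_first` is a guess based on the "is_first/last/other" wording. With the real data, the dictionary would first be loaded with `pickle.load` from the file shipped in the archive.

```python
def iter_edges(edges):
    """Yield (source_type, source_id, edge_type, target_type, target_id, time).

    Expects the schema
    {target_type: {source_type: {edge_type: {target_id: {source_id: time}}}}}.
    """
    for target_type, by_source_type in edges.items():
        for source_type, by_edge_type in by_source_type.items():
            for edge_type, by_target in by_edge_type.items():
                for target_id, by_source in by_target.items():
                    for source_id, time in by_source.items():
                        yield (source_type, source_id, edge_type,
                               target_type, target_id, time)


# Toy dictionary in the assumed layout (NOT real MAG data).
toy_edges = {
    "paper": {
        "author": {
            "is_first": {
                10: {1: {2020}},   # author 1 is first author of paper 10
                11: {2: {2021}},
            }
        }
    }
}

flat = list(iter_edges(toy_edges))
for edge in flat:
    print(edge)
```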

  11. Data from: Impacts Assessment of Dynamic Speed Harmonization with Queue...

    • catalog.data.gov
    • odgavaprod.ogopendata.com
    • +1more
    Updated Dec 7, 2023
    Cite
    Federal Highway Administration (2023). Impacts Assessment of Dynamic Speed Harmonization with Queue Warning: Task 3, Impacts Assessment Report [supporting datasets] [Dataset]. https://catalog.data.gov/dataset/impacts-assessment-of-dynamic-speed-harmonization-with-queue-warning-task-3-impacts-assess
    Explore at:
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    Federal Highway Administration (https://highways.dot.gov/)
    Description

    The datasets in the .pdf and .zip attached to this record are in support of Intelligent Transportation Systems Joint Program Office (ITS JPO) report FHWA-JPO-15-222, "Impacts Assessment of Dynamic Speed Harmonization with Queue Warning: Task 3, Impacts Assessment Report". The files in these zip files are specifically related to the US-101 Testbed, near San Mateo, CA. The uncompressed and compressed files total 2.0265 GB in size. The files have been uploaded as-is; no further documentation was supplied by NTL. All located .docx files were converted to .pdf document files, which are an open, archival format. These .pdfs were then added to the zip file alongside the original .docx files. The attached zip files can be unzipped using any zip compression/decompression software. These zip files contain files in the following formats: .pdf document files, which can be read using any pdf reader; .xlsxm macro-enabled spreadsheet files, which can be read in Microsoft Excel and some Tech Report spreadsheet programs; .accdb database files, which may be opened with Microsoft Access Database software and Tech Report open database software applications; and .db generic database files, often associated with thumbnail images in the Windows operating environment. [software requirements] These files were last accessed in 2017. File and .zip file names include: FHWA_JPO_15_222_INFLO_Performance_Measure_METADATA.pdf; FHWA_JPO_15_222_INFLO_Performance_Measure_METADATA.docx; FHWA_JPO_15_222_INFLO_VISSIM_Output_and_Analysis_Spreadsheets.zip; FHWA_JPO_15_222_INFLO_Spreadsheet_PDFs.zip; FHWA_JPO_15_222_DATA_CV50.zip; and FHWA_JPO_15_222_DATA_CV25.zip.

  12. DataSheet1_Mitigating Biases in CORD-19 for Analyzing COVID-19...

    • frontiersin.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Anshul Kanakia; Kuansan Wang; Yuxiao Dong; Boya Xie; Kyle Lo; Zhihong Shen; Lucy Lu Wang; Chiyuan Huang; Darrin Eide; Sebastian Kohlmeier; Chieh-Han Wu (2023). DataSheet1_Mitigating Biases in CORD-19 for Analyzing COVID-19 Literature.zip [Dataset]. http://doi.org/10.3389/frma.2020.596624.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Anshul Kanakia; Kuansan Wang; Yuxiao Dong; Boya Xie; Kyle Lo; Zhihong Shen; Lucy Lu Wang; Chiyuan Huang; Darrin Eide; Sebastian Kohlmeier; Chieh-Han Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    On the behest of the Office of Science and Technology Policy in the White House, six institutions, including ours, have created an open research dataset called COVID-19 Research Dataset (CORD-19) to facilitate the development of question-answering systems that can assist researchers in finding relevant research on COVID-19. As of May 27, 2020, CORD-19 includes more than 100,000 open access publications from major publishers and PubMed as well as preprint articles deposited into medRxiv, bioRxiv, and arXiv. Recent years, however, have also seen question-answering and other machine learning systems exhibit harmful behaviors to humans due to biases in the training data. It is imperative and only ethical for modern scientists to be vigilant in inspecting and be prepared to mitigate the potential biases when working with any datasets. This article describes a framework to examine biases in scientific document collections like CORD-19 by comparing their properties with those derived from the citation behaviors of the entire scientific community. In total, three expanded sets are created for the analyses: 1) the enclosure set CORD-19E composed of CORD-19 articles and their references and citations, mirroring the methodology used in the renowned “A Century of Physics” analysis; 2) the full closure graph CORD-19C that recursively includes references starting with CORD-19; and 3) the inflection closure CORD-19I, that is, a much smaller subset of CORD-19C but already appropriate for statistical analysis based on the theory of the scale-free nature of the citation network. Taken together, all these expanded datasets show much smoother trends when used to analyze global COVID-19 research. The results suggest that while CORD-19 exhibits a strong tilt toward recent and topically focused articles, the knowledge being explored to attack the pandemic encompasses a much longer time span and is very interdisciplinary. 
A question-answering system with such expanded scope of knowledge may perform better in understanding the literature and answering related questions. However, while CORD-19 appears to have topical coverage biases compared to the expanded sets, the collaboration patterns, especially in terms of team sizes and geographical distributions, are captured very well already in CORD-19 as the raw statistics and trends agree with those from larger datasets.

  13. Data from: Winter Steelhead Distribution [ds340]

    • catalog.data.gov
    • data.cnra.ca.gov
    • +6more
    Updated Jul 24, 2025
    + more versions
    Cite
    California Department of Fish and Wildlife (2025). Winter Steelhead Distribution [ds340] [Dataset]. https://catalog.data.gov/dataset/winter-steelhead-distribution-ds340-cc0ea
    Explore at:
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    California Department of Fish and Wildlife
    Description

    Winter Steelhead Distribution June 2012 Version This dataset depicts observation-based stream-level geographic distribution of anadromous winter-run steelhead trout, Oncorhynchus mykiss irideus (O. mykiss), in California. It was developed for the express purpose of assisting with steelhead recovery planning efforts. The distributions reported in this dataset were derived from a subset of the data contained in the Aquatic Species Observation Database (ASOD), a Microsoft Access multi-species observation data capture application. ASOD is an ongoing project designed to capture as complete a set of statewide inland aquatic vertebrate species observation information as possible. Please note: A separate distribution is available for summer-run steelhead. Contact information is the same as for the above. ASOD Observation data were used to develop a network of stream segments. These lines are developed by "tracing down" from each observation to the sea using the flow properties of USGS National Hydrography Dataset (NHD) High Resolution hydrography. Lastly these lines, representing stream segments, were assigned a value of either Anad Present (Anadromous present). The end result (i.e., this layer) consists of a set of lines representing the distribution of steelhead based on observations in the Aquatic Species Observation Database. This dataset represents stream reaches that are known or believed to be used by steelhead based on steelhead observations. Thus, it contains only positive steelhead occurrences. The absence of distribution on a stream does not necessarily indicate that steelhead do not utilize that stream. Additionally, steelhead may not be found in all streams or reaches each year. This is due to natural variations in run size, water conditions, and other environmental factors. 
The information in this data set should be used as an indicator of steelhead presence/suspected presence at the time of the observation as indicated by the 'Late_Yr' (Latest Year) field attribute. The line features in the dataset may not represent the maximum extent of steelhead on a stream; rather it is important to note that this distribution most likely underestimates the actual distribution of steelhead. This distribution is based on observations found in the ASOD database. The individual observations may not have occurred at the upper extent of anadromous occupation. In addition, no attempt was made to capture every observation of O. mykiss and so it should not be assumed that this dataset is complete for each stream. The distribution dataset was built solely from the ASOD observational data. No additional data (habitat mapping, barriers data, gradient modeling, etc.) were utilized to either add to or validate the data. It is very possible that an anadromous observation in this dataset has been recorded above (upstream of) a barrier as identified in the Passage Assessment Database (PAD). In the near future, we hope to perform a comparative analysis between this dataset and the PAD to identify and resolve all such discrepancies. Such an analysis will add rigor to and help validate both datasets. This dataset has recently undergone a review. Data source contributors as well as CDFG fisheries biologists have been provided the opportunity to review and suggest edits or additions during a recent review. Data contributors were notified and invited to review and comment on the handling of the information that they provided. The distribution was then posted to an intranet mapping application and CDFG biologists were provided an opportunity to review and comment on the dataset. During this review, biologists were also encouraged to add new observation data. This resulting final distribution contains their suggestions and additions. 
Please refer to "Use Constraints" section below.
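
    The "tracing down" step described above can be pictured with a toy flow network. The sketch below is only an illustration under stated assumptions: the mapping from each segment to its downstream neighbour, the segment names, and the function trace_to_sea are all invented, and the real workflow relies on the flow properties of the NHD High Resolution hydrography rather than a plain dictionary.

```python
from typing import Dict, Iterable, Optional, Set


def trace_to_sea(downstream: Dict[str, Optional[str]],
                 observed: Iterable[str]) -> Set[str]:
    """Collect every segment lying between an observation and the sea."""
    present: Set[str] = set()
    for segment in observed:
        current: Optional[str] = segment
        # Stop at the sea (None) or on reaching an already traced segment.
        while current is not None and current not in present:
            present.add(current)
            current = downstream.get(current)
    return present


# Hypothetical network: each segment points to the next one downstream;
# None marks the outlet to the sea.
toy_network = {
    "headwater_a": "mid_reach",
    "headwater_b": "mid_reach",
    "mid_reach": "estuary",
    "side_creek": "estuary",
    "estuary": None,
}

anad_present = trace_to_sea(toy_network, ["headwater_a"])
```

    Segments without an observation upstream of them (here headwater_b and side_creek) stay out of the result, which mirrors the note that the layer contains only positive occurrences and likely underestimates the true distribution.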

  14. Data from: Summer Steelhead Distribution [ds341]

    • data.ca.gov
    • data.cnra.ca.gov
    • +5more
    Updated Oct 12, 2023
    + more versions
    Cite
    California Department of Fish and Wildlife (2023). Summer Steelhead Distribution [ds341] [Dataset]. https://data.ca.gov/dataset/summer-steelhead-distribution-ds3411
    Explore at:
    Available download formats: geojson, html, kml, csv, zip, arcgis geoservices rest api
    Dataset updated
    Oct 12, 2023
    Dataset authored and provided by
    California Department of Fish and Wildlife (https://wildlife.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summer Steelhead Distribution October 2009 Version This dataset depicts observation-based stream-level geographic distribution of anadromous summer-run steelhead trout, Oncorhynchus mykiss irideus (O. mykiss), in California. It was developed for the express purpose of assisting with steelhead recovery planning efforts. The distributions reported in this dataset were derived from a subset of the data contained in the Aquatic Species Observation Database (ASOD), a Microsoft Access multi-species observation data capture application. ASOD is an ongoing project designed to capture as complete a set of statewide inland aquatic vertebrate species observation information as possible. Please note: A separate distribution is available for winter-run steelhead. Contact information is the same as for the above. ASOD Observation data were used to develop a network of stream segments. These lines are developed by "tracing down" from each observation to the sea using the flow properties of USGS National Hydrography Dataset (NHD) High Resolution hydrography. Lastly these lines, representing stream segments, were assigned a value of either Anad Present (Anadromous present). The end result (i.e., this layer) consists of a set of lines representing the distribution of steelhead based on observations in the Aquatic Species Observation Database. This dataset represents stream reaches that are known or believed to be used by steelhead based on steelhead observations. Thus, it contains only positive steelhead occurrences. The absence of distribution on a stream does not necessarily indicate that steelhead do not utilize that stream. Additionally, steelhead may not be found in all streams or reaches each year. This is due to natural variations in run size, water conditions, and other environmental factors. 
The information in this data set should be used as an indicator of steelhead presence/suspected presence at the time of the observation as indicated by the 'Late_Yr' (Latest Year) field attribute. The line features in the dataset may not represent the maximum extent of steelhead on a stream; rather it is important to note that this distribution most likely underestimates the actual distribution of steelhead. This distribution is based on observations found in the ASOD database. The individual observations may not have occurred at the upper extent of anadromous occupation. In addition, no attempt was made to capture every observation of O. mykiss and so it should not be assumed that this dataset is complete for each stream. The distribution dataset was built solely from the ASOD observational data. No additional data (habitat mapping, barriers data, gradient modeling, etc.) were utilized to either add to or validate the data. It is very possible that an anadromous observation in this dataset has been recorded above (upstream of) a barrier as identified in the Passage Assessment Database (PAD). In the near future, we hope to perform a comparative analysis between this dataset and the PAD to identify and resolve all such discrepancies. Such an analysis will add rigor to and help validate both datasets. This dataset has recently undergone a review. Data source contributors as well as CDFG fisheries biologists have been provided the opportunity to review and suggest edits or additions during a recent review. Data contributors were notified and invited to review and comment on the handling of the information that they provided. The distribution was then posted to an intranet mapping application and CDFG biologists were provided an opportunity to review and comment on the dataset. During this review, biologists were also encouraged to add new observation data. This resulting final distribution contains their suggestions and additions. 
Please refer to "Use Constraints" section below.

  15. Dataset of development of business during the COVID-19 crisis

    • data.mendeley.com
    • narcis.nl
    Updated Nov 9, 2020
    Cite
    Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
    Explore at:
    Dataset updated
    Nov 9, 2020
    Authors
    Tatiana N. Litvinova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic); of these, the countries represented in the Global 500 ranking for 2020 are the USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated, together with the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are formulas rather than ready-made numbers, adding or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. 
The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
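
    The arithmetic mean and change calculations mentioned in the description can be sketched as follows. The figures and variable names are invented for illustration and do not come from the dataset; the spreadsheet itself performs the equivalent steps with Excel formulas.

```python
from statistics import mean

# Hypothetical values of one indicator (number of employees) for the sampled
# corporations of a single country in two consecutive Global 500 rankings.
employees_2019 = [100_000, 250_000, 150_000]
employees_2020 = [110_000, 240_000, 160_000]

# Arithmetic mean of the indicator for each year.
mean_2019 = mean(employees_2019)
mean_2020 = mean(employees_2020)

# Change (increase) between the two years, in absolute and relative terms.
absolute_change = mean_2020 - mean_2019
relative_change = absolute_change / mean_2019
```

    With these invented inputs the mean rises from about 166,667 to 170,000, a relative increase of 2%.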

  16. Data from: Fostering cultures of open qualitative research: Dataset 1 –...

    • orda.shef.ac.uk
    docx
    Updated Oct 8, 2025
    + more versions
    Cite
    Matthew Hanchard; Itzel San Roman Pineda (2025). Fostering cultures of open qualitative research: Dataset 1 – Survey Responses [Dataset]. http://doi.org/10.15131/shef.data.23567250.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Oct 8, 2025
    Dataset provided by
    The University of Sheffield
    Authors
    Matthew Hanchard; Itzel San Roman Pineda
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute.

    The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:

    · Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
    · Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
    · Fostering cultures of open qualitative research: Dataset 3 – Coding Book

    The project was funded with £13,913.85 Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

    The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.

    ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license.

    This dataset comprises one spreadsheet with N=91 anonymised survey responses in .xlsx format. It includes all responses to the project survey, which used Google Forms, between 06-Feb-2023 and 30-May-2023. The spreadsheet can be opened with Microsoft Excel, Google Sheets, or open-source equivalents.

    The survey responses include a random sample of researchers worldwide undertaking qualitative, mixed-methods, or multi-modal research.

    The recruitment of respondents was initially purposive, aiming to gather responses from qualitative researchers at research-intensive (targeted Russell Group) universities. This involved speculative emails and a call for participants on the University of Sheffield ‘Qualitative Open Research Network’ mailing list. As a result, the responses include a snowball sample of scholars from elsewhere.

    The spreadsheet has two tabs/sheets: one labelled ‘SurveyResponses’ contains the anonymised and tidied set of survey responses; the other, labelled ‘VariableMapping’, sets out each field/column in the ‘SurveyResponses’ tab/sheet against the original survey questions and responses it relates to.

    The survey responses tab/sheet includes a field/column labelled ‘RespondentID’ (using randomly generated 16-digit alphanumeric keys) which can be used to connect survey responses to interview participants in the accompanying ‘Fostering cultures of open qualitative research: Dataset 2 – Interview transcripts’ files.

    A set of survey questions gathering eligibility criteria details and consent is not included within this dataset; the questions are listed below. All responses provided in the dataset gained a ‘Yes’ response to all of the questions below (with the exception of one question, marked with an asterisk (*)):

    · I am aged 18 or over.
    · I have read the information and consent statement above.
    · I understand how to ask questions and/or raise a query or concern about the survey.
    · I agree to take part in the research and for my responses to be part of an open access dataset. These will be anonymised unless I specifically ask to be named.
    · I understand that my participation does not create a legally binding agreement or employment relationship with the University of Sheffield.
    · I understand that I can withdraw from the research at any time.
    · I assign the copyright I hold in materials generated as part of this project to The University of Sheffield.
    · * I am happy to be contacted after the survey to take part in an interview.

    The project was undertaken by two staff:

    Co-investigator: Dr. Itzel San Roman Pineda, ORCiD ID: 0000-0002-3785-8057, i.sanromanpineda@sheffield.ac.uk, Postdoctoral Research Assistant

    Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard, ORCiD ID: 0000-0003-2460-8638, m.s.hanchard@sheffield.ac.uk, Research Associate, iHuman Institute, Social Research Institutes, Faculty of Social Science

  17. Family Food Open Data

    • gov.uk
    Updated Feb 18, 2016
    + more versions
    Cite
    Department for Environment, Food & Rural Affairs (2016). Family Food Open Data [Dataset]. https://www.gov.uk/government/statistics/family-food-open-data
    Explore at:
    Dataset updated
    Feb 18, 2016
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Environment, Food & Rural Affairs
    Description

    The National Food Survey (NFS) was originally set up in 1940 to monitor the adequacy of the diet of urban working class households. It evolved into a continuous sampling enquiry into the domestic food consumption and expenditure of all private households, regardless of class. This open data release covers the years from 1974 to 2000, when the National Food Survey and Family Expenditure Surveys were merged into the Expenditure and Food Survey, and then became known as the Family Food Module of the Living Costs and Food Survey.

    The data that Defra is releasing now as Open Data are the only remaining data in electronic form. They were stored in Microsoft Access database format as five-year databases except for the last year, 2000. For each year there was a standard set of data tables:

    • Diary data (the summarised records of each purchase of food for consumption in the home, taken from the National Food Survey log-books)
    • Household data (the characteristics of the household such as location, occupation of Head of Household and Housewife (if present) etc., taken from the interviewer’s questionnaire)
    • Mealsout data (record of all meals taken outside the home, taken from the log-books)
    • Visitor data (record of all visitors to the home, taken from the questionnaire)
    • Person data (record of each member of the household such as age, gender, occupation, taken from the questionnaire)

    Some changes have been made to make these suitable for release as Open Data. These are detailed in the document “Introduction to the National Food Survey” within the data release. In particular, the Person data has been withheld from open release for disclosure control purposes. All other data is available as separate tables in tab-separated-value text format for individual years.

    In addition, there are

    • Nutrient Conversion Factor tables for each year (details in the other documentation)
    • lookup tables to translate the short field codes in the original data tables into longer, more meaningful terms, taken from the database system.
    • some additional tables and documentation to try to clarify meanings and changes in the usage of data fields, and some of the changes made to the data for disclosure control purposes. More details are in the “Introduction” file.
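
    As a minimal sketch of how the tab-separated tables and lookup tables might be combined, the example below reads two small in-memory TSV snippets with Python's csv module and attaches a readable label to each coded value. The column names (hhno, foodcode, quantity, description) and the sample rows are invented for illustration, and applying the lookup to coded values rather than to column names is itself an assumption; the actual layout is documented in the release's "Introduction" file.

```python
import csv
import io

# Invented stand-ins for a diary table and a lookup table from the release.
diary_tsv = "hhno\tfoodcode\tquantity\n1\t401\t2.5\n1\t402\t1.0\n"
lookup_tsv = "foodcode\tdescription\n401\tMilk\n402\tBread\n"


def read_tsv(text):
    """Parse tab-separated text into a list of dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Map each short code to its longer, more meaningful term.
code_to_label = {row["foodcode"]: row["description"]
                 for row in read_tsv(lookup_tsv)}

# Attach the label to every diary record; unknown codes are marked as such.
labelled = [
    {**row, "description": code_to_label.get(row["foodcode"], "unknown")}
    for row in read_tsv(diary_tsv)
]
```

    For the real files, io.StringIO would be replaced by an opened file handle; csv.DictReader returns every value as a string, so numeric columns need explicit conversion.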

    Trying to find a balance between providing a rich and useful source of food purchasing data, and protecting the privacy of respondents throughout the years, has been one of the biggest challenges involved in releasing this data. We have consulted extensively with privacy experts, data protection specialists in Defra and a group of trusted external data testers in the run up to releasing this data. We have published a privacy impact assessment (see link above) which takes you through our process creating a data set which minimises privacy risks while hopefully still being useful to the public.

    The data is being released under the Open Government Licence v3.0 (OGL). For the avoidance of doubt, attempts to re-identify individuals from the openly licensed datasets are not an acceptable use of the data. Any instances of this brought to Defra’s attention will be directed to the Information Commissioner’s Office for investigation.

    Defra takes the privacy of respondents to Family Food surveys seriously. If you identify a privacy-related risk please let us know via familyfood@defra.gsi.gov.uk. Defra will remove the data from data.gov.uk and other online locations if a serious privacy breach is identified, and work to resolve it.

    The open data release can be found at https://data.gov.uk/dataset/family_food_open_data.

    Another version of this data, without the disclosure control changes, is available from the United Kingdom Data Service under an End User Licence. For details go to the UK Data Service (https://www.ukdataservice.ac.uk/) and search for National Food Survey.

    Some annual reports and datasets from the National Food Survey are available online at http://webarchive.nationalarchives.gov.uk/20130103014432/http://www.defra.gov.uk/statistics/foodfarm/food/familyfood/nationalfoodsurvey/

    You may find the National Food Survey/Family Food timeline helpful in understanding the evolution of the food surveys.

    Defra statistics: family food

    Email familyfood@defra.gov.uk

    
  18. Windows Instance Segmentation Dataset

    • universe.roboflow.com
    zip
    Updated May 3, 2023
    Cite
    Roboflow Universe Projects (2023). Windows Instance Segmentation Dataset [Dataset]. https://universe.roboflow.com/roboflow-universe-projects/windows-instance-segmentation/model/2
    Explore at:
    Available download formats: zip
    Dataset updated
    May 3, 2023
    Dataset provided by
    Roboflow (https://roboflow.com/)
    Authors
    Roboflow Universe Projects
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Windows Polygons
    Description

    Here are a few use cases for this project:

    1. Smart Building Design and Analysis: Architects and engineers could use the Windows Instance Segmentation model to automatically analyze building facades in images and identify the distribution, sizes, and styles of windows. This information can be used to improve building designs for daylighting, ventilation, and aesthetic purposes.

    2. Real Estate Appraisal and Listing: Real estate professionals can use the model to analyze property photos, automatically identifying and categorizing windows to create more detailed and accurate property listings. Potential buyers and renters can then use this information for better search results and understanding of architectural features.

    3. Energy Efficiency Analysis: Energy consultants and researchers can utilize the Windows Instance Segmentation model to analyze the prevalence of different window styles and their impact on building energy efficiency. This can help in developing more sustainable building designs and energy retrofit strategies.

    4. Urban Planning and Cityscape Analysis: Urban planners and city officials can make use of this model to assess the distribution of windows in urban environments, understanding how they contribute to the overall aesthetic and livability of neighborhoods. This information can guide zoning regulations and future development projects to create more visually appealing and functional cities.

    5. Augmented Reality (AR) Applications: Developers of AR applications, particularly those focused on architecture and interior design, can integrate the Windows Instance Segmentation model to recognize windows in real-world environments. This can enable users to visualize new window styles, treatments, or decorations, helping them make better-informed design decisions.

  19. COVID-19 Open Research Dataset (CORD-19)

    • data.niaid.nih.gov
    • marketplace.sshopencloud.eu
    • +1more
    Updated Jul 22, 2024
    + more versions
    Cite
    Sebastian Kohlmeier; Kyle Lo; Lucy Lu Wang; JJ Yang (2024). COVID-19 Open Research Dataset (CORD-19) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3715505
    Explore at:
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Sebastian Kohlmeier; Kyle Lo; Lucy Lu Wang; JJ Yang
    Description

    A full description of this dataset along with updated information can be found here.

    In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles, including full text content, about COVID-19 and the coronavirus family of viruses for use by the global research community.

    This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.

    By downloading this dataset you are agreeing to the Dataset license. Specific licensing information for individual articles in the dataset is available in the metadata file.

    Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.

    Dataset content:

    Commercial use subset

    Non-commercial use subset

    PMC custom license subset

    bioRxiv/medRxiv subset (pre-prints that are not peer reviewed)

    Metadata file

    Readme

    Each paper is represented as a single JSON object (see schema file for details).

    Description:

    The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS, MERS, etc.) from the following sources:

    PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)

    Additional COVID-19 research articles from a corpus maintained by the WHO

    bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)

    We also provide a comprehensive metadata file of coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).

    We recommend using metadata from the comprehensive file when available, instead of parsed metadata in the dataset. Please note the dataset may contain multiple entries for individual PMC IDs in cases when supplementary materials are available.
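Because several entries may share one PMC ID, it can help to group records by ID before counting papers. The following is a minimal sketch, not part of the dataset's tooling: the field name `pmcid`, the `kind` field, and the sample records are assumptions made for illustration, so check the metadata file for the actual column names.

```python
from collections import defaultdict

def group_by_pmcid(records, key="pmcid"):
    """Group records by PMC ID, skipping records that have no ID."""
    groups = defaultdict(list)
    for record in records:
        pmcid = record.get(key)
        if pmcid:
            groups[pmcid].append(record)
    return dict(groups)

def duplicated_ids(groups):
    """Return the sorted PMC IDs that occur in more than one record."""
    return sorted(pmcid for pmcid, items in groups.items() if len(items) > 1)

# Hypothetical records; the IDs and fields are made up for illustration.
records = [
    {"pmcid": "PMC0000001", "kind": "article"},
    {"pmcid": "PMC0000001", "kind": "supplement"},
    {"pmcid": "PMC0000002", "kind": "article"},
    {"pmcid": "", "kind": "article"},
]

groups = group_by_pmcid(records)
print(duplicated_ids(groups))
```

Grouping rather than dropping repeats keeps the supplementary entries available while still letting each PMC ID be counted once.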

    This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar. A coalition including the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service.

    Citation:

    When including CORD-19 data in a publication or redistribution, please cite the dataset as follows:

    In bibliography:

    COVID-19 Open Research Dataset (CORD-19). 2020. Version 2020-MM-DD. Retrieved from https://pages.semanticscholar.org/coronavirus-research. Accessed YYYY-MM-DD. 10.5281/zenodo.3715505

    In text:

    (CORD-19, 2020)

    The Allen Institute for AI and particularly the Semantic Scholar team will continue to provide updates to this dataset as the situation evolves and new research is released.

  20. Royal Institute for Cultural Heritage Radiocarbon and stable isotope...

    • pandora.earth
    Updated Jul 12, 2011
    (2011). Royal Institute for Cultural Heritage Radiocarbon and stable isotope measurements - Dataset - Pandora [Dataset]. https://pandora.earth/gl_ES/dataset/royal-institute-for-cultural-heritage-radiocarbon-and-stable-isotope-measurements
    Explore at:
    Dataset updated
    Jul 12, 2011
    Description

    The Radiocarbon dating laboratory of IRPA/KIK was founded in the 1960s. Initially dates were reported at more or less regular intervals in the journal Radiocarbon (Schreurs 1968). Since the advent of radiocarbon dating in the 1950s it had been common practice amongst radiocarbon laboratories to publish their dates in so-called ‘date-lists’ that were arranged per laboratory. This was first done in the Radiocarbon Supplement of the American Journal of Science and later in the specialised journal Radiocarbon. In the course of time the latter, with the added subtitle An International Journal of Cosmogenic Isotope Research, became a regular scientific journal, shifting focus from date-lists to articles. Furthermore, the world-wide exponential increase of radiocarbon dates made it almost impossible to publish them all in the same journal, even more so because of the broad range of applications that use radiocarbon analysis, ranging from archaeology and art history to geology and oceanography and recently also biomedical studies.

    The IRPA/KIK database

    From 1995 onwards IRPA/KIK’s Radiocarbon laboratory started to publish its dates in small publications, continuing the numbering of the preceding lists in Radiocarbon. The first booklet in this series was “Royal Institute for Cultural Heritage Radiocarbon dates XV” (Van Strydonck et al. 1995), followed by three more volumes (XVI, XVII, XVIII). The next list (XIX, 2005) was no longer printed but instead handed out as a PDF file on CD-ROM. The ever increasing number of dates and the difficulties in handling all the data, however, made us look for a more permanent and easier solution. In order to improve data management and consultation, it was thus decided to gather all our dates in a web-based database. List XIX was in fact already a Microsoft Access database that was converted into a reader-friendly style and could also be printed as a PDF file.

    However, a Microsoft Access database is not the most practical way to make information publicly available. Hence the structure of the database was recreated in MySQL and the existing content was transferred into the corresponding fields. To display the records, a web-based front-end was programmed in PHP/Apache. It features a full-text search function that allows for partial word-matching. In addition the records can be consulted in PDF format. Old records from the printed date-lists as well as new records are now added using the same Microsoft Access back-end, which is now connected directly to the MySQL database.

    The main problem with introducing the old data was that not all the current criteria were available in the past (e.g. stable isotope measurements). Furthermore, since all the sample information is given by the submitter, its quality largely depends on that person's willingness to contribute as well as on the accuracy and correctness of the information provided. Sometimes problems arise from the fact that a certain investigation (like an excavation) is carried out over a relatively long period (sometimes even more than ten years) and is directed by different people or even institutions. This can lead to differences in the labeling procedure of the samples, but also in the interpretation of structures and artifacts and in the orthography of the site’s name. Finally the submitter might change address, while the names of institutions or even regions and countries might change as well (e.g. Zaire - Congo).

Umer Haddii (2025). Microsoft Stock Data 2025 [Dataset]. https://www.kaggle.com/datasets/umerhaddii/microsoft-stock-data-2025

Microsoft Stock Data 2025

All time Microsoft Stock Data 1986 - 2025

Explore at:
Available download formats: zip (246404 bytes)
Dataset updated
Feb 4, 2025
Authors
Umer Haddii
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

Microsoft is an American company that develops and distributes software and services such as: a search engine (Bing), cloud solutions and the computer operating system Windows.

Market cap

Market capitalization of Microsoft (MSFT)

Market cap: $3.085 Trillion USD

As of February 2025 Microsoft has a market cap of $3.085 Trillion USD. This makes Microsoft the world's 2nd most valuable company by market cap according to our data. The market capitalization, commonly called market cap, is the total market value of a publicly traded company's outstanding shares and is commonly used to measure how much a company is worth.
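The definition above amounts to multiplying the share price by the number of outstanding shares. A minimal sketch of that arithmetic follows; the price and share count are hypothetical round numbers chosen for illustration, not Microsoft's actual figures.

```python
def market_cap(share_price, shares_outstanding):
    """Market capitalization: share price times outstanding shares."""
    return share_price * shares_outstanding

def format_trillions(value_usd):
    """Format a USD amount in trillions, matching the style used on this page."""
    return f"${value_usd / 1e12:.3f} Trillion USD"

# Hypothetical inputs: a $400.00 share price and 7.5 billion shares.
example_cap = market_cap(400.0, 7_500_000_000)
print(format_trillions(example_cap))
```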

Revenue

Revenue for Microsoft (MSFT)
Revenue in 2024 (TTM): $254.19 Billion USD

According to Microsoft's latest financial reports, the company's current revenue (TTM) is $254.19 Billion USD. In 2023 the company had revenue of $227.58 Billion USD, an increase over its 2022 revenue of $204.09 Billion USD. Revenue is the total amount of income that a company generates from the sale of goods or services. Unlike earnings, no expenses are subtracted.

Earnings

Earnings for Microsoft (MSFT)
Earnings in 2024 (TTM): $110.77 Billion USD

According to Microsoft's latest financial reports, the company's current earnings (TTM) are $110.77 Billion USD. In 2023 the company had earnings of $101.21 Billion USD, an increase over its 2022 earnings of $82.58 Billion USD. The earnings displayed on this page are earnings before interest and taxes, or simply EBIT.

End of Day market cap according to different sources

On Feb 2nd, 2025 the market cap of Microsoft was reported to be:

  • $3.085 Trillion USD by Nasdaq

  • $3.085 Trillion USD by CompaniesMarketCap

  • $3.085 Trillion USD by Yahoo Finance

Content

Geography: USA

Time period: March 1986 - February 2025

Unit of analysis: Microsoft Stock Data 2025

Variables

Variable: Description
date: date
open: The price at market open.
high: The highest price for that day.
low: The lowest price for that day.
close: The price at market close, adjusted for splits.
adj_close: The closing price after adjustments for all applicable splits and dividend distributions. Data is adjusted using appropriate split and dividend multipliers, adhering to Center for Research in Security Prices (CRSP) standards.
volume: The number of shares traded on that day.
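The variables listed above can be read with Python's standard `csv` module. The sketch below assumes the file is a CSV whose header uses exactly these column names; the sample rows are made-up values for illustration, and the date format and integer volumes are also assumptions, so adjust the parsing to the actual file.

```python
import csv
import io

# Hypothetical sample rows; the values are made up for illustration only.
SAMPLE_CSV = """date,open,high,low,close,adj_close,volume
2025-01-02,100.0,104.0,99.0,102.0,101.5,1000
2025-01-03,102.0,106.0,101.0,105.0,104.545,1200
"""

PRICE_FIELDS = ("open", "high", "low", "close", "adj_close")

def load_rows(text):
    """Parse CSV text into dicts, converting prices to float and volume to int."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"date": record["date"], "volume": int(record["volume"])}
        for name in PRICE_FIELDS:
            row[name] = float(record[name])
        rows.append(row)
    return rows

def daily_range(row):
    """Difference between the day's highest and lowest price."""
    return row["high"] - row["low"]

def simple_returns(rows):
    """Day-over-day fractional change in adj_close."""
    return [cur["adj_close"] / prev["adj_close"] - 1.0
            for prev, cur in zip(rows, rows[1:])]

rows = load_rows(SAMPLE_CSV)
print(daily_range(rows[0]), simple_returns(rows))
```

Returns are computed from adj_close rather than close because, per the table, only adj_close accounts for dividend distributions as well as splits.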

Acknowledgements

This dataset belongs to me. I’m sharing it here for free. You may do with it as you wish.

