53 datasets found
  1. Big Data Analytics Market in Energy Sector Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Dec 5, 2024
    Cite
    Market Research Forecast (2024). Big Data Analytics Market in Energy Sector Report [Dataset]. https://www.marketresearchforecast.com/reports/big-data-analytics-market-in-energy-sector-5888
    Available download formats
    ppt, doc, pdf
    Dataset updated
    Dec 5, 2024
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Big Data Analytics Market in Energy Sector was valued at USD 9.56 billion in 2023 and is projected to reach USD 13.81 billion by 2032, exhibiting a CAGR of 5.4% during the forecast period. Big data analytics in the energy sector is the application of sophisticated methods and tools to analyze the vast volumes of data produced by the many entities within the energy industry. It covers descriptive, predictive, and prescriptive analytics, providing valuable insight into operations, costs, and strategy. Real-time analytics delivers immediate results, predictive analytics estimates the probability of future events, and prescriptive analytics recommends actions. Key characteristics of these solutions include the ability to handle large datasets, compatibility with IoT devices for streaming data, and machine learning features for pattern detection. Applications range from grid control and load management to forecasting customer demand, predicting equipment reliability, and improving equipment efficiency. Big data analytics thus gives energy companies worldwide a significant advantage, helping them improve performance, minimize downtime, and develop effective strategies to meet regulatory requirements. Key drivers for this market are: Growing Focus on Safety and Organization to Fuel Market Growth. Potential restraints include: Higher Cost of Geotechnical Services to Hinder Market Growth. Notable trends are: Growth of IT Infrastructure to Bolster the Demand for Modern Cable Tray Management Solutions.
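
    The growth figures above can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is illustrative only: it assumes annual compounding and takes the two quoted market sizes at face value; the helper names `cagr` and `project` are hypothetical.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value reached by compounding start_value at `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    # Market sizes quoted in the report description, in USD billions.
    size_2023, size_2032 = 9.56, 13.81
    # The implied rate depends on how many compounding years are assumed.
    for years in (9, 8, 7):
        rate = cagr(size_2023, size_2032, years)
        print(f"{years} years of compounding -> implied CAGR {rate:.2%}")
    # Conversely, compound the 2023 value at the quoted 5.4% for nine years.
    print(f"9.56 at 5.4% for 9 years -> {project(size_2023, 0.054, 9):.2f}")
```

    With the quoted endpoints, nine years of compounding (2023 to 2032) implies roughly 4.2% a year, while 5.4% corresponds to about seven years; compounding USD 9.56 billion at 5.4% for nine years would give roughly USD 15.3 billion. The report may measure its CAGR over a different base period (the listing gives 2025 - 2033 as the time period covered), so these checks are indicative only.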

  2. Big data and business analytics revenue worldwide 2015-2022

    • statista.com
    Updated Nov 22, 2023
    Cite
    Statista (2023). Big data and business analytics revenue worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/551501/worldwide-big-data-business-analytics-revenue/
    Dataset updated
    Nov 22, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The global big data and business analytics (BDA) market was valued at 168.8 billion U.S. dollars in 2018 and is forecast to grow to 215.7 billion U.S. dollars by 2021. In 2021, more than half of BDA spending will go towards services: IT services are projected to account for around 85 billion U.S. dollars, and business services for the remainder.

    Big data

    High volume, high velocity and high variety: one or more of these characteristics is used to define big data, the kind of data sets that are too large or too complex for traditional data processing applications. Fast-growing mobile data traffic, cloud computing traffic, and the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets. For example, connected IoT devices are projected to generate 79.4 zettabytes (ZB) of data in 2025.

    Business analytics

    Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate business insights. The business intelligence and analytics software application market is forecast to reach around 16.5 billion U.S. dollars in 2022. Growth in this market is driven by a focus on digital transformation, demand for data visualization dashboards, and increased adoption of cloud services.

  3. Summary of selected characteristics of large reservoirs

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 5, 2024
    Cite
    U.S. Geological Survey (2024). Summary of selected characteristics of large reservoirs [Dataset]. https://catalog.data.gov/dataset/summary-of-selected-characteristics-of-large-reservoirs
    Dataset updated
    Oct 5, 2024
    Dataset provided by
    U.S. Geological Survey
    Description

    This is a point coverage of dams in the United States and Puerto Rico, originally derived from the National Inventory of Dams database (U.S. Army Corps of Engineers, 1982). The coverage includes the locations and selected characteristics of approximately 2,700 reservoirs and controlled natural lakes that have normal capacities of at least 5,000 acre-feet or maximum capacities of at least 25,000 acre-feet and that were completed as of January 1, 1988.
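
    The inclusion rule for this coverage combines a size test with a completion date. A minimal Python sketch of that rule, assuming hypothetical field names (`normal_capacity_af`, `max_capacity_af`, `completed`) and reading "completed as of January 1, 1988" as completed on or before that date:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

NORMAL_MIN_AF = 5_000    # normal capacity threshold, acre-feet
MAXIMUM_MIN_AF = 25_000  # maximum capacity threshold, acre-feet
CUTOFF = date(1988, 1, 1)


@dataclass
class Reservoir:
    name: str                   # hypothetical field names, for illustration
    normal_capacity_af: float
    max_capacity_af: float
    completed: Optional[date]   # None if the completion date is unknown


def meets_inclusion_criteria(r: Reservoir, cutoff: date = CUTOFF) -> bool:
    """Either capacity threshold is enough; completion must be on or before cutoff."""
    large_enough = (r.normal_capacity_af >= NORMAL_MIN_AF
                    or r.max_capacity_af >= MAXIMUM_MIN_AF)
    completed_in_time = r.completed is not None and r.completed <= cutoff
    return large_enough and completed_in_time


if __name__ == "__main__":
    candidates = [
        Reservoir("A", 6_000, 10_000, date(1970, 5, 1)),  # passes on normal capacity
        Reservoir("B", 1_000, 30_000, date(1985, 1, 1)),  # passes on maximum capacity
        Reservoir("C", 4_000, 20_000, date(1960, 1, 1)),  # too small on both counts
        Reservoir("D", 9_000, 40_000, date(1990, 6, 1)),  # completed after the cutoff
    ]
    print([r.name for r in candidates if meets_inclusion_criteria(r)])
```

    With these invented sample records, the script prints `['A', 'B']`.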

  4. Forecast revenue big data market worldwide 2011-2027

    • statista.com
    Updated Feb 13, 2024
    Cite
    Statista (2024). Forecast revenue big data market worldwide 2011-2027 [Dataset]. https://www.statista.com/statistics/254266/global-big-data-market-forecast/
    Dataset updated
    Feb 13, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The global big data market is forecast to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027.
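
    The segment share and the doubling statement translate into simple arithmetic. A short sketch, taking the quoted forecast (103 billion U.S. dollars in 2027, 45 percent software share) at face value; the helper `segment_size` is hypothetical:

```python
# Figures quoted in the description, in billions of U.S. dollars.
MARKET_2027 = 103.0
SOFTWARE_SHARE_2027 = 0.45


def segment_size(total: float, share: float) -> float:
    """Size of a segment that holds `share` (between 0 and 1) of a `total` market."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return total * share


software_2027 = segment_size(MARKET_2027, SOFTWARE_SHARE_2027)
# "More than double its expected size in 2018" bounds the 2018 figure from above.
max_size_2018 = MARKET_2027 / 2

print(f"Software segment in 2027: about {software_2027:.1f} billion USD")
print(f"Implied 2018 market size: below {max_size_2018:.1f} billion USD")
```

    This puts the software segment at roughly 46 billion U.S. dollars in 2027 and the 2018 market below 51.5 billion U.S. dollars.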

    What is big data?

    Big data is a term that refers to the kind of data sets that are too large or too complex for traditional data processing applications. It is defined as having one or more of the following characteristics: high volume, high velocity or high variety. Fast-growing mobile data traffic, cloud computing traffic, and the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets.

    Big data analytics

    Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.

  5. Local and big brands characteristics according to European consumers 2018

    • statista.com
    Updated Jan 14, 2025
    Cite
    Local and big brands characteristics according to European consumers 2018 [Dataset]. https://www.statista.com/statistics/1080774/local-and-big-brands-characteristics-according-to-european-consumers-2018/
    Dataset updated
    Jan 14, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2018
    Area covered
    EU
    Description

    In Europe, local brands were perceived as more sustainable than big brands, a survey revealed: 45 percent of European consumers believed that local brands respected the environment and the resources of their territory, while only 17 percent of respondents thought the same about big brands. Additionally, 67 percent of consumers associated local brands with values such as fairness, transparency, honesty, and integrity.

  6. Heat Wave Characteristics in 50 Large U.S. Cities, 1961–2023

    • catalog.data.gov
    • s.cnmilf.com
    Updated Feb 25, 2025
    Cite
    U.S. Environmental Protection Agency, Office of Air and Radiation (Publisher) (2025). Heat Wave Characteristics in 50 Large U.S. Cities, 1961–2023 [Dataset]. https://catalog.data.gov/dataset/heat-wave-characteristics-in-50-large-u-s-cities-196120236
    Dataset updated
    Feb 25, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    United States
    Description

    These maps show changes in the number of heat waves per year (frequency); the average length of heat waves in days (duration); the number of days between the first and last heat wave of the year (season length); and how hot the heat waves were, compared with the local temperature threshold for defining a heat wave (intensity). These data were analyzed from 1961 to 2023 for 50 large metropolitan areas. The size of each circle indicates the rate of change per decade. Solid-color circles represent cities where the trend was statistically significant. For more information: www.epa.gov/climate-indicators
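
    The four metrics can be computed from one year of daily temperatures once a heat wave is defined. The sketch below assumes a simple definition, a run of at least `min_days` consecutive days above a fixed local `threshold`; this is an illustrative assumption, not necessarily the EPA's exact criterion, and season length is counted inclusively from the first day of the first heat wave to the last day of the last one.

```python
from typing import Dict, List, Tuple


def find_heat_waves(temps: List[float], threshold: float,
                    min_days: int = 2) -> List[Tuple[int, int]]:
    """Return (start_index, length) for runs of >= min_days days above threshold."""
    waves: List[Tuple[int, int]] = []
    start = None
    # A trailing -inf sentinel closes a run that lasts until the final day.
    for i, t in enumerate(temps + [float("-inf")]):
        if t > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_days:
                waves.append((start, i - start))
            start = None
    return waves


def heat_wave_metrics(temps: List[float], threshold: float,
                      min_days: int = 2) -> Dict[str, float]:
    """Frequency, mean duration, season length and mean intensity for one year."""
    waves = find_heat_waves(temps, threshold, min_days)
    if not waves:
        return {"frequency": 0, "duration": 0.0, "season_length": 0, "intensity": 0.0}
    first_day = waves[0][0]
    last_day = waves[-1][0] + waves[-1][1] - 1
    # Intensity: mean excess over the threshold within each wave, averaged over waves.
    excesses = [sum(t - threshold for t in temps[s:s + n]) / n for s, n in waves]
    return {
        "frequency": len(waves),
        "duration": sum(n for _, n in waves) / len(waves),
        "season_length": last_day - first_day + 1,
        "intensity": sum(excesses) / len(excesses),
    }


if __name__ == "__main__":
    daily_temps = [80, 90, 91, 80, 92, 93, 94, 80]  # invented example data
    print(heat_wave_metrics(daily_temps, threshold=85))
```

    For this invented series the function finds two heat waves (two and three days long) and returns a frequency of 2, a mean duration of 2.5 days, a season length of 6 days and a mean intensity of 6.75 degrees above the threshold. A per-decade rate of change could then be estimated, for instance, by fitting a linear trend to these yearly values.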

  7. Identifying Refactoring Opportunities for Large Packages by Analyzing...

    • catalogue.data.govt.nz
    Updated May 2, 2023
    Cite
    (2023). Identifying Refactoring Opportunities for Large Packages by Analyzing Maintainability Characteristics in Java OSS - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-14460054
    Dataset updated
    May 2, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is provided as supplementary material for the paper "Identifying Refactoring Opportunities for Large Packages by Analyzing Maintainability Characteristics in Java OSS". A README file is included that describes the dataset and the Python script used to perform the empirical analysis.

  8. Financial Characteristics of Large British Companies, 1880-1926

    • datacatalogue.cessda.eu
    • beta.ukdataservice.ac.uk
    Updated Nov 28, 2024
    Cite
    Delargy, R., London School of Economics and Political Science; Kennedy, W., London School of Economics and Political Science (2024). Financial Characteristics of Large British Companies, 1880-1926 [Dataset]. http://doi.org/10.5255/UKDA-SN-4244-2
    Dataset updated
    Nov 28, 2024
    Dataset provided by
    Department of Economic History
    Authors
    Delargy, R., London School of Economics and Political Science; Kennedy, W., London School of Economics and Political Science
    Time period covered
    Jan 1, 1993 - Jan 1, 2020
    Area covered
    United Kingdom
    Variables measured
    Institutions/organisations, National, Companies
    Measurement technique
    Transcription
    Description

    Abstract copyright UK Data Service and data collection copyright owner.


    The main aims of the project are three-fold:

    (1) To identify and describe the development of the electricity industry in Britain prior to the formation of the national grid using company market-based financial data;

    (2) To construct a consistent data set of the key market-based financial characteristics of the principal companies; and

    (3) To use this data set to examine the development of the British electricity industry compared with similar developments in the United States and Germany.

    Latest edition information

    For the second edition (October 2021), data and documentation relating to Swan United Electric Light Company Limited (1882-1894), Edison and Swan United Electric Light Company Limited (1882-1914), Anglo-American Brush Electric Light and Power Corporation Limited (1882-1889) and Brush Electrical Engineering Limited (1889-1914) were added to the study.


    Main Topics:

    The key financial characteristics are:

    (1) The quarterly (January, April, July and October) market closing price of each traded security for each identified company;

    (2) The number of each security outstanding, both traded and non-traded, at the end of each quarter;

    (3) Multiplied together, characteristics (1) and (2) give the quarterly market capitalization of each security of quoted companies. Summing over all quoted securities issued by a company gives its total market capitalization for each quarter. The value of non-quoted securities can be estimated from the dividend and interest payments they make.

    (4) The paid-up amount for each security, including premiums and discounts (if any);

    (5) The nominal value of each security;

    (6) The payments (if any) made in each quarter to holders of the securities.
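
    Characteristic (3) is a straightforward aggregation. A minimal sketch, assuming a hypothetical record layout with one row per security per quarter and prices already expressed in decimal pounds (historical quotations in pounds, shillings and pence would need converting first); the estimation of non-quoted securities from payments is not modeled:

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class SecurityQuarter(NamedTuple):
    company: str
    security: str
    quarter: str                  # hypothetical label, e.g. "1900-01"
    close_price: Optional[float]  # characteristic (1); None if not quoted
    outstanding: int              # characteristic (2), traded or non-traded


def market_capitalization(
    rows: Iterable[SecurityQuarter],
) -> Tuple[Dict[tuple, float], Dict[tuple, float]]:
    """Price x outstanding per quoted security, summed per company and quarter."""
    by_security: Dict[tuple, float] = {}
    by_company: Dict[tuple, float] = defaultdict(float)
    for r in rows:
        if r.close_price is None:
            # No market price: the source notes such securities can be valued from
            # their dividend and interest payments; that step is not modeled here.
            continue
        cap = r.close_price * r.outstanding
        by_security[(r.company, r.security, r.quarter)] = cap
        by_company[(r.company, r.quarter)] += cap
    return by_security, dict(by_company)


if __name__ == "__main__":
    rows = [  # invented example company and securities
        SecurityQuarter("Example Electric Co.", "ordinary", "1900-01", 2.5, 1000),
        SecurityQuarter("Example Electric Co.", "preference", "1900-01", 1.0, 500),
        SecurityQuarter("Example Electric Co.", "debenture", "1900-01", None, 200),
    ]
    per_security, per_company = market_capitalization(rows)
    print(per_company)
```

    For the invented company above, the quarterly total is 3,000 (2,500 from ordinary shares plus 500 from preference shares), and the non-quoted debenture is skipped.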

  9. De-identified article and author characteristics for a large data set of Web...

    • zenodo.org
    txt
    Updated Jan 27, 2023
    Cite
    Jens Peter Andersen (2023). De-identified article and author characteristics for a large data set of Web of Science [Dataset]. http://doi.org/10.5281/zenodo.7573523
    Available download formats
    txt
    Dataset updated
    Jan 27, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jens Peter Andersen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains article and author characteristics for all records in the Web of Science, 2000-2020. Standard article identifiers have been removed and replaced with a document ID (`doc_id`), as linking to the original ID is not permitted.
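
    The description does not say how the `doc_id` values were generated. One way such a replacement might be done, sketched here under the assumption of hypothetical identifier fields (`wos_id`, `doi`), is to shuffle the records, number them sequentially, and drop the original identifiers without keeping a lookup table:

```python
import random
from typing import Dict, Iterable, List, Optional


def pseudonymize(records: Iterable[Dict[str, str]], id_fields: Iterable[str],
                 seed: Optional[int] = None) -> List[Dict[str, object]]:
    """Replace identifier fields with a sequential doc_id; no mapping is kept."""
    drop = set(id_fields)
    rows = list(records)
    # Shuffle first so that doc_id order does not mirror the original record order.
    random.Random(seed).shuffle(rows)
    result: List[Dict[str, object]] = []
    for doc_id, record in enumerate(rows, start=1):
        kept = {k: v for k, v in record.items() if k not in drop}
        result.append({"doc_id": doc_id, **kept})
    # The doc_id -> original identifier mapping is deliberately not returned.
    return result


if __name__ == "__main__":
    sample = [  # invented records with hypothetical field names
        {"wos_id": "X1", "doi": "10.0000/a", "year": "2001", "n_authors": "3"},
        {"wos_id": "X2", "doi": "10.0000/b", "year": "2005", "n_authors": "1"},
    ]
    for row in pseudonymize(sample, ["wos_id", "doi"], seed=0):
        print(row)
```

    Passing a `seed` makes the assignment reproducible; omitting it gives a fresh random order on each run.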

  10. Data_Sheet_1_One Social Media Company to Rule Them All: Associations Between...

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Davide Marengo; Cornelia Sindermann; Jon D. Elhai; Christian Montag (2023). Data_Sheet_1_One Social Media Company to Rule Them All: Associations Between Use of Facebook-Owned Social Media Platforms, Sociodemographic Characteristics, and the Big Five Personality Traits.xlsx [Dataset]. http://doi.org/10.3389/fpsyg.2020.00936.s001
    Available download formats
    xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Davide Marengo; Cornelia Sindermann; Jon D. Elhai; Christian Montag
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Currently, 2.7 billion people use at least one of the Facebook-owned social media platforms – Facebook, WhatsApp, and Instagram. Previous research investigating individual differences between users and non-users of these platforms has typically focused on one platform. However, individuals typically use a combination of Facebook-owned platforms. Therefore, we aim (1) to identify the relative prevalence of different patterns of social media use, and (2) to evaluate potential between-group differences in the distributions of age, gender, education, and Big Five personality traits. Data collection was performed using a cross-sectional design. Specifically, we administered a survey assessing participants’ demographic variables, current use of Facebook-owned platforms, and Big Five personality traits. In N = 3003 participants from the general population (60.67% females; mean age = 35.53 years, SD = 13.53), WhatsApp emerged as the most widely used application in the sample, and hence, has the strongest reach. A pattern consisting of a combined use of WhatsApp and Instagram appeared to be most prevalent among the youngest participants. Further, individuals using at least one social media platform were generally younger, more often female, and more extraverted than non-users. Small differences in Conscientiousness and Neuroticism also emerged across groups reporting different combinations of social media use. Interestingly, when examined as control variables, we found demographic characteristics partially accounted for differences in broad personality factors and facets across different patterns of social media use. Our findings are relevant to researchers carrying out their studies via social media platforms, as sample characteristics appear to be different depending on the platform used.

  11. Big Mart Sales Prediction

    • kaggle.com
    Updated Feb 8, 2025
    Cite
    Gaurav Dutta (2025). Big Mart Sales Prediction [Dataset]. https://www.kaggle.com/datasets/gauravduttakiit/big-mart-sales-prediction/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gaurav Dutta
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    BigMart Sales Prediction Challenge

    BigMart, a leading retail chain, aims to enhance its sales strategy by analyzing historical sales data. The goal is to develop a predictive model that estimates the sales of various products across different outlets, helping BigMart understand the key factors influencing sales performance.

    Problem Statement

    BigMart has gathered sales data from 2013 for 1,559 products sold across 10 stores in different cities. Along with sales figures, various product and store attributes have been recorded. The objective is to build a machine learning model that can accurately forecast the sales of products at specific outlets.

    By leveraging this predictive model, BigMart can gain insights into product and store characteristics that drive sales growth, enabling better business decisions.

    Challenges

    The dataset may contain missing values due to unreported data from certain stores, requiring appropriate data preprocessing techniques.

    Dataset Overview

    Train Dataset (8,523 records)

    Includes both input features and the target variable (Item_Outlet_Sales).

    • Product Features

      • Item_Identifier: Unique product ID
      • Item_Weight: Weight of the product
      • Item_Fat_Content: Fat level (low-fat or regular)
      • Item_Visibility: Percentage of display area allocated to the product
      • Item_Type: Category of the product
      • Item_MRP: Maximum Retail Price
    • Store Features

      • Outlet_Identifier: Unique store ID
      • Outlet_Establishment_Year: Year the store was established
      • Outlet_Size: Store size (small, medium, large)
      • Outlet_Location_Type: City tier classification
      • Outlet_Type: Type of outlet (grocery store, supermarket, etc.)
    • Target Variable

      • Item_Outlet_Sales: Sales of the product at a particular store (to be predicted)

    Test Dataset (5,681 records)

    Contains the same features as the train dataset except for Item_Outlet_Sales, which needs to be predicted.

    Submission Format

    Your model should generate a CSV file with the following columns:
    • Item_Identifier: Unique product ID
    • Outlet_Identifier: Unique store ID
    • Item_Outlet_Sales: Predicted sales value
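
    The submission format above can be produced with Python's standard `csv` module. The sketch below fits a deliberately naive baseline, the mean `Item_Outlet_Sales` per `Outlet_Identifier` with the global mean as a fallback, and writes the three required columns. It is not a competitive model: it ignores the product and store features and the missing-value handling mentioned under Challenges, and the in-memory data is invented.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, TextIO, Tuple

SUBMISSION_FIELDS = ["Item_Identifier", "Outlet_Identifier", "Item_Outlet_Sales"]


def fit_outlet_means(train_rows: Iterable[Dict[str, str]]) -> Tuple[Dict[str, float], float]:
    """Baseline: mean Item_Outlet_Sales per outlet, plus a global fallback mean."""
    per_outlet: Dict[str, List[float]] = defaultdict(list)
    all_sales: List[float] = []
    for row in train_rows:
        sales = float(row["Item_Outlet_Sales"])
        per_outlet[row["Outlet_Identifier"]].append(sales)
        all_sales.append(sales)
    return {outlet: mean(v) for outlet, v in per_outlet.items()}, mean(all_sales)


def write_submission(test_rows: Iterable[Dict[str, str]], outlet_means: Dict[str, float],
                     global_mean: float, out: TextIO) -> None:
    """Write one prediction per test row; unseen outlets get the global mean."""
    writer = csv.DictWriter(out, fieldnames=SUBMISSION_FIELDS)
    writer.writeheader()
    for row in test_rows:
        prediction = outlet_means.get(row["Outlet_Identifier"], global_mean)
        writer.writerow({
            "Item_Identifier": row["Item_Identifier"],
            "Outlet_Identifier": row["Outlet_Identifier"],
            "Item_Outlet_Sales": f"{prediction:.2f}",
        })


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the train and test files (invented values).
    train_csv = io.StringIO(
        "Item_Identifier,Outlet_Identifier,Item_Outlet_Sales\n"
        "ITEM1,OUT_A,100.0\nITEM2,OUT_A,300.0\nITEM3,OUT_B,50.0\n"
    )
    test_csv = io.StringIO("Item_Identifier,Outlet_Identifier\nITEM4,OUT_A\nITEM5,OUT_C\n")
    outlet_means, global_mean = fit_outlet_means(csv.DictReader(train_csv))
    buffer = io.StringIO()
    write_submission(csv.DictReader(test_csv), outlet_means, global_mean, buffer)
    print(buffer.getvalue())
```

    Here OUT_A's training mean is 200 and OUT_C is unseen, so ITEM5 falls back to the global mean of 150. In practice the two `StringIO` objects would be replaced by the actual train and test files opened with `open(..., newline="")`.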

    Reference

    For more details, visit: Analytics Vidhya BigMart Sales III

  12. OCEAN

    • huggingface.co
    • hf-proxy-cf.effarig.site
    Updated Nov 27, 2023
    Cite
    MTHR (2023). OCEAN [Dataset]. https://huggingface.co/datasets/MTHR/OCEAN
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 27, 2023
    Dataset authored and provided by
    MTHR
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Big Five Personality Traits (OCEAN): Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism

  13. Large Scale Topo Cultural Feature (Line) (LGATE-141) - Datasets -...

    • catalogue.data.wa.gov.au
    Updated Apr 11, 2019
    Cite
    (2019). Large Scale Topo Cultural Feature (Line) (LGATE-141) - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/large-scale-topo-cultural-feature-line
    Dataset updated
    Apr 11, 2019
    Area covered
    Western Australia
    Description

    Topographic features whose primary characteristics are of a general cultural type. Multiple points that describe a feature’s centreline or edge. NOTE: Landgate no longer maintains large scale topographic features. The large scale topographic data capture programme ceased in 2016. Please consider carefully the suitability of the data within this service for your purpose. © Western Australian Land Information Authority (Landgate). Use of Landgate data is subject to Personal Use License terms and conditions unless otherwise authorised under approved License terms and conditions.

  14. Coded respondent survey data to analyze the impact of big five personality...

    • search.dataone.org
    • data.niaid.nih.gov
    Updated Aug 28, 2024
    Cite
    Navin Kumar Koodamara; Debika Layek; Suraj Noronha; Raveendra Rao (2024). Coded respondent survey data to analyze the impact of big five personality traits on student engagement consisting of their emotional and physical engagement [Dataset]. https://search.dataone.org/view/sha256%3A10e5bdc8ec8dec7be1d300edf517c9699e9ca283cf569c5fe6420259a4ae84c6
    Dataset updated
    Aug 28, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Navin Kumar Koodamara; Debika Layek; Suraj Noronha; Raveendra Rao
    Description

    This study uses the framework of the Big Five personality traits and the concept of student engagement (with physical, emotional, and cognitive dimensions) to investigate the interplay between these constructs. Understanding the association between personality traits and the various dimensions of student engagement can help teachers develop or adopt effective pedagogical practices. The research model was empirically assessed using a sample of 206 B-school students enrolled in a private business school in southern India. The findings suggest that conscientiousness and openness positively affect students' physical engagement. The study also supports a beneficial effect of openness on emotional engagement and of extraversion on cognitive engagement. It provides a comprehensive understanding of physical engagement as a mediating factor in the relati...

    Source: Data were obtained from 206 postgraduate human resource management students at private B-schools in southern India. The study was restricted to human resource management students because student engagement behaviour varies across courses.

    Data collection methods: A non-experimental, survey-based questionnaire was distributed online and offline to the respondents, using purposive sampling. Respondents were given a consent form assuring them that they could leave the survey at any time without any consequences and that all personal details would be kept confidential.

    Data coding: The datasheet contains two sections. Demographic variables were captured on categorical scales, and study variables on 5-point Likert scales (5 = strongly agree, 4 = agree, 3 = neutral, 2 = disagree, 1 = strongly disagree). Negatively worded items were reverse-coded. There were no missing data or outliers, and no data transformations were applied.

    Coded respondent survey data to analyze the impact of big five personality traits on student engagement consisting of their emotional and physical engagement.
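
    Reverse-coding on a 5-point scale maps a response x to 6 - x, so 1 and 5 swap, 2 and 4 swap, and 3 stays neutral. A minimal sketch, assuming hypothetical item names and that a construct score is the mean of its items (the dataset documentation does not state how construct scores were formed):

```python
from typing import Dict, Iterable, Set

SCALE_MIN, SCALE_MAX = 1, 5  # 1 = strongly disagree ... 5 = strongly agree


def reverse_code(score: int) -> int:
    """Map 1<->5 and 2<->4, leaving 3 unchanged, on a 5-point scale."""
    if not SCALE_MIN <= score <= SCALE_MAX:
        raise ValueError(f"score {score} outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - score


def construct_score(responses: Dict[str, int], items: Iterable[str],
                    reversed_items: Set[str]) -> float:
    """Mean item score for one construct, reverse-coding negatively worded items first."""
    values = [reverse_code(responses[i]) if i in reversed_items else responses[i]
              for i in items]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical item names for one respondent's openness items.
    answers = {"open_1": 4, "open_2": 2, "open_3": 5}
    print(construct_score(answers, ["open_1", "open_2", "open_3"], reversed_items={"open_2"}))
```

    For this sample respondent, the reversed item `open_2` becomes 4, giving a construct score of about 4.33.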

    https://doi.org/10.5061/dryad.h18931zvj

    1.     Title: Coded respondent survey data to analyze the impact of big five personality traits on student engagement consisting of their emotional and physical engagement.

    2.     Introduction: This study introduces a conceptual framework for understanding the connection between students' personality traits and cognitive engagement, with a special focus on the mediating role of their physical and emotional engagement. The study contains seven major variables, each measured with well-established scales.

    3.     Dataset Description:

    Source: Data was obtained from 206 postgraduate human resource management students from private B-schools in Southern India. We restricted this study to hum...

  15. Finance large enterprises by industry and various characteristics

    • data.europa.eu
    atom feed, json
    Cite
    Finance large enterprises by industry and various characteristics [Dataset]. https://data.europa.eu/data/datasets/1655-financi-n-grote-ondernemingen-naar-bedrijfstak-en-diverse-kenmerken/?locale=en
    Available download formats
    atom feed, json
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Complete consolidated balance sheet and income statement of large non-financial corporations, including all group companies established in the Netherlands. Relations with foreign group companies are accounted for as relations with group companies abroad. The breakdown is by industry and then by balance sheet total, profit or loss, and degree of foreign interlinkage. Data cover 1977 to 2002. Frequency: discontinued; as of statistical year 2003, this table is no longer updated.

  16. Large Scale Topo Cultural Feature (Polygon) (LGATE-143) - Datasets -...

    • catalogue.data.wa.gov.au
    Updated Jul 10, 2019
    Cite
    (2019). Large Scale Topo Cultural Feature (Polygon) (LGATE-143) - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/large-scale-topo-cultural-feature-polygon
    Dataset updated
    Jul 10, 2019
    Area covered
    Western Australia
    Description

    Topographic features whose primary characteristics are of a general cultural type. Multiple points that describe a feature’s boundary. NOTE: Landgate no longer maintains large scale topographic features. The large scale topographic data capture programme ceased in 2016. Please consider carefully the suitability of the data within this service for your purpose. © Western Australian Land Information Authority (Landgate). Use of Landgate data is subject to Personal Use License terms and conditions unless otherwise authorised under approved License terms and conditions.

  17. CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, bin, txt
    Updated Jun 4, 2024
    Cite
    Lele Cao; Vilhelm von Ehrenheim; Mark Granroth-Wilding; Richard Anselmo Stahl; Drew McCornack; Armin Catovic; Dhiana Deva Cavacanti Rocha (2024). CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company Similarity Quantification [Dataset]. http://doi.org/10.5281/zenodo.11391315
    Available download formats
    application/gzip, bin, txt
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lele Cao; Vilhelm von Ehrenheim; Mark Granroth-Wilding; Richard Anselmo Stahl; Drew McCornack; Armin Catovic; Dhiana Deva Cavacanti Rocha
    Time period covered
    May 29, 2024
    Description

    CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.

    Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.

    Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.

    Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated three evaluation tasks:

    • Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.
    • Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. The set contains 76 distinct target companies, each with 5.3 annotated competitors on average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for companies similar to A.
    • Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators with backgrounds in engineering, science, and investment were tasked with determining which candidate company is more similar to the query company. This resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained about 20% (368 samples) of this set as a validation set for model development.
    • Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).
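    The task descriptions above do not fix the scoring metrics, so the following is only a sketch of common choices, not the official CompanyKG evaluation code: ROC AUC for the binary SP labels, recall@K for CR (how many of a target's N annotated competitors appear in the top K results), and accuracy for SR, where each question is answered by picking the candidate with the higher cosine similarity to the query. All function names are hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def sp_auc(scores, labels):
    """ROC AUC for SP: probability that a random positive pair (label 1)
    scores higher than a random negative pair (label 0); ties count half.
    Requires at least one positive and one negative pair."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def cr_recall_at_k(ranked_ids, competitors, k):
    """Fraction of a target's annotated competitors found in the top-k results."""
    top = set(ranked_ids[:k])
    return len(top & set(competitors)) / len(competitors)


def sr_accuracy(query_embs, cand0_embs, cand1_embs, answers):
    """Share of SR questions where the candidate with the higher cosine
    similarity to the query equals the annotated answer (0 or 1).
    Ties are resolved in favour of candidate 0."""
    correct = 0
    for q, c0, c1, ans in zip(query_embs, cand0_embs, cand1_embs, answers):
        pred = 0 if cosine(q, c0) >= cosine(q, c1) else 1
        correct += int(pred == ans)
    return correct / len(answers)
```

    For CR, averaging `cr_recall_at_k` over the 76 target companies gives a single score; the choice of K is left open here.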

    Background and Motivation

    In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. Accurately quantifying inter-company similarity, also referred to as company similarity quantification, is the cornerstone of successfully executing such tasks. However, company similarity quantification is often challenging and time-consuming, given the vast amount of data available on each company and the complex, diverse relationships among them.

    While there is no universally agreed definition of company similarity, researchers and practitioners in the private equity (PE) industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.

    In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Companies similar to the seed companies can then be retrieved in the embedding space using a similarity measure such as cosine similarity. The rapid advances in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly improved the performance of general-purpose conversational models. Such models, for example ChatGPT, can be used to answer questions about similar-company discovery and quantification in a Q&A format.
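    Embedding-space search from a set of seed companies can be sketched in a few lines. This is a minimal illustration under stated assumptions: the embeddings are a hypothetical NumPy matrix with one row per company, and the seeds are combined by averaging their normalized vectors into a single query, which is one simple choice among several (scoring against each seed separately is another).

```python
import numpy as np


def top_k_similar(seed_ids, embeddings, k=5):
    """Return the indices of the k companies most similar to the seeds.

    embeddings: (num_companies, dim) array, one row per company (hypothetical).
    The query is the re-normalized mean of the seeds' unit vectors, and
    companies are scored by cosine similarity to it. Seeds are excluded.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = normed[seed_ids].mean(axis=0)
    query /= np.linalg.norm(query)
    scores = normed @ query            # cosine similarity to the query
    scores[seed_ids] = -np.inf         # never return a seed company
    order = np.argsort(-scores)        # highest similarity first
    return order[:k].tolist()
```

    For a catalogue of realistic size, the brute-force matrix product would typically be replaced by an approximate nearest-neighbour index; the ranking logic stays the same.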

    However, graphs remain the most natural choice for representing and learning diverse company relations, owing to their ability to model complex relationships among a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Using this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches let us leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns that facilitate similar-company analysis. While various company datasets (mostly commercial/proprietary and non-relational) and graph datasets (mostly for single link-, node-, or graph-level predictions) are available, there is a scarcity of datasets and benchmarks that combine both into a large-scale KG dataset expressing rich pairwise company relations.
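    To make the idea of combining graph structure with text embeddings concrete, here is a deliberately simplified, parameter-free propagation step in the spirit of a GNN mean aggregator. It is not a trained GNN and not the method from the CompanyKG paper: each node's vector becomes a blend of its own embedding and the weight-normalized average of its neighbours' embeddings. The edge-list format and the mixing factor `alpha` are assumptions for illustration.

```python
import numpy as np


def propagate(embeddings, edges, alpha=0.5):
    """One round of weighted neighbour averaging on an undirected graph.

    embeddings: (num_nodes, dim) array of node features (e.g. text embeddings).
    edges: iterable of hypothetical (u, v, weight) tuples, treated as undirected.
    New vector = alpha * own + (1 - alpha) * weighted mean of neighbours;
    isolated nodes keep their original vector.
    """
    n = embeddings.shape[0]
    agg = np.zeros_like(embeddings, dtype=float)
    wsum = np.zeros(n)
    for u, v, w in edges:
        agg[u] += w * embeddings[v]
        agg[v] += w * embeddings[u]
        wsum[u] += w
        wsum[v] += w
    out = embeddings.astype(float).copy()
    has_nbrs = wsum > 0
    neighbour_mean = agg[has_nbrs] / wsum[has_nbrs][:, None]
    out[has_nbrs] = alpha * embeddings[has_nbrs] + (1 - alpha) * neighbour_mean
    return out
```

    A learned GNN would replace the fixed `alpha` blend with trainable transformations and could treat each of the edge types separately.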

    Source Code and Tutorial:
    https://github.com/llcresearch/CompanyKG2

    Paper: to be published

  18. Data from: Investigating the association between social interactions and...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    bin, csv, txt
    Updated May 31, 2022
    Didem Gundogdu; Ailbhe N. Finnerty; Jacopo Staiano; Stefano Teso; Andrea Passerini; Fabio Pianesi; Bruno Lepri (2022). Data from: Investigating the association between social interactions and personality states dynamics [Dataset]. http://doi.org/10.5061/dryad.b88c7
    Available download formats: txt, csv, bin
    Dataset updated
    May 31, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Didem Gundogdu; Ailbhe N. Finnerty; Jacopo Staiano; Stefano Teso; Andrea Passerini; Fabio Pianesi; Bruno Lepri
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The recent personality psychology literature has coined the term personality states to refer to states that have the same behavioural, affective and cognitive content (described by adjectives) as the corresponding trait, but last for a shorter time. Variability in personality states may be a reaction to specific characteristics of situations. The aim of our study is to investigate whether specific situational factors, namely different configurations of face-to-face interactions, predict variability in personality states in a work environment. The results provide evidence that within-person variability in personality is associated with variation in face-to-face interactions. Interestingly, the effects differ by type and level of personality state: there are adaptation effects for Agreeableness and Emotional Stability, whereby the personality states of an individual trigger similar states in the people interacting with them, and complementarity effects for Openness to Experience, whereby the personality states of an individual trigger opposite states in the people interacting with them. Overall, these findings encourage further research characterizing face-to-face and social interactions in terms of their relevance to personality states.

  19. Data from: The relative importance of ski resort- and weather-related...

    • data.mendeley.com
    • narcis.nl
    Updated Jun 8, 2021
    Erik Haugom (2021). The relative importance of ski resort- and weather-related characteristics when going alpine skiing: data from a rating-based conjoint survey [Dataset]. http://doi.org/10.17632/6w4tzrs3yw.1
    Dataset updated
    Jun 8, 2021
    Authors
    Erik Haugom
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Alps
    Description

    The data relate to two research articles: “The relative importance of ski resort- and weather-related characteristics when going alpine skiing” [1] and “Optimal pricing of alpine ski passes in the case of crowdedness and reduced skiing capacity” [2]. The data were collected in February 2018 through a rating-based conjoint survey experiment on active alpine skiers at a large ski area in Inland Norway, and comprise 400 respondents who provided more than 7,200 ratings. Ten versions of the same questionnaire type were used to elicit preferences for ski resort- and weather-related characteristics when going alpine skiing. The raw data are organized so that they can be downloaded and used directly to either (1) replicate the analyses performed in the related research articles, or (2) run one’s own analyses on the topic of interest. The data may also be useful to lecturers teaching students the key concepts of survey experiments and causal modelling.
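    For readers who want to run their own analyses, a common starting point for rating-based conjoint data is to regress ratings on dummy-coded attribute levels and read the coefficients as part-worth utilities. The sketch below is a generic illustration, not the model used in the related articles [1], [2]; attribute names, levels and the data layout are hypothetical, and it pools all ratings into one ordinary least-squares fit.

```python
import numpy as np


def dummy_code(levels, reference):
    """0/1 columns for every observed level except the reference level."""
    cats = [c for c in sorted(set(levels)) if c != reference]
    X = np.array([[1.0 if lv == c else 0.0 for c in cats] for lv in levels])
    return X.reshape(len(levels), len(cats)), cats


def part_worths(attributes, ratings):
    """OLS part-worth utilities from a rating-based conjoint design.

    attributes: {name: (level_of_each_rated_profile, reference_level)}
    ratings: one numeric rating per rated profile.
    Returns {"intercept": b0, "name=level": coefficient, ...}; each
    coefficient is the rating change relative to the reference level.
    """
    blocks, names = [np.ones((len(ratings), 1))], ["intercept"]
    for name, (levels, ref) in attributes.items():
        X, cats = dummy_code(levels, ref)
        blocks.append(X)
        names += [f"{name}={c}" for c in cats]
    design = np.hstack(blocks)
    coef, *_ = np.linalg.lstsq(design, np.asarray(ratings, dtype=float), rcond=None)
    return {n: float(c) for n, c in zip(names, coef)}
```

    Pooling is the simplest design choice; because each respondent rated many profiles, fitting per-respondent models or adding respondent effects is a natural refinement.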

  20. Big family, warm home, and lots of friends: Pteronotus large

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Feb 6, 2023
    Jennifer Barros; Enrico Bernard (2023). Big family, warm home, and lots of friends: Pteronotus large [Dataset]. http://doi.org/10.5061/dryad.wm37pvms1
    Available download formats: zip
    Dataset updated
    Feb 6, 2023
    Dataset provided by
    Dryad
    Authors
    Jennifer Barros; Enrico Bernard
    Time period covered
    2023
    Description

    We used presence/absence data for the bat species to analyze, with the mvabund package, the relationship between species occurrence and cave characteristics. The data are organized in a spreadsheet whose first columns contain the values of the cave features, followed by presence/absence data for each species. The file “mvabund” was used as input in R for the script described below.
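    The spreadsheet layout described above (cave-feature columns first, then one presence/absence column per species) can be split into a feature table and a 0/1 species matrix before any modelling. This Python sketch only prepares and summarizes the data; it is not a substitute for the multivariate GLMs fitted with the R package mvabund. The CSV export, the column names and the number of feature columns are assumptions.

```python
import csv
import io


def split_caves_species(csv_text, n_features):
    """Split a caves-by-columns table into features and a 0/1 species matrix.

    Assumes the first n_features columns hold cave characteristics and the
    remaining columns hold presence (1) / absence (0) for each species.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    feature_names, species = header[:n_features], header[n_features:]
    features, presence = [], []
    for row in reader:
        if not row:                      # skip blank lines
            continue
        features.append(dict(zip(feature_names, row[:n_features])))
        presence.append([int(x) for x in row[n_features:]])
    return features, species, presence


def occupancy(species, presence):
    """Fraction of caves in which each species was recorded as present."""
    n = len(presence)
    return {sp: sum(r[j] for r in presence) / n for j, sp in enumerate(species)}
```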

