53 datasets found

Big Data Analytics Market in Energy Sector Report
marketresearchforecast.com
doc, pdf, ppt
Updated Dec 5, 2024
Cite
Market Research Forecast (2024). Big Data Analytics Market in Energy Sector Report [Dataset]. https://www.marketresearchforecast.com/reports/big-data-analytics-market-in-energy-sector-5888
Explore at:
ppt, doc, pdf (available download formats)
Dataset updated
Dec 5, 2024
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Big Data Analytics Market in Energy Sector was valued at USD 9.56 billion in 2023 and is projected to reach USD 13.81 billion by 2032, exhibiting a CAGR of 5.4% during the forecast period. Big data analytics in the energy sector can be defined as the application of sophisticated methods and tools to analyze the vast collections of information produced by numerous entities within the energy industry. The process covers descriptive, predictive, and prescriptive analytics and provides valuable information for procedures, costs, and strategies. Real-time analytics delivers immediate results, predictive analytics estimates the probability of future events, and prescriptive analytics recommends actions. Some of the main characteristics of these tools include handling large datasets, compatibility with IoT for streaming data, and machine learning features for pattern detection. Applications range from grid control and load management to predicting customer demand and improving equipment reliability and efficiency. Big data analytics therefore helps global energy companies increase performance, minimize downtime, and develop effective strategies to meet legal requirements. Key drivers for this market are: Growing Focus on Safety and Organization to Fuel Market Growth. Potential restraints include: Higher Cost of Geotechnical Services to Hinder Market Growth. Notable trends are: Growth of IT Infrastructure to Bolster the Demand for Modern Cable Tray Management Solutions.
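The growth rate quoted above can be sanity-checked with the standard compound annual growth rate formula. The sketch below is illustrative only: the function name `cagr` and the two candidate compounding windows are assumptions, since the listing does not state which base year the 5.4% figure uses.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.05 means 5% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the description (USD billion).
value_2023 = 9.56
value_2032 = 13.81

# Assumption 1: growth is compounded from the 2023 valuation (9 years).
rate_from_2023 = cagr(value_2023, value_2032, years=2032 - 2023)

# Assumption 2: growth is compounded over a 2025-2032 forecast window (7 years).
rate_from_2025 = cagr(value_2023, value_2032, years=2032 - 2025)

print(f"9-year window: {rate_from_2023:.2%}")
print(f"7-year window: {rate_from_2025:.2%}")
```

Under these assumptions the 9-year window gives roughly 4.2% per year and the 7-year window roughly 5.4%, so the quoted CAGR appears to match the shorter window.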
Big data and business analytics revenue worldwide 2015-2022
statista.com
Updated Nov 22, 2023
Cite
Statista (2023). Big data and business analytics revenue worldwide 2015-2022 [Dataset]. https://www.statista.com/statistics/551501/worldwide-big-data-business-analytics-revenue/
Explore at:
Dataset updated
Nov 22, 2023
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
The global big data and business analytics (BDA) market was valued at 168.8 billion U.S. dollars in 2018 and is forecast to grow to 215.7 billion U.S. dollars by 2021. In 2021, more than half of BDA spending will go towards services. IT services is projected to make up around 85 billion U.S. dollars, and business services will account for the remainder.

Big data

High volume, high velocity and high variety: one or more of these characteristics is used to define big data, the kind of data sets that are too large or too complex for traditional data processing applications. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets. For example, connected IoT devices are projected to generate 79.4 ZB of data in 2025.

Business analytics

Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate business insights. The size of the business intelligence and analytics software application market is forecast to reach around 16.5 billion U.S. dollars in 2022. Growth in this market is driven by a focus on digital transformation, a demand for data visualization dashboards, and an increased adoption of cloud.
Summary of selected characteristics of large reservoirs
catalog.data.gov
data.usgs.gov
+1more
Updated Oct 5, 2024
+ more versions
Cite
U.S. Geological Survey (2024). Summary of selected characteristics of large reservoirs [Dataset]. https://catalog.data.gov/dataset/summary-of-selected-characteristics-of-large-reservoirs
Explore at:
Dataset updated
Oct 5, 2024
Dataset provided by
U.S. Geological Survey
Description
This is a point coverage of dams in the United States and Puerto Rico, which originally was derived from the national inventory of dams data base (U.S. Army Corps of Engineers, 1982). The coverage includes locations of and selected characteristics of approximately 2,700 reservoirs and controlled natural lakes that have normal capacities of at least 5,000 acre-feet or maximum capacities of at least 25,000 acre-feet and that were completed as of January 1, 1988.
Forecast revenue big data market worldwide 2011-2027
statista.com
Updated Feb 13, 2024
Cite
Statista (2024). Forecast revenue big data market worldwide 2011-2027 [Dataset]. https://www.statista.com/statistics/254266/global-big-data-market-forecast/
Explore at:
Dataset updated
Feb 13, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
The global big data market is forecast to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027.

What is Big data?

Big data is a term that refers to the kind of data sets that are too large or too complex for traditional data processing applications. It is defined as having one or some of the following characteristics: high volume, high velocity or high variety. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets.

Big data analytics

Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.
Local and big brands characteristics according to European consumers 2018
statista.com
Updated Jan 14, 2025
Cite
Local and big brands characteristics according to European consumers 2018 [Dataset]. https://www.statista.com/statistics/1080774/local-and-big-brands-characteristics-according-to-european-consumers-2018/
Explore at:
Dataset updated
Jan 14, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2018
Area covered
EU
Description
In Europe, local brands were perceived as more sustainable than big brands, a survey revealed. 45 percent of European consumers believed that local brands respected the environment and the resources of the territory, while only 17 percent of respondents thought the same about big brands. Additionally, 67 percent of consumers associated local brands with values such as fairness, transparency, honesty, and integrity.
Heat Wave Characteristics in 50 Large U.S. Cities, 1961–2023
catalog.data.gov
s.cnmilf.com
Updated Feb 25, 2025
+ more versions
Cite
U.S. Environmental Protection Agency, Office of Air and Radiation (Publisher) (2025). Heat Wave Characteristics in 50 Large U.S. Cities, 1961–2023 [Dataset]. https://catalog.data.gov/dataset/heat-wave-characteristics-in-50-large-u-s-cities-196120236
Explore at:
Dataset updated
Feb 25, 2025
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
United States
Description
These maps show changes in the number of heat waves per year (frequency); the average length of heat waves in days (duration); the number of days between the first and last heat wave of the year (season length); and how hot the heat waves were, compared with the local temperature threshold for defining a heat wave (intensity). These data were analyzed from 1961 to 2023 for 50 large metropolitan areas. The size of each circle indicates the rate of change per decade. Solid-color circles represent cities where the trend was statistically significant. For more information: www.epa.gov/climate-indicators
Identifying Refactoring Opportunities for Large Packages by Analyzing...
catalogue.data.govt.nz
Updated May 2, 2023
Cite
(2023). Identifying Refactoring Opportunities for Large Packages by Analyzing Maintainability Characteristics in Java OSS [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-14460054
Explore at:
Dataset updated
May 2, 2023
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset is provided as supplementary material for the paper "Identifying Refactoring Opportunities for Large Packages by Analyzing Maintainability Characteristics in Java OSS". A README file is included with a description of the dataset and of the Python script used to perform the empirical analysis.
Financial Characteristics of Large British Companies, 1880-1926
datacatalogue.cessda.eu
beta.ukdataservice.ac.uk
Updated Nov 28, 2024
Cite
Delargy, R., London School of Economics and Political Science; Kennedy, W., London School of Economics and Political Science (2024). Financial Characteristics of Large British Companies, 1880-1926 [Dataset]. http://doi.org/10.5255/UKDA-SN-4244-2
Explore at:
Unique identifier
https://doi.org/10.5255/UKDA-SN-4244-2
Dataset updated
Nov 28, 2024
Dataset provided by
Department of Economic History
Authors
Delargy, R., London School of Economics and Political Science; Kennedy, W., London School of Economics and Political Science
Time period covered
Jan 1, 1993 - Jan 1, 2020
Area covered
United Kingdom
Variables measured
Institutions/organisations, National, Companies
Measurement technique
Transcription
Description
Abstract copyright UK Data Service and data collection copyright owner.

The main aims of the project are three-fold:

(1) To identify and describe the development of the electricity industry in Britain prior to the formation of the national grid using company market-based financial data;

(2) To construct a consistent data set of the key market-based financial characteristics of the principal companies; and

(3) To use this to examine the development of the British electricity industry compared to similar development in the United States and Germany.
Latest edition information
For the second edition (October 2021) data and documentation relating to Swan United Electric Light Company Limited (1882-1894); Edison and Swan United Electric Light Company Limited (1882-1914), Anglo-American Brush Electric Light and Power Corporation Limited (1882-1889) and Brush Electrical Engineering Limited (1889-1914) were added to the study.

Main Topics:

The key financial characteristics are:

(1) The quarterly (January, April, July and October) market closing price of each traded security for each identified company;

(2) The number of each security outstanding, both traded and non-traded, at the end of each quarter;

(3) Multiplied together, characteristics (1) and (2) produce quarterly market capitalization for quoted companies by security. Summing over all quoted securities issued by a company produces its total market capitalization for each quarter. The value of non-quoted securities can be estimated from the dividend and interest payments they make.

(4) The paid-up amount for each security, including premiums and discounts (if any);

(5) The nominal value of each security;

(6) The payments (if any) made in each quarter to holders of the securities.
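
The market capitalization arithmetic in characteristic (3) can be sketched as follows. This is a minimal illustration under stated assumptions: the company name, security names, prices, and unit counts are hypothetical, and the record layout is not taken from the actual data files.

```python
from collections import defaultdict

# Hypothetical records, not values from the dataset:
# (company, security, quarter, closing_price, units_outstanding)
records = [
    ("Example Electric Co", "ordinary", "1890-01", 10.0, 1000),
    ("Example Electric Co", "preference", "1890-01", 5.0, 200),
    ("Example Electric Co", "ordinary", "1890-04", 12.0, 1000),
]


def market_cap_by_company(rows):
    """Sum price x units over all quoted securities per (company, quarter)."""
    totals = defaultdict(float)
    for company, _security, quarter, price, units in rows:
        totals[(company, quarter)] += price * units
    return dict(totals)


caps = market_cap_by_company(records)
for (company, quarter), value in sorted(caps.items()):
    print(company, quarter, value)
```

Non-quoted securities are left out of this sketch because the description says their value has to be estimated separately from dividend and interest payments.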
De-identified article and author characteristics for a large data set of Web...
zenodo.org
txt
Updated Jan 27, 2023
Cite
Jens Peter Andersen (2023). De-identified article and author characteristics for a large data set of Web of Science [Dataset]. http://doi.org/10.5281/zenodo.7573523
Explore at:
txt (available download formats)
Unique identifier
https://doi.org/10.5281/zenodo.7573523
Dataset updated
Jan 27, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jens Peter Andersen
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This data set contains article and author characteristics for all records in the Web of Science, 2000-2020. Standard article identifiers have been removed and replaced with a document ID (`doc_id`), as linking to the original ID is not permitted.
Data_Sheet_1_One Social Media Company to Rule Them All: Associations Between...
frontiersin.figshare.com
figshare.com
xlsx
Updated Jun 1, 2023
+ more versions
Cite
Davide Marengo; Cornelia Sindermann; Jon D. Elhai; Christian Montag (2023). Data_Sheet_1_One Social Media Company to Rule Them All: Associations Between Use of Facebook-Owned Social Media Platforms, Sociodemographic Characteristics, and the Big Five Personality Traits.xlsx [Dataset]. http://doi.org/10.3389/fpsyg.2020.00936.s001
Explore at:
xlsx (available download formats)
Unique identifier
https://doi.org/10.3389/fpsyg.2020.00936.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Davide Marengo; Cornelia Sindermann; Jon D. Elhai; Christian Montag
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Currently, 2.7 billion people use at least one of the Facebook-owned social media platforms – Facebook, WhatsApp, and Instagram. Previous research investigating individual differences between users and non-users of these platforms has typically focused on one platform. However, individuals typically use a combination of Facebook-owned platforms. Therefore, we aim (1) to identify the relative prevalence of different patterns of social media use, and (2) to evaluate potential between-group differences in the distributions of age, gender, education, and Big Five personality traits. Data collection was performed using a cross-sectional design. Specifically, we administered a survey assessing participants’ demographic variables, current use of Facebook-owned platforms, and Big Five personality traits. In N = 3003 participants from the general population (60.67% females; mean age = 35.53 years, SD = 13.53), WhatsApp emerged as the most widely used application in the sample, and hence, has the strongest reach. A pattern consisting of a combined use of WhatsApp and Instagram appeared to be most prevalent among the youngest participants. Further, individuals using at least one social media platform were generally younger, more often female, and more extraverted than non-users. Small differences in Conscientiousness and Neuroticism also emerged across groups reporting different combinations of social media use. Interestingly, when examined as control variables, we found demographic characteristics partially accounted for differences in broad personality factors and facets across different patterns of social media use. Our findings are relevant to researchers carrying out their studies via social media platforms, as sample characteristics appear to be different depending on the platform used.
Big Mart Sales Prediction
kaggle.com
Updated Feb 8, 2025
Cite
Gaurav Dutta (2025). Big Mart Sales Prediction [Dataset]. https://www.kaggle.com/datasets/gauravduttakiit/big-mart-sales-prediction/code
Explore at:
Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)
Dataset updated
Feb 8, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Gaurav Dutta
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
BigMart Sales Prediction Challenge

BigMart, a leading retail chain, aims to enhance its sales strategy by analyzing historical sales data. The goal is to develop a predictive model that estimates the sales of various products across different outlets, helping BigMart understand the key factors influencing sales performance.

Problem Statement

BigMart has gathered sales data from 2013 for 1,559 products sold across 10 stores in different cities. Along with sales figures, various product and store attributes have been recorded. The objective is to build a machine learning model that can accurately forecast the sales of products at specific outlets.

By leveraging this predictive model, BigMart can gain insights into product and store characteristics that drive sales growth, enabling better business decisions.

Challenges

The dataset may contain missing values due to unreported data from certain stores, requiring appropriate data preprocessing techniques.
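
One common preprocessing option for such gaps is simple imputation: fill a numeric column with its mean and a categorical column with its most frequent value. The sketch below is only an illustration. The description does not say which columns have missing values or which technique is intended, so the choice of `Item_Weight` and `Outlet_Size`, the sample rows, and the helper `fill_missing` are all assumptions.

```python
from statistics import mean, mode

# Hypothetical rows; None marks a missing value.
rows = [
    {"Item_Weight": 9.3, "Outlet_Size": "Medium"},
    {"Item_Weight": None, "Outlet_Size": "Medium"},
    {"Item_Weight": 17.5, "Outlet_Size": None},
    {"Item_Weight": 8.2, "Outlet_Size": "Small"},
]


def fill_missing(rows, numeric_col, categorical_col):
    """Return new rows with mean / most-frequent-value imputation applied."""
    observed_numbers = [r[numeric_col] for r in rows if r[numeric_col] is not None]
    observed_labels = [r[categorical_col] for r in rows if r[categorical_col] is not None]
    fill_number = mean(observed_numbers)
    fill_label = mode(observed_labels)
    filled = []
    for r in rows:
        new_row = dict(r)  # copy so the input rows stay untouched
        if new_row[numeric_col] is None:
            new_row[numeric_col] = fill_number
        if new_row[categorical_col] is None:
            new_row[categorical_col] = fill_label
        filled.append(new_row)
    return filled


filled = fill_missing(rows, "Item_Weight", "Outlet_Size")
```

Mean and mode imputation are the simplest choices; group-wise imputation (for example, per product or per outlet type) may fit this data better but is not shown here.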

Dataset Overview

Train Dataset (8,523 records)

Includes both input features and the target variable (Item_Outlet_Sales).

Product Features

Item_Identifier: Unique product ID

Item_Weight: Weight of the product

Item_Fat_Content: Fat level (low-fat or regular)

Item_Visibility: Percentage of display area allocated to the product

Item_Type: Category of the product

Item_MRP: Maximum Retail Price

Store Features

Outlet_Identifier: Unique store ID

Outlet_Establishment_Year: Year the store was established

Outlet_Size: Store size (small, medium, large)

Outlet_Location_Type: City tier classification

Outlet_Type: Type of outlet (grocery store, supermarket, etc.)

Target Variable

Item_Outlet_Sales: Sales of the product at a particular store (to be predicted)

Test Dataset (5,681 records)

Contains the same features as the train dataset except for Item_Outlet_Sales, which needs to be predicted.

Submission Format

Your model should generate a CSV file with the following columns:
- Item_Identifier: Unique product ID
- Outlet_Identifier: Unique store ID
- Item_Outlet_Sales: Predicted sales value
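
A submission file with the three columns listed above could be written with Python's standard `csv` module, roughly as follows. The identifiers and sales values are placeholders rather than real predictions, and `write_submission` is a hypothetical helper name.

```python
import csv
import io

SUBMISSION_COLUMNS = ["Item_Identifier", "Outlet_Identifier", "Item_Outlet_Sales"]


def write_submission(predictions, out):
    """Write (item_id, outlet_id, predicted_sales) tuples as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=SUBMISSION_COLUMNS)
    writer.writeheader()
    for item_id, outlet_id, sales in predictions:
        writer.writerow(
            {
                "Item_Identifier": item_id,
                "Outlet_Identifier": outlet_id,
                "Item_Outlet_Sales": sales,
            }
        )


# Placeholder predictions; a real model would supply these values.
predictions = [("ITEM_A", "OUTLET_1", 1234.5), ("ITEM_B", "OUTLET_2", 678.9)]

buffer = io.StringIO()
write_submission(predictions, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the in-memory buffer.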

Reference

For more details, visit: Analytics Vidhya BigMart Sales III
OCEAN
huggingface.co
hf-proxy-cf.effarig.site
Updated Nov 27, 2023
Cite
MTHR (2023). OCEAN [Dataset]. https://huggingface.co/datasets/MTHR/OCEAN
Explore at:
Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)
Dataset updated
Nov 27, 2023
Dataset authored and provided by
MTHR
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Big Five Personality Traits

OCEAN

Openness Conscientiousness Extraversion Agreeableness Neuroticism
Large Scale Topo Cultural Feature (Line) (LGATE-141) - Datasets -...
catalogue.data.wa.gov.au
Updated Apr 11, 2019
+ more versions
Cite
(2019). Large Scale Topo Cultural Feature (Line) (LGATE-141) [Dataset]. https://catalogue.data.wa.gov.au/dataset/large-scale-topo-cultural-feature-line
Explore at:
Dataset updated
Apr 11, 2019
Area covered
Western Australia
Description
Topographic features whose primary characteristics are of a general cultural type. Multiple points that describe a feature’s centreline or edge. NOTE: Landgate no longer maintains large scale topographic features. The large scale topographic data capture programme ceased in 2016. Please consider carefully the suitability of the data within this service for your purpose. © Western Australian Land Information Authority (Landgate). Use of Landgate data is subject to Personal Use License terms and conditions unless otherwise authorised under approved License terms and conditions.
Coded respondent survey data to analyze the impact of big five personality...
search.dataone.org
data.niaid.nih.gov
+1more
Updated Aug 28, 2024
+ more versions
Cite
Navin Kumar Koodamara; Debika Layek; Suraj Noronha; Raveendra Rao (2024). Coded respondent survey data to analyze the impact of big five personality traits on student engagement consisting of their emotional and physical engagement [Dataset]. https://search.dataone.org/view/sha256%3A10e5bdc8ec8dec7be1d300edf517c9699e9ca283cf569c5fe6420259a4ae84c6
Explore at:
Dataset updated
Aug 28, 2024
Dataset provided by
Dryad Digital Repository
Authors
Navin Kumar Koodamara; Debika Layek; Suraj Noronha; Raveendra Rao
Description
In this study, the authors use the framework of the Big Five personality traits and the concept of student engagement (with physical, emotional, and cognitive dimensions) to investigate the interplay between these constructs. Understanding the association between personality traits and the dimensions of student engagement helps teachers develop or adopt effective pedagogical practices. The research model was empirically assessed using a sample of 206 B-School students enrolled in a private business school in the southern region of India. The findings suggest that the personality components of conscientiousness and openness positively affect students' levels of physical engagement. Moreover, the study supports the beneficial impact of openness on emotional engagement and of extraversion on cognitive engagement. It provides a comprehensive understanding of physical engagement as a mediating factor in the relati...

Source: Data was obtained from 206 postgraduate human resource management students from private B-schools in Southern India. We restricted this study to human resource management students because of variability in student engagement behaviour across courses.

Data collection methods: A non-experimental survey-based questionnaire was distributed online and offline to the respondents. A purposive sampling technique was followed. Respondents were given a consent form assuring them that they could leave the survey at any time without any accountability and that all personal details would be kept confidential.

Data coding: The datasheet contains two sections. Demographic variables were captured through categorical scales, and study variables were based on 5-point Likert scales where 5 = strongly agree, 4 = agree, 3 = neutral, 2 = disagree, 1 = strongly disagree. Negatively worded items were reverse-coded. There were no missing data, outliers, or data transformations.

# Coded respondent survey data to analyze the impact of big five personality traits on student engagement consisting of their emotional and physical engagement.

https://doi.org/10.5061/dryad.h18931zvj

1. Title: Coded respondent survey data to analyze the impact of big five personality traits on student engagement consisting of their emotional and physical engagement.

2. Introduction: This study has introduced a conceptual framework to understand the connection between students' personality traits and cognitive engagement, with a special focus on the mediating function of their physical and emotional engagement. This study contained 7 major study variables. To measure these constructs, well-established measurement scales have been employed.

3. Dataset Description:

Source: Data was obtained from 206 postgraduate human resource management students from private B-schools in Southern India. We restricted this study to hum...
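
The data coding described in this entry (a Likert scale from 1 = strongly disagree to 5 = strongly agree, with negatively worded items scored in reverse) can be sketched as below. The item names and the set of negatively worded items are hypothetical; the dataset's own variable names are not shown in this listing.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # 1 = strongly disagree ... 5 = strongly agree


def reverse_code(score: int) -> int:
    """Mirror a score on the scale: 1 <-> 5, 2 <-> 4, 3 stays 3."""
    if not SCALE_MIN <= score <= SCALE_MAX:
        raise ValueError(f"score {score} is outside the {SCALE_MIN}-{SCALE_MAX} scale")
    return SCALE_MIN + SCALE_MAX - score


# Hypothetical respondent answers and hypothetical negatively worded items.
responses = {"item_1": 4, "item_2_neg": 2, "item_3": 5}
negatively_worded = {"item_2_neg"}

coded = {
    item: reverse_code(score) if item in negatively_worded else score
    for item, score in responses.items()
}
print(coded)
```

Because the listing states that the published file is already coded, this step would only be needed when working from raw questionnaire responses.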
Finance large enterprises by industry and various characteristics
data.europa.eu
atom feed, json
Cite
Finance large enterprises by industry and various characteristics [Dataset]. https://data.europa.eu/data/datasets/1655-financi-n-grote-ondernemingen-naar-bedrijfstak-en-diverse-kenmerken/?locale=en
Explore at:
atom feed, json (available download formats)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Fully consolidated balance sheet and income statement of large non-financial corporations, including all group companies established in the Netherlands. Relations with foreign group companies are accounted for as relations with a group company abroad. The breakdown is by industry and then by balance sheet total, profit or loss, and degree of foreign involvement. Data from 1977 to 2002. Frequency: As of statistical year 2003, this table is no longer updated.
Large Scale Topo Cultural Feature (Polygon) (LGATE-143) - Datasets -...
catalogue.data.wa.gov.au
Updated Jul 10, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2019). Large Scale Topo Cultural Feature (Polygon) (LGATE-143) [Dataset]. https://catalogue.data.wa.gov.au/dataset/large-scale-topo-cultural-feature-polygon
Explore at:
Dataset updated
Jul 10, 2019
Area covered
Western Australia
Description
Topographic features whose primary characteristics are of a general cultural type. Multiple points that describe a feature’s boundary. NOTE: Landgate no longer maintains large scale topographic features. The large scale topographic data capture programme ceased in 2016. Please consider carefully the suitability of the data within this service for your purpose. © Western Australian Land Information Authority (Landgate). Use of Landgate data is subject to Personal Use License terms and conditions unless otherwise authorised under approved License terms and conditions.
CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company...
zenodo.org
data.niaid.nih.gov
application/gzip, bin +1
Updated Jun 4, 2024
+ more versions
Cite
Lele Cao; Vilhelm von Ehrenheim; Mark Granroth-Wilding; Richard Anselmo Stahl; Drew McCornack; Armin Catovic; Dhiana Deva Cavacanti Rocha (2024). CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company Similarity Quantification [Dataset]. http://doi.org/10.5281/zenodo.11391315
Explore at:
application/gzip, bin, txt (available download formats)
Unique identifier
https://doi.org/10.5281/zenodo.11391315
Dataset updated
Jun 4, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Lele Cao; Vilhelm von Ehrenheim; Mark Granroth-Wilding; Richard Anselmo Stahl; Drew McCornack; Armin Catovic; Dhiana Deva Cavacanti Rocha
Time period covered
May 29, 2024
Description
CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.

Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.

Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.

Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated four evaluation tasks:

Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.

Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. The set contains 76 distinct target companies, each of which has 5.3 annotated competitors on average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for similar companies to A.

Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators, with backgrounds in engineering, science, and investment, were tasked with determining which candidate company is more similar to the query company. It resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained 20% (368 samples) of this set as a validation set for model development.

Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).
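
For the Competitor Retrieval task above, one natural way to score "retrieve all N competitors" is recall among the top K retrieved companies. This is an assumed metric for illustration, not one confirmed by this listing, and the company labels are placeholders.

```python
def recall_at_k(ranked_candidates, true_competitors, k):
    """Fraction of the annotated competitors found in the first k retrieved companies."""
    if not true_competitors:
        raise ValueError("true_competitors must not be empty")
    top_k = set(ranked_candidates[:k])
    hits = top_k & set(true_competitors)
    return len(hits) / len(true_competitors)


# Placeholder retrieval result for a target company "A" with N = 3 competitors.
ranked = ["B", "X", "C", "Y", "D"]
competitors_of_a = {"B", "C", "D"}

print(recall_at_k(ranked, competitors_of_a, k=3))
print(recall_at_k(ranked, competitors_of_a, k=5))
```

A recall of 1.0 at a given K means all N annotated competitors appeared among the first K results.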

Background and Motivation

In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.

While there is no universally agreed definition of company similarity, researchers and practitioners in the PE industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.

In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Companies similar to the seed companies can then be searched for in the embedding space using distance metrics such as cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
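
The embedding-space search described above can be sketched in a few lines. This is a minimal example under stated assumptions: the three-dimensional vectors and company names are invented for illustration, whereas real embeddings would come from a language model and have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_by_similarity(seed, candidates):
    """Return candidate names ordered from most to least similar to the seed."""
    scored = [(cosine_similarity(seed, vec), name) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy embeddings (hypothetical values, not taken from the dataset).
seed_embedding = [1.0, 0.0, 1.0]
candidate_embeddings = {
    "company_a": [0.9, 0.1, 1.1],
    "company_b": [0.0, 1.0, 0.0],
    "company_c": [1.0, 1.0, 0.0],
}

ranking = rank_by_similarity(seed_embedding, candidate_embeddings)
```

Here `company_a` points in nearly the same direction as the seed, so it ranks first, while the orthogonal `company_b` ranks last.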

However, a graph is still the most natural choice for representing and learning diverse company relations, owing to its ability to model complex relationships among a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). This KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns that facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.
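
To make the node-and-edge representation concrete, here is a minimal sketch of an undirected company graph with typed edges. The class name `CompanyGraph`, the company identifiers, and the relation labels are all hypothetical; the actual CompanyKG edge types and storage format are defined in the project's repository, not here.

```python
from collections import defaultdict


class CompanyGraph:
    """Undirected graph: companies are nodes, typed relations are edges."""

    def __init__(self):
        # node -> {neighbor: set of relation labels on that edge}
        self._adj = defaultdict(dict)

    def add_edge(self, a, b, relation):
        """Record a relation between two companies in both directions."""
        self._adj[a].setdefault(b, set()).add(relation)
        self._adj[b].setdefault(a, set()).add(relation)

    def neighbors(self, node):
        """Companies directly connected to the given company."""
        return set(self._adj.get(node, {}))

    def relations(self, a, b):
        """All relation labels on the edge between a and b (empty if none)."""
        return set(self._adj.get(a, {}).get(b, set()))

    def common_neighbors(self, a, b):
        """Companies connected to both a and b, a simple similarity signal."""
        return self.neighbors(a) & self.neighbors(b)


# Hypothetical companies and relation labels.
kg = CompanyGraph()
kg.add_edge("c1", "c2", "same_sector")
kg.add_edge("c1", "c2", "news_co_mention")
kg.add_edge("c2", "c3", "same_sector")
kg.add_edge("c1", "c3", "shared_keyword")
kg.add_edge("c3", "c4", "same_sector")

shared = kg.common_neighbors("c1", "c4")
```

Counting common neighbors is only one simple structural signal. GNN-based methods learn richer representations from the same node and edge structure.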

Source Code and Tutorial:
https://github.com/llcresearch/CompanyKG2

Paper: to be published
Data from: Investigating the association between social interactions and...
zenodo.org
data.niaid.nih.gov
bin, csv, txt
Updated May 31, 2022
Cite
Didem Gundogdu; Ailbhe N. Finnerty; Jacopo Staiano; Stefano Teso; Andrea Passerini; Fabio Pianesi; Bruno Lepri (2022). Data from: Investigating the association between social interactions and personality states dynamics [Dataset]. http://doi.org/10.5061/dryad.b88c7
Available download formats
txt, csv, bin
Unique identifier
https://doi.org/10.5061/dryad.b88c7
Dataset updated
May 31, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Didem Gundogdu; Ailbhe N. Finnerty; Jacopo Staiano; Stefano Teso; Andrea Passerini; Fabio Pianesi; Bruno Lepri
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The recent personality psychology literature has coined the term personality states to refer to states having the same behavioural, affective and cognitive content (described by adjectives) as the corresponding trait, but for a shorter duration. The variability in personality states may be a reaction to specific characteristics of situations. The aim of our study is to investigate whether specific situational factors, that is, different configurations of face-to-face interactions, are predictors of variability of personality states in a work environment. The obtained results provide evidence that within-person variability in personality is associated with variation in face-to-face interactions. Interestingly, the effects differ by type and level of the personality states: adaptation effects for Agreeableness and Emotional Stability, whereby the personality states of an individual trigger similar states in other people interacting with them, and complementarity effects for Openness to Experience, whereby the personality states of an individual trigger opposite states in other people interacting with them. Overall, these findings encourage further research to characterize face-to-face and social interactions in terms of their relevance to personality states.
Data from: The relative importance of ski resort- and weather-related...
data.mendeley.com
narcis.nl
Updated Jun 8, 2021
Cite
Erik Haugom (2021). The relative importance of ski resort- and weather-related characteristics when going alpine skiing: data from a rating-based conjoint survey [Dataset]. http://doi.org/10.17632/6w4tzrs3yw.1
Unique identifier
https://doi.org/10.17632/6w4tzrs3yw.1
Dataset updated
Jun 8, 2021
Authors
Erik Haugom
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Alps
Description
The data are related to two research articles: “The relative importance of ski resort- and weather-related characteristics when going alpine skiing” [1] and “Optimal pricing of alpine ski passes in the case of crowdedness and reduced skiing capacity” [2]. The data were collected in February 2018 through a rating-based conjoint survey experiment on active alpine skiers at a large ski area in Inland Norway, and pertain to 400 respondents who provided more than 7,200 ratings. A total of ten versions of the same questionnaire type were used to obtain information about preferences on ski resort- and weather-related characteristics when going alpine skiing. We display the raw data organized such that they can be easily downloaded and used directly to either (1) replicate the analyses performed in the related research articles, or (2) run one’s own analyses on the topic of interest. The data may also be useful to lecturers teaching students about the key concepts of survey experiments and causal modelling.
Big family, warm home, and lots of friends: Pteronotus large
datadryad.org
data.niaid.nih.gov
zip
Updated Feb 6, 2023
Cite
Jennifer Barros; Enrico Bernard (2023). Big family, warm home, and lots of friends: Pteronotus large [Dataset]. http://doi.org/10.5061/dryad.wm37pvms1
Available download formats
zip
Unique identifier
https://doi.org/10.5061/dryad.wm37pvms1
Dataset updated
Feb 6, 2023
Dataset provided by
Dryad
Authors
Jennifer Barros; Enrico Bernard
Time period covered
2023
Description
We used presence/absence data for the bat species to analyze, with the mvabund package, their relationship with the characteristics of the caves. The data are organized in a spreadsheet in which each column contains the values for one cave feature, followed by the presence/absence data for each species. The file “mvabund” was used as input to the R script described below.


