Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by yvonne gatwiri
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, the most popular imputation methods generally require scripting skills and are implemented across various packages with differing syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often treated as a separate exercise from exploratory data analysis, but it should be considered part of the data exploration process. We have created a new graphical tool, ImputEHR, that is Python-based and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.
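As a rough illustration of the kind of gradient-boosted, tree-based imputation the abstract describes (not the ImputEHR tool itself, which is a graphical application), the following sketch uses scikit-learn's IterativeImputer with a gradient-boosted estimator; the toy table and its column names are hypothetical:

import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must be imported before IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import HistGradientBoostingRegressor

# Toy EHR-like table with missing values (hypothetical columns)
ehr = pd.DataFrame({
    "age":   [63, 47, np.nan, 71, 55],
    "sbp":   [142, np.nan, 128, 150, np.nan],
    "hba1c": [7.1, 5.8, 6.4, np.nan, 6.0],
})

# Each column with missing values is modeled from the others using a
# gradient-boosted tree regressor, iterating until the estimates stabilize.
imputer = IterativeImputer(estimator=HistGradientBoostingRegressor(), max_iter=10, random_state=0)
completed = pd.DataFrame(imputer.fit_transform(ehr), columns=ehr.columns)
print(completed)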
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please also see the latest version of the repository.
The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics data sets. Moreover, intersection of highly rich and complex datasets from different sources provided as flat csv files requires advanced informatics skills, which is time-consuming and not accessible to all. Here, we provide a "user manual" to our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV) (link), a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published study, "Systematic analysis of 200 YFP traps reveals common discordance between mRNA and protein across the nervous system" (eprint link). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We provide an explanation, with specific examples, of how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The explosion in biological data generation challenges the available technologies and methodologies for data interrogation. Moreover, highly rich and complex datasets together with diverse linked data are difficult to explore when provided in flat files. Here we provide a way to systematically filter and analyse a dataset with more than 18 thousand data points using Zegami, a solution for interactive data visualisation and exploration. The primary data we use are derived from "A systematic analysis of 200 YFP gene traps reveals common discordance between mRNA and protein across the nervous system", which is submitted elsewhere. This manual provides the raw image data together with annotations and associated data, and explains how to use Zegami to explore all these data types together by providing specific examples. We also provide the open source python code used to annotate the figures.
https://www.mordorintelligence.com/privacy-policy
Big Data in the oil and gas exploration and production market is segmented by Product (Hardware, Software, and Services) and Geography (North America, Europe, Asia-Pacific, South America, and the Middle-East and Africa).
This data set contains example data for exploration of the theory of regression based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II data base in order to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.
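A minimal sketch of the regression step, assuming hypothetical file and column names rather than the actual example scripts distributed with the data, could look like this in Python with statsmodels:

import pandas as pd
import statsmodels.api as sm

# Hypothetical table: one row per streamgage, with the response and
# GAGES-II-style basin characteristics as columns
gages = pd.read_csv("example_gages.csv")

# q90_annual_max: 90th percentile of annual maximum streamflow (response)
y = gages["q90_annual_max"]
# Hypothetical explanatory variables (drainage area, precipitation, elevation)
X = sm.add_constant(gages[["drain_area_km2", "precip_mm", "elev_m"]])

model = sm.OLS(y, X).fit()
print(model.summary())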
https://www.kaggle.com/tpmeli/missing-data-exploration-mean-iterative-more
These data contain the results of GC-MS, LC-MS and immunochemistry analyses of mask sample extracts. The data include tentatively identified compounds through library searches and compound abundance. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: the data cannot be accessed publicly. Format: The dataset contains the identification of compounds found in the mask samples as well as the abundance of those compounds for individuals who participated in the trial. This dataset is associated with the following publication: Pleil, J., M. Wallace, J. McCord, M. Madden, J. Sobus, and G. Ferguson. How do cancer-sniffing dogs sort biological samples? Exploring case-control samples with non-targeted LC-Orbitrap, GC-MS, and immunochemistry methods. Journal of Breath Research. Institute of Physics Publishing, Bristol, UK, 14(1): 016006, (2019).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The values of betweenness, closeness, and eigenvector centrality for one particular subset within the analyzed medical curriculum.
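For orientation, these three measures can be computed with NetworkX as in the sketch below; the toy edge list is purely illustrative and is not the curriculum graph behind this dataset:

import networkx as nx

# Hypothetical toy graph standing in for a curriculum subset
G = nx.Graph([("anatomy", "physiology"),
              ("physiology", "pharmacology"),
              ("pharmacology", "clinical_skills"),
              ("anatomy", "clinical_skills")])

betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)
eigenvector = nx.eigenvector_centrality(G)

for node in G.nodes:
    print(node, betweenness[node], closeness[node], eigenvector[node])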
These interview data are part of the project "Looking for data: information seeking behaviour of survey data users", a study of secondary data users' information-seeking behaviour. The overall goal of this study was to create evidence of actual information practices of users of one particular retrieval system for social science data in order to inform the development of research data infrastructures that facilitate data sharing. In the project, data were collected based on a mixed methods design. The research design included a qualitative study in the form of expert interviews and – building on the results found therein – a quantitative web survey of secondary survey data users.

For the qualitative study, expert interviews with six reference persons of a large social science data archive were conducted. They were interviewed in their role as intermediaries who provide guidance for secondary users of survey data. The knowledge from their reference work was expected to provide a condensed view of goals, practices, and problems of people who are looking for survey data. The anonymized transcripts of these interviews are provided here. They can be reviewed or reused upon request. The survey dataset from the quantitative study of secondary survey data users is downloadable through this data archive after registration.

The core result of the Looking for data study is that community involvement plays a pivotal role in survey data seeking. The analyses show that survey data communities are an important determinant in survey data users' information seeking behaviour and that community involvement facilitates data seeking and has the capacity to reduce problems or barriers.

The qualitative part of the study was designed and conducted using constructivist grounded theory methodology as introduced by Kathy Charmaz (2014). In line with grounded theory methodology, the interviews did not follow a fixed set of questions, but were conducted based on a guide that included areas of exploration with tentative questions. This interview guide can be obtained together with the transcript. For the Looking for data project, the data were coded and scrutinized by constant comparison, as proposed by grounded theory methodology. This analysis resulted in core categories that make up the "theory of problem-solving by community involvement". This theory was exemplified in the quantitative part of the study. For this exemplification, the following hypotheses were drawn from the qualitative study:
(1) The data seeking hypotheses: (1a) When looking for data, information seeking through personal contact is used more often than impersonal ways of information seeking. (1b) Ways of information seeking (personal or impersonal) differ with experience.
(2) The experience hypotheses: (2a) Experience is positively correlated with having ambitious goals. (2b) Experience is positively correlated with having more advanced requirements for data. (2c) Experience is positively correlated with having more specific problems with data.
(3) The community involvement hypothesis: Experience is positively correlated with community involvement.
(4) The problem solving hypothesis: Community involvement is positively correlated with problem solving strategies that require personal interactions.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
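A small sketch of mapping a KernelVersions id to its location in this layout is shown below; whether folder names are zero-padded and which file extension a given version uses (.py, .r, .ipynb) are assumptions, so adjust as needed:

def kernel_version_path(kernel_version_id: int, extension: str = "ipynb") -> str:
    # Top-level folder groups ids by millions, e.g. 123 for 123,456,789
    top = kernel_version_id // 1_000_000
    # Sub-folder groups ids by thousands, e.g. 456 for 123,456,789
    sub = (kernel_version_id // 1_000) % 1_000
    return f"{top}/{sub}/{kernel_version_id}.{extension}"

print(kernel_version_path(123_456_789))  # -> 123/456/123456789.ipynb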
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
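As a sketch of downloading a single notebook from the requester-pays bucket with the google-cloud-storage Python client, where the billing project id and the object path are placeholders you must replace:

from google.cloud import storage

client = storage.Client(project="your-billing-project")
# user_project is the project billed for requester-pays downloads
bucket = client.bucket("kaggle-meta-kaggle-code-downloads", user_project="your-billing-project")
blob = bucket.blob("123/456/123456789.ipynb")  # placeholder object path
blob.download_to_filename("123456789.ipynb")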
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
Data Visualization Tools Market Size 2025-2029
The data visualization tools market size is forecast to increase by USD 7.95 billion at a CAGR of 11.2% between 2024 and 2029.
The market is experiencing significant growth, driven by the increasing demand for business intelligence and AI-powered insights. With the rising complexity and voluminous data being generated across industries, there is a pressing need for effective data visualization tools to make data-driven decisions. This trend is particularly prominent in sectors such as healthcare, finance, and retail, where large datasets are common. Moreover, the automation of data visualization is another key driver, enabling organizations to save time and resources by streamlining the data analysis process. However, challenges such as data security concerns, lack of standardization, and integration issues persist, necessitating continuous innovation and investment in advanced technologies. Companies seeking to capitalize on this market opportunity must focus on addressing these challenges through user-friendly interfaces, security features, and seamless integration capabilities. Additionally, partnerships and collaborations with industry leaders and emerging technologies, such as machine learning and artificial intelligence, can provide a competitive edge in this rapidly evolving market.
What will be the Size of the Data Visualization Tools Market during the forecast period?
The market is experiencing growth, driven by the increasing demand for intuitive and interactive ways to analyze complex data. The market encompasses a range of solutions, including visual analytics tools and cloud-based services. The services segment, which includes integration services, is also gaining traction due to the growing need for customized and comprehensive data visualization solutions. Small and medium-sized enterprises (SMEs) are increasingly adopting these tools to gain insights into customer behavior and enhance decision-making. Cloud-based data visualization tools are becoming increasingly popular due to their flexibility, scalability, and cost-effectiveness. Security remains a key concern, with data security features becoming a priority for companies. Additionally, the integration of advanced technologies such as artificial intelligence (AI), machine learning (ML), augmented reality (AR), and virtual reality (VR) is transforming the market, enabling more immersive and interactive data exploration experiences. Overall, the market is poised for continued expansion, offering significant opportunities for businesses seeking to gain a competitive edge through data-driven insights.
How is this Data Visualization Tools Industry segmented?
The data visualization tools industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Deployment: On-premises, Cloud
Customer Type: Large enterprises, SMEs
Component: Software, Services
Application: Human resources, Finance, Others
End-user: BFSI, IT and telecommunication, Healthcare, Retail, Others
Geography: North America (US, Canada), Europe (France, Germany, Italy, UK), APAC (China, India, Japan), South America (Brazil), Middle East and Africa
By Deployment Insights
The on-premises segment is estimated to witness significant growth during the forecast period. The market has experienced substantial growth due to the increasing demand for data-driven insights in businesses. On-premises deployment of these tools allows organizations to maintain control over their data, ensuring data security, privacy, and adherence to regulatory requirements. This deployment model is well suited to enterprises dealing with sensitive information, as it avoids transmitting data to cloud-based solutions. Cloud-based solutions, in contrast, offer real-time data analysis, innovative solutions, integration services, customized dashboards, and mobile access. Advanced technologies such as artificial intelligence (AI), machine learning (ML), augmented reality (AR), virtual reality (VR), and business intelligence (BI) are integrated into these tools to provide strategic insights from unstructured data. Data collection, maintenance, sharing, and analysis are simplified, enabling businesses to make informed decisions based on customer behavior and preferences. Key players in this market provide professional expertise and resources for data scientists and programmers using various programming languages.
The On-premises segment was valued at USD 4.15 billion in 2019 and is expected to increase gradually over the forecast period.
Regional Analysis
North America is estimated to contribute 31% to the growth of the global market during the forecast period.
Australian mineral exploration is at a 20-year low in real terms. After doubling in line with global exploration activity during the 1990s, exploration expenditure peaked in 1996/97 and then fell sharply. The current decline differs from previous downturns in exploration that have occurred as part of the economic cycle, as it is accompanied by major structural changes in the industry. Forces resulting in these changes are strongly inter-related and include:
• cost cutting to stay competitive in the face of low (declining) commodity prices
• demand for greater return on shareholder investment
• consolidation in response to globalisation
• intense competition for risk capital (particularly for junior companies) from new sources
• loss of confidence in exploration as an economic activity following declining rates of discovery and land access issues.
These factors have changed and continue to change the face of the industry.
Published in the Australasian Institute of Mining and Metallurgy Bulletin No. 1 Jan/Feb 2002, 45-52.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset comprises metadata for the 225,819 train files from the Google Research - Identify Contrails to Reduce Global Warming challenge.
The file sizes were obtained by using a simple bash script:
# Enable recursive globbing, include dotfiles, and expand to nothing when there is no match
shopt -s globstar dotglob nullglob
for pathname in train/**/*; do
    # Keep regular files only; skip directories and symlinks
    if [[ -f $pathname ]] && [[ ! -h $pathname ]]; then
        # Print size in bytes and path, tab-separated
        stat -c $'%s\t%n' "$pathname"
    fi
done > train_file_sizes.csv
After the bash script, the file was preprocessed with the following Python code:
import pandas as pd

# Read the whitespace/tab-separated listing produced by the bash script
train_sizes = pd.read_csv('data/train_file_sizes.csv', delim_whitespace=True, names=['file_size', 'file_path'])
# The record id is the second path component (train/<record_id>/...)
train_sizes['record_id'] = train_sizes.file_path.str.split('/', expand=True)[1].astype(int)
train_sizes.to_csv('data/train_file_sizes.csv', index=False)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistics for ecologists using R and Excel: data collection, exploration, analysis and presentation is a book. It was written by Mark Gardener and published by Pelagic in 2012.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Introduction to Primate Data Exploration and Linear Modeling with R was created with the goal of providing training to undergraduate biology research students on data management and statistical analysis using authentic data of Cayo Santiago rhesus macaques.
https://www.archivemarketresearch.com/privacy-policy
The global Exploration Software market is projected to reach $230.5 million by 2033, expanding at a CAGR of 6.9% from 2025 to 2033. The increasing demand for efficient and cost-effective exploration solutions, coupled with the growing adoption of digital technologies in the oil and gas industry, is driving market growth. The market is segmented based on type (cloud-based and web-based) and application (large enterprises and SMEs). Key market players include Schlumberger, Sintef, Petrel E&P, Quorum, geoSCOUT, Exprodat, and others. The market is primarily driven by the rising need for accurate and real-time data in exploration activities. Exploration software provides comprehensive data analysis, visualization, and modeling capabilities, enabling geologists and engineers to make informed decisions. The adoption of cloud-based solutions is further fueling market growth, as it offers flexibility, scalability, and cost-effectiveness. However, factors such as data security concerns and the availability of skilled professionals may restrain market growth to some extent. Geographically, North America and Europe are expected to be major contributors to the market, while Asia Pacific is projected to witness significant growth potential in the coming years.
This submission contains an update to the previous Exploration Gap Assessment funded in 2012, which identified high-potential hydrothermal areas where critical data are needed (a gap analysis on exploration data). The uploaded data are contained in two data files for each data category: a shape (SHP) file containing the grid, and a data file (CSV) containing the individual layers that intersected with the grid. This CSV can be joined with the map to retrieve a list of datasets that are available at any given site. A grid of the contiguous U.S. was created with 88,000 10-km by 10-km grid cells, and each cell was populated with the status of data availability corresponding to five data types: well data, geologic maps, fault maps, geochemistry data, and geophysical data.
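A minimal sketch of the join described above, using geopandas and assuming hypothetical file names and a hypothetical "cell_id" join key (the actual column names in the submission may differ):

import geopandas as gpd
import pandas as pd

grid = gpd.read_file("exploration_grid.shp")    # 10-km by 10-km cells
layers = pd.read_csv("exploration_layers.csv")  # per-cell data availability layers

# Attach the layer records to the grid cells so availability can be mapped
joined = grid.merge(layers, on="cell_id", how="left")

# List the datasets recorded as available for one grid cell
print(joined.loc[joined["cell_id"] == 12345, "dataset_name"])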
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DEEPEN stands for DE-risking Exploration of geothermal Plays in magmatic ENvironments.
As part of the development of the DEEPEN 3D play fairway analysis (PFA) methodology for magmatic plays (conventional hydrothermal, superhot EGS, and supercritical), weights needed to be developed for use in the weighted sum of the different favorability index models produced from geoscientific exploration datasets. This was done using two different approaches: one based on expert opinions, and one based on statistical learning. This GDR submission includes the datasets used to produce the statistical learning-based weights.
While expert opinions allow us to include more nuanced information in the weights, expert opinions are subject to human bias. Data-centric or statistical approaches help to overcome these potential human biases by focusing on and drawing conclusions from the data alone. The drawback is that, to apply these types of approaches, a dataset is needed. Therefore, we attempted to build comprehensive standardized datasets mapping anomalies in each exploration dataset to each component of each play. This data was gathered through a literature review focused on magmatic hydrothermal plays along with well-characterized areas where superhot or supercritical conditions are thought to exist. Datasets were assembled for all three play types, but the hydrothermal dataset is the least complete due to its relatively low priority.
For each known or assumed resource, the dataset states what anomaly in each exploration dataset is associated with each component of the system. The data are only semi-quantitative, where values are either high, medium, or low, relative to background levels. In addition, the dataset has significant gaps, as not every possible exploration dataset has been collected and analyzed at every known or suspected geothermal resource area, in the context of all possible play types. The following training sites were used to assemble this dataset:
- Conventional magmatic hydrothermal: Akutan (from AK PFA), Oregon Cascades PFA, Glass Buttes OR, Mauna Kea (from HI PFA), Lanai (from HI PFA), Mt St Helens Shear Zone (from WA PFA), Wind River Valley (from WA PFA), Mount Baker (from WA PFA).
- Superhot EGS: Newberry (EGS demonstration project), Coso (EGS demonstration project), Geysers (EGS demonstration project), Eastern Snake River Plain (EGS demonstration project), Utah FORGE, Larderello, Kakkonda, Taupo Volcanic Zone, Acoculco, Krafla.
- Supercritical: Coso, Geysers, Salton Sea, Larderello, Los Humeros, Taupo Volcanic Zone, Krafla, Reykjanes, Hengill.
Disclaimer: Treat the supercritical fluid anomalies with skepticism. They are based on assumptions due to the general lack of confirmed supercritical fluid encounters and samples at the sites included in this dataset at the time of assembling the dataset. The main assumption was that the supercritical fluid in a given geothermal system has shared properties with the hydrothermal fluid, which may not be the case in reality.
Once the datasets were assembled, principal component analysis (PCA) was applied to each. PCA is an unsupervised statistical learning technique (labels on the data are not required) that summarizes the directions of variance in the data. This approach was chosen because our labels are not certain, i.e., we do not know with 100% confidence that superhot resources exist at all the assumed positive areas. We also do not have data for any known non-geothermal areas, meaning that it would be challenging to apply a supervised learning technique. To generate weights from the PCA, an analysis of the PCA loading values was conducted. PCA loading values represent how much a feature contributes to each principal component, and therefore to the overall variance in the data.
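As a rough sketch of deriving per-feature weights from PCA loadings with scikit-learn: the aggregation below (absolute loadings weighted by each component's explained variance, then normalized) is one reasonable choice and not necessarily the exact DEEPEN scheme, and the data and feature names are stand-ins:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in matrix: rows = training sites, columns = exploration datasets
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
feature_names = ["resistivity", "seismicity", "heat_flow", "gravity", "gas_flux"]

Z = StandardScaler().fit_transform(X)
pca = PCA().fit(Z)

# loadings[i, j] is the contribution of feature j to principal component i
loadings = pca.components_
weights = np.abs(loadings).T @ pca.explained_variance_ratio_
weights /= weights.sum()  # normalize so the weights sum to 1

for name, w in zip(feature_names, weights):
    print(f"{name}: {w:.3f}")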
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This submission contains raster files associated with several datasets that include earthquake density, Na/K geothermometers, fault density, heat flow, and gravity. Integrated together using spatial modeler tools in ArcGIS, these files can be used for play fairway analysis in regard to geothermal exploration.
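For readers working outside ArcGIS, a weighted-sum combination of co-registered rasters can be sketched with rasterio as below; the file names and weights are placeholders, and the rasters are assumed to share the same grid, extent, and nodata handling:

import rasterio

# Placeholder layer names and weights for the weighted-sum favorability surface
layers = {"earthquake_density.tif": 0.25,
          "heat_flow.tif": 0.35,
          "fault_density.tif": 0.20,
          "gravity.tif": 0.20}

favorability = None
profile = None
for path, weight in layers.items():
    with rasterio.open(path) as src:
        band = src.read(1).astype("float32")
        profile = profile or src.profile
        favorability = band * weight if favorability is None else favorability + band * weight

profile.update(dtype="float32", count=1)
with rasterio.open("favorability.tif", "w", **profile) as dst:
    dst.write(favorability, 1)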