91 datasets found
  1. house prices data exploration

    • kaggle.com
    zip
    Updated Sep 13, 2024
    Cite
    yvonne gatwiri (2024). house prices data exploration [Dataset]. https://www.kaggle.com/datasets/yvonnegatwiri/house-prices-data-exploration/suggestions
    Explore at:
    Available download formats: zip (165334 bytes)
    Dataset updated
    Sep 13, 2024
    Authors
    yvonne gatwiri
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by yvonne gatwiri

    Released under Apache 2.0

    Contents

  2. Data_Sheet_1_ImputEHR: A Visualization Tool of Imputation for the Prediction...

    • figshare.com
    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Yi-Hui Zhou; Ehsan Saghapour (2023). Data_Sheet_1_ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.691274.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Yi-Hui Zhou; Ehsan Saghapour
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, use of the most popular imputation methods mainly requires scripting skills, and the methods are implemented using various packages and syntax. Thus, the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often considered as a separate exercise from exploratory data analysis, but should be considered as part of the data exploration process. We have created a new graphical tool, ImputEHR, that is built on Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.
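
    As a rough illustration of the simplest kind of imputation the description refers to, the sketch below fills missing values in one column with the mean of the observed values. It is not ImputEHR's implementation; the function name `impute_mean`, the use of `None` as the missing-value marker, and the sample readings are all assumptions made for this example.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        # With nothing observed there is no mean to fall back on.
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical heart-rate readings with two missing entries.
heart_rate = [72, None, 80, 76, None]
print(impute_mean(heart_rate))
```

    Mean imputation keeps every record but shrinks the column's variance, which is one reason more sophisticated approaches may be preferable.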

  3. Multi-Dimensional Data Viewer (MDV) user manual for data exploration:...

    • zenodo.org
    pdf, zip
    Updated Jul 12, 2024
    + more versions
    Cite
    Maria Kiourlappou; Martin Sergeant; Joshua S. Titlow; Jeffrey Y. Lee; Darragh Ennis; Stephen Taylor; Ilan Davis (2024). Multi-Dimensional Data Viewer (MDV) user manual for data exploration: "Systematic analysis of YFP traps reveals common discordance between mRNA and protein across the nervous system" [Dataset]. http://doi.org/10.5281/zenodo.7875495
    Explore at:
    Available download formats: zip, pdf
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Maria Kiourlappou; Martin Sergeant; Joshua S. Titlow; Jeffrey Y. Lee; Darragh Ennis; Stephen Taylor; Ilan Davis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please also see the latest version of the repository:
    https://doi.org/10.5281/zenodo.6374011 and
    our website: https://ilandavis.com/jcb2023-yfp

    The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics data sets. Moreover, intersecting highly rich and complex datasets from different sources provided as flat csv files requires advanced informatics skills, which is time consuming and not accessible to all. Here, we provide a “user manual” to our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV), a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published systematic analysis of 200 YFP traps, which reveals common discordance between mRNA and protein across the nervous system. This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We provide an explanation, with specific examples, of how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source Python code used to annotate the figures, which could be adapted to any other kind of data annotation task.

  4. Zegami user manual for data exploration: "Systematic analysis of YFP gene...

    • zenodo.org
    pdf, zip
    Updated Jul 17, 2024
    + more versions
    Cite
    Maria Kiourlappou; Stephen Taylor; Ilan Davis (2024). Zegami user manual for data exploration: "Systematic analysis of YFP gene traps reveals common discordance between mRNA and protein across the nervous system" [Dataset]. http://doi.org/10.5281/zenodo.6374012
    Explore at:
    Available download formats: pdf, zip
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Maria Kiourlappou; Stephen Taylor; Ilan Davis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The explosion in biological data generation challenges the available technologies and methodologies for data interrogation. Moreover, highly rich and complex datasets together with diverse linked data are difficult to explore when provided in flat files. Here we provide a way to systematically filter and analyse a dataset with more than 18 thousand data points using Zegami, a solution for interactive data visualisation and exploration. The primary data we use are derived from a systematic analysis of 200 YFP gene traps, which reveals common discordance between mRNA and protein across the nervous system and is submitted elsewhere. This manual provides the raw image data together with annotations and associated data, and explains with specific examples how to use Zegami to explore all these data types together. We also provide the open-source Python code used to annotate the figures.

  5. Big Data in Oil and Gas Exploration & Production Market Size & Share...

    • mordorintelligence.com
    pdf,excel,csv,ppt
    Updated Sep 4, 2015
    Cite
    Mordor Intelligence (2015). Big Data in Oil and Gas Exploration & Production Market Size & Share Analysis - Industry Research Report - Growth Trends [Dataset]. https://www.mordorintelligence.com/industry-reports/big-data-in-oil-and-gas-exploration-and-production-market
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Sep 4, 2015
    Dataset authored and provided by
    Mordor Intelligence
    License

    https://www.mordorintelligence.com/privacy-policy

    Time period covered
    2020 - 2030
    Area covered
    Global
    Description

    Big Data in the oil and gas exploration and production market is segmented by Product (Hardware, Software, and Services) and Geography (North America, Europe, Asia-Pacific, South America, and the Middle-East and Africa).

  6. An example data set for exploration of Multiple Linear Regression

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). An example data set for exploration of Multiple Linear Regression [Dataset]. https://catalog.data.gov/dataset/an-example-data-set-for-exploration-of-multiple-linear-regression
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Description

    This data set contains example data for exploration of the theory of regression based regionalization. The 90th percentile of annual maximum streamflow is provided as an example response variable for 293 streamgages in the conterminous United States. Several explanatory variables are drawn from the GAGES-II data base in order to demonstrate how multiple linear regression is applied. Example scripts demonstrate how to collect the original streamflow data provided and how to recreate the figures from the associated Techniques and Methods chapter.

  7. Iterative Imputation of Jane St train.csv

    • kaggle.com
    Updated Nov 29, 2020
    Cite
    tpmeli (2020). Iterative Imputation of Jane St train.csv [Dataset]. https://www.kaggle.com/tpmeli/iterative-imputation-of-jane-st-traincsv/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 29, 2020
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    tpmeli
    Description

    I will be sharing all of my missing data exploration here:

    https://www.kaggle.com/tpmeli/missing-data-exploration-mean-iterative-more

  8. Dataset for Exploring case-control samples with non-targeted analysis

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Dataset for Exploring case-control samples with non-targeted analysis [Dataset]. https://catalog.data.gov/dataset/dataset-for-exploring-case-control-samples-with-non-targeted-analysis
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Description

    These data contain the results of GC-MS, LC-MS and immunochemistry analyses of mask sample extracts. The data include tentatively identified compounds through library searches and compound abundance. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The data can not be accessed. Format: The dataset contains the identification of compounds found in the mask samples as well as the abundance of those compounds for individuals who participated in the trial. This dataset is associated with the following publication: Pleil, J., M. Wallace, J. McCord, M. Madden, J. Sobus, and G. Ferguson. How do cancer-sniffing dogs sort biological samples? Exploring case-control samples with non-targeted LC-Orbitrap, GC-MS, and immunochemistry methods. Journal of Breath Research. Institute of Physics Publishing, Bristol, UK, 14(1): 016006, (2019).

  9. The values of betweenness, closeness, and Eigenvector centrality for one...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Martin Komenda; Martin Víta; Christos Vaitsis; Daniel Schwarz; Andrea Pokorná; Nabil Zary; Ladislav Dušek (2023). The values of betweenness, closeness, and Eigenvector centrality for one particular subset within the analyzed medical curriculum. [Dataset]. http://doi.org/10.1371/journal.pone.0143748.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Martin Komenda; Martin Víta; Christos Vaitsis; Daniel Schwarz; Andrea Pokorná; Nabil Zary; Ladislav Dušek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The values of betweenness, closeness, and Eigenvector centrality for one particular subset within the analyzed medical curriculum.

  10. Looking for data (Expert interviews)

    • datacatalogue.cessda.eu
    • search.gesis.org
    • +1more
    Updated Mar 11, 2023
    + more versions
    Cite
    Friedrich, Tanja (2023). Looking for data (Expert interviews) [Dataset]. http://doi.org/10.7802/1.1943
    Explore at:
    Dataset updated
    Mar 11, 2023
    Dataset provided by
    GESIS - Leibniz-Institut für Sozialwissenschaften
    Authors
    Friedrich, Tanja
    Area covered
    Germany
    Measurement technique
    Personal interview
    Description

    These interview data are part of the project "Looking for data: information seeking behaviour of survey data users", a study of secondary data users’ information-seeking behaviour. The overall goal of this study was to create evidence of actual information practices of users of one particular retrieval system for social science data in order to inform the development of research data infrastructures that facilitate data sharing. In the project, data were collected based on a mixed methods design. The research design included a qualitative study in the form of expert interviews and – building on the results found therein – a quantitative web survey of secondary survey data users. For the qualitative study, expert interviews with six reference persons of a large social science data archive have been conducted. They were interviewed in their role as intermediaries who provide guidance for secondary users of survey data. The knowledge from their reference work was expected to provide a condensed view of goals, practices, and problems of people who are looking for survey data. The anonymized transcripts of these interviews are provided here. They can be reviewed or reused upon request. The survey dataset from the quantitative study of secondary survey data users is downloadable through this data archive after registration. The core result of the Looking for data study is that community involvement plays a pivotal role in survey data seeking. The analyses show that survey data communities are an important determinant in survey data users' information seeking behaviour and that community involvement facilitates data seeking and has the capacity of reducing problems or barriers. The qualitative part of the study was designed and conducted using constructivist grounded theory methodology as introduced by Kathy Charmaz (2014). 
    In line with grounded theory methodology, the interviews did not follow a fixed set of questions, but were conducted based on a guide that included areas of exploration with tentative questions. This interview guide can be obtained together with the transcript. For the Looking for data project, the data were coded and scrutinized by constant comparison, as proposed by grounded theory methodology. This analysis resulted in core categories that make up the "theory of problem-solving by community involvement". This theory was exemplified in the quantitative part of the study. For this exemplification, the following hypotheses were drawn from the qualitative study:

    (1) The data seeking hypotheses:
    (1a) When looking for data, information seeking through personal contact is used more often than impersonal ways of information seeking.
    (1b) Ways of information seeking (personal or impersonal) differ with experience.
    (2) The experience hypotheses:
    (2a) Experience is positively correlated with having ambitious goals.
    (2b) Experience is positively correlated with having more advanced requirements for data.
    (2c) Experience is positively correlated with having more specific problems with data.
    (3) The community involvement hypothesis: Experience is positively correlated with community involvement.
    (4) The problem solving hypothesis: Community involvement is positively correlated with problem solving strategies that require personal interactions.

  11. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Mar 20, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    Available download formats: zip (133186454988 bytes)
    Dataset updated
    Mar 20, 2025
    Dataset authored and provided by
    Kaggle: http://kaggle.com/
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files due to private and interactive sessions.
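
    The folder layout described here can be sketched as a small lookup helper. This is an illustration only: the function name `version_folder` is hypothetical, and the sketch assumes folder names are written without zero padding, which the description does not state. File names and extensions are left out because they are not specified here.

```python
from pathlib import PurePosixPath


def version_folder(version_id: int) -> PurePosixPath:
    """Return the two-level folder that would hold a given kernel version id."""
    if version_id < 0:
        raise ValueError("version ids are expected to be non-negative")
    top = version_id // 1_000_000        # millions bucket, e.g. 123
    sub = (version_id // 1_000) % 1_000  # thousands bucket inside it, e.g. 456
    # Assumption: folder names are plain integers with no zero padding.
    return PurePosixPath(str(top)) / str(sub)


print(version_folder(123_456_789))
```

    Every id from 123,456,000 to 123,456,999 maps to the same folder, matching the example ranges given in the description.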

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  12. Data Visualization Tools Market Analysis North America, Europe, APAC, South...

    • technavio.com
    Cite
    Technavio, Data Visualization Tools Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, UK, China, Japan, Canada, Germany, France, India, Brazil, Italy - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/data-visualization-tools-market-industry-analysis
    Explore at:
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United Kingdom, Germany, Europe, Japan, United States, Global
    Description

    Data Visualization Tools Market Size 2025-2029

    The data visualization tools market size is forecast to increase by USD 7.95 billion at a CAGR of 11.2% between 2024 and 2029.

    The market is experiencing significant growth, driven by the increasing demand for business intelligence and AI-powered insights. With the rising complexity and volume of data being generated across industries, there is a pressing need for effective data visualization tools to make data-driven decisions. This trend is particularly prominent in sectors such as healthcare, finance, and retail, where large datasets are common. Moreover, the automation of data visualization is another key driver, enabling organizations to save time and resources by streamlining the data analysis process. However, challenges such as data security concerns, lack of standardization, and integration issues persist, necessitating continuous innovation and investment in advanced technologies. Companies seeking to capitalize on this market opportunity must focus on addressing these challenges through user-friendly interfaces, security features, and seamless integration capabilities. Additionally, partnerships and collaborations with industry leaders and emerging technologies, such as machine learning and artificial intelligence, can provide a competitive edge in this rapidly evolving market.

    What will be the Size of the Data Visualization Tools Market during the forecast period?

    The market is experiencing growth, driven by the increasing demand for intuitive and interactive ways to analyze complex data. The market encompasses a range of solutions, including visual analytics tools and cloud-based services. The services segment, which includes integration services, is also gaining traction due to the growing need for customized and comprehensive data visualization solutions. Small and Medium-sized Enterprises (SMEs) are increasingly adopting these tools to gain insights into customer behavior and enhance decision-making. Cloud-based data visualization tools are becoming increasingly popular due to their flexibility, scalability, and cost-effectiveness. Security remains a key concern, with data security features becoming a priority for companies. Additionally, the integration of advanced technologies such as artificial intelligence (AI), machine learning (ML), augmented reality (AR), and virtual reality (VR) is transforming the market, enabling more interactive data exploration experiences. Overall, the market is poised for continued expansion, offering significant opportunities for businesses seeking to gain a competitive edge through data-driven insights.

    How is this Data Visualization Tools Industry segmented?

    The data visualization tools industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • Deployment: On-premises, Cloud
    • Customer Type: Large enterprises, SMEs
    • Component: Software, Services
    • Application: Human resources, Finance, Others
    • End-user: BFSI, IT and telecommunication, Healthcare, Retail, Others
    • Geography: North America (US, Canada), Europe (France, Germany, Italy, UK), APAC (China, India, Japan), South America (Brazil), Middle East and Africa

    By Deployment Insights

    The on-premises segment is estimated to witness significant growth during the forecast period. The market has experienced substantial growth due to the increasing demand for data-driven insights in businesses. On-premises deployment of these tools allows organizations to maintain control over their data, ensuring data security, privacy, and adherence to regulatory requirements. This deployment model is ideal for enterprises dealing with sensitive information, as it restricts data transmission to cloud-based solutions. In addition, cloud-based solutions offer real-time data analysis, innovative solutions, integration services, customized dashboards, and mobile access. Advanced technologies like artificial intelligence (AI), machine learning (ML), augmented reality (AR), virtual reality (VR), and business intelligence (BI) are integrated into these tools to provide strategic insights from unstructured data. Data collection, maintenance, sharing, and analysis are simplified, enabling businesses to make informed decisions based on customer behavior and preferences. Key players in this market include , , and others, providing professional expertise and resources for data scientists and programmers using various programming languages.

    The On-premises segment was valued at USD 4.15 billion in 2019 and showed a gradual increase during the forecast period.

    Regional Analysis

    North America is estimated to contribute 31% to the growth of the global market during the forecast period. Technavio’s an

  13. Data from: Australian Mineral Exploration: analysis and implications

    • ecat.ga.gov.au
    • data.wu.ac.at
    Updated Jan 1, 2002
    Cite
    Commonwealth of Australia (Geoscience Australia) (2002). Australian Mineral Exploration: analysis and implications [Dataset]. https://ecat.ga.gov.au/geonetwork/srv/api/records/a05f7892-db67-7506-e044-00144fdd4fa6
    Explore at:
    Available download formats: www:link-1.0-http--link
    Dataset updated
    Jan 1, 2002
    Dataset provided by
    Geoscience Australia: http://ga.gov.au/
    Area covered
    Asia
    Description

    Australian mineral exploration is at a 20 year low in real terms. After doubling in line with global exploration activity during the 1990s, exploration expenditure peaked in 1996/97 and then fell sharply. The current decline differs from previous downturns in exploration that have occurred as part of the economic cycle as it is accompanied by major structural changes in the industry. Forces resulting in these changes are strongly inter-related and include:

    • cost cutting to stay competitive in the face of low (declining) commodity prices
    • demand for greater return on shareholder investment
    • consolidation in response to globalisation
    • intense competition for risk capital (particularly for junior companies) from new sources
    • loss of confidence in exploration as an economic activity following declining rates of discovery and land access issues.

    These factors have changed and continue to change the face of the industry.

    Published in the Australasian Institute of Mining and Metallurgy Bulletin No. 1 Jan/Feb 2002, 45-52.

  14. Train file sizes Google Identify Contrails

    • kaggle.com
    Updated May 11, 2023
    Cite
    Sergey Saharovskiy (2023). Train file sizes Google Identify Contrails [Dataset]. https://www.kaggle.com/datasets/sergiosaharovskiy/train-file-sizes-google-identify-contrails/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 11, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Sergey Saharovskiy
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset comprises metadata for 225,819 train files from the Google Research - Identify Contrails to Reduce Global Warming challenge.

    The file sizes were collected with a simple bash script:

    shopt -s globstar dotglob nullglob
    
    for pathname in train/**/*; do
      if [[ -f $pathname ]] && [[ ! -h $pathname ]]; then
        stat -c $'%s\t%n' "$pathname"
      fi
    done >train_file_sizes.csv
    

    After the bash script, the file was preprocessed with the following python code:

    import pandas as pd

    # stat wrote tab-separated "size<TAB>path" rows, so split on tabs (safe for paths containing spaces)
    train_sizes = pd.read_csv('data/train_file_sizes.csv', sep='\t', names=['file_size', 'file_path'])
    train_sizes['record_id'] = train_sizes.file_path.str.split('/', expand=True)[1].astype(int)
    train_sizes.to_csv('data/train_file_sizes.csv', index=False)
    
  15. Statistics for ecologists using R and Excel : data collection,...

    • workwithdata.com
    Updated Sep 21, 2022
    Cite
    Work With Data (2022). Statistics for ecologists using R and Excel : data collection, exploration,.. [Dataset]. https://www.workwithdata.com/object/statistics-for-ecologists-using-r-and-excel-data-collection-exploration-analysis-and-presentation-book-by-mark-gardener-0000
    Explore at:
    Dataset updated
    Sep 21, 2022
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistics for ecologists using R and Excel : data collection, exploration, analysis and presentation is a book. It was written by Mark Gardener and published by Pelagic in 2012.

  16. Introduction to Primate Data Exploration and Linear Modeling with R

    • qubeshub.org
    Updated Jun 26, 2023
    Cite
    Raisa Hernández-Pacheco; Alexandra Bland; Alexis Diaz; Alexandra Rosati; Stephanie Gonzalez (2023). Introduction to Primate Data Exploration and Linear Modeling with R [Dataset]. http://doi.org/10.25334/T0ZY-PK40
    Dataset updated
    Jun 26, 2023
    Dataset provided by
    QUBES
    Authors
    Raisa Hernández-Pacheco; Alexandra Bland; Alexis Diaz; Alexandra Rosati; Stephanie Gonzalez
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Introduction to Primate Data Exploration and Linear Modeling with R was created with the goal of providing training to undergraduate biology research students on data management and statistical analysis using authentic data of Cayo Santiago rhesus macaques.

  17. Exploration Software Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 24, 2025
    Cite
    Archive Market Research (2025). Exploration Software Report [Dataset]. https://www.archivemarketresearch.com/reports/exploration-software-50857
    Available download formats: pdf, ppt, doc
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Exploration Software market is projected to reach $230.5 million by 2033, expanding at a CAGR of 6.9% from 2025 to 2033. The increasing demand for efficient and cost-effective exploration solutions, coupled with the growing adoption of digital technologies in the oil and gas industry, is driving market growth. The market is segmented based on type (cloud-based and web-based) and application (large enterprises and SMEs). Key market players include Schlumberger, Sintef, Petrel E&P, Quorum, geoSCOUT, Exprodat, and others. The market is primarily driven by the rising need for accurate and real-time data in exploration activities. Exploration software provides comprehensive data analysis, visualization, and modeling capabilities, enabling geologists and engineers to make informed decisions. The adoption of cloud-based solutions is further fueling market growth, as it offers flexibility, scalability, and cost-effectiveness. However, factors such as data security concerns and the availability of skilled professionals may restrain market growth to some extent. Geographically, North America and Europe are expected to be major contributors to the market, while Asia Pacific is projected to witness significant growth potential in the coming years.
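
    The projection above follows the usual compound annual growth rate relation, end = start * (1 + rate)^years. A minimal sketch of that arithmetic, assuming eight annual compounding periods between 2025 and 2033; the report does not state its base-year value or its compounding convention, and the function names are illustrative:

```python
def project_value(start_value, cagr, years):
    """Future value after compounding at a constant annual rate."""
    return start_value * (1.0 + cagr) ** years


def implied_start_value(end_value, cagr, years):
    """Invert the CAGR relation to recover the starting value."""
    return end_value / (1.0 + cagr) ** years


# Figures quoted in the description: $230.5 million by 2033 at a 6.9% CAGR.
# The 8-year span (2025 to 2033) is an assumption about the convention used.
base_2025 = implied_start_value(230.5, 0.069, 8)
print(f"Implied 2025 market size: ${base_2025:.1f} million")
```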

  18. Exploration Gap Assessment (FY13 Update)

    • catalog.data.gov
    Updated Jan 20, 2025
    Cite
    National Renewable Energy Laboratory (2025). Exploration Gap Assessment (FY13 Update) [Dataset]. https://catalog.data.gov/dataset/exploration-gap-assessment-fy13-update-07cc7
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    National Renewable Energy Laboratory
    Description

    This submission contains an update to the previous Exploration Gap Assessment funded in 2012, which identifies high-potential hydrothermal areas where critical data are needed (gap analysis on exploration data). The uploaded data are contained in two data files for each data category: a shapefile (SHP) containing the grid, and a data file (CSV) containing the individual layers that intersected with the grid. This CSV can be joined with the map to retrieve a list of datasets that are available at any given site. A grid of the contiguous U.S. was created with 88,000 10-km by 10-km grid cells, and each cell was populated with the status of data availability corresponding to five data types: well data, geologic maps, fault maps, geochemistry data, and geophysical data.
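
    The join described above can be sketched as a lookup from a grid cell to the data layers that intersect it. This is an illustration only: the column names `cell_id` and `layer`, the function name, and the sample rows are made up for the example and are not taken from the actual CSV files.

```python
import csv
import io
from collections import defaultdict


def layers_by_cell(csv_text, cell_column="cell_id", layer_column="layer"):
    """Group layer names by grid cell id (column names are hypothetical)."""
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        index[row[cell_column]].append(row[layer_column])
    return dict(index)


# Made-up sample standing in for one data-category CSV.
sample = "cell_id,layer\n101,layer_a\n101,layer_b\n102,layer_a\n"
print(layers_by_cell(sample)["101"])
```

    With the real files, the same cell identifier would also be the key used to attach the result to the grid polygons in the shapefile.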

  19. DEEPEN Global Standardized Categorical Exploration Datasets for Magmatic...

    • data.openei.org
    • gdr.openei.org
    • +3more
    data, website
    Updated Jun 30, 2023
    Cite
    Nicole Taverna; Nils Caliandro; Rachel King (2023). DEEPEN Global Standardized Categorical Exploration Datasets for Magmatic Plays [Dataset]. http://doi.org/10.15121/1995526
    Available download formats: website, data
    Dataset updated
    Jun 30, 2023
    Dataset provided by
    Open Energy Data Initiative (OEDI)
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
    National Renewable Energy Laboratory
    Authors
    Nicole Taverna; Nils Caliandro; Rachel King
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DEEPEN stands for DE-risking Exploration of geothermal Plays in magmatic ENvironments.

    As part of the development of the DEEPEN 3D play fairway analysis (PFA) methodology for magmatic plays (conventional hydrothermal, superhot EGS, and supercritical), weights needed to be developed for use in the weighted sum of the different favorability index models produced from geoscientific exploration datasets. This was done using two different approaches: one based on expert opinions, and one based on statistical learning. This GDR submission includes the datasets used to produce the statistical learning-based weights.
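
    The weighted sum mentioned above can be sketched as follows. The function name, the dictionary keys, and the normalization by total weight are assumptions made for illustration; the description does not say how the DEEPEN methodology scales its weights.

```python
def combined_favorability(index_values, weights):
    """Weighted sum of favorability index values, normalized by total weight.

    index_values and weights are dicts keyed by exploration dataset name.
    The keys and the normalization step are assumptions for illustration.
    """
    total_weight = sum(weights[name] for name in index_values)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted = sum(index_values[name] * weights[name] for name in index_values)
    return weighted / total_weight
```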

    While expert opinions allow us to include more nuanced information in the weights, expert opinions are subject to human bias. Data-centric or statistical approaches help to overcome these potential human biases by focusing on and drawing conclusions from the data alone. The drawback is that, to apply these types of approaches, a dataset is needed. Therefore, we attempted to build comprehensive standardized datasets mapping anomalies in each exploration dataset to each component of each play. This data was gathered through a literature review focused on magmatic hydrothermal plays along with well-characterized areas where superhot or supercritical conditions are thought to exist. Datasets were assembled for all three play types, but the hydrothermal dataset is the least complete due to its relatively low priority.

    For each known or assumed resource, the dataset states what anomaly in each exploration dataset is associated with each component of the system. The data is only semi-quantitative: values are either high, medium, or low, relative to background levels. In addition, the dataset has significant gaps, as not every possible exploration dataset has been collected and analyzed at every known or suspected geothermal resource area, in the context of all possible play types. The following training sites were used to assemble this dataset:

    - Conventional magmatic hydrothermal: Akutan (from AK PFA), Oregon Cascades PFA, Glass Buttes OR, Mauna Kea (from HI PFA), Lanai (from HI PFA), Mt St Helens Shear Zone (from WA PFA), Wind River Valley (from WA PFA), Mount Baker (from WA PFA).
    - Superhot EGS: Newberry (EGS demonstration project), Coso (EGS demonstration project), Geysers (EGS demonstration project), Eastern Snake River Plain (EGS demonstration project), Utah FORGE, Larderello, Kakkonda, Taupo Volcanic Zone, Acoculco, Krafla.
    - Supercritical: Coso, Geysers, Salton Sea, Larderello, Los Humeros, Taupo Volcanic Zone, Krafla, Reykjanes, Hengill.

    Disclaimer: Treat the supercritical fluid anomalies with skepticism. They are based on assumptions due to the general lack of confirmed supercritical fluid encounters and samples at the sites included in this dataset, at the time of assembling the dataset. The main assumption was that the supercritical fluid in a given geothermal system has shared properties with the hydrothermal fluid, which may not be the case in reality.

    Once the datasets were assembled, principal component analysis (PCA) was applied to each. PCA is an unsupervised statistical learning technique, meaning that labels are not required on the data, that summarizes the directions of variance in the data. This approach was chosen because our labels are not certain, i.e., we do not know with 100% confidence that superhot resources exist at all the assumed positive areas. We also do not have data for any known non-geothermal areas, meaning that it would be challenging to apply a supervised learning technique. In order to generate weights from the PCA, an analysis of the PCA loading values was conducted. PCA loading values represent how much a feature is contributing to each principal component, and therefore the overall variance in the data.
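
    One way such a loading analysis could look is sketched below in plain Python. It is not the DEEPEN procedure: the numeric encoding of low/medium/high, the use of the first principal component only, and the normalization of absolute loadings into weights are all assumptions made for illustration, as are the function names.

```python
import math
import random

LEVELS = {"low": 1.0, "medium": 2.0, "high": 3.0}  # assumed numeric encoding


def encode(rows):
    """Map categorical low/medium/high entries to numbers."""
    return [[LEVELS[value] for value in row] for row in rows]


def first_component(rows, iterations=500):
    """Approximate the first principal direction by power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the features.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    vec = [rng.random() + 0.1 for _ in range(m)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # no variance along the current direction
        vec = [x / norm for x in nxt]
    return vec


def loadings_to_weights(direction):
    """Normalize absolute loadings so they sum to one.

    Loadings equal the direction scaled by sqrt(eigenvalue); for a single
    component that scale factor cancels in the normalization.
    """
    total = sum(abs(x) for x in direction)
    return [abs(x) / total for x in direction]
```

    A feature that never varies across the training sites receives a weight of zero under this scheme, which matches the description of loadings as a measure of contribution to variance.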

  20. Geothermal Exploration Raster Files for Utah Play Fairway Analysis |...

    • gimi9.com
    Updated Jul 1, 2017
    Cite
    (2017). Geothermal Exploration Raster Files for Utah Play Fairway Analysis | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_geothermal-exploration-raster-files-for-utah-play-fairway-analysis
    Dataset updated
    Jul 1, 2017
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Utah
    Description

    This submission contains raster files associated with several datasets that include earthquake density, Na/K geothermometers, fault density, heat flow, and gravity. Integrated together using spatial modeler tools in ArcGIS, these files can be used for play fairway analysis in regard to geothermal exploration.

Cite
yvonne gatwiri (2024). house prices data exploration [Dataset]. https://www.kaggle.com/datasets/yvonnegatwiri/house-prices-data-exploration/suggestions
house prices data exploration

Exploratory Data Analysis (EDA)
