35 datasets found
  1. Graphite//LFP synthetic V vs. Q dataset (>700,000 unique curves)

    • data.mendeley.com
    • narcis.nl
    Updated Mar 12, 2021
    Cite
    Matthieu Dubarry (2021). Graphite//LFP synthetic V vs. Q dataset (>700,000 unique curves) [Dataset]. http://doi.org/10.17632/bs2j56pn7y.2
    Dataset updated
    Mar 12, 2021
    Authors
    Matthieu Dubarry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This training dataset was calculated using the mechanistic modeling approach. See "Big data training data for artificial intelligence-based Li-ion diagnosis and prognosis" (Journal of Power Sources, Volume 479, 15 December 2020, 228806) and "Analysis of Synthetic Voltage vs. Capacity Datasets for Big Data Diagnosis and Prognosis" (Energies, under review) for more details.

    The V vs. Q dataset was compiled with a resolution of 0.01 for the triplets and C/25 charges. This accounts for more than 5,000 different paths. Each path was simulated in increments of at most 0.85%. The training dataset therefore contains more than 700,000 unique voltage vs. capacity curves.

    Four variables are included; see the readme file for details and an example of how to use them:
    - cellinfo: information on the setup of the mechanistic model.
    - Qnorm: normalized capacity scale shared by all voltage curves.
    - pathinfo: index of the simulated conditions for all voltage curves.
    - volt: voltage data; each column corresponds to the voltage simulated under the conditions of the corresponding row in pathinfo.
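    A minimal loading sketch for these four variables, assuming they ship as a MATLAB .mat file (the filename below is hypothetical; the readme documents the actual layout):

    from scipy.io import loadmat

    # Hypothetical filename; adjust to the file named in the readme.
    data = loadmat("graphite_lfp_synthetic.mat")
    Qnorm = data["Qnorm"].squeeze()   # shared normalized capacity axis
    volt = data["volt"]               # one column per simulated degradation path
    pathinfo = data["pathinfo"]       # row i describes the conditions of volt[:, i]
    print(volt.shape)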

  2. Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +2more
    pdf
    Updated Nov 30, 2023
    Cite
    Cynthia Parr (2023). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. http://doi.org/10.15482/USDA.ADC/1346946
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Ag Data Commons
    Authors
    Cynthia Parr
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
    Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," the 47 respondents who indicated they had 10-100 TB or over 100 TB of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
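    A minimal sketch of that per-person calculation, with illustrative values only:

    # Per-person storage: the high end of the reported range divided by the
    # number of people covered by the response (1 for an individual response,
    # G for a group response).
    def per_person_tb(range_high_tb: float, group_size: int = 1) -> float:
        return range_high_tb / group_size

    print(per_person_tb(100.0))      # individual reporting "10-100 TB" -> 100.0
    print(per_person_tb(100.0, 25))  # group of 25 reporting the same range -> 4.0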

    Resources in this dataset:
    - Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
    - Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).
    - Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

  3. A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and other sources about the 2024 Outbreak of Measles

    • data.mendeley.com
    • data.niaid.nih.gov
    • +2more
    Updated Jun 24, 2024
    Cite
    Nirmalya Thakur (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and other sources about the 2024 Outbreak of Measles [Dataset]. http://doi.org/10.17632/rs6jnrjfsx.1
    Dataset updated
    Jun 24, 2024
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Please cite the following paper when using this dataset:

    N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, "A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles," arXiv [cs.CY], 2024. Available: https://doi.org/10.48550/arXiv.2406.07693

    Abstract

    This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder include Instagram and Facebook as well as the websites of various global and local news organizations. For each video, the URL, the title of the post, the description of the post, and the date of publication are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes, i.e., positive, negative, or neutral; (ii) one of the subjectivity classes, i.e., highly opinionated, neutral opinionated, or least opinionated; and (iii) one of the fine-grain sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for sentiment or subjectivity analysis in this field, as well as for other applications. The paper associated with this dataset (see the above citation) also presents a list of open research questions that may be investigated using this dataset.
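    A minimal sketch of that three-way labelling, assuming the vaderSentiment, textblob, and transformers packages; the emotion model id below is an assumption, not necessarily the one used by the authors:

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    from textblob import TextBlob
    from transformers import pipeline

    title = "Measles cases rise in 2024"  # toy example

    # (i) sentiment class from VADER's compound score
    compound = SentimentIntensityAnalyzer().polarity_scores(title)["compound"]
    sentiment = ("positive" if compound >= 0.05
                 else "negative" if compound <= -0.05 else "neutral")

    # (ii) subjectivity score from TextBlob (0 = objective, 1 = subjective)
    subjectivity = TextBlob(title).sentiment.subjectivity

    # (iii) fine-grain class from a DistilRoBERTa-based emotion model (assumed id)
    classifier = pipeline("text-classification",
                          model="j-hartmann/emotion-english-distilroberta-base")
    fine_grain = classifier(title)[0]["label"]

    print(sentiment, subjectivity, fine_grain)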

  4. AI and Big Data Analytics in Telecom Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Apr 15, 2025
    Cite
    Data Insights Market (2025). AI and Big Data Analytics in Telecom Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-and-big-data-analytics-in-telecom-1394146
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The AI and Big Data Analytics market within the telecommunications sector is experiencing robust growth, driven by the increasing need for network optimization, personalized customer experiences, and advanced fraud detection. The market, estimated at $15 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033, reaching approximately $60 billion by 2033. This expansion is fueled by several key factors. Firstly, the exponential growth of data generated by 5G networks and IoT devices necessitates sophisticated analytical tools to manage and extract value. Secondly, telecom operators are increasingly adopting AI-powered solutions for predictive maintenance of network infrastructure, resulting in significant cost savings and improved service reliability. Thirdly, personalized marketing campaigns driven by AI-powered customer segmentation and predictive analytics are boosting customer engagement and revenue generation. Finally, the rising threat of fraud and security breaches is driving demand for AI-based security systems capable of detecting and mitigating these threats in real-time. The market is segmented by application (private vs. commercial) and deployment type (cloud-based vs. on-premise), with cloud-based solutions gaining significant traction due to their scalability and cost-effectiveness. Major players like AWS, Google, and IBM are actively shaping the market landscape through strategic partnerships and continuous innovation, while numerous smaller specialized firms cater to specific needs within the sector. Geographic distribution shows strong growth across North America and Asia-Pacific, reflecting high technological adoption and expanding digital infrastructure in these regions.

    The competitive landscape is characterized by both large technology companies offering comprehensive solutions and specialized niche players focusing on specific segments within the telecom industry. While the rapid adoption of cloud-based solutions presents opportunities for growth, challenges remain, including data privacy concerns, the need for skilled professionals to implement and manage these systems, and the high initial investment costs associated with AI and big data infrastructure. Despite these challenges, the long-term outlook for the AI and Big Data Analytics market in telecommunications remains extremely positive, driven by ongoing technological advancements and the increasing reliance of telecom operators on data-driven decision-making to enhance operational efficiency and improve customer satisfaction. The market's evolution will be further influenced by the development of 6G technologies and the expansion of the Internet of Things (IoT), which will generate even larger volumes of data requiring sophisticated AI and big data analytics for effective management and analysis.
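    A quick sanity check of the projection quoted above (an illustrative calculation, not a figure from the report):

    # $15B in 2025 compounding at an 18% CAGR for 8 years (2025-2033).
    base_usd_bn, cagr, years = 15.0, 0.18, 8
    projected = base_usd_bn * (1 + cagr) ** years
    print(round(projected, 1))  # ~56.4, consistent with "approximately $60 billion"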

  5. View of AI and big data as core skill in industry across business worldwide 2025-2030

    • statista.com
    Updated Mar 19, 2025
    Cite
    Statista (2025). View of AI and big data as core skill in industry across business worldwide 2025-2030 [Dataset]. https://www.statista.com/statistics/1602860/ai-and-big-data-core-skills-by-industry/
    Dataset updated
    Mar 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024 - Sep 2024
    Area covered
    Worldwide
    Description

    Information and technology services and telecommunications have the highest share of employers that expect AI and big data to be core skills for their workers between 2025 and 2030, at over 65 percent. This is unsurprising, as AI is vital to processing large quantities of information and improving telecommunication services.

  6. Big data services revenue in Asia-Pacific (excl. Japan) 2012-2017

    • statista.com
    Updated Oct 30, 2014
    Cite
    Statista (2014). Big data services revenue in Asia-Pacific (excl. Japan) 2012-2017 [Dataset]. https://www.statista.com/statistics/496266/big-data-services-revenue-asia-pacific/
    Dataset updated
    Oct 30, 2014
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2012 - 2014
    Area covered
    Asia-Pacific, APAC
    Description

    This statistic depicts the revenue generated by the big data services market in the Asia Pacific (excluding Japan) from 2012 to 2014, as well as a forecast of revenue from 2015 to 2017. In 2014, revenues associated with the big data services market in the Asia Pacific amounted to *** million U.S. dollars. 'Big data' refers to data sets that are too large or too complex for traditional data processing applications. Additionally, the term is often used to refer to the technologies that enable predictive analytics or other methods of extracting value from data.

  7. Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv

    • frontiersin.figshare.com
    txt
    Updated Aug 8, 2024
    Cite
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez (2024). Data_Sheet_1_Advanced large language models and visualization tools for data analytics learning.csv [Dataset]. http://doi.org/10.3389/feduc.2024.1418006.s001
    Dataset updated
    Aug 8, 2024
    Dataset provided by
    Frontiers
    Authors
    Jorge Valverde-Rebaza; Aram González; Octavio Navarro-Hinojosa; Julieta Noguez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: In recent years, numerous AI tools have been employed to equip learners with diverse technical skills such as coding, data analysis, and other competencies related to computational sciences. However, the desired outcomes have not been consistently achieved. This study aims to analyze the perspectives of students and professionals from non-computational fields on the use of generative AI tools, augmented with visualization support, to tackle data analytics projects. The focus is on promoting the development of coding skills and fostering a deep understanding of the solutions generated. Consequently, our research seeks to introduce innovative approaches for incorporating visualization and generative AI tools into educational practices.

    Methods: This article examines how learners perform, and what their perspectives are, when using traditional tools vs. LLM-based tools to acquire data analytics skills. To explore this, we conducted a case study with a cohort of 59 participants, students and professionals without computational thinking skills. These participants developed a data analytics project in the context of a Data Analytics short session. Our case study focused on examining the participants' performance using traditional programming tools, ChatGPT, and LIDA with GPT as an advanced generative AI tool.

    Results: The results show the transformative potential of approaches based on integrating advanced generative AI tools like GPT with specialized frameworks such as LIDA. The higher levels of participant preference indicate the superiority of these approaches over traditional development methods. Additionally, our findings suggest that the learning curves for the different approaches vary significantly, since learners encountered technical difficulties in developing the project and interpreting the results. Our findings suggest that the integration of LIDA with GPT can significantly enhance the learning of advanced skills, especially those related to data analytics. We aim to establish this study as a foundation for the methodical adoption of generative AI tools in educational settings, paving the way for more effective and comprehensive training in these critical areas.

    Discussion: It is important to highlight that when using general-purpose generative AI tools such as ChatGPT, users must be aware of the data analytics process and take responsibility for filtering out potential errors or incompleteness in the requirements of a data analytics project. These deficiencies can be mitigated by using more advanced tools specialized in supporting data analytics tasks, such as LIDA with GPT. However, users still need advanced programming knowledge to properly configure this connection via API. There is a significant opportunity for generative AI tools to improve their performance, providing accurate, complete, and convincing results for data analytics projects, thereby increasing user confidence in adopting these technologies. We hope this work underscores the opportunities and needs for integrating advanced LLMs into educational practices, particularly in developing computational thinking skills.
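    For readers unfamiliar with the LIDA-with-GPT workflow discussed above, here is a minimal sketch based on the lida package's documented Manager API; treat the exact calls and the input filename as assumptions and check the library's README:

    from lida import Manager, llm

    lida = Manager(text_gen=llm("openai"))  # requires an OpenAI API key
    summary = lida.summarize("sales.csv")   # hypothetical input dataset
    goals = lida.goals(summary, n=2)        # propose candidate analysis goals
    charts = lida.visualize(summary=summary, goal=goals[0])  # generate chart code
    print(goals[0])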

  8. How Big Data Applications Drive Disruptive Innovation: Evidence from China’s Manufacturing Firms

    • scidb.cn
    Updated Jun 6, 2025
    Cite
    Jia-Hui Zhang (2025). How Big Data Applications Drive Disruptive Innovation: Evidence from China’s Manufacturing Firms [Dataset]. http://doi.org/10.57760/sciencedb.25655
    Available in the Croissant format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Jia-Hui Zhang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    China
    Description

    Based on the above, this study selects A-share-listed manufacturing firms from 2014 to 2023 as the research sample. Data on big data applications are extracted from firm annual reports published on the official websites of the Shenzhen Stock Exchange and the Shanghai Stock Exchange. To ensure the validity and robustness of the constructed indicators, the measurement of disruptive innovation draws on patent data from the China National Intellectual Property Administration (CNIPA), covering the period from 2000 to 2023. The specific measurement methodology is detailed in Section 3.2. Additional firm-level data are primarily obtained from the China Stock Market & Accounting Research (CSMAR) database and Wind Information Co., Ltd. (Wind). The data were processed as follows: (1) firms designated as Special Treatment (ST; firms that have exhibited financial distress for two consecutive years), *ST (firms that have reported consecutive losses for three years or face the risk of trading suspension), or Particular Transfer (PT) were excluded; (2) financial institutions were removed; (3) firms with substantial missing values for key variables were excluded; (4) to mitigate the influence of extreme values on the empirical results, selected variables, such as market-oriented disruptive innovation, technology-oriented disruptive innovation, managerial myopia, and government intervention, were winsorized at the 1st and 99th percentiles. After applying the above criteria, a total of 21,203 valid firm-year observations were retained for analysis.
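    A small sketch of the winsorization step in (4), clipping a variable at its 1st and 99th percentiles (the column name is hypothetical):

    import pandas as pd

    def winsorize_1_99(s: pd.Series) -> pd.Series:
        # Clip values below the 1st percentile and above the 99th.
        lo, hi = s.quantile(0.01), s.quantile(0.99)
        return s.clip(lower=lo, upper=hi)

    df = pd.DataFrame({"managerial_myopia": [0.1, 0.2, 0.3, 9.9]})  # toy data
    df["managerial_myopia"] = winsorize_1_99(df["managerial_myopia"])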

  9. Data Monetization Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 30, 2025
    Cite
    Growth Market Reports (2025). Data Monetization Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-monetization-market-global-industry-analysis
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Monetization Market Outlook



    According to our latest research, the global data monetization market size reached USD 3.6 billion in 2024, demonstrating robust momentum driven by the increasing adoption of data-driven business models across multiple sectors. The market is expected to register a CAGR of 18.2% from 2025 to 2033, propelling the market to an estimated USD 18.8 billion by 2033. This remarkable growth trajectory is primarily attributed to the surging demand for actionable business intelligence, the proliferation of big data analytics, and the strategic imperative for enterprises to unlock new revenue streams from their data assets.




    One of the most significant growth factors for the data monetization market is the exponential increase in data generation across industries such as BFSI, healthcare, retail, and telecommunications. As organizations collect vast volumes of structured and unstructured data from customer interactions, transactions, and IoT devices, the imperative to derive value from these data sets has never been greater. The evolution of advanced analytics, machine learning, and artificial intelligence has enabled enterprises to analyze, segment, and commercialize their data, either by improving internal processes or by creating new data-centric products and services. This shift is further bolstered by the growing recognition among C-level executives that data is a strategic asset, capable of driving innovation, enhancing customer experiences, and unlocking new growth opportunities.




    Another critical driver is the increasing regulatory focus on data privacy and compliance, which, paradoxically, is fostering innovation in data monetization strategies. With regulations such as GDPR and CCPA setting stringent guidelines for data usage, organizations are investing in secure data platforms and consent management tools to ensure compliance while still extracting value from their data. This has led to the emergence of privacy-preserving data monetization models, such as data anonymization and federated learning, which enable organizations to monetize data without compromising customer trust or violating regulatory mandates. The convergence of regulatory compliance and data monetization is thus creating a fertile ground for technology providers to offer differentiated solutions tailored to industry-specific needs.




    The proliferation of cloud computing and the rise of data marketplaces are also catalyzing the growth of the data monetization market. Cloud platforms provide scalable infrastructure and advanced analytics capabilities, enabling organizations of all sizes to store, process, and monetize their data efficiently. Furthermore, the emergence of data marketplaces and data exchanges is democratizing access to third-party data, allowing businesses to buy, sell, or trade data assets seamlessly. This trend is particularly pronounced among small and medium enterprises (SMEs), which can now participate in the data economy without the need for substantial upfront investments in IT infrastructure. As a result, the data monetization ecosystem is becoming increasingly dynamic, with new business models and value chains emerging at a rapid pace.




    From a regional perspective, North America continues to dominate the data monetization market owing to its mature digital infrastructure, high adoption of advanced analytics, and a strong culture of innovation. The presence of leading technology vendors and a large base of data-driven enterprises further strengthens the region's position. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid digitalization, expanding internet penetration, and a burgeoning start-up ecosystem. Europe, with its focus on data privacy and regulatory compliance, is also witnessing significant investments in secure data monetization platforms. Meanwhile, Latin America and the Middle East & Africa are gradually catching up, supported by increasing awareness and government-led digital transformation initiatives.





    Component Analysis



  10. Full-population web crawl of .gov.uk web domain, 2014 - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Aug 10, 2019
    Cite
    (2019). Full-population web crawl of .gov.uk web domain, 2014 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/2811fbd9-62e3-5722-a5c6-27f17928f3de
    Dataset updated
    Aug 10, 2019
    Description

    This dataset is the result of a full-population crawl of the .gov.uk web domain, aiming to capture a full picture of the scope of public-facing government activity online and the links between different government bodies. Local governments have been developing online services, aiming to better serve the public and reduce administrative costs. However, the impact of this work, and the links between governments' online and offline activities, remain uncertain. The overall research question for this research examines whether local e-government has met these expectations, of Digital Era Governance and of its practitioners. The aim was to directly analyse the structure and content of government online. It shows that recent digital-centric public administration theories, typified by the Digital Era Governance quasi-paradigm, are not empirically supported by the UK local government experience. The data consist of a file of individual Uniform Resource Locators (URLs) fetched during the crawl, and a further file containing pairs of URLs reflecting the Hypertext Markup Language (HTML) links between them. In addition, a GraphML format file is presented for a version of the data reduced to third-level domains, with accompanying attribute data for the publishing government organisations and calculated webometric statistics based on the third-level-domain link network.

    This project engages with the Digital Era Governance (DEG) work of Dunleavy et al. and draws upon new empirical methods to explore local government and its use of Internet-related technology. It challenges the existing literature, arguing that e-government benefits have been oversold, particularly for transactional services; it updates DEG with insights from local government. The distinctive methodological approach is to use full-population datasets and large-scale web data to provide an empirical foundation for theoretical development, and to test existing theorists' claims. A new full-population web crawl of .gov.uk is used to analyse the shape and structure of online government using webometrics. Tools from computer science, such as automated classification, are used to enrich our understanding of the dataset. A new full-population panel dataset is constructed covering council performance, cost, web quality, and satisfaction. The local government web shows a wide scope of provision but only limited evidence in support of the existing rhetorics of Internet-enabled service delivery. In addition, no evidence is found of a link between web development and performance, cost, or satisfaction. DEG is challenged and developed in light of these findings. The project adds value by developing new methods for the use of big data in public administration, by empirically challenging long-held assumptions on the value of the web for government, and by building a foundation of knowledge about local government online to be built on by further research. This is an ESRC-funded DPhil research project.

    A web crawl was carried out with Heritrix, the Internet Archive's web crawler. A list of all registered domains in .gov.uk (and their www.x.gov.uk equivalents) was used as a set of start seeds. Sites outside .gov.uk were excluded; robots.txt files were respected, with the consequence that some .gov.uk sites (and some parts of other .gov.uk sites) were not fetched. Certain other areas were manually excluded, particularly crawling traps (e.g. calendars which will serve infinite numbers of pages in the past and future, and websites returning different URLs for each browser session) and the contents of certain large peripheral databases such as online local authority library catalogues. A full set of regular expressions used to filter the URLs fetched are included in the archive. On completion of the crawl, the page URLs and link data were extracted from the output WARC files. The page URLs were manually examined and re-filtered to handle various broken web servers and to reduce duplication of content where multiple views were presented onto the same content (for example, where a site was presented at both http://organisation.gov.uk/ and http://www.organisation.gov.uk/ without HTTP redirection between the two). Finally, the link list was filtered against the URL list to remove bogus links, and both lists were map/reduced to a single set of files.

    Also included in this data release is a derived dataset more useful for high-level work. This is a GraphML file containing all the link and page information reduced to third-level-domain level (so darlington.gov.uk is considered as a single node, not a large set of pages) and with the links binarised to present/not present between each node. Each graph node also has various attributes, including the name of the registering organisation and various webometric measures including PageRank, indegree and betweenness centrality.
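    A short sketch of computing the webometric measures named above from the released GraphML file with networkx (the filename is hypothetical):

    import networkx as nx

    g = nx.read_graphml("govuk_third_level_domains.graphml")  # hypothetical name
    pagerank = nx.pagerank(g)
    betweenness = nx.betweenness_centrality(g)
    if g.is_directed():                # link graphs are typically directed
        indegree = dict(g.in_degree())
    print(max(pagerank, key=pagerank.get))  # most central third-level domain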

  11. Using Virtuoso as an alternate triple store for a VIVO instance

    • vivo.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Paul Albert; Eliza Chan; Prakesh Adekkanattu; Mohammad Mansour (2023). Using Virtuoso as an alternate triple store for a VIVO instance [Dataset]. http://doi.org/10.6084/m9.figshare.2002032.v2
    Dataset updated
    May 30, 2023
    Dataset provided by
    VIVO
    Authors
    Paul Albert; Eliza Chan; Prakesh Adekkanattu; Mohammad Mansour
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: For some time, the VIVO for Weill Cornell Medical College (WCMC) had struggled with both unacceptable page load times and unreliable uptime. With some individual profiles containing upwards of 800 publications, WCMC VIVO has relatively large profiles, but no profile was so large that it could account for this performance. The WCMC VIVO Implementation Team explored a number of options for improving performance, including caching, better hardware, query optimization, limiting user access to large pages, using another instance of Tomcat, throttling bots, and blocking IPs issuing too many requests. But none of these avenues were fruitful.

    Analysis of triple stores: With the 1.7 version, VIVO ships with the Jena SDB triple store, but the SDB version of Jena is no longer supported by its developers. In April, we reviewed various published analyses and benchmarks suggesting there were alternatives to Jena, such as Virtuoso, that perform better than even Jena's successor, TDB. In particular, the Berlin SPARQL Benchmark v. 3.1 [1] showed that Virtuoso had the strongest performance compared to the other data stores measured, including BigData, BigOwlim, and Jena TDB. In addition, Virtuoso is used on dbpedia.org, which serves up 3 billion triples, compared with only 12 million in WCMC's VIVO site. Whereas Jena SDB stores its triples in a MySQL database, Virtuoso manages its own in a binary file. The software is available in open source and commercial editions.

    Configuration: In late 2014, we installed Virtuoso on a local machine and loaded data from our production VIVO. Some queries completed in about 10% of the time as compared to our production VIVO. However, we noticed that the listview queries invoked whenever profile pages were loaded were still slow. After soliciting feedback from members of both the Virtuoso and VIVO communities, we modified these queries to rely on the OPTIONAL instead of the UNION construct. This modification, which wasn't possible in a Jena SDB environment, reduced by eight-fold the number of queries that the application makes of the triple store. About four or five additional steps were required for VIVO and Virtuoso to work optimally with one another; these are documented in the VIVO Duraspace wiki.

    Results: On March 31, WCMC launched Virtuoso in its production environment. According to our instance of New Relic, VIVO has an average page load of about four seconds and 99% uptime, both of which are dramatic improvements. There are opportunities for further tuning: the four-second average includes pages such as the visualizations, as well as pages served up to logged-in users, which are slower than other types of pages.

    [1] http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/#comparison
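    An illustrative sketch of the UNION-to-OPTIONAL rewrite described above, using toy SPARQL patterns held as Python strings (not the actual VIVO listview queries):

    # Fetching several properties as a UNION of alternative patterns multiplies
    # result rows and queries; OPTIONAL keeps a single pattern with optional parts.
    union_style = """
    SELECT ?label ?webpage WHERE {
      { ?person rdfs:label ?label }
      UNION
      { ?person vivo:webpage ?webpage }
    }"""

    optional_style = """
    SELECT ?label ?webpage WHERE {
      ?person rdfs:label ?label .
      OPTIONAL { ?person vivo:webpage ?webpage }
    }"""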

  12. Vitamin D deficiency and SARS‑CoV‑2 infection: D-COVID study

    • zenodo.org
    Updated Sep 6, 2022
    Cite
    Marta Neira Álvarez; Noemi Anguita Sánchez; Gema Navarro Jiménez; María del Mar Bermejo Olano; Rocío Queipó; María Benavent Nuñez; Alejandro Parralejo Jiménez; Guillermo López Yepes; Carmen Sáez Nieto; Marta Neira Álvarez; Noemi Anguita Sánchez; Gema Navarro Jiménez; María del Mar Bermejo Olano; Rocío Queipó; María Benavent Nuñez; Alejandro Parralejo Jiménez; Guillermo López Yepes; Carmen Sáez Nieto (2022). Vitamin D deficiency and SARS‑CoV‑2 infection: D-COVID study [Dataset]. http://doi.org/10.5281/zenodo.7053208
    Dataset updated
    Sep 6, 2022
    Dataset provided by
    Zenodo
    Authors
    Marta Neira Álvarez; Noemi Anguita Sánchez; Gema Navarro Jiménez; María del Mar Bermejo Olano; Rocío Queipó; María Benavent Nuñez; Alejandro Parralejo Jiménez; Guillermo López Yepes; Carmen Sáez Nieto; Marta Neira Álvarez; Noemi Anguita Sánchez; Gema Navarro Jiménez; María del Mar Bermejo Olano; Rocío Queipó; María Benavent Nuñez; Alejandro Parralejo Jiménez; Guillermo López Yepes; Carmen Sáez Nieto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    D-COVID is a project to study the association between COVID-19 infection and vitamin D deficiency in patients of a tertiary university hospital. To investigate the clinical evolution and prognosis of patients with COVID-19 and vitamin D deficiency, several queries were launched against a database containing plain-text apparitions of certain terms in medical reports, as well as structured data. These apparitions were detected using NLP technology and then saved individually in a database. The presented dataset is a bounded version of that database, containing only data relevant to the associated study. As for the structure of the dataset, each row represents an apparition of a term in plain text, and the columns contain additional information (a loading sketch follows the field list):

    - reportdate: date of the report where the term appears.

    - admission_days: structured data; days in hospitalization, if any.

    - patient_id: anonymized patient identifier.

    - sex: patient sex (1 = male, 2 = female).

    - birthdate: patient birthdate.

    - service: service where the report was generated.

    - report_type: type of report generated (discharge report, note, etc.).

    - record: unique identifier for the report itself.

    - term: the term that was read (NLP takes synonyms and acronyms into account).

    - exitus: medical exitus, if available in structured data; it may also be found in plain text in the previous column.
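    A minimal exploration sketch, assuming the dataset is distributed as a CSV file (the filename is hypothetical):

    import pandas as pd

    df = pd.read_csv("d_covid_terms.csv", parse_dates=["reportdate", "birthdate"])
    # Each row is one term apparition, so counts per patient are meaningful:
    per_patient = df.groupby("patient_id")["term"].count()
    print(per_patient.describe())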

  13. Smart Triage Jinja Data De-identification

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Mawji, Alishah (2023). Smart Triage Jinja Data De-identification [Dataset]. http://doi.org/10.5683/SP3/MSTH98
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Mawji, Alishah
    Description

    This dataset contains de-identified data with an accompanying data dictionary and the R script for the de-identification procedures.

    Objective(s): To demonstrate application of a risk-based de-identification framework using the Smart Triage dataset as a clinical example.

    Data Description: This dataset contains the de-identified version of the Smart Triage Jinja dataset with the accompanying data dictionary and R script for de-identification procedures.

    Limitations: Utility of the de-identified dataset has only been evaluated with regard to use for the development of prediction models based on the need for hospital admission.

    Abbreviations: NA

    Ethics Declaration: The study was reviewed by the institutional review boards at the University of British Columbia in Canada (ID: H19-02398; H20-00484), the Makerere University School of Public Health in Uganda, and the Uganda National Council for Science and Technology.

  14. Data from: Calculated state-of-the-art results for solvation and ionization energies of thousands of organic molecules relevant to battery design

    • zenodo.org
    json, zip
    Updated Oct 20, 2024
    Cite
    Jan Weinreich; Jan Weinreich; Konstantin Karandashev; Konstantin Karandashev; Daniel Jose Arismendi Arrieta; Daniel Jose Arismendi Arrieta; Kersti Hermansson; Kersti Hermansson; Anatole von Lilienfeld; Anatole von Lilienfeld (2024). Calculated state-of-the art results for solvation and ionization energies of thousands of organic molecules relevant to battery design [Dataset]. http://doi.org/10.5281/zenodo.11036086
    Dataset updated
    Oct 20, 2024
    Dataset provided by
    University of Vienna
    Authors
    Jan Weinreich; Jan Weinreich; Konstantin Karandashev; Konstantin Karandashev; Daniel Jose Arismendi Arrieta; Daniel Jose Arismendi Arrieta; Kersti Hermansson; Kersti Hermansson; Anatole von Lilienfeld; Anatole von Lilienfeld
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset presents molecular properties critical for battery electrolyte design, specifically solvation energies, ionization potentials, and electron affinities. The dataset is intended for use in machine learning model testing and algorithm validation. The properties calculated include solvation energies using the COSMO-RS method [1], and ionization potentials and electron affinities using various high-accuracy computational methods as implemented in MOLPRO [2]. Computational details can be found in Ref. [3]; the scripts used to generate the data are mostly available in our GitHub repository [4].

    Molecular Datasets Considered:

    • QM9 Dataset: contains small organic molecules broadly relevant for quantum chemistry [5].

    • Electrolyte Genome Project (EGP): focuses on materials relevant to electrolytes [6].

    • GDB17 and ZINC databases: offer broad chemical diversity with potential application in battery technologies [7, 8].

    Data structure

    How to Load the Data:

    All files can be loaded with

    import json

    with open("file.json", "r") as f:
        data_dict = json.load(f)

    and the file structure can be explored with

    data_dict.keys()

    Solvation energies

    The data is stored in two types of JSON archives: files for full molecules of GDB17 and ZINC, and files for amons of GDB17 and ZINC. They are structured differently, as amon entries are sorted by the number of heavy atoms in the amon (e.g., all amons with 3 heavy atoms are stored under ni3). Because of the large number of amons with 6 or 7 heavy atoms, they are further split into ni6_1, ni6_2, and so on. A sub-dictionary of an amon dictionary or a full-molecule dictionary contains the following keys (a traversal sketch follows the key list):

    ECFP - ECFP4 representation vector

    SMILES - SMILES string

    SYMBOLS - atomic symbols

    COORDS - atomic positions in Angstrom

    ATOMIZATION - atomization energy in [kcal/mol]

    DIPOLE - dipole moment in Debye

    ENERGY - energy in Hartree

    SOLVATION - solvation energy in [kcal/mol] for different solvents at 300 K.
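    A small traversal sketch for these archives, assuming each group key (e.g., ni3) maps molecule identifiers to entries carrying the keys above:

    import json

    with open("AMONS_GDB17.json", "r") as f:
        amons = json.load(f)

    for group, molecules in amons.items():    # e.g. "ni3", "ni6_1", ...
        for mol_id, entry in molecules.items():
            smiles = entry["SMILES"]
            solvation = entry["SOLVATION"]    # kcal/mol per solvent at 300 K
            print(group, mol_id, smiles)
            break                             # show just the first entry
        break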

    Files:

    GDB17.json.zip (unpack with unzip first!) - subset of GDB17 random molecules

    AMONS_ZINC.json - all amons of ZINC up to 7 heavy atoms

    EGP.json - EGP molecules

    AMONS_GDB17.json - all amons of GDB17 up to 7 heavy atoms

    File Name              Description       Molecules
    all_amons_gdb17.json   GDB17 amons       40726
    all_amons_zinc.json    ZINC amons        91876
    GDB17.json             Subset of GDB17   312793
    EGP.json               EGP molecules     15569

    Atomic energies $E_{at}$ at BP and def2-TZVPD level in Hartree [Ha]:

    Element        H     C       N       O       F       Br        Cl       S        P        B       Si
    $E_{at}$ [Ha]  -0.5  -37.85  -54.60  -75.09  -99.77  -2574.40  -460.20  -398.16  -341.30  -24.65  -289.40

    We follow the convention of negative atomization energies for stability compared to the isolated atoms:

    $E_{atomization} = E_{mol} - \sum_{i} E_{at,i}$


    Free energies of solvation are given at 300 K in kcal/mol.

    Ionization potentials and electron affinities

    The upload contains two JSON files, QM9IPEA.json and QM9IPEA_atom_ens.json. QM9IPEA.json summarizes MOLPRO calculation data grouping it along the following dictionary keys:

    COORDS - atom coordinates in Angstroms.

    SYMBOLS - atom element symbols.

    ENERGY - total energies for each charge (0, -1, 1) and method considered.

    CPU_TIME - CPU times (in seconds) spent at each step of each part of the calculation.

    DISK_USAGE - highest total disk usage in GB.

    ATOMIZATION_ENERGY - atomization energy at charge 0.

    QM9_ID - ID of the molecule in the QM9 dataset.

    All energies are given in Hartrees with NaN indicating the calculation failed to converge. Ionization potentials and electron affinities can be recovered as energy differences between neutral and charged (+1 for ionization potentials, -1 for electron affinities) species.
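    A sketch of that recovery, assuming each molecule's ENERGY entry maps a charge key to per-method energies in Hartree (the key layout is an assumption; adapt it to the actual file):

    HARTREE_TO_EV = 27.211386

    def ip_ea(energy: dict, method: str) -> tuple:
        # energy: {"0": {...}, "1": {...}, "-1": {...}}, as described above
        e0, e_plus, e_minus = (energy[q][method] for q in ("0", "1", "-1"))
        ip = (e_plus - e0) * HARTREE_TO_EV   # E(cation) - E(neutral)
        ea = (e0 - e_minus) * HARTREE_TO_EV  # E(neutral) - E(anion)
        return ip, ea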

    "CPU_time" entries contain steps corresponding to individual method calculations, as well as steps corresponding to program operation: "INT" (calculating integrals over basis functions relevant for the calculation), "FILE" (dumping intermediate data to restart file), and "RESTART" (importing restart data). The latter two steps appeared since we reused relevant integrals calculated for neutral species in charged species' calculations; we also used restart functionality to use HF density matrix obtained for the neutral species as the initial density matrix guess for the SCF-HF calculation for charged species. NaN CPU time value means the step was not present or that the calculation is invalid. Note that the CPU times were measured while parallelizing on 12 cores and were not adjusted to single-core.

    QM9IPEA_atom_ens.json contains atomic energies used to calculate atomization energies in QM9IPEA.json, the dictionary keys are:

    SPINS - the spin assigned to elements during calculations of atomic energies.

    ENERGY - energies of atoms using different methods.

    (Note that H has only one electron and thus does not require a level of theory beyond Hartree-Fock.)

    NOTE: Additional calculations were performed between publication of arXiv:2308.11196 and creation of this upload. For the version of the dataset used in the manuscript, please refer to DOI:10.5281/zenodo.8252498.

    Acknowledgement

    This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957189 (BIG-MAP) and No. 957213 (BATTERY 2030+). O.A.v.L. has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772834). O.A.v.L. has received support as the Ed Clark Chair of Advanced Materials and as a Canada CIFAR AI Chair. O.A.v.L. acknowledges that this research is part of the University of Toronto’s Acceleration Consortium, which receives funding from the Canada First Research Excellence Fund (CFREF). Obtaining the presented computational results has been facilitated using the queueing system implemented at https://leruli.com. The project has been supported by the Swedish Research Council (Vetenskapsrådet), and the Swedish National Strategic e-Science program eSSENCE as well as by computing resources from the Swedish National Infrastructure for Computing (SNIC/NAISS).

    References

    [1] Klamt, A.; Eckert, F. COSMO-RS: a novel and efficient method for the a priori prediction of thermophysical data of liquids. Fluid Phase Equilibria 2000, 172, 43–72

    [2] Werner, H.-J.; Knowles, P. J.; Knizia, G.; Manby, F. R.; Schutz, M. Molpro: a general-purpose quantum chemistry program package. WIREs Comput. Mol. Sci. 2012, 2, 242–253

    [3] arXiv:2308.11196

    [4] https://github.com/chemspacelab/ViennaUppDa

    [5] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 2014, 1, 140022

    [6] Qu, X.; Jain, A.; Rajput, N. N.; Cheng, L.; Zhang, Y.; Ong, S. P.; Brafman, M.; Maginn, E.; Curtiss, L. A.; Persson, K. A. The Electrolyte Genome Project: A big data approach in battery materials discovery. Comput. Mater. Sci. 2015, 103, 56–67

    [7] Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. Journal of Chemical Information and Modeling 2012, 52, 2864–2875

    [8] Irwin, J. J.; Shoichet, B. K. ZINC: A Free Database of Commercially Available Compounds for Virtual Screening. Journal of Chemical Information and Modeling 2005, 45, 177–182.

  15. Data from: Caravan - A global community dataset for large-sample hydrology

    • data.niaid.nih.gov
    Updated Jan 16, 2025
    Cite
    Gudmundsson, Lukas (2025). Caravan - A global community dataset for large-sample hydrology [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6522634
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    Erickson, Tyler
    Gilon, Oren
    Shalev, Guy
    Gudmundsson, Lukas
    Kratzert, Frederik
    Gauch, Martin
    Hassidim, Avinatan
    Addor, Nans
    Matias, Yossi
    Nearing, Grey
    Nevo, Sella
    Klotz, Daniel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the accompanying dataset to the following paper: https://www.nature.com/articles/s41597-023-01975-w

    Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global, open-source community resource that will grow over time.

    If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, to pay respect to the amount of work that went into their creation and that made Caravan possible in the first place.

    All current development and additional community extensions can be found at https://github.com/kratzert/Caravan

    Change Log:

    23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.

    24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.

    15 June 2022: Version 0.4 - Fixed replacing negative CAMELS US values with NaN (-999 in CAMELS indicates missing observation).

    1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), now totalling to 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added some more meta data (station name and country).

    16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data but added a static copy of the accompanying code of the paper. For the most up to date version, please check https://github.com/kratzert/Caravan

    10 May 2023: Version 1.1 - No data change, just update data description.

    17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See https://github.com/kratzert/Caravan/issues/22 for details.

    16 April 2024: Version 1.4 - Added 9130 gauges from the original source dataset that were initially not included because of the area thresholds (i.e., basins smaller than 100 sqkm or larger than 2000 sqkm). Also extended the forcing period for all gauges (including the original ones) to 1950-2023. Added two different download options that include timeseries data only, as either csv files (Caravan-csv.tar.xz) or netcdf files (Caravan-nc.tar.xz); a reading sketch follows the change log. Including the large basins also required an update to the Earth Engine code.

    16 Jan 2025: Version 1.5 - Added FAO Penman-Monteith PET (potential_evaporation_sum_FAO_PENMAN_MONTEITH) and renamed the ERA5-LAND potential_evaporation band to potential_evaporation_sum_ERA5_LAND. Also added all PET-related climate indices derived with the Penman-Monteith PET band (suffix "_FAO_PM") and renamed the old PET-related indices accordingly (suffix "_ERA5_LAND").
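    A minimal sketch for reading one gauge's timeseries from the csv download option; the directory layout and gauge id below are assumptions, so check the dataset documentation:

    import pandas as pd

    ts = pd.read_csv("Caravan/timeseries/csv/camels/camels_01013500.csv",
                     parse_dates=["date"], index_col="date")
    print(ts.columns)  # meteorological forcings plus observed streamflow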

  16. Turbulence modelling using machine learning

    • kaggle.com
    Updated Jan 13, 2025
    Cite
    Ryley McConkey (2025). Turbulence modelling using machine learning [Dataset]. http://doi.org/10.34740/kaggle/dsv/10463446
    Available in the Croissant format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Jan 13, 2025
    Dataset provided by
    Kaggle
    Authors
    Ryley McConkey
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset overview

    This dataset accompanies "A curated dataset for data-driven turbulence modelling", published in Scientific Data. The dataset has been updated since publication, so make sure you are using the latest version (click Version in the header).

    Note: This dataset was updated on 12 January 2024. Changes include:

    - Dataset now includes zero pressure gradient flat plate data from Schlatter and Orlu (https://www.mech.kth.se/~pschlatt/DATA/), including kepsilonphitf, komega, and komegasst OpenFOAM simulations of this case. The case index is fp_XXXX, where XXXX is the Re_theta value.
    - Total number of points with the new data: 902,812.
    - Non-realizable points are no longer removed from the dataset (this affected 1 point from periodic hills case_1p0, and the CBFS case). This was based on several requests for the full raw dataset. This means all the points in the OpenFOAM files are in the dataset.
    - Dataset format is now: tabular files (.csv) for all turbulence model and reference ("REF") data, and then a big dump of all the OpenFOAM files. For downloading, I recommend you download the .csv files individually, as the OpenFOAM download is big. If you're using OpenFOAM, you will find those files very useful.
    - Rather than supply a ton of precomputed fields, I have aimed to keep the dataset lightweight and include more gradient fields. For example, I now include gradients of omega and epsilon. Additionally, for the flat plate, periodic hills, and square duct cases, I include the DNS velocity gradient field, which is required in many more recent RANS+ML frameworks. I have provided an example script which shows how to compute the fields from the original dataset, and generally how to play around with the dataset.
    - Removed kepsilon data. This is a high Reynolds number model that was run on a low Reynolds number mesh in the original dataset. If you need that data, it is still available in previous versions on Kaggle.
    - The dataset assembly code is now on GitHub: https://github.com/rmcconke/upstream_pipeline. If you have any questions, it's probably faster to just ask them on Kaggle or email me (rmcconke@mit.edu).

    Summary

    The dataset is a collection of RANS simulations of reference cases where DNS or LES data are available, to enable training and testing of machine learnt turbulence models. The DNS/LES data are mapped onto the RANS grid, so that at each point, both RANS and DNS/LES fields are available. For each turbulence model, 902,812 points with RANS fields and corresponding DNS/LES fields are available.

    File structure

    There are 4 .csv files: kepsilonphitf.csv, komega.csv, komegasst.csv, and REF.csv. These are 3 RANS turbulence models, and the reference data (DNS/LES). All 4 of these files are "collocated" (the rows match between these files). All columns have a prefix depending on which model they came from (i.e., which .csv file they come from).

    Note that not all cases contain all fields. For example, yPlus and UPlus data are provided for the flat plate cases but are not available for the other RANS cases.
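    A short sketch of combining the collocated files described above: rows match across the four .csv files, so RANS inputs and reference outputs can be joined column-wise:

    import pandas as pd

    komegasst = pd.read_csv("komegasst.csv")
    ref = pd.read_csv("REF.csv")
    assert len(komegasst) == len(ref)           # files are collocated row-wise

    data = pd.concat([komegasst, ref], axis=1)  # one row per grid point
    print(data.shape)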

    Here's a list of all columns from the komega.csv file:

    • 'komega_U_1', 'komega_U_2', 'komega_U_3': velocity components
    • 'komega_k', 'komega_epsilon', 'komega_omega', 'komega_nut', 'komega_p': scalars
    • 'komega_gradU_11', 'komega_gradU_12', 'komega_gradU_13', 'komega_gradU_21', 'komega_gradU_22', 'komega_gradU_23', 'komega_gradU_31', 'komega_gradU_32', 'komega_gradU_33': velocity gradient tensor components
    • 'komega_gradk_1', 'komega_gradk_2', 'komega_gradk_3': k gradient components
    • 'komega_gradepsilon_1', 'komega_gradepsilon_2', 'komega_gradepsilon_3': epsilon gradient components
    • 'komega_gradomega_1', 'komega_gradomega_2', 'komega_gradomega_3': omega gradient components
    • 'komega_gradnut_1', 'komega_gradnut_2', 'komega_gradnut_3': nut gradient components
    • 'komega_gradp_1', 'komega_gradp_2', 'komega_gradp_3': pressure gradient components
    • 'komega_turbR_11', 'komega_turbR_12', 'komega_turbR_13', 'komega_turbR_21', 'komega_turbR_22', 'komega_turbR_23', 'komega_turbR_31', 'komega_turbR_32', 'komega_turbR_33': OpenFOAM's turbR field (Reynolds stress tensor as predicted by the RANS model: turbR = (2.0/3.0)*I*k - (nut)*dev(twoSymm(fvc::grad(U))))
    • 'komega_divturbR_1', 'komega_divturbR_2', 'komega_divturbR_3': divergence of the Reynolds stress tensor estimated by RANS.
    • 'komega_DUDt_1', 'komega_DUDt_2', 'komega_DUDt_3': material derivative components (all flows here are steady state, so the partial time-derivative term is zero).
    • 'komega_wallDistance': wall distance as calculated by OpenFOAM
    • 'komega_S_11', 'komega_S_12', 'komega_S_13', 'komega_S_22', 'komega_S_23', 'komega_S_33', 'komega_R_11', 'komega_R_...
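    The komega_S_* and komega_R_* columns follow the usual convention for the mean strain-rate and rotation-rate tensors. A sketch of reconstructing them, plus the normalized anisotropy tensor, from the columns listed above, assuming komega_gradU_ij stores dU_i/dx_j:

    import numpy as np
    import pandas as pd

    df = pd.read_csv("komega.csv")

    # Assemble the velocity-gradient tensor as an (N, 3, 3) array.
    grad_cols = [[f"komega_gradU_{i}{j}" for j in (1, 2, 3)] for i in (1, 2, 3)]
    gradU = np.stack([df[row].to_numpy() for row in grad_cols], axis=1)

    # S = symmetric part, R = antisymmetric part of grad(U).
    S = 0.5 * (gradU + np.transpose(gradU, (0, 2, 1)))
    R = 0.5 * (gradU - np.transpose(gradU, (0, 2, 1)))

    # Normalized anisotropy b_ij = turbR_ij / (2k) - delta_ij / 3, a common
    # target quantity in data-driven turbulence modelling (turbR is the
    # modelled Reynolds stress tensor described above).
    turb_cols = [[f"komega_turbR_{i}{j}" for j in (1, 2, 3)] for i in (1, 2, 3)]
    turbR = np.stack([df[row].to_numpy() for row in turb_cols], axis=1)
    k = np.maximum(df["komega_k"].to_numpy(), 1e-12)[:, None, None]  # guard k ~ 0
    b = turbR / (2.0 * k) - np.eye(3) / 3.0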
  17. Data from: Database for Forensic Anthropology in the United States, 1962-1991

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Mar 12, 2025
    Cite
    National Institute of Justice (2025). Database for Forensic Anthropology in the United States, 1962-1991 [Dataset]. https://catalog.data.gov/dataset/database-for-forensic-anthropology-in-the-united-states-1962-1991-486d3
    Explore at:
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice
    Description

    This project was undertaken to establish a computerized skeletal database composed of recent forensic cases to represent the present ethnic diversity and demographic structure of the United States population. The intent was to accumulate a forensic skeletal sample large and diverse enough to reflect different socioeconomic groups of the general population from different geographical regions of the country, in order to enable researchers to revise the standards used for forensic skeletal identification.

    The database is composed of eight data files, comprising four categories. The primary "biographical" or "identification" files (Part 1, Demographic Data, and Part 2, Geographic and Death Data) make up the first category and pertain to the positive identification of each of the 1,514 data records in the database. Information in Part 1 includes sex, ethnic group affiliation, birth date, age at death, height (living and cadaver), and weight (living and cadaver). Variables in Part 2 pertain to the nature of the remains, means and sources of identification, city and state/country born, occupation, date missing/last seen, date of discovery, date of death, time since death, cause of death, manner of death, deposit/exposure of body, area found, city, county, and state/country found, handedness, and blood type.

    The Medical History File (Part 3) represents the second category of information and contains data on the documented medical history of the individual. Variables in Part 3 include general comments on medical history as well as comments on congenital malformations, dental notes, bone lesions, perimortem trauma, and other comments.

    The third category consists of an inventory file (Part 4, Skeletal Inventory Data) recording the specific contents of the database. This includes the inventory of skeletal material by element and side (left and right), indicating the condition of the bone as either partial or complete. The variables in Part 4 provide a skeletal inventory of the cranium, mandible, dentition, and postcranial elements and identify each element as complete, fragmentary, or absent; if absent, four categories record why it is missing.

    The last part of the database is composed of three skeletal data files, covering quantitative observations of age-related changes in the skeleton (Part 5), cranial measurements (Part 6), and postcranial measurements (Part 7). Variables in Part 5 provide assessments of epiphyseal closure and cranial suture closure (left and right), rib end changes (left and right), Todd Pubic Symphysis, Suchey-Brooks Pubic Symphysis, McKern & Stewart Phases I, II, and III, Gilbert & McKern Phases I, II, and III, auricular surface, and dorsal pubic pitting (all for left and right). Variables in Part 6 include cranial measurements (length, breadth, height) and mandibular measurements (height, thickness, diameter, breadth, length, and angle) of various skeletal elements. Part 7 provides postcranial measurements (length, diameter, breadth, circumference, and left and right, where appropriate) of the clavicle, scapula, humerus, radius, ulna, sacrum, innominate, femur, tibia, fibula, and calcaneus. A small file of noted problems for a few cases is also included (Part 8).
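    As a hypothetical illustration of working with this multi-file layout, the sketch below joins the demographic file with the cranial measurements on a shared case identifier; the file names, the case_id key, and the column names are illustrative assumptions, so consult the codebook for the actual names:

    import pandas as pd

    # All names here are illustrative assumptions, not the actual file layout.
    demographics = pd.read_csv("part1_demographic.csv")
    cranial = pd.read_csv("part6_cranial_measurements.csv")

    # Join the two parts on a per-case identifier so demographic context
    # accompanies each set of cranial measurements.
    merged = demographics.merge(cranial, on="case_id", how="inner")
    print(merged[["sex", "age_at_death"]].describe(include="all"))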

  18. Data for Medical Data Science Shortcourse

    • zenodo.org
    application/gzip, bin
    Updated Jan 24, 2020
    Cite
    Kevin Kunzmann (2020). Data for Medical Data Science Shortcourse [Dataset]. http://doi.org/10.5281/zenodo.3379064
    Explore at:
    application/gzip, binAvailable download formats
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kevin Kunzmann
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a .csv version of the World Bank data on Health, Nutrition and Population (cf. https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics), together with data sets derived from it for training purposes.

  19. Numerical Analysis Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jun 7, 2025
    Cite
    Data Insights Market (2025). Numerical Analysis Software Report [Dataset]. https://www.datainsightsmarket.com/reports/numerical-analysis-software-540809
    Explore at:
    pdf, doc, pptAvailable download formats
    Dataset updated
    Jun 7, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Numerical Analysis Software market is experiencing robust growth, driven by increasing demand for advanced computational capabilities across diverse sectors. The market, estimated at $2.5 billion in 2025, is projected to grow at a Compound Annual Growth Rate (CAGR) of 8% from 2025 to 2033, reaching an estimated market value of $4.8 billion by 2033. This expansion is fueled by several key factors. The proliferation of big data and the need for efficient data analysis techniques are pushing organizations to adopt sophisticated numerical analysis software. Advances in artificial intelligence (AI), machine learning (ML), and high-performance computing (HPC) are creating new applications and opportunities for numerical analysis software, and the rising adoption of cloud-based solutions is also contributing to market growth by offering scalability and cost-effectiveness. However, the market faces certain restraints, including the high cost of advanced software licenses and the need for specialized expertise to use these tools effectively. The market is segmented by software type (commercial vs. open-source), application (engineering, finance, scientific research), and deployment mode (on-premise vs. cloud). Key players include established names like MathWorks (MATLAB) and Analytica, alongside open-source options like GNU Octave and Scilab. The competitive landscape is characterized by a mix of large vendors offering comprehensive solutions and smaller players focusing on niche applications.

    The continued growth of the Numerical Analysis Software market hinges on several key trends. The increasing integration of numerical analysis techniques within broader data science and analytics workflows is leading to more user-friendly interfaces and integrated platforms. The growing emphasis on data security and privacy regulations is influencing the development of secure and compliant software solutions. The market also sees ongoing innovation in algorithms and computational techniques, driving improvements in accuracy, speed, and efficiency. The rise of specialized applications within specific industries, such as financial modeling, weather forecasting, and drug discovery, fuels further growth, and the adoption of advanced hardware, such as GPUs and specialized processors, is enhancing the performance and capabilities of numerical analysis software.
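    As a rough sanity check on these headline figures, compound growth at a constant CAGR r follows V_t = V_0 (1 + r)^t; a short illustrative computation:

    # Compound growth: V_t = V_0 * (1 + r) ** t.
    # With V_0 = $2.5B in 2025 and r = 8% over the 8 years to 2033,
    # this yields about $4.63B, so the stated $4.8B endpoint implies
    # a CAGR slightly above 8%.
    v0, r, years = 2.5, 0.08, 2033 - 2025
    print(round(v0 * (1 + r) ** years, 2))  # 4.63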

  20. Big Lake, TX Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Big Lake from 2000 to 2023 // 2024 Edition

    • neilsberg.com
    csv, json
    Updated Jul 30, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Big Lake, TX Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Big Lake from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/big-lake-tx-population-by-year/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Big Lake, Texas
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
    Measurement technique
    The data presented in this dataset is derived from 20+ years of data from the U.S. Census Bureau Population Estimates Program (PEP), covering 2000 to 2023. To measure the variables, namely (a) population and (b) population change (in absolute terms and as a percentage), we analyzed and tabulated the data for each of the years between 2000 and 2023. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Big Lake population over the last two decades. It lists the population for each year, along with the year-on-year change in population, both in absolute terms and as a percentage. The dataset can be used to understand the population change of Big Lake across this period: for example, whether the population is declining or increasing, when the population peaked, and whether it is still growing or has already passed its peak. We can also compare the trend with the overall trend of the United States population over the same period.

    Key observations

    In 2023, the population of Big Lake was 2,753, a 0.58% increase year-over-year from 2022. Previously, in 2022, the population was 2,737, a decline of 3.12% from 2,825 in 2021. Over the period 2000 to 2023, the population of Big Lake decreased by 76. The peak population in this period was 3,355, in 2019. The numbers suggest that the population has already peaked and is now declining. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2023

    Variables / Data Columns

    • Year: This column displays the data year (Measured annually and for years 2000 to 2023)
    • Population: The population for the specific year for the Big Lake is shown in this column.
    • Year on Year Change: This column displays the change in Big Lake population for each year compared to the previous year.
    • Change in Percent: This column displays the year-on-year change as a percentage; values may differ slightly from hand computation due to rounding. Both change columns can be recomputed from the Year and Population columns, as shown in the sketch below.
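    A minimal pandas sketch of recomputing the two change columns from Year and Population (the CSV file name is an assumption):

    import pandas as pd

    # File name is illustrative; column names follow the list above.
    df = pd.read_csv("big_lake_tx_population_by_year.csv").sort_values("Year")
    df["Year on Year Change"] = df["Population"].diff()
    df["Change in Percent"] = df["Population"].pct_change() * 100
    print(df.tail())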

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Big Lake Population by Year.
