Autoscraping's Zillow USA Real Estate Data is a comprehensive and meticulously curated dataset that covers over 10 million property listings across the United States. This data product is designed to meet the needs of professionals across various sectors, including real estate investment, market analysis, urban planning, and academic research. Our dataset is unique in its depth, accuracy, and timeliness, ensuring that users have access to the most relevant and actionable information available.
What Makes Our Data Unique? The uniqueness of our data lies in its extensive coverage and the precision of the information provided. Each property listing is enriched with detailed attributes, including, but not limited to, full addresses, asking prices, property types, number of bedrooms and bathrooms, lot size, and Zillow’s proprietary value and rent estimates. This level of detail allows users to perform in-depth analyses, make informed decisions, and gain a competitive edge in their respective fields.
Furthermore, our data is continually updated to reflect the latest market conditions, ensuring that users always have access to current and accurate information. We prioritize data quality, and each entry is carefully validated to maintain a high standard of accuracy, making this dataset one of the most reliable on the market.
Data Sourcing: The data is sourced directly from Zillow, one of the most trusted names in the real estate industry. By leveraging Zillow’s extensive real estate database, Autoscraping ensures that users receive data that is not only comprehensive but also highly reliable. Our proprietary scraping technology ensures that data is extracted efficiently and without errors, preserving the integrity and accuracy of the original source. Additionally, we implement strict data processing and validation protocols to filter out any inconsistencies or outdated information, further enhancing the quality of the dataset.
Primary Use-Cases and Vertical Applications: Autoscraping's Zillow USA Real Estate Data is versatile and can be applied across a variety of use cases and industries:
Real Estate Investment: Investors can use this data to identify lucrative opportunities, analyze market trends, and compare property values across different regions. The detailed pricing and valuation data allow for comprehensive due diligence and risk assessment.
Market Analysis: Market researchers can leverage this dataset to track real estate trends, evaluate the performance of different property types, and assess the impact of economic factors on property values. The dataset’s nationwide coverage makes it ideal for both local and national market studies.
Urban Planning and Development: Urban planners and developers can use the data to identify growth areas, plan new developments, and assess the demand for different property types in various regions. The detailed location data is particularly valuable for site selection and zoning analysis.
Academic Research: Universities and research institutions can utilize this data for studies on housing markets, urbanization, and socioeconomic trends. The comprehensive nature of the dataset allows for a wide range of academic applications.
Integration with Our Broader Data Offering: Autoscraping's Zillow USA Real Estate Data is part of our broader data portfolio, which includes various datasets focused on real estate, market trends, and consumer behavior. This dataset can be seamlessly integrated with our other offerings to provide a more holistic view of the market. For example, combining this data with our consumer demographic datasets can offer insights into the relationship between property values and demographic trends.
By choosing Autoscraping's data products, you gain access to a suite of complementary datasets that can be tailored to meet your specific needs. Whether you’re looking to gain a comprehensive understanding of the real estate market, identify new investment opportunities, or conduct advanced research, our data offerings are designed to provide you with the insights you need.
The Procurement Analysis Tool (PAT) was developed at NREL to help organizations explore renewable energy options that align with their goals. Users input facility data and answer goal-oriented questions. PAT analyzes this information to identify potential wind, solar, or storage resources and suitable procurement options (PPA, Green Tariffs) that align with their budget, location, and sustainability goals. For more information see the "Procurement Analysis Tool" resource below. The Renewable Electricity Procurement Options Data (RE-POD) was an aggregated dataset meant to help local jurisdictions and utility customers within those jurisdictions understand the options that may be available to them to procure renewable electricity or renewable energy credits to meet energy goals. RE-POD has been discontinued and replaced with the PAT. This data is part of a suite of state and local energy profile data available at the "State and Local Energy Profile Data Suite" link below and builds on Cities-LEAP energy modeling, available at the "EERE Cities-LEAP Page" link below. Examples of how to use the data to inform energy planning can be found at the "Example Uses" link below.
The State Contract and Procurement Registration System (SCPRS) was established in 2003 as a centralized database of information on State contracts and purchases over $5,000. eSCPRS represents the data captured in the State's eProcurement (eP) system, Bidsync, as of March 16, 2009. The data provided is an extract from that system for fiscal years 2012-2013, 2013-2014, and 2014-2015.
Data Limitations:
Some purchase orders have multiple UNSPSC numbers; however, only the first was used to identify the purchase order. Multiple UNSPSC numbers were included to provide additional data for a DGS special event; however, this affects the formatting of the file. The source system, Bidsync, is being deprecated, and these issues will be resolved as state systems transition to Fi$cal.
Data Collection Methodology:
The data collection process starts with a data file from eSCPRS that is scrubbed and standardized prior to being uploaded into a SQL Server database. There are four primary tables. The Supplier, Department, and United Nations Standard Products and Services Code (UNSPSC) tables are reference tables. The Supplier and Department tables are updated and mapped to the appropriate numbering schema and naming conventions. The UNSPSC table is used to categorize line-item information and requires no further manipulation. The Purchase Order table contains raw data that requires conversion to the correct data format and mapping to the corresponding data fields. A stacking method is applied to the table to eliminate blanks where needed, and extraneous characters are removed from fields. The four tables are joined, and queries are executed to update the final Purchase Order Dataset table. Once the scrubbing and standardization process is complete, the data are uploaded into the SQL Server database.
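To illustrate the join logic described above, here is a minimal sketch in pandas rather than SQL Server; the file, table, and key names are assumptions for illustration only, not the actual schema.

# Hypothetical sketch of joining the reference tables to the Purchase Order table.
# File and column names are illustrative assumptions; the production process uses SQL Server.
import pandas as pd

purchase_orders = pd.read_csv("purchase_orders.csv")
suppliers = pd.read_csv("suppliers.csv")
departments = pd.read_csv("departments.csv")
unspsc = pd.read_csv("unspsc.csv")

purchase_order_dataset = (
    purchase_orders
    .merge(suppliers, on="supplier_id", how="left")
    .merge(departments, on="department_id", how="left")
    .merge(unspsc, on="unspsc_code", how="left")
)
purchase_order_dataset.to_csv("purchase_order_dataset.csv", index=False)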
Secondary/Related Resources:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.
For more details see the included README file and companion paper:
Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In Proceedings of the 2022 Mining Software Repositories Conference (MSR 2022), 23-24 May 2022, Pittsburgh, Pennsylvania, United States. ACM, 2022.
If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
International Data & Economic Analysis (IDEA) is USAID's comprehensive source of economic and social data and analysis. IDEA brings together over 12,000 data series from over 125 sources into one location for easy access by USAID and its partners through the USAID public website. The data are broken down by countries, years and the following sectors: Economy, Country Ratings and Rankings, Trade, Development Assistance, Education, Health, Population, and Natural Resources. IDEA regularly updates the database as new data become available. Examples of IDEA sources include the Demographic and Health Surveys, STATcompiler; UN Food and Agriculture Organization, Food Price Index; IMF, Direction of Trade Statistics; Millennium Challenge Corporation; and World Bank, World Development Indicators. The database can be queried by navigating to the site displayed in the Home Page field below.
https://dataintelo.com/privacy-and-policy
The global strategic sourcing software market size was valued at approximately USD 4.5 billion in 2023 and is projected to reach USD 9.2 billion by 2032, exhibiting a compound annual growth rate (CAGR) of 8.2% during the forecast period. The growth of this market is driven by the increasing demand for efficient supplier management processes and cost reduction strategies across various industries. As businesses strive to optimize their procurement processes and supplier relationships, strategic sourcing software has emerged as a pivotal tool in enabling organizations to manage supplier data, contracts, and performance effectively, thereby enhancing overall operational efficiency and competitive advantage.
One of the primary growth factors driving the strategic sourcing software market is the rising complexity of supply chains in the globalized economy. Organizations are increasingly seeking sophisticated solutions to manage multiple suppliers across different geographies, which has led to heightened demand for software that can streamline supplier selection, evaluation, and management processes. Moreover, the increasing emphasis on sustainable and ethical sourcing practices is propelling the adoption of strategic sourcing software, as companies aim to ensure compliance with environmental and social governance (ESG) criteria. This software aids in tracking supplier compliance with sustainability standards, thus bolstering its appeal in the market.
Another significant contributor to the growth of the strategic sourcing software market is the digital transformation initiatives being undertaken by organizations worldwide. As companies look to leverage technology to drive procurement efficiencies, there is a shift towards adopting cloud-based solutions that offer scalability and real-time analytics capabilities. Cloud-based strategic sourcing software allows organizations to make data-driven decisions, improve supplier collaboration, and enhance transparency across the procurement lifecycle. Additionally, the integration of advanced technologies such as artificial intelligence (AI) and machine learning (ML) within these software solutions is expected to unlock new opportunities for enhanced supplier insights and predictive analytics, further stimulating market growth.
The increasing adoption of strategic sourcing software in various end-user industries is another vital factor propelling market expansion. Industries such as retail, manufacturing, healthcare, and BFSI are recognizing the value of robust sourcing solutions in driving cost efficiencies, minimizing supply chain disruptions, and enhancing supplier performance. In the healthcare sector, for example, strategic sourcing software is being leveraged to manage complex supplier networks and optimize procurement cycles, ensuring timely availability of critical medical supplies. Similarly, in the BFSI sector, strategic sourcing solutions help in managing vendor risks and ensuring compliance with regulatory requirements. This growing cross-industry adoption underscores the wide-ranging applicability and benefits of strategic sourcing software.
In the realm of procurement, the significance of Purchasing Software cannot be overstated. As organizations strive to streamline their buying processes and enhance operational efficiency, purchasing software emerges as a critical component in achieving these objectives. This software facilitates the automation of purchase orders, supplier interactions, and invoice management, thereby reducing manual errors and accelerating procurement cycles. Moreover, purchasing software provides valuable insights into spending patterns and supplier performance, enabling companies to make informed purchasing decisions and negotiate better terms. As businesses continue to prioritize cost control and process optimization, the adoption of purchasing software is expected to rise, further driving the strategic sourcing software market.
The strategic sourcing software market is segmented by components into software and services. The software segment comprises standalone solutions as well as integrated suites that offer functionalities such as supplier management, contract management, spend analysis, and more. These software solutions are increasingly being favored due to their ability to automate and optimize procurement processes, thereby reducing operational costs and improving supplier relationships. The integration of AI and ML technologies within these
The global big data and business analytics (BDA) market was valued at 168.8 billion U.S. dollars in 2018 and is forecast to grow to 215.7 billion U.S. dollars by 2021. In 2021, more than half of BDA spending will go towards services: IT services are projected to make up around 85 billion U.S. dollars, and business services will account for the remainder.

Big data: High volume, high velocity, and high variety - one or more of these characteristics is used to define big data, the kind of data sets that are too large or too complex for traditional data processing applications. Fast-growing mobile data traffic, cloud computing traffic, and the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets. For example, connected IoT devices are projected to generate 79.4 ZB of data in 2025.

Business analytics: Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate business insights. The size of the business intelligence and analytics software application market is forecast to reach around 16.5 billion U.S. dollars in 2022. Growth in this market is driven by a focus on digital transformation, demand for data visualization dashboards, and increased adoption of cloud computing.
https://www.verifiedmarketresearch.com/privacy-policy/
Real World Evidence Solutions Market size was valued at USD 1.30 Billion in 2024 and is projected to reach USD 3.71 Billion by 2031, growing at a CAGR of 13.92% during the forecast period 2024-2031.
Global Real World Evidence Solutions Market Drivers
The Real World Evidence Solutions Market is influenced by a variety of drivers. These may include:
Growing Need for Evidence-Based Healthcare: Real-world evidence (RWE) is becoming more and more important in healthcare decision-making for stakeholders such as payers, providers, and regulators. In addition to traditional clinical trial data, RWE solutions offer important insights into the efficacy, safety, and value of healthcare interventions in real-world settings.

Growing Use of RWE by Pharmaceutical Companies: Pharmaceutical companies are using RWE solutions to support market entry, post-marketing surveillance, and drug development initiatives. With RWE, pharmaceutical businesses can find new indications for their current medications, improve clinical trial designs, and demonstrate the value of their products to payers and providers.

Increasing Priority for Value-Based Healthcare: As value-based healthcare models gain traction, the emphasis on demonstrating the cost-effectiveness of healthcare interventions in real-world settings is growing. RWE solutions are essential in evaluating the economic impact and real-world consequences of healthcare interventions to support value-based decision-making.

Technological and Data Analytics Advancements: RWE solutions are becoming more capable due to advances in machine learning, artificial intelligence, and big data analytics. With these technologies, healthcare stakeholders can obtain actionable insights from the analysis of vast and varied datasets, including patient-generated data, claims data, and electronic health records.

Regulatory Support for RWE Integration: Regulatory organisations, including the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA), are progressively integrating RWE into regulatory decision-making processes. Initiatives such as the FDA's Real-World Evidence Program and the EMA's Adaptive Pathways and PRIority MEdicines (PRIME) programme are making it easier to incorporate RWE into regulatory submissions and drug development.

Increasing Emphasis on Patient-Centric Healthcare: The value of patient-reported outcomes and real-world experiences in healthcare decision-making is becoming more widely acknowledged. RWE technologies facilitate the collection and examination of patient-centred data, offering valuable insights into treatment efficacy, patient preferences, and quality-of-life outcomes.

Extension of RWE Use Cases: RWE solutions are being applied in drug development, post-market surveillance, health economics and outcomes research (HEOR), comparative effectiveness research, and market access, among other areas of healthcare. This expansion of RWE use cases is driving the need for a variety of RWE solutions tailored to the requirements of different stakeholders.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data supporting the Springer Nature Data Availability Statement (DAS) analysis in the State of Open Data 2024. SOOD_2024_special_analysis_DAS_SN.xlsx contains the DAS, DOI, publication date, DAS categories, and the related country (by institution of any author). SOOD 2024_DAS_analysis_sharing.xlsx contains the summary data by country and data sharing type.

Utilizing the Dimensions database, we identified articles containing key DAS identifiers such as “Data Availability Statement” or “Availability of Data and Materials” within their full text. Digital Object Identifiers (DOIs) of these articles were collected and matched against Springer Nature’s XML database to extract the DAS for each article. The extracted DAS were categorized into specific sharing types using text and data matching terms. For statements indicating that data are publicly available in a repository, we matched against a predefined list of repository identifiers, names, and URLs. The DAS were classified into the following categories:

1. Data are available from the author on request.
2. Data are included in the manuscript or its supplementary material.
3. Some or all of the data are publicly available, for example in a repository.
4. Figure source data are included with the manuscript.
5. Data availability is not applicable.
6. Data are declared as not available by the author.
7. Data available online but not in a repository.

These categories are non-exclusive: more than one can apply to any one article. Publications outside the 2019–2023 range and non-article publication types (e.g., book chapters) that were initially included in the Dimensions search results were excluded from the final dataset; articles were included in the final analysis after applying these exclusion criteria. Upon processing, it was found that only 370 results were returned for Botswana across the five-year period; due to this low number, Botswana was not included in the DAS-focused country-level analysis. This analysis does not assess the accuracy of the DAS in the context of each individual article. There was no manual verification of the categories applied; as a result, terms used out of context could have led to misclassification. Approximately 5% of articles remained unclassified following text and data matching due to these limitations.
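As an illustration of the text-matching approach described above, the sketch below classifies a DAS string into non-exclusive categories by keyword matching. The matching terms and category keys are invented for illustration; they are not the terms or repository list used in the actual analysis.

# Illustrative keyword-based DAS classification (categories are non-exclusive).
# The matching terms below are assumptions, not the terms used in the analysis.
DAS_CATEGORIES = {
    "available on request": ["available from the corresponding author", "on reasonable request"],
    "in manuscript or supplementary material": ["included in this article", "supplementary material"],
    "publicly available in a repository": ["zenodo", "figshare", "dryad", "doi.org"],
    "not applicable": ["not applicable"],
}

def classify_das(statement):
    text = statement.lower()
    return {category for category, terms in DAS_CATEGORIES.items()
            if any(term in text for term in terms)}

print(classify_das("The datasets generated are available in the Zenodo repository."))
# -> {'publicly available in a repository'}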
Project abstract: The ERC project 'Food citizens?' is a comparative analysis of a growing phenomenon in Europe: collective food procurement, namely networks of people who organize direct food production, distribution, and consumption. The project studied several cases in Gdańsk, Rotterdam and Turin: self-production and foraging (for example in food gardens); short food chains (for example through food cooperatives), and local food governance (for example through food councils, but also social networks or NGOs). ‘Solidarity’, ‘Diversity’, ‘Skill’, and ‘Scale’ are our categories of analysis, to ask questions such as: Which skills do people involved in collective food procurement acquire or lack? How do they operate across and within diverse communities? Do their networks scale ‘up’ or ‘out’, and how? How do they interpret and articulate solidarity? To what extent do collective food procurement networks indicate emerging forms of ‘food citizenship’, and what might this mean? Which practices and notions of civic participation, solidarity, diversity and belonging do they use and produce, and how?
Three Ph.D. candidates conducted ethnographic research for 16 months in the three cities. This dataset consists of the biweekly field reports they wrote to document their fieldwork and communicate field developments within the team.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This description is part of the blog post "Systematic Literature Review of teaching Open Science" https://sozmethode.hypotheses.org/839
In my opinion, we do not pay enough attention to teaching Open Science in higher education. I therefore designed a seminar that teaches students the practices of Open Science by doing qualitative research, and I wrote the article “Teaching Open Science and qualitative methods” about it. For that article, I started to review the literature on “Teaching Open Science”. The result of my literature review is that certain aspects of Open Science are used in teaching; however, Open Science with all its aspects (Open Access, Open Data, Open Methodology, Open Science Evaluation and Open Science Tools) is not addressed in publications about teaching.
Based on this insight, I have started a systematic literature review. I quickly realized that I need help to analyse and interpret the articles and to evaluate my preliminary findings. The different disciplinary cultures of teaching the various aspects of Open Science are especially challenging, as I, a social scientist, do not have enough insight to interpret the results correctly on my own. Therefore, I would like to invite you to participate in this research project!
I am now looking for people who would like to join a collaborative process to further explore and write the systematic literature review on “Teaching Open Science”, because I want to turn this project into a Massive Open Online Paper (MOOP). According to the ten rules of Tennant et al. (2019) on MOOPs, it is crucial to find a core group that is enthusiastic about the topic. I am therefore looking for people who are interested in creating the structure of the paper and writing it together with me, as well as people who want to search for and review literature or evaluate the literature I have already found. Together with the interested persons, I would then define the rules for the project (cf. Tennant et al. 2019). So if you are interested in contributing to the further search for articles and/or to the interpretation and writing of results, please get in touch. For everyone interested in contributing, the list of articles collected so far is freely accessible at Zotero: https://www.zotero.org/groups/2359061/teaching_open_science. The figure shown below provides a first overview of my ongoing work. I created the figure with the free software yEd and uploaded the file to Zenodo, so everyone can download and work with it:
To make transparent what I have done so far, I will first introduce what a systematic literature review is. Secondly, I describe the decisions I made to start with the systematic literature review. Third, I present the preliminary results.
Systematic literature review – an Introduction
Systematic literature reviews “are a method of mapping out areas of uncertainty, and identifying where little or no relevant research has been done” (Petticrew/Roberts 2008: 2). Fink defines the systematic literature review as a “systematic, explicit, and reproducible method for identifying, evaluating, and synthesizing the existing body of completed and recorded work produced by researchers, scholars, and practitioners” (Fink 2019: 6). The aim of a systematic literature review is to surpass the subjectivity of a researcher’s search for literature. However, there can never be an objective selection of articles, because the researcher has already made a preselection, for example by deciding on search strings such as “Teaching Open Science”. In this respect, transparency is the core criterion for a high-quality review.
In order to achieve high quality and transparency, Fink (2019: 6-7) proposes the following seven steps:
I have adapted these steps for the “Teaching Open Science” systematic literature review. In the following, I will present the decisions I have made.
Systematic literature review – decisions I made
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information related to diet and energy flow is fundamental to a diverse range of Antarctic and Southern Ocean biological and ecosystem studies. This metadata record describes a database of such information being collated by the SCAR Expert Groups on Antarctic Biodiversity Informatics (EG-ABI) and Birds and Marine Mammals (EG-BAMM) to assist the scientific community in this work. It includes data related to diet and energy flow from conventional (e.g. gut content) and modern (e.g. molecular) studies, stable isotopes, fatty acids, and energetic content. It is a product of the SCAR community and open for all to participate in and use.
Data have been drawn from published literature, existing trophic data collections, and unpublished data. The database comprises five principal tables, relating to (i) direct sampling methods of dietary assessment (e.g. gut, scat, and bolus content analyses, stomach flushing, and observed predation), (ii) stable isotopes, (iii) lipids, (iv) DNA-based diet assessment, and (v) energetics values. The schemas of these tables are described below, and a list of the sources used to populate the tables is provided with the data.
A range of manual and automated checks were used to ensure that the entered data were as accurate as possible. These included visual checking of transcribed values, checking of row or column sums against known totals, and checking for values outside of allowed ranges. Suspicious entries were re-checked against the original source.
Notes on names: Names have been validated against the World Register of Marine Species (http://www.marinespecies.org/). For uncertain taxa, the most specific taxonomic name has been used (e.g. prey reported in a study as "Pachyptila sp." will appear here as "Pachyptila"; "Cephalopods" will appear as "Cephalopoda"). Uncertain species identifications (e.g. "Notothenia rossii?" or "Gymnoscopelus cf. piabilis") have been assigned the genus name (e.g. "Notothenia", "Gymnoscopelus"). Original names have been retained in a separate column to allow future cross-checking. WoRMS identifiers (APHIA_ID numbers) are given where possible.
Grouped prey data in the diet sample table need to be handled with a bit of care. Papers commonly report prey statistics aggregated over groups of prey - e.g. one might give the diet composition by individual cephalopod prey species, and then an overall record for all cephalopod prey. The PREY_IS_AGGREGATE column identifies such records. This allows us to differentiate grouped data like this from unidentified prey items from a certain prey group - for example, an unidentifiable cephalopod record would be entered as Cephalopoda (the scientific name), with "N" in the PREY_IS_AGGREGATE column. A record that groups together a number of cephalopod records, possibly including some unidentifiable cephalopods, would also be entered as Cephalopoda, but with "Y" in the PREY_IS_AGGREGATE column. See the notes on PREY_IS_AGGREGATE, below.
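A minimal pandas sketch of this point, assuming the diet table has been exported to a CSV file (here called diet.csv, a hypothetical file name) with the column names documented in the schema below; the R packages linked below provide ready-made access without this step.

# Exclude aggregate rows before tallying prey, so grouped records are not double-counted.
import pandas as pd

diet = pd.read_csv("diet.csv")  # hypothetical export of the diet data table

non_aggregate = diet[diet["PREY_IS_AGGREGATE"] == "N"]
prey_counts = non_aggregate.groupby("PREY_NAME").size().sort_values(ascending=False)
print(prey_counts.head())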
There are two related R packages that provide data access and functionality for working with these data. See the package home pages for more information: https://github.com/SCAR/sohungry and https://github.com/SCAR/solong.
Data table schemas
Sources data table
SOURCE_ID: The unique identifier of this source
DETAILS: The bibliographic details for this source (e.g. "Hindell M (1988) The diet of the royal penguin Eudyptes schlegeli at Macquarie Island. Emu 88:219–226")
NOTES: Relevant notes about this source – if it’s a published paper, this is probably the abstract
DOI: The DOI of the source (paper or dataset), in the form "10.xxxx/yyyy"
Diet data table
RECORD_ID: The unique identifier of this record
SOURCE_ID: The identifier of the source study from which this record was obtained (see corresponding entry in the sources data table)
SOURCE_DETAILS, SOURCE_DOI: The details and DOI of the source, copied from the sources data table for convenience
ORIGINAL_RECORD_ID: The identifier of this data record in its original source, if it had one
LOCATION: The name of the location at which the data was collected
WEST: The westernmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)
EAST: The easternmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)
SOUTH: The southernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)
NORTH: The northernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)
ALTITUDE_MIN: The minimum altitude of the sampling region, in metres
ALTITUDE_MAX: The maximum altitude of the sampling region, in metres
DEPTH_MIN: The shallowest depth of the sampling, in metres
DEPTH_MAX: The deepest depth of the sampling, in metres
OBSERVATION_DATE_START: The start of the sampling period
OBSERVATION_DATE_END: The end of the sampling period. If sampling was carried out over multiple seasons (e.g. during January of 2002 and January of 2003), this will be the first and last dates (in this example, from 1-Jan-2002 to 31-Jan-2003)
PREDATOR_NAME: The name of the predator. This may differ from predator_name_original if, for example, taxonomy has changed since the original publication, if the original publication had spelling errors or used common (not scientific) names
PREDATOR_NAME_ORIGINAL: The name of the predator, as it appeared in the original source
PREDATOR_APHIA_ID: The numeric identifier of the predator in the WoRMS taxonomic register
PREDATOR_WORMS_RANK, PREDATOR_WORMS_KINGDOM, PREDATOR_WORMS_PHYLUM, PREDATOR_WORMS_CLASS, PREDATOR_WORMS_ORDER, PREDATOR_WORMS_FAMILY, PREDATOR_WORMS_GENUS: The taxonomic details of the predator, from the WoRMS taxonomic register
PREDATOR_GROUP_SOKI: A descriptive label of the group to which the predator belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)
PREDATOR_LIFE_STAGE: Life stage of the predator, e.g. "adult", "chick", "larva", "juvenile". Note that if a food sample was taken from an adult animal, but that food was destined for a juvenile, then the life stage will be "juvenile" (this is common with seabirds feeding chicks)
PREDATOR_BREEDING_STAGE: Stage of the breeding season of the predator, if applicable, e.g. "brooding", "chick rearing", "nonbreeding", "posthatching"
PREDATOR_SEX: Sex of the predator: "male", "female", "both", or "unknown"
PREDATOR_SAMPLE_COUNT: The number of predators for which data are given. If (say) 50 predators were caught but only 20 analysed, this column will contain 20. For scat content studies, this will be the number of scats analysed
PREDATOR_SAMPLE_ID: The identifier of the predator(s). If predators are being reported at the individual level (i.e. PREDATOR_SAMPLE_COUNT = 1) then PREDATOR_SAMPLE_ID is the individual animal ID. Alternatively, if the data values being entered here are from a group of predators, then the PREDATOR_SAMPLE_ID identifies that group of predators. PREDATOR_SAMPLE_ID values are unique within a source (i.e. SOURCE_ID, PREDATOR_SAMPLE_ID pairs are globally unique). Rows with the same SOURCE_ID and PREDATOR_SAMPLE_ID values relate to the same predator individual or group of individuals, and so can be combined (e.g. for prey diversity analyses; a usage sketch follows the schema listing below). Subsamples are indicated by a decimal number S.nnn, where S is the parent PREDATOR_SAMPLE_ID, and nnn (001-999) is the subsample number. Studies will sometimes report detailed prey information for a large sample, but then report prey information for various subsamples of that sample (e.g. broken down by predator sex, or sampling season). In the simplest case, the diet of each predator will be reported only once in the study, and in this scenario the PREDATOR_SAMPLE_ID values will simply be 1 to N (for N predators).
PREDATOR_SIZE_MIN, PREDATOR_SIZE_MAX, PREDATOR_SIZE_MEAN, PREDATOR_SIZE_SD: The minimum, maximum, mean, and standard deviation of the size of the predators in the sample
PREDATOR_SIZE_UNITS: The units of size (e.g. "mm")
PREDATOR_SIZE_NOTES: Notes on the predator size information, including a definition of what the size value represents (e.g. "total length", "standard length")
PREDATOR_MASS_MIN, PREDATOR_MASS_MAX, PREDATOR_MASS_MEAN, PREDATOR_MASS_SD: The minimum, maximum, mean, and standard deviation of the mass of the predators in the sample
PREDATOR_MASS_UNITS: The units of mass (e.g. "g", "kg")
PREDATOR_MASS_NOTES: Notes on the predator mass information, including a definition of what the mass value represents
PREY_NAME: The scientific name of the prey item (corrected, if necessary)
PREY_NAME_ORIGINAL: The name of the prey item, as it appeared in the original source
PREY_APHIA_ID: The numeric identifier of the prey in the WoRMS taxonomic register
PREY_WORMS_RANK, PREY_WORMS_KINGDOM, PREY_WORMS_PHYLUM, PREY_WORMS_CLASS, PREY_WORMS_ORDER, PREY_WORMS_FAMILY, PREY_WORMS_GENUS: The taxonomic details of the prey, from the WoRMS taxonomic register
PREY_GROUP_SOKI: A descriptive label of the group to which the prey belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)
PREY_IS_AGGREGATE: "Y" indicates that this row is an aggregation of other rows in this data source. For example, a study might give a number of individual squid species records, and then an overall squid record that encompasses the individual records. Use the PREY_IS_AGGREGATE information to avoid double-counting during analyses
PREY_LIFE_STAGE: Life stage of the prey (e.g. "adult", "chick", "larva")
PREY_SEX: The sex of the prey ("male", "female", "both", or "unknown"). Note that this is generally "unknown"
PREY_SAMPLE_COUNT: The number of prey individuals from which size and mass measurements were made (note: this is NOT the total number of individuals of
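Following the PREDATOR_SAMPLE_ID and PREY_IS_AGGREGATE conventions above, here is a minimal pandas sketch of a per-sample prey diversity calculation; diet.csv is again a hypothetical export of the diet data table.

# Keep top-level samples only (drop S.nnn subsample rows) and exclude aggregate prey rows,
# then count distinct prey taxa per (SOURCE_ID, PREDATOR_SAMPLE_ID) sample.
import pandas as pd

diet = pd.read_csv("diet.csv", dtype={"PREDATOR_SAMPLE_ID": str})

top_level = diet[~diet["PREDATOR_SAMPLE_ID"].str.contains(".", regex=False, na=False)]
top_level = top_level[top_level["PREY_IS_AGGREGATE"] == "N"]

prey_diversity = (
    top_level.groupby(["SOURCE_ID", "PREDATOR_SAMPLE_ID"])["PREY_NAME"]
    .nunique()
    .rename("n_prey_taxa")
)
print(prey_diversity.sort_values(ascending=False).head())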
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Original EPIC-1 data source and documented intermediate data manipulation. These files are provided in order to ensure a complete audit trail and documentation. These files include original source data, as well as files created in the process of cleaning and preparing the datasets found in section I of the dataverse (1. Pooled and Adjusted EPIC Data). These intermediary files contain documentation of any adjustments in assumptions, currency conversions, and data cleaning processes. Ordinarily, analysis would be done using the datasets in section I; researchers would not need to use the files in this section except to trace the origin of the variables back to the original source.

“Adjustments for the EPIC-2 data is conducted with advice and input from data collection team (EPIC-1). The magnitude of these adjustments are documented in the table attached. These documented adjustments explained the lion’s share of the discrepancies, leaving only minor unaccounted differences in the data (Δ range 0% - 1.1%).”

“In addition to using the sampling weights, any extrapolation to achieve nationwide cost estimates for Benin, Ghana, Zambia, and Honduras uses scale-up factor to take into account facilities that are outside of the sampling frame. For example, after taking into account the sampling weights, the total facility-level delivery cost in Benin sampling frame (343 facilities) is $2,094,031. To estimate the total facility-level delivery cost in the entire country of Benin (695 facilities), the sample-frame cost estimate is multiplied by 695/343.”

“Additional adjustments for the EPIC-2 analysis include the series of decisions for weighting, methods, and data sources. For EPIC-2 analyses, average costs per dose and DTP3 were calculated as total costs divided by total outputs, representing a funder’s perspective. We also report results as a simple average of the site-level cost per output. All estimates were adjusted for survey weighting. In particular, the analyses in EPIC-2 relied exclusively on information from the sample, whereas in some instance EPIC-1 teams were able to strategically leverage other available data sources.”
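A worked example of the scale-up factor described in the quoted passage, using the Benin figures given above:

# Scale-up from the sampling-frame cost estimate to a nationwide estimate (figures from the text).
sample_frame_cost = 2_094_031     # total facility-level delivery cost in the Benin sampling frame (USD)
facilities_in_frame = 343
facilities_nationwide = 695

nationwide_cost = sample_frame_cost * facilities_nationwide / facilities_in_frame
print(round(nationwide_cost))     # ≈ 4,243,007 USD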
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All grid squares are approximately the size of a downtown Salt Lake City block. For each grid square, three metrics are available, each of which reflects a total from the analysis grid square and its 12 nearest grid squares (those whose center is within 0.25 miles of the boundary of the analysis grid square). Geometrically, the 12 nearest cells are two cells in each cardinal direction and one cell in each diagonal direction (see graphic below).
The three attribute values, representing metrics for the current 2015 model base year, are:
Nearby Employment Intensity (NEI):
Jobs within a quarter mile of each grid square. County-level job counts are controlled to the official Gardner Policy Institute (GPI) estimates for 2015. Job locations are then determined using the WFRC/MAG Real Estate Market Model (a customized implementation of the open-source UrbanSim software), with county assessor tax parcel data and generalized job data from the Department of Workforce Services as key model inputs.
Nearby Residential Intensity (NRI):
Households within a quarter mile of each grid square. County-level household counts are controlled to the official Gardner Policy Institute (GPI) estimates for 2015. Household locations are determined using the WFRC/MAG REM model, with county assessor tax parcel data and US Census population data (block level) as key model inputs.
Nearby Combined Intensity (NCI):
Jobs plus scaled households within a quarter mile of each grid square. To give NEI and NRI equal weighting, the NRI household number is scaled by multiplying by 1,295,513 (the total number of jobs in the region) and dividing by 731,392 (the total number of households in the region).
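A minimal sketch of the NCI weighting described above, using the regional totals given in the text (the example grid-square values are hypothetical):

# NCI = nearby jobs + nearby households scaled so jobs and households carry equal weight.
TOTAL_JOBS = 1_295_513         # total jobs in the region
TOTAL_HOUSEHOLDS = 731_392     # total households in the region

def nearby_combined_intensity(nei_jobs, nri_households):
    return nei_jobs + nri_households * TOTAL_JOBS / TOTAL_HOUSEHOLDS

print(nearby_combined_intensity(120, 85))   # hypothetical grid square: 120 jobs, 85 households nearby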
Quarter mile grid square example graphic:
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
🇺🇸 United States
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This data release contains the source code, executable file, and example files for WATRMod, a Water-budget Accounting for Tropical Regions Model code that is documented in U.S. Geological Survey Open-File Report 2022-1013 available at https://doi.org/10.3133/ofr20221013. The source code is written in the Fortran computer language. The model source code was compiled using Intel(R) Visual Fortran Intel(R) 64 for Windows, version 11.0.061, Copyright(C) 1985-2008. WATRMod can be executed (run) in a Command window by typing the command WATRMod1 (preceded by the appropriate path to the file WATRMod1.exe if the file WATRMod1.exe does not reside in the folder from which the command is issued) at the prompt; the file WATRMOD.FIL must exist in the folder from which the command is issued. The example files provided with this data release will help the user understand the input requirements to run the model.
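For users who prefer to script model runs, here is a minimal sketch of the run requirements described above; the folder and executable paths are assumptions for illustration.

# Run WATRMod1.exe from a folder that contains the control file WATRMOD.FIL,
# as required by the release notes. Paths below are illustrative assumptions.
import subprocess
from pathlib import Path

run_dir = Path("example_run")    # hypothetical folder holding the example input files
if not (run_dir / "WATRMOD.FIL").exists():
    raise FileNotFoundError("WATRMOD.FIL must exist in the folder from which the command is issued")

subprocess.run([r"C:\WATRMod\WATRMod1.exe"], cwd=run_dir, check=True)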
PEST++ Version 5 software release. This release includes ASCII-format C++11 source code, precompiled binaries for Windows 10 and Linux, and input files for the example problem shown in the report.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
As large-scale cross-linking data becomes available, new software tools for data processing and visualization are required to replace manual data analysis. XLink-DB serves as a data storage site and visualization tool for cross-linking results. XLink-DB accepts data generated with any cross-linker and stores them in a relational database. Cross-linked sites are automatically mapped onto PDB structures if available, and results are compared to existing protein interaction databases. A protein interaction network is also automatically generated for the entire data set. The XLink-DB server, including examples, and a help page are available for noncommercial use at http://brucelab.gs.washington.edu/crosslinkdbv1/. The source code can be viewed and downloaded at https://sourceforge.net/projects/crosslinkdb/?source=directory.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Example DataFrame (Teeny-Tiny Castle)
This dataset is part of a tutorial tied to the Teeny-Tiny Castle, an open-source repository containing educational tools for AI Ethics and Safety research.
How to Use
# Load the example data frame from the Hugging Face Hub (train split)
from datasets import load_dataset

dataset = load_dataset("AiresPucrs/example-data-frame", split="train")
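If you want to work with the loaded split as a pandas DataFrame (an assumption about typical usage, not part of the original snippet), it can be converted directly:

# Optional: convert the Hugging Face Dataset to a pandas DataFrame
df = dataset.to_pandas()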
https://brightdata.com/license
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.

Dataset Features

Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.

Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.

Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.

Customizable Subsets for Specific Needs

Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.

Popular Use Cases

Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.

Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.

Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.

Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.

AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.
Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.