100+ datasets found
  1. A Journey through Data Cleaning

    • kaggle.com
    zip
    Updated Mar 22, 2024
    Cite
    kenanyafi (2024). A Journey through Data Cleaning [Dataset]. https://www.kaggle.com/datasets/kenanyafi/a-journey-through-data-cleaning
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 22, 2024
    Authors
    kenanyafi
    Description

    Embark on a transformative journey with our Data Cleaning Project, where we meticulously refine and polish raw data into valuable insights. Our project focuses on streamlining data sets, removing inconsistencies, and ensuring accuracy to unlock their full potential.

    Through advanced techniques and rigorous processes, we standardize formats, address missing values, and eliminate duplicates, creating a clean and reliable foundation for analysis. By enhancing data quality, we empower organizations to make informed decisions, drive innovation, and achieve strategic objectives with confidence.

    Join us as we embark on this essential phase of data preparation, paving the way for more accurate and actionable insights that fuel success.
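The three steps the description names (standardizing formats, addressing missing values, eliminating duplicates) can be sketched in a few lines of Python. This is only an illustration, not the dataset author's method: the `name` and `email` fields, the `UNKNOWN` placeholder, and the sample rows are all hypothetical.

```python
def clean_records(records):
    """Standardize text fields, fill missing names, and drop duplicate rows."""
    cleaned = []
    seen = set()
    for row in records:
        # Standardize formats: trim whitespace, title-case names, lowercase emails.
        name = (row.get("name") or "").strip().title()
        email = (row.get("email") or "").strip().lower()
        # Address missing values: use an explicit placeholder (hypothetical choice).
        if not name:
            name = "UNKNOWN"
        # Eliminate duplicates: key each row on its normalized values.
        key = (name, email)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical sample rows, not taken from the dataset.
rows = [
    {"name": "  ada lovelace ", "email": "ADA@Example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com "},
    {"name": None, "email": "anon@example.com"},
]
print(clean_records(rows))
```

Normalizing before comparing is the design choice that matters here: the first two rows only count as duplicates once whitespace and letter case have been standardized.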

  2. DOC Tracks - Dataset - data.govt.nz - discover and use data

    • catalogue.data.govt.nz
    Updated Aug 1, 2016
    Cite
    (2016). DOC Tracks - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/doc-tracks6
    Explore at:
    Dataset updated
    Aug 1, 2016
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Department of Conservation (DOC) - recreation track lines (approx. centreline). Dataset shows all tracks. If you intend to walk a track, please confirm with your local office or the DOC website that the track isn't under a temporary or more permanent closure before embarking.

    Detailed characteristics about each track are held in the ‘CharName’ and ‘CharValue’ fields in the attribute table. Each ‘CharName’ field contains the name of a characteristic, and each ‘CharValue’ field to the right of this contains the value related to this characteristic.

    DISCLAIMER

    1. DOC makes no express or implied warranties as to the accuracy or completeness of the data or information, nor its suitability for any purpose. Errors are inevitably part of any database, and can arise by a number of means, from errors during field data collection, to errors during data entry.
    2. DOC makes no warranties or representations as to possible infringement upon copyrights or other intellectual property rights of others in the data or information.
    3. DOC will not accept liability for any direct, indirect, special or consequential damages, losses or expenses howsoever arising and relating to use, or lack of use, of the data or information supplied.

    GUIDELINES FOR THE USE OF THE INFORMATION

    4. Care should be taken in deriving conclusions from any data or information supplied.
    5. Any use of the data or information supplied should state when the data or information was acquired and that it may now be out-of-date.

    COPYRIGHT OBLIGATIONS

    6. All proprietary rights to the intellectual property in the data or information remain with the Crown as its sole property.
    7. Modification of the data and information or the addition of the information does not confer copyright or any other form of property of the original material to a user.
    8. All maps or reports that are derived from the data or information must acknowledge the Crown copyright, in the following way: Crown Copyright: Department of Conservation Te Papa Atawhai [year].
    9. This information resource may be passed onto another party, in either hard copy or electronic form. If a user does this, then it is recommended that they also supply this metadata record with the information resource.

    LICENCE

    This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
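The paired ‘CharName’ / ‘CharValue’ layout described for the attribute table can be flattened into a single mapping per track. The sketch below assumes the paired fields share a numeric suffix (`CharName1` / `CharValue1`, and so on); that naming, and the sample values, are hypothetical and should be checked against the real attribute table.

```python
def track_characteristics(attributes):
    """Pair each CharName* field with the CharValue* field sharing its suffix."""
    result = {}
    for field, name in attributes.items():
        # Skip non-CharName fields and empty characteristic slots.
        if field.startswith("CharName") and name:
            suffix = field[len("CharName"):]
            result[name] = attributes.get("CharValue" + suffix)
    return result


# Hypothetical attribute row; field suffixes and values are illustrative only.
row = {
    "CharName1": "Surface", "CharValue1": "Gravel",
    "CharName2": "Difficulty", "CharValue2": "Easy",
    "CharName3": None, "CharValue3": None,
}
print(track_characteristics(row))
```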

  3. Rapid Transit and Bus Prediction Accuracy Data

    • mbta-massdot.opendata.arcgis.com
    • gis.data.mass.gov
    • +1more
    Updated Feb 1, 2022
    Cite
    Massachusetts geoDOT (2022). Rapid Transit and Bus Prediction Accuracy Data [Dataset]. https://mbta-massdot.opendata.arcgis.com/datasets/155ab68df00145cabddfb90377201b0e
    Explore at:
    Dataset updated
    Feb 1, 2022
    Dataset authored and provided by
    Massachusetts geoDOT
    Description

    This file contains the prediction accuracy for subway and bus. Prediction accuracy is determined by the number of accurate predictions vs the number of total predictions for each "bin" or timeframe. Data is not guaranteed to be complete for any line or date.

    Fields (name, data type, example, description):

    • weekly (Date, e.g. 8/6/2020): Date representing the week's worth of data. For bus, it's the last day of the week and for subway it's the first day of the week. The date is based on "service day", so "May 1" means May 1, 3:00am ET until May 2, 2:59am ET.
    • mode (String, e.g. bus): Either "bus" for bus predictions, or "subway" for Red, Orange, Green-[B/C/D/E], Blue, and Mattapan predictions.
    • route_id (String, e.g. Green-B): The subway route the data is for. Our bus data provider does not have this data at a per-route level.
    • arrival_departure (String, e.g. departure): For bus, whether the data is about the timing of an arrival at a bus stop, or the departure from that bus stop. Bus only supports "departure". Absent on subway data because subway uses a "blended" approach of departure predictions at terminals, and arrival predictions otherwise.
    • bin (String, e.g. 0-3 min): The bin a prediction belongs to based on how far in the future the predicted event is. The options are "0-3 min", "3-6 min", "6-12 min", and "12-30 min".
    • num_predictions (Integer, e.g. 50000): The count of predictions sampled that meet the criteria of the other fields.
    • num_accurate_predictions (Integer, e.g. 30000): Of the num_predictions, how many of them were considered accurate, where "accurate" means the predicted number of seconds was within a threshold of the actual number of seconds, based on the bin. For a given bin, the passing threshold is if a vehicle arrives:
        0-3 min: 60 seconds early to 60 seconds late
        3-6 min: 90 seconds early to 120 seconds late
        6-12 min: 150 seconds early to 210 seconds late
        12-30 min: 240 seconds early to 360 seconds late

    MassDOT/MBTA shall not be held liable for any errors in this data. This includes errors of omission, commission, errors concerning the content of the data, and relative and positional accuracy of the data. This data cannot be construed to be a legal document. Primary sources from which this data was compiled must be consulted for verification of information contained in this data.
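The accuracy rule described above can be sketched as follows. The thresholds and the field names `num_predictions` and `num_accurate_predictions` come from the description; the function names are hypothetical, the last bin is keyed as "12-30 min" to match the listed bin options, and the sign convention (a positive error means the vehicle arrived later than predicted) is an assumption about how "early" and "late" are measured.

```python
# Accuracy windows per bin, in seconds: (max early, max late), from the description.
THRESHOLDS = {
    "0-3 min": (60, 60),
    "3-6 min": (90, 120),
    "6-12 min": (150, 210),
    "12-30 min": (240, 360),
}


def is_accurate(bin_name, predicted_seconds, actual_seconds):
    """Return True if the actual arrival falls inside the bin's window."""
    early, late = THRESHOLDS[bin_name]
    # Assumed convention: positive error means the vehicle came later than predicted.
    error = actual_seconds - predicted_seconds
    return -early <= error <= late


def accuracy(num_accurate_predictions, num_predictions):
    """Share of accurate predictions for one row, or None if nothing was sampled."""
    if num_predictions == 0:
        return None
    return num_accurate_predictions / num_predictions


# Using the example values from the field list: 30000 accurate out of 50000.
print(accuracy(30000, 50000))
print(is_accurate("3-6 min", predicted_seconds=200, actual_seconds=310))
```

Returning `None` for an empty bin keeps "no data" distinct from "0% accurate", which matters because the description warns that data may be incomplete for any line or date.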

  4. Taxi Trips in 2025

    • s.cnmilf.com
    • catalog.data.gov
    • +2more
    Updated Mar 4, 2025
    + more versions
    Cite
    Department of For-Hire Vehicles (2025). Taxi Trips in 2025 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/taxi-trips-in-2025
    Explore at:
    Dataset updated
    Mar 4, 2025
    Dataset provided by
    Department of For-Hire Vehicles
    Description

    Taxi trip data provided as a zip file containing pipe (|) delimited text files or csv for trips by month. DFHV provided OCTO with a taxicab trip text file representing trips. OCTO processed the data to assign block locations to pick up and drop off locations. The blocks were assigned using the original pick up and drop off lat/long coordinates and searching for the block locations in the DC Master Address Repository (radius tolerance of 250 meters and less). The pick up and drop off times were also rounded to the nearest hour. See ReadMe.txt in zip file for summary.

    In addition, the pick up and drop off locations were assigned to an airport using locator polygons for Reagan, BWI, and Dulles. These polygons generally followed the visual borders of these airports.

    The Department of For Hire Vehicles continues its growing investment in good governance and public transparency with data sets, research reports, and taxicab trip ratings available for review below. Access to information enables the public to engage in more robust debates about DFHV regulations and programs; better inform the public about the industry and agency policies; encourage innovators to design new programs; and help improve safety.

    The data provided herein is derived from electronic sources the accuracy of which cannot be guaranteed. While DFHV strives to provide data that is accurate and current, all data provided is for informational purposes only. The District of Columbia disclaims all liability for errors, omissions, completeness, accuracy and currentness of the data provided herein. Use of data provided herein constitutes acceptance of these terms. Revisions to the dashboard have included the addition of Transport DC data and an update to address inaccurate data that was inadvertently posted due to a technical glitch.
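Reading the pipe-delimited files and reproducing the "rounded to the nearest hour" step might look like the sketch below. The column names (`PICKUP_TIME`, `FARE`), the timestamp format, and the sample row are hypothetical; the real layout is documented in the ReadMe.txt inside the zip file. Rounding exactly 30 minutes upward is also an assumption, since the description does not say how ties were handled.

```python
import csv
import io
from datetime import datetime, timedelta


def round_to_nearest_hour(ts):
    """Round a datetime to the nearest hour; 30 minutes or more rounds up (assumed)."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floored >= timedelta(minutes=30):
        floored += timedelta(hours=1)
    return floored


def read_trips(text):
    """Parse pipe-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


# Hypothetical sample; real column names are listed in the dataset's ReadMe.txt.
sample = "PICKUP_TIME|FARE\n2025-01-05 14:47:10|12.50\n"
trips = read_trips(sample)
picked_up = datetime.strptime(trips[0]["PICKUP_TIME"], "%Y-%m-%d %H:%M:%S")
print(round_to_nearest_hour(picked_up))
```

The published files already contain rounded times, so the rounding helper is only useful for comparing against unrounded sources.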

  5. DOC Operations Regions - Dataset - data.govt.nz - discover and use data

    • catalogue.data.govt.nz
    + more versions
    Cite
    DOC Operations Regions - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/doc-operations-regions4
    Explore at:
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DOCs Operational Boundaries - Regions. These represent the administrative areas that DOC organises work within. Each Region has an Operations Director, based at the Regional Office, who has overarching responsibility for the Districts within that Region.

    LICENCE

    This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

    DISCLAIMER

    1. DOC makes no express or implied warranties as to the accuracy or completeness of the data or information, nor its suitability for any purpose. Errors are inevitably part of any database, and can arise by a number of means, from errors during field data collection, to errors during data entry.
    2. DOC makes no warranties or representations as to possible infringement upon copyrights or other intellectual property rights of others in the data or information.
    3. DOC will not accept liability for any direct, indirect, special or consequential damages, losses or expenses howsoever arising and relating to use, or lack of use, of the data or information supplied.

    GUIDELINES FOR THE USE OF THE INFORMATION

    4. Care should be taken in deriving conclusions from any data or information supplied.
    5. Any use of the data or information supplied should state when the data or information was acquired and that it may now be out-of-date.

    COPYRIGHT OBLIGATIONS

    6. All proprietary rights to the intellectual property in the data or information remain with the Crown as its sole property.
    7. Modification of the data and information or the addition of the information does not confer copyright or any other form of property of the original material to a user.
    8. All maps or reports that are derived from the data or information must acknowledge the Crown copyright, in the following way: Crown Copyright: Department of Conservation Te Papa Atawhai 2024.
    9. This information resource may be passed onto another party, in either hard copy or electronic form. If a user does this, then it is recommended that they also supply this metadata record with the information resource.

  6. DOC CMS Visitor Management Zones

    • doc-deptconservation.opendata.arcgis.com
    • hub.arcgis.com
    • +1more
    Updated Jul 28, 2020
    Cite
    DOC_admin (2020). DOC CMS Visitor Management Zones [Dataset]. https://doc-deptconservation.opendata.arcgis.com/items/c0c6558006bc4c1e89c74d284afeb709
    Explore at:
    Dataset updated
    Jul 28, 2020
    Dataset authored and provided by
    DOC_admin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    This dataset contains visitor management zones (VMZ) for public conservation land, as defined in the Conservation Management Strategy documents since 2012. The zones attribute public conservation land with a ‘recreation class’ that defines the type of experience the topography and accessibility imply. The visitor management zones are defined for the administrative area that each Conservation Management Strategy is written for. The data is originally based on an implementation of the Recreational Opportunity Spectrum (ROS) model (Taylor, 1993), which incorporates ‘the activity, the setting and the experience’ of an area to produce a classification of the opportunity that the area offers. This data was adapted to form this more applicable, regionally focused visitor management zone dataset.

    The dataset is primarily for cartographic purposes and should not be used for legal definition. The public conservation land that the visitor management zones are based on are valid as of the date identified in the data.

    The visitor management zones defined in this dataset need to be taken in context of the relevant Conservation Management Strategy document. For more information on Conservation Management Strategies visit https://www.doc.govt.nz/about-us/our-policies-and-plans/statutory-plans/conservation-management-strategies

    LICENCE

    This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

    DISCLAIMER

    1. DOC makes no express or implied warranties as to the accuracy or completeness of the data or information, nor its suitability for any purpose. Errors are inevitably part of any database, and can arise by a number of means, from errors during field data collection, to errors during data entry.
    2. DOC makes no warranties or representations as to possible infringement upon copyrights or other intellectual property rights of others in the data or information.
    3. DOC will not accept liability for any direct, indirect, special or consequential damages, losses or expenses howsoever arising and relating to use, or lack of use, of the data or information supplied.

    GUIDELINES FOR THE USE OF THE INFORMATION

    4. Care should be taken in deriving conclusions from any data or information supplied.
    5. Any use of the data or information supplied should state when the data or information was acquired and that it may now be out-of-date.

    COPYRIGHT OBLIGATIONS

    6. All proprietary rights to the intellectual property in the data or information remain with the Crown as its sole property.
    7. Modification of the data and information or the addition of the information does not confer copyright or any other form of property of the original material to a user.
    8. All maps or reports that are derived from the data or information must acknowledge the Crown copyright, in the following way: Crown Copyright: Department of Conservation Te Papa Atawhai [year].
    9. This information resource may be passed onto another party, in either hard copy or electronic form. If a user does this, then it is recommended that they also supply this metadata record with the information resource.

  7. MTA Transit Oriented Development (TOD) Data

    • catalog.data.gov
    • opendata.maryland.gov
    • +3more
    Updated Mar 29, 2024
    Cite
    opendata.maryland.gov (2024). MTA Transit Oriented Development (TOD) Data [Dataset]. https://catalog.data.gov/dataset/mta-transit-oriented-development-tod-data
    Explore at:
    Dataset updated
    Mar 29, 2024
    Dataset provided by
    opendata.maryland.gov
    Description

    *** DISCLAIMER - This web page is a public resource of general information. The Maryland Mass Transit Administration (MTA) makes no warranty, representation, or guarantee as to the content, sequence, accuracy, timeliness, or completeness of any of the spatial data or database information provided herein. MTA and partner state, local, and other agencies shall assume no liability for errors, omissions, or inaccuracies in the information provided regardless of how caused; or any decision made or action taken or not taken by any person relying on any information or data furnished within. ***

    This dataset assesses rail station potential for different forms of transit oriented development (TOD). A key driver of increased transit ridership in Maryland, TOD capitalizes on existing rapid transit infrastructure. The online tool focuses on the MTA’s existing MARC Commuter Rail, Metro Subway, and Central Light Rail lines and includes information specific to each station. The goal of this dataset is to give MTA planning staff, developers, local governments, and transit riders a picture of how each MTA rail station could attract TOD investment.

    In order to make this assessment, MTA staff gathered data on characteristics that are likely to influence TOD potential. The station-specific data is organized into 6 different categories referring to transit activity; station facilities; parking provision and utilization; bicycle and pedestrian access; and local zoning and land availability around each station. As a publicly shared resource, this dataset can be used by local communities to identify and prioritize area improvements in coordination with the MTA that can help attract investment around rail stations. You can view an interactive version of this dataset at geodata.md.gov/tod.

    ** Ridership is calculated the following ways:

    • Metro Rail ridership is based on Metro gate exit counts.
    • Light Rail ridership is estimated using a statistical sampling process in line with FTA established guidelines, and approved by the FTA.
    • MARC ridership is calculated using two (2) independent methods:
        Monthly Line level ridership is estimated using a statistical sampling process in line with FTA established guidelines, and approved by the FTA. This method of ridership calculation is used by the MTA for official reporting purposes to State level and Federal level reporting.
        Station level ridership is estimated by using person counts completed by the third party vendor. This method of calculation has not been verified by the FTA for statistical reporting and is used for scheduling purposes only. However, because of the granularity of detail, this information is useful for TOD applications.

    *Please note that the monthly level ridership and the station level ridership are calculated using two (2) independent methods that are not interchangeable and should not be compared for analysis purposes.

  8. Communities of National Environmental Significance Database - RESTRICTED -...

    • demo.dev.magda.io
    • researchdata.edu.au
    • +2more
    Updated Aug 8, 2023
    + more versions
    Cite
    Bioregional Assessment Program (2023). Communities of National Environmental Significance Database - RESTRICTED - Metadata only [Dataset]. https://demo.dev.magda.io/dataset/ds-dga-bd0aaaed-4708-4d4d-912d-81aad2539cec
    Explore at:
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Bioregional Assessment Program
    Description

    Abstract

    This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied.

    The Database of Communities of National Environmental Significance stores maps, taxonomic, ecological, and management information about Communities of National Environmental Significance listed in the Environment Protection and Biodiversity Conservation (EPBC) Act 1999 as threatened ecological communities.

    Credit: State and Commonwealth Herbaria, Museums and Conservation Agencies Centre for Plant Biodiversity Research Australian Government Department of the Environment, Environmental Resources Information Network

    External accuracy: The positional accuracy of spatial data is a statistical estimate of the degree to which planimetric coordinates and elevations of features agree with their real world values. The planimetric accuracy attainable in the vector data will be composed of errors from three sources:

    • The positional accuracy of the source material
    • Errors due to the conversion processes
    • Errors due to the manipulation processes

    This specification cannot prescribe a figure for the planimetric accuracy of the existing source material used for capture of community distributions as it has already been produced. The errors due to the digitising process depend on the accuracy of the digitising table set-up or the scanner resolution, systematic errors in the equipment, errors due to software and errors specific to the operator. An accepted standard for digitising is that the line accuracy should be within half a line width.

    Non Quantitative accuracy: Tests are undertaken to ensure that there are no errors in attributes. The spatial resolution of the data is reflected in the Presence Categories. Presence categories are one of:

    • Community known to occur within area
    • Community likely to occur within area
    • Community may occur within area (general indication only)

    Conceptual consistency: Tests undertaken for logical consistency:

    • Names of export files and data quality table are correct
    • Table names are valid
    • Item names in coverages are valid
    • Item names are present in coverage attribute files
    • Label points and entity point features have only one coordinate pair
    • The Arc/Info coverages can be generated, have attributes attached and be 'built'
    • In polygon coverages there are no label errors, i.e. every polygon has one and only one polygon label point
    • Data format, projection and data type are correct
    • There are no overshoots, i.e. arc overhangs at intersections (1% error acceptable)
    • There are no undershoots, i.e. arcs failing to meet at intersections (0.5% error acceptable)
    • There are no new polygons smaller than the minimum specified area (5% error acceptable)
    • There are no new linear features shorter than the minimum length (5% error acceptable)
    • There are no artefacts such as spikes or deviations visible at 1:125 000 (5% error acceptable)
    • Separate covers have exactly coincident lines where intended (5% error acceptable)

    Completeness omission: The database is continually being updated as the lists of threatened ecological communities on schedules of the EPBC Act are amended. The Species of National Environmental Significance database is available at https://www.environment.gov.au/science/erin/databases-maps/snes

    Dataset History

    This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied. The Spatial information is stored in a geographic information system and links to the Species Profile tables through the community identifier. Source data were provided from a range of government, industry and non-government organisations. Testing is carried out using a combination of expert opinion and on-screen checks.

    Dataset Citation

    Department of the Environment (2015) Communities of National Environmental Significance Database - RESTRICTED - Metadata only. Bioregional Assessment Source Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/c01c4693-0a51-4dbc-bbbd-7a07952aa5f6.

  9. Environmental Monitoring Results for Radioactivity: Water Samples

    • catalog.data.gov
    • data.ct.gov
    Updated Jan 10, 2025
    + more versions
    Cite
    data.ct.gov (2025). Environmental Monitoring Results for Radioactivity: Water Samples [Dataset]. https://catalog.data.gov/dataset/environmental-monitoring-results-for-radioactivity-water-samples
    Explore at:
    Dataset updated
    Jan 10, 2025
    Dataset provided by
    data.ct.gov
    Description

    Reporting units of sample results [where 1 picoCurie (pCi) = 1 trillionth (1E-12) Curie (Ci)]:

    • Water Samples are reported in pCi/L.

    Data Quality Disclaimer: This database is for informational use and is not a controlled quality database. Efforts have been made to ensure accuracy of data in the database; however, errors and omissions may occur. Examples of potential errors include:

    • Data entry errors.
    • Lab results not reported for entry into the database.
    • Missing results due to equipment failure or unable to retrieve samples due to lost or environmental hazards.
    • Translation errors – the data has been migrated to newer data platforms numerous times, and each time there have been errors and data losses.

    Error Results are the calculated uncertainty for the sample measurement results and are reported as (+/-).

    Environmental Sample Records are from the year 1998 until present. Prior to 1998 results were stored in hardcopy, in a non-database format. Requests for results from samples taken prior to 1998 or results subject to quality assurance are available from archived records and can be made through the DEEP Freedom of Information Act (FOIA) administrator at deep.foia@ct.gov. Information on FOIA requests can be found on the DEEP website.

    FOIA Administrator
    Office of the Commissioner
    Department of Energy and Environmental Protection
    79 Elm Street, 3rd Floor
    Hartford, CT 06106
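The unit relationship stated above (1 pCi = 1E-12 Ci) applies to both a result and its (+/-) uncertainty. A minimal sketch of the conversion follows; the function name and the sample reading are hypothetical and not taken from the dataset.

```python
import math

PICOCURIE_IN_CURIE = 1e-12  # 1 pCi = 1E-12 Ci, as stated in the description


def pci_per_litre_to_ci_per_litre(result_pci, error_pci):
    """Convert a result and its (+/-) uncertainty from pCi/L to Ci/L."""
    return result_pci * PICOCURIE_IN_CURIE, error_pci * PICOCURIE_IN_CURIE


# Hypothetical reading: 2.5 pCi/L with an uncertainty of +/- 0.4 pCi/L.
result_ci, error_ci = pci_per_litre_to_ci_per_litre(2.5, 0.4)
print(f"{result_ci:.2e} +/- {error_ci:.2e} Ci/L")
```

Because the conversion is a pure scale factor, the relative uncertainty of a sample is the same in either unit.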

  10. Small Business Contact Data | North American Small Business Owners |...

    • datarade.ai
    Updated Oct 27, 2021
    Cite
    Success.ai (2021). Small Business Contact Data | North American Small Business Owners | Verified Contact Details from 170M Profiles | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/small-business-contact-data-north-american-small-business-o-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 27, 2021
    Dataset provided by
    Area covered
    Belize, Mexico, Bermuda, United States of America, Panama, Honduras, Saint Pierre and Miquelon, Costa Rica, Guatemala, Greenland
    Description

    Access B2B Contact Data for North American Small Business Owners with Success.ai—your go-to provider for verified, high-quality business datasets. This dataset is tailored for businesses, agencies, and professionals seeking direct access to decision-makers within the small business ecosystem across North America. With over 170 million professional profiles, it’s an unparalleled resource for powering your marketing, sales, and lead generation efforts.

    Key Features of the Dataset:

    Verified Contact Details

    Includes accurate and up-to-date email addresses and phone numbers to ensure you reach your targets reliably.

    AI-validated for 99% accuracy, eliminating errors and reducing wasted efforts.

    Detailed Professional Insights

    Comprehensive data points include job titles, skills, work experience, and education to enable precise segmentation and targeting.

    Enriched with insights into decision-making roles, helping you connect directly with small business owners, CEOs, and other key stakeholders.

    Business-Specific Information

    Covers essential details such as industry, company size, location, and more, enabling you to tailor your campaigns effectively. Ideal for profiling and understanding the unique needs of small businesses.

    Continuously Updated Data

    Our dataset is maintained and updated regularly to ensure relevance and accuracy in fast-changing market conditions. New business contacts are added frequently, helping you stay ahead of the competition.

    Why Choose Success.ai?

    At Success.ai, we understand the critical importance of high-quality data for your business success. Here’s why our dataset stands out:

    Tailored for Small Business Engagement: Focused specifically on North American small business owners, this dataset is an invaluable resource for building relationships with SMEs (Small and Medium Enterprises). Whether you’re targeting startups, local businesses, or established small enterprises, our dataset has you covered.

    Comprehensive Coverage Across North America: Spanning the United States, Canada, and Mexico, our dataset ensures wide-reaching access to verified small business contacts in the region.

    Categories Tailored to Your Needs: Includes highly relevant categories such as Small Business Contact Data, CEO Contact Data, B2B Contact Data, and Email Address Data to match your marketing and sales strategies.

    Customizable and Flexible: Choose from a wide range of filtering options to create datasets that meet your exact specifications, including filtering by industry, company size, geographic location, and more.

    Best Price Guaranteed: We pride ourselves on offering the most competitive rates without compromising on quality. When you partner with Success.ai, you receive superior data at the best value.

    Seamless Integration: Delivered in formats that integrate effortlessly with your CRM, marketing automation, or sales platforms, so you can start acting on the data immediately.

    Use Cases: This dataset empowers you to:

    • Drive Sales Growth: Build and refine your sales pipeline by connecting directly with decision-makers in small businesses.
    • Optimize Marketing Campaigns: Launch highly targeted email and phone outreach campaigns with verified contact data.
    • Expand Your Network: Leverage the dataset to build relationships with small business owners and other key figures within the B2B landscape.
    • Improve Data Accuracy: Enhance your existing databases with verified, enriched contact information, reducing bounce rates and increasing ROI.

    Industries Served: Whether you're in B2B SaaS, digital marketing, consulting, or any field requiring accurate and targeted contact data, this dataset serves industries of all kinds. It is especially useful for professionals focused on:

    • Lead Generation
    • Business Development
    • Market Research
    • Sales Outreach
    • Customer Acquisition

    What’s Included in the Dataset: Each profile provides:

    • Full Name
    • Verified Email Address
    • Phone Number (where available)
    • Job Title
    • Company Name
    • Industry
    • Company Size
    • Location
    • Skills and Professional Experience
    • Education Background

    With over 170 million profiles, you can tap into a wealth of opportunities to expand your reach and grow your business.

    Why High-Quality Contact Data Matters: Accurate, verified contact data is the foundation of any successful B2B strategy. Reaching small business owners and decision-makers directly ensures your message lands where it matters most, reducing costs and improving the effectiveness of your campaigns. By choosing Success.ai, you ensure that every contact in your pipeline is a genuine opportunity.

    Partner with Success.ai for Better Data, Better Results: Success.ai is committed to delivering premium-quality B2B data solutions at scale. With our small business owner dataset, you can unlock the potential of North America's dynamic small business market.

    Get Started Today: Request a sample or customize your dataset to fit your unique...

  11. Bahasa Open Ended Question Answer Text Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Bahasa Open Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/bahasa-open-ended-question-answer-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    The Bahasa Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the Bahasa language, advancing the field of artificial intelligence.

    Dataset Content: This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in Bahasa. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.

    Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native Bahasa speakers, with references drawn from diverse sources such as books, news articles, websites, and other reliable material.

    This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Question Diversity: To ensure diversity, this Q&A dataset includes questions of varying complexity: easy, medium, and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Questions are further classified into fact-based and opinion-based categories. The QA dataset also contains questions with constraints and persona restrictions, which makes it more useful for LLM training.

    Answer Formats: To accommodate varied learning experiences, the dataset incorporates different answer formats: single-word, short-phrase, single-sentence, and paragraph answers. Answers contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.

    Data Format and Annotation Details: This fully labeled Bahasa Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.

    Quality and Accuracy: The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.

    Both the questions and answers in Bahasa are grammatically accurate, without spelling or grammatical errors. No copyrighted, toxic, or harmful content was used in building this dataset.

    Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.

    License: The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy Bahasa Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
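
    The annotation fields named in this description (id, language, domain, complexity, question_type, answer_type, rich_text, and so on) lend themselves to simple filtering. The following is a minimal sketch, assuming the JSON export is a list of objects keyed by those field names; the sample records and their values are hypothetical and are not taken from the dataset.

```python
import json

# Hypothetical sample records. Only the annotation field names come from the
# dataset description; the values and the list-of-objects layout are assumptions.
SAMPLE = json.loads("""
[
  {"id": "q1", "language": "Bahasa", "domain": "science", "complexity": "easy",
   "question_type": "direct", "answer_type": "single-word", "rich_text": false},
  {"id": "q2", "language": "Bahasa", "domain": "history", "complexity": "hard",
   "question_type": "multiple-choice", "answer_type": "paragraph", "rich_text": true}
]
""")


def filter_records(records, **criteria):
    """Return the records whose annotation fields equal every given criterion."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


hard_questions = filter_records(SAMPLE, complexity="hard")
print([record["id"] for record in hard_questions])
```

    Passing several keyword arguments, for example `filter_records(SAMPLE, domain="history", rich_text=True)`, narrows the selection to records matching all of them.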

  12. Autoscraping | Mexico Real Estate Data | 150K+ Listings from 5 Platforms...

    • datarade.ai
    + more versions
    Cite
    AutoScraping, Autoscraping | Mexico Real Estate Data | 150K+ Listings from 5 Platforms with Pricing & Amenities [Dataset]. https://datarade.ai/data-products/autoscraping-s-mexico-real-estate-data-150k-property-listin-autoscraping
    Explore at:
    Available download formats: .json, .xml, .csv, .xls
    Dataset authored and provided by
    AutoScraping
    Area covered
    Mexico
    Description

    What Makes Our Data Unique?

    Autoscraping’s Mexico Real Estate Listings Data is an invaluable resource for anyone seeking in-depth, reliable, and up-to-date information on the Mexican property market. What sets this dataset apart is its breadth and depth, covering over 150,000 property listings from five of the most reputable real estate platforms in Mexico: Propiedades.com, Lamudi, ValoresAMPI, REMAX, and Century21. These platforms are trusted sources of real estate data, ensuring that our dataset is both comprehensive and of the highest quality.

    Our data is distinguished by its extensive detail and accuracy. Each listing includes a wide range of attributes, such as property type, location (including geolocation data with latitude and longitude), pricing, surface area (built and terrain), number of bedrooms and bathrooms, amenities (such as balconies, swimming pools, parking spaces), and much more. The data is continually updated to reflect the latest market conditions, including price changes and property status updates.

    Additionally, our dataset captures rich metadata from each listing, including the seller’s information (contact details like phone numbers and emails), publication dates, and URLs linking back to the original listings. This level of detail makes our dataset a powerful tool for conducting granular analysis and making informed decisions.

    How is the Data Generally Sourced?

    The data is sourced from five of Mexico’s leading real estate platforms: Propiedades.com, Lamudi, ValoresAMPI, REMAX, and Century21. Our robust web scraping technology is designed to extract every relevant detail from these platforms efficiently and accurately. We employ advanced scraping techniques that allow us to capture comprehensive data across all major property types, including residential, commercial, and land listings.

    The scraping process is automated and conducted at regular intervals to ensure that the data remains current and reflects real-time changes in the market. Each listing undergoes rigorous data cleaning and validation processes to remove duplicates, correct inconsistencies, and ensure the highest possible data quality. The result is a dataset that users can trust to be accurate, up-to-date, and reflective of the actual market conditions.

    Primary Use-Cases and Verticals

    This Mexico Real Estate Listings Data Product serves a wide range of use cases across various verticals, making it a versatile resource for professionals in different fields:

    Real Estate Investment and Analysis: Investors and analysts can use this dataset to identify profitable investment opportunities by analyzing property prices, market trends, and location-based attributes. The detailed metadata, combined with historical pricing information and geolocation data, provides a solid foundation for making informed investment decisions.

    Market Research and Trends Analysis: Researchers and market analysts can leverage this data to track and analyze real estate trends across Mexico. The dataset’s comprehensive coverage allows for detailed segmentation by property type, location, price range, and more, enabling users to gain deep insights into market dynamics and consumer behavior.

    Urban Planning and Development: Government bodies, urban planners, and developers can utilize this dataset to assess the current state of the real estate market in various regions of Mexico. The geolocation data is particularly valuable for spatial analysis, helping planners understand urban sprawl, housing density, and infrastructure needs.

    Real Estate Marketing and Lead Generation: Real estate agencies, marketers, and brokers can use this data to generate leads and tailor their marketing strategies. The inclusion of contact details, such as phone numbers and emails, makes it easier for these professionals to connect with potential buyers and sellers directly, enhancing their ability to close deals.

    Location-Based Services and Applications: Companies that offer location-based services or applications can integrate this data to provide users with precise and relevant property information. The high-precision geolocation data allows for accurate mapping and location analysis, adding significant value to location-based tools and platforms.

    How Does This Data Product Fit into Our Broader Data Offering?

    Autoscraping’s Mexico Real Estate Listings Data is a key component of our extensive data offering, which spans multiple industries and geographies. This dataset complements our broader portfolio of real estate data products, including those covering the U.S., Europe, and other Latin American countries. By integrating this dataset with our other offerings, users can gain a comprehensive understanding of the global real estate market, allowing for cross-regional comparisons and insights.

    In addition to real estate, our broader data offering includes datasets for financial services, consumer behavior, geospatial analysis, an...

  13. Replication Data for: Election Polling Errors across Time and Space

    • eprints.soton.ac.uk
    • dataverse.harvard.edu
    Updated May 6, 2023
    Cite
    Jennings, William; Wlezien, Christopher (2023). Replication Data for: Election Polling Errors across Time and Space [Dataset]. http://doi.org/10.7910/dvn/8421dx
    Explore at:
    Dataset updated
    May 6, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Jennings, William; Wlezien, Christopher
    Description

    Replication data for an over-time and cross-national assessment of the accuracy of pre-election polls.

  14. Company Financial Data | Private & Public Companies | Verified Profiles &...

    • datarade.ai
    Cite
    Success.ai, Company Financial Data | Private & Public Companies | Verified Profiles & Contact Data | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/b2b-contact-data-premium-us-contact-data-us-b2b-contact-d-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset provided by
    Area covered
    Suriname, United Kingdom, Iceland, Georgia, Antigua and Barbuda, Togo, Montserrat, Guam, Korea (Democratic People's Republic of), Dominican Republic
    Description

    Success.ai offers a cutting-edge solution for businesses and organizations seeking Company Financial Data on private and public companies. Our comprehensive database is meticulously crafted to provide verified profiles, including contact details for financial decision-makers such as CFOs, financial analysts, corporate treasurers, and other key stakeholders. This robust dataset is continuously updated and validated using AI technology to ensure accuracy and relevance, empowering businesses to make informed decisions and optimize their financial strategies.

    Key Features of Success.ai's Company Financial Data:

    Global Coverage: Access data from over 70 million businesses worldwide, including public and private companies across all major industries and regions. Our datasets span 250+ countries, offering extensive reach for your financial analysis and market research.

    Detailed Financial Profiles: Gain insights into company financials, including revenue, profit margins, funding rounds, and operational costs. Profiles are enriched with key contact details, including work emails, phone numbers, and physical addresses, ensuring direct access to decision-makers.

    Industry-Specific Data: Tailored datasets for sectors such as financial services, manufacturing, technology, healthcare, and energy, among others. Each dataset is customized to meet the unique needs of industry professionals and analysts.

    Real-Time Accuracy: With continuous updates powered by AI-driven validation, our financial data maintains a 99% accuracy rate, ensuring you have access to the most reliable and up-to-date information available.

    Compliance and Security: All data is collected and processed in strict adherence to global compliance standards, including GDPR, ensuring ethical and lawful usage.

    Why Choose Success.ai for Company Financial Data?

    Best Price Guarantee: We pride ourselves on offering the most competitive pricing in the industry, ensuring you receive unparalleled value for comprehensive financial data.

    AI-Validated Accuracy: Our advanced AI algorithms meticulously verify every data point to ensure precision and reliability, helping you avoid costly errors in your financial decision-making.

    Customized Data Solutions: Whether you need data for a specific region, industry, or type of business, we tailor our datasets to align perfectly with your requirements.

    Scalable Data Access: From small startups to global enterprises, our platform caters to businesses of all sizes, delivering scalable solutions to suit your operational needs.

    Comprehensive Use Cases for Financial Data:

    1. Strategic Financial Planning:

    Leverage our detailed financial profiles to create accurate budgets, forecasts, and strategic plans. Gain insights into competitors’ financial health and market positions to make data-driven decisions.

    2. Mergers and Acquisitions (M&A):

    Access key financial details and contact information to streamline your M&A processes. Identify potential acquisition targets or partners with verified profiles and financial data.

    3. Investment Analysis:

    Evaluate the financial performance of public and private companies for informed investment decisions. Use our data to identify growth opportunities and assess risk factors.

    4. Lead Generation and Sales:

    Enhance your sales outreach by targeting CFOs, financial analysts, and other decision-makers with verified contact details. Utilize accurate email and phone data to increase conversion rates.

    5. Market Research:

    Understand market trends and financial benchmarks with our industry-specific datasets. Use the data for competitive analysis, benchmarking, and identifying market gaps.

    APIs to Power Your Financial Strategies:

    Enrichment API: Integrate real-time updates into your systems with our Enrichment API. Keep your financial data accurate and current to drive dynamic decision-making and maintain a competitive edge.

    Lead Generation API: Supercharge your lead generation efforts with access to verified contact details for key financial decision-makers. Perfect for personalized outreach and targeted campaigns.

    Tailored Solutions for Industry Professionals:

    Financial Services Firms: Gain detailed insights into revenue streams, funding rounds, and operational costs for competitor analysis and client acquisition.

    Corporate Finance Teams: Enhance decision-making with precise data on industry trends and benchmarks.

    Consulting Firms: Deliver informed recommendations to clients with access to detailed financial datasets and key stakeholder profiles.

    Investment Firms: Identify potential investment opportunities with verified data on financial performance and market positioning.

    What Sets Success.ai Apart?

    Extensive Database: Access detailed financial data for 70M+ companies worldwide, including small businesses, startups, and large corporations.

    Ethical Practices: Our data collection and processing methods are fully comp...

  15. Table 7 -

    • plos.figshare.com
    xls
    Updated Dec 31, 2024
    + more versions
    Cite
    Hyunsoo Yoon; Todd J. Schwedt; Catherine D. Chong; Oyekanmi Olatunde; Teresa Wu (2024). Table 7 - [Dataset]. http://doi.org/10.1371/journal.pone.0288300.t007
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 31, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Hyunsoo Yoon; Todd J. Schwedt; Catherine D. Chong; Oyekanmi Olatunde; Teresa Wu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Multicenter and multi-scanner imaging studies may be necessary to ensure sufficiently large sample sizes for developing accurate predictive models. However, multicenter studies, incorporating varying research participant characteristics, MRI scanners, and imaging acquisition protocols, may introduce confounding factors, potentially hindering the creation of generalizable machine learning models. Models developed using one dataset may not readily apply to another, emphasizing the importance of classification model generalizability in multi-scanner and multicenter studies for producing reproducible results. This study focuses on enhancing generalizability in classifying individual migraine patients and healthy controls using brain MRI data through a data harmonization strategy. We propose identifying a ‘healthy core’ (a group of homogeneous healthy controls with similar characteristics) from multicenter studies. The Maximum Mean Discrepancy (MMD) in Geodesic Flow Kernel (GFK) space is employed to compare two datasets, capturing data variabilities and facilitating the identification of this ‘healthy core’. Homogeneous healthy controls play a vital role in mitigating unwanted heterogeneity, enabling the development of highly accurate classification models with improved performance on new datasets. Extensive experimental results underscore the benefits of leveraging a ‘healthy core’. We utilized two datasets: one comprising 120 individuals (66 with migraine and 54 healthy controls), and another comprising 76 individuals (34 with migraine and 42 healthy controls). Notably, a homogeneous dataset derived from a cohort of healthy controls yielded a significant 25% accuracy improvement for both episodic and chronic migraineurs.
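
    To illustrate the idea of comparing two datasets with Maximum Mean Discrepancy, here is a minimal sketch of the biased squared-MMD estimate. It swaps in a plain RBF kernel rather than the Geodesic Flow Kernel space used in the study, so it is not the authors' method; the function names, the `gamma` parameter, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2); a stand-in, not the GFK from the study."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""

    def mean_kernel(first, second):
        total = sum(rbf_kernel(p, q, gamma) for p in first for q in second)
        return total / (len(first) * len(second))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy two-dimensional features standing in for two imaging sites (made-up values).
site_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1)]
site_b = [(2.0, 2.1), (2.2, 1.9), (2.1, 2.0)]

print(mmd_squared(site_a, site_a))  # identical samples give zero
print(mmd_squared(site_a, site_b))  # well-separated samples give a larger value
```

    A smaller value suggests the two samples are distributed more alike, which is the intuition behind selecting a homogeneous group of controls across sites.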

  16. Data from: DIPSER: A Dataset for In-Person Student Engagement Recognition in...

    • observatorio-cientifico.ua.es
    • scidb.cn
    Updated 2025
    Cite
    Márquez-Carpintero, Luis; Suescun-Ferrandiz, Sergio; Álvarez, Carolina Lorenzo; Fernandez-Herrero, Jorge; Viejo, Diego; Rosabel Roig-Vila; Cazorla, Miguel (2025). DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild [Dataset]. https://observatorio-cientifico.ua.es/documentos/67321d21aea56d4af0484172
    Explore at:
    Dataset updated
    2025
    Authors
    Márquez-Carpintero, Luis; Suescun-Ferrandiz, Sergio; Álvarez, Carolina Lorenzo; Fernandez-Herrero, Jorge; Viejo, Diego; Rosabel Roig-Vila; Cazorla, Miguel
    Description

    Data Description

    The DIPSER dataset is designed to assess student attention and emotion in in-person classroom settings, consisting of RGB camera data, smartwatch sensor data, and labeled attention and emotion metrics. It includes multiple camera angles per student to capture posture and facial expressions, complemented by smartwatch data for inertial and biometric metrics. Attention and emotion labels are derived from self-reports and expert evaluations. The dataset includes diverse demographic groups, with data collected in real-world classroom environments, facilitating the training of machine learning models for predicting attention and correlating it with emotional states.

    Data Collection and Generation Procedures

    The dataset was collected in a natural classroom environment at the University of Alicante, Spain. The recording setup consisted of six general cameras positioned to capture the overall classroom context and individual cameras placed at each student’s desk. Additionally, smartwatches were used to collect biometric data, such as heart rate, accelerometer, and gyroscope readings.

    Experimental Sessions

    Nine distinct educational activities were designed to ensure a comprehensive range of engagement scenarios:

    1. News Reading – Students read projected or device-displayed news.
    2. Brainstorming Session – Idea generation for problem-solving.
    3. Lecture – Passive listening to an instructor-led session.
    4. Information Organization – Synthesizing information from different sources.
    5. Lecture Test – Assessment of lecture content via mobile devices.
    6. Individual Presentations – Students present their projects.
    7. Knowledge Test – Conducted using Kahoot.
    8. Robotics Experimentation – Hands-on session with robotics.
    9. MTINY Activity Design – Development of educational activities with computational thinking.

    Technical Specifications

    RGB Cameras: Individual cameras recorded at 640×480 pixels, while context cameras captured at 1280×720 pixels.
    Frame Rate: 9-10 FPS depending on the setup.
    Smartwatch Sensors: Collected heart rate, accelerometer, gyroscope, rotation vector, and light sensor data at a frequency of 1–100 Hz.

    Data Organization and Formats

    The dataset follows a structured directory format:

    /groupX/experimentY/subjectZ.zip

    Each subject-specific folder contains:

    images/ (individual facial images)
    watch_sensors/ (sensor readings in JSON format)
    labels/ (engagement & emotion annotations)
    metadata/ (subject demographics & session details)

    Annotations and Labeling

    Each data entry includes engagement levels (1-5) and emotional states (9 categories) based on both self-reported labels and evaluations by four independent experts. A custom annotation tool was developed to ensure consistency across evaluations.

    Missing Data and Data Quality

    Synchronization: A centralized server ensured time alignment across devices. Brightness changes were used to verify synchronization.
    Completeness: No major missing data, except for occasional random frame drops due to embedded device performance.
    Data Consistency: Uniform collection methodology across sessions, ensuring high reliability.

    Data Processing Methods

    To enhance usability, the dataset includes preprocessed bounding boxes for face, body, and hands, along with gaze estimation and head pose annotations. These were generated using YOLO, MediaPipe, and DeepFace.

    File Formats and Accessibility

    Images: Stored in standard JPEG format.
    Sensor Data: Provided as structured JSON files.
    Labels: Available as CSV files with timestamps.

    The dataset is publicly available under the CC-BY license and can be accessed along with the necessary processing scripts via the DIPSER GitHub repository.

    Potential Errors and Limitations

    Due to camera angles, some student movements may be out of frame in collaborative sessions.
    Lighting conditions vary slightly across experiments.
    Sensor latency variations are minimal but exist due to embedded device constraints.

    Citation

    If you find this project helpful for your research, please cite our work using the following bibtex entry:

    @misc{marquezcarpintero2025dipserdatasetinpersonstudent1,
      title={DIPSER: A Dataset for In-Person Student Engagement Recognition in the Wild},
      author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Carolina Lorenzo Álvarez and Jorge Fernandez-Herrero and Diego Viejo and Rosabel Roig-Vila and Miguel Cazorla},
      year={2025},
      eprint={2502.20209},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.20209},
    }

    Usage and Reproducibility

    Researchers can utilize standard tools like OpenCV, TensorFlow, and PyTorch for analysis. The dataset supports research in machine learning, affective computing, and education analytics, offering a unique resource for engagement and attention studies in real-world classroom environments.
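
    The per-subject layout described in this entry (images/, watch_sensors/, labels/, metadata/ inside each subjectZ.zip) can be inspected with the Python standard library. This is a rough sketch only: it assumes the four folders sit at the top level of the archive, and the file names in the demo archive are invented stand-ins, not actual DIPSER file names.

```python
import io
import zipfile
from collections import defaultdict


def group_members(zip_bytes):
    """Map each top-level folder in a subject archive to the files it holds."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            folder, _, filename = name.partition("/")
            groups[folder].append(filename)
    return dict(groups)


def make_demo_archive():
    """Build a tiny in-memory stand-in for a subjectZ.zip (file names are made up)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("images/frame_0001.jpg", b"")
        archive.writestr("watch_sensors/heart_rate.json", "[]")
        archive.writestr("labels/engagement.csv", "timestamp,engagement\n")
        archive.writestr("metadata/session.json", "{}")
    return buffer.getvalue()


layout = group_members(make_demo_archive())
print(sorted(layout))
```

    For a real download, the bytes of a subject archive read from disk would take the place of `make_demo_archive()`; if the folders turn out to be nested under a subject directory, the partition step would need adjusting.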

  17. Portuguese Chain of Thought Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Portuguese Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/portuguese-chain-of-thought-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Welcome to the Portuguese Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training large language models (LLMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.

    Dataset Content: This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Portuguese language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more. Each prompt is accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Portuguese speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references. Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.

    Prompt Diversity: To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others. These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

    Response Formats: To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale helps the language model build a reasoning process for complex questions. These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details: This fully labeled Portuguese Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.

    Quality and Accuracy: Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance. The Portuguese text is grammatically accurate, without spelling or grammatical errors. No copyrighted, toxic, or harmful content was used during the construction of this dataset.

    Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.

    License: The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Portuguese Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.

  18. Tree Canopy 2022

    • catalog.data.gov
    • s.cnmilf.com
    Updated Mar 25, 2025
    + more versions
    Cite
    data.austintexas.gov (2025). Tree Canopy 2022 [Dataset]. https://catalog.data.gov/dataset/tree-canopy-2022
    Explore at:
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    data.austintexas.gov
    Description

    City of Austin Open Data Terms of Use https://data.austintexas.gov/stories/s/ranj-cccq This dataset was created to depict approximate tree canopy cover for all land within the City of Austin's "full watershed regulation area." Intended for planning purposes and measuring citywide percent canopy. Definition: Tree canopy is defined as the layer of leaves, branches, and stems of trees that cover the ground when viewed from above. Methods: The 2022 tree canopy layer was derived from satellite imagery (Maxar) and aerial imagery (NAIP). Images were used to extract tree canopy into GIS vector features. First, a “visual recognition engine” generated the vector features. The engine used machine learning algorithms to detect and label image pixels as tree canopy. Then using prior knowledge of feature geometries, more modeling algorithms were used to predict and transform probability maps of labeled pixels into finished vector polygons depicting tree canopy. The resulting features were reviewed and edited through manual interpretation by GIS professionals. When appropriate, NAIP 2022 aerial imagery supplemented satellite images that had cloud cover, and a manual editing process made sure tree canopy represented 2022 conditions. Finally, an independent accuracy assessment was performed by the City of Austin and the Texas A&M Forest Service for quality assurance. GIS professionals assessed agreement between the tree canopy data and its source satellite imagery. An overall accuracy of 98% was found. Only 23 errors were found out of a total 1,000 locations reviewed. These were mostly omission errors (e.g. not including canopy in this dataset when canopy is shown in the satellite or aerial image). Best efforts were made to ensure ground-truth locations contained a tree on the ground. To ensure this, location data were used from City of Austin and Texas A&M Forest Service databases. 
Analysis: The City of Austin measures tree canopy using the calculation: acres of tree canopy divided by acres of land. The area of interest for the land acres is evaluated at the City of Austin's jurisdiction including Full Purpose, Limited Purpose, and Extraterritorial jurisdictions as of May 2023. New data show that, in 2022, tree canopy covered 41% of the total land area within Austin's city limits (using city limit boundaries as of May 2023, included in the download as layer name "city_of_austin_2023"). 160,046.50 canopy acres (2022) / 395,037.53 land acres = 40.51%, or about 41%. This compares to 36% last measured in 2018, and a historical average that has also hovered around 36%. The time period between 2018 and 2022 saw a 5 percentage point change resulting in over 19K acres of canopy gained (estimated). Data Disclaimer: It's possible that changes in percent canopy over the years are due to annexation and improved data methods (e.g. higher resolution imagery, AI, software used, etc.) in addition to actual changes in tree canopy cover on the ground. For planning purposes only. Dataset does not account for individual trees, tree species, or any metric for tree canopy height. Tree canopy data is provided in vector GIS format housed in a Geodatabase. Download and unzip the folder to get started. Please note, errors may exist in this dataset due to the variation in species composition and land use found across the study area. This product is for informational purposes and may not have been prepared for or be suitable for legal, engineering, or surveying purposes. It does not represent an on-the-ground survey and represents only the approximate relative location of property boundaries. This product has been produced by the City of Austin for the sole purpose of geographic reference. No warranty is made by the City of Austin regarding specific accuracy or completeness. Data Provider: Ecopia AI Tech Corporation and PlanIT Geo, Inc. Data derived from Maxar Technologies, Inc. and USDA NAIP imagery.
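
The percent-canopy arithmetic and the accuracy figure quoted in this description can be checked with a short sketch. The function and variable names below are illustrative assumptions, not part of the dataset or of any City of Austin tooling; only the acreage and error counts come from the description.

```python
def canopy_percent(canopy_acres: float, land_acres: float) -> float:
    """Percent canopy = acres of tree canopy divided by acres of land, times 100."""
    if land_acres <= 0:
        raise ValueError("land_acres must be positive")
    return 100.0 * canopy_acres / land_acres


# Figures quoted in the description (2022 canopy, May 2023 city limits).
pct_2022 = canopy_percent(160_046.50, 395_037.53)

# Accuracy assessment quoted in the description: 23 errors in 1,000 locations.
locations_reviewed = 1_000
errors_found = 23
overall_accuracy = 100.0 * (locations_reviewed - errors_found) / locations_reviewed

print(f"Canopy cover 2022: {pct_2022:.2f}%")
print(f"Overall accuracy: {overall_accuracy:.1f}%")
```

The same function could be applied to any other boundary layer in the download, provided canopy and land acreage are measured over the same area.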

  19. Accuracy data (percentage of errors and no responses) in the MI, MC, and MN...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 10, 2023
    Cite
    Julien Grandjean; Kevin D’Ostilio; Christophe Phillips; Evelyne Balteau; Christian Degueldre; André Luxen; Pierre Maquet; Eric Salmon; Fabienne Collette (2023). Accuracy data (percentage of errors and no responses) in the MI, MC, and MN contexts for incongruent, congruent and neutral items. [Dataset]. http://doi.org/10.1371/journal.pone.0041513.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Julien Grandjean; Kevin D’Ostilio; Christophe Phillips; Evelyne Balteau; Christian Degueldre; André Luxen; Pierre Maquet; Eric Salmon; Fabienne Collette
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: Numbers in parentheses correspond to standard deviations.

  20. Virginia Springs/Groundwater Layers - 2023

    • data.virginia.gov
    • opendata.winchesterva.gov
    • +3 more
    Updated Oct 23, 2024
    + more versions
    Cite
    Virginia Department of Environmental Quality (2024). Virginia Springs/Groundwater Layers - 2023 [Dataset]. https://data.virginia.gov/dataset/virginia-springs-groundwater-layers-2023
    Explore at:
    Available download formats: html, arcgis geoservices rest api
    Dataset updated
    Oct 23, 2024
    Dataset authored and provided by
    Virginia Department of Environmental Quality https://deq.virginia.gov/
    Area covered
    Hot Springs
    Description
    The VDEQ Spring SITES database contains data describing the geographic locations and site attributes of natural springs throughout the commonwealth. This data coverage continues to evolve and contains only spring locations known to exist with a reasonable degree of certainty on the date of publication. The dataset does not replace site-specific inventorying or receptor surveys but can be used as a starting point. VDEQ's initial geospatial dataset of approximately 325 springs was formed in 2008 by digitizing historical spring information sheets created by State Water Control Board geologists in the 1970s through early 1990s. Additional data has been consolidated from the EPA STORET database, the U.S. Geological Survey's Ground Water Site Inventory (GWSI) and Geographic Names Information System (GNIS), the Virginia Department of Health SDWIS database, the Virginia DEQ Virginia Water Use Data Set (VWUDS), the Commonwealth of Virginia Division of Water Resources and Power Bulletin No. 1: "Springs of Virginia" by Collins et al., 1930 as well as several VDWR&P Surface Water Supply bulletins from the 1940's - 1950's. A 1992 Virginia Department of Game and Inland Fisheries / Virginia Tech sponsored study by Helfrich et al. titled "Evaluation of the Natural Springs of Virginia: Fisheries Management Implications", a 2004 Rockbridge County groundwater resources report written by Frits van der Leeden, and several smaller datasets from consultants and citizens were evaluated and added to the database when confidence in locational accuracy was high or could be verified with aerial or LIDAR imagery. 
Significant contributions have been made throughout the years by VDEQ Groundwater Characterization staff site visits as well as other geologists working in the region including: Matt Heller at Virginia Division of Geology and Mineral Resources (VDMME), Wil Orndorff at the Virginia Department of Conservation and Recreation Karst Program (VDCR), and David Nelms and Dan Doctor of the U.S. Geological Survey (USGS). Substantial effort has been made to improve locational accuracy and remove duplication present between data sources. Hundreds of spring locations that were originally obtained using topographic maps or unknown methods were updated to sub-meter locational accuracy using post-processed differential GPS (PPGPS) and through the use of several generations of aerial imagery (2002-2017) obtained from Virginia's Geographic Information Network (VGIN) and 1-meter LIDAR, where available. Scores of new spring locations were also obtained by systematic quadrangle-by-quadrangle analysis in areas of the Shenandoah Valley where 1-meter LIDAR datasets were obtained from the U.S. Geological Survey. Future improvements to the dataset will result when statewide 1-meter LIDAR datasets become available and through continued field work by DEQ staff and other contributors working in the region. Please do not hesitate to contact the author to correct mistakes or to contribute to the database.

    The VDEQ Spring FIELD MEASUREMENTS database contains data describing field derived physio-chemical properties of spring discharges measured throughout the Commonwealth of Virginia. Field visits compiled in this dataset were performed from 1928 to 2019 by geologists with the State Water Control Board, the Virginia Division of Water and Power, the Virginia Department of Environmental Quality, and the U.S. Geological Survey with contributions from other sources as noted. Values of -9999 indicate that measurements were not performed for the referenced parameter. Please do not hesitate to contact the author to add data to the database or correct errors.
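
Because -9999 marks measurements that were not performed, it should be excluded before any statistics are computed. A minimal sketch in plain Python follows; the sample readings and the helper name are hypothetical, and only the -9999 sentinel comes from the description.

```python
import math

NO_DATA = -9999  # sentinel stated in the description: measurement not performed


def clean_measurement(value: float) -> float:
    """Return NaN for the no-data sentinel, otherwise the value as a float."""
    return math.nan if value == NO_DATA else float(value)


# Hypothetical field readings for one parameter; the middle one was not measured.
readings = [12.4, -9999, 7.1]
cleaned = [clean_measurement(v) for v in readings]

# Average only the readings that were actually taken.
valid = [v for v in cleaned if not math.isnan(v)]
mean_valid = sum(valid) / len(valid)
print(cleaned, mean_valid)
```

Averaging the raw values without this step would pull the mean far below zero, since the sentinel would be treated as a real reading.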


    The VDEQ_Spring_WQ database is a geodatabase containing groundwater sample information collected from springs throughout Virginia. Sample-specific information includes: location and site information, measured field parameters, and lab verified quantifications of major ionic concentrations, trace element concentrations, nutrient concentrations, and radiological data. The VDEQ_Spring_WQ database is a subset of the VDEQ GWCHEM database which is a flat-file geodatabase containing groundwater sample information from groundwater wells and springs throughout Virginia. Sample information has been correlated via DEQ Well # and projected using coordinates in VDEQ_Spring_SITES database. The GWCHEM database is comprised of historic groundwater sample data originally archived in the United States Geological Survey (USGS) National Water Information System (NWIS) and the Environmental Protection Agency (EPA) Storage and Retrieval (STORET) data warehouse. Archived STORET data originated as groundwater sample data collected and uploaded by Virginia State Water Control Board Personnel. While groundwater sample data in the STORET data warehouse are static, new groundwater sample data are periodically uploaded to NWIS and spring laboratory WQ data reflect NWIS downloaded on 9/30/2019. Recent groundwater sample data collected by Virginia Department of Environmental Quality (DEQ) personnel as part of the Ambient Groundwater Sampling Program are entered into the database as lab results are made available by the Division of Consolidated Laboratory Services (DCLS). When possible, charge balances were calculated for samples with reported values for major ions including (at a minimum) calcium, magnesium, potassium, sodium, bicarbonate, chloride, and sulfate. Reported values for Nitrate as N, carbonate, and fluoride were included in the charge balance calculation when available. 
Field determined values for bicarbonate and carbonate were used in the charge balance calculation when available. For much of the legacy DEQ groundwater sample data, bicarbonate values were derived from lab reported values of alkalinity (as mg/CaCO3) under the assumption that there was no contribution by carbonate to the reported alkalinity value. Charge balance values are reported in the "Charge Balance" column of the GWCHEM geodatabase. The closer the charge balance value is to unity (1), the lower the assumed charge balance error. In order to preserve the numerical capabilities of the database, non-numeric lab qualifiers were given the following numeric identifiers:

    - (minus sign) = less than the concentration specified to the right of the sign
    -11110 = estimated
    -22220 = presence verified but not quantified
    -33330 = radchem non-detect, below sslc
    -4440 = analyzed for but not detected
    -55550 = greater than the concentration to the right of the zero
    -66660 = sample held beyond normal holding time
    -77770 = quality control failure. Data not valid.
    -88880 = sample held beyond normal holding time. Sample analyzed for but not detected. Value stored is limit of detection for process in use.
    -11120 = Value reported is less than the criteria of detection.
    -9999 = no data (parameter not quantified)
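
One way to work with these numeric qualifiers is a lookup table that is consulted before a value is treated as a concentration. The sketch below is an assumption about how the codes might be handled, not the VDEQ's own tooling: it assumes each code is stored exactly as listed, it copies the codes as printed (including -4440), and it does not attempt to decode any concentration that may be embedded alongside a code. The function name and return labels are hypothetical.

```python
# Qualifier codes copied as printed in the description above.
QUALIFIER_CODES = {
    -11110: "estimated",
    -22220: "presence verified but not quantified",
    -33330: "radchem non-detect, below sslc",
    -4440: "analyzed for but not detected",
    -55550: "greater than the concentration to the right of the zero",
    -66660: "sample held beyond normal holding time",
    -77770: "quality control failure, data not valid",
    -88880: "held beyond holding time; analyzed for but not detected",
    -11120: "value reported is less than the criteria of detection",
    -9999: "no data (parameter not quantified)",
}


def describe_value(value: float):
    """Classify a stored value as a qualifier, a less-than value, or a measurement."""
    if value in QUALIFIER_CODES:
        return ("qualifier", QUALIFIER_CODES[value])
    if value < 0:
        # A bare minus sign means "less than the concentration that follows".
        return ("less_than", abs(value))
    return ("measured", value)


print(describe_value(-9999))
print(describe_value(-0.5))
print(describe_value(3.2))
```

Checking the lookup table first matters here: every qualifier code is itself negative, so a plain "negative means less than" rule would misread them as very large detection limits.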

    A more in-depth description and hydrogeologic analysis of the database can be found here
    An in-depth data fact sheet can be found here