100+ datasets found
  1. f

    Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    figshare
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, it does not contain any missing values and data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.

  2. All Seaborn Built-in Datasets 📊✨

    • kaggle.com
    Updated Aug 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdelrahman Mohamed (2024). All Seaborn Built-in Datasets 📊✨ [Dataset]. https://www.kaggle.com/datasets/abdoomoh/all-seaborn-built-in-datasets
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Abdelrahman Mohamed
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Description: - This dataset includes all 22 built-in datasets from the Seaborn library, a widely used Python data visualization tool. Seaborn's built-in datasets are essential resources for anyone interested in practicing data analysis, visualization, and machine learning. They span a wide range of topics, from classic datasets like the Iris flower classification to real-world data such as Titanic survival records and diamond characteristics.

    • Included Datasets:
      • Anagrams: Analysis of word anagram patterns.
      • Anscombe: Anscombe's quartet demonstrating the importance of data visualization.
      • Attention: Data on attention span variations in different scenarios.
      • Brain Networks: Connectivity data within brain networks.
      • Car Crashes: US car crash statistics.
      • Diamonds: Data on diamond properties including price, cut, and clarity.
      • Dots: Randomly generated data for scatter plot visualization.
      • Dow Jones: Historical records of the Dow Jones Industrial Average.
      • Exercise: The relationship between exercise and health metrics.
      • Flights: Monthly passenger numbers on flights.
      • FMRI: Functional MRI data capturing brain activity.
      • Geyser: Eruption times of the Old Faithful geyser.
      • Glue: Strength of glue under different conditions.
      • Health Expenditure: Health expenditure statistics across countries.
      • Iris: Famous dataset for classifying Iris species.
      • MPG: Miles per gallon for various vehicles.
      • Penguins: Data on penguin species and their features.
      • Planets: Characteristics of discovered exoplanets.
      • Sea Ice: Measurements of sea ice extent.
      • Taxis: Taxi trips data in a city.
      • Tips: Tipping data collected from a restaurant.
      • Titanic: Survival data from the Titanic disaster.

    This complete collection serves as an excellent starting point for anyone looking to improve their data science skills, offering a wide array of datasets suitable for both beginners and advanced users.

  3. Best Books Ever Dataset

    • zenodo.org
    csv
    Updated Nov 10, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lorena Casanova Lozano; Sergio Costa Planells; Lorena Casanova Lozano; Sergio Costa Planells (2020). Best Books Ever Dataset [Dataset]. http://doi.org/10.5281/zenodo.4265096
    Explore at:
    csvAvailable download formats
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Lorena Casanova Lozano; Sergio Costa Planells; Lorena Casanova Lozano; Sergio Costa Planells
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The dataset has been collected in the frame of the Prac1 of the subject Tipology and Data Life Cycle of the Master's Degree in Data Science of the Universitat Oberta de Catalunya (UOC).

    The dataset contains 25 variables and 52478 records corresponding to books on the GoodReads Best Books Ever list (the larges list on the site).

    Original code used to retrieve the dataset can be found on github repository: github.com/scostap/goodreads_bbe_dataset

    The data was retrieved in two sets, the first 30000 books and then the remainig 22478. Dates were not parsed and reformated on the second chunk so publishDate and firstPublishDate are representet in a mm/dd/yyyy format for the first 30000 records and Month Day Year for the rest.

    Book cover images can be optionally downloaded from the url in the 'coverImg' field. Python code for doing so and an example can be found on the github repo.

    The 25 fields of the dataset are:

    | Attributes | Definition | Completeness |
    | ------------- | ------------- | ------------- | 
    | bookId | Book Identifier as in goodreads.com | 100 |
    | title | Book title | 100 |
    | series | Series Name | 45 |
    | author | Book's Author | 100 |
    | rating | Global goodreads rating | 100 |
    | description | Book's description | 97 |
    | language | Book's language | 93 |
    | isbn | Book's ISBN | 92 |
    | genres | Book's genres | 91 |
    | characters | Main characters | 26 |
    | bookFormat | Type of binding | 97 |
    | edition | Type of edition (ex. Anniversary Edition) | 9 |
    | pages | Number of pages | 96 |
    | publisher | Editorial | 93 |
    | publishDate | publication date | 98 |
    | firstPublishDate | Publication date of first edition | 59 |
    | awards | List of awards | 20 |
    | numRatings | Number of total ratings | 100 |
    | ratingsByStars | Number of ratings by stars | 97 |
    | likedPercent | Derived field, percent of ratings over 2 starts (as in GoodReads) | 99 |
    | setting | Story setting | 22 |
    | coverImg | URL to cover image | 99 |
    | bbeScore | Score in Best Books Ever list | 100 |
    | bbeVotes | Number of votes in Best Books Ever list | 100 |
    | price | Book's price (extracted from Iberlibro) | 73 |

  4. Online Retail Transaction Data

    • kaggle.com
    Updated Dec 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Online Retail Transaction Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/online-retail-transaction-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    The Devastator
    Description

    Online Retail Transaction Data

    UK Online Retail Sales and Customer Transaction Data

    By UCI [source]

    About this dataset

    Comprehensive Dataset on Online Retail Sales and Customer Data

    Welcome to this comprehensive dataset offering a wide array of information related to online retail sales. This data set provides an in-depth look at transactions, product details, and customer information documented by an online retail company based in the UK. The scope of the data spans vastly, from granular details about each product sold to extensive customer data sets from different countries.

    This transnational data set is a treasure trove of vital business insights as it meticulously catalogues all the transactions that happened during its span. It houses rich transactional records curated by a renowned non-store online retail company based in the UK known for selling unique all-occasion gifts. A considerable portion of its clientele includes wholesalers; ergo, this dataset can prove instrumental for companies looking for patterns or studying purchasing trends among such businesses.

    The available attributes within this dataset offer valuable pieces of information:

    • InvoiceNo: This attribute refers to invoice numbers that are six-digit integral numbers uniquely assigned to every transaction logged in this system. Transactions marked with 'c' at the beginning signify cancellations - adding yet another dimension for purchase pattern analysis.

    • StockCode: Stock Code corresponds with specific items as they're represented within the inventory system via 5-digit integral numbers; these allow easy identification and distinction between products.

    • Description: This refers to product names, giving users qualitative knowledge about what kind of items are being bought and sold frequently.

    • Quantity: These figures ascertain the volume of each product per transaction – important figures that can help understand buying trends better.

    • InvoiceDate: Invoice Dates detail when each transaction was generated down to precise timestamps – invaluable when conducting time-based trend analysis or segmentation studies.

    • UnitPrice: Unit prices represent how much each unit retails at — crucial for revenue calculations or cost-related analyses.

    Finally,

    • Country: This locational attribute shows where each customer hails from, adding geographical segmentation to your data investigation toolkit.

    This dataset was originally collated by Dr Daqing Chen, Director of the Public Analytics group based at the School of Engineering, London South Bank University. His research studies and business cases with this dataset have been published in various papers contributing to establishing a solid theoretical basis for direct, data and digital marketing strategies.

    Access to such records can ensure enriching explorations or formulating insightful hypotheses about consumer behavior patterns among wholesalers. Whether it's managing inventory or studying transactional trends over time or spotting cancellation patterns - this dataset is apt for multiple forms of retail analysis

    How to use the dataset

    1. Sales Analysis:

    Sales data forms the backbone of this dataset, and it allows users to delve into various aspects of sales performance. You can use the Quantity and UnitPrice fields to calculate metrics like revenue, and further combine it with InvoiceNo information to understand sales over individual transactions.

    2. Product Analysis:

    Each product in this dataset comes with its unique identifier (StockCode) and its name (Description). You could analyse which products are most popular based on Quantity sold or look at popularity per transaction by considering both Quantity and InvoiceNo.

    3. Customer Segmentation:

    If you associated specific business logic onto the transactions (such as calculating total amounts), then you could use standard machine learning methods or even RFM (Recency, Frequency, Monetary) segmentation techniques combining it with 'CustomerID' for your customer base to understand customer behavior better. Concatenating invoice numbers (which stand for separate transactions) per client will give insights about your clients as well.

    4. Geographical Analysis:

    The Country column enables analysts to study purchase patterns across different geographical locations.

    Practical applications

    Understand what products sell best where - It can help drive tailored marketing strategies. Anomalies detection – Identify unusual behaviors that might lead frau...

  5. d

    Success.ai | Consumer Behavior Data | In-depth Intent Data for Strategic...

    • datarade.ai
    Updated Oct 27, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Success.ai (2022). Success.ai | Consumer Behavior Data | In-depth Intent Data for Strategic Engagement – Unbeatable Prices Guaranteed [Dataset]. https://datarade.ai/data-categories/consumer-behavior-data/datasets
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txtAvailable download formats
    Dataset updated
    Oct 27, 2022
    Dataset provided by
    Success.ai
    Area covered
    Bonaire, Venezuela (Bolivarian Republic of), Taiwan, Andorra, Dominican Republic, Djibouti, Japan, Romania, Benin, Egypt
    Description

    Success.ai is at the forefront of delivering precise consumer behavior insights that empower businesses to understand and anticipate customer needs more effectively. Our extensive datasets provide a deep dive into the nuances of consumer actions, preferences, and trends, enabling businesses to tailor their strategies for maximum engagement and conversion.

    Explore the Multifaceted Dimensions of Consumer Behavior:

    • Consumer Sentiment Analysis: Decode the emotions and sentiments behind consumer interactions with brands and products to refine messaging and product offerings.
    • Web Activity Insights: Monitor and analyze consumer online behaviors, from browsing patterns to engagement metrics, to optimize digital strategies and user experience.
    • Geodemographic Segmentation: Utilize detailed demographic and geographic data to segment audiences accurately, enabling personalized marketing approaches that resonate with diverse consumer groups.
    • Consumer Purchasing Patterns: Understand the what, when, and why behind consumer purchases to forecast trends and align inventory and marketing efforts accordingly.
    • Advanced Consumer Profiling: Build detailed profiles based on consumer behavior data to target or retarget customers with precision.

    Why Choose Success.ai for Consumer Behavior Data?

    • Comprehensive Data Integration: Seamlessly integrate our rich consumer data into your CRM systems, enhancing your data reservoir with valuable consumer insights.
    • Real-Time Updates and Predictive Analytics: Leverage the latest consumer behavior trends powered by AI-driven analytics to stay ahead in a rapidly changing market.
    • Precision and Reliability: Count on our meticulous data collection and processing methods, ensuring high accuracy and compliance with international data protection regulations.
    • Scalable Solutions: Whether you're a small business or a large enterprise, our flexible data solutions can be scaled to meet your specific needs and budget constraints.
    • Competitive Pricing: We offer the most compelling pricing in the industry, guaranteeing you get top-tier data without overspending.

    Strategic Applications of Consumer Behavior Data for Business Growth:

    • Enhanced Email Marketing: Use detailed consumer profiles to craft personalized email campaigns that increase open rates and conversions.
    • Optimized Online Marketing: Apply insights from consumer web activity and search trends to fine-tune your online marketing tactics for better ROI.
    • Effective B2B Lead Generation: Identify and engage potential business clients by understanding their industry-specific behaviors and preferences.
    • Robust Sales Data Enrichment: Enrich your sales strategies with deep behavioral insights, turning cold calls into informed discussions and increasing sales success.
    • Dynamic Competitive Intelligence: Stay competitive by monitoring how consumer behaviors are shifting in your industry and adjust your strategies proactively.

    Empower Your Business with Actionable Consumer Insights from Success.ai

    Success.ai provides not just data, but a gateway to transformative business strategies. Our comprehensive consumer behavior insights allow you to make informed decisions, personalize customer interactions, and ultimately drive higher engagement and sales.

    Get in touch with us today to discover how our Consumer Behavior Intent Data can revolutionize your business strategies and help you achieve your market potential.

    Contact Success.ai now and start transforming data into growth. Let us show you how our unmatched data solutions can be the cornerstone of your business success.

  6. d

    Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis...

    • catalog.data.gov
    Updated Jul 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-vector-analysis-and-summary-stati
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip") and Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allow for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note, the PAD-US inventory is now considered functionally complete with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.

  7. d

    Job Postings Dataset for Labour Market Research and Insights

    • datarade.ai
    Updated Sep 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oxylabs (2023). Job Postings Dataset for Labour Market Research and Insights [Dataset]. https://datarade.ai/data-products/job-postings-dataset-for-labour-market-research-and-insights-oxylabs
    Explore at:
    .json, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Sep 20, 2023
    Dataset authored and provided by
    Oxylabs
    Area covered
    Anguilla, Sierra Leone, Jamaica, Switzerland, Tajikistan, Zambia, Luxembourg, Togo, British Indian Ocean Territory, Kyrgyzstan
    Description

    Introducing Job Posting Datasets: Uncover labor market insights!

    Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.

    Job Posting Datasets Source:

    1. Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.

    2. Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.

    3. StackShare: Access StackShare datasets to make data-driven technology decisions.

    Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.

    Choose your preferred dataset delivery options for convenience:

    Receive datasets in various formats, including CSV, JSON, and more. Opt for storage solutions such as AWS S3, Google Cloud Storage, and more. Customize data delivery frequencies, whether one-time or per your agreed schedule.

    Why Choose Oxylabs Job Posting Datasets:

    1. Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.

    2. Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.

    3. Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.

    4. Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.

  8. LinkedIn Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 17, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bright Data (2021). LinkedIn Datasets [Dataset]. https://brightdata.com/products/datasets/linkedin
    Explore at:
    .json, .csv, .xlsxAvailable download formats
    Dataset updated
    Dec 17, 2021
    Dataset authored and provided by
    Bright Datahttps://brightdata.com/
    License

    https://brightdata.com/licensehttps://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions. Dataset Features

    Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month. Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records. Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.

    Customizable Subsets for Specific Needs Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications. Popular Use Cases

    Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data. Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities. Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies. Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis. AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.

    Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.

  9. Lending Club Loan Data Analysis - Deep Learning

    • kaggle.com
    Updated Aug 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Deependra Verma (2023). Lending Club Loan Data Analysis - Deep Learning [Dataset]. https://www.kaggle.com/datasets/deependraverma13/lending-club-loan-data-analysis-deep-learning
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 9, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Deependra Verma
    Description

    DESCRIPTION

    Create a model that predicts whether or not a loan will be default using the historical data.

    Problem Statement:

    For companies like Lending Club correctly predicting whether or not a loan will be a default is very important. In this project, using the historical data from 2007 to 2015, you have to build a deep learning model to predict the chance of default for future loans. As you will see later this dataset is highly imbalanced and includes a lot of features that make this problem more challenging.

    Domain: Finance

    Analysis to be done: Perform data preprocessing and build a deep learning prediction model.

    Content:

    Dataset columns and definition:

    credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.

    purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "major_purchase", "small_business", and "all_other").

    int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.

    installment: The monthly installments owed by the borrower if the loan is funded.

    log.annual.inc: The natural log of the self-reported annual income of the borrower.

    dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).

    fico: The FICO credit score of the borrower.

    days.with.cr.line: The number of days the borrower has had a credit line.

    revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).

    revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).

    inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.

    delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.

    pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).

    Steps to perform:

    Perform exploratory data analysis and feature engineering and then apply feature engineering. Follow up with a deep learning model to predict whether or not the loan will be default using the historical data.

    Tasks:

    1. Feature Transformation

    Transform categorical values into numerical values (discrete)

    1. Exploratory data analysis of different factors of the dataset.

    2. Additional Feature Engineering

    You will check the correlation between features and will drop those features which have a strong correlation

    This will help reduce the number of features and will leave you with the most relevant features

    1. Modeling

    After applying EDA and feature engineering, you are now ready to build the predictive models

    In this part, you will create a deep learning model using Keras with Tensorflow backend

  10. Exploring Excellence: Top 1000 Global Universities

    • kaggle.com
    Updated Oct 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    M.Ghiasvand (2023). Exploring Excellence: Top 1000 Global Universities [Dataset]. https://www.kaggle.com/datasets/ariohakhamanesh/exploring-excellence-top-1000-global-universities
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 16, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    M.Ghiasvand
    Description

    A modified dataset ready for analysis (the modifications made and the source will be mentioned later) This dataset is related to the ranking of one thousand top universities in the world with fourteen columns and one thousand rows.

    The columns are respectively : World Rank Institution Location National Rank Quality of Education Alumni Employment Quality of Faculty Research output Quality Publications Influence Citations Score Latitude Longitude

    The main source of the dataset is: https://www.kaggle.com/datasets/alifarajnia/eighteen-nineteen-university-datasets

    But its general flaws, including the following, have been fixed and are ready for analysis :

    1. The error in the names of the two columns Quality of Education and Quality of Faculty has been fixed.
    2. Missing and empty data in the two columns above have been corrected and filled with a maximum plus one in each column.
    3. Also, all the data or cells (<1000) were modified and the number 1001 was placed in its place.
    4. Tabp and data type has been changed from object to integer and float.
    5. And at the end, two more columns have been added for better spatial analysis (including geographic locations).
  11. 🍕🍽️ Pizza Restaurant Sales

    • kaggle.com
    Updated Oct 21, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shi Long Zhuang (2022). 🍕🍽️ Pizza Restaurant Sales [Dataset]. https://www.kaggle.com/datasets/shilongzhuang/pizza-sales/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 21, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Shi Long Zhuang
    Description

    Contents

    This pizza sales dataset make up 12 relevant features: - order_id: Unique identifier for each order placed by a table - order_details_id: Unique identifier for each pizza placed within each order (pizzas of the same type and size are kept in the same row, and the quantity increases) - pizza_id: Unique key identifier that ties the pizza ordered to its details, like size and price - quantity: Quantity ordered for each pizza of the same type and size - order_date: Date the order was placed (entered into the system prior to cooking & serving) - order_time: Time the order was placed (entered into the system prior to cooking & serving) - unit_price: Price of the pizza in USD - total_price: unit_price * quantity - pizza_size: Size of the pizza (Small, Medium, Large, X Large, or XX Large) - pizza_type: Unique key identifier that ties the pizza ordered to its details, like size and price - pizza_ingredients: ingredients used in the pizza as shown in the menu (they all include Mozzarella Cheese, even if not specified; and they all include Tomato Sauce, unless another sauce is specified) - pizza_name: Name of the pizza as shown in the menu

    🍕The Pizza Challenge

    For the Maven Pizza Challenge, you’ll be playing the role of a BI Consultant hired by Plato's Pizza, a Greek-inspired pizza place in New Jersey. You've been hired to help the restaurant use data to improve operations, and just received the following note:

    Welcome aboard, we're glad you're here to help!

    Things are going OK here at Plato's, but there's room for improvement. We've been collecting transactional data for the past year, but really haven't been able to put it to good use. Hoping you can analyze the data and put together a report to help us find opportunities to drive more sales and work more efficiently.

    Here are some questions that we'd like to be able to answer:

    1. What days and times do we tend to be busiest?
    2. How many pizzas are we making during peak periods?
    3. What are our best and worst-selling pizzas?
    4. What's our average order value?
    5. How well are we utilizing our seating capacity? (we have 15 tables and 60 seats)

    That's all I can think of for now, but if you have any other ideas I'd love to hear them – you're the expert!

    Thanks in advance,

    Mario Maven (Manager, Plato's Pizza)

    Colllection Methodology

    The public dataset is completely available on the Maven Analytics website platform where it stores and consolidates all available datasets for analysis in the Data Playground. The specific individual datasets at hand can be obtained at this link below: https://www.mavenanalytics.io/blog/maven-pizza-challenge

    📌I set up the data model to include all the related instances in one single table so obtaining data for analysis is made easier.

    My Inspiration

    Complete details were also provided about the challenge in the link if you are interested. The purpose of uploading here is to conduct exploratory data analysis about the dataset beforehand with the use of Pandas and data visualization libraries in order to have a comprehensive review of the data and translate my findings and insights in the form of a single page visualization.

  12. m

    Data for "Best Practices for Your Exploratory Factor Analysis: Factor...

    • data.mendeley.com
    Updated Jul 16, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pablo Rogers (2021). Data for "Best Practices for Your Exploratory Factor Analysis: Factor Tutorial" published by RAC-Revista de Administração Contemporânea [Dataset]. http://doi.org/10.17632/rdky78bk8r.1
    Explore at:
    Dataset updated
    Jul 16, 2021
    Authors
    Pablo Rogers
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains material related to the analysis performed in the article "Best Practices for Your Exploratory Factor Analysis: Factor Tutorial". The material includes the data used in the analyses in .dat format, the labels (.txt) of the variables used in the Factor software, the outputs (.txt) evaluated in the article, and videos (.mp4 with English subtitles) recorded for the purpose of explaining the article. The videos can also be accessed in the following playlist: https://youtube.com/playlist?list=PLDfyRtHbxiZ3R-T3H1cY8dusz273aUFVe. Below is a summary of the article:

    "Exploratory Factor Analysis (EFA) is one of the statistical methods most widely used in Administration, however, its current practice coexists with rules of thumb and heuristics given half a century ago. The purpose of this article is to present the best practices and recent recommendations for a typical EFA in Administration through a practical solution accessible to researchers. In this sense, in addition to discussing current practices versus recommended practices, a tutorial with real data on Factor is illustrated, a software that is still little known in the Administration area, but freeware, easy to use (point and click) and powerful. The step-by-step illustrated in the article, in addition to the discussions raised and an additional example, is also available in the format of tutorial videos. Through the proposed didactic methodology (article-tutorial + video-tutorial), we encourage researchers/methodologists who have mastered a particular technique to do the same. Specifically, about EFA, we hope that the presentation of the Factor software, as a first solution, can transcend the current outdated rules of thumb and heuristics, by making best practices accessible to Administration researchers"

  13. Real Estate Data London 2024

    • kaggle.com
    Updated Nov 18, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dwipayan Mondal (2024). Real Estate Data London 2024 [Dataset]. https://www.kaggle.com/datasets/dwipayanmondal/real-estate-data-london-2024
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 18, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Dwipayan Mondal
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    London
    Description

    About Dataset

    Dataset Overview This dataset provides a snapshot of real estate transactions in London for 2024. It includes key property details such as the number of bedrooms, bathrooms, living space size, lot size, and transaction price. Additionally, it incorporates information about property features like waterfront views, renovation history, and construction quality. Designed for educational and research purposes, the dataset offers insights into London's real estate market trends and serves as a valuable resource for data analysis and machine learning applications.

    Data Science Applications This dataset is ideal for students, researchers, and professionals seeking to apply data science techniques to real-world real estate data. Potential applications include:

    Exploratory Data Analysis (EDA): Investigate price trends, property characteristics, and geographical distribution of transactions. Price Prediction Models: Develop machine learning models to predict property prices based on features like size, location, and condition. Trend Analysis: Analyze historical and geographical trends in property prices and features. Geospatial Analysis: Map properties based on latitude and longitude to identify hotspots or underserved areas.

    Column Descriptions

    Column NameDescription
    idUnique identifier for the property listing.
    dateTransaction date in YYYYMMDDT000000 format.
    priceSale price of the property in GBP (£).
    bedroomsNumber of bedrooms in the property.
    bathroomsNumber of bathrooms in the property.
    sqft_livingLiving area size in square feet.
    sqft_lotLot size in square feet.
    floorsNumber of floors in the property.
    waterfrontIndicates if the property has a waterfront view (1: Yes, 0: No).
    viewProperty view rating (scale of 0–4).
    conditionProperty condition rating (scale of 1–5, 5 being best).
    gradeProperty construction and design rating (scale of 1–13, higher is better).
    sqft_aboveSquare footage of the property above ground level.
    sqft_basementSquare footage of the basement area.
    yr_builtYear the property was built.
    yr_renovatedYear the property was last renovated (0 if never renovated).
    zipcodeZip code of the property's location.
    latLatitude coordinate of the property.
    longLongitude coordinate of the property.
    sqft_living15Average living area square footage of 15 nearby properties.
    sqft_lot15Average lot size square footage of 15 nearby properties.

    Ethically Mined Data This dataset was ethically sourced from publicly available property listings. It does not include any Personally Identifiable Information (PII) or data that could infringe on individual privacy. All information represents public details about properties for sale in London.

    Acknowledgements

    Data Source: Real estate data provided from publicly accessible resources. Image Credit: Unsplash for real estate-themed visuals. Use this dataset responsibly for educational and analytical purposes!

  14. o

    Top 100 Bestselling Book Reviews on Amazon

    • opendatabay.com
    .undefined
    Updated Jun 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasimple (2025). Top 100 Bestselling Book Reviews on Amazon [Dataset]. https://www.opendatabay.com/data/ai-ml/1639fb85-1580-4646-8216-326b2fac3437
    Explore at:
    .undefinedAvailable download formats
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Reviews & Ratings
    Description

    This dataset offers an in-depth look into Amazon's top 100 Bestselling books along with their customer reviews, Ratings, Price etc. Whether you're a book enthusiast, data scientist, or just curious about the latest literary trends, this dataset provides a window into the world of popular reading.{Scrapped dataset on November 2023}

    Book Rank: The ranking of the book among the top 100 Bestselling books on Amazon. Book Title: The title of the book. Price: The price of the book in USD. Rating: The overall rating of the book, on a scale of 1 to 5. Author: The author of the book. Year of Publication: The year in which the book was published. Genre: The genre or category to which the book belongs. URL: The URL link to the book on Amazon's platform. Review Title: The title of the book review. Reviewer: The name of the person who has written a review for the book. Reviewer Rating: The rating given by the reviewer for the book, on a scale of 1 to 5. Review Description: The text description of the review given. Is_verified: Indicates whether the review is verified as a genuine customer review. Date: The timestamp indicates the date when the review was posted. Timestamp: The timestamp indicates when the review was posted. ASIN: Amazon Standard Identification Number assigned to products on Amazon. Feel free to download the data and use it in your work. I will wait for interesting notebooks from your side. Thank you

    Original Data Source: Top 100 Bestselling Book Reviews on Amazon

  15. Optimizing Xenium In Situ data utility by quality assessment and best...

    • zenodo.org
    bin, zip
    Updated Jan 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sergio Marco Salas; Sergio Marco Salas (2024). Optimizing Xenium In Situ data utility by quality assessment and best practice analysis workflows- Xenium mouse brain datasets [Dataset]. http://doi.org/10.5281/zenodo.10566172
    Explore at:
    zip, binAvailable download formats
    Dataset updated
    Jan 25, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Sergio Marco Salas; Sergio Marco Salas
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 14, 2024
    Description

    Four mouse brain coronal datasets generated for with Xenium's mouse brain panel.

  16. N

    College Springs, IA Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2025). College Springs, IA Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b2297cea-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    College Springs, Iowa
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of College Springs by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of College Springs across both sexes and to determine which sex constitutes the majority.

    Key observations

    There is a majority of male population, with 56.68% of total population being male. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of the gender in the College Springs is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of College Springs total population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for College Springs Population by Race & Ethnicity. You can refer the same here

  17. Data from: Inventory of online public databases and repositories holding...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2more
    Updated Apr 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Servicehttps://www.ars.usda.gov/
    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data. Purpose As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to establish where agricultural researchers in the United States-- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, domain-specific databases, and the top journals compare how much data is in institutional vs. domain-specific vs. federal platforms determine which repositories are recommended by top journals that require or recommend the publication of supporting data ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data Approach The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered. Search methods We first compiled a list of known domain specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” /“ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects. We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag specific university repositories and general university repositories that housed a portion of agricultural data. Ag specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories. Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGEA, Protein Data Bank, and Zenodo. Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals were compiled, in which USDA published in 2012 and 2016, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources and Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each the top 50 journals. Evaluation We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind. Results A summary of the major findings from our data review: Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors. There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection. Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation. See included README file for descriptions of each individual data file in this dataset. Resources in this dataset:Resource Title: Journals. File Name: Journals.csvResource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csvResource Title: TDWG presentation. File Name: TDWG_Presentation.pptxResource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csvResource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csvResource Title: General repositories containing ag data. File Name: general_repos_1.csvResource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt

  18. Top Data Science Brands Around the World

    • kaggle.com
    Updated Aug 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sofia_Hypothetical (2024). Top Data Science Brands Around the World [Dataset]. https://www.kaggle.com/datasets/sofiahypothetical/data-science-brands-australia-brazil-uk-usa-spain/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 16, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sofia_Hypothetical
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    World
    Description

    Dataset

    This dataset was created by Sofia_Hypothetical

    Released under MIT

    Contents

  19. N

    Comprehensive Median Household Income and Distribution Dataset for Great...

    • neilsberg.com
    Updated Jan 11, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Comprehensive Median Household Income and Distribution Dataset for Great Neck Estates, NY: Analysis by Household Type, Size and Income Brackets [Dataset]. https://www.neilsberg.com/research/datasets/cd9ea1bf-b041-11ee-aaca-3860777c1fe6/
    Explore at:
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York, Great Neck Estates
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the median household income in Great Neck Estates. It can be utilized to understand the trend in median household income and to analyze the income distribution in Great Neck Estates by household type, size, and across various income brackets.

    Content

    The dataset will have the following datasets when applicable

    Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    • Great Neck Estates, NY Median Household Income Trends (2010-2021, in 2022 inflation-adjusted dollars)
    • Median Household Income Variation by Family Size in Great Neck Estates, NY: Comparative analysis across 7 household sizes
    • Income Distribution by Quintile: Mean Household Income in Great Neck Estates, NY
    • Great Neck Estates, NY households by income brackets: family, non-family, and total, in 2022 inflation-adjusted dollars

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Great Neck Estates median household income. You can refer the same here

  20. n

    Build better LibGuides: A dataset of Political Science, Public Affairs, and...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2more
    zip
    Updated May 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Annelise Sklar (2024). Build better LibGuides: A dataset of Political Science, Public Affairs, and International Studies LibGuides [Dataset]. http://doi.org/10.5061/dryad.prr4xgxvk
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 30, 2024
    Dataset provided by
    University of California, San Diego
    Authors
    Annelise Sklar
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    The dataset that accompanies the "Build Better LibGuides" chapter of Teaching Information Literacy in Political Science, Public Affairs, and International Studies. This dataset was created to compare current practices in Political Science, Public Affairs, and International Studies (PSPAIS) LibGuides with recommended best practices using a sample that represents a variety of academic institutions. Members of the ACRL Politics, Policy, and International Relations Section (PPIRS) were identified as the librarians most likely to be actively engaged with these specific subjects, so the dataset was scoped by identifying the institutions associated with the most active PPIRS members and then locating the LibGuides in these and related disciplines. The resulting dataset includes 101 guides at 46 institutions, for a total of 887 LibGuide tabs. Methods This dataset was created to compare current practices in Political Science, Public Affairs, and International Studies (PSPAIS) LibGuides with recommended best practices using a sample that represents a variety of academic institutions. Members of the ACRL Politics, Policy, and International Relations Section (PPIRS) were identified as the librarians most likely to be actively engaged with these specific subjects, so the dataset was scoped by identifying the institutions associated with the most active PPIRS members and then locating the LibGuides in these and related disciplines. Specifically, a student assistant collected the names and institutional affiliations of each member serving on a PPIRS committee as of July 1, 2021, 2022, and 2023. The student then removed the individual librarian names from the list and located the links to the Political Science or Government; Public Policy, Public Affairs, or Public Administration; and International Studies or International Relations LibGuides at each institution. The chapter author then confirmed and, in a few cases, added to the student's work and copied and pasted the tab names from each guide (which conveniently were also hyperlinked) into a Google Sheet. The resulting dataset included 101 guides at 46 institutions, for a total of 887 LibGuide tabs. A Google Apps script was used to extract the hyperlinks from the collected tab names and then a Python script was used to scrape the names of links included on each of the tabs. LibGuides from two institutions returned errors during the link name scraping process and were excluded in this part of the analysis.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1

Orange dataset table

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
xlsxAvailable download formats
Dataset updated
Mar 4, 2022
Dataset provided by
figshare
Authors
Rui Simões
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, it does not contain any missing values and data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.

Search
Clear search
Close search
Google apps
Main menu