42 datasets found
  1. WikiSQL Dataset

    • paperswithcode.com
    • opendatalab.com
    Cite
    Victor Zhong; Caiming Xiong; Richard Socher, WikiSQL Dataset [Dataset]. https://paperswithcode.com/dataset/wikisql
    Explore at:
    Authors
    Victor Zhong; Caiming Xiong; Richard Socher
    Description

    WikiSQL consists of a corpus of 87,726 hand-annotated SQL query and natural language question pairs. These SQL queries are further split into training (61,297 examples), development (9,145 examples) and test sets (17,284 examples). It can be used for natural language inference tasks related to relational databases.
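
    To make the format concrete, a hypothetical WikiSQL-style pair is sketched below; the table and column names are invented for illustration, and real WikiSQL queries are limited to single-table SELECTs with simple aggregation and WHERE conditions.

    -- question: "Which city hosted the 2012 Summer Games?"
    -- a WikiSQL-style answer query over a single table (names invented)
    SELECT city
    FROM olympic_games
    WHERE year = 2012 AND season = 'Summer';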

  2. wikisql

    • huggingface.co
    Updated May 29, 2024
    + more versions
    Cite
    Salesforce (2024). wikisql [Dataset]. https://huggingface.co/datasets/Salesforce/wikisql
    Explore at:
    Dataset updated
    May 29, 2024
    Dataset provided by
    Salesforce Inc (http://salesforce.com/)
    Authors
    Salesforce
    License

    https://choosealicense.com/licenses/unknown/

    Description

    A large crowd-sourced dataset for developing natural language interfaces for relational databases

  3. Northwind and Chinook DataBase

    • kaggle.com
    Updated Jun 19, 2024
    Cite
    RCURIOSO (2024). Northwind and Chinook DataBase [Dataset]. https://www.kaggle.com/datasets/rcurioso/northwind-and-chinook-database
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 19, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    RCURIOSO
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Northwind Database

    The Northwind database is a sample database originally created by Microsoft and used as the basis for its tutorials across a variety of database products for decades. The Northwind database contains sales data for a fictitious company called "Northwind Traders", which imports and exports specialty foods from around the world. Northwind is an excellent tutorial schema for a small-business ERP, with customers, orders, inventory, purchasing, suppliers, shipping, employees, and single-entry accounting. The Northwind database has since been ported to a variety of non-Microsoft databases, including PostgreSQL.

    The Northwind dataset includes sample data for the following:

    • Suppliers: Northwind's suppliers and vendors
    • Customers: Customers who buy products from Northwind
    • Employees: Details of Northwind Traders' employees
    • Products: Product information
    • Shippers: Details of the carriers that ship products from the traders to the end customers
    • Orders and Order Details: Sales order transactions between customers and the company

    [Image: Northwind E-R diagram]
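
    As a quick illustration of the schema above, here is a sketch of a typical Northwind-style query; the table and column names (Customers, Orders, CustomerID, OrderID, CompanyName) follow the common Microsoft distribution but vary between ports, so treat them as assumptions.

    -- orders per customer, assuming the common Northwind table and column names
    SELECT c.CompanyName,
           COUNT(o.OrderID) AS order_count
    FROM Customers AS c
    JOIN Orders AS o ON o.CustomerID = c.CustomerID
    GROUP BY c.CompanyName
    ORDER BY order_count DESC;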

    Chinook DataBase

    Chinook is a sample database available for SQL Server, Oracle, MySQL, etc. It can be created by running a single SQL script. The Chinook database is an alternative to the Northwind database and is ideal for demos and for testing ORM tools targeting single or multiple database servers.

    The Chinook data model represents a digital media store, including tables for artists, albums, media tracks, invoices, and customers.

    The media-related data was created using real data from an iTunes library. Customer and employee information was created manually using fictitious names, addresses that can be located on Google Maps, and other well-formatted data (phone, fax, email, etc.). Sales information is auto-generated using random data over a four-year period.

    Why the name Chinook? The name of this sample database is based on the Northwind database. Chinooks are winds in the interior West of North America, where the Canadian Prairies and Great Plains meet various mountain ranges. Chinooks are most prevalent over southern Alberta in Canada. Chinook is a good name choice for a database intended as an alternative to Northwind.

    [Image: Chinook database diagram]
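
    Likewise, a minimal sketch against the Chinook media-store model; the table and column names (Track, InvoiceLine, TrackId, UnitPrice, Quantity) follow the standard Chinook script but may differ slightly between ports, and LIMIT is not supported by every target DBMS.

    -- top-selling tracks by revenue, assuming the standard Chinook schema
    SELECT t.Name AS track_name,
           SUM(il.UnitPrice * il.Quantity) AS revenue
    FROM Track AS t
    JOIN InvoiceLine AS il ON il.TrackId = t.TrackId
    GROUP BY t.Name
    ORDER BY revenue DESC
    LIMIT 10;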

  4. sql-create-context

    • huggingface.co
    • opendatalab.com
    Updated Apr 21, 2023
    + more versions
    Cite
    brianm (2023). sql-create-context [Dataset]. https://huggingface.co/datasets/b-mc2/sql-create-context
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 21, 2023
    Authors
    brianm
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset builds on WikiSQL and Spider. It contains 78,577 examples, each consisting of a natural language question, a SQL CREATE TABLE statement, and a SQL query answering the question using the CREATE statement as context. The dataset was built with text-to-SQL LLMs in mind, with the goal of preventing the hallucination of column and table names often seen in models trained on text-to-SQL datasets. The CREATE TABLE statement can often be copied and pasted from different DBMSs and provides table names, column… See the full description on the dataset page: https://huggingface.co/datasets/b-mc2/sql-create-context.
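
    A hypothetical record in the spirit of the dataset's format (CREATE TABLE context, natural language question, and answer query); the table and column names below are invented for illustration rather than taken from the dataset itself.

    -- context:
    CREATE TABLE department_head (name VARCHAR(100), age INTEGER);
    -- question: "How many department heads are older than 56?"
    -- answer:
    SELECT COUNT(*) FROM department_head WHERE age > 56;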

  5. Purchase Order Data

    • data.ca.gov
    • catalog.data.gov
    csv, docx, pdf
    Updated Oct 23, 2019
    Cite
    California Department of General Services (2019). Purchase Order Data [Dataset]. https://data.ca.gov/dataset/purchase-order-data
    Explore at:
    Available download formats: docx, csv, pdf
    Dataset updated
    Oct 23, 2019
    Dataset authored and provided by
    California Department of General Services
    Description

    The State Contract and Procurement Registration System (SCPRS) was established in 2003 as a centralized database of information on State contracts and purchases over $5,000. eSCPRS represents the data captured in the State's eProcurement (eP) system, Bidsync, as of March 16, 2009. The data provided is an extract from that system for fiscal years 2012-2013, 2013-2014, and 2014-2015.

    Data Limitations:
    Some purchase orders have multiple UNSPSC numbers; however, only the first was used to identify the purchase order. Multiple UNSPSC numbers were included to provide additional data for a DGS special event; however, this affects the formatting of the file. The source system, Bidsync, is being deprecated, and these issues will be resolved in the future as state systems transition to Fi$cal.

    Data Collection Methodology:

    The data collection process starts with a data file from eSCPRS that is scrubbed and standardized prior to being uploaded into a SQL Server database. There are four primary tables. The Supplier, Department and United Nations Standard Products and Services Code (UNSPSC) tables are reference tables. The Supplier and Department tables are updated and mapped to the appropriate numbering schema and naming conventions. The UNSPSC table is used to categorize line item information and requires no further manipulation. The Purchase Order table contains raw data that requires conversion to the correct data format and mapping to the corresponding data fields. A stacking method is applied to the table to eliminate blanks where needed. Extraneous characters are removed from fields. The four tables are joined together and queries are executed to update the final Purchase Order Dataset table. Once the scrubbing and standardization process is complete the data is then uploaded into the SQL Server database.
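
    A rough sketch of the join described above; the table and column names (supplier, department, unspsc, purchase_order and their *_code keys) are hypothetical stand-ins, not the actual SCPRS/eSCPRS field names.

    -- join the three reference tables to the raw purchase order table
    -- to produce a categorized purchase order dataset
    SELECT po.purchase_order_number,
           d.department_name,
           s.supplier_name,
           u.unspsc_title,
           po.total_price
    FROM purchase_order AS po
    JOIN department AS d ON d.department_code = po.department_code
    JOIN supplier   AS s ON s.supplier_code   = po.supplier_code
    JOIN unspsc     AS u ON u.unspsc_code     = po.unspsc_code;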

    Secondary/Related Resources:

  6. SQL Bike Stores

    • kaggle.com
    Updated Nov 21, 2024
    Cite
    Mohamed ZRIRAK (2024). SQL Bike Stores [Dataset]. https://www.kaggle.com/datasets/mohamedzrirak/sql-bkestores
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohamed ZRIRAK
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Download: SQL Query

    This SQL project is focused on analyzing sales data from a relational database to gain insights into customer behavior, store performance, product sales, and the effectiveness of sales representatives. By executing a series of complex SQL queries across multiple tables, the project aggregates key metrics, such as total units sold and total revenue, and links them with customer, store, product, and staff details.

    Key Objectives:

    • Customer Analysis: Understand customer purchasing patterns by analyzing the total number of units and revenue generated per customer.
    • Product and Category Insights: Evaluate product performance and its category’s impact on overall sales.
    • Store Performance: Identify which stores generate the most revenue and handle the highest sales volume.
    • Sales Representative Effectiveness: Assess the performance of sales representatives by linking sales data with each representative’s handled orders.

    Techniques Used:

    • SQL Joins: The project integrates data from multiple tables, including orders, customers, order_items, products, categories, stores, and staffs, using INNER JOIN to merge information from related tables.
    • Aggregation: SUM functions are used to compute total units sold and revenue generated by each order, providing valuable insights into sales performance.
    • Grouping: Data is grouped by order ID, customer, product, store, and sales representative, ensuring accurate and summarized sales metrics.

    Use Cases:

    • Business Decision-Making: The analysis can help businesses identify high-performing products and stores, optimize inventory, and evaluate the impact of sales teams.
    • Market Segmentation: Segment customers based on geographic location (city/state) and identify patterns in purchasing behavior.
    • Sales Strategy Optimization: Provide recommendations to improve sales strategies by analyzing product categories and sales rep performance.
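
    A sketch of the join-and-aggregate pattern described under Techniques Used; the table names (orders, order_items, customers, stores, staffs) come from the description above, while the column names are assumptions based on a typical bike-stores schema.

    -- total units and revenue per order, with customer, store, and sales rep context
    SELECT o.order_id,
           c.first_name  AS customer_first_name,
           c.last_name   AS customer_last_name,
           s.store_name,
           st.first_name AS staff_first_name,
           st.last_name  AS staff_last_name,
           SUM(oi.quantity)                 AS total_units,
           SUM(oi.quantity * oi.list_price) AS total_revenue
    FROM orders AS o
    INNER JOIN order_items AS oi ON oi.order_id   = o.order_id
    INNER JOIN customers   AS c  ON c.customer_id = o.customer_id
    INNER JOIN stores      AS s  ON s.store_id    = o.store_id
    INNER JOIN staffs      AS st ON st.staff_id   = o.staff_id
    GROUP BY o.order_id,
             c.first_name, c.last_name,
             s.store_name,
             st.first_name, st.last_name;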

  7. BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) Dataset

    • paperswithcode.com
    Updated Sep 24, 2024
    + more versions
    Cite
    Jinyang Li; Binyuan Hui; Ge Qu; Jiaxi Yang; Binhua Li; Bowen Li; Bailin Wang; Bowen Qin; Rongyu Cao; Ruiying Geng; Nan Huo; Xuanhe Zhou; Chenhao Ma; Guoliang Li; Kevin C. C. Chang; Fei Huang; Reynold Cheng; Yongbin Li (2024). BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) Dataset [Dataset]. https://paperswithcode.com/dataset/bird-sql
    Explore at:
    Dataset updated
    Sep 24, 2024
    Authors
    Jinyang Li; Binyuan Hui; Ge Qu; Jiaxi Yang; Binhua Li; Bowen Li; Bailin Wang; Bowen Qin; Rongyu Cao; Ruiying Geng; Nan Huo; Xuanhe Zhou; Chenhao Ma; Guoliang Li; Kevin C. C. Chang; Fei Huang; Reynold Cheng; Yongbin Li
    Description

    BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) represents a pioneering, cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing. BIRD contains over 12,751 unique question-SQL pairs and 95 big databases with a total size of 33.4 GB. It also covers more than 37 professional domains, such as blockchain, hockey, healthcare and education, etc.

  8. spider

    • huggingface.co
    • opendatalab.com
    Updated Dec 9, 2021
    + more versions
    Cite
    XLang NLP Lab (2021). spider [Dataset]. https://huggingface.co/datasets/xlangai/spider
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 9, 2021
    Dataset authored and provided by
    XLang NLP Lab
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Spider

    Dataset Summary

    Spider is a large-scale complex and cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.

    Supported Tasks and Leaderboards

    The leaderboard can be seen at https://yale-lily.github.io/spider

    Languages

    The text in the dataset is in English.

    Dataset Structure

    Data… See the full description on the dataset page: https://huggingface.co/datasets/xlangai/spider.
  9. London Development Database SQL Extract

    • cloud.csiss.gmu.edu
    • data.europa.eu
    • +1more
    bin, pdf
    Updated May 25, 2018
    + more versions
    Cite
    Greater London Authority (GLA) (2018). London Development Database SQL Extract [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/london-development-database-sql-extract
    Explore at:
    Available download formats: pdf, bin
    Dataset updated
    May 25, 2018
    Dataset provided by
    Greater London Authority (GLA)
    License

    http://reference.data.gov.uk/id/open-government-licence

    Area covered
    London
    Description

    This is a copy of the London Development Database.

    This is the entire LDD database exported as a .sql.tar using pg_dump. For information on how to use this file and details of the database tables please refer to the document London Development database export.pdf

    The permissions data within this extract includes anything submitted to LDD by 23/05/2018. All data is provided by London’s planning authorities.

    An extract from the database can be downloaded from the London Datastore and data can be viewed on a map at https://maps.london.gov.uk/map/?ldd

  10. Playlist2vec: Spotify Million Playlist Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jun 22, 2021
    Cite
    Piyush Papreja; Piyush Papreja (2021). Playlist2vec: Spotify Million Playlist Dataset [Dataset]. http://doi.org/10.5281/zenodo.5002584
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 22, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Piyush Papreja; Piyush Papreja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was created using the Spotify developer API. It consists of user-created as well as Spotify-curated playlists.
    The dataset consists of 1 million playlists, 3 million unique tracks, 3 million unique albums, and 1.3 million artists.
    The data is stored in a SQL database, with the primary entities being songs, albums, artists, and playlists.
    Each of the aforementioned entities is represented by a unique ID (Spotify URI).
    Data is stored in the following tables:

    • album
    • artist
    • track
    • playlist
    • track_artist1
    • track_playlist1

    album

    | id | name | uri |

    id: Album ID as provided by Spotify
    name: Album Name as provided by Spotify
    uri: Album URI as provided by Spotify


    artist

    | id | name | uri |

    id: Artist ID as provided by Spotify
    name: Artist Name as provided by Spotify
    uri: Artist URI as provided by Spotify


    track

    | id | name | duration | popularity | explicit | preview_url | uri | album_id |

    id: Track ID as provided by Spotify
    name: Track Name as provided by Spotify
    duration: Track Duration (in milliseconds) as provided by Spotify
    popularity: Track Popularity as provided by Spotify
    explicit: Whether the track has explicit lyrics or not. (true or false)
    preview_url: A link to a 30 second preview (MP3 format) of the track. Can be null
    uri: Track Uri as provided by Spotify
    album_id: Album Id to which the track belongs


    playlist

    | id | name | followers | uri | total_tracks |

    id: Playlist ID as provided by Spotify
    name: Playlist Name as provided by Spotify
    followers: Playlist Followers as provided by Spotify
    uri: Playlist Uri as provided by Spotify
    total_tracks: Total number of tracks in the playlist.

    track_artist1

    | track_id | artist_id |

    Track-Artist association table

    track_playlist1

    | track_id | playlist_id |

    Track-Playlist association table
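
    Using the tables and columns listed above, a minimal sketch of a query that lists the tracks in a given playlist (the playlist name is a placeholder):

    -- tracks belonging to a playlist, via the track_playlist1 association table
    SELECT p.name AS playlist_name,
           t.name AS track_name,
           t.duration,
           t.popularity
    FROM playlist AS p
    JOIN track_playlist1 AS tp ON tp.playlist_id = p.id
    JOIN track           AS t  ON t.id = tp.track_id
    WHERE p.name = 'Road Trip'   -- placeholder playlist name
    ORDER BY t.popularity DESC;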

    - - - - - SETUP - - - - -


    The data is in the form of a SQL dump. The download size is about 10 GB, and the database populated from it comes out to about 35GB.

    spotifydbdumpschemashare.sql contains the schema for the database (for reference).
    spotifydbdumpshare.sql is the actual data dump.


    Setup steps:
    1. Create database
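
    The setup steps above are truncated in this listing; as a hedged illustration only (the target DBMS is not stated here, and SOURCE is a MySQL client command rather than standard SQL), loading the dump might look roughly like this:

    -- create an empty database and load the schema and data dumps (MySQL-style sketch)
    CREATE DATABASE spotify_playlists;
    USE spotify_playlists;
    SOURCE spotifydbdumpschemashare.sql;
    SOURCE spotifydbdumpshare.sql;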

    - - - - - PAPER - - - - -


    The description of this dataset can be found in the following paper:

    Papreja P., Venkateswara H., Panchanathan S. (2020) Representation, Exploration and Recommendation of Playlists. In: Cellier P., Driessens K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham

  11. Spider-Realistic Dataset

    • paperswithcode.com
    • opendatalab.com
    • +1more
    Updated Sep 11, 2021
    + more versions
    Cite
    Xiang Deng; Ahmed Hassan Awadallah; Christopher Meek; Oleksandr Polozov; Huan Sun; Matthew Richardson (2021). Spider-Realistic Dataset [Dataset]. https://paperswithcode.com/dataset/spider-realistic
    Explore at:
    Dataset updated
    Sep 11, 2021
    Authors
    Xiang Deng; Ahmed Hassan Awadallah; Christopher Meek; Oleksandr Polozov; Huan Sun; Matthew Richardson
    Description

    The Spider-Realistic dataset is used for evaluation in the paper "Structure-Grounded Pretraining for Text-to-SQL". It was created from the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). We manually modified the original questions to remove explicit mentions of column names, while keeping the SQL queries unchanged, to better evaluate the model's capability in aligning the NL utterance and the DB schema. For more details, please check our paper at https://arxiv.org/abs/2010.12773.

  12. Google Patents Public Data

    • kaggle.com
    zip
    Updated Sep 19, 2018
    Cite
    Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/datasets/bigquery/patents
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Sep 19, 2018
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

    Content

    The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.
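
    A minimal BigQuery standard-SQL sketch; the patents-public-data.patents.publications table path follows the Data Origin link below, but the column name country_code is an assumption and may differ in the current schema.

    -- count patent publications per country in the public BigQuery dataset
    SELECT country_code,
           COUNT(*) AS publication_count
    FROM `patents-public-data.patents.publications`
    GROUP BY country_code
    ORDER BY publication_count DESC
    LIMIT 10;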

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents


    “Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

    Banner photo by Helloquence on Unsplash

  13. O*NET Database

    • onetcenter.org
    excel, mysql, oracle +2
    Updated May 20, 2025
    Cite
    National Center for O*NET Development (2025). O*NET Database [Dataset]. https://www.onetcenter.org/database.html
    Explore at:
    Available download formats: oracle, sql server, text, mysql, excel
    Dataset updated
    May 20, 2025
    Dataset provided by
    Occupational Information Network
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Dataset funded by
    US Department of Labor, Employment and Training Administration
    Description

    The O*NET Database contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated by a multi-method data collection program. Sources of data include: job incumbents, occupational experts, occupational analysts, employer job postings, and customer/professional association input.

    Data content areas include:

    • Worker Characteristics (e.g., Abilities, Interests, Work Styles)
    • Worker Requirements (e.g., Education, Knowledge, Skills)
    • Experience Requirements (e.g., On-the-Job Training, Work Experience)
    • Occupational Requirements (e.g., Detailed Work Activities, Work Context)
    • Occupation-Specific Information (e.g., Job Titles, Tasks, Technology Skills)
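
    As an illustration, a sketch against the MySQL download of the database; the table and column names (occupation_data, onetsoc_code, title, description) are assumptions based on typical O*NET releases and may differ between versions.

    -- look up occupation titles and descriptions for a given O*NET-SOC code prefix
    SELECT onetsoc_code,
           title,
           description
    FROM occupation_data
    WHERE onetsoc_code LIKE '15-1252%'   -- e.g., software developers
    ORDER BY onetsoc_code;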

  14. CoSQL Dataset

    • paperswithcode.com
    Updated Aug 8, 2024
    + more versions
    Cite
    Tao Yu; Rui Zhang; He Yang Er; Suyi Li; Eric Xue; Bo Pang; Xi Victoria Lin; Yi Chern Tan; Tianze Shi; Zihan Li; Youxuan Jiang; Michihiro Yasunaga; Sungrok Shim; Tao Chen; Alexander Fabbri; Zifan Li; Luyao Chen; Yuwen Zhang; Shreya Dixit; Vincent Zhang; Caiming Xiong; Richard Socher; Walter S. Lasecki; Dragomir Radev (2024). CoSQL Dataset [Dataset]. https://paperswithcode.com/dataset/cosql
    Explore at:
    Dataset updated
    Aug 8, 2024
    Authors
    Tao Yu; Rui Zhang; He Yang Er; Suyi Li; Eric Xue; Bo Pang; Xi Victoria Lin; Yi Chern Tan; Tianze Shi; Zihan Li; Youxuan Jiang; Michihiro Yasunaga; Sungrok Shim; Tao Chen; Alexander Fabbri; Zifan Li; Luyao Chen; Yuwen Zhang; Shreya Dixit; Vincent Zhang; Caiming Xiong; Richard Socher; Walter S. Lasecki; Dragomir Radev
    Description

    CoSQL is a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions.

  15. Geodatabase for the Baltimore Ecosystem Study Spatial Data

    • search.dataone.org
    • portal.edirepository.org
    Updated Apr 1, 2020
    Cite
    Spatial Analysis Lab; Jarlath O'Neal-Dunne; Morgan Grove (2020). Geodatabase for the Baltimore Ecosystem Study Spatial Data [Dataset]. https://search.dataone.org/view/https%3A%2F%2Fpasta.lternet.edu%2Fpackage%2Fmetadata%2Feml%2Fknb-lter-bes%2F3120%2F150
    Explore at:
    Dataset updated
    Apr 1, 2020
    Dataset provided by
    Long Term Ecological Research Network (http://www.lternet.edu/)
    Authors
    Spatial Analysis Lab; Jarlath O'Neal-Dunne; Morgan Grove
    Time period covered
    Jan 1, 1999 - Jun 1, 2014
    Area covered
    Description

    The establishment of a BES Multi-User Geodatabase (BES-MUG) allows for the storage, management, and distribution of geospatial data associated with the Baltimore Ecosystem Study. At present, BES data is distributed over the internet via the BES website. While having geospatial data available for download is a vast improvement over having the data housed at individual research institutions, it still suffers from some limitations. BES-MUG overcomes these limitations, improving the quality of the geospatial data available to BES researchers and thereby leading to more informed decision-making.

    BES-MUG builds on Environmental Systems Research Institute's (ESRI) ArcGIS and ArcSDE technology. ESRI was selected because its geospatial software offers robust capabilities. ArcGIS is implemented agency-wide within the USDA and is the predominant geospatial software package used by collaborating institutions. Commercially available enterprise database packages (DB2, Oracle, SQL) provide an efficient means to store, manage, and share large datasets. However, standard database capabilities are limited with respect to geographic datasets because they lack the ability to deal with complex spatial relationships. By using ESRI's ArcSDE (Spatial Database Engine) in conjunction with database software, geospatial data can be handled much more effectively through the implementation of the Geodatabase model. Through ArcSDE and the Geodatabase model, the database's capabilities are expanded, allowing for multi-user editing, intelligent feature types, and the establishment of rules and relationships. ArcSDE also allows users to connect to the database using ArcGIS software without being burdened by the intricacies of the database itself.

    For an example of how BES-MUG will help improve the quality and timeliness of BES geospatial data, consider a census block group layer that is in need of updating. Rather than the researcher downloading the dataset, editing it, and resubmitting it through ORS, access rules will allow the authorized user to edit the dataset over the network. Established rules will ensure that attribute and topological integrity is maintained, so that key fields are not left blank and block group boundaries stay within tract boundaries. Metadata will automatically be updated to show who edited the dataset and when, in the event any questions arise.

    Currently, a functioning prototype Multi-User Database has been developed for BES at the University of Vermont Spatial Analysis Lab, using ArcSDE and IBM's DB2 Enterprise Database as a back-end architecture. This database, which is currently only accessible to those on the UVM campus network, will shortly be migrated to a Linux server where it will be accessible for database connections over the Internet. Passwords can then be handed out to all interested researchers on the project, who will be able to make a database connection through the Geographic Information Systems software interface on their desktop computer.

    This database will include a very large number of thematic layers, currently divided into biophysical, socio-economic, and imagery categories. Biophysical includes data on topography, soils, forest cover, habitat areas, hydrology, and toxics. Socio-economics includes political and administrative boundaries, transportation and infrastructure networks, property data, census data, household survey data, parks, protected areas, land use/land cover, zoning, public health, and historic land use change. Imagery includes a variety of aerial and satellite imagery.

    See the readme: http://96.56.36.108/geodatabase_SAL/readme.txt
    See the file listing: http://96.56.36.108/geodatabase_SAL/diroutput.txt

  16. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r

    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

    despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    • download the fixed-width file containing household, family, and person records
    • import by separating this file into three tables, then merge 'em together at the person-level
    • download the fixed-width file containing the person-level replicate weights
    • merge the rectangular person-level file with the replicate weights, then store it in a sql database
    • create a new variable - one - in the data table

    2012 asec - analysis examples.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • perform a boatload of analysis examples

    replicate census estimates - 2011.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • match the sas output shown in the png file below

    2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document. click here to view these three scripts

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    • the census bureau's current population survey page
    • the bureau of labor statistics' current population survey page
    • the current population survey's wikipedia article

    notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

    confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D

  17. [udemy] SQL Masterclass for Financial Analysis & Financial Reporting

    • academictorrents.com
    bittorrent
    Updated Dec 26, 2024
    Cite
    Irfan Sharif (2024). [udemy] SQL Masterclass for Financial Analysis & Financial Reporting [Dataset]. https://academictorrents.com/details/161282939abe2462e37cd8a59664043716a1a529
    Explore at:
    Available download formats: bittorrent (735121934)
    Dataset updated
    Dec 26, 2024
    Dataset authored and provided by
    Irfan Sharif
    License

    https://academictorrents.com/nolicensespecified

    Description

    Official Course URL: udemy.com/course/sql-for-financial-data-analysis/

    Course Overview: Unlock the power of SQL for financial data analysis and reporting. This course is tailored for non-tech professionals who want to streamline their analytics and reporting capabilities. Learn to extract and process financial data, prepare detailed reports like Profit & Loss Statements and Balance Sheets, and calculate critical financial ratios through practical exercises.

    What You'll Learn:

    • SQL Basics: Master database querying techniques for financial data.
    • Report Preparation: Create Profit & Loss Statements, Balance Sheets, and Cash Flow Statements.
    • Key Analytics: Calculate and interpret profitability, efficiency, and liquidity ratios.
    • Database Skills: Gain hands-on experience without prior technical expertise.

    Course Benefits:

    • Practical Applications: Apply SQL to real-world financial scenarios.
    • Independent Reporting: Reduce reliance on system-generated reports.
    • Career Advancement…
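
    A hedged sketch of the kind of ratio calculation the course covers; the general_ledger table and its columns are hypothetical and not part of the course materials.

    -- current ratio = current assets / current liabilities, from a hypothetical ledger table
    SELECT SUM(CASE WHEN account_type = 'Current Asset'     THEN balance END) /
           SUM(CASE WHEN account_type = 'Current Liability' THEN balance END)
             AS current_ratio
    FROM general_ledger
    WHERE fiscal_year = 2024;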

  18. Sci-Hub SQL Database (2020-05-30)

    • academictorrents.com
    bittorrent
    Updated Jul 7, 2020
    Cite
    Library Genesis (2020). Sci-Hub SQL Database (2020-05-30) [Dataset]. https://academictorrents.com/details/4b13244559282f9650a382f70506dc4c516215e2
    Explore at:
    Available download formats: bittorrent (10352938638)
    Dataset updated
    Jul 7, 2020
    Dataset authored and provided by
    Library Genesis
    License

    https://academictorrents.com/nolicensespecified

    Description

    Sci-Hub is a website that provides free access to scientific research articles and books by bypassing publisher paywalls. Sci-Hub does not make an article database publicly available. Instead, Library Genesis indexes files provided by Sci-Hub into a separate database (internally named "scimag"). The Library Genesis "scimag" database indexed 82,513,235 articles as of 2020-07-07. The database consists only of DOI and article metadata and does not contain the articles themselves. Each row of the database represents an entry for a full-text scientific article, uniquely identified by its DOI.

    Timestamped: 2020-05-30 04:54
    Current as of: 2020-07-07

    File hashes:
    • MD5: 1CC808AD4ACC430A4B3A40892252793A
    • SHA-1: 5369999C6A534A693D41D32A7D5869505C292155

    Related research:
    Cabanac, G. (2016). Bibliogifts in LibGen? A study of a text-sharing platform driven by biblioleaks and crowdsourcing. Journal of the Association for Information Science and Technology, 67(4), 874–884.

  19. Opening the Valve on Pure Data Dataset

    • zenodo.org
    application/gzip, zip
    Updated Feb 7, 2024
    Cite
    Anisha Islam; Anisha Islam; Kalvin Eng; Kalvin Eng; Abram Hindle; Abram Hindle (2024). Opening the Valve on Pure Data Dataset [Dataset]. http://doi.org/10.5281/zenodo.10576757
    Explore at:
    Available download formats: zip, application/gzip
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anisha Islam; Anisha Islam; Kalvin Eng; Kalvin Eng; Abram Hindle; Abram Hindle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This page contains the i) SQLite database, and ii) scripts and instructions for the paper titled Opening the Valve on Pure-Data: Usage Patterns and Programming Practices of a Data-Flow Based Visual Programming Language.

    We have provided two main files in this link:

    1. dataset.tar.gz
    2. scripts_and_instructions.zip

    Additionally, the i) SQLite database, ii) scripts and instructions, and iii) mirrored repositories of the PD projects can also be found in the following link: https://archive.org/details/Opening_the_Valve_on_Pure_Data.

    The download instructions are as follows:

    1. Our dataset is available at this link and also at archive.org as a file titled dataset.tar.gz (~1.12GB). You can download the file and then extract the database by running tar -xzf dataset.tar.gz.
    2. You can also find the scripts and instructions needed to use our database and replicate our work inside the scripts_and_instructions.zip (~116MB) file, which you can download from this link and also from the same archive.org link. After that, you can unzip the scripts_and_instructions.zip file by using the command: unzip scripts_and_instructions.zip.
    3. Finally, the mirrored PD repositories are available at archive.org. The file is titled pd_mirrored.tar.gz (~242.5GB). You can download the zipped folder of the mirrored repositories using the following command: wget -c https://archive.org/download/Opening_the_Valve_on_Pure_Data/pd_mirrored.tar.gz. After that, you can extract the file using tar -xzf pd_mirrored.tar.gz.

    You can find a README.md file inside the unzipped directory titled scripts_and_instructions detailing the structure and usage of our dataset, along with some sample SQL queries and additional helper scripts for the database. Furthermore, we have provided instructions for replicating our work in the same README file.

  20. A user-friendly extract of the LibGen scimag metadata SQL dump on 2017-04-07

    • figshare.com
    application/x-rar
    Updated May 30, 2023
    Cite
    Daniel Himmelstein; Stephen McLaughlin (2023). A user-friendly extract of the LibGen scimag metadata SQL dump on 2017-04-07 [Dataset]. http://doi.org/10.6084/m9.figshare.5231245.v1
    Explore at:
    Available download formats: application/x-rar
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Daniel Himmelstein; Stephen McLaughlin
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains the metadata for the LibGen scimag database of full-text scholarly documents. Each row of this dataset corresponds to a scholarly document in the LibGen scimag database, as identified by its DOI.

    scimag_dbbackup-2017-04-07.rar was downloaded from http://libgen.io/dbdumps/backup_archive/scimag_dbbackup-2017-04-07.rar. It's a compressed SQL dump of the LibGen scimag metadata database on 2017-04-07. This is the unmodified file downloaded from libgen.io. It encodes a single table named scimag.

    libgen-scimag-2017-04-07.tsv.xz contains a TSV version of the scimag table from scimag_dbbackup-2017-04-07.rar. It's more user-friendly because it provides access to the data without requiring MySQL, is UTF-8 encoded, and has null bytes removed.

    The code that downloaded and processed these datasets is at https://git.io/v7Uh4.

    Users should note that the TimeAdded column appears to store the modification rather than the creation date for each DOI. As discussed in https://doi.org/b9s5, this field should not be mistaken for the date of first upload to LibGen scimag.
