50 datasets found
  1. hacker-news-posts

    • huggingface.co
    Updated Jan 21, 2025
    Cite
    Julien C (2025). hacker-news-posts [Dataset]. http://doi.org/10.57967/hf/6381
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 21, 2025
    Authors
    Julien C
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Hacker News Stories Dataset

    This is a dataset containing approximately 4 million stories from Hacker News, exported to a CSV file. The dataset includes the following fields:

    id (int64): The unique identifier of the story.
    title (string): The title of the story.
    url (string): The URL of the story.
    score (int64): The score of the story.
    time (int64): The time the story was posted, in Unix time.
    comments (int64): The number of comments on the story.
    author (string): The username of… See the full description on the dataset page: https://huggingface.co/datasets/julien040/hacker-news-posts.
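
A minimal sketch of reading rows with the fields listed above. It assumes the CSV has a header row that uses those field names; the sample row and the `read_stories` helper are hypothetical and only stand in for the real export.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical one-row sample standing in for the real CSV export.
SAMPLE = (
    "id,title,url,score,time,comments,author\n"
    "1,Example story,http://example.com,57,1160418111,15,someuser\n"
)

def read_stories(handle):
    """Yield one dict per story, converting the integer columns."""
    for row in csv.DictReader(handle):
        for key in ("id", "score", "time", "comments"):
            row[key] = int(row[key])
        # `time` is Unix time in seconds, per the field list.
        row["posted_at"] = datetime.fromtimestamp(row["time"], tz=timezone.utc)
        yield row

stories = list(read_stories(io.StringIO(SAMPLE)))
print(stories[0]["title"], stories[0]["posted_at"].isoformat())
```

For the real file, the same generator could be fed an open file handle instead of the in-memory sample.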

  2. Data from: Hacker News

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Hacker News (2019). Hacker News [Dataset]. https://www.kaggle.com/datasets/hacker-news/hacker-news
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset authored and provided by
    Hacker News (http://news.ycombinator.com/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset contains all stories and comments from Hacker News from its launch in 2006. Each story contains a story id, the author that made the post, when it was written, and the number of points the story received. Hacker News is a social news website focusing on computer science and entrepreneurship. It is run by Paul Graham's investment fund and startup incubator, Y Combinator. In general, content that can be submitted is defined as "anything that gratifies one's intellectual curiosity".

    Content

    Each story contains a story ID, the author that made the post, when it was written, and the number of points the story received.

    Please note that the text field includes profanity. All texts are the author’s own, do not necessarily reflect the positions of Kaggle or Hacker News, and are presented without endorsement.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.hacker_news.[TABLENAME]. Fork this kernel to get started.
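
As a rough illustration of the query pattern described above, the sketch below builds a query against `bigquery-public-data.hacker_news.[TABLENAME]` and runs it with the BigQuery Python client. The table name `full`, the `LIMIT`, and the helper names are assumptions; running the query requires the `google-cloud-bigquery` package and valid credentials.

```python
def build_query(table_name: str, limit: int = 10) -> str:
    """Build a simple query against bigquery-public-data.hacker_news.[TABLENAME]."""
    if not table_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table_name!r}")
    return (
        f"SELECT * FROM `bigquery-public-data.hacker_news.{table_name}` "
        f"LIMIT {int(limit)}"
    )

def run_query(sql: str):
    """Run the query and return the rows (needs credentials and network access)."""
    # Imported lazily so the sketch can be loaded without the client installed.
    from google.cloud import bigquery
    client = bigquery.Client()
    return list(client.query(sql).result())

# "full" is an assumed table name; substitute any table in the dataset.
sql = build_query("full", limit=5)
print(sql)
```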

    Acknowledgements

    This dataset was kindly made publicly available by Hacker News under the MIT license.

    Inspiration

    • Recent studies have found that many forums tend to be dominated by a very small fraction of users. Is this true of Hacker News?

    • Hacker News has received complaints that the site is biased towards Y Combinator startups. Do the data support this?

    • Is the amount of coverage by Hacker News predictive of a startup’s success?

  3. Data from: hacker-news

    • huggingface.co
    Updated Dec 5, 2024
    Cite
    OpenPipe (2024). hacker-news [Dataset]. https://huggingface.co/datasets/OpenPipe/hacker-news
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    OpenPipe, Inc.
    Authors
    OpenPipe
    Description

    Hacker News posts and comments

    This is a dataset of all HN posts and comments, current as of November 1, 2023.

  4. Hacker News Sentiment Analysis Dataset

    • cubig.ai
    Updated Jul 14, 2025
    Cite
    CUBIG (2025). Hacker News Sentiment Analysis Dataset [Dataset]. https://cubig.ai/store/products/586/hacker-news-sentiment-analysis-dataset
    Explore at:
    Dataset updated
    Jul 14, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction • The Hacker News Sentiment Analysis Dataset captures public opinion in a technology community. For each of the top 141 Hacker News posts, it provides a sentiment analysis (polarity, subjectivity, and sentiment category) along with the title, URL, points, and comment count.

    2) Data Utilization (1) Characteristics of the Hacker News Sentiment Analysis Dataset: • The dataset includes polarity (-1 to 1), subjectivity (0 to 1), and category (positive/neutral/negative) columns that quantify the sentiment of comments using TextBlob, based on the latest top posts as of June 24, 2025. • It is generated through web scraping and NLP preprocessing, and allows quantitative comparison of community responses to technology news. (2) The Hacker News Sentiment Analysis Dataset can be used to: • Visualize sentiment around technology trends: connect sentiment scores with post topics to visually analyze community response patterns to specific technology news, such as AI and policy. • Train NLP models: sentiment classification models can be trained on comment data from real-world technical discussions, or the data can be applied to research on predicting the subjectivity of comments.
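
The description does not say how the positive/neutral/negative category is derived from the polarity score. The sketch below shows one plausible mapping; the `sentiment_category` helper and its 0.05 threshold are assumptions for illustration, not the dataset's documented rule.

```python
def sentiment_category(polarity: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a coarse category.

    The threshold is an assumed cut-off, not taken from the dataset.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity out of range: {polarity}")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

for score in (0.6, 0.0, -0.3):
    print(score, sentiment_category(score))
```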

  5. hacker-news-dataset

    • huggingface.co
    Cite
    Sahil, hacker-news-dataset [Dataset]. https://huggingface.co/datasets/labofsahil/hacker-news-dataset
    Explore at:
    Dataset authored and provided by
    Sahil
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Schema

    field name   type       description
    title        STRING     Story title
    url          STRING     Story url
    text         STRING     Story or comment text
    dead         BOOLEAN    Is dead?
    by           STRING     The username of the item's author.
    score        INTEGER    Story score
    time         INTEGER    Unix time
    timestamp    TIMESTAMP  Timestamp for the unix time
    type         STRING     type of details (comment, comment_ranking, poll, story, job, pollopt)
    id           INTEGER    The item's unique id.
    parent       INTEGER    Parent comment ID
    descendants  INTEGER    … See the full description on the dataset page: https://huggingface.co/datasets/labofsahil/hacker-news-dataset.
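
To illustrate how the `type` and `dead` fields from the schema might be used together, the sketch below counts live items per type. The sample records and the `count_live_items_by_type` helper are hypothetical; only the field names come from the schema.

```python
from collections import Counter

# Hypothetical records using field names from the schema above.
items = [
    {"id": 1, "type": "story", "dead": False, "by": "alice"},
    {"id": 2, "type": "comment", "dead": False, "by": "bob"},
    {"id": 3, "type": "comment", "dead": True, "by": "carol"},
    {"id": 4, "type": "job", "dead": None, "by": "dave"},
]

def count_live_items_by_type(records):
    """Count records per `type`, skipping those flagged as dead."""
    return Counter(r["type"] for r in records if not r.get("dead"))

counts = count_live_items_by_type(items)
print(dict(counts))
```

Treating a missing or null `dead` value as "not dead" is an assumption about how the flag is populated.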

  6. Data from: Hacker News

    • console.cloud.google.com
    Updated Jul 21, 2018
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Y%20Combinator (2018). Hacker News [Dataset]. https://console.cloud.google.com/marketplace/product/y-combinator/hacker-news
    Explore at:
    Dataset updated
    Jul 21, 2018
    Dataset provided by
    Google (http://google.com/)
    Description

    This dataset contains all stories and comments from Hacker News from its launch in 2006 to present. Each story contains a story ID, the author that made the post, when it was written, and the number of points the story received. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  7. Hacker News Curated Comments Dataset

    • zenodo.org
    csv
    Updated Jan 24, 2020
    Cite
    Christopher Moody (2020). Hacker News Curated Comments Dataset [Dataset]. http://doi.org/10.5281/zenodo.45901
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher Moody
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A curated dataset from fh-bigquery:hackernews.stories

    Only HN stories with more than 10 comments are included, and only comments from users with more than 10 comments are included.
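
The curation rule above can be sketched as a two-condition filter. The record keys `story_id` and `author`, the `curate` helper, and the choice to count both thresholds over the unfiltered comment set are assumptions; the source does not say how the counts were computed.

```python
from collections import Counter

def curate(comments, min_count=10):
    """Keep comments on stories with more than `min_count` comments,
    written by users with more than `min_count` comments.

    Both counts are taken over the full, unfiltered input (an assumption).
    """
    comments = list(comments)
    per_story = Counter(c["story_id"] for c in comments)
    per_user = Counter(c["author"] for c in comments)
    return [
        c for c in comments
        if per_story[c["story_id"]] > min_count and per_user[c["author"]] > min_count
    ]

# Tiny hypothetical sample, filtered with a lower threshold for illustration.
sample = [
    {"story_id": 1, "author": "a"},
    {"story_id": 1, "author": "a"},
    {"story_id": 2, "author": "a"},
    {"story_id": 1, "author": "b"},
]
kept = curate(sample, min_count=1)
print(len(kept))
```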

  8. hacker-news-corpus-2007-2022

    • huggingface.co
    Updated Jul 10, 2023
    Cite
    Jacob Keisling (2023). hacker-news-corpus-2007-2022 [Dataset]. https://huggingface.co/datasets/jkeisling/hacker-news-corpus-2007-2022
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 10, 2023
    Authors
    Jacob Keisling
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Hacker News corpus, 2007-Nov 2022

    Dataset Description

    Dataset Summary

    Dataset Name: Hacker News Full Corpus (2007 - November 2022) Description:

    NOTE: I am not affiliated with Y Combinator.

    This dataset is a July 2023 snapshot of YCombinator's BigQuery dump of the entire archive of posts and comments made on Hacker News. It contains posts from Hacker News' inception in 2007 through to November 16, 2022, when the BigQuery database was last updated. The dataset… See the full description on the dataset page: https://huggingface.co/datasets/jkeisling/hacker-news-corpus-2007-2022.

  9. Hacker News Stories

    • kaggle.com
    zip
    Updated Dec 29, 2020
    Cite
    ashish01 (2020). Hacker News Stories [Dataset]. https://www.kaggle.com/ashish01/hacker-news-stories
    Explore at:
    Available download formats: zip (278553725 bytes)
    Dataset updated
    Dec 29, 2020
    Authors
    ashish01
    Description

    Dataset

    This dataset was created by ashish01

    Contents

  10. Hacker News tokenized

    • kaggle.com
    zip
    Updated Apr 26, 2021
    Cite
    Michał Paliński (2021). Hacker News tokenized [Dataset]. https://www.kaggle.com/michapaliski/hacker-news-tokenized
    Explore at:
    Available download formats: zip (31166288 bytes)
    Dataset updated
    Apr 26, 2021
    Authors
    Michał Paliński
    Description

    Dataset

    This dataset was created by Michał Paliński

    Contents

    It contains the following files:

  11. hacker_news_with_comments

    • huggingface.co
    Cite
    KunLi, hacker_news_with_comments [Dataset]. https://huggingface.co/datasets/Linkseed/hacker_news_with_comments
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    KunLi
    License

    https://choosealicense.com/licenses/afl-3.0/

    Description

    Dataset Card for [Dataset Name]

      Dataset Summary
    

    Hacker News stories up to 2015, with comments, collected from the Google BigQuery open dataset. No pre-processing was applied except removing HTML tags.

      Supported Tasks and Leaderboards
    

    Comment Generation; News analysis with comments; Other comment-based NLP tasks.

      Languages
    

    English

      Data Fields
    

    [More Information Needed]

      Data Splits
    

    [More Information Needed]

      Dataset Creation… See the full description on the dataset page: https://huggingface.co/datasets/Linkseed/hacker_news_with_comments.
    
  12. Hacker News lda2vec model pretrained word vectors

    • zenodo.org
    bin
    Updated Jan 24, 2020
    + more versions
    Cite
    Christopher Moody (2020). Hacker News lda2vec model pretrained word vectors [Dataset]. http://doi.org/10.5281/zenodo.49902
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christopher Moody
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description
  13. hacker-news-discussion-summarization

    • huggingface.co
    Updated Mar 4, 2025
    + more versions
    Cite
    George Chiramattel (2025). hacker-news-discussion-summarization [Dataset]. https://huggingface.co/datasets/georgeck/hacker-news-discussion-summarization
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 4, 2025
    Authors
    George Chiramattel
    Description

    georgeck/hacker-news-discussion-summarization dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. Hacker News posts from 2006 to 2015

    • kaggle.com
    Updated Mar 21, 2020
    Cite
    Hamza Jabbar Khan (2020). Hacker News posts from 2006 to 2015 [Dataset]. https://www.kaggle.com/hamzajabbarkhan/hacker-news-posts-from-2006-to-2015/code
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 21, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hamza Jabbar Khan
    Description

    Dataset

    This dataset was created by Hamza Jabbar Khan

    Contents

  15. hacker-news-regressor-dataset

    • huggingface.co
    Updated Apr 17, 2025
    Cite
    Guillaume Boniface-Chang (2025). hacker-news-regressor-dataset [Dataset]. https://huggingface.co/datasets/gbonifacechang/hacker-news-regressor-dataset
    Explore at:
    Dataset updated
    Apr 17, 2025
    Authors
    Guillaume Boniface-Chang
    Description

    gbonifacechang/hacker-news-regressor-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. Data from: TDMentions: A Dataset of Technical Debt Mentions in Online Posts

    • zenodo.org
    • data.niaid.nih.gov
    bin, bz2
    Updated Jan 24, 2020
    Cite
    Morgan Ericsson; Anna Wingkvist (2020). TDMentions: A Dataset of Technical Debt Mentions in Online Posts [Dataset]. http://doi.org/10.5281/zenodo.2593142
    Explore at:
    Available download formats: bin, bz2
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Morgan Ericsson; Anna Wingkvist
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # TDMentions: A Dataset of Technical Debt Mentions in Online Posts (version 1.0)

    TDMentions is a dataset that contains mentions of technical debt from Reddit, Hacker News, and Stack Exchange. It also contains a list of blog posts on Medium that were tagged as technical debt. The dataset currently contains approximately 35,000 items.

    ## Data collection and processing

    The dataset is mainly collected from existing datasets. We used data from:

    - the archive of Reddit posts by Jason Baumgartner (available at [https://pushshift.io](https://pushshift.io)),
    - the archive of Hacker News available at Google's BigQuery (available at [https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news](https://console.cloud.google.com/marketplace/details/y-combinator/hacker-news)),
    - the Stack Exchange data dump (available at [https://archive.org/details/stackexchange](https://archive.org/details/stackexchange)),
    - the [GHTorrent](http://ghtorrent.org) project, and
    - the [GH Archive](https://www.gharchive.org).

    The data set currently contains data from the start of each source/service until 2018-12-31. For GitHub, we currently only include data from 2015-01-01.

    We use the regular expression `tech(nical)?[\s\-_]*?debt` to find mentions in all sources except for Medium. We decided to limit our matches to variations of technical debt and tech debt. Other shorter forms, such as TD, can result in too many false positives. For Medium, we used the tag `technical-debt`.
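
The matching step can be tried out with a short Python sketch. The pattern is the one quoted above; case-insensitive matching and the `mentions_technical_debt` helper are assumptions, since the text does not state which flags or tooling were used.

```python
import re

# Pattern quoted in the text; IGNORECASE is an assumption.
TD_PATTERN = re.compile(r"tech(nical)?[\s\-_]*?debt", re.IGNORECASE)

def mentions_technical_debt(text: str) -> bool:
    """Return True if the text contains a variation of 'technical debt' or 'tech debt'."""
    return TD_PATTERN.search(text) is not None

for candidate in ("Technical Debt Explained", "our tech-debt backlog", "TD is growing"):
    print(candidate, "->", mentions_technical_debt(candidate))
```

The last candidate shows why the short form TD is excluded: the pattern deliberately does not match it.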

    ## Data Format

    The dataset is stored as a compressed (bzip2) JSON file with one JSON object per line. Each mention is represented as a JSON object with the following keys.

    - `id`: the id used in the original source. We use the URL path to identify Medium posts.
    - `body`: the text that contains the mention. This is either the comment or the title of the post. For Medium posts this is the title and subtitle (which might not mention technical debt, since posts are identified by the tag).
    - `created_utc`: the time the item was posted in seconds since epoch in UTC.
    - `author`: the author of the item. We use the username or userid from the source.
    - `source`: where the item was posted. Valid sources are:
      - HackerNews Comment
      - HackerNews Job
      - HackerNews Submission
      - Reddit Comment
      - Reddit Submission
      - StackExchange Answer
      - StackExchange Comment
      - StackExchange Question
      - Medium Post
    - `meta`: Additional information about the item specific to the source. This includes, e.g., the subreddit a Reddit submission or comment was posted to, the score, etc. We try to use the same names, e.g., `score` and `num_comments` for keys that have the same meaning/information across multiple sources.

    This is a sample item from Reddit:

    ```JSON
    {
      "id": "ab8auf",
      "body": "Technical Debt Explained (x-post r/Eve)",
      "created_utc": 1546271789,
      "author": "totally_100_human",
      "source": "Reddit Submission",
      "meta": {
        "title": "Technical Debt Explained (x-post r/Eve)",
        "score": 1,
        "num_comments": 0,
        "url": "http://jestertrek.com/eve/technical-debt-2.png",
        "subreddit": "RCBRedditBot"
      }
    }
    ```
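
For readers who prefer Python, the format described above (bzip2-compressed, one JSON object per line) can be read with the standard library alone. This is a minimal sketch: the `iter_mentions` helper and the tiny stand-in file are hypothetical, and the second sample record is invented for illustration.

```python
import bz2
import json
import os
import tempfile
from collections import Counter

def iter_mentions(path):
    """Yield one dict per line from a bzip2-compressed JSON-lines file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)

# Hypothetical stand-in for the real archive: two records, one per line.
sample = [
    {"id": "ab8auf", "source": "Reddit Submission", "created_utc": 1546271789},
    {"id": "example-id", "source": "HackerNews Comment", "created_utc": 1546271789},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json.bz2")
    with bz2.open(path, mode="wt", encoding="utf-8") as out:
        for item in sample:
            out.write(json.dumps(item) + "\n")
    counts = Counter(m["source"] for m in iter_mentions(path))

print(dict(counts))
```

Streaming line by line avoids loading the whole decompressed file into memory.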

    ## Sample Analyses

    We decided to use JSON to store the data, since it is easy to work with from multiple programming languages. In the following examples, we use [`jq`](https://stedolan.github.io/jq/) to process the JSON.

    ### How many items are there for each source?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq '.source' | sort | uniq -c
    ```

    ### How many submissions that mentioned technical debt were posted each month?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq 'select(.source == "Reddit Submission") | .created_utc | strftime("%Y-%m")' | sort | uniq -c
    ```
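
A rough Python counterpart to the monthly count above, assuming the mentions have already been parsed into dicts. The helper names are hypothetical; like the `jq` command, it filters on the `Reddit Submission` source by default.

```python
from collections import Counter
from datetime import datetime, timezone

def month_key(created_utc: int) -> str:
    """Format seconds since the epoch (UTC) as YYYY-MM."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m")

def submissions_per_month(mentions, source="Reddit Submission"):
    """Count mentions from one source, grouped by posting month."""
    return Counter(
        month_key(m["created_utc"]) for m in mentions if m["source"] == source
    )

# The sample item shown earlier, reduced to the keys used here.
mentions = [{"source": "Reddit Submission", "created_utc": 1546271789}]
monthly = submissions_per_month(mentions)
print(dict(monthly))
```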

    ### What are the titles of items that link (`meta.url`) to PDF documents?

    ```
    lbzip2 -cd postscomments.json.bz2 | jq '. as $r | select(.meta.url?) | .meta.url | select(endswith(".pdf")) | $r.body'
    ```

    ### Please, I want CSV!

    ```
    lbzip2 -cd postscomments.json.bz2 | jq -r '[.id, .body, .author] | @csv'
    ```

    Note that you need to specify the keys you want to include for the CSV, so it is easier to either ignore the meta information or process each source.
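
The same export can be sketched in Python with the `csv` module. As with the `jq` command, the keys must be listed explicitly; the `mentions_to_csv` helper and its default key list are assumptions chosen to mirror the command above.

```python
import csv
import io

def mentions_to_csv(mentions, keys=("id", "body", "author")) -> str:
    """Serialize the selected keys of each mention as one CSV row."""
    buffer = io.StringIO()
    # Quote every non-numeric field, similar to jq's @csv output for strings.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for item in mentions:
        writer.writerow([item.get(key, "") for key in keys])
    return buffer.getvalue()

# The sample item shown earlier, reduced to the exported keys.
mentions = [
    {
        "id": "ab8auf",
        "body": "Technical Debt Explained (x-post r/Eve)",
        "author": "totally_100_human",
    }
]
csv_text = mentions_to_csv(mentions)
print(csv_text, end="")
```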

    Please see [https://github.com/sse-lnu/tdmentions](https://github.com/sse-lnu/tdmentions) for more analyses

    ## Limitations and Future updates

    The current version of the dataset lacks GitHub data and Medium comments. GitHub data will be added in the next update. Medium comments (responses) will be added in a future update if we find a good way to represent these.

  17. hacker-news-discussion-summarization-large

    • huggingface.co
    Updated Mar 4, 2025
    Cite
    George Chiramattel (2025). hacker-news-discussion-summarization-large [Dataset]. https://huggingface.co/datasets/georgeck/hacker-news-discussion-summarization-large
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 4, 2025
    Authors
    George Chiramattel
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Hacker News Discussion Summarization - Large

      Dataset Summary
    

    This dataset comprises 14,531 records of Hacker News front-page stories collected over 516 days. Each record includes the story's metadata and its associated discussion threads, formatted to facilitate the development of summarization models.

      Supported Tasks and Leaderboards
    

    The primary task supported by this dataset is summarization, specifically targeting the summarization of… See the full description on the dataset page: https://huggingface.co/datasets/georgeck/hacker-news-discussion-summarization-large.

  18. Social Media Reactions to Open Source Promotions: AI-Powered GitHub Project...

    • zenodo.org
    zip
    Updated May 15, 2025
    Cite
    Anonymous (2025). Social Media Reactions to Open Source Promotions: AI-Powered GitHub Project Posts on Hacker News [Dataset]. http://doi.org/10.5281/zenodo.15386236
    Explore at:
    Available download formats: zip
    Dataset updated
    May 15, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fixes:

    • Added scripts
    • Added README explaining dataset schema and instructions
  19. Data from: hacker-news

    • huggingface.co
    Updated Apr 28, 2023
    Cite
    Anant Narayanan (2023). hacker-news [Dataset]. https://huggingface.co/datasets/anantn/hacker-news
    Explore at:
    Dataset updated
    Apr 28, 2023
    Authors
    Anant Narayanan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This repository contains the datasets for Hacker News used by https://github.com/anantn/hn-chatgpt-plugin. As of June 2025, these are exported as parquet files instead of sqlite for space efficiency.

  20. Oman Internet Usage: Social Media Market Share: Mobile: news.ycombinator.com...

    • ceicdata.com
    Updated Jan 15, 2025
    Cite
    CEICdata.com (2025). Oman Internet Usage: Social Media Market Share: Mobile: news.ycombinator.com [Dataset]. https://www.ceicdata.com/en/oman/internet-usage-social-media-market-share/internet-usage-social-media-market-share-mobile-newsycombinatorcom
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 24, 2024 - Jan 4, 2025
    Area covered
    Oman
    Description

    Oman Internet Usage: Social Media Market Share: Mobile: news.ycombinator.com data was reported at 0.000 % on 05 Apr 2025, unchanged from 0.000 % on 04 Apr 2025. The data is updated daily, with a median of 0.000 % from May 2024 to 05 Apr 2025 over 56 observations. It reached an all-time high of 0.190 % on 31 Dec 2024 and a record low of 0.000 % on 05 Apr 2025. The series remains in active status in CEIC and is reported by Statcounter Global Stats. It is categorized under Global Database’s Oman – Table OM.SC.IU: Internet Usage: Social Media Market Share.


hacker-news-posts

Hacker News stories dataset

julien040/hacker-news-posts

Explore at:
17 scholarly articles cite this dataset (View in Google Scholar)