3 datasets found
  1. EurLex DataSet

    • kaggle.com
    zip
    Updated May 23, 2025
    Cite
    Beshoy Hakeem (2025). EurLex DataSet [Dataset]. https://www.kaggle.com/datasets/puskas78/eurlex-dataset
    Available download formats: zip (992021193 bytes)
    Dataset updated
    May 23, 2025
    Authors
    Beshoy Hakeem
    Description

    The CEPS EurLex dataset contains 142,036 EU laws, almost the entire corpus of the EU's digitally available legal acts passed between 1952 and 2019. It encompasses the three types of legally binding acts passed by the EU institutions: 102,304 regulations, 4,070 directives, and 35,798 decisions, all in English. The dataset was scraped from the official EU legal database (Eur-lex.eu) and transformed into machine-readable CSV format with the programming languages R and Python. The dataset was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project (https://trigger-project.eu/). We hope that it will facilitate future quantitative and computational research on the EU.

    Brief description:
    - The dataset is organised in tabular format, with each row representing one law and the columns representing 23 variables.
    - The full text of 134,633 laws is included (column "act_raw_text"). For newer laws, the text was scraped from the HTML pages on Eur-lex.eu, while for older laws, the text was extracted from (scanned) PDF documents (if available in English).
    - 22 additional variables are included, such as 'Act_name', 'Act_type', 'Subject_matter', 'Authors', 'Date_document', 'ELI_link', and 'CELEX' (a unique identifier for every law). Please see the "CEPS_EurLex_codebook.pdf" file for an explanation of all variables.
    - Given its size, the dataset was uploaded in different batches to facilitate usage. Some Excel files are provided for non-technical users. We recommend, however, the use of the CSV files, since Excel does not save large amounts of data properly. EurLex_all.csv is the master file containing all data.

    Caveats:
    - The Eur-lex.eu website does not consistently provide data for all the variables. In addition, the HTML documents were not always cleanly formatted, and text extraction from scanned PDFs is not entirely clean. Some data points are therefore missing for some laws, and some laws were excluded entirely.
    - Not all (older) laws were available in English, especially since Ireland and the UK only joined the European Communities in 1973. Non-English laws are excluded from the dataset.

    Other:
    - For details on the types of EU legal acts: https://ec.europa.eu/info/law/law-making-process/types-eu-law_en
    - An example of an experimental analysis with this dataset: https://trigger-project.eu/2019/10/28/a-data-science-approach-to-eu-differentiated-integration/
    - The TRIGGER project is funded by the EU's Horizon 2020 programme, grant number 822735 (2020-02-16)
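As a rough illustration of working with the tabular layout described above, the following sketch streams a CSV file row by row and counts acts per type and acts with full text. It assumes the header spellings 'Act_type' and 'act_raw_text' quoted in the description, UTF-8 encoding, and that a missing text appears as an empty cell; none of this is confirmed here, so check the codebook and a sample of the file first. The function name `summarise_acts` is hypothetical.

```python
import csv
from collections import Counter

# Full-text cells can exceed the csv module's default field size limit
# (131072 characters), so raise it before reading.
csv.field_size_limit(10**9)


def summarise_acts(path, type_col="Act_type", text_col="act_raw_text"):
    """Stream the CSV once; count acts per type and acts with non-empty text.

    Column names and UTF-8 encoding are assumptions taken from the dataset
    description, not verified against the actual files.
    """
    per_type = Counter()
    with_text = 0
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Missing or empty act types are grouped under "unknown".
            per_type[(row.get(type_col) or "unknown").strip()] += 1
            # Only empty cells count as missing text here; markers such as
            # "NA" would need extra handling if the files use them.
            if (row.get(text_col) or "").strip():
                with_text += 1
    return per_type, with_text
```

A call such as `counts, n_with_text = summarise_acts("EurLex_all.csv")` would then process the master file without loading it into memory at once, which matters for a file of this size.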

  2. The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and...

    • berd-platform.de
    csv
    Updated Jul 31, 2025
    Cite
    Camille Borrett; Moritz Laurer (2025). The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables [Dataset]. http://doi.org/10.82939/rce01-44p03
    Available download formats: csv
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Centre for European Policy Studies (https://www.ceps.eu/)
    Authors
    Camille Borrett; Moritz Laurer
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jul 1, 2019
    Area covered
    European Union
    Description

    The dataset contains 142,036 EU laws, almost the entire corpus of the EU's digitally available legal acts passed between 1952 and 2019. It encompasses the three types of legally binding acts passed by the EU institutions: 102,304 regulations, 4,070 directives, and 35,798 decisions, all in English. The dataset was scraped from the official EU legal database (Eur-lex.eu) and transformed into machine-readable CSV format with the programming languages R and Python. The dataset was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project (https://trigger-project.eu/). It is about 1.5 GB in size. We hope that it will facilitate future quantitative and computational research on the EU.

  3. The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and...

    • dataverse.harvard.edu
    csv, pdf, tsv
    Updated Jun 2, 2020
    Cite
    Harvard Dataverse (2020). The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables [Dataset]. http://doi.org/10.7910/DVN/0EGYWY
    Available download formats: tsv (119723405), csv (1019978404), csv (248865834), pdf (136562), csv (1585521237), csv (289564219), tsv (75055125), csv (445965588), tsv (25746986), csv (481548943), tsv (3663564), tsv (50375826)
    Dataset updated
    Jun 2, 2020
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    1952 - 2019
    Area covered
    European Union
    Dataset funded by
    European Union
    Description

    The CEPS EurLex dataset contains 142,036 EU laws, almost the entire corpus of the EU's digitally available legal acts passed between 1952 and 2019. It encompasses the three types of legally binding acts passed by the EU institutions: 102,304 regulations, 4,070 directives, and 35,798 decisions, all in English. The dataset was scraped from the official EU legal database (Eur-lex.eu) and transformed into machine-readable CSV format with the programming languages R and Python. The dataset was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project (https://trigger-project.eu/). We hope that it will facilitate future quantitative and computational research on the EU.

    Brief description:
    - The dataset is organised in tabular format, with each row representing one law and the columns representing 23 variables.
    - The full text of 134,633 laws is included (column "act_raw_text"). For newer laws, the text was scraped from the HTML pages on Eur-lex.eu, while for older laws, the text was extracted from (scanned) PDF documents (if available in English).
    - 22 additional variables are included, such as 'Act_name', 'Act_type', 'Subject_matter', 'Authors', 'Date_document', 'ELI_link', and 'CELEX' (a unique identifier for every law). Please see the "CEPS_EurLex_codebook.pdf" file for an explanation of all variables.
    - Given its size, the dataset was uploaded in different batches to facilitate usage. Some Excel files are provided for non-technical users. We recommend, however, the use of the CSV files, since Excel does not save large amounts of data properly. EurLex_all.csv is the master file containing all data.

    Caveats:
    - The Eur-lex.eu website does not consistently provide data for all the variables. In addition, the HTML documents were not always cleanly formatted, and text extraction from scanned PDFs is not entirely clean. Some data points are therefore missing for some laws, and some laws were excluded entirely.
    - Not all (older) laws were available in English, especially since Ireland and the UK only joined the European Communities in 1973. Non-English laws are excluded from the dataset.

    Other:
    - For details on the types of EU legal acts: https://ec.europa.eu/info/law/law-making-process/types-eu-law_en
    - An example of an experimental analysis with this dataset: https://trigger-project.eu/2019/10/28/a-data-science-approach-to-eu-differentiated-integration/
    - The TRIGGER project is funded by the EU's Horizon 2020 programme, grant number 822735
