http://reference.data.gov.uk/id/open-government-licence
A dataset of all the meta-data for all of the datasets available through the data.gov.uk service. This is provided as a zipped CSV or JSON file. It is published nightly.
Updates: 27 Sep 2017: we've moved all the previous dumps to an S3 bucket at https://dgu-ckan-metadata-dumps.s3-eu-west-1.amazonaws.com/ - This link is now listed here as a data file.
From 13/10/16 we added a .v2.jsonl dump, which is set to replace the .json dump (which will be discontinued after a 3-month transition). It is produced using 'ckanapi dump'. It provides an enhanced version of each dataset (the 'validated' version, i.e. what you get from package_show in CKAN API v3; the old JSON was the unvalidated version). It now includes full details of the organization the dataset is in, rather than just the owner_id. It also includes the results of the archival and QA checks for each dataset and resource, showing whether the link is broken, the detected format and the stars of openness. Finally, it is in JSON Lines format (http://jsonlines.org/), so you don't need to load the whole file into memory to parse the JSON, just one line at a time.
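As a rough illustration of the line-at-a-time parsing that JSON Lines allows, here is a minimal Python sketch. It assumes each line holds one dataset in package_show form, with a "name" key and a nested "organization" dict carrying a "title"; the helper names and the sample records are made up for the example.

```python
import io
import json


def iter_datasets(lines):
    """Yield one dataset dict per non-empty JSON Lines record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


def count_by_organization(lines):
    """Count datasets per organization title, one record in memory at a time."""
    counts = {}
    for dataset in iter_datasets(lines):
        # Assumption: 'organization' is a nested dict (or null) as in package_show.
        org = dataset.get("organization") or {}
        title = org.get("title", "(none)")
        counts[title] = counts.get(title, 0) + 1
    return counts


# Tiny in-memory stand-in for the real dump (made-up records).
sample = io.StringIO(
    '{"name": "a", "organization": {"title": "Org One"}}\n'
    '{"name": "b", "organization": {"title": "Org One"}}\n'
    '{"name": "c", "organization": null}\n'
)
result = count_by_organization(sample)
print(result)
```

With the real dump you would pass an open text file instead of the StringIO object, decompressing it first if it is still zipped.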
On 12/1/2015 the organization of the CSV was changed:
Before this date, each dataset was one line, with resources added as numbered columns. Since a dataset may have up to 300 resources, it ends up with 1025 columns, which is wider than many versions of Excel and LibreOffice will open. The uncompressed size of 170 MB is also more than most will deal with. It is suggested you load it into a database, handle it with a Python or Ruby script, or use tools such as Refine or Google Fusion Tables.
After this date, the datasets are provided in one CSV and the resources in another. When you want to combine them, you can join them on the (dataset) "Name" column. Both files are now manageable in spreadsheet software.
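The join on the dataset "Name" column could be sketched in Python roughly as follows. Only the "Name" column comes from the description above. The resources-side column ("Dataset Name" here), the other column names and the sample rows are assumptions for illustration, so check the headers of the actual files.

```python
import csv
import io


def join_resources(datasets_file, resources_file, resource_key="Dataset Name"):
    """Attach each resource row to its dataset row via the dataset "Name".

    resource_key is a hypothetical column name for the dataset name as it
    appears in the resources CSV.
    """
    # Index datasets by name so each resource lookup is a dict access.
    datasets = {row["Name"]: row for row in csv.DictReader(datasets_file)}
    joined = []
    for resource in csv.DictReader(resources_file):
        dataset = datasets.get(resource[resource_key])
        if dataset is None:
            continue  # resource refers to a dataset missing from the datasets CSV
        joined.append({"dataset": dataset, "resource": resource})
    return joined


# Made-up sample data standing in for the two CSV files.
datasets_csv = io.StringIO("Name,Title\nroad-data,Road Data\nrail-data,Rail Data\n")
resources_csv = io.StringIO(
    "Dataset Name,URL\n"
    "road-data,http://example.com/a.csv\n"
    "road-data,http://example.com/b.csv\n"
    "unknown,http://example.com/c.csv\n"
)
joined = join_resources(datasets_csv, resources_csv)
print(len(joined))
```

Indexing the smaller datasets file in a dict and streaming the resources file keeps memory use modest; a database or a dataframe library would do the same join.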
You can also use the standard CKAN API if you want to search or get a small section of the data. Please respect the traffic limits in the API: http://data.gov.uk/terms-and-conditions
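For small queries, a request to the CKAN action API might be put together as in the sketch below. The package_search action and its q and rows parameters are part of the standard CKAN API; the base URL and the helper names are assumptions for illustration, and the sketch only builds the URL and reads an already-decoded response rather than making a network call.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, rows=10):
    """Build a CKAN package_search URL; keep rows small to limit traffic."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_names(response):
    """Extract dataset names from a decoded package_search JSON response."""
    if not response.get("success"):
        return []
    return [package["name"] for package in response["result"]["results"]]


# Assumption: the site serves the CKAN API under this base URL.
url = build_search_url("https://data.gov.uk", "air quality", rows=5)
print(url)

# Made-up response in the standard CKAN envelope, for illustration only.
fake_response = {
    "success": True,
    "result": {"count": 2, "results": [{"name": "air-one"}, {"name": "air-two"}]},
}
names = dataset_names(fake_response)
```

The URL can then be fetched with any HTTP client and the body decoded with json.loads before it is passed to dataset_names.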
The ckanext-sfb-search extension enhances the CKAN search functionality to allow for more specific dataset discovery. It provides capabilities to search datasets based on column names found within CSV or XLSX data resources. Furthermore, it enables searches based on linked publications when used in conjunction with the ckanext-Dataset-Reference extension. This extension also includes a feature to automatically tag datasets upon creation.
Key Features:
Column-Based Search: Enables searching for datasets by matching terms with the column names present in CSV and XLSX data resources, offering improved data discovery.
Linked Publication Search: Allows users to find datasets linked to specific publications, leveraging the ckanext-Dataset-Reference extension for more targeted searches.
Automatic Tagging: Features automatic tagging of datasets upon creation, streamlining metadata management and improving dataset discoverability.
Technical Integration: The extension integrates directly with CKAN by adding plugins (sfb_search and auto_tag) registered in the CKAN configuration file (ckan.plugins). After installation and plugin activation, the extension requires a database migration (ckan db upgrade -p sfb_search) to set up the necessary database structures.
Benefits & Impact: By implementing ckanext-sfb-search, CKAN installations can benefit from improved dataset discoverability through advanced search capabilities. Searching by column names in data resources and linking datasets to publications offer refined data exploration options. Automatic tagging simplifies dataset curation and ensures consistency in metadata practices.