Link to SharePoint website containing a repository of more detailed internal information for authorized users
SDD Front Office standard operating procedures, employee performance work plans, and templates
This is an FHWA initiative to promote knowledge sharing across the organization. It includes FHWA external Microsoft SharePoint 2010 services, internal Microsoft SharePoint 2010 services, and Adobe Connect Professional web conferencing services for knowledge-sharing communities. The two SharePoint environments are not connected; they are entirely separate. The internal environment is non-public, accessible to DOT only, and managed by OST. It hosts 184 top-level SharePoint sites serving FHWA. The external environment is public, with some restricted SharePoint sites, and is managed by FHWA. It hosts 16 top-level SharePoint sites serving FHWA, state and local DOTs, and the surface transportation community.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
UK Power Networks maintains the network at the 132kV voltage level and below. An important part of the distribution network is stepping voltage down as power moves towards households; this is achieved using transformers. Each transformer has a maximum rating for its utilisation, based on protection, overcurrent, switchgear and similar constraints. This dataset contains the Grid Substation Transformers, also known as Bulk Supply Points, which typically step voltage down from 132kV to 33kV (occasionally to 66kV or, more rarely, to 20-25kV). These transformers can be viewed on the single line diagrams in our Long-Term Development Statements (LTDS), and the underlying data can be found in the LTDS tables.
Care is taken to protect the private affairs of companies connected to the 33kV network, which results in the redaction of certain transformers. Where a transformer is redacted, we provide monthly statistics to continue to add value where possible. Where monthly statistics exist but half-hourly data is absent, the half-hourly data has been redacted.
This dataset provides monthly statistics for these named transformers across our licence areas, from 2021 up to the previous month. The data follow the same naming convention as the LTDS for improved interoperability.
To find half-hourly current and power flow data for a transformer, use its ‘tx_id’, which can be cross-referenced in the Grid Transformers Half Hourly Dataset.
If you want to download all of this data, it may be more convenient to use our public SharePoint: Open Data Portal Library - Grid Transformers - All Documents (sharepoint.com)
This dataset is part of a larger endeavour to share more operational data on UK Power Networks assets.
Please visit our Network Operational Data Dashboard for more operational datasets.
Methodological Approach
The dataset is not derived; it consists of measurements from our network stored in our historian. Current is measured by current transformers attached to the cable at the circuit breaker, and power is derived by combining this with data from voltage transformers physically attached to the busbar. The historian stores data using a report-by-exception process: a new point measurement is logged only once the value has deviated from the present value by a set amount. We extract the data using a 30-minute time-weighted average to obtain half-hourly values. Where no measurements were logged in a period, the value is left blank; because of the report-by-exception process, it may be appropriate to forward-fill shorter gaps.
We developed a data redaction process to protect the privacy of companies in accordance with the Utilities Act 2000, section 105.1.b, which requires UK Power Networks not to disclose information relating to the affairs of a business. For this reason, where the demand of a private customer can be derived from our data and that data is not already public (e.g., data provided via Elexon on the Balancing Mechanism), we redact the half-hourly time series and provide only the monthly averages. The redaction process applies four linear correlation tests against the customer's signal: over all the data; over only the periods when the customer is active; over the first-order difference of all the data; and over the first-order difference of only the periods when the customer is active. If any of these four tests shows a high linear correlation, the data is redacted. The process is applied not only to the customer's own circuit but also to surrounding circuits that would reveal that customer's signal.
The directionality of the data is not consistent within this dataset.
Where directionality could be ascertained, we arrange the power data in the direction from the LTDS "from node" to the LTDS "to node". Current measurements do not indicate direction and are positive regardless of the direction of flow. In some circumstances, the polarity can be negative; this depends on the data commissioner's decision about what control room operators would find most helpful in ensuring reliable and secure network operation.
Quality Control Statement
The data is provided "as is". In the design and delivery process adopted by the DSO, customer feedback and guidance are considered at each phase of the project. One of the earliest steers was that raw data was preferable, so we do not perform any prior quality control screening of our raw network data. As a result, network rearrangements and other periods of non-intact running are present throughout the dataset. These can misrepresent the true utilisation of the network, which for regulatory purposes is determined by considering intact running arrangements only. Therefore, taking the maximum or minimum values of these transformers is not a reliable way of ascertaining their true utilisation. This does have the intended benefit of giving a realistic view of how the network was actually operated: the critical feedback was that our customers want to understand what the impact on them would have been under real operational conditions, and this dataset offers unique insight into that.
Assurance Statement
Creating this dataset involved a large amount of manual data input. At UK Power Networks, we use different software to run the network operationally (ADMS) and to plan and study the network (PowerFactory).
The measurement devices are primarily intended to inform network operators of the real-time condition of the network, whereas the network drawings in the LTDS reflect a planning view, which differs from the operational one. To compile this dataset, we reconciled these two views manually. A team of data scientists, data engineers and power system engineers identified each LTDS transformer from the single line diagram, identified its line name from LTDS Table 2a/b, and then located the same transformer in ADMS to identify its measurement data tags. This information was entered manually into a spreadsheet. Any customers with significant influence on that circuit were noted using ADMS and the single line diagrams. From there, Python code is used to triage and compile the datasets. There is potential for human error during this manual processing, including missing transformers, incorrectly labelled transformers, incorrectly identified measurement data tags and incorrectly interpreted directionality. Whilst care has been taken to minimise the risk of these issues, they may persist in the provided dataset. Please report any unexpected behaviour you observe in this data so that we can correct it as quickly as possible.
Additional information
Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary.
Download dataset information: Metadata (JSON)
If you find this dataset useful, we would be grateful if you would submit a “reuse” case study telling us what you did and how you used it. This helps us set our direction and better understand how to improve our data offering in the future. Click here for more information: Open Data Portal Reuses — UK Power Networks
To view this data, please register and log in.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
UK Power Networks maintains the network at the 132kV voltage level and below. An important part of the distribution network is stepping voltage down as power moves towards households; this is achieved using transformers. Each transformer has a maximum rating for its utilisation, based on protection, overcurrent, switchgear and similar constraints. This dataset contains the Primary Substation Transformers, which typically step voltage down from 33kV to 11kV (occasionally from 132kV to 11kV). These transformers can be viewed on the single line diagrams in our Long-Term Development Statements (LTDS), and the underlying data can be found in the LTDS tables.
Care is taken to protect the private affairs of companies connected to the 11kV network, which results in the redaction of certain transformers. Where a transformer is redacted, we provide monthly statistics to continue to add value where possible. Where monthly statistics exist but half-hourly data is absent, the half-hourly data has been redacted.
This dataset provides monthly statistics for these named transformers across our licence areas, from 2021 up to the previous month. The data follow the same naming convention as the LTDS for improved interoperability.
To find half-hourly current and power flow data for a transformer, use its ‘tx_id’, which can be cross-referenced in the Primary Transformers Half Hourly Dataset.
If you want to download all of this data, it may be more convenient to use our public SharePoint: Open Data Portal Library - Primary Transformers - All Documents (sharepoint.com)
This dataset is part of a larger endeavour to share more operational data on UK Power Networks assets.
Please visit our Network Operational Data Dashboard for more operational datasets.
Methodological Approach
The dataset is not derived; it consists of measurements from our network stored in our historian. Current is measured by current transformers attached to the cable at the circuit breaker, and power is derived by combining this with data from voltage transformers physically attached to the busbar. The historian stores data using a report-by-exception process: a new point measurement is logged only once the value has deviated from the present value by a set amount. We extract the data using a 30-minute time-weighted average to obtain half-hourly values. Where no measurements were logged in a period, the value is left blank; because of the report-by-exception process, it may be appropriate to forward-fill shorter gaps.
We developed a data redaction process to protect the privacy of companies in accordance with the Utilities Act 2000, section 105.1.b, which requires UK Power Networks not to disclose information relating to the affairs of a business. For this reason, where the demand of a private customer can be derived from our data and that data is not already public (e.g., data provided via Elexon on the Balancing Mechanism), we redact the half-hourly time series and provide only the monthly averages. The redaction process applies four linear correlation tests against the customer's signal: over all the data; over only the periods when the customer is active; over the first-order difference of all the data; and over the first-order difference of only the periods when the customer is active. If any of these four tests shows a high linear correlation, the data is redacted. The process is applied not only to the customer's own circuit but also to surrounding circuits that would reveal that customer's signal.
The directionality of the data is not consistent within this dataset.
Where directionality could be ascertained, we arrange the power data in the direction from the LTDS "from node" to the LTDS "to node". Current measurements do not indicate direction and are positive regardless of the direction of flow. In some circumstances, the polarity can be negative; this depends on the data commissioner's decision about what control room operators would find most helpful in ensuring reliable and secure network operation.
Quality Control Statement
The data is provided "as is". In the design and delivery process adopted by the DSO, customer feedback and guidance are considered at each phase of the project. One of the earliest steers was that raw data was preferable, so we do not perform any prior quality control screening of our raw network data. As a result, network rearrangements and other periods of non-intact running are present throughout the dataset. These can misrepresent the true utilisation of the network, which for regulatory purposes is determined by considering intact running arrangements only. Therefore, taking the maximum or minimum values of these transformers is not a reliable way of ascertaining their true utilisation. This does have the intended benefit of giving a realistic view of how the network was actually operated: the critical feedback was that our customers want to understand what the impact on them would have been under real operational conditions, and this dataset offers unique insight into that.
Assurance Statement
Creating this dataset involved a large amount of manual data input. At UK Power Networks, we use different software to run the network operationally (ADMS) and to plan and study the network (PowerFactory). The measurement devices are primarily intended to inform network operators of the real-time condition of the network, whereas the network drawings in the LTDS reflect a planning view, which differs from the operational one. To compile this dataset, we reconciled these two views manually. A team of data scientists, data engineers and power system engineers identified each LTDS transformer from the single line diagram, identified its line name from LTDS Table 2a/b, and then located the same transformer in ADMS to identify its measurement data tags. This information was entered manually into a spreadsheet. Any customers with significant influence on that circuit were noted using ADMS and the single line diagrams. From there, Python code is used to triage and compile the datasets. There is potential for human error during this manual processing, including missing transformers, incorrectly labelled transformers, incorrectly identified measurement data tags and incorrectly interpreted directionality. Whilst care has been taken to minimise the risk of these issues, they may persist in the provided dataset. Please report any unexpected behaviour you observe in this data so that we can correct it as quickly as possible.
Additional information
Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary.
Download dataset information: Metadata (JSON)
If you find this dataset useful, we would be grateful if you would submit a “reuse” case study telling us what you did and how you used it. This helps us set our direction and better understand how to improve our data offering in the future. Click here for more information: Open Data Portal Reuses — UK Power Networks
To view this data, please register and log in.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
UK Power Networks maintains the network at the 132kV voltage level and below. An important part of the distribution network is stepping voltage down as power moves towards households; this is achieved using transformers. Each transformer has a maximum rating for its utilisation, based on protection, overcurrent, switchgear and similar constraints. This dataset contains the Primary Substation Transformers, which typically step voltage down from 33kV to 11kV (occasionally from 132kV to 11kV). These transformers can be viewed on the single line diagrams in our Long-Term Development Statements (LTDS), and the underlying data can be found in the LTDS tables. This dataset provides half-hourly current and power flow data for these named transformers in our South Eastern region, from 2021 up to the previous month. The data follow the same naming convention as the LTDS for improved interoperability.
Care is taken to protect the private affairs of companies connected to the 11kV network, which results in the redaction of certain transformers. Where a transformer is redacted, we provide monthly statistics to continue to add value where possible. Where monthly statistics exist but half-hourly data is absent, the half-hourly data has been redacted.
To find the transformer you are looking for, use its ‘tx_id’, which can be cross-referenced in the Primary Transformers Monthly Dataset. That dataset describes, by month, which transformers were triaged, whether they could be made public, and the monthly statistics for each site.
If you want to download all of this data, it may be more convenient to use our public SharePoint: Open Data Portal Library - Primary Transformers - All Documents (sharepoint.com)
This dataset is part of a larger endeavour to share more operational data on UK Power Networks assets. Please visit our Network Operational Data Dashboard for more operational datasets.
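The ‘tx_id’ cross-reference between the half-hourly and monthly datasets can be sketched with Python's standard library. This assumes both datasets have been exported as CSV files with a `tx_id` column; the helper names and file names are hypothetical. Following the rule stated above that monthly statistics without half-hourly data indicate redaction, `redacted_tx_ids` lists those transformers, working at whole-transformer level rather than month by month.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV export into a list of dicts (one per row)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def group_by_tx_id(rows):
    """Group rows by their 'tx_id' column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["tx_id"]].append(row)
    return dict(groups)


def redacted_tx_ids(monthly_rows, half_hourly_rows):
    """tx_ids that have monthly statistics but no half-hourly data."""
    with_half_hourly = {row["tx_id"] for row in half_hourly_rows}
    return sorted({row["tx_id"] for row in monthly_rows} - with_half_hourly)
```

For example, `redacted_tx_ids(load_rows("monthly.csv"), load_rows("half_hourly.csv"))` would list the transformers whose half-hourly series is withheld, while `group_by_tx_id` gives per-transformer access to the published half-hourly rows.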
Methodological Approach
The dataset is not derived; it consists of measurements from our network stored in our historian. Current is measured by current transformers attached to the cable at the circuit breaker, and power is derived by combining this with data from voltage transformers physically attached to the busbar. The historian stores data using a report-by-exception process: a new point measurement is logged only once the value has deviated from the present value by a set amount. We extract the data using a 30-minute time-weighted average to obtain half-hourly values. Where no measurements were logged in a period, the value is left blank; because of the report-by-exception process, it may be appropriate to forward-fill shorter gaps.
We developed a data redaction process to protect the privacy of companies in accordance with the Utilities Act 2000, section 105.1.b, which requires UK Power Networks not to disclose information relating to the affairs of a business. For this reason, where the demand of a private customer can be derived from our data and that data is not already public (e.g., data provided via Elexon on the Balancing Mechanism), we redact the half-hourly time series and provide only the monthly averages. Where a primary transformer has five or fewer customers, we redact the dataset.
The directionality of the data is not consistent within this dataset. Where directionality could be ascertained, we arrange the power data in the direction from the LTDS "from node" to the LTDS "to node". Current measurements do not indicate direction and are positive regardless of the direction of flow. In some circumstances, the polarity can be negative; this depends on the data commissioner's decision about what control room operators would find most helpful in ensuring reliable and secure network operation.
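A minimal sketch of the 30-minute time-weighted averaging and short-gap forward fill described above. It assumes report-by-exception points are step-held (each logged value applies until the next logged point) and, as the text states, leaves a window blank (None) when no point was logged in it. The historian's actual extraction may differ, and `max_gap` (the longest run of blank half-hours to fill) is a hypothetical parameter.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)


def half_hourly_twa(points, start, end):
    """30-minute time-weighted averages of report-by-exception points.

    points: (timestamp, value) pairs sorted by timestamp. Each value is
    assumed to hold until the next logged point. Windows with no logged
    point are returned as None (blank).
    """
    averages = []
    t0 = start
    while t0 < end:
        t1 = t0 + WINDOW
        inside = [(t, v) for t, v in points if t0 <= t < t1]
        if not inside:
            averages.append(None)
        else:
            earlier = [v for t, v in points if t < t0]
            # Carry the last earlier value into the window, if there is one.
            steps = ([(t0, earlier[-1])] if earlier else []) + inside
            ends = [t for t, _ in steps[1:]] + [t1]
            weighted = sum(v * (te - ts).total_seconds()
                           for (ts, v), te in zip(steps, ends))
            averages.append(weighted / (t1 - steps[0][0]).total_seconds())
        t0 = t1
    return averages


def forward_fill(values, max_gap):
    """Fill runs of None of length <= max_gap with the last known value."""
    filled = list(values)
    i = 0
    while i < len(filled):
        if filled[i] is not None:
            i += 1
            continue
        j = i
        while j < len(filled) and filled[j] is None:
            j += 1
        if i > 0 and j - i <= max_gap:
            filled[i:j] = [filled[i - 1]] * (j - i)
        i = j
    return filled
```

For example, points logged at 00:00 (10.0) and 00:15 (20.0) give an average of 15.0 for the 00:00 to 00:30 window, because each value is held for 15 minutes. Gaps longer than `max_gap` are deliberately left blank, since a long absence of logged points may reflect an outage or rearrangement rather than a steady value.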
Quality Control Statement
The data is provided "as is". In the design and delivery process adopted by the DSO, customer feedback and guidance are considered at each phase of the project. One of the earliest steers was that raw data was preferable, so we do not perform any prior quality control screening of our raw network data. As a result, network rearrangements and other periods of non-intact running are present throughout the dataset. These can misrepresent the true utilisation of the network, which for regulatory purposes is determined by considering intact running arrangements only. Therefore, taking the maximum or minimum values of these transformers is not a reliable way of ascertaining their true utilisation. This does have the intended benefit of giving a realistic view of how the network was actually operated: the critical feedback was that our customers want to understand what the impact on them would have been under real operational conditions, and this dataset offers unique insight into that.
Assurance Statement
Creating this dataset involved a large amount of manual data input. At UK Power Networks, we use different software to run the network operationally (ADMS) and to plan and study the network (PowerFactory). The measurement devices are primarily intended to inform network operators of the real-time condition of the network, whereas the network drawings in the LTDS reflect a planning view, which differs from the operational one. To compile this dataset, we reconciled these two views manually. A team of data scientists, data engineers and power system engineers identified each LTDS transformer from the single line diagram, identified its line name from LTDS Table 2a/b, and then located the same transformer in ADMS to identify its measurement data tags. This information was entered manually into a spreadsheet. Any customers with significant influence on that circuit were noted using ADMS and the single line diagrams. From there, Python code is used to triage and compile the datasets. There is potential for human error during this manual processing, including missing transformers, incorrectly labelled transformers, incorrectly identified measurement data tags and incorrectly interpreted directionality. Whilst care has been taken to minimise the risk of these issues, they may persist in the provided dataset. Please report any unexpected behaviour you observe in this data so that we can correct it as quickly as possible.
Additional information
Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary.
Download dataset information: Metadata (JSON)
If you find this dataset useful, we would be grateful if you would submit a “reuse” case study telling us what you did and how you used it. This helps us set our direction and better understand how to improve our data offering in the future. Click here for more information: Open Data Portal Reuses — UK Power Networks
To view this data, please register and log in.
The DSI grantee database is an internal SharePoint site containing information on grantees within the Division of Strategic Investments (DSI) management portfolio, developed to respond to DSI, ETA, or DOL questions about grantee characteristics. The site includes general information on grantees drawn from grant application abstracts (such as organization name, address, funded project name, industry sector, and project manager contact information) and lists of organizations that provided MOUs stating their intent to partner with the grantee to support the grant.
Dataset for the ICSE 2023 technical paper "Demystifying Privacy Policy of Third-Party Libraries in Mobile Apps".
The second part of the dataset is given at: 10.5281/zenodo.7790328
Main repository: https://doi.org/10.5281/zenodo.7647779
If you cannot unzip the dataset, please download it from ICSE2023CodeDataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cash-and-Equivalents Time Series for Microsoft Corporation. Microsoft Corporation develops and supports software, services, devices, and solutions worldwide. The company's Productivity and Business Processes segment offers Microsoft 365 Commercial, Enterprise Mobility + Security, Windows Commercial, Power BI, Exchange, SharePoint, Microsoft Teams, Security and Compliance, and Copilot; Microsoft 365 Commercial products, such as Windows Commercial on-premises and Office licensed services; Microsoft 365 Consumer products and cloud services, such as Microsoft 365 Consumer subscriptions, Office licensed on-premises, and other consumer services; LinkedIn; and Dynamics products and cloud services, such as Dynamics 365, cloud-based applications, and on-premises ERP and CRM applications. Its Intelligent Cloud segment provides server products and cloud services, such as Azure and other cloud services, GitHub, Nuance Healthcare, virtual desktop offerings, and other cloud services; server products, including SQL Server and Windows Server, Visual Studio and System Center related Client Access Licenses, and other on-premises offerings; and Enterprise and partner services, including Enterprise Support and Nuance professional services, Industry Solutions, Microsoft Partner Network, and Learning Experience. The company's Personal Computing segment provides Windows and Devices, such as Windows OEM licensing, Devices, and Surface and PC accessories; Gaming services and solutions, such as Xbox hardware, content, and services, first- and third-party content, Xbox Game Pass, subscriptions, Cloud Gaming, advertising, and other cloud services; and search and news advertising services, such as Bing and Copilot, Microsoft News and Edge, and third-party affiliates. It sells its products through OEMs, distributors, and resellers, and through online and retail stores. The company was founded in 1975 and is headquartered in Redmond, Washington.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Net-Receivables Time Series for Microsoft Corporation. Microsoft Corporation develops and supports software, services, devices, and solutions worldwide. The company's Productivity and Business Processes segment offers Microsoft 365 Commercial, Enterprise Mobility + Security, Windows Commercial, Power BI, Exchange, SharePoint, Microsoft Teams, Security and Compliance, and Copilot; Microsoft 365 Commercial products, such as Windows Commercial on-premises and Office licensed services; Microsoft 365 Consumer products and cloud services, such as Microsoft 365 Consumer subscriptions, Office licensed on-premises, and other consumer services; LinkedIn; and Dynamics products and cloud services, such as Dynamics 365, cloud-based applications, and on-premises ERP and CRM applications. Its Intelligent Cloud segment provides server products and cloud services, such as Azure and other cloud services, GitHub, Nuance Healthcare, virtual desktop offerings, and other cloud services; server products, including SQL Server and Windows Server, Visual Studio and System Center related Client Access Licenses, and other on-premises offerings; and Enterprise and partner services, including Enterprise Support and Nuance professional services, Industry Solutions, Microsoft Partner Network, and Learning Experience. The company's Personal Computing segment provides Windows and Devices, such as Windows OEM licensing, Devices, and Surface and PC accessories; Gaming services and solutions, such as Xbox hardware, content, and services, first- and third-party content, Xbox Game Pass, subscriptions, Cloud Gaming, advertising, and other cloud services; and search and news advertising services, such as Bing and Copilot, Microsoft News and Edge, and third-party affiliates. It sells its products through OEMs, distributors, and resellers, and through online and retail stores. The company was founded in 1975 and is headquartered in Redmond, Washington.
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY
This dataset includes aggregate data on the type, status, population served, and individuals placed at each alternative housing site under contract with HSA.
B. HOW THE DATASET IS CREATED
Site Type, Status, and Population: HSA DOC leadership informs the data tracker owner when the legal status, site type, or intended population to serve changes.
Daily Census and Units Available: The site monitors at each site update the data tracker owner at the HSA DOC with the daily census at least once a day.
C. UPDATE PROCESS
Updated several times daily, whenever new information is shared with the data tracker owner. The data tracker owner enters the data directly into the underlying SharePoint spreadsheet.
D. HOW TO USE THIS DATASET
Use this dataset for aggregate data on the site type, status, and daily census of individuals placed in the sites. Do not use it for individual-level information. The dataset contains no personally identifying or medical information.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
── videochatgpt_gen # Official website: https://github.com/mbzuai-oryx/Video-ChatGPT/tree/main/quantitative_evaluation
├── Test_Videos/ # Available at: https://mbzuaiac-my.sharepoint.com/:u:/g/personal/hanoona_bangalath_mbzuai_ac_ae/EatOpE7j68tLm2XAd0u6b8ABGGdVAwLMN6rqlDGM_DwhVA?e=90WIuW
├── Test_Human_Annotated_Captions/ # Available at:…
See the full description on the dataset page: https://huggingface.co/datasets/studymakesmehappyyyyy/VCGBENCH.
https://datacatalog.worldbank.org/public-licenses?fragment=cc
This dataset contains metadata (title, abstract, date of publication, field, etc.) for around 1 million academic articles. Each record contains additional information on the country of study and whether the article makes use of data. Machine learning tools were used to classify the country of study and data use.
Our data source of academic articles is the Semantic Scholar Open Research Corpus (S2ORC) (Lo et al. 2020). The corpus contains more than 130 million English-language academic papers across multiple disciplines. The papers in the Semantic Scholar corpus are gathered directly from publishers and from open archives such as arXiv or PubMed, or crawled from the internet.
We placed some restrictions on the articles to make them usable and relevant for our purposes. First, only articles with an abstract and a parsed PDF or LaTeX file are included in the analysis. The full text of the abstract is necessary to classify the country of study and whether the article uses data, and the parsed PDF or LaTeX file is needed to extract key information such as the date of publication and field of study. This restriction eliminated a large number of articles in the original corpus: around 30 million articles remain after keeping only those with a parsable (i.e., suitable for digital processing) PDF, and around 26% of those 30 million are eliminated when articles without an abstract are removed. Second, only articles published from 2000 to 2020 were considered. This restriction eliminated an additional 9% of the remaining articles. Finally, articles from the following fields of study were excluded, as we aim to focus on fields that are likely to use data produced by countries' national statistical systems: Biology, Chemistry, Engineering, Physics, Materials Science, Environmental Science, Geology, History, Philosophy, Math, Computer Science, and Art. The fields that are included are Economics, Political Science, Business, Sociology, Medicine, and Psychology. This third restriction eliminated around 34% of the remaining articles. From an initial corpus of 136 million articles, this resulted in a final corpus of around 10 million articles.
Because of the intensive computing resources required, a set of 1,037,748 articles was randomly selected from the 10 million articles in our restricted corpus as a convenience sample.
The empirical approach employed in this project utilizes text mining with Natural Language Processing (NLP). The goal of NLP is to extract structured information from raw, unstructured text. In this project, NLP is used to extract the country of study and whether the paper makes use of data. We will discuss each of these in turn.
To determine the country or countries of study in each academic article, two approaches are employed based on information found in the title, abstract, or topic fields. The first approach uses regular expression searches based on the presence of ISO 3166 country names. A defined set of country names is compiled, and the presence of these names is checked in the relevant fields. This approach is transparent, widely used in social science research, and easily extended to other languages. However, there is a potential for exclusion errors if a country’s name is spelled in a non-standard way.
The second approach is based on Named Entity Recognition (NER), which uses machine learning to identify named entities, such as people, organizations, and places, in text; in this project it is implemented with the spaCy Python library. The NER model labels spans of text as named entities, and here it is used to identify the countries of study in the academic articles. spaCy supports multiple languages and has been trained on multiple spellings of countries, overcoming some of the limitations of the regular expression approach. If a country is identified by either the regular expression search or NER, it is linked to the article. Note that one article can be linked to more than one country.
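A minimal sketch of how the two approaches combine, assuming Python's standard `re` module for the first and spaCy's pretrained `en_core_web_sm` pipeline for the second. The country list is a hypothetical excerpt rather than the full ISO 3166 list, and keeping spaCy's `GPE` (geopolitical entity) spans is our assumption about how countries would be picked out; `GPE` also covers cities and states, which a real pipeline would still need to map to countries.

```python
import re
from functools import lru_cache
from typing import Callable, Set

# Hypothetical excerpt; the project used the full set of ISO 3166 country names.
COUNTRY_NAMES = ["Kenya", "Brazil", "Niger", "Nigeria", "Republic of Korea", "Viet Nam"]
_CANONICAL = {name.lower(): name for name in COUNTRY_NAMES}

# One case-insensitive pattern; word boundaries stop "Niger" matching inside "Nigeria".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(COUNTRY_NAMES, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def regex_countries(text: str) -> Set[str]:
    """Approach 1: exact (case-insensitive) matches of listed country names."""
    return {_CANONICAL[match.lower()] for match in _PATTERN.findall(text)}


@lru_cache(maxsize=1)
def _load_nlp():
    # Requires: pip install spacy && python -m spacy download en_core_web_sm
    import spacy
    return spacy.load("en_core_web_sm")


def spacy_countries(text: str) -> Set[str]:
    """Approach 2: spaCy NER; GPE spans may also be cities or states (not normalized here)."""
    return {ent.text for ent in _load_nlp()(text).ents if ent.label_ == "GPE"}


def link_countries(text: str,
                   ner: Callable[[str], Set[str]] = spacy_countries) -> Set[str]:
    """A country found by either approach is linked; an article may get several."""
    return regex_countries(text) | ner(text)
```

Passing the NER step in as a function keeps the regex path testable without downloading a spaCy model; for example, a stub that returns `{"S. Korea"}` shows NER recovering a spelling that the exact-match list misses.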
The second task is to classify whether the paper uses data. A supervised machine learning approach is employed, in which 3,500 publications were first randomly selected and manually labeled by human raters using the Mechanical Turk service (Paszke et al. 2019).[1] To make sure the human raters had a similar and appropriate definition of data in mind, they were given the following instructions before seeing their first paper:
Each of these documents is an academic article. The goal of this study is to measure whether a specific academic article is using data and from which country the data came.
There are two classification tasks in this exercise:
1. Identifying whether an academic article is using data from any country.
2. Identifying from which country that data came.
For task 1, we are looking specifically at the use of data. Data is any information that has been collected, observed, generated, or created to produce research findings. For example, a study that reports findings or analysis using survey data uses data. Some clues that indicate a study uses data include whether a survey or census is described, a statistical model is estimated, or a table of means or summary statistics is reported.
After an article is classified as using data, please note the type of data used. The options are population or business census, survey data, administrative data, geospatial data, private sector data, and other data. If no data is used, then mark "Not applicable". In cases where multiple data types are used, please click multiple options.[2]
For task 2, we are looking at the country or countries that are studied in the article. In some cases, no country may be applicable, for instance, if the research is theoretical and has no specific country application. In other cases, the research article may involve multiple countries; in these cases, select all countries that are discussed in the paper.
We expect between 10 and 35 percent of all articles to use data.
The median amount of time that a worker spent on an article, measured as the time between when the worker accepted the article for classification and when the classification was submitted, was 25.4 minutes. If human raters were used exclusively rather than machine learning tools, reviewing the corpus of 1,037,748 articles examined in this study would take around 50 years of round-the-clock human work time at a cost of $3,113,244, assuming a cost of $3 per article, as was paid to MTurk workers.
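The time and cost figures follow from simple arithmetic. The sketch below reproduces them; note that the 50-year figure corresponds to round-the-clock time (24 hours a day, 365 days a year) rather than standard working hours.

```python
ARTICLES = 1_037_748      # articles in the sample
MEDIAN_MINUTES = 25.4     # median rater time per article
COST_PER_ARTICLE = 3      # USD paid per article on MTurk

total_minutes = ARTICLES * MEDIAN_MINUTES
total_hours = total_minutes / 60
years_round_the_clock = total_hours / (24 * 365)
total_cost = ARTICLES * COST_PER_ARTICLE

print(f"{total_hours:,.0f} hours, about {years_round_the_clock:.1f} years if worked 24/7")
print(f"${total_cost:,}")  # prints $3,113,244
```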
A model is next trained on the 3,500 labeled articles. We use a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model to encode raw text into a numeric format suitable for predictions (Devlin et al. 2018). BERT is pre-trained on a large corpus comprising the Toronto Book Corpus and Wikipedia. The distilled version (DistilBERT) is a compressed model that is 60% the size of BERT, retains 97% of its language understanding capabilities, and is 60% faster (Sanh, Debut, Chaumond, Wolf 2019). We use PyTorch to produce a model that classifies articles based on the labeled data. Of the 3,500 articles hand-coded by the MTurk workers, 900 are fed to the machine learning model; this number was chosen because of computational limitations in training the NLP model. An article was classified as “uses data” if the model predicted that it used data with at least 90% confidence.
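The 90% rule can be read as a threshold on the predicted probability of the “uses data” class. The sketch below assumes the classifier outputs two raw scores (logits) ordered as [no data, uses data], for instance from a DistilBERT sequence-classification head, and converts them to probabilities with a softmax; the DistilBERT model itself is omitted, and the class order is an assumption.

```python
import math
from typing import List, Sequence

THRESHOLD = 0.90  # "uses data" only if the model is at least 90% confident


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uses_data(logits: Sequence[float], positive_index: int = 1) -> bool:
    """Label an article 'uses data' if P(uses data) >= THRESHOLD.

    logits: two raw scores, assumed ordered as [no_data, uses_data].
    """
    return softmax(logits)[positive_index] >= THRESHOLD
```

With logits `[0.0, 2.0]` the “uses data” probability is about 0.88, so the article is not labeled as using data even though that is the more likely class; the high threshold trades recall for precision.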
The performance of the models classifying articles to countries and as using data or not can be compared to the classification by the human raters. We consider the human raters as giving us the ground truth. This may underestimate the model performance if the workers at times got the allocation wrong in a way that would not apply to the model. For instance, a human rater could mistake the Republic of Korea for the Democratic People’s Republic of Korea. If both the humans and the model make the same kind of errors, the performance reported here will be overestimated.
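Scoring the model against the human labels, treated as ground truth, amounts to an accuracy on held-out articles plus a correlation of per-country counts. A minimal sketch in plain Python; the text does not name the correlation measure, so Pearson's r is assumed, and the input dictionaries (country name to number of data-using articles) are hypothetical.

```python
import math
from typing import Dict, Sequence


def accuracy(predicted: Sequence[bool], human: Sequence[bool]) -> float:
    """Share of held-out articles where the model agrees with the human label."""
    if len(predicted) != len(human) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == h for p, h in zip(predicted, human)) / len(human)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def country_count_correlation(model: Dict[str, int], human: Dict[str, int]) -> float:
    """Correlate per-country counts of data-using articles under the two approaches."""
    countries = sorted(set(model) | set(human))  # a country missing on one side counts as 0
    return pearson([model.get(c, 0) for c in countries],
                   [human.get(c, 0) for c in countries])
```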
The model predicted whether an article made use of data with 87% accuracy on the articles held out of model training. The correlation between the number of articles per country that use data, as estimated under the two approaches, is given in the figure below. The number of articles represents an aggregate total of
ReadingBank (HF conversion)
Source paper: https://arxiv.org/abs/2108.11591
Original data: https://mail2sysueducn-my.sharepoint.com/:u:/g/personal/huangyp28_mail2_sysu_edu_cn/Efh3ZWjsA-xFrH2FSjyhSVoBMak6ypmbABWmJEmPwtKhhw?e=tbthMD
Created with: https://github.com/albertklor/reading-bank
Fields:
file_name (name of the file): str
page_number (index of the page number): int
bounding_boxes (normalized bounding boxes in [x0, y0, x1, y1] format): list[list[int]]
text… See the full description on the dataset page: https://huggingface.co/datasets/albertklorer/readingbank.
GNU GPL 3.0: https://www.gnu.org/licenses/gpl-3.0.html
We provide a script for a quick start. First, download our trained models from here and put the checkpoints folder into the project folder.
Our dataset ModelsResource-RigNetv1 has 2,703 models. We split it into 80% for training (2,163 models), 10% for validation (270 models), and 10% for testing (270 models). All models in FBX format can be downloaded here.
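The split sizes imply 2,703 - 2,163 - 270 = 270 test models. A reproducible split with those counts can be sketched as follows. This is an illustration only: the model IDs are hypothetical, and the shuffle-then-cut approach with a fixed seed is an assumption; the released split should be used to reproduce reported results.

```python
import random
from typing import List, Sequence, Tuple


def split_models(model_ids: Sequence[str], n_train: int, n_val: int,
                 seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle once with a fixed seed, then cut into train / validation / test."""
    ids = list(model_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # whatever remains is the test set
    return train, val, test


ids = [f"model_{i:04d}" for i in range(2703)]  # hypothetical IDs
train, val, test_set = split_models(ids, n_train=2163, n_val=270)
print(len(train), len(val), len(test_set))  # 2163 270 270
```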
To use this dataset in this project, pre-processing is performed. We put the pre-processed data here, which consists of several sub-folders.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
UK Power Networks maintains the 132 kV voltage level network and below. An important part of the distribution network is distributing this electricity across our regions through circuits. Electricity enters our network through Super Grid Transformers at substations shared with National Grid, which we call Grid Supply Points. It is then sent across our 132 kV circuits towards our grid substations and primary substations. From there, electricity is distributed along the 33 kV circuits to bring it closer to the home. These circuits can be viewed on the single line diagrams in our Long-Term Development Statements (LTDS), and the underlying data can be found in the LTDS tables.
This dataset provides half-hourly current and power flow data across these named circuits from 2021 through to the previous month across our Eastern Power Networks (EPN) license area. The data is aligned with the same naming convention as the LTDS for improved interoperability.
Care is taken to protect the private affairs of companies connected to the 33 kV network, resulting in the redaction of certain circuits. Where redacted, we provide monthly statistics to continue to add value where possible. Where monthly statistics exist but half-hourly is absent, this data has been redacted.
To find the circuit you are looking for, use the ‘ltds_line_name’, which can be cross-referenced in the 33kV Circuits Monthly Data; that dataset describes, by month, which circuits were triaged, whether they could be made public, and their monthly statistics.
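Cross-referencing the two datasets is a join on `ltds_line_name`. A minimal sketch with plain Python dictionaries, assuming each half-hourly row carries an ISO 8601 `timestamp` and each monthly row a `month` string of the form `YYYY-MM`; apart from `ltds_line_name`, these field names are hypothetical, so check them against the published schema.

```python
from typing import Dict, List, Optional, Tuple


def attach_monthly(half_hourly: List[dict], monthly: List[dict]) -> List[dict]:
    """Attach the matching monthly record (or None) to each half-hourly row."""
    # Index monthly records by (circuit name, month) for constant-time lookup.
    index: Dict[Tuple[str, str], dict] = {
        (m["ltds_line_name"], m["month"]): m for m in monthly
    }
    joined = []
    for row in half_hourly:
        # "YYYY-MM" is the first 7 characters of an ISO 8601 timestamp.
        key = (row["ltds_line_name"], row["timestamp"][:7])
        match: Optional[dict] = index.get(key)
        joined.append({**row, "monthly": match})
    return joined
```

A row whose `monthly` value comes back as `None` has no monthly record for that circuit and month, which is worth checking before relying on the half-hourly values.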
If you want to download all of this data, it may be more convenient to use our public SharePoint: Sharepoint
This dataset is part of a larger endeavour to share more operational data on UK Power Networks assets. Please visit our Network Operational Data Dashboard for more operational datasets.
Methodological Approach
The dataset is not derived; it consists of the measurements from our network stored in our historian.
Current measurements are taken from current transformers attached to the cable at the circuit breaker, and power is derived by combining these with data from voltage transformers physically attached to the busbar. The historian stores data using a report-by-exception process, such that the value must deviate by a certain amount from the present value before a point measurement is logged. We extract the data using a 30-minute time-weighted averaging method to obtain half-hourly values. Where no measurements are logged in a period, the data provided is blank; because of the report-by-exception process, it may be appropriate to forward fill this data for shorter gaps.
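Forward filling only the shorter gaps can be sketched as below. The source gives no threshold, so `max_gap` (counted in half-hour periods) is a hypothetical parameter for the user to choose; longer runs of blanks are left empty because they are less likely to reflect an unchanged value under report-by-exception logging.

```python
from typing import List, Optional


def ffill_short_gaps(values: List[Optional[float]], max_gap: int) -> List[Optional[float]]:
    """Forward-fill runs of blanks (None) of at most max_gap half-hours.

    Only runs that follow a known value are filled; leading blanks and
    runs longer than max_gap are left as None.
    """
    out = list(values)
    i = 0
    while i < len(out):
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < len(out) and out[i] is None:
            i += 1
        # out[start - 1] is the last known value before this run of blanks.
        if start > 0 and (i - start) <= max_gap:
            for j in range(start, i):
                out[j] = out[start - 1]
    return out
```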
We developed a data redaction process to protect the privacy of companies according to the Utilities Act 2000 section 105.1.b, which requires UK Power Networks not to disclose information relating to the affairs of a business. For this reason, where the demand of a private customer is derivable from our data and that data is not already public information (e.g., data provided via Elexon on the Balancing Mechanism), we redact the half-hourly time series and provide only the monthly averages. This redaction process considers the correlation of all the data, of only the corresponding periods where the customer is active, of the first-order difference of all the data, and of the first-order difference of only the corresponding periods where the customer is active. Should any of these four tests show a high linear correlation, the data is redacted. This process is applied not only to the customer's own circuit but also to the surrounding circuits that would also reveal that customer's signal.
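The four tests can be sketched as follows, assuming (hypothetically) that each test correlates a circuit's half-hourly series with the customer's own demand series over the same timestamps. The threshold of 0.9, the use of the absolute value of Pearson's r, and the rule that an active-period difference only spans two consecutive active half-hours are all assumptions; the published process does not state them. In line with the text, the same check would be repeated for each surrounding circuit that could reveal the customer's signal.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; returns 0.0 when undefined (too few points or a constant series)."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def paired_diffs(x: Sequence[float], y: Sequence[float],
                 keep: Sequence[bool]) -> Tuple[List[float], List[float]]:
    """First-order differences over steps where keep[t-1] and keep[t] are both True."""
    dx, dy = [], []
    for t in range(1, len(x)):
        if keep[t - 1] and keep[t]:
            dx.append(x[t] - x[t - 1])
            dy.append(y[t] - y[t - 1])
    return dx, dy


def should_redact(circuit: Sequence[float], customer: Sequence[float],
                  active: Sequence[bool], threshold: float = 0.9) -> bool:
    """Redact if any of the four linear-correlation tests reaches the threshold."""
    everywhere = [True] * len(circuit)
    act_circuit = [c for c, a in zip(circuit, active) if a]
    act_customer = [k for k, a in zip(customer, active) if a]
    tests = [
        pearson(circuit, customer),                                # all data
        pearson(act_circuit, act_customer),                        # active periods only
        pearson(*paired_diffs(circuit, customer, everywhere)),     # first difference, all data
        pearson(*paired_diffs(circuit, customer, active)),         # first difference, active only
    ]
    return any(abs(r) >= threshold for r in tests)
```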
The directionality of the data is not consistent within this dataset. Where directionality was ascertainable, we arrange the power data in the direction of the LTDS "from node" to the LTDS "to node". Measurements of current do not indicate directionality and are instead positive regardless of direction. In some circumstances, the polarity can be negative, depending on the data commissioner's decision about what the operators in the control room might find most helpful in ensuring reliable and secure network operation.
Quality Control Statement
The data is provided "as is".
In the design and delivery process adopted by the DSO, customer feedback and guidance are considered at each phase of the project. One of the earliest steers was that raw data was preferable. This means that we do not perform prior quality control screening on our raw network data. As a result, network rearrangements and other periods of non-intact running of the network are present throughout the dataset, which has the potential to misrepresent the true utilisation of the network; for regulatory purposes, utilisation is determined by considering only intact running arrangements. Therefore, taking the maximum or minimum of these measurements is not a reliable method of ascertaining the true utilisation. This does have the intended benefit of giving a realistic view of how the network was operated. The critical feedback was that our customers want to understand what the impact on them would have been under real operational conditions. As such, this dataset offers unique insight into that.
Assurance Statement
Creating this dataset involved a great deal of manual data processing. At UK Power Networks, we use different software to run the network operationally (ADMS) and to plan and study the network (PowerFactory). The measurement devices are intended primarily to inform the network operators of the real-time condition of the network, and, importantly, the network drawings visible in the LTDS reflect a planning view, which differs from the operational one. To compile this dataset, we manually reconciled these two views of the network. A team of data scientists, data engineers, and power system engineers manually identified the LTDS circuit from the single line diagram, identified the line name from LTDS Table 2a/b, and then identified the same circuit in ADMS to find the measurement data tags. This was then manually entered into a spreadsheet. Any customers with a significant influence on that circuit were noted using ADMS and the single line diagrams. From there, Python code is used to perform the triage and compilation of the datasets. There is potential for human error during the manual data processing. These issues can include missing circuits, incorrectly labelled circuits, incorrectly identified measurement data tags, and incorrectly interpreted directionality. Whilst care has been taken to minimise the risk of these issues, they may persist in the provided dataset. Any unexpected behaviour observed when using this data should be reported so that we can correct it as quickly as possible.
Additional Information
Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary. Download dataset information: Metadata (JSON). If you find this dataset useful, we would be grateful if you would submit a reuse case study telling us what you did and how you used it. This enables us to drive our direction and better understand how we can improve our data offering in the future. Click here for more information: Open Data Portal Reuses — UK Power Networks. To view this data, please register and log in.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Net-Income Time Series for Microsoft Corporation. Microsoft Corporation develops and supports software, services, devices, and solutions worldwide. The company's Productivity and Business Processes segment offers Microsoft 365 Commercial, Enterprise Mobility + Security, Windows Commercial, Power BI, Exchange, SharePoint, Microsoft Teams, Security and Compliance, and Copilot; Microsoft 365 Commercial products, such as Windows Commercial on-premises and Office licensed services; Microsoft 365 Consumer products and cloud services, such as Microsoft 365 Consumer subscriptions, Office licensed on-premises, and other consumer services; LinkedIn; and Dynamics products and cloud services, such as Dynamics 365, cloud-based applications, and on-premises ERP and CRM applications. Its Intelligent Cloud segment provides Server products and cloud services, such as Azure and other cloud services, GitHub, Nuance Healthcare, virtual desktop offerings, and other cloud services; Server products, including SQL and Windows Server, Visual Studio and System Center related Client Access Licenses, and other on-premises offerings; and Enterprise and partner services, including Enterprise Support and Nuance Professional Services, Industry Solutions, Microsoft Partner Network, and Learning Experience. The company's Personal Computing segment provides Windows and Devices, such as Windows OEM licensing, Devices, and Surface and PC accessories; Gaming services and solutions, such as Xbox hardware, content, and services, first- and third-party content, Xbox Game Pass, subscriptions, Cloud Gaming, advertising, and other cloud services; and search and news advertising services, such as Bing and Copilot, Microsoft News and Edge, and third-party affiliates. It sells its products through OEMs, distributors, and resellers, and through online and retail stores. The company was founded in 1975 and is headquartered in Redmond, Washington.