
Supplement 1. R code for estimating thresholds while accounting for variable...

wiley.figshare.com

html

Updated Jun 2, 2023


Jay E. Jones; Andrew J. Kroll; Jack Giovanini; Steven D. Duke; Matthew G. Betts (2023). Supplement 1. R code for estimating thresholds while accounting for variable detection and data for estimating thresholds for forest birds, Oregon, USA, 2007–2008. [Dataset]. http://doi.org/10.6084/m9.figshare.3552231.v1

Available download formats: html

Unique identifier

https://doi.org/10.6084/m9.figshare.3552231.v1

Dataset updated

Jun 2, 2023

Dataset provided by

Wiley (https://www.wiley.com/)

Authors

Jay E. Jones; Andrew J. Kroll; Jack Giovanini; Steven D. Duke; Matthew G. Betts

License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Area covered

Oregon

Description

File list:

    Supplement_Avian data.csv
    Supplement_R code.r

The Supplement_Avian data.csv file contains data on stand-level habitat covariates and visit-specific detections of avian species, Oregon, USA, 2008–2009.

Column definitions:

    Stand id
    Percent cover of conifer species
    Percent cover of broadleaf species
    Percent cover of deciduous broadleaf species
    Percent cover of hardwood species
    Percent cover of hardwood species in a 2000 m radius circle around each sample stand
    Elevation (m) of stand
    Age of stand
    Year of sampling
    Visit number
    Detection of MacGillivray’s Warbler on Visit 1
    Detection of MacGillivray’s Warbler on Visit 2
    Detection of Orange-crowned Warbler on Visit 1
    Detection of Orange-crowned Warbler on Visit 2
    Detection of Swainson’s Thrush on Visit 1
    Detection of Swainson’s Thrush on Visit 2
    Detection of Willow Flycatcher on Visit 1
    Detection of Willow Flycatcher on Visit 2
    Detection of Wilson’s Warbler on Visit 1
    Detection of Wilson’s Warbler on Visit 2
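
Each species has two visit columns, so every stand carries a two-occasion detection history. The Python sketch below builds that history and computes naive occupancy, i.e. the share of stands with at least one detection. It assumes the detection columns are coded 0/1 and named with the four-letter codes used in the checksum list (e.g. MGWA.1, MGWA.2). The function names are hypothetical. Naive occupancy is not the supplement's threshold method. It is the quantity that ignores imperfect detection, which is the problem that method accounts for.

```python
from typing import Mapping, Sequence


def detection_history(row: Mapping[str, str], species: str, visits: int = 2) -> tuple[int, ...]:
    """Return one stand's 0/1 detection history for a species, e.g. (0, 1)."""
    # Assumed column naming: "<four-letter code>.<visit>", e.g. "MGWA.1".
    return tuple(int(row[f"{species}.{v}"]) for v in range(1, visits + 1))


def naive_occupancy(rows: Sequence[Mapping[str, str]], species: str) -> float:
    """Share of stands with at least one detection of the species.

    Assuming no false positives, this can only underestimate the share of
    sampled stands that are occupied, because a species present on a stand
    may go undetected on both visits.
    """
    if not rows:
        raise ValueError("no stands supplied")
    detected = sum(1 for row in rows if any(detection_history(row, species)))
    return detected / len(rows)
```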

  Checksum values are:

    Column 2 (Percent cover of conifer species – CONIFER): SUM = 5862.83
    Column 3 (Percent cover of broadleaf species – BROAD): SUM = 7043.17
    Column 4 (Percent cover of deciduous broadleaf species – DECBROAD): SUM = 5475.17
    Column 5 (Percent cover of hardwood species – HARDWOOD): SUM = 2151.96
    Column 6 (Percent cover of hardwood species in a 2000 m radius circle around each sample stand – HWD2000): SUM = 3486.07
    Column 7 (Stand elevation – ELEVM): SUM = 83240.58
    Column 8 (Stand age – AGE): SUM = 1537; NA indicates a stand was harvested in 2008
    Column 9 (Year of sampling – YEAR): SUM = 425792
    Column 11 (MGWA.1): SUM = 70
    Column 12 (MGWA.2): SUM = 71
    Column 13 (OCWA.1): SUM = 121
    Column 14 (OCWA.2): SUM = 76
    Column 15 (SWTH.1): SUM = 90
    Column 16 (SWTH.2): SUM = 95
    Column 17 (WIFL.1): SUM = 85
    Column 18 (WIFL.2): SUM = 85
    Column 19 (WIWA.1): SUM = 36
    Column 20 (WIWA.2): SUM = 37
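
The checksums above can be verified after download. The Python sketch below sums each listed column, skipping blanks and "NA" (AGE is NA for stands harvested in 2008). It then reports any column whose sum differs from the published value by more than 0.01. It assumes the CSV header uses the abbreviations quoted in the checksum list (CONIFER, BROAD, ..., WIWA.2) and that the file is UTF-8. Both are assumptions, and the function names are hypothetical.

```python
import csv
import math
from typing import Iterable, Mapping

# Column sums quoted in the dataset description (column 10, visit number, has none).
EXPECTED_SUMS: dict[str, float] = {
    "CONIFER": 5862.83,
    "BROAD": 7043.17,
    "DECBROAD": 5475.17,
    "HARDWOOD": 2151.96,
    "HWD2000": 3486.07,
    "ELEVM": 83240.58,
    "AGE": 1537,
    "YEAR": 425792,
    "MGWA.1": 70,
    "MGWA.2": 71,
    "OCWA.1": 121,
    "OCWA.2": 76,
    "SWTH.1": 90,
    "SWTH.2": 95,
    "WIFL.1": 85,
    "WIFL.2": 85,
    "WIWA.1": 36,
    "WIWA.2": 37,
}


def load_rows(path: str) -> list[dict[str, str]]:
    """Read the CSV into dicts keyed by header (encoding and header names are assumed)."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def column_sums(rows: Iterable[Mapping[str, str]], columns: Iterable[str]) -> dict[str, float]:
    """Sum each column, skipping blank and "NA" cells."""
    names = list(columns)
    totals = {name: 0.0 for name in names}
    for row in rows:
        for name in names:
            value = (row[name] or "").strip()
            if value in ("", "NA"):
                continue
            totals[name] += float(value)
    return totals


def mismatched_columns(
    totals: Mapping[str, float], expected: Mapping[str, float], tol: float = 0.01
) -> list[str]:
    """Names of columns whose sum differs from the published checksum by more than tol."""
    return [
        name
        for name, want in expected.items()
        if not math.isclose(totals[name], want, abs_tol=tol)
    ]
```

For example, `mismatched_columns(column_sums(load_rows("Supplement_Avian data.csv"), EXPECTED_SUMS), EXPECTED_SUMS)` should return an empty list if the download is intact.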

  The Supplement_R code.r file is R source code for simulation and empirical analyses conducted in Jones et al.

Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race,...
search.datacite.org
doi.org
Updated 2018
Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016 [Dataset]. http://doi.org/10.3886/e102263v5-10021
Unique identifier
https://doi.org/10.3886/e102263v5-10021
Dataset updated
2018
Dataset provided by
Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
DataCite (https://www.datacite.org/)
Authors
Jacob Kaplan
Description
Version 5 release notes:
Removes support for SPSS and Excel data.
Changes the crimes that are stored in each file. There are more files now, with fewer crimes per file. The files and their included crimes are listed below.
Adds in agencies that report 0 months of the year.
Adds a column that indicates the number of months reported. This is generated by summing the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. Agencies may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Version 3 release notes:
Adds data for 2016.
Orders rows by year (descending) and ORI.
Version 2 release notes:
Fixes a bug where the Philadelphia Police Department had an incorrect FIPS county code.
The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2016 into a single file. These files are quite large and may take some time to load.
All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.

I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. The original data has no value for an agency reporting zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests, which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
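
The two recoding rules can be written as a single cell-level function. The Python sketch below is illustrative only; the author's actual cleaning was done in R (see the crime_data repository). `clean_arrest_value` is a hypothetical name, and None stands in for NA.

```python
from typing import Optional

# Counts the description says some agencies report in error; these become NA (None).
INVALID_ARREST_COUNTS = frozenset(
    {10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998}
)


def clean_arrest_value(raw: str) -> Optional[int]:
    """Recode one raw arrest cell.

    - "None/not reported" becomes 0, which conflates true zeros with missing data
    - a known-invalid count becomes None (NA)
    - anything else is parsed as an integer count
    """
    value = raw.strip()
    if value == "None/not reported":
        return 0
    count = int(value)
    return None if count in INVALID_ARREST_COUNTS else count
```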

To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units, such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.

To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.
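
If leading zeros have already been lost, they can be restored from the standard FIPS code widths: 2 digits for state, 3 for county, and 5 for place. The Python sketch below does this. Reading the identifiers as text avoids the problem entirely, for example with Python's `csv` module, which keeps every field as a string, or with pandas `read_csv(..., dtype=str)`. The function name and the `level` labels are hypothetical and do not match column names in the data.

```python
# Standard FIPS code widths: 2-digit state, 3-digit county, 5-digit place.
FIPS_WIDTHS = {"state": 2, "county": 3, "place": 5}


def restore_fips(code: object, level: str) -> str:
    """Left-pad a FIPS code that lost its leading zeros, e.g. 6 -> "06" for a state."""
    width = FIPS_WIDTHS[level]
    digits = str(code).strip()
    if not digits.isdigit():
        raise ValueError(f"not a numeric FIPS code: {code!r}")
    return digits.zfill(width)
```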

I created 9 arrest categories myself. The categories are:
    Total Male Juvenile
    Total Female Juvenile
    Total Male Adult
    Total Female Adult
    Total Male
    Total Female
    Total Juvenile
    Total Adult
    Total Arrests
All of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than the provided age-race categories (e.g. adult Black, juvenile Asian). Because not all agencies report the race data, my method is more accurate. These categories also make up the data in the "simple" version of the data. The "simple" file includes only the above 9 columns as arrest data (all other columns in the data are agency identifier columns). Because this "simple" data set needs fewer columns, I include all offenses.
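
The nine categories are all sums over the sex-age cells. The Python sketch below shows how they fit together. It takes hypothetical (sex, youngest age in bucket, count) tuples rather than the data's real column names, and it assumes "juvenile" means under 18. Both are assumptions for illustration, not the author's code.

```python
from collections import Counter
from typing import Iterable

JUVENILE_MAX_AGE = 17  # assumption: juveniles are arrestees under 18


def derived_totals(cells: Iterable[tuple[str, int, int]]) -> dict[str, int]:
    """Build the nine summary columns from (sex, age_lower_bound, count) cells.

    sex is "male" or "female"; age_lower_bound is the youngest age in the
    sex-age bucket (0 for "under 10", 25 for "25-29", and so on).
    """
    totals = Counter()
    for sex, age_lower, count in cells:
        if sex not in ("male", "female"):
            raise ValueError(f"unexpected sex label: {sex!r}")
        group = "juvenile" if age_lower <= JUVENILE_MAX_AGE else "adult"
        totals[f"{sex}_{group}"] += count
    mj, fj = totals["male_juvenile"], totals["female_juvenile"]
    ma, fa = totals["male_adult"], totals["female_adult"]
    return {
        "total_male_juvenile": mj,
        "total_female_juvenile": fj,
        "total_male_adult": ma,
        "total_female_adult": fa,
        "total_male": mj + ma,
        "total_female": fj + fa,
        "total_juvenile": mj + fj,
        "total_adult": ma + fa,
        "total_arrests": mj + fj + ma + fa,
    }
```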

As the arrest data is very granular, and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files: eight of which contain different crimes, plus the "simple" file. Each file contains the data for all years. The eight categories each have crimes belonging to a major crime category and do not overlap in crimes other than with the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Because Stata limits column names to a maximum of 32 characters, I have abbreviated the crime names in the data. The files and their included crimes are:

Index Crimes
    Murder, Rape, Robbery, Aggravated Assault, Burglary, Theft, Motor Vehicle Theft, Arson
Alcohol Crimes
    DUI, Drunkenness, Liquor
Drug Crimes
    Total Drug, Total Drug Sales, Total Drug Possession, Cannabis Possession, Cannabis Sales, Heroin or Cocaine Possession, Heroin or Cocaine Sales, Other Drug Possession, Other Drug Sales, Synthetic Narcotic Possession, Synthetic Narcotic Sales
Grey Collar and Property Crimes
    Forgery, Fraud, Stolen Property
Financial Crimes
    Embezzlement, Total Gambling, Other Gambling, Bookmaking, Numbers Lottery
Sex or Family Crimes
    Offenses Against the Family and Children, Other Sex Offenses, Prostitution, Rape
Violent Crimes
    Aggravated Assault, Murder, Negligent Manslaughter, Robbery, Weapon Offenses
Other Crimes
    Curfew, Disorderly Conduct, Other Non-traffic, Suspicion, Vandalism, Vagrancy
Simple
This data set has every crime and only the arrest categories that I created (see above).
R Package History on CRAN
kaggle.com
zip
Updated Jul 18, 2022
Heads or Tails (2022). R Package History on CRAN [Dataset]. https://www.kaggle.com/datasets/headsortails/r-package-history-on-cran/code
Available download formats: zip (5,637,913 bytes)
Dataset updated
Jul 18, 2022
Authors
Heads or Tails
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Comprehensive R Archive Network (CRAN) is the central repository for software packages in the powerful R programming language for statistical computing. It describes itself as "a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R." If you're installing an R package in the standard way then it is provided by one of the CRAN mirrors.

The ecosystem of R packages continues to grow at an accelerated pace, covering a multitude of aspects of statistics, machine learning, data visualisation, and many other areas. This dataset provides monthly updates of all the packages available through CRAN, as well as their release histories. Explore the evolution of the R multiverse and all of its facets through this comprehensive data.

Content

I'm providing 2 csv tables that describe the current set of R packages on CRAN, as well as the version history of these packages. To derive the data, I made use of the fantastic functionality of the tools package, via the CRAN_package_db function, and the equally wonderful packageRank package and its packageHistory function. The results from those functions were slightly adjusted and formatted. I might add further related tables over time.

See the associated blog post for how the data was derived, and for some ideas on how to explore this dataset.

These are the tables contained in this dataset:

cran_package_overview.csv: all R packages currently available through CRAN, with (usually) 1 row per package. (At the time of the creation of this Kaggle dataset there were a few packages with 2 entries and different dependencies. Feel free to contribute some EDA investigating those.) Packages are listed in alphabetical order according to their names.

cran_package_history.csv: version history of virtually all packages in the previous table. This table has one row for each combination of package name and version number, which in most cases leads to multiple rows per package. Packages are listed in alphabetical order according to their names.

I will update this dataset on a roughly monthly cadence by checking which packages have a newer version in the overview table, and then replacing

Column Description

Table cran_package_overview.csv: I decided to simplify the large number of columns provided by CRAN and tools::CRAN_package_db into a smaller set of more focused features. All columns are formatted as strings, except for the boolean feature needs_compilation, and date_published can be parsed as a ymd date:

package: package name following the official spelling and capitalisation. Table is sorted alphabetically according to this column.

version: current version.

depends: package depends on which other packages.

imports: package imports which other packages.

licence: the licence under which the package is distributed (e.g. GPL versions)

needs_compilation: boolean feature describing whether the package needs to be compiled.

author: package author.

bug_reports: where to send bugs.

url: where to read more.

date_published: when the current version of the package was published. Note: this is not the date of the initial package release. See the package history table for that.

description: relatively detailed description of what the package is doing.

title: the title and tagline of the package.
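
The depends and imports columns are likely stored as the raw DESCRIPTION-style strings that `tools::CRAN_package_db` returns, such as "R (>= 3.5.0), dplyr, ggplot2 (>= 3.0.0)". The Python sketch below splits such a string into package names, drops version requirements, and by default drops the entry for R itself. The exact formatting in this Kaggle table, including how missing values are written, is an assumption. The function name is hypothetical.

```python
import re
from typing import Optional

# A dependency entry starts with a package name; any "(>= x.y)" suffix is ignored.
_PACKAGE_NAME = re.compile(r"\s*([A-Za-z][A-Za-z0-9.]*)")


def parse_dependencies(field: Optional[str], drop_r: bool = True) -> list[str]:
    """Split a field such as "R (>= 3.5.0), dplyr, ggplot2 (>= 3.0.0)" into names."""
    if field is None or field.strip() in ("", "NA"):
        return []
    names = []
    for entry in field.split(","):
        match = _PACKAGE_NAME.match(entry)
        if match is None:
            continue  # empty entry, e.g. from a trailing comma
        name = match.group(1)
        if drop_r and name == "R":
            continue  # "R (>= x)" constrains R itself, not a package
        names.append(name)
    return names
```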

Table cran_package_history.csv: The output of packageRank::packageHistory for each package from the overview table. Almost all of them have a match in this table, and can be matched by package and version. All columns are strings, and the date can again be parsed as a ymd date:

package: package name. Joins to the feature of the same name in the overview table. Table is sorted alphabetically according to this column.

version: historical or current package version. Also joins. Secondary sorting column within each package name.

date: when this version was published. Should sort in the same way as the version does.

repository: on CRAN or in the Archive.
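
Because the two tables join on package, the first release date of each package can be recovered from the history table and compared with date_published in the overview table. The Python sketch below works on rows as returned by `csv.DictReader`. It assumes the ymd dates are ISO-style "YYYY-MM-DD" strings, and its function names are hypothetical. Packages without a history match are skipped. For packages that appear twice in the overview table, the later row wins.

```python
from datetime import date
from typing import Iterable, Mapping


def _parse_ymd(value: str) -> date:
    # Assumes ISO-style "YYYY-MM-DD" strings, optionally followed by a time.
    return date.fromisoformat(value.strip()[:10])


def first_release_dates(history_rows: Iterable[Mapping[str, str]]) -> dict[str, date]:
    """Earliest `date` per `package` in cran_package_history.csv."""
    first: dict[str, date] = {}
    for row in history_rows:
        published = _parse_ymd(row["date"])
        name = row["package"]
        if name not in first or published < first[name]:
            first[name] = published
    return first


def days_since_first_release(
    overview_rows: Iterable[Mapping[str, str]],
    history_rows: Iterable[Mapping[str, str]],
) -> dict[str, int]:
    """Days from first release to the current version's date_published."""
    first = first_release_dates(history_rows)
    spans: dict[str, int] = {}
    for row in overview_rows:
        name = row["package"]
        if name not in first:
            continue  # a few overview packages have no history match
        spans[name] = (_parse_ymd(row["date_published"]) - first[name]).days
    return spans
```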

Acknowledgements

All data is being made publicly available by the Comprehensive R Archive Network (CRAN). I'm grateful to the authors and maintainers of the packages tools and packageRank for providing the functionality to query CRAN packages smoothly and easily.

The vignette photo is the official logo for the R language © 2016 The R Foundation. You can distribute the logo under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license...
The Pizza Problem
kaggle.com
zip
Updated Feb 8, 2019
Jeremy Jeanne (2019). The Pizza Problem [Dataset]. https://www.kaggle.com/jeremyjeanne/google-hashcode-pizza-training-2019
Available download formats: zip (178,852 bytes)
Dataset updated
Feb 8, 2019
Authors
Jeremy Jeanne
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Problem description

Pizza

The pizza is represented as a rectangular, 2-dimensional grid of R rows and C columns. The cells within the grid are referenced using a pair of 0-based coordinates [r, c] , denoting respectively the row and the column of the cell.

Each cell of the pizza contains either:

    mushroom, represented in the input file as M
    tomato, represented in the input file as T

Slice

A slice of pizza is a rectangular section of the pizza delimited by two rows and two columns, without holes. The slices we want to cut out must contain at least L cells of each ingredient (that is, at least L cells of mushroom and at least L cells of tomato) and at most H cells of any kind in total - surprising as it is, there is such a thing as too much pizza in one slice. The slices being cut out cannot overlap. The slices being cut do not need to cover the entire pizza.

Goal

The goal is to cut correct slices out of the pizza, maximizing the total number of cells in all slices.

Input data set

The input data is provided as a data set file - a plain text file containing exclusively ASCII characters, with each line terminated by a single ‘\n’ character (UNIX-style line endings).

File format

The file consists of:

one line containing the following natural numbers separated by single spaces:
    R (1 ≤ R ≤ 1000) is the number of rows
    C (1 ≤ C ≤ 1000) is the number of columns
    L (1 ≤ L ≤ 1000) is the minimum number of each ingredient cells in a slice
    H (1 ≤ H ≤ 1000) is the maximum total number of cells of a slice

Google 2017, All rights reserved.

R lines describing the rows of the pizza (one after another). Each of these lines contains C characters describing the ingredients in the cells of the row (one cell after another). Each character is either ‘M’ (for mushroom) or ‘T’ (for tomato).

Example

    3 5 1 6
    TTTTT
    TMMMT
    TTTTT

3 rows, 5 columns, min 1 of each ingredient per slice, max 6 cells per slice

Example input file.
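
A parser for this format only needs the header line and the next R lines. The Python sketch below checks that every row has C characters drawn from {M, T}. The `Pizza` class and `parse_pizza` are hypothetical names, not part of the problem statement.

```python
from dataclasses import dataclass


@dataclass
class Pizza:
    rows: int        # R
    cols: int        # C
    min_each: int    # L: minimum cells of each ingredient per slice
    max_cells: int   # H: maximum total cells per slice
    grid: list[str]  # R strings of length C, each character "M" or "T"


def parse_pizza(text: str) -> Pizza:
    """Parse an input file: a header line "R C L H" followed by R grid rows."""
    lines = text.splitlines()
    rows, cols, min_each, max_cells = (int(tok) for tok in lines[0].split())
    grid = lines[1 : 1 + rows]
    if len(grid) != rows:
        raise ValueError(f"expected {rows} grid rows, got {len(grid)}")
    for row in grid:
        if len(row) != cols or set(row) - {"M", "T"}:
            raise ValueError(f"bad grid row: {row!r}")
    return Pizza(rows, cols, min_each, max_cells, grid)
```

For instance, `parse_pizza("3 5 1 6\nTTTTT\nTMMMT\nTTTTT\n")` yields a 3×5 pizza with L = 1 and H = 6.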

Submissions

File format

The file must consist of:

one line containing a single natural number S (0 ≤ S ≤ R × C), representing the total number of slices to be cut,
S lines describing the slices. Each of these lines must contain the following natural numbers separated by single spaces: r1, c1, r2, c2 describe a slice of pizza delimited by the rows r1 and r2 and the columns c1 and c2 (0 ≤ r1, r2 < R; 0 ≤ c1, c2 < C), including the cells of the delimiting rows and columns. The rows (r1 and r2) can be given in any order. The columns (c1 and c2) can be given in any order too.

Example

    3
    0 0 2 1
    0 2 2 2
    0 3 2 4

3 slices.

First slice between rows (0,2) and columns (0,1). Second slice between rows (0,2) and columns (2,2). Third slice between rows (0,2) and columns (3,4).

Example submission file.


Slices described in the example submission file are marked in green, orange and purple.

Validation

For the solution to be accepted:

    the format of the file must match the description above,
    each cell of the pizza must be included in at most one slice,
    each slice must contain at least L cells of mushroom,
    each slice must contain at least L cells of tomato,
    the total area of each slice must be at most H.

Scoring

The submission gets a score equal to the total number of cells in all slices. Note that there are multiple data sets representing separate instances of the problem. The final score for your team is the sum of your best scores on the individual data sets.

Scoring example

The example submission file given above cuts the slices of 6, 3 and 6 cells, earning 6 + 3 + 6 = 15 points.
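
The validation rules and the scoring rule can be combined into one function. The Python sketch below accepts slice endpoints in either order and raises `ValueError` at the first violated rule. Otherwise it returns the total number of covered cells. It does not check the file-level "S" count line. `score_submission` is a hypothetical name.

```python
from typing import Sequence

Slice = tuple[int, int, int, int]  # (r1, c1, r2, c2); endpoints may come in any order


def score_submission(
    grid: Sequence[str], min_each: int, max_cells: int, slices: Sequence[Slice]
) -> int:
    """Return the total cells covered, or raise ValueError if a rule is broken."""
    rows, cols = len(grid), len(grid[0])
    used = [[False] * cols for _ in range(rows)]
    total = 0
    for r1, c1, r2, c2 in slices:
        top, bottom = sorted((r1, r2))
        left, right = sorted((c1, c2))
        if top < 0 or left < 0 or bottom >= rows or right >= cols:
            raise ValueError(f"slice {(r1, c1, r2, c2)} is outside the pizza")
        cells = [(r, c) for r in range(top, bottom + 1) for c in range(left, right + 1)]
        if len(cells) > max_cells:
            raise ValueError(f"slice {(r1, c1, r2, c2)} has more than {max_cells} cells")
        mushrooms = sum(grid[r][c] == "M" for r, c in cells)
        tomatoes = len(cells) - mushrooms
        if mushrooms < min_each or tomatoes < min_each:
            raise ValueError(f"slice {(r1, c1, r2, c2)} lacks {min_each} of each ingredient")
        for r, c in cells:
            if used[r][c]:
                raise ValueError(f"cell {(r, c)} is in more than one slice")
            used[r][c] = True
        total += len(cells)
    return total
```

Applied to the example pizza and submission, `score_submission(["TTTTT", "TMMMT", "TTTTT"], 1, 6, [(0, 0, 2, 1), (0, 2, 2, 2), (0, 3, 2, 4)])` returns 15.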
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Law...
openicpsr.org
search.gesis.org
Updated Mar 25, 2018
Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Law Enforcement Officers Killed and Assaulted (LEOKA) 1960-2018 [Dataset]. http://doi.org/10.3886/E102180V7
Unique identifier
https://doi.org/10.3886/E102180V7
Dataset updated
Mar 25, 2018
Dataset provided by
University of Pennsylvania
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1960 - 2018
Area covered
United States
Description
For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.
Version 7 release notes: Adds data from 2018.
Version 6 release notes: Adds data in the following formats: SPSS and Excel. Changes the project name to avoid confusing this data with the versions done by NACJD.
Version 5 release notes:
Adds data for 1960-1974 and 2017. Note: many columns (including number of female officers) will always have a value of 0 for years prior to 1971.
Removes support for .csv and .sav files.
Adds a number_of_months_reported variable for each agency-year. A month is considered reported if the month_indicator column for that month has a value of "normal update" or "reported, not data."
The formatting of the monthly data has changed from wide to long. This means that each agency-month has a single row. The old data had each agency as a single row, with each month-category (e.g. jan_officers_killed_by_felony) as a column. Now there is a single column for each category (e.g. officers_killed_by_felony), and the month can be identified in the month column. This also results in most column names changing. As such, be careful when aggregating the monthly data: some variables are the same every month (e.g. number of officers employed is measured annually), so summing them across months gives values 12 times as high as the real values.
Adds a date column. This date column is always set to the first of the month. It is NOT the date that a crime occurred or was reported. It is only there to make it easier to create time-series graphs that require a date input.
All the data in this version was acquired from the FBI as text/DAT files and read into R using the package asciiSetupReader. The FBI also provided a PDF file explaining how to create the setup file to read the data. Both the FBI's PDF and the setup file I made are included in the zip files.
Data is the same as from NACJD, but using all FBI files makes cleaning easier as all column names are already identical.
Version 4 release notes: Adds data for 2016. Orders rows by year (descending) and ORI.
Version 3 release notes: Fixes a bug where the Philadelphia Police Department had an incorrect FIPS county code.

The LEOKA data sets contain highly detailed data about the number of officers/civilians employed by an agency and how many officers were killed or assaulted. All the data was acquired from the FBI as text/DAT files and read into R using the package asciiSetupReader. The FBI also provided a PDF file explaining how to create the setup file to read the data. Both the FBI's PDF and the setup file I made are included in the zip files. About 7% of all agencies in the data report more officers or civilians than population. As such, I removed the officers/civilians per 1,000 population variables. You should exercise caution if deciding to generate and use these variables yourself. Several agencies had impossibly large (>15) officer deaths in a single month. For those months I changed the value to NA. See the R code for a complete list. For the R code used to clean this data, see https://github.com/jacobkap/crime_data.

The UCR Handbook (https://ucr.fbi.gov/additional-ucr-publications/ucr_handbook.pdf/view) describes the LEOKA data as follows: "The UCR Program collects data from all contributing agencies ... on officer line-of-duty deaths and assaults. Reporting agencies must submit data on ... their own duly sworn officers feloniously or accidentally killed or assaulted in the line of duty. The purpose of this data collection is to identify situations in which officers are killed or assaulted, describe the incidents statistically, and publish the data to aid agencies in developing policies to improve officer safety." "... agencies must record assaults on sworn officers.
Reporting agencies must count all assaults that resulted in serious injury or assaults in which a weapon was used that could have caused serious injury or death. They must include other assaults not causing injury if the assault involved more than mere verbal abuse or minor resistance to an arrest. In other words, agencies must include in this section all assaults on officers, whether or not the officers sustained injuries."
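
The monthly data is in agency-month (long) format. When collapsing it to agency-years, count columns should be summed, but columns measured annually should be taken once rather than summed over twelve months. The Python sketch below handles both cases. The column names "ori" and "year" and the split into `sum_columns` and `annual_columns` are hypothetical and must be matched to the real data. The sketch also assumes numeric, non-missing values.

```python
from typing import Iterable, Mapping, Sequence

AgencyYear = tuple[str, int]


def monthly_to_annual(
    rows: Iterable[Mapping[str, object]],
    sum_columns: Sequence[str],
    annual_columns: Sequence[str],
) -> dict[AgencyYear, dict[str, float]]:
    """Collapse agency-month rows to agency-year rows.

    sum_columns hold monthly counts (e.g. assaults) and are added across months.
    annual_columns repeat the same yearly value every month (e.g. officers
    employed), so the first value seen is kept instead of summing twelve copies.
    """
    annual: dict[AgencyYear, dict[str, float]] = {}
    for row in rows:
        key = (str(row["ori"]), int(row["year"]))
        totals = annual.setdefault(key, {name: 0.0 for name in sum_columns})
        for name in sum_columns:
            totals[name] += float(row[name])
        for name in annual_columns:
            totals.setdefault(name, float(row[name]))
    return annual
```

With two monthly rows that each report 100 officers, this yields 100 officers for the year, not 200.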
Data and Scripts Associated with the Manuscript “Water Column Respiration in...
osti.gov
search.dataone.org
Updated Jan 16, 2024
River Corridor Hydro-biogeochemistry from Molecular to Multi-Basin Scales SFA (2024). Data and Scripts Associated with the Manuscript “Water Column Respiration in the Yakima River Basin is Explained by Temperature, Nutrients and Suspended Solids” [Dataset]. http://doi.org/10.15485/2283171
Unique identifier
https://doi.org/10.15485/2283171
Dataset updated
Jan 16, 2024
Dataset provided by
Office of Science (http://www.er.doe.gov/)
River Corridor Hydro-biogeochemistry from Molecular to Multi-Basin Scales SFA
Area covered
Yakima River
Description
This data package is associated with the publication “Water Column Respiration in the Yakima River Basin is Explained by Temperature, Nutrients and Suspended Solids” submitted to EGU Biogeochemistry (Laan et al. 2025). In this research, water column respiration (ERwc) data, surface water chemistry data, organic matter (OM) chemistry data, and publicly available geospatial data were used to evaluate the variability in ERwc at 47 sites across the Yakima River basin in Washington, USA. In addition to this readme, this data package includes a file-level metadata (FLMD) file that describes each file and a data dictionary (DD) that describes all column/row headers and variable definitions. The data package includes the data inputs and outputs, as well as the R scripts needed to reproduce all the analyses performed in the manuscript and create the manuscript figures. The data package consists of three main folders (Code, Data, and Figures). The Code folder contains four scripts and three analysis-specific subfolders with the R scripts that perform the analyses described in the publication and create the publication figures. The Data folder contains two “.csv” files and four subfolders with data input and output files. The Published_Data folder contains a readme that directs the user to download the appropriate files and add them to this folder when using the scripts. The Figures folder includes figures from the manuscript in “.pdf” and “.png” formats and a folder with intermediate figure files. This data package is associated with a GitHub repository, which can be found at https://github.com/river-corridors-sfa/rcsfa-RC2-SPS-ERwc.
Huge US 514 Stocks + 1298 columns Market Data 25Gb
kaggle.com
zip
Updated Jan 2, 2024
Oleg Shpagin (2024). Huge US 514 Stocks + 1298 columns Market Data 25Gb [Dataset]. https://www.kaggle.com/datasets/olegshpagin/extra-us-stocks-market-data
Available download formats: zip (8,646,680,017 bytes)
Dataset updated
Jan 2, 2024
Authors
Oleg Shpagin
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
United States
Description
Huge US stock prices plus 1292 extra columns of indicator data. This dataset provides historical Open, High, Low, Close, and Volume (OHLCV) prices of stocks traded in the United States financial markets, together with 1292 calculated indicator columns. You can use all of this data for stock price prediction.

Columns with Momentum Indicator values:
    ADX - Average Directional Movement Index
    ADXR - Average Directional Movement Index Rating
    APO - Absolute Price Oscillator
    AROON - Aroon
    AROONOSC - Aroon Oscillator
    BOP - Balance Of Power
    CCI - Commodity Channel Index
    CMO - Chande Momentum Oscillator
    DX - Directional Movement Index
    MACD - Moving Average Convergence/Divergence
    MACDEXT - MACD with controllable MA type
    MACDFIX - Moving Average Convergence/Divergence Fix 12/26
    MFI - Money Flow Index
    MINUS_DI - Minus Directional Indicator
    MINUS_DM - Minus Directional Movement
    MOM - Momentum
    PLUS_DI - Plus Directional Indicator
    PLUS_DM - Plus Directional Movement
    PPO - Percentage Price Oscillator
    ROC - Rate of change: ((price/prevPrice)-1)*100
    ROCP - Rate of change Percentage: (price-prevPrice)/prevPrice
    ROCR - Rate of change ratio: (price/prevPrice)
    ROCR100 - Rate of change ratio 100 scale: (price/prevPrice)*100
    RSI - Relative Strength Index
    STOCH - Stochastic
    STOCHF - Stochastic Fast
    STOCHRSI - Stochastic Relative Strength Index
    TRIX - 1-day Rate-Of-Change (ROC) of a Triple Smooth EMA
    ULTOSC - Ultimate Oscillator
    WILLR - Williams' %R
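
The four rate-of-change columns are simple transforms of one price ratio. The Python sketch below applies the formulas listed above. It assumes prevPrice is the price `period` bars earlier, with a default of 10 bars. That default is an assumption; the dataset does not state the period it used. Bars without a prevPrice are None.

```python
from typing import Optional, Sequence


def rate_of_change(
    prices: Sequence[float], period: int = 10
) -> dict[str, list[Optional[float]]]:
    """ROC, ROCP, ROCR and ROCR100 with prevPrice = the price `period` bars earlier."""
    if period < 1:
        raise ValueError("period must be >= 1")
    out: dict[str, list[Optional[float]]] = {
        key: [None] * len(prices) for key in ("ROC", "ROCP", "ROCR", "ROCR100")
    }
    for t in range(period, len(prices)):
        price, prev_price = prices[t], prices[t - period]
        ratio = price / prev_price  # a zero prevPrice would raise ZeroDivisionError
        out["ROC"][t] = (ratio - 1) * 100
        out["ROCP"][t] = (price - prev_price) / prev_price
        out["ROCR"][t] = ratio
        out["ROCR100"][t] = ratio * 100
    return out
```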

Columns with Volatility Indicator values:
    ATR - Average True Range
    NATR - Normalized Average True Range
    TRANGE - True Range
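
True range measures a bar's full price span, including any gap from the previous close. ATR smooths it over time. The Python sketch below computes TR and an ATR using Wilder's smoothing: the first ATR is the mean of the first `period` true ranges, and each later value is (previous ATR × (period − 1) + TR) / period. This is one common convention. The dataset's exact values may differ, especially near the start of each series. NATR is commonly defined as 100 × ATR / close.

```python
from typing import Optional, Sequence


def true_range(
    high: Sequence[float], low: Sequence[float], close: Sequence[float]
) -> list[Optional[float]]:
    """TR_t = max(H_t - L_t, |H_t - C_(t-1)|, |L_t - C_(t-1)|); None for the first bar."""
    if not close:
        return []
    tr: list[Optional[float]] = [None]
    for t in range(1, len(close)):
        prev_close = close[t - 1]
        tr.append(max(high[t] - low[t], abs(high[t] - prev_close), abs(low[t] - prev_close)))
    return tr


def average_true_range(
    high: Sequence[float], low: Sequence[float], close: Sequence[float], period: int = 14
) -> list[Optional[float]]:
    """ATR with Wilder smoothing, seeded by the mean of the first `period` true ranges."""
    tr = true_range(high, low, close)
    atr: list[Optional[float]] = [None] * len(tr)
    if period < 1 or len(tr) <= period:
        return atr
    atr[period] = sum(tr[1 : period + 1]) / period
    for t in range(period + 1, len(tr)):
        atr[t] = (atr[t - 1] * (period - 1) + tr[t]) / period
    return atr
```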

Columns with Volume Indicator values:
    AD - Chaikin A/D Line
    ADOSC - Chaikin A/D Oscillator
    OBV - On Balance Volume
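
On Balance Volume is a running total that adds a bar's volume when the close rises and subtracts it when the close falls. The Python sketch below starts the total at the first bar's volume. This is an assumed convention, since some definitions start at 0. Either way, only the changes between values carry meaning.

```python
from typing import Sequence


def on_balance_volume(close: Sequence[float], volume: Sequence[float]) -> list[float]:
    """Cumulative volume signed by the direction of each close-to-close move."""
    if len(close) != len(volume):
        raise ValueError("close and volume must have the same length")
    if not close:
        return []
    obv = [float(volume[0])]  # starting level is a convention
    for t in range(1, len(close)):
        if close[t] > close[t - 1]:
            obv.append(obv[-1] + volume[t])
        elif close[t] < close[t - 1]:
            obv.append(obv[-1] - volume[t])
        else:
            obv.append(obv[-1])  # unchanged close: OBV unchanged
    return obv
```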

Columns with Overlap Studies values:
    BBANDS - Bollinger Bands
    DEMA - Double Exponential Moving Average
    EMA - Exponential Moving Average
    HT_TRENDLINE - Hilbert Transform - Instantaneous Trendline
    KAMA - Kaufman Adaptive Moving Average
    MA - Moving average
    MAMA - MESA Adaptive Moving Average
    MAVP - Moving average with variable period
    MIDPOINT - MidPoint over period
    MIDPRICE - Midpoint Price over period
    SAR - Parabolic SAR
    SAREXT - Parabolic SAR - Extended
    SMA - Simple Moving Average
    T3 - Triple Exponential Moving Average (T3)
    TEMA - Triple Exponential Moving Average
    TRIMA - Triangular Moving Average
    WMA - Weighted Moving Average
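
SMA and EMA underlie several of the other overlap studies, such as DEMA, TEMA and the Bollinger Bands. The Python sketch below computes both. The SMA uses a running window sum. The EMA uses α = 2 / (period + 1) and is seeded with the SMA of the first `period` values. That seeding is one common convention, not necessarily the one used to build this dataset. Bars before the first full window are None.

```python
from typing import Optional, Sequence


def simple_moving_average(values: Sequence[float], period: int) -> list[Optional[float]]:
    """Mean of the last `period` values, maintained with a running window sum."""
    if period < 1:
        raise ValueError("period must be >= 1")
    out: list[Optional[float]] = [None] * len(values)
    window_sum = 0.0
    for t, value in enumerate(values):
        window_sum += value
        if t >= period:
            window_sum -= values[t - period]  # drop the value leaving the window
        if t >= period - 1:
            out[t] = window_sum / period
    return out


def exponential_moving_average(values: Sequence[float], period: int) -> list[Optional[float]]:
    """EMA with alpha = 2 / (period + 1), seeded by the SMA of the first `period` values."""
    if period < 1:
        raise ValueError("period must be >= 1")
    out: list[Optional[float]] = [None] * len(values)
    if len(values) < period:
        return out
    alpha = 2.0 / (period + 1)
    out[period - 1] = sum(values[:period]) / period
    for t in range(period, len(values)):
        out[t] = alpha * values[t] + (1 - alpha) * out[t - 1]
    return out
```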

Columns with Cycle Indicator values:
    HT_DCPERIOD - Hilbert Transform - Dominant Cycle Period
    HT_DCPHASE - Hilbert Transform - Dominant Cycle Phase
    HT_PHASOR - Hilbert Transform - Phasor Components
    HT_SINE - Hilbert Transform - SineWave
    HT_TRENDMODE - Hilbert Transform - Trend vs Cycle Mode

If you want to download actual data - on today for example, then you can use python code from my github. tickers = ['CE.US', 'WELL.US', 'GRMN.US', 'IEX.US', 'CAG.US', 'BEN.US', 'ATO.US', 'WY.US', 'TSCO.US', 'COR.US', 'MOS.US', 'SWKS.US', 'ORCL.US', 'URI.US', 'INCY.US', 'MPC.US', 'HD.US', 'PPG.US', 'NUE.US', 'DDOG.US', 'HSIC.US', 'CAT.US', 'HSY.US', 'MKTX.US', 'CCEP.US', 'GWW.US', 'LEN.US', 'IFF.US', 'GL.US', 'MDB.US', 'SNPS.US', 'KR.US', 'DVN.US', 'SYY.US', 'USB.US', 'DRI.US', 'PARA.US', 'FMC.US', 'UBER.US', 'WRK.US', 'DLR.US', 'SO.US', 'AMGN.US', 'MA.US', 'STT.US', 'BWA.US', 'KVUE.US', 'GFS.US', 'BBY.US', 'BK.US', 'MRVL.US', 'VFC.US', 'EIX.US', 'ADSK.US', 'ZBH.US', 'MU.US', 'HUBB.US', 'PEAK.US', 'CVX.US', 'CPB.US', 'GILD.US', 'BXP.US', 'DD.US', 'MCD.US', 'KDP.US', 'GE.US', 'PKG.US', 'HST.US', 'WTW.US', 'XOM.US', 'ED.US', 'SPG.US', 'PFG.US', 'LVS.US', 'FAST.US', 'ROST.US', 'TTD.US', 'CNC.US', 'PGR.US', 'CMI.US', 'TEAM.US', 'MELI.US', 'BKR.US', 'EBAY.US', 'CPRT.US', 'MSFT.US', 'HOLX.US', 'ABBV.US', 'AMZN.US', 'FE.US', 'WYNN.US', 'KMI.US', 'APA.US', 'CRWD.US', 'DPZ.US', 'EQT.US', 'NOC.US', 'TAP.US', 'ETR.US', 'T.US', 'OMC.US', 'MTCH.US', 'TRMB.US', 'EXPE.US', 'DTE.US', 'PNR.US', 'LH.US', 'ALL.US', 'CTRA.US', 'VMC.US', 'XRAY.US', 'NWS.US', 'GOOGL.US', 'WEC.US', 'BIIB.US', 'LLY.US', 'BMY.US', 'STE.US', 'NI.US', 'MKC.US', 'AMT.US', 'CFG.US', 'LW.US', 'HIG.US', 'ETSY.US', 'AON.US', 'ULTA.US', 'DVA.US', 'LKQ.US', 'MPWR.US', 'TEL.US', 'FICO.US', 'CVS.US', 'CMA.US', 'NVDA.US', 'TDG.US', 'AWK.US', 'PSA.US', 'FOXA.US', 'ON.US', 'ODFL.US', 'NVR.US', 'ROP.US', 'TFX.US', 'HLT.US', 'EXPD.US', 'FOX.US', 'D.US', 'AMAT.US', 'AZO.US', 'DLTR.US', 'TT.US', 'SBUX.US', 'JNJ.US', 'HAS.US', 'DASH.US', 'NRG.US', 'JNPR.US', 'BIO.US', 'AMD.US', 'NFLX.US', 'VLTO.US', 'BRO.US', 'REGN.US', 'WRB.US', 'LRCX.US', 'SYK.US', 'MCO.US', 'CSGP.US', 'TROW.US', 'ETN.US', 'RTX.US', 'CRM.US', 'SIRI.US', 'UPS.US', 'HES.US', 'RSG.US', 'PEP.US', 'MET.US', 'HON.US', 'IQV.US', 'JPM.US', 'DG.US', 'CBRE.US', 
'NDSN.US', 'DOW.US', 'SBAC.US', 'TSN.US', 'IT.US', 'WM.US', 'TPR.US', 'IBM.US', 'CHTR.US', 'HAL.US', 'ROL.US', 'FDS.US', 'SHW.US', 'EW.US', 'RJF.US', 'APH.US', 'AIZ.US', 'ZBRA.US', 'SRE.US', 'CTAS.US', 'PXD.US', 'MTD.US', 'NOW.US', 'MAS.US', 'FFIV.US', 'ELV.US', 'SYF.US', 'CSCO.US', 'APTV...
Inequality measures based on election data 1871 and 1892 for Swedish...
researchdata.se
demo.researchdata.se
Updated Apr 30, 2019
Sara Moricz (2019). Inequality measures based on election data 1871 and 1892 for Swedish municipalities [Dataset]. http://doi.org/10.5878/cw7b-g897
Unique identifier
https://doi.org/10.5878/cw7b-g897
Dataset updated
Apr 30, 2019
Dataset provided by
Lund University
Authors
Sara Moricz
Time period covered
1871
Area covered
Sweden
Description
The data contain inequality measures at the municipality level for 1871 and 1892, as estimated in the PhD thesis "Institutions, Inequality and Societal Transformations" by Sara Moricz. The data also contain the source publications:
1) table 1 from “Bidrag till Sverige official statistik R) Valstatistik. XI. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1892” (biSOS R 1892)
2) table 1 from “Bidrag till Sverige official statistik R) Valstatistik. II. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1871” (biSOS R 1871)

moricz_inequality_agriculture.csv

A UTF-8 encoded .csv-file. Each row is a municipality of the agricultural sample (2222 in total). Each column is a variable.

R71muncipality_id: a unique identifier for the municipalities in the R1871 publication (the municipality name can be obtained from the source data)
R92muncipality_id: a unique identifier for the municipalities in the R1892 publication (the municipality name can be obtained from the source data)
agriTop1_1871: an ordinal measure (ranking) of the top 1 income share in the agricultural sector for 1871
agriTop1_1892: an ordinal measure (ranking) of the top 1 income share in the agricultural sector for 1892
highestFarm_1871: a cardinal measure of the top 1 person share in the agricultural sector for 1871
highestFarm_1892: a cardinal measure of the top 1 person share in the agricultural sector for 1892

moricz_inequality_industry.csv

A UTF-8 encoded .csv-file. Each row is a municipality of the industrial sample (1328 in total). Each column is a variable.

R71muncipality_id: see above description
R92muncipality_id: see above description
indTop1_1871: an ordinal measure (ranking) of the top 1 income share in the industrial sector for 1871
indTop1_1892: an ordinal measure (ranking) of the top 1 income share in the industrial sector for 1892

moricz_R1892_source_data.csv

A UTF-8 encoded .csv-file with the source data. The variables are described in the accompanying codebook moricz_R1892_source_data_codebook.csv.

Contains table 1 from “Bidrag till Sverige official statistik R) Valstatistik. XI. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1892” (biSOS R 1892). SCB provides the scanned publication on its website. Dollar Typing Service typed and delivered the data in 2015. All numerical variables except two have been checked. This is straightforward, since nearly all columns should sum to another column. For “Folkmangd” (population), the numbers have been corrected against U1892. The highest estimated error rate in the variables is 0.005 percent (0.5 promille), calculated at the cell level. The two numerical variables that have not been checked are “hogsta_fyrk_jo” and “hogsta_fyrk_ov”, because they cannot readily be cross-checked against other columns in the data. According to my worst-case calculations, the measurement errors in those variables are 0.0043 percent (0.43 promille).
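The internal check described above can be sketched in Python. The check verifies that component columns add up to a total column. This is a minimal illustration, not the author's procedure. The column names passed to `check_column_sums` are hypothetical placeholders, and the real names are listed in the codebook. Blank cells would raise a `ValueError` when converted with `float`.

```python
import csv


def load_rows(path):
    """Read a UTF-8 encoded CSV file into a list of dicts, one per municipality."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def check_column_sums(rows, parts, total, tolerance=1e-9):
    """Return the indices of rows whose component columns do not add up to the total column."""
    mismatches = []
    for index, row in enumerate(rows):
        component_sum = sum(float(row[name]) for name in parts)
        if abs(component_sum - float(row[total])) > tolerance:
            mismatches.append(index)
    return mismatches


def cell_error_rate(n_bad_cells, n_rows, n_columns):
    """Erroneous cells as a percentage of all checked cells (a cell-level error rate)."""
    return 100.0 * n_bad_cells / (n_rows * n_columns)


# Hypothetical usage: "part_a", "part_b" and "total" are placeholders, not codebook names
# rows = load_rows("moricz_R1892_source_data.csv")
# bad_rows = check_column_sums(rows, ["part_a", "part_b"], "total")
```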

moricz_R1871_source_data.csv

A UTF-8 encoded .csv-file with the source data. The variables are described in the accompanying codebook moricz_R1871_source_data_codebook.csv.

Contains table 1 from “Bidrag till Sverige official statistik R) Valstatistik. II. Statistiska Centralbyråns underdåniga berättelse rörande kommunala rösträtten år 1871” (biSOS R 1871). SCB provides the scanned publication on its website. Dollar Typing Service typed and delivered the data in 2015. The variables have been checked for accuracy, which is feasible because the columns and rows should add up. The variables most likely to contain mistakes are “hogsta_fyrk_al” and “hogsta_fyrk_jo”.
LScDC Word-Category RIG Matrix
figshare.le.ac.uk
pdf
Updated Apr 28, 2020
Neslihan Suzen (2020). LScDC Word-Category RIG Matrix [Dataset]. http://doi.org/10.25392/leicester.data.12133431.v2
Available download formats: pdf
Unique identifier
https://doi.org/10.25392/leicester.data.12133431.v2
Dataset updated
Apr 28, 2020
Dataset provided by
University of Leicester
Authors
Neslihan Suzen
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LScDC Word-Category RIG Matrix
April 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com)
Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes

Getting Started

This file describes three things:
- the Word-Category RIG Matrix for the Leicester Scientific Corpus (LSC) [1];
- the procedure used to build the matrix;
- the Leicester Scientific Thesaurus (LScT) and its construction process.

The Word-Category RIG Matrix is a 103,998 by 252 matrix. Its rows correspond to words of the Leicester Scientific Dictionary-Core (LScDC) [2], and its columns correspond to 252 Web of Science (WoS) categories [3, 4, 5]. Each entry in the matrix corresponds to a pair (category, word). Its value is the Relative Information Gain (RIG) about whether a text from the LSC belongs to the category, obtained by observing the word in that text.

The CSV file of the Word-Category RIG Matrix in the published archive has two additional columns: the sum of RIGs over categories and the maximum of RIGs over categories (the last two columns of the matrix). The file ‘Word-Category RIG Matrix.csv’ therefore contains 254 columns in total.

This matrix was created for future research on quantifying meaning in scientific texts. It rests on the assumption that words have scientifically specific meanings in subject categories, and that this meaning can be estimated by the information gains from words to categories.

LScT (Leicester Scientific Thesaurus) is a scientific thesaurus of English containing 5,000 words from the LScDC. We order the words of the LScDC by the sum of their RIGs over categories, so that words are arranged by their informativeness in the scientific corpus LSC. The meaningfulness of a word is therefore evaluated by its average informativeness across categories. The 5,000 most informative words are included in the thesaurus.
Words as a Vector of Frequencies in WoS Categories

Each word of the LScDC is represented as a vector of frequencies in WoS categories. Given the collection of LSC texts, each entry of the vector is the number of texts in the corresponding category that contain the word.

Texts in a corpus do not necessarily belong to a single category, since they often correspond to multidisciplinary studies, particularly in a corpus of scientific texts. In other words, categories are not mutually exclusive. There are 252 WoS categories, and each text in the LSC is assigned to at least 1 and at most 6 categories. Frequencies are calculated in binary form: a text counts once for a word if the word is present in it, regardless of how often the word occurs. Each word thus has a vector of frequencies whose dimensions are the categories in the corpus.

The collection of vectors for all words and categories in the entire corpus can be shown as a table, where each entry corresponds to a pair (word, category). This table is built for the LScDC with 252 WoS categories and is included in the published archive. Each entry shows how many times a word of the LScDC appears in a WoS category. This is determined by counting the LSC texts in that category that contain the word.

Words as a Vector of Relative Information Gains Extracted for Categories

In this section, we introduce our approach to representing a word as a vector of relative information gains for categories. The approach assumes that the meaning of a word can be quantified by the information it gives about categories. Two indicator functions are defined on texts:
- for each category, a function that takes the value 1 if the text belongs to the category, and 0 otherwise;
- for each word, a function that takes the value 1 if the word belongs to the text, and 0 otherwise.

The LSC is treated as a probabilistic sample space (the space of equally probable elementary outcomes).
For these Boolean random variables, the joint probability distribution, the entropy and the information gains are defined. The information gain about the category from the word is the amount of information, obtained by observing the word in a text from the LSC, about whether that text belongs to the category [6].

We use the Relative Information Gain (RIG), a normalised measure of the Information Gain that makes information gains comparable across categories. The calculations of entropy, Information Gain and Relative Information Gain are given in the README file in the published archive.

Given a word, we create a vector in which each component corresponds to a category. Each word is therefore represented as a vector of relative information gains, whose dimension equals the number of categories. The set of these vectors forms the Word-Category RIG Matrix:
- each column corresponds to a category;
- each row corresponds to a word;
- each component is the relative information gain from the word to the category.

A row vector thus represents the corresponding word as a vector of RIGs in categories. A column vector represents the RIGs of all words in an individual category.

For any chosen category, words can be ordered by their RIGs from the most informative to the least informative for that category. Words can also be ordered across categories by two criteria: the sum and the maximum of their RIGs over categories. The top n words in such a list can be considered the most informative words in the scientific texts. For a given word, the sum and maximum of RIGs are calculated from the Word-Category RIG Matrix. RIGs for each word of the LScDC in 252 categories are calculated and the word vectors are formed. We then form the Word-Category RIG Matrix for the LSC.
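The pipeline described above can be sketched in Python. It has three steps: binary presence counts per word and category, a RIG for each word-category pair, and the sum S and maximum M used to rank words. This is a minimal sketch, not the authors' code. It assumes the common definition RIG = IG(C; W) / H(C), i.e. the information gain normalised by the entropy of the category variable; the README in the archive gives the exact formulas used. The function names and the toy `(words, categories)` input format are hypothetical.

```python
import math
from collections import Counter


def count_presence(documents):
    """Binary presence counts from (set_of_words, set_of_categories) pairs, one pair per text.

    Returns (n_texts, texts_per_category, texts_per_word, texts_per_word_and_category).
    """
    n_texts = 0
    per_cat, per_word, per_pair = Counter(), Counter(), Counter()
    for words, categories in documents:
        n_texts += 1
        per_cat.update(categories)
        per_word.update(words)  # a set, so each text counts at most once per word
        # Multi-label texts: the word is counted in every category assigned to the text
        per_pair.update((w, c) for w in words for c in categories)
    return n_texts, per_cat, per_word, per_pair


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def relative_information_gain(n, n_cat, n_word, n_both):
    """RIG of the category indicator from the word indicator, assumed to be IG / H(category)."""
    # Joint distribution of the two Boolean variables (all texts equally probable)
    p11 = n_both / n                         # word present, text in category
    p10 = (n_cat - n_both) / n               # word absent, text in category
    p01 = (n_word - n_both) / n              # word present, text not in category
    p00 = (n - n_cat - n_word + n_both) / n  # neither
    h_cat = entropy([p11 + p10, p01 + p00])
    if h_cat == 0:
        return 0.0  # category holds all texts or none: nothing to gain
    h_word = entropy([p11 + p01, p10 + p00])
    h_joint = entropy([p11, p10, p01, p00])
    info_gain = h_cat + h_word - h_joint     # mutual information I(C; W)
    return info_gain / h_cat


def rig_matrix(documents, categories):
    """Map each word to its vector of RIGs, one component per category (in the given order)."""
    n, per_cat, per_word, per_pair = count_presence(documents)
    return {
        word: [relative_information_gain(n, per_cat[c], per_word[word], per_pair[(word, c)])
               for c in categories]
        for word in per_word
    }


def rank_words(rows, top_n):
    """Return (word, S, M) for the top_n words, sorted by the sum S of RIGs, descending."""
    scored = [(word, sum(vec), max(vec)) for word, vec in rows.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

Applied to the full LScDC with `top_n=5000`, the ranking step mirrors the LScT selection rule (descending sum S). Note that under this definition a word that never occurs in a category's texts can also carry information about that category.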
For each word, the sum (S) and maximum (M) of RIGs over categories are calculated and appended to the matrix as its last two columns. The Word-Category RIG Matrix for the LScDC with 252 categories, together with the sum and the maximum of RIGs over categories, can be found in the database.

Leicester Scientific Thesaurus (LScT)

The Leicester Scientific Thesaurus (LScT) is a list of 5,000 words from the LScDC [2]. Words of the LScDC are sorted in descending order by the sum (S) of their RIGs over categories, and the top 5,000 words are included in the LScT. We consider these 5,000 words to be the most meaningful words in the scientific corpus. The meaningfulness of a word is evaluated by its average informativeness across categories, and the resulting list is considered a ‘thesaurus’ for science. The LScT, with the sum values, is provided as a CSV file in the published archive.

The published archive contains the following files:
1) Word_Category_RIG_Matrix.csv: A 103,998 by 254 matrix. The columns are the 252 WoS categories plus the sum (S) and the maximum (M) of RIGs over categories (the last two columns), and the rows are words of the LScDC. Each entry in the first 252 columns is the RIG from the word to the category. Words are ordered as in the LScDC.
2) Word_Category_Frequency_Matrix.csv: A 103,998 by 252 matrix. The columns are the 252 WoS categories and the rows are words of the LScDC. Each entry is the number of texts in the corresponding category that contain the word. Words are ordered as in the LScDC.
3) LScT.csv: List of LScT words with their sum (S) values.
4) Text_No_in_Cat.csv: The number of texts in each category.
5) Categories_in_Documents.csv: List of WoS categories for each document of the LSC.
6) README.txt: Description of the Word-Category RIG Matrix, the Word-Category Frequency Matrix and the LScT, and of the procedures used to form them.
7) README.pdf (same as 6, in PDF format)

References
[1] Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
[2] Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v3
[3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[4] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[5] Suzen, N., Mirkes, E. M., & Gorban, A. N. (2019). LScDC-new large scientific dictionary. arXiv preprint arXiv:1912.06858.
[6] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
Data from: BEING A TREE CROP INCREASES THE ODDS OF EXPERIENCING YIELD...
zenodo.org
bin, zip
Updated Aug 8, 2023
Marcelo Adrián Aizen; Gabriela Gleiser; Thomas Kitzberger; Rubén Milla (2023). BEING A TREE CROP INCREASES THE ODDS OF EXPERIENCING YIELD DECLINES IRRESPECTIVE OF POLLINATOR DEPENDENCE [Dataset]. http://doi.org/10.5281/zenodo.7863825
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7863825
Dataset updated
Aug 8, 2023
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Marcelo Adrián Aizen; Gabriela Gleiser; Thomas Kitzberger; Rubén Milla
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Marcelo A. Aizen, Gabriela R. Gleiser, Thomas Kitzberger, Ruben Milla. Being a tree crop increases the odds of experiencing yield declines irrespective of pollinator dependence (to be submitted to PCI)

Data and R scripts to reproduce the analyses and the figures shown in the paper. All analyses were performed using R 4.0.2.

Data

1. FAOdata_21-12-2021.csv

This file includes yearly data (1961-2020, column 8) on yield and cultivated area (columns 6 and 10) at the country, sub-regional, and regional levels (column 2) for each crop (column 4). The data are drawn from the United Nations Food and Agriculture Organization database (available at http://www.fao.org/faostat/en; accessed 21-12-2021). [Used in Script 1 to generate the synthesis dataset]

2. countries.csv

This file provides information on the region (column 2) to which each country (column 1) belongs. [Used in Script 1 to generate the synthesis dataset]

3. dependence.csv

This file provides information on the pollinator dependence category (column 2) of each crop (column 1).

4. traits.csv

This file provides information on the traits of each crop other than pollinator dependence. Besides the crop name (column 1), it includes the type of harvested organ (column 5) and the growth form (column 6). [Used in Script 1 to generate the synthesis dataset]

5. dataset.csv

The synthesis dataset generated by Script 1.

6. growth.csv

The yield growth dataset generated by Script 1 and used as input by Scripts 2 and 3.

7. phylonames.csv

This file lists all the crops (column 1) and their equivalent tip names in the crop phylogeny (column 2). [Used in Script 2 for the phylogenetically-controlled analyses]

8. phylo137.tre

File containing the phylogenetic tree.

Scripts

1. dataset

This R script curates and merges all the individual datasets mentioned above into a single dataset, estimating and adding to this single dataset the growth rate for each crop and country, and the (log) cumulative harvested area per crop and country over the period 1961-2020.

2. analyses

This R script includes all the analyses described in the article’s main text.

3. figures

This R script creates all the main and supplementary figures of this article.

4. lme4_phylo_setup

R function written by Li and Bolker (2019) to carry out phylogenetically-controlled generalized linear mixed-effects models as described in the main text of the article.
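Script 1 estimates a yield growth rate for each crop and country, but its exact estimator is not described here. One common choice, shown below purely as an assumption, is the least-squares slope of log yield against year, which approximates the mean annual relative growth rate. The sketch pairs it with the natural log of cumulative harvested area. It is written in Python rather than the R used by the authors, and the function names are hypothetical.

```python
import math


def log_linear_growth_rate(years, yields):
    """OLS slope of ln(yield) on year; approximates the mean annual relative growth rate.

    Years with missing (None) or non-positive yields are skipped.
    """
    points = [(year, math.log(value)) for year, value in zip(years, yields)
              if value is not None and value > 0]
    if len(points) < 2:
        raise ValueError("need at least two years with positive yield")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def log_cumulative_area(areas):
    """Natural log of the harvested area summed over all years with data (None skipped)."""
    total = sum(area for area in areas if area is not None)
    if total <= 0:
        raise ValueError("cumulative area must be positive")
    return math.log(total)
```

Under this assumption, a crop whose yield grows by a constant 2% per year (in continuous terms) gets a growth rate of 0.02.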

References

Li, M., and B. Bolker. 2019. wzmli/phyloglmm: First release of phylogenetic comparative analysis in lme4-verse. Zenodo. https://doi.org/10.5281/zenodo.2639887.
Virtual Reality Balance Disturbance Dataset
data-staging.niaid.nih.gov
Updated Oct 31, 2024
Ferrete Ribeiro, Nuno; Pires, Henrique; P. Santos, Cristina (2024). Virtual Reality Balance Disturbance Dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14013467
Dataset updated
Oct 31, 2024
Dataset provided by
University of Minho
Authors
Ferrete Ribeiro, Nuno; Pires, Henrique; P. Santos, Cristina
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background and Purpose:

There are very few publicly available datasets on real-world falls in the scientific literature. Natural falls are rare, and it is inherently difficult to gather biomechanical and physiological data from young subjects or community-dwelling older adults in a non-intrusive and user-friendly manner. This data gap has hindered research on fall prevention strategies. Immersive Virtual Reality (VR) environments offer a way to address it.

This dataset supports fall-prevention research with data recorded in an immersive VR setup. The setup simulates diverse ecological environments and applies randomized visual disturbances designed to trigger balance-compensatory reactions. The dataset can be used to study human balance responses to VR-induced perturbations. Such work could inform training programs, wearable assistive technologies, and VR-based rehabilitation methods.

Dataset Content: The dataset includes:

Kinematic Data: Captured using a full-body Xsens MVN Awinda inertial measurement system, providing detailed movement data at 60 Hz.

Muscle Activity (EMG): Recorded at 1111 Hz using Delsys Trigno for tracking muscle contractions.

Electrodermal Activity (EDA)*: Captured at 100.21 Hz with a Shimmer GSR device on the dominant forearm to record physiological responses to perturbations.

Metadata: Includes participant demographics (age, height, weight, gender, dominant hand and foot), trial conditions, and perturbation characteristics (timing and type).

The files are named in the format "ParticipantX_labelled", where X represents the participant's number. Each file is provided in a .mat format, with data already synchronized across different sensor sources. The structure of each file is organized into the following columns:

Column 1: Label indicating the visual perturbation applied. 0 means no visual perturbation.

Column 2: Timestamp, providing the precise timing of each recorded data point.

Column 3: Frame identifier, which can be cross-referenced with the MVN file for detailed motion analysis.

Columns 4 to 985: Xsens motion capture features, exported directly from the MVN file.

Columns 986 to 993: EMG data - Tibialis Anterior (R&L), Gastrocnemius Medial Head (R&L), Rectus Femoris (R), Semitendinosus (R), External Oblique (R), Sternocleidomastoid (R).

Columns 994 to 1008: Shimmer data: Accelerometer (x,y,z), Gyroscope (x,y,z), Magnetometer (x,y,z), GSR Range, Skin Conductance, Skin Resistance, PPG, Pressure, Temperature.
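Because the column layout above is fixed, a loader can slice each file into named sensor blocks. The sketch below makes two assumptions. First, each .mat file stores the data as a single 2-D numeric array under a variable name you must look up yourself (the `variable` argument is hypothetical). Second, the file uses a MATLAB format that `scipy.io.loadmat` can read; that function does not read v7.3 (HDF5) files. The block names are illustrative, not part of the dataset.

```python
import numpy as np

# 1-based, inclusive column ranges as documented for the "ParticipantX_labelled" files
COLUMN_BLOCKS = {
    "label": (1, 1),         # visual perturbation label (0 = none)
    "timestamp": (2, 2),
    "frame": (3, 3),         # frame identifier, cross-referenced with the MVN file
    "xsens": (4, 985),       # Xsens motion capture features
    "emg": (986, 993),       # 8 EMG channels
    "shimmer": (994, 1008),  # 15 Shimmer channels
}


def split_columns(data):
    """Split an (n_samples, >=1008) array into the documented column blocks."""
    data = np.asarray(data)
    if data.ndim != 2 or data.shape[1] < 1008:
        raise ValueError(f"expected a 2-D array with at least 1008 columns, got shape {data.shape}")
    # Convert 1-based inclusive ranges to 0-based half-open slices
    return {name: data[:, first - 1:last] for name, (first, last) in COLUMN_BLOCKS.items()}


def load_participant(path, variable):
    """Load one .mat file and split it; `variable` is the (hypothetical) array name inside it."""
    from scipy.io import loadmat  # lazy import: requires SciPy; cannot read v7.3 (HDF5) files
    return split_columns(loadmat(path)[variable])
```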

In addition, we release the .MVN and .MVNA files for each participant (1 to 10). The .MVN files provide comprehensive motion capture data, and the .MVNA files include the participants' body measurements. This additional data enables precise body modeling and further in-depth biomechanical analysis.

Participants & VR Headset:

Twelve healthy young adults (average age: 25.09 ± 2.81 years; height: 167.82 ± 8.40 cm; weight: 64.83 ± 7.77 kg; 6 males, 6 females) participated in this study (Table 1). Participants met the following criteria: i) healthy locomotion, ii) stable postural balance, iii) age ≥ 18 years, and iv) body weight < 135 kg.

Participants were excluded if they: i) had any condition affecting locomotion, ii) had epilepsy, vestibular disorders, or other neurological conditions impacting stability, iii) had undergone recent surgeries impacting mobility, iv) were involved in other experimental studies, v) were under judicial protection or guardianship, or vi) experienced complications using VR headsets (e.g., motion sickness).

All participants provided written informed consent, adhering to the ethical guidelines set by the University of Minho Ethics Committee (CEICVS 063/2021), in compliance with the Declaration of Helsinki and the Oviedo Convention.

To ensure unbiased reactions, participants were kept unaware of the specific protocol details. Visual disturbances were introduced in a random sequence and at various locations. This increased the unpredictability of the experiment and was intended to elicit naturalistic responses.

The VR setup involved an HTC Vive Pro headset with two wirelessly synchronized base stations that tracked participants’ head movements within a 5m x 2.5m area. The base stations adjusted the VR environment’s perspective according to head movements, while controllers were used solely for setup purposes.

Table 1 - Participants' demographic information

Participant  Height (cm)  Weight (kg)  Age  Gender  Dom. Hand  Dom. Foot
1            159          56.5         23   F       Right      Right
2            157          55.3         28   F       Right      Right
3            174          67.1         31   M       Right      Right
4            176          73.8         23   M       Right      Right
5            158          57.3         23   F       Right      Right
6            181          70.9         27   M       Right      Right
7            171          73.3         23   M       Right      Right
8            159          69.2         28   F       Right      Right
9            177          57.3         22   M       Right      Right
10           171          75.5         25   M       Right      Right
11           163          58.1         23   F       Right      Right
12           168          63.7         25   F       Right      Right

Data Collection Methodology:

The experimental protocol was designed to integrate four essential components: (i) precise control over stimuli, (ii) high reproducibility of the experimental conditions, (iii) preservation of ecological validity, and (iv) promotion of real-world learning transfer.

Participant Instructions and Familiarization Trial: Before starting, participants were given specific instructions to (i) seek assistance if they experienced motion sickness, (ii) adjust the VR headset for comfort by modifying the lens distance and headset fit, (iii) stay within the defined virtual play area demarcated by a blue boundary, and (iv) complete a familiarization trial. During this trial, participants were encouraged to explore various virtual environments while performing a sequence of three key movements—walking forward, turning around, and returning to the initial location—without any visual perturbations. This familiarization phase helped participants acclimate to the virtual space in a controlled setting.

Experimental Protocol and Visual Perturbations: Participants were exposed to 11 different types of visual perturbations, as outlined in Table 2, applied across a total of 35 unique perturbation variants (Table 3). Variants of the same perturbation type, such as a clockwise Roll Axis Tilt, differed in intensity (e.g., rotation speed) and were presented at randomized virtual locations. The choice of perturbation types was based on existing literature on visual disturbances. This design exposed participants to a diverse range of visual effects while maintaining ecological validity. It supports generalization to real-world scenarios where visual perturbations might occur spontaneously.

Protocol Flow and Randomized Presentation: Throughout the experimental protocol, each visual perturbation variant was presented three times, and participants engaged repeatedly in the familiarization activities over a nearly one-hour period. These activities—walking forward, turning around, and returning to the starting point—took place in a 5m x 2.5m physical space mirrored in VR, allowing participants to take 7–10 steps before turning. Participants were not informed of the timing or nature of any perturbations, which could occur unpredictably during their forward walk, adding a realistic element of surprise. After each return to the starting point, participants were relocated to a random position within the virtual environment, with the sequence of positions determined by a randomized, computer-generated order.

Table 2 - Visual perturbations' names and parameters (L - Lateral; B - Backward; F - Forward; S - Slip; T - Trip; CW - Clockwise; CCW - Counter-Clockwise)

Perturbation [Fall Category]: Parameters
Roll Axis Tilt - CW [L]: [10°, 20°, 30°] during 0.5 s
Roll Axis Tilt - CCW [L]: [10°, 20°, 30°] during 0.5 s
Support Surface ML Axis Translation - Bidirectional [L]: Discrete movement (static pauses between movements), 1 m/s
AP Axis Translation - Front [F]: 1 m/s
AP Axis Translation - Backwards [B]: 1 m/s
Pitch Axis Tilt [S]: 0°-25°, 60°/s
Virtual object with lower height than a real object [T]: Variable object height
Roll-Pitch-Yaw Axis Tilt [Syncope]: A sum of sinusoids drives each axis rotation
Scene Object Movement [L]: Objects fly towards the subject's head at variable speeds
Vertigo Sensation [F/L]: Walk at a comfortable speed, with and without avatar, at the height of a house
Axial Axis Translation [F/B/L]: Free fall

Table 3 - Label Encoding

Label   Visual Perturbation
1       Roll Indoor 1 CW10
2       Roll Indoor 1 CW20
3       Roll Indoor 1 CW30
4       Roll Indoor 1 CCW10
5       Roll Indoor 1 CCW20
6       Roll Indoor 1 CCW30
7       Roll Indoor 2 CW10
8       Roll Indoor 2 CW20
9       Roll Indoor 2 CW30
10      Roll Indoor 2 CCW10
11      Roll Indoor 2 CCW20
12      Roll Indoor 2 CCW30
13      Roll Outdoor CW10
14      Roll Outdoor CW20
15      Roll Outdoor CW30
16      Roll Outdoor CCW10
17      Roll Outdoor CCW20
18      Roll Outdoor CCW30
19      ML-Axis Trans. - Kitchen
20      AP-Axis Trans. - Corridor Forward
21      AP-Axis Trans. - Corridor Backward
22      Pitch Indoor - Bathroom (wet floor)
23      Pitch Indoor - Near Fridge (wet floor)
24      Roof Beam Walking - Vertigo
25      Roof Beam Walking - Vertigo No Avatar
26      Simple Roof - Vertigo
27      Simple Roof - Vertigo No Avatar
28      Pitch Outdoor - Near Car Oil
29/290  Trip - Sidewalk / Trip Shock
30      Bedroom Syncope
31      Garden - Object Avoidance
32      Electricity Pole - Vertigo
33      Electricity Pole - No Avatar
34      Free Fall
35      Climbing Virtual Stairs
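Column 1 carries these labels, with 0 meaning no visual perturbation. Individual perturbation events can therefore be located as runs of identical non-zero labels. This is a minimal sketch with a hypothetical function name. It assumes that each event appears as a contiguous run of one label value and that the column 2 timestamps increase monotonically. Two back-to-back events with the same label and no 0 between them would be merged into one segment.

```python
import numpy as np


def perturbation_segments(labels, timestamps):
    """Return (label, start_time, end_time) for each run of identical non-zero labels."""
    labels = np.asarray(labels).ravel()
    timestamps = np.asarray(timestamps).ravel()
    if labels.size == 0:
        return []
    # Positions where the label value changes start a new run
    change = np.flatnonzero(np.diff(labels) != 0) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [labels.size]))
    segments = []
    for start, end in zip(starts, ends):
        if labels[start] != 0:  # 0 means no visual perturbation
            segments.append((int(labels[start]), float(timestamps[start]), float(timestamps[end - 1])))
    return segments
```

Passing columns 1 and 2 of a loaded file as `labels` and `timestamps` yields one tuple per perturbation event. The tuple's label can then be looked up in Table 3.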

Some data from the Shimmer device were collected but not used or checked by the research team.
Data and scripts for the analysis of the influence of crop pollinator...
data.niaid.nih.gov
Updated Aug 8, 2023
Aizen, Marcelo Adrián; Gleiser, Gabriela; Kitzberger, Thomas; Milla, Rubén (2023). Data and scripts for the analysis of the influence of crop pollinator dependence and growth form on yield decline [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7863824
Dataset updated
Aug 8, 2023
Dataset provided by
Departamento de Biología, Geología, Física y Química Inorgánica, Universidad Rey Juan Carlos, Tulipán s/n, 28933 Móstoles, Spain
Instituto de Investigaciones en Biodiversidad y Medioambiente (INIBIOMA), Universidad Nacional del Comahue-CONICET, Pasaje Gutierrez 1415, 8400 San Carlos de Bariloche, Río Negro, Argentina.
Authors
Aizen, Marcelo Adrián; Gleiser, Gabriela; Kitzberger, Thomas; Milla, Rubén
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Marcelo A. Aizen, Gabriela R. Gleiser, Thomas Kitzberger, Ruben Milla. Being a tree crop increases the odds of experiencing yield declines irrespective of pollinator dependence (to be submitted to PCI)

Data and R scripts to reproduce the analyses and the figures shown in the paper. All analyses were performed using R 4.0.2.

Data

FAOdata_21-12-2021.csv

This file includes yearly data (1961-2020, column 8) on yield and cultivated area (columns 6 and 10) at the country, sub-regional, and regional levels (column 2) for each crop (column 4). The data are drawn from the United Nations Food and Agriculture Organization database (available at http://www.fao.org/faostat/en; accessed 21-12-2021). [Used in Script 1 to generate the synthesis dataset]

countries.csv

This file provides information on the region (column 2) to which each country (column 1) belongs. [Used in Script 1 to generate the synthesis dataset]

dependence.csv

This file provides information on the pollinator dependence category (column 2) of each crop (column 1).

traits.csv

This file provides information on the traits of each crop other than pollinator dependence. Besides the crop name (column 1), it includes the type of harvested organ (column 5) and the growth form (column 6). [Used in Script 1 to generate the synthesis dataset]

dataset.csv

The synthesis dataset generated by Script 1.

growth.csv

The yield growth dataset generated by Script 1 and used as input by Scripts 2 and 3.

phylonames.csv

This file lists all the crops (column 1) and their equivalent tip names in the crop phylogeny (column 2). [Used in Script 2 for the phylogenetically-controlled analyses]

phylo137.tre

File containing the phylogenetic tree.

Scripts

dataset

This R script curates and merges all the individual datasets mentioned above into a single dataset, estimating and adding to this single dataset the growth rate for each crop and country, and the (log) cumulative harvested area per crop and country over the period 1961-2020.

analyses

This R script includes all the analyses described in the article’s main text.

figures

This R script creates all the main and supplementary figures of this article.

lme4_phylo_setup

R function written by Li and Bolker (2019) to carry out phylogenetically-controlled generalized linear mixed-effects models as described in the main text of the article.

References

Li, M., and B. Bolker. 2019. wzmli/phyloglmm: First release of phylogenetic comparative analysis in lme4-verse. Zenodo. https://doi.org/10.5281/zenodo.2639887.
Data from: Russian Financial Statements Database: A firm-level collection of...
data.niaid.nih.gov
Updated Mar 14, 2025
Bondarkov, Sergey; Ledenev, Victor; Skougarevskiy, Dmitriy (2025). Russian Financial Statements Database: A firm-level collection of the universe of financial statements [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14622208
Dataset updated
Mar 14, 2025
Dataset provided by
European University at St. Petersburg
Authors
Bondarkov, Sergey; Ledenev, Victor; Skougarevskiy, Dmitriy
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The Russian Financial Statements Database (RFSD) is an open, harmonized collection of annual unconsolidated financial statements of the universe of Russian firms:

🔓 First open data set with information on every active firm in Russia.

🗂️ First open financial statements data set that includes non-filing firms.

🏛️ Sourced from two official data providers: the Rosstat and the Federal Tax Service.

📅 Initially covers 2011-2023 and will be continuously updated.

🏗️ Restores as much data as possible through non-invasive data imputation, statement articulation, and harmonization.

The RFSD is hosted on 🤗 Hugging Face and Zenodo. It is stored in Apache Parquet, a structured, column-oriented, compressed binary format, using a yearly partitioning scheme, so end users can query only the variables of interest at scale.

The accompanying paper provides internal and external validation of the data: http://arxiv.org/abs/2501.05841.

Here we present instructions for importing the data into an R or Python environment. Please consult the project repository for more information: http://github.com/irlcode/RFSD.

Importing The Data

You have two options for ingesting the data: download the .parquet files manually from Hugging Face or Zenodo, or rely on the 🤗 Hugging Face Datasets library.

Python

🤗 Hugging Face Datasets

It is as easy as:

from datasets import load_dataset
import polars as pl

This line will download all RFSD data (6.6 GB+) and store it in the 🤗 cache folder

RFSD = load_dataset('irlspbru/RFSD')

Alternatively, this will download ~540 MB with all financial statements for 2023 into a Polars DataFrame (requires about 8 GB of RAM)

RFSD_2023 = pl.read_parquet('hf://datasets/irlspbru/RFSD/RFSD/year=2023/*.parquet')

Please note that the data is not shuffled within a year, meaning that streaming the first n rows will not yield a random sample.
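
Because rows are ordered within each year, the first n rows form a biased subset. One standard way to draw a uniform sample in a single pass over a stream of unknown length is reservoir sampling (Algorithm R). The sketch below is generic Python, not part of the RFSD tooling; `rows` stands for any iterator over records, and how you stream them out of the dataset is left as an assumption.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)      # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)   # inclusive bounds: j is uniform on [0, i]
            if j < k:
                sample[j] = row     # row i is kept with probability k / (i + 1)
    return sample


# The first 5 rows of an ordered stream are always 0..4; the reservoir is not.
print(reservoir_sample(range(1_000_000), 5, seed=42))
```

When a full year fits in memory, calling Polars' DataFrame.sample(n=..., seed=...) on the loaded frame is the simpler route; the streaming version only matters when the data cannot be held at once.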

Local File Import

Importing in Python requires the pyarrow package to be installed.

import pyarrow.dataset as ds
import polars as pl

Read RFSD metadata from local file

RFSD = ds.dataset("local/path/to/RFSD")

Use RFSD.schema to glimpse the data structure and column classes

print(RFSD.schema)

Load full dataset into memory

RFSD_full = pl.from_arrow(RFSD.to_table())

Load only 2019 data into memory

RFSD_2019 = pl.from_arrow(RFSD.to_table(filter=ds.field('year') == 2019))

Load only revenue for firms in 2019, identified by taxpayer id

RFSD_2019_revenue = pl.from_arrow(RFSD.to_table(filter=ds.field('year') == 2019, columns=['inn', 'line_2110']))

Give suggested descriptive names to variables

renaming_df = pl.read_csv('local/path/to/descriptive_names_dict.csv')
RFSD_full = RFSD_full.rename({item[0]: item[1] for item in zip(renaming_df['original'], renaming_df['descriptive'])})
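
The renaming step relies on a lookup table with original and descriptive columns. As a standard-library sketch, assuming only those two headers, the helpers below build the same dictionary and trim it to the columns actually loaded, which is convenient after projecting a subset such as the 2019 revenue table. The descriptive names in the sample rows are invented for illustration and need not match the real file.

```python
import csv
import io
from typing import Dict, Iterable


def load_renaming(csv_file) -> Dict[str, str]:
    """Read an original -> descriptive name mapping from a CSV with those two headers."""
    reader = csv.DictReader(csv_file)
    missing = {"original", "descriptive"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"renaming table lacks columns: {sorted(missing)}")
    return {row["original"]: row["descriptive"] for row in reader}


def restrict(mapping: Dict[str, str], columns: Iterable[str]) -> Dict[str, str]:
    """Keep only the entries for columns present in the loaded table."""
    present = set(columns)
    return {old: new for old, new in mapping.items() if old in present}


# Hypothetical excerpt of descriptive_names_dict.csv (invented names).
sample_csv = io.StringIO(
    "original,descriptive\n"
    "inn,taxpayer_id\n"
    "line_2110,revenue\n"
    "line_9999,example_unused\n"
)
mapping = load_renaming(sample_csv)
print(restrict(mapping, ["inn", "line_2110"]))
```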

R

Local File Import

Importing in R requires the arrow package to be installed.

library(arrow)
library(data.table)

Read RFSD metadata from local file

RFSD <- open_dataset("local/path/to/RFSD")

Use schema() to glimpse the data structure and column classes

schema(RFSD)

Load full dataset into memory

scanner <- Scanner$create(RFSD)
RFSD_full <- as.data.table(scanner$ToTable())

Load only 2019 data into memory

scan_builder <- RFSD$NewScan()
scan_builder$Filter(Expression$field_ref("year") == 2019)
scanner <- scan_builder$Finish()
RFSD_2019 <- as.data.table(scanner$ToTable())

Load only revenue for firms in 2019, identified by taxpayer id

scan_builder <- RFSD$NewScan()
scan_builder$Filter(Expression$field_ref("year") == 2019)
scan_builder$Project(cols = c("inn", "line_2110"))
scanner <- scan_builder$Finish()
RFSD_2019_revenue <- as.data.table(scanner$ToTable())

Give suggested descriptive names to variables

renaming_dt <- fread("local/path/to/descriptive_names_dict.csv")
setnames(RFSD_full, old = renaming_dt$original, new = renaming_dt$descriptive)

Use Cases

🌍 For macroeconomists: Replication of a Bank of Russia study of the cost channel of monetary policy in Russia by Mogiliat et al. (2024) — interest_payments.md

🏭 For IO: Replication of the total factor productivity estimation by Kaukin and Zhemkova (2023) — tfp.md

🗺️ For economic geographers: A novel model-less house-level GDP spatialization that capitalizes on geocoding of firm addresses — spatialization.md

FAQ

Why should I use this data instead of Interfax's SPARK, Moody's Ruslana, or Kontur's Focus?

To the best of our knowledge, the RFSD is the only open data set with up-to-date financial statements of Russian companies published under a permissive licence. Apart from being free-to-use, the RFSD benefits from data harmonization and error detection procedures unavailable in commercial sources. Finally, the data can be easily ingested in any statistical package with minimal effort.

What is the data period?

We provide financials for Russian firms in 2011-2023. We will add the data for 2024 by July 2025 (see Version and Update Policy below).

Why are there no data for firm X in year Y?

Although the RFSD strives to be an all-encompassing database of financial statements, end users will encounter data gaps:

We do not include financials for firms that we considered ineligible to submit financial statements to the Rosstat/Federal Tax Service by law: financial, religious, or state organizations (state-owned commercial firms are still in the data).

Eligible firms may be entitled not to disclose under certain conditions. For instance, Gazprom did not file in 2022, and we had to impute its 2022 data from its 2023 filings. Sibur filed only in 2023, and Novatek only in 2020 and 2021. Commercial data providers such as Interfax's SPARK have dedicated access to Federal Tax Service data and are therefore able to source this information elsewhere.

A firm may have submitted its annual statement even though, according to the Uniform State Register of Legal Entities (EGRUL), it was not active in that year. We remove such filings.

Why is the geolocation of firm X incorrect?

We use Nominatim to geocode the structured addresses of incorporation of legal entities from the EGRUL. Errors in the original addresses may prevent us from geocoding firms to a particular house. Gazprom, for instance, is geocoded to the house level in 2014 and 2021-2023, but only to the street level for 2015-2020 because Nominatim mishandled the house number. In such cases we fall back to street-level geocoding. Additionally, streets in different districts of one city may share identical names. We have ignored those problems in our geocoding and invite your submissions. Finally, the address of incorporation may not correspond to plant locations. For instance, Rosneft has 62 field offices in addition to its central office in Moscow. We ignore the locations of such offices in our geocoding, but subsidiaries set up as separate legal entities are still geocoded.
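
The house-to-street fallback can be sketched as follows. This is not the RFSD geocoding code: the `lookup` callable, the address field names (including the hypothetical housenumber key), and the toy gazetteer are stand-ins for a real Nominatim query. Like the procedure described above, the sketch keys on city and street only, so identically named streets in different districts would collide.

```python
from typing import Callable, Dict, Optional, Tuple

Coords = Tuple[float, float]
Lookup = Callable[[Dict[str, str]], Optional[Coords]]


def geocode_with_fallback(address: Dict[str, str], lookup: Lookup) -> Tuple[Optional[Coords], str]:
    """Query at house level first; if that fails, retry at street level."""
    hit = lookup(address)
    if hit is not None:
        return hit, "house"
    street_only = {k: v for k, v in address.items() if k != "housenumber"}
    hit = lookup(street_only)
    if hit is not None:
        return hit, "street"
    return None, "unmatched"


# Toy gazetteer standing in for a real geocoder; names and coordinates are invented.
GAZETTEER = {
    ("Example City", "First Street", "1"): (55.75, 37.61),
    ("Example City", "First Street", None): (55.76, 37.60),
}


def toy_lookup(query: Dict[str, str]) -> Optional[Coords]:
    key = (query.get("city"), query.get("street"), query.get("housenumber"))
    return GAZETTEER.get(key)


print(geocode_with_fallback({"city": "Example City", "street": "First Street", "housenumber": "1"}, toy_lookup))
print(geocode_with_fallback({"city": "Example City", "street": "First Street", "housenumber": "1 bldg 2"}, toy_lookup))
```

Returning the precision level alongside the coordinates lets downstream users tell house-level matches from street-level ones.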

Why is the data for firm X different from https://bo.nalog.ru/?

Many firms submit correcting statements after the initial filing. Although we downloaded the data well past the April 2024 deadline for 2023 filings, firms may have continued to submit correcting statements. We will capture them in future releases.

Why is the data for firm X unrealistic?

We provide the source data as is, with minimal changes. Consider a relatively unknown LLC Banknota. It reported 3.7 trillion rubles in revenue in 2023, or 2% of Russia's GDP. This is obviously an outlier firm with unrealistic financials. We manually reviewed the data and flagged such firms for user consideration (the outlier variable), keeping the source data intact.
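
A minimal sketch of honoring that flag when aggregating, assuming each record exposes a boolean outlier field next to revenue (line_2110, the revenue column used in the import examples). The records are plain dictionaries with invented values, not real RFSD rows.

```python
from typing import Dict, Iterable


def total_revenue(rows: Iterable[Dict], drop_outliers: bool = True) -> float:
    """Sum line_2110 (revenue), optionally skipping rows flagged as outliers."""
    total = 0.0
    for row in rows:
        if drop_outliers and row.get("outlier"):
            continue
        revenue = row.get("line_2110")
        if revenue is not None:  # missing values are skipped, not treated as zero
            total += revenue
    return total


# Invented records: one ordinary firm, one flagged outlier, one missing value.
rows = [
    {"inn": "0000000001", "line_2110": 120.0, "outlier": False},
    {"inn": "0000000002", "line_2110": 3.7e12, "outlier": True},
    {"inn": "0000000003", "line_2110": None, "outlier": False},
]
print(total_revenue(rows))
print(total_revenue(rows, drop_outliers=False))
```

A single flagged firm can dominate an aggregate, so it is worth running the analysis both with and without flagged rows.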

Why is the data for groups of companies different from their IFRS statements?

We should stress that we provide unconsolidated financial statements filed according to the Russian accounting standards, meaning that it would be wrong to infer financials for corporate groups from this data. Gazprom, for instance, had over 800 affiliated entities, and to study this corporate group in its entirety it is not enough to consider the financials of the parent company.

Why is the data not in CSV?

The data is provided in Apache Parquet format, a structured, column-oriented, compressed binary format that allows conditional subsetting of columns and rows. In other words, you can easily query the financials of companies of interest, keeping only the variables of interest in memory and greatly reducing the data footprint.
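
To see why a column-oriented layout helps, here is a toy, pure-Python model of the query from the import examples (2019 rows, only inn and line_2110). Each column is stored as its own list, so filtering reads only the year column and projection reads only the requested columns. This illustrates the idea only; it is not how Parquet encodes data on disk, and the values are invented.

```python
from typing import Dict, List

Table = Dict[str, List]


def scan(table: Table, columns: List[str], year: int) -> Table:
    """Filter rows on `year`, then materialize only the requested columns."""
    keep = [i for i, y in enumerate(table["year"]) if y == year]  # touches only the year column
    return {c: [table[c][i] for i in keep] for c in columns}      # touches only projected columns


# Toy column store; other_column is never read by the query below.
table: Table = {
    "inn": ["a", "b", "c", "d"],
    "year": [2019, 2020, 2019, 2021],
    "line_2110": [10, 20, 30, 40],
    "other_column": [1, 2, 3, 4],
}
print(scan(table, ["inn", "line_2110"], 2019))
```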

Version and Update Policy

Version (SemVer): 1.0.0.

We intend to update the RFSD annually as the data becomes available, in other words when most firms have filed their statements with the Federal Tax Service. The official deadline for filing the previous year's statements is April 1. However, every year a portion of firms either fails to meet the deadline or submits corrections afterwards. Filing continues up to the very end of the year, but after the end of April this stream quickly thins out. Nevertheless, there is an obvious trade-off between data completeness and version availability. We find it a reasonable compromise to query new data in early June, since on average 96.7% of statements, including 86.4% of all correcting filings, are already filed by the end of May. We plan to make a new version of the RFSD available by July.

Licence

Creative Commons License Attribution 4.0 International (CC BY 4.0).

Copyright © the respective contributors.

Citation

Please cite as:

@unpublished{bondarkov2025rfsd,
  title={{R}ussian {F}inancial {S}tatements {D}atabase},
  author={Bondarkov, Sergey and Ledenev, Victor and Skougarevskiy, Dmitriy},
  note={arXiv preprint arXiv:2501.05841},
  doi={10.48550/arXiv.2501.05841},
  year={2025}
}

Acknowledgments and Contacts

Data collection and processing: Sergey Bondarkov, sbondarkov@eu.spb.ru, Viktor Ledenev, vledenev@eu.spb.ru

Project conception, data validation, and use cases: Dmitriy Skougarevskiy, Ph.D.,
Uniform Crime Reporting (UCR) Program Data: Supplementary Homicide Reports,...
openicpsr.org
Updated Jun 1, 2017
Jacob Kaplan (2017). Uniform Crime Reporting (UCR) Program Data: Supplementary Homicide Reports, 1976-2016 [Dataset]. http://doi.org/10.3886/E100699V5
Unique identifier
https://doi.org/10.3886/E100699V5
Dataset updated
Jun 1, 2017
Dataset provided by
University of Pennsylvania. Department of Criminology
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1976 - 2015
Area covered
United States
Description
Version 5 release notes: Adds 2016 data. Standardizes the "group" column, which categorizes cities and counties by population. Arranges rows in descending order by year and ascending order by ORI.

Version 4 release notes: Fixes a bug where the Philadelphia Police Department had an incorrect FIPS county code.

Version 3 release notes: Merges the data with LEAIC data to add FIPS codes, census codes, agency type variables, and the ORI9 variable. Changes the column names for relationship variables from offender_n_relation_to_victim_1 to victim_1_relation_to_offender_n to better indicate that all relationships are victim 1's relationship to each offender. Reorders columns.

This is a single file containing all data from the Supplementary Homicide Reports from 1976 to 2015. The Supplementary Homicide Report provides detailed information about the victim, offender, and circumstances of the murder. Details include victim and offender age, sex, race, ethnicity (Hispanic/not Hispanic), the weapon used, circumstances of the incident, and the number of both offenders and victims.

All the data was downloaded from NACJD as ASCII+SPSS Setup files and cleaned using R. The "cleaning" just means that column names were standardized (different years have slightly different spellings for many columns). Standardization of column names is necessary to stack multiple years together. Categorical variables (e.g., state) were also standardized (i.e., spelling errors were fixed and terminology was made consistent across years).

The following is the summary of the Supplementary Homicide Report copied from ICPSR's 2015 page for the data.

The Uniform Crime Reporting Program Data: Supplementary Homicide Reports (SHR) provide detailed information on criminal homicides reported to the police. These homicides consist of murders; non-negligent killings also called non-negligent manslaughter; and justifiable homicides.
UCR Program contributors compile and submit their crime data by one of two means: either directly to the FBI or through their State UCR Programs. State UCR Programs frequently impose mandatory reporting requirements which have been effective in increasing both the number of reporting agencies as well as the number and accuracy of each participating agency's reports. Each agency may be identified by its numeric state code, alpha-numeric agency ("ORI") code, jurisdiction population, and population group. In addition, each homicide incident is identified by month of occurrence and situation type, allowing flexibility in creating aggregations and subsets.
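
The cleaning described above (harmonizing column spellings so that years can be stacked, then ordering by year descending and ORI ascending as in version 5) can be sketched in Python, although the original cleaning was done in R. The alias table, column spellings, and ORI values below are hypothetical examples, not the actual SHR columns.

```python
from typing import Dict, List

# Hypothetical alias table: variant spelling (lower-cased) -> standardized name.
ALIASES = {
    "vic_age": "victim_1_age",
    "victim_age_1": "victim_1_age",
    "weapon_used": "weapon",
}


def standardize(rows: List[Dict], year: int) -> List[Dict]:
    """Lower-case and trim column names, map known variants, and tag rows with their year."""
    out = []
    for row in rows:
        clean = {}
        for col, value in row.items():
            key = col.strip().lower()
            clean[ALIASES.get(key, key)] = value
        clean["year"] = year
        out.append(clean)
    return out


def stack(by_year: Dict[int, List[Dict]]) -> List[Dict]:
    """Concatenate all years, then sort by year (descending) and ORI (ascending)."""
    stacked: List[Dict] = []
    for year, rows in by_year.items():
        stacked.extend(standardize(rows, year))
    return sorted(stacked, key=lambda r: (-r["year"], r["ori"]))


# Invented records with year-specific spellings.
raw = {
    1976: [{"ORI": "AK00101", "Vic_Age": 30, "Weapon": "knife"}],
    2015: [
        {"ori": "AL00102", "victim_age_1": 25, "weapon_used": "handgun"},
        {"ori": "AK00101", "victim_age_1": 41, "weapon_used": "knife"},
    ],
}
for row in stack(raw):
    print(row)
```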

Supplement 1. R code for estimating thresholds while accounting for variable detection and data for estimating thresholds for forest birds, Oregon, USA, 2007–2008.


Available download formats: html

Unique identifier

https://doi.org/10.6084/m9.figshare.3552231.v1

Dataset updated

Jun 2, 2023

Dataset provided by

Wiley: https://www.wiley.com/

Authors

Jay E. Jones; Andrew J. Kroll; Jack Giovanini; Steven D. Duke; Matthew G. Betts

License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Area covered

Oregon

Description

    Stand id
    Percent cover of conifer species
    Percent cover of broadleaf species
    Percent cover of deciduous broadleaf species
    Percent cover of hardwood species
    Percent cover of hardwood species in a 2000 m radius circle around each sample stand
    Elevation (m) of stand
    Age of stand
    Year of sampling
    Visit number
    Detection of Magnolia Warbler on Visit 1
    Detection of Magnolia Warbler on Visit 2
    Detection of Orange-crowned Warbler on Visit 1
    Detection of Orange-crowned Warbler on Visit 2
    Detection of Swainson’s Thrush on Visit 1
    Detection of Swainson’s Thrush on Visit 2
    Detection of Willow Flycatcher on Visit 1
    Detection of Willow Flycatcher on Visit 2
    Detection of Wilson’s Warbler on Visit 1
    Detection of Wilson’s Warbler on Visit 2

  Checksum values are:

    Column 2 (Percent cover of conifer species – CONIFER): SUM = 5862.83
    Column 3 (Percent cover of broadleaf species – BROAD): SUM = 7043.17
    Column 4 (Percent cover of deciduous broadleaf species – DECBROAD): SUM = 5475.17
    Column 5 (Percent cover of hardwood species – HARDWOOD): SUM = 2151.96
    Column 6 (Percent cover of hardwood species in a 2000 m radius circle around each sample stand – HWD2000): SUM = 3486.07
    Column 7 (Stand elevation – ELEVM): SUM = 83240.58
    Column 8 (Stand age – AGE): SUM = 1537; NA indicates a stand was harvested in 2008
    Column 9 (Year of sampling – YEAR): SUM = 425792
    Column 11 (MGWA.1): SUM = 70
    Column 12 (MGWA.2): SUM = 71
    Column 13 (OCWA.1): SUM = 121
    Column 14 (OCWA.2): SUM = 76
    Column 15 (SWTH.1): SUM = 90
    Column 16 (SWTH.2): SUM = 95
    Column 17 (WIFL.1): SUM = 85
    Column 18 (WIFL.2): SUM = 85
    Column 19 (WIWA.1): SUM = 36
    Column 20 (WIWA.2): SUM = 37
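
The checksums above make it possible to confirm that a downloaded copy of Supplement_Avian data.csv is intact. A minimal standard-library sketch, assuming the CSV headers match the short names in parentheses (CONIFER, BROAD, ..., WIWA.2) and that missing values are written as NA, as the AGE note suggests. The tolerance absorbs rounding in the published two-decimal sums.

```python
import csv
import math
from typing import Dict, Iterable

# Published column sums from the list above.
EXPECTED = {
    "CONIFER": 5862.83, "BROAD": 7043.17, "DECBROAD": 5475.17,
    "HARDWOOD": 2151.96, "HWD2000": 3486.07, "ELEVM": 83240.58,
    "AGE": 1537, "YEAR": 425792,
    "MGWA.1": 70, "MGWA.2": 71, "OCWA.1": 121, "OCWA.2": 76,
    "SWTH.1": 90, "SWTH.2": 95, "WIFL.1": 85, "WIFL.2": 85,
    "WIWA.1": 36, "WIWA.2": 37,
}


def column_sums(lines: Iterable[str]) -> Dict[str, float]:
    """Sum every column listed in EXPECTED, skipping NA and empty cells."""
    sums = {name: 0.0 for name in EXPECTED}
    for row in csv.DictReader(lines):
        for name in EXPECTED:
            cell = (row.get(name) or "").strip()
            if cell and cell.upper() != "NA":
                sums[name] += float(cell)
    return sums


def mismatches(lines: Iterable[str], tol: float = 0.01) -> Dict[str, float]:
    """Return columns whose sum differs from the published checksum by more than tol."""
    sums = column_sums(lines)
    return {n: s for n, s in sums.items() if not math.isclose(s, EXPECTED[n], abs_tol=tol)}


# Usage (file name from the File List; location assumed):
# with open("Supplement_Avian data.csv", newline="") as f:
#     print(mismatches(f))  # an empty dict means every checksum matches
```

A column missing from the file sums to zero and is therefore reported as a mismatch, which also catches header-name differences.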

  The Supplement_R code.r file is R source code for simulation and empirical analyses conducted in Jones et al.
