94 datasets found
  1. C4 Dataset

    • paperswithcode.com
    Updated Dec 13, 2023
    Cite
    Colin Raffel; Noam Shazeer; Adam Roberts; Katherine Lee; Sharan Narang; Michael Matena; Yanqi Zhou; Wei Li; Peter J. Liu, C4 Dataset [Dataset]. https://paperswithcode.com/dataset/c4
    Dataset updated
    Dec 13, 2023
    Authors
    Colin Raffel; Noam Shazeer; Adam Roberts; Katherine Lee; Sharan Narang; Michael Matena; Yanqi Zhou; Wei Li; Peter J. Liu
    Description

    C4 is a colossal, cleaned version of Common Crawl's web crawl corpus. It is based on the Common Crawl dataset (https://commoncrawl.org) and was used to train the T5 text-to-text Transformer models.

    The dataset can be downloaded in a pre-processed form from allennlp.

  2. c4-en-10k

    • huggingface.co
    • opendatalab.com
    Updated Jun 11, 2025
    Cite
    Stas Bekman (2025). c4-en-10k [Dataset]. https://huggingface.co/datasets/stas/c4-en-10k
    Dataset updated
    Jun 11, 2025
    Authors
    Stas Bekman
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is a small subset representing the first 10K records of the original C4 dataset, "en" subset - created for testing. The records were extracted after having been shuffled.

    The full 1TB+ dataset is at https://huggingface.co/datasets/c4.
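A minimal sketch of how such a test subset might be inspected, assuming the Hugging Face `datasets` package is installed and that the repository exposes a `train` split with a `text` column (both are assumptions, not stated in the listing). The helper `first_texts` is a hypothetical name introduced here for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def first_texts(records: Iterable[dict], n: int = 3, width: int = 80) -> Iterator[str]:
    """Yield the first `n` record texts, each truncated to `width` characters."""
    for record in islice(records, n):
        yield record["text"][:width]


def load_c4_sample():
    """Download the 10K-record subset from the Hugging Face Hub (needs network)."""
    # Imported lazily so first_texts() works even without the `datasets` package.
    from datasets import load_dataset

    # Assumption: a "train" split with a "text" column exists in this repo.
    return load_dataset("stas/c4-en-10k", split="train")


# Example usage (performs a download, so it is left as a comment):
#   for text in first_texts(load_c4_sample()):
#       print(text)
```

Keeping the download in its own function means the preview helper can be exercised on any iterable of dictionaries, which is convenient for offline testing.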

  3. Data from: mC4 Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jun 8, 2022
    Cite
    Linting Xue; Noah Constant; Adam Roberts; Mihir Kale; Rami Al-Rfou; Aditya Siddhant; Aditya Barua; Colin Raffel (2022). mC4 Dataset [Dataset]. https://paperswithcode.com/dataset/mc4
    Dataset updated
    Jun 8, 2022
    Authors
    Linting Xue; Noah Constant; Adam Roberts; Mihir Kale; Rami Al-Rfou; Aditya Siddhant; Aditya Barua; Colin Raffel
    Description

    mC4 is a multilingual variant of the C4 dataset. It comprises natural text in 101 languages drawn from the public Common Crawl web scrape.

  4. c4-tokenized-2b

    • huggingface.co
    Updated Aug 19, 2023
    Cite
    Neel Nanda (2023). c4-tokenized-2b [Dataset]. https://huggingface.co/datasets/NeelNanda/c4-tokenized-2b
    Dataset updated
    Aug 19, 2023
    Authors
    Neel Nanda
    Description

    Dataset Card for "c4-tokenized-2b"

    More Information needed

  5. (Table C4) Accumulation rate and grain size analysis of ODCP Site 151-909

    • doi.pangaea.de
    html, tsv
    Updated 1999
    Cite
    Amelie Winkler (1999). (Table C4) Accumulation rate and grain size analysis of ODCP Site 151-909 [Dataset]. http://doi.org/10.1594/PANGAEA.56007
    Available download formats: html, tsv
    Dataset updated
    1999
    Dataset provided by
    PANGAEA
    Authors
    Amelie Winkler
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Aug 15, 1993 - Sep 11, 1993
    Area covered
    Variables measured
    AGE, Sample code/label, DEPTH, sediment/rock, Accumulation rate, > 0.5 mm, Size fraction > 1 mm, gravel, Size fraction 0.250-0.125 mm, 2.0-3.0 phi, fine sand, Size fraction 0.500-0.250 mm, 1.0-2.0 phi, medium sand, Size fraction 1.000-0.500 mm, 0.0-1.0 phi, coarse sand, Size fraction 0.125-0.063 mm, 3.0-4.0 phi, very fine sand
    Description

    This dataset is about: (Table C4) Accumulation rate and grain size analysis of ODCP Site 151-909. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.712694 for more information.

  6. Data from: C4 photosynthesis boosts growth by altering physiology,...

    • datadryad.org
    zip
    Updated Mar 30, 2017
    Cite
    Rebecca R. L. Atkinson; Emily J. Mockford; Christopher Bennett; Pascal-Antoine Christin; Elizabeth L. Spriggs; Robert P. Freckleton; Ken Thompson; Mark Rees; Colin P. Osborne (2017). C4 photosynthesis boosts growth by altering physiology, allocation and size [Dataset]. http://doi.org/10.5061/dryad.16860
    Available download formats: zip
    Dataset updated
    Mar 30, 2017
    Dataset provided by
    Dryad
    Authors
    Rebecca R. L. Atkinson; Emily J. Mockford; Christopher Bennett; Pascal-Antoine Christin; Elizabeth L. Spriggs; Robert P. Freckleton; Ken Thompson; Mark Rees; Colin P. Osborne
    Time period covered
    2017
    Area covered
    Global
    Description

    Experiment data DOI 10.1038_NPLANTS.2016.38

    Experimental data used for the comparative growth analysis reported by Atkinson et al. (2016) DOI 10.1038_NPLANTS.2016.38. The phylogeny used for this analysis is included as supplementary data with the paper.

    Experiment data DOI0.1038_NPLANTS.2016.38.csv

  7. Data from: New Insights Into the Evolution of C4 Photosynthesis Offered by...

    • omicsdi.org
    Updated Nov 30, 2021
    Cite
    (2021). New Insights Into the Evolution of C4 Photosynthesis Offered by the Tarenaya Cluster of Cleomaceae [Dataset]. https://www.omicsdi.org/dataset/biostudies/S-EPMC8803641
    Dataset updated
    Nov 30, 2021
    Variables measured
    Unknown
    Description

    Cleomaceae is closely related to Brassicaceae and includes C3, C3–C4, and C4 species. Thus, this family represents an interesting system for studying the evolution of the carbon concentrating mechanism. However, inadequate genetic information on Cleomaceae limits their research applications. Here, we characterized 22 Cleomaceae accessions [3 genera (Cleoserrata, Gynandropsis, and Tarenaya) and 11 species] in terms of genome size; molecular phylogeny; as well as anatomical, biochemical, and photosynthetic traits. We clustered the species into seven groups based on genome size. Interestingly, despite clear differences in genome size (2C, ranging from 0.55 to 1.3 pg) in Tarenaya spp., this variation was not consistent with phylogenetic grouping based on the internal transcribed spacer (ITS) marker, suggesting the occurrence of multiple polyploidy events within this genus. Moreover, only G. gynandra, which possesses a large nuclear genome, exhibited the C4 metabolism. Among the C3-like species, we observed intra- and interspecific variation in nuclear genome size as well as in biochemical, physiological, and anatomical traits. Furthermore, the C3-like species had increased venation density and bundle sheath cell size, compared to C4 species, which likely predisposed the former lineages to C4 photosynthesis. Accordingly, our findings demonstrate the potential of Cleomaceae, mainly members of Tarenaya, in offering novel insights into the evolution of C4 photosynthesis.

  8. Vietnamese Curated Dataset

    • kaggle.com
    Updated Jan 26, 2025
    Cite
    Nguyen Duc Y (2025). Vietnamese Curated Dataset [Dataset]. https://www.kaggle.com/datasets/ndy001/vietnamese-curated-dataset-2
    Dataset updated
    Jan 26, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nguyen Duc Y
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description

    Vietnamese Curated Text Dataset. This dataset is collected from multiple open Vietnamese datasets, and curated with NeMo Curator

    • Developed by: Viettel Solutions
    • Language: Vietnamese

    Details

    Please visit our Tech Blog post on NVIDIA's blog page for details.

    Data Collection

    We utilize a combination of datasets that contain samples in the Vietnamese language, ensuring a robust and representative text corpus. These datasets include:

    • The Vietnamese subset of the C4 dataset.
    • The Vietnamese subset of the OSCAR dataset, version 23.01.
    • Wikipedia's Vietnamese articles.
    • Binhvq's Vietnamese news corpus.

    Preprocessing

    We use NeMo Curator to curate the collected data. The data curation pipeline includes these key steps:

    1. Unicode Reformatting: Texts are standardized into a consistent Unicode format to avoid encoding issues.
    2. Exact Deduplication: Removes exact duplicates to reduce redundancy.
    3. Quality Filtering:
    4. Heuristic Filtering: Applies rule-based filters to remove low-quality content.
    5. Classifier-Based Filtering: Uses machine learning to classify and filter documents based on quality.
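The first steps of a pipeline like this can be sketched in plain Python. This is an illustration only and does not use or reproduce the NeMo Curator API: the function names, the choice of NFC normalization, and the thresholds are all assumptions, and the classifier-based step is omitted.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Step 1: put text into one consistent Unicode form (NFC is assumed here)."""
    return unicodedata.normalize("NFC", text).strip()


def exact_dedup(docs):
    """Step 2: keep only the first copy of each byte-identical document."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc


def passes_heuristics(text: str, min_words: int = 5, max_symbol_ratio: float = 0.3) -> bool:
    """Step 4: hypothetical rule-based filter; both thresholds are illustrative."""
    words = text.split()
    if len(words) < min_words:
        return False
    symbols = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    return symbols / len(text) <= max_symbol_ratio


def curate(docs):
    """Normalize, deduplicate, then apply the heuristic filter, in that order."""
    normalized = (normalize(d) for d in docs)
    return [d for d in exact_dedup(normalized) if passes_heuristics(d)]
```

Normalizing before deduplicating matters for Vietnamese text in particular, because the same accented string can be encoded with precomposed or combining characters; hashing the raw bytes would treat those as different documents.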

    Notebook

    Dataset Statistics

    Content diversity [Figure: Domain proportion in curated dataset] https://cdn-uploads.huggingface.co/production/uploads/661766c00c68b375f3f0ccc3/mW6Pct3uyP_XDdGmE8EP3.png

    Character based metrics [Figure: Box plots of percentage of symbols, numbers, and whitespace characters compared to the total characters, word counts and average word lengths] https://cdn-uploads.huggingface.co/production/uploads/661766c00c68b375f3f0ccc3/W9TQjM2vcC7uXozyERHSQ.png

    Token count distribution [Figure: Distribution of document sizes (in terms of token count)] https://cdn-uploads.huggingface.co/production/uploads/661766c00c68b375f3f0ccc3/PDelYpBI0DefSmQgFONgE.png

    Embedding visualization [Figure: UMAP visualization of 5% of the dataset] https://cdn-uploads.huggingface.co/production/uploads/661766c00c68b375f3f0ccc3/sfeoZWuQ7DcSpbmUOJ12r.png

  9. FineWeb Dataset

    • paperswithcode.com
    Updated May 27, 2025
    Cite
    (2025). FineWeb Dataset [Dataset]. https://paperswithcode.com/dataset/fineweb
    Dataset updated
    May 27, 2025
    Description

    The FineWeb dataset consists of more than 15T tokens of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and runs on the datatrove library, our large-scale data processing library.

    FineWeb was originally meant to be a fully open replication of RefinedWeb, with a release of the full dataset under the ODC-By 1.0 license. However, by carefully adding additional filtering steps, we managed to push the performance of FineWeb well above that of the original RefinedWeb, and models trained on our dataset also outperform models trained on other commonly used high-quality web datasets (like C4, Dolma-v1.6, The Pile, SlimPajama, RedPajama2) on our aggregate group of benchmark tasks.

  10. Data from: Responses of C4 grasses to aridity reflect species-specific...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Aug 1, 2024
    Cite
    Nicole Havrilchak; Jason West (2024). Responses of C4 grasses to aridity reflect species-specific strategies in a semiarid savanna [Dataset]. http://doi.org/10.5061/dryad.7m0cfxq39
    Available download formats: zip
    Dataset updated
    Aug 1, 2024
    Dataset provided by
    Texas A&M University
    Authors
    Nicole Havrilchak; Jason West
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The C4 Poaceae are a diverse group both in terms of evolutionary lineage and biochemistry. There is a distinct pattern in the distribution of C4 grass groups with aridity; however, the mechanistic basis for this distribution is not well understood. Additionally, few studies have investigated the functional strategies of co-occurring C4 grass species for dealing with aridity in their natural environments. We explored the coordination of leaf-level gas exchange, water use, and morphology among five co-occurring semiarid C4 grasses belonging to divergent clades, biochemical subtypes, and size classes at three sites along a natural aridity gradient. More specifically, we measured pre-dawn and midday water potential, stomatal conductance, water use efficiency, and photosynthesis. Leaf tissue was also collected for analysis of stable isotopes of carbon and oxygen as well as for measurement of specific leaf area (SLA) and leaf width. Species differences in the responsiveness of stomata to changes in vapor pressure deficit were also assessed. It was expected that NAD-me species would maintain higher rates of photosynthesis, higher water use efficiency, and have more responsive stomata than other co-occurring species based on observed biogeographic patterns and past greenhouse studies. We found that Aristidoideae and Chloridoideae NAD-me-type grasses had greater stomatal sensitivity to VPD, consistent with a more isohydric strategy. However, midgrasses had both greater apparent water access and water use efficiency, regardless of subtype or lineage. PCK-type had less responsive stomata and maintained lower levels of photosynthesis with increasing aridity. There were strong interspecific differences in 13C, leaf width, and SLA; however, these were not significantly correlated with water use efficiency. C4 grasses in our study did not fit discretely into functional groups as defined by lineage, biochemistry, or size class. Interspecific differences, evolutionary legacy, and biochemical pathways are likely to interact to determine water use and photosynthetic strategies of these plants. Control of water loss via highly responsive stomata may form the basis for the dominance of certain C4 grass groups in arid environments. These findings build on our understanding of contrasting strategies of C4 grasses for dealing with aridity in their natural environments.

    Methods

    Species and Site Descriptions

    To explore relationships between physiology, morphology, and grass subtype across a natural aridity gradient (across sites) and across species (within sites), we measured a suite of gas exchange parameters (photosynthesis, stomatal conductance, water use efficiency), water stress (leaf water potential), morphology (specific leaf area, leaf width), and bulk leaf stable isotopes of carbon and oxygen in a variety of C4 grasses commonly found in North American semi-arid savannas. Five species representing different combinations of biochemical subtype, phylogenetic lineage, and physiognomy (Aristida wrightii Nash [NADP-ME, Aristidoideae, midgrass], Bouteloua curtipendula (Michx.) Torr. [NAD-ME/PCK, Chloridoideae, midgrass], Erioneuron pilosum (Buckley) Nash [NAD-ME, Chloridoideae, shortgrass], Eriochloa sericea (Scheele) Munro ex Vasey [PCK, Panicoideae, midgrass], and Hilaria belangeri (Steud.) Nash [PCK, Chloridoideae, shortgrass]) were studied across three sites representing a natural precipitation gradient on the Edwards Plateau, Texas (Appendix S1: Table S2). Measurements were made from May to September across three summer growing seasons (2019-2021) at three Texas A&M AgriLife Research Ranches (Figure 1c-f), Martin (30° 48' N, 99° 50' W; MAP = 630 mm), Sonora (30° 16' N, 100° 34' W; MAP = 570 mm) and Read (30° 32' N, 101° 03' W; MAP = 480 mm).

    The sites are characterized as Low Stony Hill and Limestone Hill ecological sites which are dominated by midgrasses and shortgrasses, annual forbs, and woody-encroaching species (Prosopis glandulosa, Quercus virginiana, Juniperus ashei, and Juniperus pinchotii; Soil Survey Team, NRCS, 2022). At Martin, the woody vegetation is primarily dominated by P. glandulosa, while at Sonora and Read Juniperus spp. dominate. Many of the tallgrasses (Sorghastrum nutans, Schizachyrium scoparium) that were historically common at these sites have been extirpated due to a legacy of overgrazing until the 1960s. For the past few decades, the three sites have been either ungrazed or only lightly grazed with low stocking rates of goats. Deer are also present at all sites. Soils at Martin are primarily Tarrant-type soils with 1-8% slopes and very cobbly clay in the top horizon (Soil Survey Team, NRCS, 2022). At Sonora, soils in our plots are dominated by the Eckrant-Rock outcrop complex with 1-20% slopes and cobbly silty clay in the top horizon as well as the Tarrant-Valera complex with 0-3% slopes and very cobbly clay to 15 inches. Soils at Read are primarily Tarrant-rock outcrop complexes (1-15% slopes or dry, 8-30% slopes). Weather towers at each site were used to monitor temperature, humidity, and rainfall throughout the course of each growing season and measurement campaign using an EE181-L and TB4MM-L wired to a CR1000 datalogger (Campbell Scientific, Logan, UT). Plots for physiological measurements were located within approximately 500 m of each tower. Meteorological data were gap-filled using TexMesoNet sites within 25 km of each field site when necessary (Texas Water Development Board, 2023).

    Measurements at each of the three sites were made once in 2019 and 2020 and twice in 2021 within the same weeklong period to avoid temporal differences between sites, and were only made on full-sun days approximately 2 weeks after rainfall events to avoid intervals when grasses were senescing or curling. Repeated growing season measurements were not made in 2019 and 2020 due to repeated drought events at some sites. We did not attempt to characterize growing season “dry-down” as these are pulse-driven systems with variability in timing and intensity of summer rainfall events but instead sought to characterize the behavior of species across sites when most physiologically active. All measurements were made on the top-most fully expanded leaves of perennial grass tillers currently in flower in order to ensure consistency of phenological stage across species and sites for each measurement campaign.

    Water Relations

    A Scholander-type pressure chamber (Model 600, PMS Instrument Co., Albany, OR) fitted with a grass compression gland and base was used to make predawn (Ψpre) and midday (Ψmid) measurements of leaf water potential. Leaf blades were cut slightly above the ligule using a razorblade, wrapped in plastic, and pressurized in the chamber until sap was visibly exuded from veins, and equilibrium pressure was recorded. Four leaf blades were used for each of the measurement intervals at each site. Predawn measurements were made between 4:30 and 6:30 AM. Midday measurements were made between 1:30 and 3:30 PM to assess the water status of grasses during the most stressful point in the day.

    Gas Exchange

    At each of the three sites, two approximately 20 x 30 m plots were established with two grass patches per species in each plot used for gas exchange measurements (net photosynthesis, Anet; stomatal conductance to water vapor, gs; transpiration, E; instantaneous water use efficiency, WUEi (Anet/E) and intrinsic water use efficiency, WUEg (Anet/gsw)).

    An LI-6800 Portable Photosynthesis System fitted with a 2x3 cm side-to-side clear-top chamber and 3x3 cm light source was used for gas exchange measurements. Chamber settings were adjusted to mimic ambient conditions: light source was set to 1,500 µmol m⁻² s⁻¹, source carbon dioxide to 420 ppm, fan speed to 10,000 rpm, flow rate of 700 µmol s⁻¹, and temperature and relative humidity adjusted to track ambient conditions measured using a Kestrel 3000 Weather Meter throughout the day. Measurements were made beginning at approximately 9:30 AM each day after the dew had dissipated and warm-up and system tests had been performed. The infrared gas analyzers were matched in between each measurement and instantaneous measurements were logged three times per leaf blade once steady-state conditions were reached in the chamber (stable ΔCO2 and ΔH2O, positive intercellular CO2 [Ci]). The middle section of one or two leaf blade(s) was placed in the chamber for each species per patch per plot (8 individuals per species per day). Leaf tissue within the chamber was collected and placed in plastic bags with wetted paper towels on ice until corrections could be made for the leaf area within the chamber at the end of each field day using an Epson Perfection V39 flatbed scanner. Species measurements were randomly rotated so that the same species was not measured in the same order or during the same time of day. Gas exchange measurements were discarded if accidentally logged before reaching a steady state or if very low negative values of Ci were observed.

    Stable Isotopes (δ13C, δ18O)

    Approximately 15-20 leaf blades were collected at each site per plot per species for stable isotope analysis. Leaf tissue was dried at 65 °C for three days and ground using a Retsch Oscillating Mixer Ball Mill MM400 to homogenize whole tissue samples (approx. 10-20 leaf blades).

    Bulk leaf δ13C was determined using an Elemental Analyzer (Costech Analytical Technologies, Inc., Valencia, CA, USA) coupled to an Isotope Ratio Mass Spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). Oxygen stable isotopes (δ18O) were also determined using an Elemental Analyzer coupled to an Isotope Ratio Mass Spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). Isotopic ratios of bulk leaf tissue were expressed as δ13C or δ18O and calculated as follows: δ = [(Rsample/Rstandard) – 1] × 1000 where Rsample and Rstandard are the ratios of
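As a worked illustration of the delta notation, here is a minimal sketch that assumes the conventional per-mil definition, in which the relative deviation of the sample ratio from the standard ratio is multiplied by 1000. The function name and the ratio values in the usage note are hypothetical and are not taken from the dataset.

```python
def delta_permil(r_sample: float, r_standard: float) -> float:
    """Return the delta value in per mil for a sample/standard isotope ratio pair.

    Assumes the conventional definition: delta = (R_sample / R_standard - 1) * 1000.
    """
    if r_standard == 0:
        raise ValueError("r_standard must be non-zero")
    return (r_sample / r_standard - 1.0) * 1000.0
```

For example, `delta_permil(0.99, 1.0)` is approximately -10, meaning the sample ratio is 1% (10 per mil) lower than the standard; a sample identical to the standard gives 0.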

  11. Data from: Anatomical enablers and the evolution of C4 photosynthesis in...

    • zenodo.org
    • datadryad.org
    bin, txt
    Updated Jun 1, 2022
    Cite
    Pascal-Antoine Christin; Colin P. Osborne; David S. Chatelet; J. Travis Columbus; Guillaume Besnard; Trevor R. Hodkinson; Laura M. Garrison; Maria S. Vorontsova; Erika J. Edwards (2022). Data from: Anatomical enablers and the evolution of C4 photosynthesis in grasses [Dataset]. http://doi.org/10.5061/dryad.6j9r7
    Available download formats: bin, txt
    Dataset updated
    Jun 1, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pascal-Antoine Christin; Colin P. Osborne; David S. Chatelet; J. Travis Columbus; Guillaume Besnard; Trevor R. Hodkinson; Laura M. Garrison; Maria S. Vorontsova; Erika J. Edwards
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    C4 photosynthesis is a series of anatomical and biochemical modifications to the typical C3 pathway that increases the productivity of plants in warm, sunny, and dry conditions. Despite its complexity, it evolved more than 62 times independently in flowering plants. However, C4 origins are absent from most plant lineages and clustered in others, suggesting that some characteristics increase C4 evolvability in certain phylogenetic groups. The C4 trait has evolved 22–24 times in grasses, and all origins occurred within the PACMAD clade, whereas the similarly sized BEP clade contains only C3 taxa. Here, multiple foliar anatomy traits of 157 species from both BEP and PACMAD clades are quantified and analyzed in a phylogenetic framework. Statistical modeling indicates that C4 evolvability strongly increases when the proportion of vascular bundle sheath (BS) tissue is higher than 15%, which results from a combination of short distance between BS and large BS cells. A reduction in the distance between BS occurred before the split of the BEP and PACMAD clades, but a decrease in BS cell size later occurred in BEP taxa. Therefore, when environmental changes promoted C4 evolution, suitable anatomy was present only in members of the PACMAD clade, explaining the clustering of C4 origins in this lineage. These results show that key alterations of foliar anatomy occurring in a C3 context and preceding the emergence of the C4 syndrome by millions of years facilitated the repeated evolution of one of the most successful physiological innovations in angiosperm history.

  12. Materials Data on SiC by Materials Project

    • osti.gov
    Updated Sep 3, 2020
    Cite
    Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). LBNL Materials Project (2020). Materials Data on SiC by Materials Project [Dataset]. http://doi.org/10.17188/1759603
    Dataset updated
    Sep 3, 2020
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). LBNL Materials Project
    Description

    SiC is Moissanite-6H-like structured and crystallizes in the trigonal R3m space group. The structure is three-dimensional. There are thirteen inequivalent Si4+ sites. In the first Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.90 Å) Si–C bond lengths. In the second Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. All Si–C bond lengths are 1.90 Å. In the third Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.91 Å) Si–C bond lengths. In the fourth Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.90 Å) Si–C bond lengths. In the fifth Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. All Si–C bond lengths are 1.89 Å. In the sixth Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.90 Å) Si–C bond lengths. In the seventh Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.90 Å) Si–C bond lengths. In the eighth Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.90 Å) Si–C bond lengths. In the ninth Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.90 Å) Si–C bond lengths. In the tenth Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.90 Å) Si–C bond lengths. In the eleventh Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.90 Å) Si–C bond lengths. In the twelfth Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.90 Å) Si–C bond lengths. In the thirteenth Si4+ site, Si4+ is bonded to four C4- atoms to form corner-sharing SiC4 tetrahedra. There are three shorter (1.89 Å) and one longer (1.91 Å) Si–C bond lengths.

    There are thirteen inequivalent C4- sites. In the first C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. In the second C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. In the third C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. In the fourth C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. In the fifth C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. In the sixth C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. All C–Si bond lengths are 1.89 Å. In the seventh C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. In the eighth C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. In the ninth C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. In the tenth C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. All C–Si bond lengths are 1.89 Å. In the eleventh C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. In the twelfth C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra. In the thirteenth C4- site, C4- is bonded to four Si4+ atoms to form corner-sharing CSi4 tetrahedra.

  13. Materials Data on B2H4C by Materials Project

    • osti.gov
    Updated May 2, 2020
    Cite
    The Materials Project (2020). Materials Data on B2H4C by Materials Project [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/1204140-materials-data-b2h4c-materials-project
    Dataset updated
    May 2, 2020
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    Office of Science (http://www.er.doe.gov/)
    Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). LBNL Materials Project
    Authors
    The Materials Project
    Description

    B2CH4 crystallizes in the monoclinic P2_1/c space group. The structure is zero-dimensional and consists of four B2CH4 clusters. There are four inequivalent B sites. In the first B site, B is bonded in a 1-coordinate geometry to three B, two C4-, and one H1+ atom. There are one shorter (1.73 Å) and two longer (1.78 Å) B–B bond lengths. Both B–C bond lengths are 1.75 Å. The B–H bond length is 1.19 Å. In the second B site, B is bonded in a distorted trigonal non-coplanar geometry to one B and three H1+ atoms. There is a spread of B–H bond distances ranging from 1.19–1.35 Å. In the third B site, B is bonded in a distorted trigonal non-coplanar geometry to one B, one C4-, and two H1+ atoms. The B–C bond length is 1.53 Å. There are one shorter (1.19 Å) and one longer (1.32 Å) B–H bond lengths. In the fourth B site, B is bonded in a distorted trigonal non-coplanar geometry to one B, one C4-, and two H1+ atoms. The B–C bond length is 1.53 Å. There are one shorter (1.19 Å) and one longer (1.32 Å) B–H bond lengths. There are two inequivalent C4- sites. In the first C4- site, C4- is bonded in a 2-coordinate geometry to two B, one C4-, and one H1+ atom. The C–C bond length is 1.43 Å. The C–H bond length is 1.09 Å. In the second C4- site, C4- is bonded in a 2-coordinate geometry to two B, one C4-, and one H1+ atom. The C–H bond length is 1.09 Å. There are eight inequivalent H1+ sites. In the first H1+ site, H1+ is bonded in a single-bond geometry to one B atom. In the second H1+ site, H1+ is bonded in a single-bond geometry to one B atom. In the third H1+ site, H1+ is bonded in a single-bond geometry to one C4- atom. In the fourth H1+ site, H1+ is bonded in a single-bond geometry to one C4- atom. In the fifth H1+ site, H1+ is bonded in an L-shaped geometry to two B atoms. In the sixth H1+ site, H1+ is bonded in an L-shaped geometry to two B atoms. In the seventh H1+ site, H1+ is bonded in a single-bond geometry to one B atom. In the eighth H1+ site, H1+ is bonded in a single-bond geometry to one B atom.

  14. Data from: Large Scale Prediction with Decision Trees

    • tandf.figshare.com
    docx
    Updated Dec 18, 2023
    Cite
    Jason M. Klusowski; Peter M. Tian (2023). Large Scale Prediction with Decision Trees [Dataset]. http://doi.org/10.6084/m9.figshare.24552254.v2
    Explore at:
    Available download formats: docx
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Jason M. Klusowski; Peter M. Tian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks, even when the number of predictor variables grows sub-exponentially with the sample size, under natural 0-norm and 1-norm sparsity constraints. The theory applies to a wide range of models, including (ordinary or logistic) additive regression models with component functions that are continuous, of bounded variation, or, more generally, Borel measurable. Consistency holds for arbitrary joint distributions of the predictor variables, thereby accommodating continuous, discrete, and/or dependent data. Finally, we show that these qualitative properties of individual trees are inherited by Breiman’s random forests. A key step in the analysis is the establishment of an oracle inequality, which allows for a precise characterization of the goodness of fit and complexity tradeoff for a mis-specified model. Supplementary materials for this article are available online.

  15. Materials Data on CuBP4H24C8(SF)4 by Materials Project

    • osti.gov
    Updated Jan 12, 2019
    Cite
    USDOE Office of Science (SC), Basic Energy Sciences (BES) (2019). Materials Data on CuBP4H24C8(SF)4 by Materials Project [Dataset]. http://doi.org/10.17188/1696320
    Explore at:
    Dataset updated
    Jan 12, 2019
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). LBNL Materials Project
    Description

    CuP4H24(C2S)4BF4 crystallizes in the monoclinic P2_1/c space group. The structure is zero-dimensional and consists of four BF4 clusters and four CuP4H24(C2S)4 clusters. In each BF4 cluster, B3+ is bonded in a tetrahedral geometry to four F1- atoms. All B–F bond lengths are 1.42 Å. There are four inequivalent F1- sites. In the first F1- site, F1- is bonded in a single-bond geometry to one B3+ atom. In the second F1- site, F1- is bonded in a single-bond geometry to one B3+ atom. In the third F1- site, F1- is bonded in a single-bond geometry to one B3+ atom. In the fourth F1- site, F1- is bonded in a single-bond geometry to one B3+ atom. In each CuP4H24(C2S)4 cluster, Cu1+ is bonded in a tetrahedral geometry to four S2- atoms. There is a spread of Cu–S bond distances ranging from 2.32 to 2.35 Å. There are eight inequivalent C4- sites. In the first C4- site, C4- is bonded to one P5+ and three H+0.83+ atoms to form distorted corner-sharing CPH3 tetrahedra. The C–P bond length is 1.81 Å. All C–H bond lengths are 1.10 Å. In the second C4- site, C4- is bonded to one P5+ and three H+0.83+ atoms to form distorted corner-sharing CPH3 tetrahedra. The C–P bond length is 1.81 Å. There is one shorter (1.09 Å) and two longer (1.10 Å) C–H bond lengths. In the third C4- site, C4- is bonded to one P5+ and three H+0.83+ atoms to form distorted corner-sharing CPH3 tetrahedra. The C–P bond length is 1.81 Å. There is one shorter (1.09 Å) and two longer (1.10 Å) C–H bond lengths. In the fourth C4- site, C4- is bonded to one P5+ and three H+0.83+ atoms to form distorted corner-sharing CPH3 tetrahedra. The C–P bond length is 1.81 Å. All C–H bond lengths are 1.10 Å. In the fifth C4- site, C4- is bonded to one P5+ and three H+0.83+ atoms to form distorted corner-sharing CPH3 tetrahedra. The C–P bond length is 1.81 Å. There is one shorter (1.09 Å) and two longer (1.10 Å) C–H bond lengths.
In the sixth C4- site, C4- is bonded to one P5+ and three H+0.83+ atoms to form distorted corner-sharing CPH3 tetrahedra. The C–P bond length is 1.81 Å. All C–H bond lengths are 1.10 Å. In the seventh C4- site, C4- is bonded to one P5+ and three H+0.83+ atoms to form distorted corner-sharing CPH3 tetrahedra. The C–P bond length is 1.81 Å. All C–H bond lengths are 1.10 Å. In the eighth C4- site, C4- is bonded to one P5+ and three H+0.83+ atoms to form distorted corner-sharing CPH3 tetrahedra. The C–P bond length is 1.81 Å. All C–H bond lengths are 1.10 Å. There are four inequivalent P5+ sites. In the first P5+ site, P5+ is bonded in a trigonal non-coplanar geometry to two C4- and one S2- atom. The P–S bond length is 1.98 Å. In the second P5+ site, P5+ is bonded in a trigonal non-coplanar geometry to two C4- and one S2- atom. The P–S bond length is 1.99 Å. In the third P5+ site, P5+ is bonded in a trigonal non-coplanar geometry to two C4- and one S2- atom. The P–S bond length is 1.99 Å. In the fourth P5+ site, P5+ is bonded in a trigonal non-coplanar geometry to two C4- and one S2- atom. The P–S bond length is 1.98 Å. There are twenty-four inequivalent H+0.83+ sites. In the first H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the second H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the third H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the fourth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the fifth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the sixth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the seventh H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the eighth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the ninth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. 
In the tenth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the eleventh H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the twelfth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the thirteenth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the fourteenth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the fifteenth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the sixteenth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the seventeenth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the eighteenth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the nineteenth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the twentieth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the twenty-first H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the twenty-second H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the twenty-third H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. In the twenty-fourth H+0.83+ site, H+0.83+ is bonded in a single-bond geometry to one C4- atom. There are four inequivalent S2- sites. In the first S2- site, S2- is bonded in a water-like geometry to one Cu1+ and one P5+ atom. In the second S2- site, S2- is bonded in a water-like geometry to one Cu1+ and one P5+ atom. In the third S2- site, S2- is bonded in a water-like geometry to one Cu1+ and one P5+ atom. In the fourth S2- site, S2- is bonded in an L-shaped geometry to one Cu1+ and one P5+ atom.

  16. Data for: Induction of C4 genes during de-etiolation of Gynandropsis...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Feb 8, 2023
    Cite
    Pallavi Singh; Sean Stevenson; Julian Hibberd (2023). Data for: Induction of C4 genes during de-etiolation of Gynandropsis gynandra evolved through changes in cis allowing integration into ancestral C3 gene regulatory networks [Dataset]. http://doi.org/10.5061/dryad.sf7m0cgb2
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 8, 2023
    Dataset provided by
    University of Cambridge
    Authors
    Pallavi Singh; Sean Stevenson; Julian Hibberd
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    C4 photosynthesis has evolved repeatedly and in doing so repurposed existing enzymes to drive a carbon pump that limits the oxygenation reaction of RuBisCO. C4 proteins accumulate to levels matching those of the photosynthetic apparatus, and to allow this gene expression must be modified over evolutionary time. To better understand this rewiring of gene expression we undertook RNA-SEQ and DNaseI-SEQ on de-etiolating seedlings of C4 Gynandropsis gynandra which is evolutionarily proximate to C3 A. thaliana. Changes in chloroplast ultrastructure and C4 gene expression in G. gynandra were coordinated and rapid. C3 and C4 photosynthesis genes showed similar induction patterns, but C4 genes from G. gynandra were more strongly induced than orthologs from A. thaliana. The cistrome of G. gynandra was enriched in TGA, TCP and homeodomain binding sites. Furthermore, in vivo binding data in G. gynandra highlighted TGA and homeodomain as well as light responsive elements such as G- and I-box motifs as being associated with the rapid increase in transcripts derived from C4 genes. Although promoters of PPDK and ASP1 from G. gynandra contained distinct light responsive elements, promoters from both A. thaliana and G. gynandra allowed high expression. Deletion analysis of the Ppa6 gene from G. gynandra showed that regions containing G- and I-boxes were necessary for high expression. The data support a model in which accumulation of transcripts derived from C4 genes in leaves of G. gynandra is enhanced compared with homologs in A. thaliana because a variety of modifications in cis allowed integration into ancestral transcriptional networks.

    Methods

    Gynandropsis gynandra seeds were sown directly from intact pods and germinated on moist filter papers in the dark at 32°C for 24 hours. Germinated seeds were then transferred to half strength Murashige and Skoog (MS) medium with 0.8% (w/v) agar (pH 5.8) and grown for three days in a growth chamber at 26°C.
    De-etiolation was induced by exposure to white light with a photon flux density (PFD) of 350 μmol m⁻² s⁻¹ and photoperiod of 16 hours. Whole seedlings were harvested at 0.5, 2, 4 and 24 hours after illumination (starting at 8:00 with light cycle 6:00 to 22:00). Tissue was flash frozen in liquid nitrogen and stored at -80°C prior to processing.

    RNA and DNaseI sequencing

    Before processing, frozen samples were divided into two, the first being used for RNA-SEQ analysis and the second for DNaseI-SEQ. Samples were ground in a mortar and pestle and RNA extraction carried out with the RNeasy Plant Mini Kit (74904; QIAGEN) according to the manufacturer’s instructions. RNA quality and integrity were assessed on a Bioanalyzer High Sensitivity DNA Chip (Agilent Technologies). Library preparation was performed with 500 ng of high integrity total RNA (RNA integrity number > 8) using the QuantSeq 3’ mRNA-SEQ Library Preparation Kit FWD for Illumina (Lexogen) following the manufacturer’s instructions. Library quantity and quality were checked using Qubit (Life Technologies) and a Bioanalyzer High Sensitivity DNA Chip (Agilent Technologies). Libraries were sequenced on NextSeq 500 (Illumina, Chesterford, UK) using single-end sequencing and a Mid Output 150 cycle run. To extract nuclei, tissue was ground in liquid nitrogen and incubated for five minutes in 15 mM PIPES pH 6.5, 0.3 M sucrose, 1% (v/v) Triton X-100, 20 mM NaCl, 80 mM KCl, 0.1 mM EDTA, 0.25 mM spermidine, 0.25 g Polyvinylpyrrolidone (SIGMA), EDTA-free proteinase inhibitors (ROCHE), filtered through two layers of Miracloth (Millipore) and pelleted by centrifugation at 4°C for 15 min at 3600 g. To isolate deproteinated DNA, 100 mg of tissue from seedlings exposed to 24 hours light were harvested two hours into the light cycle, four days after germination. DNA was extracted using a QIAGEN DNeasy Plant Mini Kit (QIAGEN, UK) according to the manufacturer’s instructions.
    2×10^8 nuclei were re-suspended at 4°C in digestion buffer (15 mM Tris-HCl, 90 mM NaCl, 60 mM KCl, 6 mM CaCl2, 0.5 mM spermidine, 1 mM EDTA and 0.5 mM EGTA, pH 8.0). DNase-I (Fermentas) at 2.5 U was added to each tube and incubated at 37 °C for three minutes. Digestion was arrested by adding a 1:1 volume of stop buffer (50 mM Tris-HCl, 100 mM NaCl, 0.1% (w/v) SDS, 100 mM EDTA, pH 8.0, 1 mM Spermidine, 0.3 mM Spermine, RNaseA 40 µg/ml) and incubated at 55°C for 15 minutes. 50 U of Proteinase K were then added and samples incubated at 55°C for 1 h. DNA was isolated by mixing with 1 ml 25:24:1 Phenol:Chloroform:Isoamyl Alcohol (Ambion) and spun for 5 minutes at 15,700 g followed by ethanol precipitation of the aqueous phase. Samples were size-selected (50-400 bp) using agarose gel electrophoresis and quantified fluorometrically using a Qubit 3.0 Fluorometer (Life technologies), and a total of 10 ng of digested DNA (200 pg l-1) used for library construction. Sequencing ready libraries were prepared using a TruSeq Nano DNA library kit according to the manufacturer’s instructions. Quality of libraries was determined using a Bioanalyzer High Sensitivity DNA Chip (Agilent Technologies) and quantified by Qubit (Life Technologies) and qPCR using an NGS Library Quantification Kit (KAPA Biosystems) prior to normalisation, and then pooled, diluted and denatured for paired-end sequencing using High Output 150 cycle run (2x75 bp reads). Sequencing was performed using NextSeq 500 (Illumina, Chesterford UK) with 2x75 cycles of sequencing.

    RNA-SEQ data processing and quantification

    Commands used are available on GitHub (“command_line_steps”) but an outline of steps was as follows. Raw single ended reads were trimmed using trimmomatic (version 0.36). Trimmed reads were then quantified using salmon (version 0.4.234) after building an index file for a modified G. gynandra transcriptome.
    The transcriptome was modified to create a pseudo-3’ UTR sequence of 339 bp (the mean length of identified 3’ UTRs) for G. gynandra gene models that lacked a 3’ UTR sequence; this was essentially an extension beyond the stop codon of the open reading frame. Inclusion of this pseudo-3’ UTR improved mapping rates. Each sample was then quantified using the salmon “quant” tool. All *.sf files had the “NumReads” columns merged into a single file (All_read_counts.txt) to allow analysis with both DEseq2 and edgeR. The edgeR pipeline was run as the edgeR.R R script (here and on GitHub) on the All_read_counts.txt file to identify the significantly differentially expressed genes by comparing each time-point to the previous one. A low expression filter step was also used. We then similarly analysed the data with the DEseq2 package using the DEseq2.R R script (on GitHub) on the same All_read_counts.txt file. This also included the Principal Component Analysis shown in Fig. 2A. The intersection from both methods was used to identify a robust set of differentially regulated genes. For most further analysis of the RNA-SEQ data, mean TPM values for each time-point (from three biological replicates) were first quantile normalised and then each value was divided by the sample mean, such that a value of 1 represented the average for that sample. This processing facilitated comparisons between experiments across species in identifying changes to transcript abundance between orthologs.
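
    The final normalisation step described above (quantile normalisation of mean TPM values, then division by the sample mean) can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' script: the function names, the tie-handling (ties broken by sort order) and the small example matrix are all assumptions made here for demonstration.

```python
import numpy as np

def quantile_normalise(values):
    """Quantile-normalise the columns of a genes x samples matrix.

    Assumption for this sketch: ties within a column are broken by sort order.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, axis=0)        # row indices that sort each column
    sorted_vals = np.sort(values, axis=0)     # each column sorted ascending
    rank_means = sorted_vals.mean(axis=1)     # mean of the k-th smallest values across samples
    out = np.empty_like(values)
    for j in range(values.shape[1]):
        # The k-th smallest entry of column j receives the k-th rank mean.
        out[order[:, j], j] = rank_means
    return out

def scale_by_sample_mean(values):
    """Divide every value by its column (sample) mean, so 1.0 is the sample average."""
    values = np.asarray(values, dtype=float)
    return values / values.mean(axis=0, keepdims=True)

# Hypothetical mean TPM values: 4 genes (rows) x 3 time-points (columns).
mean_tpm = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.5, 6.0],
    [4.0, 2.0, 8.0],
])

quantile_normalised = quantile_normalise(mean_tpm)
normalised = scale_by_sample_mean(quantile_normalised)
```

    After these two steps every column shares the same distribution and has a mean of 1, which is what makes values comparable across samples and, as the description notes, across species.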

  17. Data from: Complete genome analysis of the C4 subgenotype strains of...

    • omicsdi.org
    Updated Jul 11, 2023
    + more versions
    Cite
    (2023). Complete genome analysis of the C4 subgenotype strains of enterovirus 71: predominant recombination C4 viruses persistently circulating in China for 14 years. [Dataset]. https://www.omicsdi.org/dataset/biostudies/S-EPMC3575343
    Explore at:
    Dataset updated
    Jul 11, 2023
    Variables measured
    Unknown
    Description

    Genetic recombination is a well-known phenomenon for enteroviruses. To investigate the genetic characterization and the potential recombination of enterovirus 71 (EV71) circulating in China, we determined the complete genome sequences of 16 EV71 strains isolated from hand, foot and mouth disease (HFMD) patients during the large-scale outbreak and non-outbreak years since 1998 in China. The full-length genome sequences of the 16 Chinese EV71 strains in the present study were aligned with 186 genome sequences of EV71 available from GenBank, including 104 mainland China and 82 international sequences, covering the period 1970-2011. The oldest strains of each subgenotype of EV71 and prototype strains of HEV-A were included in the phylogenetic and Simplot analyses. Phylogenetic analysis indicated that all Chinese strains clustered into the C4 subgenotype of EV71, except for HuB/CHN/2009, which clustered into A, and Xiamen/CHN/2009, which clustered into the B5 subgenotype. Most C4 EV71 strains clustered into two predominant evolutionary branches, C4b and C4a. Our comprehensive recombination analysis showed evidence of genome recombination in subgenotype C4 (including C4a and C4b) sequences between structural genes from genotype C EV71 and non-structural genes from the prototype strains of CAV16, 14 and 4, but the evidence of intratypic recombination between C4 strains and the B subgenotype was not strong enough. These intertypic recombinant C4 viruses were first seen in 1998 and became the predominant endemic viruses circulating in mainland China for at least 14 years. A shift between the C4a and C4b evolutionary branches of C4 recombinant viruses was observed, and C4a viruses have been associated with large-scale nationwide HFMD outbreaks with higher morbidity and mortality since 2007.

  18. Global Mixed C4 (Crude C4) Market Future Outlook 2025-2032

    • statsndata.org
    excel, pdf
    Updated May 2025
    Cite
    Stats N Data (2025). Global Mixed C4 (Crude C4) Market Future Outlook 2025-2032 [Dataset]. https://www.statsndata.org/report/mixed-c4-crude-c4-market-239082
    Explore at:
    Available download formats: pdf, excel
    Dataset updated
    May 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Mixed C4 (Crude C4) market plays a pivotal role in the chemical industry, serving as a crucial feedstock for the production of various petrochemicals, including butadiene, isoprene, and other valuable derivatives. Mixed C4 is derived from the steam cracking of hydrocarbons, primarily ethane, propane, and naphtha

  19. Global C4 Raffinate Market Technological Advancements 2025-2032

    • statsndata.org
    excel, pdf
    Updated May 2025
    Cite
    Stats N Data (2025). Global C4 Raffinate Market Technological Advancements 2025-2032 [Dataset]. https://www.statsndata.org/report/c4-raffinate-market-246638
    Explore at:
    Available download formats: pdf, excel
    Dataset updated
    May 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The C4 Raffinate market is a critical segment within the petrochemical industry, primarily utilized in the manufacture of a variety of valuable products, including synthetic rubber and chemicals essential for the production of plastics. Defined as a byproduct in the steam cracking process of naphtha or ethane, C4 Ra

  20. Global Complement C4 Antibody Market Historical Impact Review 2025-2032

    • statsndata.org
    excel, pdf
    Updated Apr 2025
    Cite
    Stats N Data (2025). Global Complement C4 Antibody Market Historical Impact Review 2025-2032 [Dataset]. https://www.statsndata.org/report/complement-c4-antibody-market-349380
    Explore at:
    Available download formats: pdf, excel
    Dataset updated
    Apr 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Complement C4 Antibody market is a vital segment within the larger landscape of immunological research and clinical diagnostics, playing a crucial role in the study and treatment of various autoimmune disorders and inflammatory diseases. Complement C4, a component of the complement system, is essential for immun

Cite
Colin Raffel; Noam Shazeer; Adam Roberts; Katherine Lee; Sharan Narang; Michael Matena; Yanqi Zhou; Wei Li; Peter J. Liu, C4 Dataset [Dataset]. https://paperswithcode.com/dataset/c4

C4 Dataset

Colossal Clean Crawled Corpus

Explore at:
6 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Dec 13, 2023
Authors
Colin Raffel; Noam Shazeer; Adam Roberts; Katherine Lee; Sharan Narang; Michael Matena; Yanqi Zhou; Wei Li; Peter J. Liu
Description

C4 is a colossal, cleaned version of Common Crawl's web crawl corpus. It was based on Common Crawl dataset: https://commoncrawl.org. It was used to train the T5 text-to-text Transformer models.

The dataset can be downloaded in a pre-processed form from allennlp.
