
Topsicle: a method for estimating telomere length from whole genome long-read sequencing data

Long read sequencing technology (advanced by Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (Nanopore)) is revolutionizing the genomics field [43], and it has major potential to be a powerful computational tool for investigating telomere length variation within populations and between species. Read lengths from long read sequencing platforms are orders of magnitude longer than those from short read sequencing platforms (tens of kilobase pairs versus 100–300 bp). These long reads have greatly aided in resolving the complex and highly repetitive regions of the genome [44], and near gapless genome assemblies (also known as telomere-to-telomere assemblies) have been generated for multiple organisms [45, 46]. Long read sequences can also be used for estimating telomere length, since whole genome sequencing on a long read platform would contain reads that span the entire telomere and subtelomere region. Computational methods can then be developed to determine the telomere–subtelomere boundary and use it to estimate the telomere length. As an example, telomere-to-telomere assemblies have been used for estimating telomere length by analyzing the sequences at the start and end of the gapless chromosome assembly [47,48,49,50]. But generating gapless genome assemblies is resource intensive and cannot feasibly be used for estimating the telomere lengths of many individuals. Alternatively, methods such as TLD [51], Telogator [52], and TeloNum [53] analyze raw long read sequences to estimate telomere lengths. These methods require a known telomere repeat sequence, but this can be determined through k-mer based analysis [54]. Specialized methods have also been developed to concentrate long reads originating from chromosome ends. These methods involve attaching sequencing adapters that are complementary to the single-stranded 3′ G-overhang of the telomere, which can subsequently be used for selectively amplifying the chromosome ends for long read sequencing [55,56,57,58].
While these methods can enrich telomeric long reads, they require optimization of the protocol (e.g., designing the adapter sequence to target the G-overhang), and they would be difficult to apply to organisms with naturally blunt-ended telomeres [59, 60].

An explosion of long read sequencing data has been generated for many organisms across the animal and plant kingdoms [61, 62]. A computational method that can use this abundant long read sequencing data and estimate telomere length with minimal requirements would be a powerful tool for investigating the biology of telomere length variation. But so far, such a method is not available, and implementing one would require addressing two major algorithmic considerations before it can be widely used across many different organisms. The first algorithmic consideration is the ability to analyze the diverse telomere sequence variation across the tree of life. All vertebrates have an identical telomere repeat motif, TTAGGG [63], and most previous long read sequencing based computational methods were largely designed for analyzing human genomic datasets, with algorithms optimized for the TTAGGG telomere motif. But the telomere repeat motif is highly diverse across the animal and plant kingdoms [64,65,66,67], and there are even species in fungi and plants that utilize a mix of repeat motifs, resulting in a sequence-complex telomere structure [64, 68, 69]. A new computational method would need to accommodate the diverse telomere repeat motifs, especially given the inherently noisy and error-prone long read sequencing data [70]. With recent improvements in sequencing chemistry and technology (HiFi sequencing for PacBio and the Q20+ Chemistry kit for Nanopore), error rates have been substantially reduced to 1% [71, 72]. But even with this low error rate, a telomeric region that is several kilobase pairs long can harbor substantial erroneous sequences across the read [73] and hinder the identification of the correct telomere–subtelomere boundary. In addition, long read sequencers are especially error-prone in repetitive homopolymer sequences [74,75,76], and the GT-rich microsatellite telomere sequences are predicted to be an especially erroneous region for long read sequencing.
A second algorithmic consideration relates to identifying the telomere–subtelomere boundary. Prior long read sequencing based methods [51, 52] have used sliding windows to calculate summary statistics and a threshold to determine the boundary between the telomere and subtelomere. Sliding window and threshold based analyses are commonly used in genome analysis, but they place the burden on the user to determine the appropriate cutoff, which for computational methods that measure telomere length may differ depending on the sequenced organism. In addition, threshold based sliding window scans can inflate both false positive and false negative results [77,78,79,80,81,82] if the cutoff is improperly determined.
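To make the cutoff problem concrete, here is a minimal Python sketch of a generic sliding window and threshold scan. It is an illustration only, not the implementation used by TLD or Telogator: the function names, the window and step sizes, the cutoff values, and the assumption that the telomere sits at the start of the read are all hypothetical choices made for this example.

```python
def window_repeat_fraction(read, motif, window, step):
    """Fraction of each window covered by exact copies of the motif."""
    fractions = []
    for start in range(0, len(read) - window + 1, step):
        chunk = read[start:start + window]
        # str.count counts non-overlapping exact matches of the motif
        fractions.append(chunk.count(motif) * len(motif) / window)
    return fractions


def boundary_by_threshold(read, motif, window=100, step=50, cutoff=0.5):
    """Start of the first window whose repeat fraction drops below cutoff.

    Assumes (for illustration) that the telomere is at the start of the read.
    Returns len(read) if no window falls below the cutoff.
    """
    fractions = window_repeat_fraction(read, motif, window, step)
    for index, fraction in enumerate(fractions):
        if fraction < cutoff:
            return index * step
    return len(read)


if __name__ == "__main__":
    # Toy read: 300 bp of perfect telomere repeat, then 300 bp of other sequence
    read = "TTAGGG" * 50 + "ACGTACCATGCA" * 25
    for cutoff in (0.4, 0.5, 0.97):
        print(cutoff, boundary_by_threshold(read, "TTAGGG", cutoff=cutoff))
```

On this toy read the true boundary is at position 300, yet the reported boundary moves with the cutoff, which is the burden on the user that the paragraph above describes.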

Here, we introduce Topsicle, a computational method that uses a novel strategy to estimate telomere lengths from the raw long reads of an entire whole genome sequencing library. Methodologically, Topsicle iterates through different substring sizes of the telomere repeat sequence (i.e., telomere k-mers) and uses different phases of each telomere k-mer to summarize the telomere repeat content of each sequencing read. The k-mer based summary statistics of telomere repeats are then used for selecting long reads originating from telomeric regions. Topsicle uses those putative telomeric reads to estimate the telomere length by determining the telomere–subtelomere boundary through a binary segmentation change point detection analysis [83]. We demonstrate the high accuracy of Topsicle through simulations and apply our new method to long read sequencing datasets from three evolutionarily diverse plant species (A. thaliana, maize, and Mimulus) and human cancer cell lines. We believe using Topsicle will enable high-resolution explorations of telomere length for more species and achieve a broad understanding of the genetics and evolution underlying telomere length variation.
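The two ideas in this paragraph, phased telomere k-mers and change point detection, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not Topsicle's actual code: the function names are hypothetical, only a single change point is searched for (the first split of a binary segmentation), a least-squares cost is assumed, and the telomere is assumed to lie at the start of the read.

```python
def kmer_phases(motif, k):
    """Set of k-mers read off the tandem repeat, one per starting phase."""
    tandem = motif * (k // len(motif) + 2)  # long enough for every phase
    return {tandem[i:i + k] for i in range(len(motif))}


def telomere_mask(read, motif, k):
    """Mark each base 1 if it lies inside a phase k-mer match, else 0."""
    phases = kmer_phases(motif, k)
    mask = [0] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in phases:
            mask[i:i + k] = [1] * k
    return mask


def single_change_point(signal):
    """Index t that best splits the signal into two constant-mean segments.

    Minimizes the summed squared error of the two segments; this is the
    first split a binary segmentation would make with a least-squares cost.
    """
    n = len(signal)
    sums, squares = [0.0], [0.0]
    for x in signal:
        sums.append(sums[-1] + x)
        squares.append(squares[-1] + x * x)

    def sse(a, b):
        # Squared error of signal[a:b] around its own mean
        s = sums[b] - sums[a]
        return (squares[b] - squares[a]) - s * s / (b - a)

    return min(range(1, n), key=lambda t: sse(0, t) + sse(t, n))


if __name__ == "__main__":
    # Toy read: 300 bp of telomere repeat followed by 300 bp of other sequence
    read = "TTAGGG" * 50 + "ACGTACCATGCA" * 25
    mask = telomere_mask(read, "TTAGGG", 6)
    print("estimated telomere length:", single_change_point(mask))
```

Because the split is chosen by comparing segment means rather than by a fixed cutoff, an isolated sequencing error inside the repeat lowers the telomere segment's mean slightly but does not by itself create a boundary.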

Generative AI Designs Synthetic Gene Editing Proteins Better than Nature

Researchers from Integra Therapeutics, in partnership with the Pompeu Fabra University (UPF) and the Centre for Genomic Regulation (CRG), Spain, have used generative AI to design synthetic proteins that outperform naturally occurring proteins used for editing the human genome. Their use of generative AI focused on PiggyBac transposases, naturally occurring enzymes that have long been used for gene delivery and genetic engineering, and uncovered more than 13,000 previously unidentified PiggyBac sequences. The research, published in Nature Biotechnology, has the potential to improve current gene editing tools for the creation of CAR T and gene therapies.

“Our work expands the phylogenetic tree of PiggyBac transposons by two orders of magnitude, unveiling a previously unexplored diversity within this family of mobile genetic elements,” the researchers wrote.

For their work, the researchers first conducted extensive computational bioprospecting, screening more than 31,000 eukaryotic genomes to uncover the 13,000 new sequences. From this number, the team was able to validate 10 active transposases, two of which showed similar activity to PiggyBac transposases currently used in both research and clinical settings.

Epigenetic shifts link maternal infection during pregnancy to higher risk of offspring developing schizophrenia

The health of mothers during pregnancy has long been known to play a role in the lifelong mental and physical health of offspring. Recent studies have found that contracting an infection during pregnancy can increase the risk that offspring will develop some neurodevelopmental disorders, conditions that are associated with the atypical maturation of some parts of the brain.

An infection is an invasion of pathogens, such as bacteria, viruses, fungi or parasites, which can then multiply and colonize host tissues. Findings suggest that when an expecting mother contracts an infection, her immune system can respond to it in ways that could impact the development of the fetus.

Researchers at the University of Manchester and Manchester Metropolitan University recently carried out a study aimed at further investigating the processes through which maternal infections during pregnancy could increase the risk that offspring will develop schizophrenia later in life. Schizophrenia is a typically debilitating mental health condition characterized by hallucinations, false beliefs about oneself or the world (i.e., delusions) and cognitive impairments.

Synaptic changes in the brains of patients with frontotemporal dementia can be modeled in the laboratory

Neurons produced from frontotemporal dementia patients’ skin biopsies using modern stem cell technology recapitulate the synaptic loss and dysfunction detected in the patients’ brains, a new study from the University of Eastern Finland shows.

Frontotemporal dementia is a progressive neurodegenerative disease affecting the frontal and temporal lobes of the brain. Common symptoms include difficulties in understanding or producing speech, problems in movement, and psychiatric symptoms.

Often, frontotemporal dementia has no identified genetic cause, but especially in Finnish patients, hexanucleotide repeat expansion in the C9orf72 gene is a common genetic cause, present in about half of the familial cases and in 20% of the sporadic cases where there is no family history of the disease.

Map of bacterial gene interactions uncovers targets for future antibiotics

Despite rapid advances in reading the genetic code of living organisms, scientists still face a major challenge today—knowing a gene’s sequence does not automatically reveal what it does. Even in simple, well-studied bacteria like Escherichia coli (better known as E. coli), about one-quarter of the genes have no known function. Traditional approaches—turning off one gene at a time and studying the effects—are slow, laborious, and sometimes inconclusive due to gene redundancy.

Researchers from the Yong Loo Lin School of Medicine, National University of Singapore (NUS Medicine) and collaborators from the University of California, Berkeley (UC Berkeley) have developed a new technique called Dual transposon sequencing (Dual Tn-seq), which allows for rapid identification of genetic interactions. It maps how bacterial genes work together, revealing vulnerabilities that could be targeted by future antibiotics.

“This is like mapping the social network for [bacterial genes],” said Assistant Professor Chris Sham Lok To from the Infectious Diseases Translational Research Program and the Department of Microbiology and Immunology, NUS Medicine, who led the study. “We can now see which genes depend on each other, and which pairs of genes bacteria can’t live without. That’s exactly the insight we need for next-generation antibiotics.”

Fine Particulate (PM2.5) Exposure Negatively Impacts Hallmarks Of Aging: What’s Optimal?


Association between carbohydrate intake and the risk of psoriasis: a prospective cohort study based on UK Biobank

Research on the association between carbohydrate intake and psoriasis risk is limited. We aimed to examine the associations of carbohydrate and its different subtypes with psoriasis risk, as well as the interaction between genetic predisposition and carbohydrate intake.

We performed a prospective cohort study based on UK Biobank that included 210,474 participants who did not have psoriasis at baseline. A 24-hour dietary assessment tool was used to assess detailed dietary intake information. Incident psoriasis events were identified through hospitalization records. The association between carbohydrate intake and psoriasis was examined by Cox proportional hazard regression models. Multiplicative interaction between genetic risk and carbohydrate intake was assessed by incorporating a cross-product term in the model.

A total of 1907 incident psoriasis events were recorded during the follow-up period (median: 13.25 years). Compared to the lowest intake quartile (Q1), the highest intake quartile (Q4) of total sugars [… FDR-Ptrend = 0.116], free sugars [1.22 (1.07–1.38), 0.021], and sucrose [1.14 (1.01–1.30), 0.058] was associated with an increased psoriasis risk. In contrast, the highest intake of starch [0.86 (0.76–0.98), 0.049] and fiber [0.84 (0.74–0.96), 0.021] showed an inverse association with psoriasis risk. However, there was no statistically significant interaction between carbohydrate intake and genetic risk.

New Cas9 Enzymes Improve the Accuracy of CRISPR Prime Editing

The CRISPR gene editing system holds tremendous promise. It has already revolutionized biomedical research by making gene editing a straightforward process. It involves using a guide RNA molecule that has a unique sequence, which matches with a target location in genomic DNA. This guide RNA brings an enzyme called Cas9 to that genetic location, where Cas9 makes a cut in the DNA. Scientists have been modifying and improving on the CRISPR technique since it was created. Many of those improvements are related to the Cas9 enzyme, and ensuring that it makes the proper cut in the correct place.

Restoring order to dividing cancer cells may halt triple negative breast cancer spread

Triple negative breast cancer (TNBC) is one of the most aggressive and hardest-to-treat forms of breast cancer, but a new study led by Weill Cornell Medicine suggests a surprising way to stop it from spreading. Researchers have discovered that an enzyme called EZH2 drives TNBC cells to divide abnormally, which enables them to relocate to distant organs. The preclinical study also found that drugs that block EZH2 could restore order to dividing cells and thwart the spread of TNBC cells.

“Metastasis is the main reason patients with triple negative breast cancer face poor survival odds,” said senior author Dr. Vivek Mittal, Ford-Isom Research Professor of Cardiothoracic Surgery and member of the Sandra and Edward Meyer Cancer Center at Weill Cornell Medicine. “Our study suggests a new therapeutic approach to block metastasis before it starts and help patients overcome this deadly cancer.”

The findings, published Oct. 2 in Cancer Discovery, challenge the popular notion that cancer treatments should boost cell division errors already occurring in cancer cells beyond the breaking point to induce cell death. When normal cells divide, the chromosomes (DNA “packages” carrying genes) are duplicated and split evenly into two daughter cells. This process goes haywire in many cancer cells, leading to chromosomal instability: too many, too few, or jumbled chromosomes in multiple daughter cells.

Ancient viral DNA is essential for human embryo development, study shows

Our ancient past isn’t always buried history. When it comes to our DNA, nearly 9% of the human genome is made up of leftover genetic material from ancient viruses (called endogenous retroviruses or ERVs) that infected our ancestors millions of years ago and became permanently integrated into our genetic code. In a new study published in the journal Nature, scientists have demonstrated that one piece of this viral junk is essential for the earliest stages of human life.
