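Keywords: lignocellulose degradation; Clostridium cellulolyticum; cellulosome; differential whole-transcriptome sequencing; dynamic operon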
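We chose clostridia as the subject of study because they harbor the cellulosome, the most efficient cellulose-degrading machinery known in nature. In this study we present a single-base-resolution "cellulose degradome" for the mesophilic, cellulosome-producing bacterium Clostridium cellulolyticum ATCC 35319. It is based on three data sets: whole-genome sequencing and analysis; whole-transcriptome sequencing and analysis with substrates of increasing complexity, namely monosaccharides (glucose, xylose), a disaccharide (cellobiose), polysaccharides (cellulose, xylan) and a complex polysaccharide substrate (corn stover); and measurement and analysis of extracellular secreted proteins with cellulose and its degradation products, cellobiose and glucose, as substrates. Genes encoding four classes of proteins were enriched in the cellulose degradome: core metabolic functions, environment sensing, gene regulation and polysaccharide metabolism. Differential expression analysis of all 147 genes encoding carbohydrate-active enzymes (CAZymes) in C. cellulolyticum identified a "core" set of 48 CAZymes that the bacterium requires to degrade cellulose-containing substrates. We also identified an "accessory" set of 76 CAZymes that are required specifically for degrading non-cellulose substrates. Gene co-expression analysis indicated that regulators related to carbon catabolite repression (CCR) sense the abundance of intracellular glycolytic intermediates and thereby control expression of the core enzymes, which consist mainly of cellulosomal components. By contrast, 11 sets of two-component system (TCS) regulators respond to the soluble sugars available outside the cell and specifically regulate expression of the corresponding accessory enzymes and their transporters. Surprisingly, with glucose as the sole substrate, the core cellulases were highly expressed at both the transcript and the protein level. Moreover, glucose enhanced the cellulolytic capacity of the bacterium in a concentration-dependent manner: low glucose concentrations induced cellulase transcription, whereas high concentrations promoted cell growth. On this basis we propose a molecular model of cellulose degradation by C. cellulolyticum, identify its substrate-specific CAZymes, reveal the CCR-mediated transcriptional regulation of the core cellulases, and show that in this regulation glucose acts as an inhibitor of CCR rather than a trigger. These features represent a distinct environment-sensing strategy of competing while collaborating in cellulose utilization, which can be exploited for process and genetic engineering of microbial cellulolysis.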

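Based on the study above, we found that regulation of the cellulose degradome in Clostridium cellulolyticum consists of carbon catabolite repression (CCR) and two-component systems (TCSs). This layer of regulation acts between operons and controls their differential transcription. However, our analysis of the C. cellulolyticum genome showed that cellulosomal enzyme genes tend to cluster on the chromosome, including the classical "cip-cel" cellulosomal cellulase gene cluster (Ccel_0728-0739) and a cluster of 14 cellulosomal enzyme genes (Ccel_1229-1242). Transcriptome analysis showed that the three highly expressed key cellulosome components (the scaffoldin CipC, the exoglucanase Cel48F and the endoglucanase Cel9E) are all encoded within the "cip-cel" cluster, which is transcribed as a polycistronic operon. Understanding how the bacterium regulates expression within an operon, and thereby controls the relative abundance of the key cellulosomal subunits, is therefore essential for a full picture of how it degrades cellulose.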
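Beyond the cellulosome, many bacterial protein complexes are built from numerous structural or enzymatic components, and the relative abundance of these components is crucial for the function of the complex. To investigate how bacteria regulate the stoichiometry of such components, we used RNA-Seq to measure, for the first time, the abundance and structure of transcripts in Clostridium cellulolyticum ATCC 35319 grown on different carbon sources. In the genome we found 182 operons whose genes are transcribed at "complex" ratios of transcription level, which we term "dynamic polarity operons" (DPOs). This dynamic polarity precisely sets the relative transcript abundance of the 12 genes in the cellulosomal gene cluster "cip-cel" at 389:417:19:22:128:7:5:1:1:1:1:1:2:6. The ratio was nearly constant across all the monosaccharide, disaccharide and polysaccharide substrates we tested and agreed with the protein-abundance ratio. These results were confirmed by gene knockout experiments and indicate that the bacterium tightly controls the stoichiometry of cellulosomal components. To identify the causes of dynamic polarity, we used differential RNA-Seq (dRNA-Seq), a differential sequencing method that enriches the 5' ends of primary transcripts, to map the genome-wide transcriptional start sites (TSs) and post-transcriptional processed sites (PSs) of the bacterium. The TS map uncovered σ-factor binding sites across the genome and revealed a finely tuned σ-factor regulatory network. In addition, the TS and PS maps showed that, of the 182 DPOs, 54 are attributable to intra-operon TSs and PSs. We validated, both in vivo and in vitro, the TSs and PSs responsible for the different expression levels of the 12 genes in the "cip-cel" cluster. Analysis of the orthologous clusters of "cip-cel" in six clostridia showed that dynamic polarity is an evolutionarily conserved transcriptional/post-transcriptional regulatory mechanism that controls the complexity and plasticity of the cellulosome. In certain clostridia it creates an intricate yet precise recipe of cellulosomal components by modulating the structure and abundance of the transcripts directed by intra-operon TSs and PSs. Our findings have important implications for the design and engineering of cellulosomes in vivo.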
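The ratio notation used for the operon transcript levels can be illustrated with a short Python sketch. It assumes that each gene in an operon already has a normalized transcription level and expresses every level relative to the lowest non-zero one. The function names, the two-fold cutoff and the input values are hypothetical illustrations, not the criteria or the data used in the thesis.

```python
from typing import List, Sequence


def ntl_ratio(levels: Sequence[float]) -> List[int]:
    """Express per-gene levels as integers relative to the lowest non-zero level."""
    positive = [x for x in levels if x > 0]
    if not positive:
        raise ValueError("at least one gene must have a positive level")
    base = min(positive)
    return [round(x / base) for x in levels]


def has_complex_ratio(levels: Sequence[float], min_fold: float = 2.0) -> bool:
    """Hypothetical check: do the expressed genes differ by at least min_fold?

    The two-fold default is an assumption made for this sketch only.
    """
    positive = [x for x in levels if x > 0]
    if len(positive) < 2:
        return False
    return max(positive) / min(positive) >= min_fold


if __name__ == "__main__":
    # Made-up normalized levels for a four-gene operon, in gene order.
    example = [120.0, 118.0, 6.0, 3.0]
    print(":".join(str(r) for r in ntl_ratio(example)))
    print(has_complex_ratio(example))
```

Writing the ratio against the lowest expressed gene keeps every entry at 1 or above, which matches the way the "cip-cel" ratio is reported.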

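To support the two studies above, I developed BTA, a Perl-based bacterial RNA-Seq analysis pipeline for quality control and expression-level calculation of high-throughput transcriptome sequencing datasets. The pipeline comprises a set of modules that perform tasks such as calculating gene expression values, generating signal tracks of mapped reads and identifying actively transcribed regions. It can also be used to determine transcript structure. The tool should have a broad user base, because it can be applied both to users' own datasets and to public RNA-seq datasets from the ArrayExpress Archive.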
Other abstract: Lignocellulosic biomass is the most abundant source of biopolymers on Earth, yet its recalcitrance to hydrolysis has hampered its exploitation for renewable bioenergy and biomaterials. Many bacteria degrade lignocellulose efficiently, yet the underpinning genome-wide metabolic and regulatory networks remain elusive. Identifying the genetic components of the degradome of cellulolytic bacteria and elucidating how their activities are organized and regulated in vivo should form the basis for developing natural or engineered cellulases and their host cells for efficient production of cellulose-based biofuels. My thesis consists of three components.
(1) We revealed the “cellulose degradome” of the model mesophilic cellulolytic bacterium Clostridium cellulolyticum ATCC 35319 (Ccel) through an integrated analysis of its complete genome, its transcriptomes under glucose, xylose, cellobiose, cellulose, xylan or corn stover, and its extracellular proteomes under glucose, cellobiose or cellulose. Proteins for core metabolic functions, environment sensing, gene regulation and polysaccharide metabolism were enriched in the cellulose degradome. Analysis of differentially expressed genes revealed a “core” set of 48 CAZymes required for degrading cellulose-containing substrates as well as an “accessory” set of 76 CAZymes required for degrading specific non-cellulose substrates. Gene co-expression analysis suggested that Carbon Catabolite Repression (CCR) related regulators sense intracellular glycolytic intermediates and control the core CAZymes, which mainly comprise cellulosomal components, whereas 11 sets of Two-Component Systems (TCSs) respond to the availability of extracellular soluble sugars and each regulate their corresponding accessory CAZymes and associated transporters. Surprisingly, under glucose alone, the core cellulases were highly expressed at both transcript and protein levels. Furthermore, glucose enhanced cellulolysis in a dose-dependent manner, inducing cellulase transcription at low concentrations and promoting cell growth at high concentrations. We therefore proposed a molecular model of the cellulose degradome in Ccel, which revealed the substrate specificity of CAZymes and the transcriptional regulation of core cellulases by CCR, in which glucose acts as a CCR inhibitor instead of a trigger. These features represent a distinct environment-sensing strategy for competing while collaborating in cellulose utilization, which can be exploited for process and genetic engineering of microbial cellulolysis.
(2) The study above allowed us to understand the regulation of the cellulose degradome in Ccel. This regulation consists of Carbon Catabolite Repression (CCR) and Two-Component Systems (TCSs), which operate at the inter-operon level and control the differential transcription among operons. However, the intra-operon regulatory mechanism of cellulase genes is not well understood. Our genome analysis showed that the cellulosomal genes in Ccel tend to cluster physically along the chromosome. We identified many cellulosomal gene clusters, such as the “cip-cel” gene cluster (Ccel_0728-0740), which encodes the major cellulosome components (including scaffoldin), and another cluster of 14 genes (Ccel_1229-1242) encoding exclusively secreted dockerin-containing proteins, which are probably involved in hemicellulose degradation and are herein named the “xyl-doc” gene cluster. It is therefore important to understand the intra-operon regulation of the cellulase genes, in particular how the relative abundance of core cellulosomal subunits is controlled in vivo.
Besides the cellulosome, bacteria contain many protein machineries that consist of numerous structural and enzymatic components whose relative abundance can be crucial for function. We found in Clostridium cellulolyticum that, in 182 “Dynamic Polarity Operons” (DPOs), genes arranged in tandem inside an operon exhibited a “complex” ratio of normalized transcriptional level (NTL). In the “cip-cel” cluster, the NTL ratio of the twelve genes encoding cellulosomal components, at 389:417:19:22:128:7:5:1:1:1:1:1:2:6, was largely stable under different carbohydrates and was consistent with the protein-abundance ratio. To understand the causes of the DPOs and the forces that precisely control the stoichiometry in vivo, we mapped genome-wide transcriptional start-sites (TSs) and post-transcriptional processed sites (PSs) of transcripts via “differential RNA-sequencing” (dRNA-Seq). The TS-Map uncovered genome-wide σ-factor binding sites and revealed the fine structure of a global σ-factor regulatory network. Intra-operon TSs and PSs underlie the NTL ratio that characterizes 54 of the DPOs. In vivo and in vitro validation of predicted TSs and PSs in the cip-cel cluster suggested that they underlie the differential NTL of the 12 genes in the cluster. Analysis of orthologous loci in six Clostridia species suggested that DPO is an evolutionarily conserved mechanism that regulates the complexity and plasticity of the cellulosome and that, in certain clostridia, creates an intricate yet precise recipe of cellulosomal components by modulating the structure and abundance of transcripts as guided by intra-operon TSs and PSs. Our findings have implications for the design and engineering of cellulosomes in vivo.
(3) Increasing sequencing capacity has made it possible to explore the bacterial transcriptome to an unprecedented depth, which has revealed that bacterial transcriptomes are more complex and dynamic than expected. To support the previous two studies, I have developed a Perl-based bacterial RNA-Seq analysis pipeline, BTA, for data quality control and expression level calculation of high-throughput sequencing-based transcriptional profiling datasets. This pipeline consists of a set of modules that perform tasks such as calculating gene expression values, generating signal tracks of mapped reads and identifying actively transcribed regions. It can also be used to identify transcript structure. This tool should have a wide user base, because it can be used to analyze users' own datasets or public RNA-seq datasets from the ArrayExpress Archive.
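As an illustration of what "calculating gene expression values" can involve, the following Python sketch computes RPKM (reads per kilobase of gene per million mapped reads), a widely used length- and depth-normalized measure. The abstract does not state which measure BTA uses, and BTA itself is written in Perl, so both the choice of RPKM and the names `rpkm`, `geneA` and `geneB` are assumptions made for this example.

```python
from typing import Dict


def rpkm(read_counts: Dict[str, int], gene_lengths: Dict[str, int]) -> Dict[str, float]:
    """Return RPKM per gene: reads * 1e9 / (gene length in bp * total mapped reads)."""
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads")
    values = {}
    for gene, count in read_counts.items():
        length = gene_lengths[gene]
        if length <= 0:
            raise ValueError(f"gene {gene} has a non-positive length")
        values[gene] = count * 1e9 / (length * total_reads)
    return values


if __name__ == "__main__":
    # Made-up read counts and gene lengths (bp) for two hypothetical genes.
    counts = {"geneA": 500, "geneB": 500}
    lengths = {"geneA": 1000, "geneB": 2000}
    for gene, value in rpkm(counts, lengths).items():
        print(gene, value)
```

Here the total is taken over the genes supplied, so reads that map outside annotated genes would have to be added to the total separately if they should count toward sequencing depth.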
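Huang Ranran (黄冉冉). Study of the cellulose degradation mechanism of Clostridium cellulolyticum [D]. Beijing: Graduate University of Chinese Academy of Sciences, 2013.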
