Accessing curated gene expression data with GemmaPy
===================================================

..
    | *Dima Vavilov*, *Guillaume Poirier-Morency*, *B. Ogan Mancarci*, *Javier Castillo-Arnemann*, *Jordan Sicherman*
    | *Michael Smith Laboratories, University of British Columbia, Vancouver, Canada*

About Gemma
-----------

`Gemma `_ is a web site, database and a set of tools for the meta-analysis, re-use and sharing of genomics data, currently primarily targeted at the analysis of gene expression profiles. Gemma contains data from thousands of public studies, referencing thousands of published papers. Every dataset in Gemma has passed a rigorous curation process that re-annotates the expression platform at the sequence level, which allows for more consistent cross-platform comparisons and meta-analyses.

For detailed information on the curation process, read this `page `_ or the latest `publication `_.

Installation instructions
-------------------------

.. include:: install.rst

Additional packages
-------------------

To make the plots in this tutorial, the following packages should be installed and imported: :code:`matplotlib`, :code:`numpy`, :code:`plotnine`, :code:`pandas`, :code:`seaborn`, :code:`statsmodels`.

Searching for datasets of interest in Gemma
-------------------------------------------

Using the :py:func:`~gemmapy.GemmaPy.get_datasets` function, datasets fitting various criteria can be accessed.

>>> import gemmapy
>>> api = gemmapy.GemmaPy()
>>> # accessing all mouse and human datasets
>>> api.get_datasets(taxa = ['mouse','human']).head()
  experiment_short_name  ...  taxon_database_ID
0               GSE2018  ...                 87
1               GSE4523  ...                 81
2               GSE4036  ...                 87
3               GSE4034  ...                 81
4               GSE2866  ...                 81
[5 rows x 23 columns]
>>> # accessing human datasets with the word "bipolar"
>>> api.get_datasets(query = 'bipolar', taxa = ['human']).head()
  experiment_short_name  ...  taxon_database_ID
0             GSE157509  ...                 87
1              GSE66196  ...                 87
2             GSE210064  ...                 87
3              GSE23848  ...                 87
4    McLean Hippocampus  ...                 87
[5 rows x 23 columns]
>>> # accessing human datasets that were annotated with the ontology term for
>>> # bipolar disorder. Use the search_annotations function to search for
>>> # available annotation terms
>>> api.get_datasets(taxa = ['human'],
...                  uris = ['http://purl.obolibrary.org/obo/MONDO_0004985']).head()
  experiment_short_name  ...  taxon_database_ID
0               GSE5389  ...                 87
1               GSE5388  ...                 87
2               GSE7036  ...                 87
3    McLean Hippocampus  ...                 87
4            McLean_PFC  ...                 87
[5 rows x 23 columns]

The :py:func:`~gemmapy.GemmaPy.get_datasets` function also includes a :code:`filter` parameter that allows filtering for datasets with specific properties in a more structured manner. A list of the available properties can be accessed using :py:func:`~gemmapy.GemmaPy.filter_properties`.

>>> api.filter_properties()['dataset'].head()
                                     name     type  description
0                     accession.accession   string          NaN
1              accession.accessionVersion   string          NaN
2       accession.externalDatabase.ftpUri   string          NaN
3           accession.externalDatabase.id  integer          NaN
4  accession.externalDatabase.lastUpdated   string          NaN

These properties can be used together to fine-tune your results.

>>> # access human datasets that have bipolar disorder as an experimental factor
>>> api.get_datasets(taxa = ["human"],
...                  filter = "experimentalDesign.experimentalFactors.factorValues.characteristics.valueUri = http://purl.obolibrary.org/obo/MONDO_0004985").head()
  experiment_short_name  ...  taxon_database_ID
0               GSE5389  ...                 87
1               GSE5388  ...                 87
2               GSE7036  ...                 87
3            McLean_PFC  ...                 87
4      stanley_feinberg  ...                 87
[5 rows x 23 columns]
>>> # all datasets with more than 4 samples annotated for any disease
>>> api.get_datasets(filter = "bioAssayCount > 4 and allCharacteristics.category = disease").head()
  experiment_short_name  ...  taxon_database_ID
0               GSE2018  ...                 87
1               GSE4036  ...                 87
2               GSE2866  ...                 81
3               GSE2426  ...                 81
4               GSE2867  ...                 81
[5 rows x 23 columns]
>>> # all datasets with ontology terms for Alzheimer's disease and Parkinson's disease
>>> # this is equivalent to using the uris parameter
>>> api.get_datasets(filter = 'allCharacteristics.valueUri in (http://purl.obolibrary.org/obo/MONDO_0004975,http://purl.obolibrary.org/obo/MONDO_0005180)').head()
  experiment_short_name  ...  taxon_database_ID
0               GSE1837  ...                 86
1                 GSE30  ...                 81
2               GSE4788  ...                 81
3               GSE1157  ...                 86
4               GSE1555  ...                 81
[5 rows x 23 columns]

Note that a single call to these functions returns 20 results by default and at most 100, controlled by the :code:`limit` argument. To get all available results, use :py:func:`~gemmapy.GemmaPy.get_all_pages`.

>>> api.get_all_pages(api.get_datasets,taxa = ['human'])
     experiment_short_name  ...  taxon_database_ID
0                  GSE2018  ...                 87
1                  GSE4036  ...                 87
2                  GSE3489  ...                 87
3                  GSE1923  ...                 87
4                   GSE361  ...                 87
...                    ...  ...                ...
7697              GSE72747  ...                 87
7698                GSE976  ...                 87
7699              GSE78083  ...                 87
7700            GSE11142.2  ...                 87
7701             GSE2489.2  ...                 87
[7702 rows x 23 columns]

See the `Larger queries`_ section for more details. To keep this vignette simple, the examples below keep using the first 20 results returned by default.

Dataset information provided by :py:func:`~gemmapy.GemmaPy.get_datasets` also includes some quality information that can be used to determine the suitability of any given experiment. For instance, the :code:`experiment_batch_effect` column is set to -1 if Gemma’s preprocessing detected batch effects that could not be resolved by batch correction. More information about these and other fields can be found in the function documentation.

>>> df = api.get_datasets(taxa = ['human'],filter = 'bioAssayCount > 4')
>>> # keep only datasets without an unresolved batch effect
>>> df.loc[df.experiment_batch_effect != -1].head()

Gemma uses multiple ontologies when annotating datasets, and searching with term URIs instead of free text can lead to more specific results.
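
Term URIs typically reach Gemma inside a filter expression, such as the :code:`in (...)` clause used earlier. As a minimal sketch, the helper below assembles such a string from a list of URIs. Note that :code:`uri_filter` is a hypothetical name and not part of gemmapy; it only formats text in the same shape as the filter examples in this tutorial, and the default property name is taken from those examples.

```python
def uri_filter(uris, prop="allCharacteristics.valueUri"):
    """Build a filter string matching any of the given ontology term URIs.

    Hypothetical helper, not part of gemmapy: it only formats text in the
    same shape as the filter examples shown in this tutorial.
    """
    uris = list(uris)
    if not uris:
        raise ValueError("at least one URI is required")
    if len(uris) == 1:
        # a single term: plain equality
        return f"{prop} = {uris[0]}"
    # several terms: comma-separated list inside an `in (...)` clause
    return f"{prop} in ({','.join(uris)})"


alzheimers = "http://purl.obolibrary.org/obo/MONDO_0004975"
parkinsons = "http://purl.obolibrary.org/obo/MONDO_0005180"
print(uri_filter([alzheimers, parkinsons]))
```

The resulting string has the same form as the value passed to the :code:`filter` argument of :py:func:`~gemmapy.GemmaPy.get_datasets` in the Alzheimer's and Parkinson's example.
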
The :py:func:`~gemmapy.GemmaPy.search_annotations` function allows searching for annotation terms that might be relevant to your query.

>>> api.search_annotations(['bipolar']).head()
  category_name  ...                                     value_URI
0           NaN  ...          http://www.ebi.ac.uk/efo/EFO_0009963
1           NaN  ...          http://www.ebi.ac.uk/efo/EFO_0009964
2           NaN  ...  http://purl.obolibrary.org/obo/MONDO_0004985
3           NaN  ...     http://purl.obolibrary.org/obo/HP_0007302
4           NaN  ...    http://purl.obolibrary.org/obo/NBO_0000258
[5 rows x 4 columns]

Downloading expression data
---------------------------

Upon identifying datasets of interest, more information about specific ones can be requested. In this example we will be using GSE46416, which includes samples taken from healthy donors along with manic/euthymic phase bipolar disorder patients.

The data associated with specific experiments can be accessed by using :py:func:`~gemmapy.GemmaPy.get_datasets_by_ids`.

>>> data = api.get_datasets_by_ids(['GSE46416'])
>>> print(data)
  experiment_short_name  ...  taxon_database_ID
0              GSE46416  ...                 87
[1 rows x 23 columns]

To access the expression data in a convenient form, you can use :py:func:`~gemmapy.GemmaPy.get_dataset_object`. It is a high-level wrapper that combines various endpoint calls to return `anndata `_ (Annotated Data) objects or dictionaries. These include the expression matrix along with the experimental design, and ensure the sample names match between both when transforming/subsetting data.

>>> adata = api.get_dataset_object(["GSE46416"])['8997'] # keys of the output use Gemma IDs
>>> print(adata)
AnnData object with n_obs × n_vars = 18758 × 32
    obs: 'GeneSymbol', 'NCBIid'
    var: 'factor_values', 'disease', 'block'
    uns: 'title', 'abstract', 'url', 'database', 'accesion', 'GemmaQualityScore', 'GemmaSuitabilityScore', 'taxon'

To show how subsetting works, we'll keep the "manic phase" data and the :code:`reference_subject_role`\s, which refer to the control samples in Gemma datasets.


>>> # Check the levels of the disease factor
>>> adata.var['disease'].unique()
array(['bipolar disorder has_modifier euthymic phase',
       'reference subject role',
       'bipolar disorder has_modifier manic phase'], dtype=object)
>>> # Subset patients during manic phase and controls
>>> manic=adata[:,(adata.var['disease'] == 'reference subject role') |
...     (adata.var['disease'] == 'bipolar disorder has_modifier manic phase')].copy()
>>> manic
AnnData object with n_obs × n_vars = 18758 × 21
    obs: 'GeneSymbol', 'NCBIid'
    var: 'factor_values', 'disease', 'block'
    uns: 'title', 'abstract', 'url', 'database', 'accesion', 'GemmaQualityScore', 'GemmaSuitabilityScore', 'taxon'
>>> manic.var.head()
                                                            factor_values  ...              block
Control, 15                                  category ... factor_categ...  ...  Batch_02_26/11/09
Control, 8                                   category ... factor_categ...  ...  Batch_01_25/11/09
Bipolar disorder patient manic phase, 21     category ... factor_categ...  ...  Batch_03_27/11/09
Bipolar disorder patient manic phase, 18     category ... factor_categ...  ...  Batch_02_26/11/09
Bipolar disorder patient manic phase, 29     category ... factor_categ...  ...  Batch_03_27/11/09
[5 rows x 3 columns]

Let’s take a look at sample to sample correlation in our subset.

>>> # get expression data frame
>>> import pandas as pd
>>> import seaborn as sns
>>> df = pd.DataFrame(manic.X)
>>> df.columns = manic.var.index
>>> corrs = df.corr()
>>> plt = sns.clustermap(corrs)
>>> plt.savefig('ded.png')

.. image:: _static/ded.png
   :align: left
   :width: 100%

Sample to sample correlations of bipolar patients during manic phase and controls.

You can also use :py:func:`~gemmapy.GemmaPy.get_dataset_processed_expression` to only get the expression matrix, and :py:func:`~gemmapy.GemmaPy.get_dataset_samples` to get the metadata information.
The output of :py:func:`~gemmapy.GemmaPy.get_dataset_samples` includes some additional details about a sample, such as the original accession ID or whether or not it was determined to be an outlier. It can be simplified to match the design table included in the output of :py:func:`~gemmapy.GemmaPy.get_dataset_object` by using :py:func:`~gemmapy.GemmaPy.make_design` on the output.

>>> api.make_design(api.get_dataset_samples('GSE46416')).drop(columns="factor_values").head()
                                                                                  disease              block
sample_name
Bipolar disorder patient euthymic phase, 17  bipolar disorder has_modifier euthymic phase  Batch_02_26/11/09
Bipolar disorder patient euthymic phase, 34  bipolar disorder has_modifier euthymic phase  Batch_04_02/12/09
Control, 15                                                        reference subject role  Batch_02_26/11/09
Bipolar disorder patient euthymic phase, 32  bipolar disorder has_modifier euthymic phase  Batch_04_02/12/09
Control, 8                                                         reference subject role  Batch_01_25/11/09

Platform Annotations
--------------------

Expression data in Gemma comes with annotations for the gene each expression profile corresponds to. Using the :py:func:`~gemmapy.GemmaPy.get_platform_annotations` function, these annotations can be retrieved independently of the expression data, along with additional annotations such as Gene Ontology terms.

Examples:

>>> api.get_platform_annotations('GPL96').head()
     ProbeName    GeneSymbols  ...       GemmaIDs     NCBIids
0  211750_x_at  TUBA1C|TUBA1A  ...  360802|172797  84790|7846
1    216678_at            NaN  ...            NaN         NaN
2    216345_at         ZSWIM8  ...         235733       23053
3    207273_at            NaN  ...            NaN         NaN
4  216025_x_at         CYP2C9  ...          32964        1559
[5 rows x 6 columns]
>>> api.get_platform_annotations('Generic_human_ncbiIds').head()
  ElementName   GeneSymbols  ...    GemmaIDs      NCBIids
0       55236          UBA6  ...    295849.0      55236.0
1       79664          ICE2  ...    336840.0      79664.0
2   100126270      FMR1-AS1  ...   3157248.0  100126270.0
3   105373684     LINC01818  ...   9235895.0  105373684.0
4   124900245  LOC124900245  ...  10578422.0  124900245.0
[5 rows x 6 columns]

If you are interested in a particular gene, you can see which platforms include it using :py:func:`~gemmapy.GemmaPy.get_gene_probes`. Note that functions that search for genes work best with unambiguous identifiers rather than symbols.

>>> # lists genes in gemma matching the symbol or identifier
>>> api.get_genes(['Eno2'])
  gene_symbol        gene_ensembl  ...  taxon_database_name  taxon_database_ID
0        ENO2     ENSG00000111674  ...                 hg38                 87
1        Eno2  ENSMUSG00000004267  ...                 mm10                 81
2        Eno2  ENSRNOG00000013141  ...                  rn6                 86
[3 rows x 12 columns]
>>> # Ensembl ID for human ENO2
>>> api.get_gene_probes("ENSG00000111674").head()
  element_name  ...  taxon_database_ID
0    201313_at  ...                 87
1    201313_at  ...                 87
2     40193_at  ...                 87
3         1639  ...                 87
4         6621  ...                 87
[5 rows x 14 columns]

Differential expression analyses
--------------------------------

Gemma contains precomputed differential expression analyses for most of its datasets. Analyses can involve more than one factor, such as "sex" as well as "disease". Some datasets contain more than one analysis to account for different factors and their interactions. The results are stored as resultSets, each corresponding to one factor (or their interaction). You can access them using :py:func:`~gemmapy.GemmaPy.get_differential_expression_values`. From here on, we can explore and visualize the data to find the most differentially-expressed genes.

Note that :py:func:`~gemmapy.GemmaPy.get_differential_expression_values` can return multiple differentials per study if a study has multiple factors to contrast. Since GSE46416 only has one, extracting the first element of the returned dictionary is all we need.

>>> dif_exp = api.get_differential_expression_values('GSE46416')
>>> dif_exp  # a dictionary keyed by result set ID
{550248:          Probe     NCBIid  ...  contrast_113005_tstat  contrast_113005_pvalue
0      2982730       4018  ...                -0.3622                  0.7196
1      2787851     166752  ...                 0.7495                  0.4590
2      2477558        NaN  ...                 1.2604                  0.2166
3      2910917        NaN  ...                 0.9032                  0.3731
4      3983537     140886  ...                 1.7656                  0.0869
...        ...        ...  ...                    ...                     ...
21956  3301011      64318  ...                -1.6208                  0.1148
21957  2461654  100130249  ...                -0.2045                  0.8393
21958  2360346       1141  ...                 3.1278                  0.0037
21959  2391172       7293  ...                 1.7369                  0.0920
21960  2525718        NaN  ...                -0.7101                  0.4827
[21961 rows x 13 columns]}

By default the column names of the output correspond to contrast IDs. To see what conditions these IDs correspond to, we can either use :py:func:`~gemmapy.GemmaPy.get_dataset_differential_expression_analyses` to get the metadata about differentials of a given dataset, or set the :code:`readable_contrasts` argument of :py:func:`~gemmapy.GemmaPy.get_differential_expression_values` to :code:`True`. The former approach is usually better for a large scale systematic analysis, while the latter is easier to read in an interactive session.

The :py:func:`~gemmapy.GemmaPy.get_dataset_differential_expression_analyses` function returns metadata about the differentials.

>>> contrasts = api.get_dataset_differential_expression_analyses("GSE46416")
>>> contrasts
   result_ID  contrast_ID  ...  probes_analyzed  genes_analyzed
0     550248       113004  ...            21961           18959
1     550248       113005  ...            21961           18959
[2 rows x 12 columns]

The :code:`contrast_ID` column corresponds to the column names in the output of :py:func:`~gemmapy.GemmaPy.get_differential_expression_values`, while :code:`result_ID` corresponds to the name of the differential in the output object. Using them together lets one count the differentially expressed genes for each contrast.

>>> import statsmodels.stats.multitest as multi
>>> dif_exp_genes = [
...     sum(multi.multipletests(
...         dif_exp[contrasts.loc[i,].result_ID][
...             "contrast_" + contrasts.loc[i,].contrast_ID + "_pvalue"
...         ],
...         alpha = 0.05, method = "fdr_bh")[0]) for i in range(contrasts.shape[0])]
>>> # assign returns a new DataFrame; contrasts itself is left unchanged
>>> with_counts = contrasts.assign(n_diff= dif_exp_genes)
>>> contrasts
   result_ID  contrast_ID  ...  probes_analyzed  genes_analyzed
0     550248       113004  ...            21961           18959
1     550248       113005  ...            21961           18959
[2 rows x 12 columns]
>>> # baseline factors of the contrasts
>>> [x.summary for x in contrasts.baseline_factors]
[0    reference subject role
Name: summary, dtype: object, 0    reference subject role
Name: summary, dtype: object]
>>> # experimental factors of the contrasts
>>> [x.summary for x in contrasts.experimental_factors]
[0    bipolar disorder has_modifier manic phase
Name: summary, dtype: object, 0    bipolar disorder has_modifier euthymic phase
Name: summary, dtype: object]

Alternatively, since we are only looking at one dataset and one contrast manually, we can simply use :code:`readable_contrasts`.

>>> de = api.get_differential_expression_values('GSE46416',readable_contrasts = True)
>>> de = de[list(de.keys())[0]]
>>> print(de)
         Probe  ...  contrast_bipolar disorder has_modifier euthymic phase_pvalue
0      2982730  ...                                             0.7196
1      2787851  ...                                             0.4590
2      2477558  ...                                             0.2166
3      2910917  ...                                             0.3731
4      3983537  ...                                             0.0869
...        ...  ...                                                ...
21956  3301011  ...                                             0.1148
21957  2461654  ...                                             0.8393
21958  2360346  ...                                             0.0037
21959  2391172  ...                                             0.0920
21960  2525718  ...                                             0.4827
[21961 rows x 13 columns]
>>> # Classify probes for plotting
>>> de['diffexpr'] = 'No' # add extra column
>>> de.loc[(de['contrast_bipolar disorder has_modifier manic phase_logFoldChange'] > 1.0) &
...     (de['contrast_bipolar disorder has_modifier manic phase_pvalue'] < 0.05),'diffexpr'] = 'Up'
>>> de.loc[(de['contrast_bipolar disorder has_modifier manic phase_logFoldChange'] < -1.0) &
...     (de['contrast_bipolar disorder has_modifier manic phase_pvalue'] < 0.05),'diffexpr'] = 'Down'
>>> # Upregulated probes
>>> de_up = de[de['diffexpr']=='Up']
>>> de_up = de_up[['Probe','GeneSymbol', 'contrast_bipolar disorder has_modifier manic phase_pvalue',
...     'contrast_bipolar disorder has_modifier manic phase_logFoldChange']].sort_values(
...     'contrast_bipolar disorder has_modifier manic phase_pvalue')
>>> with pd.option_context('display.max_rows', None, 'display.max_columns', None):
...     print(de_up[:10])
         Probe GeneSymbol  \
18835  2319550       RBP7
4913   2548699     CYP1B1
11877  3907190       SLPI
6917   3629103      PCLAF
6188   3545525      SLIRP
2065   3146433      COX6C
407    2899102       H3C3
4839   2538349        NaN
18009  3635198     BCL2A1
18588  2633191      GPR15
       contrast_bipolar disorder has_modifier manic phase_pvalue  \
18835                                           0.000086
4913                                            0.000100
11877                                           0.000300
6917                                            0.000500
6188                                            0.000600
2065                                            0.000900
407                                             0.001300
4839                                            0.001300
18009                                           0.001800
18588                                           0.002400
       contrast_bipolar disorder has_modifier manic phase_logFoldChange
18835                                             1.0740
4913                                              1.3225
11877                                             1.0558
6917                                              1.2783
6188                                              1.3490
2065                                              1.4670
407                                               1.0260
4839                                              1.0731
18009                                             1.0798
18588                                             1.2046
>>> # Downregulated probes
>>> de_dn = de[de['diffexpr']=='Down']
>>> de_dn = de_dn[['Probe','GeneSymbol', 'contrast_bipolar disorder has_modifier manic phase_pvalue',
...     'contrast_bipolar disorder has_modifier manic phase_logFoldChange']].sort_values(
...     'contrast_bipolar disorder has_modifier manic phase_pvalue')
>>> with pd.option_context('display.max_rows', None, 'display.max_columns', None):
...     print(de_dn[:10])
         Probe GeneSymbol  \
18856  2775390        NaN
5641   3760268        NaN
18194  3124344        NaN
15046  3022689   SND1-IT1
10623  3245871      WDFY4
1742   3673179        NaN
9240   2679014        NaN
15821  3767230        NaN
9879   2880957        NaN
526    3336402      RBM14
       contrast_bipolar disorder has_modifier manic phase_pvalue  \
18856                                           0.000002
5641                                            0.000012
18194                                           0.000100
15046                                           0.000200
10623                                           0.000200
1742                                            0.000200
9240                                            0.000300
15821                                           0.000400
9879                                            0.000400
526                                             0.000400
       contrast_bipolar disorder has_modifier manic phase_logFoldChange
18856                                            -1.5558
5641                                             -1.8506
18194                                            -1.0370
15046                                            -1.2199
10623                                            -1.1569
1742                                             -1.0340
9240                                             -1.1752
15821                                            -1.0320
9879                                             -1.1755
526                                              -1.0711
>>> # Add gene symbols as labels to DE genes
>>> de['delabel'] = ''
>>> de.loc[de['diffexpr']!='No','delabel'] = de.loc[de['diffexpr']!='No','GeneSymbol']
>>> # Volcano plot for bipolar patients vs controls
>>> import numpy as np
>>> de['-log10(p-value)'] = -np.log10(de['contrast_bipolar disorder has_modifier manic phase_pvalue'])
>>> from plotnine import *
>>> plot=(ggplot(de)
...     +aes(
...         x='contrast_bipolar disorder has_modifier manic phase_logFoldChange',
...         y='-log10(p-value)',
...         color='diffexpr',
...         labels='delabel'
...     )
...     +geom_point()
...     +geom_hline(yintercept = -np.log10(0.05), color = "gray", linetype = "dashed")
...     +geom_vline(xintercept = (-1.0, 1.0), color = "gray", linetype = "dashed")
...     +labs(x = "log2(FoldChange)", y = "-log10(p-value)")
...     +scale_color_manual(values = ("blue", "black", "red"))+theme_minimal())
...
>>> plot.save('dea.png', height=6, width=10)

.. image:: _static/dea.png

Differentially-expressed genes in bipolar patients during manic phase versus controls.

Larger queries
--------------

To query large amounts of data, the API has a pagination system which uses the :code:`limit` and :code:`offset` parameters. To avoid overloading the server, calls are limited to a maximum of 100 entries, so the offset allows you to get the next batch of entries in the next call(s).
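
The offset arithmetic behind this can be sketched without contacting the server. In the example below, :code:`fetch_page` is a hypothetical stand-in for a paginated call that serves slices of a local list, and :code:`fetch_all` is likewise illustrative rather than a gemmapy function. The cap of 100 mirrors the limit described above, while the total of 250 records is arbitrary.

```python
def fetch_page(offset=0, limit=20, _records=tuple(range(250))):
    """Hypothetical stand-in for a paginated API call (not a gemmapy function).

    Returns one page of records together with the total number available.
    """
    limit = min(limit, 100)  # mimic the server-side cap of 100 entries per call
    page = list(_records[offset:offset + limit])
    return {"data": page, "total_elements": len(_records)}


def fetch_all(fetch, limit=100):
    """Collect every record by moving the offset past what was already received."""
    records = []
    total = None
    while total is None or len(records) < total:
        response = fetch(offset=len(records), limit=limit)
        total = response["total_elements"]
        if not response["data"]:
            break  # guard against looping forever if a page comes back empty
        records.extend(response["data"])
    return records


everything = fetch_all(fetch_page)
print(len(everything))
```

Using the number of records already received as the next offset keeps the loop correct even when the server returns fewer entries than requested.
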
To simplify the process of accessing all available data, gemmapy includes the :py:func:`~gemmapy.GemmaPy.get_all_pages` function, which can use the output from one page to make all the follow-up requests.

>>> api.get_all_pages(api.get_platforms).head()
   platform_ID  platform_short_name  ...  taxon_database_name  taxon_database_ID
0            1                GPL96  ...                 hg38               87.0
1            2              GPL1355  ...                  rn6               86.0
2            3              GPL1261  ...                 mm10               81.0
3            4               GPL570  ...                 hg38               87.0
4            5                GPL81  ...                 mm10               81.0
[5 rows x 13 columns]

An alternative way to access all pages is to do so manually. To see how many results are available, you can look at the :code:`attributes` component of the output objects, where additional information from the API response is appended.

>>> platform_count = api.get_platforms().attributes['total_elements']
>>> platform_count
637

You can then use :code:`offset` to access all available platforms.

>>> all_platforms = []
>>> for i in range(0,platform_count,100):
...     all_platforms.append(api.get_platforms(offset = i, limit = 100))
>>> pd.concat(all_platforms)
    platform_ID  platform_short_name  ...  taxon_database_name  taxon_database_ID
0             1                GPL96  ...                 hg38               87.0
1             2              GPL1355  ...                  rn6               86.0
2             3              GPL1261  ...                 mm10               81.0
3             4               GPL570  ...                 hg38               87.0
4             5                GPL81  ...                 mm10               81.0
..          ...                  ...  ...                  ...                ...
32         1327             GPL11044  ...                 mm10               81.0
33         1328              GPL7182  ...                 hg38               87.0
34         1330             GPL33408  ...                  rn6               86.0
35         1331             GPL32159  ...                 mm10               81.0
36         1332  014850_4x44K_Merged  ...                 hg38               87.0
[637 rows x 13 columns]

Many endpoints only support a single identifier:

>>> api.get_dataset_annotations(["GSE35974","GSE12649"])
...Error Traceback...

In these cases, you will have to loop over all the identifiers you wish to query and send separate requests:

>>> annots = {}
>>> for dataset in ["GSE35974","GSE12649"]:
...     annots[dataset] = api.get_dataset_annotations(dataset)
>>> annots
{'GSE35974':        class_name  ...   object_class
0  biological sex  ...    BioMaterial
1         disease  ...    FactorValue
2         disease  ...    FactorValue
3  biological sex  ...    BioMaterial
4       labelling  ...    BioMaterial
5         disease  ...    FactorValue
6   organism part  ...  ExperimentTag
[7 rows x 5 columns], 'GSE12649':       class_name  ...   object_class
0        disease  ...    FactorValue
1      labelling  ...    BioMaterial
2        disease  ...    FactorValue
3  organism part  ...  ExperimentTag
[4 rows x 5 columns]}

Raw endpoints
-------------

The previous version of gemmapy exposed raw API endpoints directly, while this version processes the API output into pandas DataFrames in most cases. The old raw outputs are still accessible, however, under the :code:`raw` component of the GemmaPy instance. In general these functions take the same arguments, with the exception of the :code:`taxa` and :code:`uris` arguments for anything that accepts a :code:`filter`, and the :code:`result_sets` argument of :py:func:`~gemmapy.GemmaPy.get_result_sets`.

>>> api.raw.get_datasets_by_ids([1])
{'data': [{'accession': 'GSE2018', 'batch_confound': None, 'batch_effect': 'NO_BATCH_EFFECT_SUCCESS', 'batch_effect_statistics': 'This data set may have a batch artifact ' '(PC 2), p=0.69477', 'bio_assay_count': 34, 'curation_note': None, 'description': 'Bronchoalveolar lavage samples collected from lung ' 'transplant recipients. Numeric portion of sample ' 'name is an arbitrary patient ID and AxBx number ' 'indicates the perivascular (A) and bronchiolar (B) ' 'scores from biopsies collected on the same day as ' 'the BAL fluid was collected. Several patients have ' 'more than one sample in this series and can be ' 'determined by patient number followed by a lower ' 'case letter. 
Acute rejection state is determined ' 'by the combined A and B score - specifically, a ' 'combined AB score of 2 or greater is considered an ' 'acute rejection.', 'external_database': 'GEO', 'external_database_uri': 'http://www.ncbi.nlm.nih.gov/geo/', 'external_uri': 'https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2018', 'geeq': {'batch_corrected': False, 'corr_mat_issues': '2', 'id': 557, 'no_vectors': False, 'public_quality_score': 0.9976340682316643, 'public_suitability_score': 0.875, 'q_score_batch_info': 1.0, 'q_score_outliers': 1.0, 'q_score_platforms_tech': 1.0, 'q_score_public_batch_confound': 1.0, 'q_score_public_batch_effect': 1.0, 'q_score_replicates': 1.0, 'q_score_sample_correlation_variance': 3.716509481654169e-05, 'q_score_sample_mean_correlation': 0.9822784200037354, 'q_score_sample_median_correlation': 0.9834384776216498, 'replicates_issues': '0', 's_score_avg_platform_popularity': 1.0, 's_score_avg_platform_size': 0.0, 's_score_missing_values': 1.0, 's_score_platform_amount': 1.0, 's_score_platform_tech_multi': 1.0, 's_score_publication': 1.0, 's_score_raw_data': 1.0, 's_score_sample_size': 1.0}, 'id': 1, 'last_needs_attention_event': {'_date': datetime.datetime(2018, 10, 2, 20, 57, 53, tzinfo=tzutc()), 'action': 'U', 'action_name': 'Update', 'detail': None, 'event_type_name': 'DoesNotNeedAttentionEvent', 'id': 25176178, 'note': 'Does not need attention.', 'performer': 'amansharma'}, 'last_note_update_event': None, 'last_troubled_event': None, 'last_updated': datetime.datetime(2024, 2, 10, 8, 31, 41, 417000, tzinfo=tzutc()), 'metadata': None, 'name': 'Human Lung Transplant - BAL', 'needs_attention': False, 'number_of_array_designs': 1, 'number_of_bio_assays': 34, 'number_of_processed_expression_vectors': 22283, 'short_name': 'GSE2018', 'source': '', 'taxon': {'common_name': 'human', 'external_database': {'description': 'Genome Reference ' 'Consortium Human ' 'GRCh38.p13 ' '(GCA_000001405.28)', 'external_databases': [{'description': None, 
'external_databases': [], 'id': 94, 'last_updated': datetime.datetime(2022, 6, 30, 7, 0, tzinfo=tzutc()), 'name': 'hg38 ' 'annotations', 'release_url': 'https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000001405.28/', 'release_version': 'GRCh38.p13', 'uri': 'https://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/'}, {'description': 'Annotations ' 'provided ' 'by ' 'NCBI ' 'Genome ' 'and ' 'used ' 'by ' 'the ' 'RNA-Seq ' 'pipeline ' 'for ' 'human ' 'data.', 'external_databases': [], 'id': 124, 'last_updated': datetime.datetime(2023, 1, 17, 20, 27, 55, 59000, tzinfo=tzutc()), 'name': 'hg38 ' 'RNA-Seq ' 'annotations', 'release_url': 'https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/110/', 'release_version': '110', 'uri': 'https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40/'}], 'id': 87, 'last_updated': datetime.datetime(2022, 6, 30, 7, 0, tzinfo=tzutc()), 'name': 'hg38', 'release_url': 'https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000001405.28/', 'release_version': 'GRCh38.p13', 'uri': 'https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38'}, 'id': 1, 'ncbi_id': 9606, 'scientific_name': 'Homo sapiens'}, 'taxon_id': 1, 'technology_type': None, 'trouble_details': 'No trouble details provided.', 'troubled': False}], 'filter': 'id = 1', 'group_by': ['id'], 'limit': 20, 'offset': 0, 'sort': {'direction': '+', 'order_by': 'id'}, 'total_elements': 1}
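
Raw outputs like the one above are deeply nested, so it is often convenient to pull out just a few fields. The sketch below assumes the response is available as plain nested dictionaries with the keys shown above; the objects returned through :code:`api.raw` may need converting first. Both :code:`summarize_raw` and the trimmed :code:`example_response` are illustrative and not part of gemmapy.

```python
def summarize_raw(response):
    """Reduce a raw datasets response to one small dictionary per dataset.

    Hypothetical helper: assumes plain nested dicts shaped like the raw
    output above; missing sections are reported as None.
    """
    rows = []
    for entry in response.get("data", []):
        geeq = entry.get("geeq") or {}    # quality scores, may be absent
        taxon = entry.get("taxon") or {}  # taxon details, may be absent
        rows.append({
            "short_name": entry.get("short_name"),
            "samples": entry.get("bio_assay_count"),
            "taxon": taxon.get("common_name"),
            "quality": geeq.get("public_quality_score"),
        })
    return rows


# trimmed copy of the response above, keeping only a handful of fields
example_response = {
    "data": [{
        "short_name": "GSE2018",
        "bio_assay_count": 34,
        "geeq": {"public_quality_score": 0.9976340682316643},
        "taxon": {"common_name": "human"},
    }],
    "total_elements": 1,
}

for row in summarize_raw(example_response):
    print(row)
```

Reading each field with :code:`get` keeps the helper tolerant of entries where a nested section such as :code:`geeq` is missing or :code:`None`.
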