Commit cc5511b8 authored by John Lafin

Add interpretation recommendations

parent d8a83119
@@ -107,6 +107,66 @@ This workflow outputs the following:
- markers directory: A directory containing CSV files for cluster markers at
each resolution.
## Using the output
This workflow outputs several plots and types of data. Here are some suggestions
on how to interpret and use them.
### Assessing filtering
The default filtering applied here is intentionally conservative: we prefer to keep
more than we remove. To assess how effective the filtering step was, compare the
pre- and post-filter plots. Check the distributions of nCount_RNA (total counts
per cell) and nFeature_RNA (unique genes per cell) for an enrichment of cells at
the high and low ends. The filtering step should remove these. If the cutoff lines
fall beyond the bounds of the data, that's OK; it means that cutoff removed no cells.
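
As a rough check on how conservative a pair of cutoffs is, you can compute the fraction of cells that falls outside them. The sketch below is illustrative only: the function name, the example counts, and the cutoff values are all made up, and it does not reproduce the workflow's own filtering code.

```python
def fraction_outside(values, low, high):
    """Return the fractions of values below `low` and above `high`."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < low)
    above = sum(1 for v in values if v > high)
    return below / n, above / n


# Hypothetical per-cell total counts (nCount_RNA) and made-up cutoffs.
n_count_rna = [150, 900, 1200, 1500, 2100, 2500, 3000, 4200, 5000, 60000]
below, above = fraction_outside(n_count_rna, low=500, high=20000)
print(f"below cutoff: {below:.0%}, above cutoff: {above:.0%}")
```

If both fractions are zero, the cutoff lines sit beyond the bounds of the data and no cells were removed on that metric.
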
The trend plot is a good summary of several critical QC metrics. Pre-filtering,
an scRNA sample might have a cluster of cells at the lower left end of the
plot with high percent mitochondrial gene expression. Because snRNA data should
lack mitochondrial gene expression, for these samples you may see a similar
cluster without high percent mitochondrial gene expression. After filtering,
this population should mostly disappear, with most of the cells lining up with
the linear regression line.
### Assessing pre-processing
If you have multiple samples, you can assess the effect of batch correction by
comparing the PCA plot to the Harmony plot. Look for evidence of a batch effect
in the PCA plot (visible clusters enriched for particular samples or batch
variables). This effect should be reduced in the Harmony plot (e.g., clusters from
different samples should shift together). If you don't see a batch effect in the
PCA plot, you may consider running the workflow again without batch correction.
You can also look at the UMAP plot grouped by sample or batch variable to assess
how well batch correction worked.
Next, look at the elbow plot. The vertical line indicates the dimensionality
selected. This selection should fall early in the 'plateau' to retain as much
variation as possible without keeping noise. A value between 10 and 50 is usually
sufficient.
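
To make "early in the plateau" concrete, here is one simple heuristic: pick the first principal component after which the standard deviation drops by less than a small relative amount. This is an assumption for illustration, not the rule the workflow uses to draw the vertical line; the function name, the `min_drop` threshold, and the example values are hypothetical.

```python
def pick_dimensionality(stdevs, min_drop=0.05):
    """Return the 1-based index of the first PC whose standard deviation
    is followed by a relative drop smaller than `min_drop`.

    Falls back to the last PC if no such point exists.
    """
    for i in range(len(stdevs) - 1):
        drop = (stdevs[i] - stdevs[i + 1]) / stdevs[i]
        if drop < min_drop:
            return i + 1
    return len(stdevs)


# Made-up standard deviations for the first eight PCs.
stdevs = [10.0, 7.0, 5.0, 4.0, 3.5, 3.4, 3.35, 3.3]
print(pick_dimensionality(stdevs))
```

A single small drop can occur by chance, so treat the result as a starting point and confirm it against the elbow plot itself.
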
Clustering is run at 10 different resolutions by default (higher resolution =
more clusters). Selecting the proper resolution for your data can be challenging.
The clustree plot will show you cluster definitions at each resolution, and
how they change over the range. Use this plot to search for a resolution where
you see some stability in cluster assignment. From this starting point, you
can look at UMAP plots and marker lists to assess whether this resolution
appears to capture the biological populations you expect to find.
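
One way to put a number on the stability you see in the clustree plot is to ask, for each cluster at a higher resolution, what fraction of its cells comes from a single cluster at the next lower resolution. The sketch below assumes you have exported two per-cell cluster label vectors in the same cell order; the function name and the example labels are hypothetical.

```python
from collections import Counter, defaultdict


def parent_purity(low_res, high_res):
    """For each higher-resolution cluster, return the largest fraction of
    its cells that share one lower-resolution cluster."""
    if len(low_res) != len(high_res):
        raise ValueError("label vectors must have the same length")
    parents = defaultdict(Counter)
    for parent, child in zip(low_res, high_res):
        parents[child][parent] += 1
    return {
        child: max(counts.values()) / sum(counts.values())
        for child, counts in parents.items()
    }


# Made-up labels for eight cells at two adjacent resolutions.
low = ["0", "0", "0", "0", "1", "1", "1", "1"]
high = ["0", "0", "1", "1", "2", "2", "2", "1"]
for cluster, purity in sorted(parent_purity(low, high).items()):
    print(cluster, round(purity, 2))
```

Values close to 1 mean a cluster is a clean split of one parent; lower values mean cells are being reshuffled between clusters, which suggests the assignment is less stable at that resolution.
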
Finally, examine the doublet score plot. Any cluster or subcluster that shows
an enrichment of doublet scores may be made up of doublets. Data from these
clusters should be viewed with skepticism, or the clusters may be removed
from analysis entirely. For scRNA data, the same can be done with the stress
score plot: clusters enriched for cells with a high stress score may be a sign
of transcriptional changes associated with enzymatic digestion.
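
The same check can be done numerically by summarising the score per cluster. The sketch below flags clusters whose median score exceeds a threshold; the function name, the threshold, and the example scores are assumptions for illustration, and a sensible threshold depends on how the scores are scaled in your data.

```python
from collections import defaultdict
from statistics import median


def flag_clusters(clusters, scores, threshold):
    """Return per-cluster median scores and the clusters whose median
    exceeds `threshold`."""
    by_cluster = defaultdict(list)
    for cluster, score in zip(clusters, scores):
        by_cluster[cluster].append(score)
    medians = {c: median(v) for c, v in by_cluster.items()}
    flagged = sorted(c for c, m in medians.items() if m > threshold)
    return medians, flagged


# Made-up cluster labels and doublet scores for nine cells.
clusters = ["0", "0", "0", "1", "1", "1", "2", "2", "2"]
scores = [0.05, 0.10, 0.08, 0.60, 0.70, 0.55, 0.12, 0.09, 0.40]
medians, flagged = flag_clusters(clusters, scores, threshold=0.3)
print(medians)
print(flagged)
```

Using the median rather than the mean keeps a few high-scoring cells from flagging an otherwise ordinary cluster. The same function works for stress scores.
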
## Next steps
The data generated by this workflow is directly compatible with the CZ CELLxGENE
Astrocyte workflow. Provide the `processed_object.rds` file as input to that
workflow for an interactive exploration of your data. Please see its
documentation for more information.
## Notes
Note that although this workflow calculates stress and doublet scores, it does
......