Commit cc5511b8 authored 6 months ago by John Lafin
Add interpretation recommendations
parent d8a83119
Showing 2 changed files with 120 additions and 0 deletions:
README.md: 60 additions, 0 deletions
docs/index.md: 60 additions, 0 deletions
README.md (+60, −0)
@@ -107,6 +107,66 @@ This workflow outputs the following:
- markers directory: A directory containing CSV files for cluster markers at
  each resolution.
## Using the output
This workflow outputs several plots and types of data. Here are some suggestions
on how to interpret and use them.
### Assessing filtering
The default filtering applied here is intentionally conservative: we prefer to keep
more than we remove. To assess how effective the filtering step was, compare the
pre- and post-filter plots. Check the distributions of nCount_RNA (total counts
per cell) and nFeature_RNA (unique genes per cell) for enrichment at the high
and low ends; the filtering step should remove these tails. If the cutoff lines
fall beyond the bounds of the data, that's OK: no cells were removed at that end.
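
To put a number on the pre/post comparison described above, one option is to measure the fraction of cells that fall outside the cutoffs before and after filtering. This is a minimal sketch, not part of the workflow: the function name, the example values, and the cutoffs are all hypothetical, and it assumes you have exported a per-cell metric such as nFeature_RNA to a plain list.

```python
def fraction_outside(values, low, high):
    """Return the fraction of cells whose metric is below `low` or above `high`."""
    if not values:
        raise ValueError("values must not be empty")
    outside = sum(1 for v in values if v < low or v > high)
    return outside / len(values)


# Hypothetical per-cell nFeature_RNA values (not real workflow output).
pre_filter = [150, 900, 1200, 1500, 1800, 2100, 2500, 9000]
post_filter = [900, 1200, 1500, 1800, 2100, 2500]

# Hypothetical cutoffs; read the real ones off the cutoff lines in the plots.
low_cutoff, high_cutoff = 200, 6000

print("pre-filter: ", fraction_outside(pre_filter, low_cutoff, high_cutoff))
print("post-filter:", fraction_outside(post_filter, low_cutoff, high_cutoff))
```

A post-filter fraction near zero alongside a small pre-filter fraction is consistent with conservative filtering that trims only the tails.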
The trend plot is a good summary of several critical QC metrics. Pre-filtering,
an scRNA sample might have a cluster of cells at the lower left of the plot
with a high percentage of mitochondrial gene expression. Because snRNA data should
lack mitochondrial gene expression, for these samples you may see a similar
cluster without a high percentage of mitochondrial gene expression. After filtering,
this population should mostly disappear, with most of the cells lining up with
the linear regression line.
### Assessing pre-processing
If you have multiple samples, you can assess the effect of batch correction by
comparing the PCA plot to the Harmony plot. Look for evidence of a batch effect
in the PCA plot (visible clusters enriched for particular samples or batch
variables). This effect should be reduced in the Harmony plot (e.g., clusters from
different samples should shift together). If you don't see a batch effect in the
PCA plot, you may consider running the workflow again without batch correction.
You can also look at the UMAP plot grouped by sample or batch variable to assess
how well batch correction worked.
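
As a rough numeric companion to the visual check above, you could score how evenly samples are mixed within each cluster. The sketch below computes a normalized Shannon entropy of sample labels per cluster (0 means a cluster comes from a single sample, 1 means all samples are equally represented). It is an illustrative assumption, not something the workflow computes: the function name and example labels are hypothetical, and the score is easiest to interpret when samples contribute similar numbers of cells.

```python
import math
from collections import Counter, defaultdict


def mixing_entropy(clusters, samples):
    """Per-cluster normalized Shannon entropy of sample labels.

    0.0 = cluster contains cells from one sample only,
    1.0 = all samples are equally represented in the cluster.
    """
    if len(clusters) != len(samples):
        raise ValueError("clusters and samples must have the same length")
    n_samples = len(set(samples))
    by_cluster = defaultdict(Counter)
    for cluster, sample in zip(clusters, samples):
        by_cluster[cluster][sample] += 1

    result = {}
    for cluster, counts in by_cluster.items():
        total = sum(counts.values())
        # Shannon entropy, written with log(total / n) so a pure cluster gives 0.0.
        entropy = sum((n / total) * math.log(total / n) for n in counts.values())
        result[cluster] = entropy / math.log(n_samples) if n_samples > 1 else 0.0
    return result


# Hypothetical cluster and sample labels, one entry per cell.
clusters = ["0", "0", "0", "0", "1", "1", "1", "1"]
samples = ["A", "B", "A", "B", "A", "A", "A", "A"]
print(mixing_entropy(clusters, samples))
```

Clusters with entropy near 0 after batch correction are either sample-specific biology or residual batch effect; the number alone cannot tell the two apart, so check marker genes before drawing conclusions.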
Next, look at the elbow plot. The vertical line indicates the dimensionality
selected. This selection should be early in the 'plateau', to retain as much
variation as possible without keeping noise. A value between 10 and 50 is
usually sufficient.
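
To sanity-check the selected dimensionality, one simple heuristic is to find the first principal component after which the standard deviation barely drops. The sketch below is an assumption for illustration only: it is not how this workflow picks its value, and the function name, threshold, and example standard deviations are hypothetical. Noisy real curves can trigger it too early, so treat the result as a cross-check on the plot rather than a replacement for it.

```python
def pick_dims(stdevs, min_drop=0.05):
    """Return the 1-based PC count at which the elbow curve flattens.

    The curve is considered flat at the first PC whose drop to the next PC
    is smaller than `min_drop` times the standard deviation of PC 1.
    """
    if len(stdevs) < 2:
        raise ValueError("need at least two principal components")
    threshold = min_drop * stdevs[0]
    for i in range(len(stdevs) - 1):
        if stdevs[i] - stdevs[i + 1] < threshold:
            return i + 1
    # No plateau found within the PCs provided.
    return len(stdevs)


# Hypothetical per-PC standard deviations (PC 1 first).
stdevs = [10.0, 7.0, 5.0, 3.5, 3.3, 3.2, 3.1, 3.0]
print(pick_dims(stdevs))
```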
Clustering is run at 10 different resolutions by default (higher resolution =
more clusters). Selecting the proper resolution for your data can be challenging.
The clustree plot will show you cluster definitions at each resolution, and
how they change over the range. Use this plot to search for a resolution where
you see some stability in cluster assignment. From this starting point, you
can look at UMAP plots and marker lists to assess whether this resolution
appears to capture the biological populations you expect to find.
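
The search for a stable resolution can also be approximated numerically by comparing cluster assignments at adjacent resolutions. The sketch below uses the adjusted Rand index (ARI), where 1.0 means two clusterings are identical; a run of high values between neighboring resolutions suggests a stable range. This is an illustrative assumption rather than part of the workflow: the function names and example assignments are hypothetical, and it assumes you have per-cell cluster labels for each resolution in the same cell order. Note that ARI also drops when a cluster splits cleanly, so the clustree plot is still needed to tell splits from reshuffling.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    # Count cell pairs that are co-clustered in both, in A, and in B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = in_a * in_b / comb(n, 2)
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        # Only happens when both clusterings are the same trivial partition.
        return 1.0
    return (both - expected) / (maximum - expected)


def adjacent_stability(assignments):
    """ARI between each pair of neighboring resolutions, keyed by (lower, higher)."""
    resolutions = sorted(assignments)
    return {
        (lo, hi): adjusted_rand_index(assignments[lo], assignments[hi])
        for lo, hi in zip(resolutions, resolutions[1:])
    }


# Hypothetical cluster labels per resolution, one entry per cell.
assignments = {
    0.2: [0, 0, 0, 0, 1, 1, 1, 1],
    0.4: [0, 0, 0, 0, 1, 1, 1, 1],
    0.6: [0, 0, 2, 2, 1, 1, 3, 3],
}
for pair, ari in adjacent_stability(assignments).items():
    print(pair, round(ari, 3))
```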
Finally, examine the doublet score plot. Any cluster or subcluster that shows
an enrichment of doublet scores may be made up of doublets. Data from these
clusters should be viewed with skepticism, or the clusters may be removed
from analysis entirely. For scRNA data, the same can be done with the stress
score plot: clusters enriched for cells with a high stress score may be a sign
of transcriptional changes associated with enzymatic digestion.
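
One way to make "enrichment" concrete is to flag clusters in which a large share of cells exceed a score cutoff. The sketch below is a hedged illustration, not the workflow's method: the function name, both thresholds, and the example scores are hypothetical, and it assumes scores on a 0 to 1 scale exported alongside per-cell cluster labels. The same function could be applied to stress scores.

```python
from collections import defaultdict


def flag_enriched_clusters(clusters, scores, score_cutoff=0.5, min_fraction=0.5):
    """Return sorted cluster labels where at least `min_fraction` of cells
    have a score above `score_cutoff`."""
    if len(clusters) != len(scores):
        raise ValueError("clusters and scores must have the same length")
    grouped = defaultdict(list)
    for cluster, score in zip(clusters, scores):
        grouped[cluster].append(score)
    flagged = []
    for cluster, values in grouped.items():
        high = sum(1 for v in values if v > score_cutoff)
        if high / len(values) >= min_fraction:
            flagged.append(cluster)
    return sorted(flagged)


# Hypothetical cluster labels and doublet scores, one entry per cell.
clusters = ["0", "0", "0", "1", "1", "1", "2", "2"]
scores = [0.05, 0.10, 0.20, 0.70, 0.80, 0.30, 0.10, 0.15]
print(flag_enriched_clusters(clusters, scores))
```

Flagged clusters are candidates for closer inspection, not automatic removal; the appropriate cutoffs depend on how the scores were calculated.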
## Next steps
The data generated by this workflow is directly compatible with the CZ CELLxGENE
Astrocyte workflow. Provide the `processed_object.rds` file as input to that
workflow for an interactive exploration of your data. Please see that workflow's
documentation for more information.
## Notes
Note that although this workflow calculates stress and doublet scores, it does
...
docs/index.md (+60, −0)
@@ -107,6 +107,66 @@ This workflow outputs the following:
The lines added to docs/index.md are identical to those added to README.md.