Commit ec50bada authored by John Lafin

Merge branch 'convert_input' into 'main'

Convert input

See merge request !1
parents 7977d530 77182e6e
@@ -3,6 +3,9 @@ workflow/images
workflow/*.html*
workflow/*.csv
test_data/*.rds
test_data/cxg_input.h5ad
*.DS_Store
.history
.nextflow.log*
......
@@ -16,20 +16,21 @@ Hosted tutorials to learn how to use CELLxGENE are available
## Parameters
-Input: An h5ad file containing processed single cell RNAseq data. For
-formatting requirements, see the CELLxGENE documentation [here](https://cellxgene.cziscience.com/docs/05__Annotate%20and%20Analyze%20Your%20Data/5_3__Preparing%20Data).
+Input: An h5ad or rds file containing processed single-cell RNA-seq data. For
+h5ad formatting requirements, see the CELLxGENE documentation [here](https://cellxgene.cziscience.com/docs/05__Annotate%20and%20Analyze%20Your%20Data/5_3__Preparing%20Data).
+rds files can contain processed Seurat or SingleCellExperiment objects; this
+workflow will attempt to convert them to h5ad before loading them into
+CELLxGENE.
## Output
-This workflow does not generate any guaranteed output. Optional outputs include:
-- Annotations file: a CSV file containing cell labels defined by the user during
-their CELLxGENE session.
-- Marker gene lists: maybe?
+This workflow generates an interactive single-cell RNA-seq exploration
+interface in the browser. No output files are generated.
## Test data
-The test data directory contains a small test h5ad file provided by CZ CELLxGENE.
+The test data directory contains a small test h5ad file provided by CZ
+CELLxGENE, as well as a script that uses SeuratData to create test rds files
+(Seurat and SingleCellExperiment objects).
## Questions
......
@@ -76,6 +76,8 @@ workflow_modules:
# Singularity supports different registries, please specify the protocol to use.
# Such as, "docker://", "shub://", "library://", etc. We encourage you to use the GitLab
# container registry of BioHPC to save and manage your container images.
workflow_containers:
- docker://git.biohpc.swmed.edu:5050/astrocyte/workflows/strand-lab/cellxgene/h5ad_convert:0.0.1
# A list of parameters used by the workflow, defining how to present them,
# options etc in the web interface. For each parameter:
@@ -115,10 +117,11 @@ workflow_parameters:
- id: input
type: file
required: true
-regex: ".*h5ad$"
+regex: "(?i).*(h5ad$|rds$)"
description: |
-An .h5ad file containing pre-processed single cell RNAseq data.
-For formatting requirements, see workflow documentation.
+An .h5ad or .rds file containing pre-processed single-cell RNA-seq data.
+.h5ad files should contain an annotated AnnData object. .rds files
+should contain an annotated Seurat or SingleCellExperiment object.
# -----------------------------------------------------------------------------
# SHINY APP CONFIGURATION
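The new `input` pattern `(?i).*(h5ad$|rds$)` can be exercised from a shell with `grep -Ei`, where `-i` stands in for the inline `(?i)` flag. This is a sketch that assumes the validator applies the pattern case-insensitively to the submitted file name; `accepts_input` and `accepts_input_strict` are hypothetical helpers, not part of Astrocyte. It highlights one property of the pattern: because no dot is required, a name such as `samplerds` is accepted, whereas a stricter `\.(h5ad|rds)$` would reject it.

```shell
# Emulate the input-parameter regex "(?i).*(h5ad$|rds$)" with POSIX ERE.
# Assumption: the validator matches case-insensitively against the file name;
# grep -i plays the role of the inline "(?i)" flag.
accepts_input() {
  printf '%s\n' "$1" | grep -Eiq '.*(h5ad$|rds$)'
}

# Hypothetical stricter variant: require a literal dot before the extension.
accepts_input_strict() {
  printf '%s\n' "$1" | grep -Eiq '\.(h5ad|rds)$'
}

# Compare both patterns on a few example names.
for name in pbmc3k.h5ad pbmc_seurat.RDS samplerds pbmc.loom; do
  printf '%-16s loose=%s strict=%s\n' "$name" \
    "$(accepts_input "$name" && echo yes || echo no)" \
    "$(accepts_input_strict "$name" && echo yes || echo no)"
done
```

Both variants accept upper-case extensions such as `.RDS`; only the loose one accepts `samplerds`.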
@@ -129,7 +132,7 @@ workflow_parameters:
# List of any full path of the containers
vizapp_containers:
-- docker://git.biohpc.swmed.edu:5050/astrocyte/workflows/strand-lab/cellxgene:latest
+- docker://git.biohpc.swmed.edu:5050/astrocyte/workflows/strand-lab/cellxgene/cellxgene:1.2.0
# List of any path of the scripts in 'your_astrocyte_workflow/vizapp' folder to run the containers
vizapp_container_runscripts:
......
@@ -6,12 +6,12 @@ to aid in reproducibility.
## Input
-To run this workflow, you must provide a pre-processed .h5ad file. The format
-requirements for this file are available [here](https://cellxgene.cziscience.com/docs/05__Annotate%20and%20Analyze%20Your%20Data/5_3__Preparing%20Data).
-CELLxGENE is not compatible with Seurat or SingleCellExperiment objects. These
-data types can be converted to h5ad files using the `sceasy` or `SeuratDisk`
-packages in R.
+To run this workflow, you must provide pre-processed scRNA-seq output. This workflow
+can accept an AnnData object saved as an h5ad file (formatting requirements are
+available [here](https://cellxgene.cziscience.com/docs/05__Annotate%20and%20Analyze%20Your%20Data/5_3__Preparing%20Data)),
+or an rds file containing a processed Seurat or SingleCellExperiment object. If
+an rds file is submitted, this workflow will attempt to convert it to h5ad
+using the `sceasy` R package.
## Credits
@@ -23,5 +23,8 @@ and at the following publication:
> sparse matrices. CZI Single-Cell Biology, et al.
> bioRxiv 2021.04.05; doi: https://doi.org/10.1101/2021.04.05.438318
+`sceasy` is an R package for the interconversion of single cell analysis formats.
+The source code is available [here](https://github.com/cellgeni/sceasy).
This workflow was designed and implemented with the guidance of experts at
BioHPC.
\ No newline at end of file
#!/usr/bin/env Rscript
library(Seurat)
library(SeuratData)
library(SingleCellExperiment)
# Requires the pbmc3k dataset to be installed first: InstallData("pbmc3k")
pbmc <- LoadData("pbmc3k", type = "pbmc3k.final")
pbmc_sce <- as.SingleCellExperiment(pbmc)
saveRDS(pbmc, "./pbmc_seurat.rds")
saveRDS(pbmc_sce, "./pbmc_sce.rds")
\ No newline at end of file
@@ -14,7 +14,7 @@ DIR=$( cd -P "$( dirname "$SOURCE" )" >/dev/null 2>&1 && pwd )
# Image settings
REPO="cellxgene"
-TAG="latest"
+TAG="1.2.0"
singularity_image=${REPO}-${TAG}.img
outputDir=${outputDir:-"${DIR}/../workflow/output/"}
......
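Because the runscript builds the local image file name from `REPO` and `TAG`, pinning `TAG` to `1.2.0` instead of `latest` also changes that file name. The sketch below reproduces the naming and the `${outputDir:-...}` fallback in isolation; the value of `DIR` is a hypothetical placeholder, since the real script resolves it from its own location.

```shell
# Sketch of how the vizapp runscript derives its image and output paths.
# DIR is a hypothetical stand-in for the script's resolved directory.
DIR="/opt/vizapp"
REPO="cellxgene"
TAG="1.2.0"
singularity_image=${REPO}-${TAG}.img
# ${var:-default} keeps a non-empty caller-supplied outputDir, else uses the default.
outputDir=${outputDir:-"${DIR}/../workflow/output/"}
echo "image:  $singularity_image"
echo "output: $outputDir"
```

Exporting `outputDir` before calling the runscript therefore overrides the default location.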
// Submit jobs to SLURM scheduler
process {
executor = 'slurm'
queue = 'super'
clusterOptions = '--hold --no-kill'
time = '4h'
}
// Enable singularity
singularity {
enabled = true
cacheDir = "${projectDir}/images/singularity"
}
// Remove temporary work files
cleanup = true
// This will also remove logs!
//cleanup = true
// Overwrite execution report files
// Required for nextflow version >22.10.0
trace.overwrite = true
dag.overwrite = true
timeline.overwrite = true
......
@@ -16,10 +16,11 @@
params.input = "${projectDir}/../test_data/pbmc3k.h5ad"
-// Copy input file
+// Detect input file type, then copy or convert it to h5ad
process setup {
stageInMode 'copy' //CxG cannot read symlinks
publishDir "${projectDir}/output", mode: 'copy'
+container 'git.biohpc.swmed.edu:5050/astrocyte/workflows/strand-lab/cellxgene/h5ad_convert:0.0.1'
input:
path input
@@ -28,14 +29,30 @@ process setup {
path "cxg_input.h5ad"
script:
-"""
-echo ""
-echo "This script is running on:"
-cat /etc/os-release
-echo ""
-cat $input > cxg_input.h5ad
-"""
+// If input is h5ad (any case, matching the case-insensitive input regex), copy it to cxg_input.h5ad
+if (input.getExtension().toLowerCase() == "h5ad") {
+"""
+cp ${input} cxg_input.h5ad
+"""
+}
+// If input is rds, convert it using the R script
+else if (input.getExtension().toLowerCase() == "rds") {
+"""
+source /h5ad_convert/entrypoint.sh
+Rscript ${projectDir}/scripts/prepare_h5ad.R $input
+"""
+}
+// Any other extension is unsupported
+else {
+"""
+echo "Error: Please submit either an .h5ad file, or an .rds file containing a Seurat or SingleCellExperiment object."
+exit 1
+"""
+}
}
workflow {
......
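The `setup` process chooses between copying and converting based on the input's file extension. The sketch below mirrors that decision in plain shell, lower-casing the extension first so that a name such as `sample.H5AD`, which the case-insensitive `input` regex admits, is treated like `.h5ad`. `plan_setup` is a hypothetical helper that only prints the action it would take, so it needs neither R nor the conversion container.

```shell
# Dry-run sketch of the extension-based dispatch in the `setup` process.
# Hypothetical helper: prints "copy" or "convert", or fails for other types.
plan_setup() {
  local input=$1
  local ext=${input##*.}   # text after the last dot (whole name if no dot)
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    h5ad) echo "copy" ;;
    rds)  echo "convert" ;;
    *)
      echo "Error: Please submit either an .h5ad file, or an .rds file containing a Seurat or SingleCellExperiment object." >&2
      return 1
      ;;
  esac
}

plan_setup pbmc3k.h5ad                    # prints: copy
plan_setup pbmc_seurat.RDS                # prints: convert
plan_setup pbmc.loom || echo "rejected"   # error on stderr, then: rejected
```

Unlike the loose parameter regex, this check keys on the text after the last dot, so a dotless name like `samplerds` is rejected here.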
# rds -> h5ad converter for CELLxGENE analysis
# Author: John T Lafin
# Load packages
library(Seurat)
library(sceasy)
library(reticulate)  # provides use_condaenv()
# Set up reticulate
use_condaenv("reticulate")
# Read in file: the first trailing argument is the path to the .rds input
args <- commandArgs(trailingOnly = TRUE)
if (length(args) < 1) {
  stop("Usage: Rscript prepare_h5ad.R <input.rds>")
}
obj <- readRDS(args[1])
# Convert Seurat object
if (inherits(obj, "Seurat")) {
  assay_name <- DefaultAssay(obj)
  # If the default assay is a Seurat v5 Assay5, convert it to a v3-style
  # Assay and rename it to "RNA"
  if (inherits(obj[[assay_name]], "Assay5")) {
    obj[["RNA3"]] <- as(obj[[assay_name]], Class = "Assay")
    DefaultAssay(obj) <- "RNA3"
    obj[[assay_name]] <- NULL
    obj <- RenameAssays(obj, RNA3 = "RNA")
  }
  # Convert object to h5ad
  convertFormat(obj,
                from = "seurat",
                to = "anndata",
                outFile = "cxg_input.h5ad")
} else if (inherits(obj, "SingleCellExperiment")) {
  # Convert SingleCellExperiment object
  convertFormat(obj,
                from = "sce",
                to = "anndata",
                outFile = "cxg_input.h5ad")
} else {
  # If neither Seurat nor SCE, error
  stop("Provided rds file does not contain a compatible Seurat or SingleCellExperiment object.")
}
\ No newline at end of file
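The converter writes `cxg_input.h5ad` via `convertFormat`. Since an .h5ad file is an HDF5 container, one cheap post-conversion sanity check is to look for the 8-byte HDF5 format signature at the start of the file. This is a hypothetical helper, not part of the workflow; it assumes the file has no HDF5 user block (which would move the signature to offset 512, 1024, and so on), and it only shows that the output is HDF5, not that it is a valid AnnData object.

```shell
# Hypothetical check: HDF5 files normally begin with 89 48 44 46 0d 0a 1a 0a.
# Assumption: no HDF5 user block, so the signature sits at byte offset 0.
looks_like_hdf5() {
  local sig
  sig=$(head -c 8 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "894844460d0a1a0a" ]
}

if looks_like_hdf5 cxg_input.h5ad; then
  echo "cxg_input.h5ad has an HDF5 signature"
else
  echo "cxg_input.h5ad is missing or is not an HDF5 file" >&2
fi
```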