Yep, you’re still a Bioinformatician working for a molecular diagnostics lab
The only thing is, you need to start thinking about doing these kinds of analyses again and again and again and again…
The lab says they might send 100s if not 1000s of samples through your approach
Prediction:
What’s a script?
Job #3: Automate your approaches in a Bash script
But first… what’s a script?
Kinda like a tool but you make it yourself
Really, it’s any assembly of code that accomplishes one or more tasks.
When you combine a bunch of tools that already exist, you might call it a pipeline
Similar to the pipe | we just learned about, this means outputs flowing into inputs for the next set of commands
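As a minimal sketch of that idea (the gene names are made-up sample input, not course data), three existing tools can be chained so that each one's output becomes the next one's input:

```shell
#!/usr/bin/env bash
# A tiny pipeline: printf emits lines, sort orders them, uniq drops repeats.
# The gene names are made-up sample input for illustration only.
unique_genes=$(printf 'TP53\nBRCA1\nTP53\n' | sort | uniq)

# Prints BRCA1 and TP53, one per line.
echo "$unique_genes"
```

Put the same commands in a file and you have the beginnings of a pipeline script.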
From command-line to command…code?
A bash script lives in a plain text file
by convention, the file name ends in .sh
and the file needs to be executable (more on this in a minute)
navigate into your /media/fileshare/ directory
and copy first_script.sh to BSCC_2023_dir/code/
So, what do we do with new files?
Take a look! Remember, # means the following is a comment
cat ../../code/first_script.sh
#!/usr/bin/env bash
# first_script.sh
# rreggiar@ucsc.edu
# 2022-07-18
script_name='first_script.sh' # variable_name = value
input_userID='1000'
echo "The name of this script is:" $script_name
# echo can print combinations of text and variables
echo "Your user ID is:" $input_userID
# some values, like PWD, are stored in 'global' variables
echo "Your present working directory is:" $PWD
# to execute cmdline tools, wrap them in $()
echo "The contents of $PWD are:" $(ls)
What do we do with scripts?
Execute them!
cd ../../code
./first_script.sh
bash: line 1: ./first_script.sh: Permission denied
uh oh…permission denied? this is our computer!!
We need to change the permissions on the file to allow execution
Quick aside: permissions for execution
Files are protected from being used incorrectly by permissions
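Here is a small sketch of how execute permission can be granted with chmod. It works on a throwaway file (hello.sh is a hypothetical name created just for this demo), so nothing in your course directory is touched:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is modified.
demo_dir=$(mktemp -d)
cat > "$demo_dir/hello.sh" <<'EOF'
#!/usr/bin/env bash
echo "hello from a script"
EOF

# A freshly created file is normally not executable (no x in the listing).
ls -l "$demo_dir/hello.sh"

# u+x adds execute permission for the file's owner (you).
chmod u+x "$demo_dir/hello.sh"
ls -l "$demo_dir/hello.sh"

# Now the script can be run by giving its path.
output=$("$demo_dir/hello.sh")
echo "$output"

rm -r "$demo_dir"
```

Compare the two ls -l lines: the second one should show an x in the owner's permissions.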
The shebang tells the computer we’re using bash and where to find it to run the script – try which bash on the command line
#!/usr/bin/env bash
The rest is just useful information
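To see what the shebang relies on, here is a quick sketch. It uses command -v, a shell builtin that does the same job as which; the path it prints depends on your machine, so no output is shown here:

```shell
#!/usr/bin/env bash
# Ask the shell where it would find bash on this machine.
bash_path=$(command -v bash)
echo "bash lives at: $bash_path"

# env looks a program up on the PATH and starts it, which is
# what "#!/usr/bin/env bash" asks the system to do for a script.
env bash -c 'echo "started through env"'
```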
Exploring first_script.sh: variable assignment
Code
grep '=' ../../code/first_script.sh
script_name='first_script.sh' # variable_name = value
input_userID='1000'
Assigning a variable is just variable=value, with no spaces around the =
Try: Run the code block above
Run echo $script_name on the command line, what do you get?
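A short sketch of the assignment rules, using hypothetical names (sample_id, out_file) that are not part of first_script.sh:

```shell
#!/usr/bin/env bash
# Correct: no spaces around '='.
sample_id='patient_10'   # hypothetical value, for illustration only

# Wrong: with spaces, bash tries to run a command called "sample_id"
# and hands it '=' and the value as arguments:
#   sample_id = 'patient_10'

# Read the value back with $name; double quotes keep it as one word.
echo "Sample: $sample_id"

# Braces mark where the variable name ends when text follows directly.
out_file="${sample_id}_results.csv"
echo "$out_file"
```

Without the braces, bash would look for a variable named sample_id_results, which does not exist.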
Exploring first_script.sh: operations
Code
tail -7 ../../code/first_script.sh
# echo can print combinations of text and variables
echo "Your user ID is:" $input_userID
# some values, like PWD, are stored in 'global' variables
echo "Your present working directory is:" $PWD
# to execute cmdline tools, wrap them in $()
echo "The contents of $PWD are:" $(ls)
Three echo commands, each using either a variable or a command along with text
Prediction:
Where could echo with text and variables be useful going forward?
Exploring first_script.sh: using cmd line tools
Code
tail -2 ../../code/first_script.sh
echo "The contents of $PWD are:" $(ls)
Sometimes we need to explicitly mark a tool for execution with $(); otherwise echo would just print the text ls here
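A minimal sketch of the difference, run in a throwaway directory with two made-up files so the listing is predictable:

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"

# Without $(), the word ls is printed as ordinary text.
echo "Contents:" ls

# With $(), bash runs ls first and substitutes its output.
echo "Contents:" $(ls "$demo_dir")

# The output can also be stored in a variable for later use.
listing=$(ls "$demo_dir")
echo "$listing"

rm -r "$demo_dir"
```

The first echo prints "Contents: ls"; the second prints "Contents: a.txt b.txt".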
Practice 10:
On the command line, run each echo line from first_script.sh. What do you get?
echo "The name of this script is:" $script_name
echo "Your user ID is:" $input_userID
echo "Your present working directory is:" $PWD
echo "The contents of $PWD are:" $(ls)
Why do some work and others don’t?
Practice 10 output:
1.
The name of this script is:
2.
Your user ID is:
3.
Your present working directory is: /Users/vikas/Documents/UCSC/teaching/ucsc_scbc_2022/code
4.
The contents of /Users/vikas/Documents/UCSC/teaching/ucsc_scbc_2022/code are: call_variant.sh first_script.sh process_lab_data.sh second_script.sh skeleton.sh
Reflection:
Why do you think the script is structured like this:
Shebang
Variable assignment
Operations
?
Prediction:
Since we can’t see the output of commands in a script like we can at the command line, how can we test our work to make sure it’s doing what we expect?
Remember…Job #3! (pt. 1)
Automate your gene and patient extraction in a Bash script
create a bash script, process_lab_data.sh, and open it in a text editor
make it executable
within process_lab_data.sh
write shebang/boilerplate code in the first lines
make a gene_db variable that stores the path to your gene_panel_database.fa
make a patient_db variable that stores the path to your patient_database/ directory
Job #3 (pt.2)
Test your code by introducing echo commands that:
print the paths to your data
print the contents of the patient_database directory
Output:
../../code/process_lab_data.sh: line 7: [: =: unary operator expected
initializing gene and patient databases...
gene database: /home/jovyan/SCBC_2022_dir/data/gene_panel_database.fa
patient database: /home/jovyan/SCBC_2022_dir/data/patient_database
ls: /home/jovyan/SCBC_2022_dir/data/patient_database: No such file or directory
patient database contents:
Job #3 (pt.3)
Introduce the commands that generate final_gene_panel.fa by operating on the variable you’ve set to gene_panel_database.fa
Introduce the commands that generate patient_data.csv by operating on the variable you’ve set to patient_database/ (output is just head -3)
patient_id,gene,age,sex
10,TP53,56,M
11,TP53,56,M
Reflection:
Compare my first_script.sh to your process_lab_data.sh – what’s different?
Job #3 (pt.4)
Add comments to your code to explain what you’re doing on each line
Add echo commands to report to the user which step the code is on
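As a rough sketch of what commented, self-reporting code can look like: the records below are made up and stand in for real data, and the steps are not the ones your process_lab_data.sh needs.

```shell
#!/usr/bin/env bash
# Made-up sample records standing in for real patient data.
records=$(printf '10,TP53\n11,TP53\n12,BRCA1\n')

# Step 1: say what is about to happen, then do it.
echo "step 1/2: selecting TP53 records..."
tp53_records=$(echo "$records" | grep 'TP53')

# Step 2: report again, then count the selected lines.
echo "step 2/2: counting selected records..."
n_records=$(echo "$tp53_records" | wc -l)

# Final report so the user can sanity-check the result.
echo "done:" $n_records "TP53 records found"
```

If a step fails or hangs, the last message printed tells you where to look.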
Prediction:
What’s something else our tool might be able to use that command line tools already use?
Defining script variables at the cmd line
Bash scripts have special values that correspond to positions on the command line:
$0 $1 $2 $3 …
that let us implement command line arguments
to see more, copy second_script.sh from /media/fileshare/
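Before opening second_script.sh, here is a small sketch of the idea. show_args.sh is a hypothetical script written only for this demo; it is started with bash directly, so no chmod is needed:

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)

# show_args.sh is a hypothetical script created just for this demo.
# The quoted 'EOF' keeps $0, $1, ... literal until the script itself runs.
cat > "$demo_dir/show_args.sh" <<'EOF'
#!/usr/bin/env bash
echo "\$0 (the command itself): $0"
echo "\$1 (first argument): $1"
echo "\$2 (second argument): $2"
echo "\$# (number of arguments): $#"
EOF

# Words after the script name fill $1, $2, ... in order.
output=$(bash "$demo_dir/show_args.sh" TP53 42)
echo "$output"

rm -r "$demo_dir"
```

Try changing TP53 and 42, or leaving one of them out, and watch which lines change.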
Checking out second_script.sh
cd ../../code
tail -16 second_script.sh # for some reason only this many lines fit
# 2022-07-18
cmd_recieved=$0 # $0 stores the command entered to the cmd line
script_name=$(basename $0) # basename extracts the last entry in a path
input_var=${1:-10} # $1 stores the first cmd line argument
# :-VAL sets VAL to the default value of the argument
print_info=${2:-"TRUE"} # $2 stores the second cmd line argument
if [ $print_info = "TRUE" ]; then
echo "command: " $cmd_recieved
# echo can print combinations of text and variables
echo "The name of this script is: " $script_name
# bash can do math inside $(())
echo $input_var / 2 = $((input_var/2))
fi
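One thing worth knowing about the $(( )) math used above: it works on whole numbers only. A minimal sketch with a made-up value:

```shell
#!/usr/bin/env bash
n=7                    # made-up value for illustration
half=$((n / 2))        # integer division: the remainder is discarded
remainder=$((n % 2))   # % gives the remainder instead

# Prints: 7 / 2 = 3 remainder 1
echo "$n / 2 = $half remainder $remainder"
```

So an odd input will not give a decimal result; bash simply drops the fraction.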
Running second_script.sh
remember to make second_script.sh executable
../../code/second_script.sh
command: ../../code/second_script.sh
The name of this script is: second_script.sh
10 / 2 = 5
../../code/second_script.sh 640 TRUE
command: ../../code/second_script.sh
The name of this script is: second_script.sh
640 / 2 = 320
../../code/second_script.sh 1200 FALSE
what does each command line argument appear to do?
Breaking down second_script.sh
The top bit looks the same except for minor details
Bash scripts have special values that correspond to positions on the command line:
$0 $1 $2 $3 …
Code
head -10 ../../code/second_script.sh | tail -5
cmd_recieved=$0 # $0 stores the command entered to the cmd line
script_name=$(basename $0) # basename extracts the last entry in a path
input_var=${1:-10} # $1 stores the first cmd line argument
# :-VAL sets VAL to the default value of the argument
print_info=${2:-"TRUE"} # $2 stores the second cmd line argument
input_var=${1:-10} # $1 stores the first cmd line argument
# :-VAL sets VAL to the default value of the argument
Here, ${1:-10} is being used to set a default value of 10 for the first positional value $1
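The behaviour of ${1:-10} can be sketched without a second file. This example uses set --, a builtin that replaces the current shell's positional arguments, to imitate three different command lines; the variable names are hypothetical:

```shell
#!/usr/bin/env bash
# 'set --' replaces the positional arguments of the current shell,
# which lets us imitate different command lines in one place.

set --                  # no arguments given
no_arg=${1:-10}
echo "no argument:   $no_arg"

set -- 640              # one argument given
with_arg=${1:-10}
echo "with argument: $with_arg"

set -- ''               # an empty argument also triggers the default
empty_arg=${1:-10}
echo "empty string:  $empty_arg"
```

The default of 10 is used in the first and third cases; only a non-empty argument overrides it.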
Practice 11:
Get second_script.sh to return 21436 as the output value
$2 is the second positional argument
Code
head -10 ../../code/second_script.sh | tail -1
print_info=${2:-"TRUE"} # $2 stores the second cmd line argument
As before, we set a default value of TRUE for $2. What happens if we change it?
$2 -> print_info operates in an if statement
Code
tail -8 ../../code/second_script.sh
if [ $print_info = "TRUE" ]; then
echo "command: " $cmd_recieved
# echo can print combinations of text and variables
echo "The name of this script is: " $script_name
# bash can do math inside $(())
echo $input_var / 2 = $((input_var/2))
fi
Another critical tool: IF statements
Much like for loops, if statements are multi-part commands that enable complex logic
if
[ condition to check for ]
then (equivalent to do)
do something
fi (equivalent to done)
if statements allow us to check something before running
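Here is a minimal sketch of the full if / then / else / fi shape, with hypothetical variables (n_samples, mode) chosen only for illustration. It also shows a habit worth adopting: keep variables inside [ ] in double quotes, because an unquoted empty variable makes the test fail with an error like "unary operator expected".

```shell
#!/usr/bin/env bash
# n_samples is a hypothetical value used only for this illustration.
n_samples=120

# -gt means "greater than" for whole numbers inside [ ].
if [ "$n_samples" -gt 100 ]; then
    verdict='large batch'
else
    verdict='small batch'
fi
echo "$n_samples samples: $verdict"

# For text, compare with = and keep the variable in double quotes
# so that an empty value does not break the test.
mode=''
if [ "$mode" = "TRUE" ]; then
    echo "mode is TRUE"
else
    echo "mode is something else (here: empty)"
fi
```

The else branch is optional; second_script.sh leaves it out and simply does nothing when the condition is false.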
Practice 12:
Try if on the command line
Change tmp_var around until you can figure out what condition we are “satisfying”