Shell-2: Run Programs, Interact with their Output
1 Get the SWC files
1.1 We use git as a file retriever.
git is the name of a "command line" program. Git's project management concepts will be discussed elsewhere.
cd
cd GIT/github
git clone https://github.com/oulib-swc/ou_swc_files.git
cd ou_swc_files
1.2 ls: Inspect those files
- list the items in the top level directory (ls)
- lists items
ls
gapminder inflammation Users
- I need "-F" because color coding is lost in this output
ls -F
gapminder/ inflammation/ Users/
- show hidden items
ls -Fa
./ ../ gapminder/ .git/ inflammation/ Users/
- show detailed information
ls -Fla
total 24
drwxrwxr-x  6 pauljohn pauljohn 4096 Nov 13 10:49 ./
drwxrwxr-x 10 pauljohn pauljohn 4096 Nov 14 19:20 ../
drwxrwxr-x  3 pauljohn pauljohn 4096 Nov 13 10:49 gapminder/
drwxrwxr-x  8 pauljohn pauljohn 4096 Nov 14 14:00 .git/
drwxrwxr-x  4 pauljohn pauljohn 4096 Nov 13 10:49 inflammation/
drwxrwxr-x  5 pauljohn pauljohn 4096 Nov 13 10:49 Users/
- List contents, including one level below
ls -F *
gapminder:
data/

inflammation:
data/ python/

Users:
imhotep/ larry/ nelle/
- List files recursively with -R
ls -FR
.: gapminder/ inflammation/ Users/ ./gapminder: data/ ./gapminder/data: gapminder_all.csv gapminder_gdp_africa.csv gapminder_gdp_americas.csv gapminder_gdp_asia.csv gapminder_gdp_europe.csv gapminder_gdp_oceania.csv ./inflammation: data/ python/ ./inflammation/data: inflammation-01.csv inflammation-02.csv inflammation-03.csv inflammation-04.csv inflammation-05.csv inflammation-06.csv inflammation-07.csv inflammation-08.csv inflammation-09.csv inflammation-10.csv inflammation-11.csv inflammation-12.csv small-01.csv small-02.csv small-03.csv ./inflammation/python: argv-list.py arith.py check.py count-stdin.py errors_01.py errors_02.py gen-inflammation.py line-count.py my_ls.py readings-01.py readings-02.py readings-03.py readings-04.py readings-05.py readings-06.py readings-07.py readings-08.py readings-09.py rectangle.py sys-version.py ./Users: imhotep/ larry/ nelle/ ./Users/imhotep: ./Users/larry: ./Users/nelle: creatures/ data/ Desktop/ molecules/ north-pacific-gyre/ notes.txt pizza.cfg solar.pdf writing/ ./Users/nelle/creatures: basilisk.dat unicorn.dat ./Users/nelle/data: amino-acids.txt animals.txt elements/ morse.txt pdb/ planets.txt salmon.txt sunspot.txt ./Users/nelle/data/elements: Ac.xml Ag.xml Al.xml Am.xml Ar.xml As.xml At.xml Au.xml Ba.xml Be.xml Bi.xml Bk.xml Br.xml B.xml Ca.xml Cd.xml Ce.xml Cf.xml Cl.xml Cm.xml Co.xml Cr.xml Cs.xml Cu.xml C.xml Dy.xml Er.xml Es.xml Eu.xml Fe.xml Fm.xml Fr.xml F.xml Ga.xml Gd.xml Ge.xml He.xml Hf.xml Hg.xml Ho.xml H.xml In.xml Ir.xml I.xml Kr.xml K.xml La.xml Li.xml Lr.xml Lu.xml Md.xml Mg.xml Mn.xml Mo.xml Na.xml Nb.xml Nd.xml Ne.xml Ni.xml No.xml Np.xml N.xml Os.xml O.xml Pa.xml Pb.xml Pd.xml Pm.xml Po.xml Pr.xml Pt.xml Pu.xml P.xml Ra.xml Rb.xml Re.xml Rh.xml Rn.xml Ru.xml Sb.xml Sc.xml Se.xml Si.xml Sm.xml Sn.xml Sr.xml S.xml Ta.xml Tb.xml Tc.xml Te.xml Th.xml Ti.xml Tl.xml Tm.xml U.xml V.xml W.xml Xe.xml Yb.xml Y.xml Zn.xml Zr.xml ./Users/nelle/data/pdb: aldrin.pdb ammonia.pdb ascorbic-acid.pdb benzaldehyde.pdb 
camphene.pdb cholesterol.pdb cinnamaldehyde.pdb citronellal.pdb codeine.pdb cubane.pdb cyclobutane.pdb cyclohexanol.pdb cyclopropane.pdb ethane.pdb ethanol.pdb ethylcyclohexane.pdb glycol.pdb heme.pdb lactic-acid.pdb lactose.pdb lanoxin.pdb lsd.pdb maltose.pdb menthol.pdb methane.pdb methanol.pdb mint.pdb morphine.pdb mustard.pdb nerol.pdb norethindrone.pdb octane.pdb pentane.pdb piperine.pdb propane.pdb pyridoxal.pdb quinine.pdb strychnine.pdb styrene.pdb sucrose.pdb testosterone.pdb thiamine.pdb tnt.pdb tuberin.pdb tyrian-purple.pdb vanillin.pdb vinyl-chloride.pdb vitamin-a.pdb ./Users/nelle/Desktop: ./Users/nelle/molecules: cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb records.txt ./Users/nelle/north-pacific-gyre: 2012-07-03/ output/ ./Users/nelle/north-pacific-gyre/2012-07-03: goodiff goostats NENE01729A.txt NENE01729B.txt NENE01736A.txt NENE01751A.txt NENE01751B.txt NENE01812A.txt NENE01843A.txt NENE01843B.txt NENE01971Z.txt NENE01978A.txt NENE01978B.txt NENE02018B.txt NENE02040A.txt NENE02040B.txt NENE02040Z.txt NENE02043A.txt NENE02043B.txt ./Users/nelle/north-pacific-gyre/output: ./Users/nelle/writing: data/ haiku.txt old/ thesis/ tools/ ./Users/nelle/writing/data: one.txt two.txt ./Users/nelle/writing/old: ./Users/nelle/writing/thesis: empty-draft.md ./Users/nelle/writing/tools: format old/ stats ./Users/nelle/writing/tools/old: oldtool
1.3 Use du to check disk space used
By default du reports sizes in blocks, a unit that is not very readable for humans, so add the "-h" (human-readable) flag to the du command
du -h
84K ./inflammation/python 112K ./inflammation/data 200K ./inflammation 88K ./gapminder/data 92K ./gapminder 4.0K ./.git/refs/tags 8.0K ./.git/refs/remotes/origin 12K ./.git/refs/remotes 8.0K ./.git/refs/heads 28K ./.git/refs 8.0K ./.git/info 4.0K ./.git/branches 44K ./.git/hooks 8.0K ./.git/logs/refs/remotes/origin 12K ./.git/logs/refs/remotes 8.0K ./.git/logs/refs/heads 24K ./.git/logs/refs 32K ./.git/logs 4.0K ./.git/objects/info 192K ./.git/objects/pack 200K ./.git/objects 364K ./.git 4.0K ./Users/larry 4.0K ./Users/nelle/Desktop 212K ./Users/nelle/data/pdb 416K ./Users/nelle/data/elements 736K ./Users/nelle/data 12K ./Users/nelle/creatures 32K ./Users/nelle/molecules 140K ./Users/nelle/north-pacific-gyre/2012-07-03 4.0K ./Users/nelle/north-pacific-gyre/output 148K ./Users/nelle/north-pacific-gyre 4.0K ./Users/nelle/writing/thesis 4.0K ./Users/nelle/writing/old 28K ./Users/nelle/writing/data 4.0K ./Users/nelle/writing/tools/old 16K ./Users/nelle/writing/tools 60K ./Users/nelle/writing 1.1M ./Users/nelle 4.0K ./Users/imhotep 1.1M ./Users 1.7M .
1.3.1 Want less detail? Add the "--max-depth" flag
du -h --max-depth=1
200K ./inflammation
92K ./gapminder
364K ./.git
1.1M ./Users
1.7M .
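To experiment with du without touching the lesson files, here is a minimal sketch that builds a throwaway directory tree first. The directory and file names are made up for illustration, and "--max-depth" is the GNU spelling of the flag (on BSD or macOS, "-d 1" is the likely equivalent). The sizes printed will depend on your file system.

```shell
# Build a small throwaway tree (hypothetical names) to experiment on.
scratch=$(mktemp -d)
mkdir -p "$scratch/alpha" "$scratch/beta/deeper"
echo "some text" > "$scratch/alpha/a.txt"
echo "more text" > "$scratch/beta/deeper/b.txt"

cd "$scratch"
# One line for each directory at depth 1, plus one line for "." itself.
summary=$(du -h --max-depth=1)
echo "$summary"
# -s ("summarize") prints a single total for the named directory.
total=$(du -sh .)
echo "$total"

# Go back to where we started and clean up.
cd - > /dev/null
rm -rf "$scratch"
```

The "beta/deeper" directory is counted in the size of "./beta" but does not get a line of its own, because it sits two levels down.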
2 Inspect Contents (cat, head, tail)
2.1 The North Pacific Gyre data
2.1.1 cd into directory that has some of Nelle's data
cd Users/nelle/north-pacific-gyre/2012-07-03
ls -la
total 144 drwxrwxr-x 2 pauljohn pauljohn 4096 Nov 14 22:06 . drwxrwxr-x 4 pauljohn pauljohn 4096 Nov 14 22:46 .. -rw-rw-r-- 1 pauljohn pauljohn 184 Nov 13 10:49 goodiff -rw-rw-r-- 1 pauljohn pauljohn 198 Nov 13 10:49 goostats -r--r--r-- 1 pauljohn pauljohn 4406 Nov 13 10:49 NENE01729A.txt -r--r--r-- 1 pauljohn pauljohn 43 Nov 13 14:50 NENE01729B.txt -r--r--r-- 1 pauljohn pauljohn 4371 Nov 13 10:49 NENE01736A.txt -r--r--r-- 1 pauljohn pauljohn 4411 Nov 13 10:49 NENE01751A.txt -r--r--r-- 1 pauljohn pauljohn 4409 Nov 13 10:49 NENE01751B.txt -r--r--r-- 1 pauljohn pauljohn 4401 Nov 13 10:49 NENE01812A.txt -r--r--r-- 1 pauljohn pauljohn 4395 Nov 13 10:49 NENE01843A.txt -r--r--r-- 1 pauljohn pauljohn 4375 Nov 13 10:49 NENE01843B.txt -r--r--r-- 1 pauljohn pauljohn 4372 Nov 13 10:49 NENE01971Z.txt -r--r--r-- 1 pauljohn pauljohn 4381 Nov 13 10:49 NENE01978A.txt -r--r--r-- 1 pauljohn pauljohn 4389 Nov 13 10:49 NENE01978B.txt -r--r--r-- 1 pauljohn pauljohn 3517 Nov 13 10:49 NENE02018B.txt -r--r--r-- 1 pauljohn pauljohn 4391 Nov 13 10:49 NENE02040A.txt -r--r--r-- 1 pauljohn pauljohn 4367 Nov 13 10:49 NENE02040B.txt -r--r--r-- 1 pauljohn pauljohn 4381 Nov 13 10:49 NENE02040Z.txt -r--r--r-- 1 pauljohn pauljohn 4386 Nov 13 10:49 NENE02043A.txt -r--r--r-- 1 pauljohn pauljohn 4393 Nov 13 10:49 NENE02043B.txt
2.2 cat: "concatenate" files and write on standard output
"cat goodiff" is manageable output
cat goodiff
# difference of two input files
# demo version, just return a random number or "files are identical"
if [ "$1" == "$2" ]
then
    echo "files are identical"
else
    echo 0.$RANDOM
fi
This is a "shell script", a series of commands cobbled together.
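A shell script like this can be run by handing it to bash, followed by the arguments it should see as $1 and $2. The sketch below recreates a stand-in copy of the script in a scratch directory (the name "goodiff-demo" is made up) so it can be run anywhere.

```shell
# Recreate a goodiff-like script in a scratch directory (a stand-in, not the lesson file).
scratch=$(mktemp -d)
cat > "$scratch/goodiff-demo" <<'EOF'
# demo version, just return a random number or "files are identical"
if [ "$1" == "$2" ]
then
    echo "files are identical"
else
    echo 0.$RANDOM
fi
EOF

# Hand the script to bash, followed by the two arguments it reads as $1 and $2.
same=$(bash "$scratch/goodiff-demo" NENE01729A.txt NENE01729A.txt)
different=$(bash "$scratch/goodiff-demo" NENE01729A.txt NENE01729B.txt)
echo "$same"
echo "$different"

rm -rf "$scratch"
```

Note that this demo script only compares the two names it is given. It never opens the files, which is why it works here even though those files do not exist in the scratch directory.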
2.3 head and tail: checking contents of big files
2.3.1 head
If we simply run "cat NENE02040B.txt" to see what's in there, everything will scroll by on the screen very quickly. One way to deal with that is to look only at the top part of the file.
head NENE02040B.txt
0.616254506154 0.283755587068 0.156990583983 0.404143324251 1.40467049524 0.563505688711 4.04569329033 1.58230309459 0.438038008849 1.24649230763
Head defaults to display 10 lines, but perhaps I only need to see the first 5.
head -n5 NENE02040B.txt
0.616254506154 0.283755587068 0.156990583983 0.404143324251 1.40467049524
Here is an example of the long and short styles of command line argument. The short argument is "-n" with no equal sign; the long version, "--lines", takes an equal sign:
head --lines=5 NENE02040B.txt
0.616254506154 0.283755587068 0.156990583983 0.404143324251 1.40467049524
2.3.2 tail: check the last (default: 10) lines
tail NENE02040B.txt
1.1069459452 0.073897931368 0.0755146936238 0.609976382121 0.106432564 0.485084647673 2.98671436729 1.13033139062 0.518031268789 0.788386986395
tail -n3 NENE02040B.txt
1.13033139062 0.518031268789 0.788386986395
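To practice head and tail on a file whose contents are easy to predict, here is a small sketch that generates its own data. The file name "numbers.txt" is made up, and seq is assumed to be available (it is on most Linux and Mac systems).

```shell
# Generate a 20-line practice file: seq prints 1 through 20, one number per line.
scratch=$(mktemp -d)
seq 1 20 > "$scratch/numbers.txt"

# First three lines and last two lines of the file.
first_three=$(head -n 3 "$scratch/numbers.txt")
last_two=$(tail -n 2 "$scratch/numbers.txt")
echo "$first_three"
echo "$last_two"

rm -rf "$scratch"
```

Because the file simply counts from 1 to 20, it is easy to confirm that head returned 1, 2, 3 and tail returned 19, 20.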
2.4 more and less
For our purposes, more and less are interchangeable: both show a file one screen at a time. more is the older program; less was written later and adds features such as scrolling backward. Some systems have one, some systems have both. Need to scan an entire file?
Running "cat" will spew out the results too fast. Some terminals are able to scroll back in history, but these are not always available.
2.4.1 Run the more program to see "one screen at a time".
more NENE02040B.txt
Space bar to see next screen
q to quit
3 Executable Path
Question: Why didn't we have to type "/usr/bin/git"?
3.1 Launch a program by name, including all directory structure
$ /usr/bin/git
or
$ /usr/bin/du
We don't usually have to do that for the most frequently used programs in the shell.
3.2 Enter the PATH
PATH: a list of special directories where the shell looks for executable programs.
Here's my path
echo $PATH
/home/pauljohn/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/pauljohn/bin
On many computers, there will be 100s or 1000s of programs available. Many are in the executable path, but some that you might expect to find there are not.
My path has the "bin" directory in my user account, plus lots of others that come with the OS.
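The PATH is easier to read with one directory per line. The sketch below uses a pipe (explained in section 4) to send the PATH through tr, which swaps each colon for a newline. It then asks the shell where it finds a particular program; ls is used only as an example.

```shell
# Print each PATH directory on its own line; tr swaps ':' for a newline.
echo "$PATH" | tr ':' '\n'

# Ask the shell where it would find a program with this name.
ls_location=$(command -v ls)
echo "ls is found at: $ls_location"
```

The shell searches the PATH directories from left to right and runs the first match, so the order of the directories matters when two programs share a name.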
3.2.1 Note what Git Bash does
On Git Bash in Windows: the style of the path is different from what you might see in Windows description of itself (to see what I mean, run the program "cmd" and type "echo %PATH%".)
3.3 Text versus GUI programs
- Text based terminal programs "stay in the terminal".
- GUI programs can be "launched" onto desktop
3.3.1 On a Linux/Unix system, simply typing a GUI program's name will launch it on the screen.
3.3.2 On Macintosh:
The open function
$ open file-name-or-URL
$ open -a program-name file-name-or-URL
If you don't include "-a program-name" then Mac uses the default program to open the file-name-or-URL
See: http://brettterpstra.com/2014/08/06/shell-tricks-the-os-x-open-command/
3.3.3 Windows
Git Bash will launch Windows programs that are in the Windows System Path!
2 examples, with and without the special "start" program.
$ notepad whatever1.txt
$ start notepad whatever2.txt
Usually I'd just do this, which opens the Windows file manager on the current directory. I believe it is preferable to interact with projects in this way.
$ explorer .
3.4 What about programs that are not on the PATH?
- We can type out their names in full, beginning with "/"
- We can use relative file paths (the "./" trick).
- We add them to the path (either temporarily IN the shell or permanently in the OS setup).
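The three options above can be sketched with a tiny stand-in program. Everything here is hypothetical: "mytool" is a made-up script created in a scratch directory, and the PATH change lasts only for the current shell session.

```shell
# Create a tiny script in a scratch directory that is not on the PATH.
scratch=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mytool"\n' > "$scratch/mytool"
chmod +x "$scratch/mytool"

# 1. Type the full name, beginning with "/".
by_full_name=$("$scratch/mytool")

# 2. Use a relative path: cd to the directory and use "./".
cd "$scratch"
by_relative=$(./mytool)
cd - > /dev/null

# 3. Add the directory to the PATH, for this shell session only.
old_path=$PATH
PATH="$scratch:$PATH"
by_path=$(mytool)
PATH=$old_path

echo "$by_full_name"
echo "$by_relative"
echo "$by_path"

rm -rf "$scratch"
```

Making the change permanent means putting a line like the PATH assignment into a shell startup file or the operating system's settings, which differs from system to system.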
3.4.1 We are swimming upstream, unfortunately
The trend in Windows and Macintosh has been to NOT put programs in the PATH. Both of them have created an alternative model where programs are installed and they notify the operating system about themselves. These systems have a "desktop" metaphor where users can
- Launch from Menu
- Launch from "open with" feature in file manager
3.5 Need to add some directories to the PATH, probably…
Terminal users find it inconvenient when important programs are not in the path. For convenience, it is necessary to add program folders to the system PATH.
In Windows, when we install Git, Notepad++, Emacs, R, and so forth, we always say YES if the installer offers to put the programs in the path, and if we are not asked, then we do it manually.
4 Programs talk to each other
4.1 The Pipe "|"
Many of the traditional Unix programs are built so that the output of one program can "go into" another one: the pipe sends the "standard output" of the first program as "standard input" to the one that follows. Many programs, but not all, are designed this way.
4.2 Programs I associate with back end of the pipe
- wc counts lines or words
- sort sorts output alphabetically
- uniq keeps unique items (sort first)
- grep filters (looks for text strings)
These are still quite frequently used by text analysts.
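Here is a minimal sketch of sort, uniq, and wc at the back end of a pipe. The list of names is made up, and printf is used only to produce a few lines of input.

```shell
# A short made-up list with repeats, one item per line.
items=$(printf 'pentane\nethane\npentane\ncubane\nethane\n')

# sort puts duplicates next to each other; uniq then drops the repeats.
unique_items=$(echo "$items" | sort | uniq)
echo "$unique_items"

# Add wc -l at the end of the pipe to count the distinct items.
count=$(echo "$items" | sort | uniq | wc -l)
echo "$count"
```

uniq only removes repeats that are on adjacent lines, which is why sort comes first.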
4.3 Let's look at user nelle's files on molecules
cd Users/nelle/molecules
ls -la
total 36 drwxrwxr-x 2 pauljohn pauljohn 4096 Nov 14 13:17 . drwxrwxr-x 8 pauljohn pauljohn 4096 Nov 13 10:49 .. -rw-rw-r-- 1 pauljohn pauljohn 1158 Nov 13 10:49 cubane.pdb -rw-rw-r-- 1 pauljohn pauljohn 622 Nov 13 10:49 ethane.pdb -rw-rw-r-- 1 pauljohn pauljohn 422 Nov 13 10:49 methane.pdb -rw-rw-r-- 1 pauljohn pauljohn 1828 Nov 13 10:49 octane.pdb -rw-rw-r-- 1 pauljohn pauljohn 1226 Nov 13 10:49 pentane.pdb -rw-rw-r-- 1 pauljohn pauljohn 825 Nov 13 10:49 propane.pdb -rw-rw-r-- 1 pauljohn pauljohn 110 Nov 14 18:17 records.txt
These are small files, we might as well look at one:
more cubane.pdb
:::::::::::::: cubane.pdb :::::::::::::: COMPND CUBANE AUTHOR DAVE WOODCOCK 95 12 06 ATOM 1 C 1 0.789 -0.852 0.504 1.00 0.00 ATOM 2 C 1 -0.161 -1.104 -0.624 1.00 0.00 ATOM 3 C 1 -1.262 -0.440 0.160 1.00 0.00 ATOM 4 C 1 -0.289 -0.202 1.284 1.00 0.00 ATOM 5 C 1 1.203 0.513 -0.094 1.00 0.00 ATOM 6 C 1 0.099 1.184 0.694 1.00 0.00 ATOM 7 C 1 -0.885 0.959 -0.460 1.00 0.00 ATOM 8 C 1 0.236 0.283 -1.269 1.00 0.00 ATOM 9 H 1 1.410 -1.631 0.942 1.00 0.00 ATOM 10 H 1 -0.262 -2.112 -1.024 1.00 0.00 ATOM 11 H 1 -2.224 -0.925 0.328 1.00 0.00 ATOM 12 H 1 -0.468 -0.501 2.315 1.00 0.00 ATOM 13 H 1 2.224 0.892 -0.134 1.00 0.00 ATOM 14 H 1 0.240 2.112 1.251 1.00 0.00 ATOM 15 H 1 -1.565 1.730 -0.831 1.00 0.00 ATOM 16 H 1 0.472 0.494 -2.315 1.00 0.00 TER 17 1 END
If the file were bigger, we might just scan the top or the bottom 5 lines (using head and tail)
head -5 cubane.pdb
COMPND CUBANE AUTHOR DAVE WOODCOCK 95 12 06 ATOM 1 C 1 0.789 -0.852 0.504 1.00 0.00 ATOM 2 C 1 -0.161 -1.104 -0.624 1.00 0.00 ATOM 3 C 1 -1.262 -0.440 0.160 1.00 0.00
tail -5 cubane.pdb
ATOM 14 H 1 0.240 2.112 1.251 1.00 0.00 ATOM 15 H 1 -1.565 1.730 -0.831 1.00 0.00 ATOM 16 H 1 0.472 0.494 -2.315 1.00 0.00 TER 17 1 END
4.4 The wc program
How many lines are there in the file cubane.pdb?
wc -l cubane.pdb
20 cubane.pdb
How many lines are there in the pdb files?
wc *.pdb
  20  156 1158 cubane.pdb
  12   84  622 ethane.pdb
   9   57  422 methane.pdb
  30  246 1828 octane.pdb
  21  165 1226 pentane.pdb
  15  111  825 propane.pdb
 107  819 6081 total
3 results for each file:
- newlines
- words
- byte count
Usually I just want the number of lines, so I add the "-l" flag.
wc -l *.pdb
  20 cubane.pdb
  12 ethane.pdb
   9 methane.pdb
  30 octane.pdb
  21 pentane.pdb
  15 propane.pdb
 107 total
The results are out of order, pipe them to the sort function:
wc -l *.pdb | sort
 107 total
  12 ethane.pdb
  15 propane.pdb
  20 cubane.pdb
  21 pentane.pdb
  30 octane.pdb
   9 methane.pdb
The results are still out of order, so we need to think harder (read the help page). sort defaults to an alphabetical sort; we need a numerical sort:
wc -l *.pdb | sort -n
   9 methane.pdb
  12 ethane.pdb
  15 propane.pdb
  20 cubane.pdb
  21 pentane.pdb
  30 octane.pdb
 107 total
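The difference between the two sorts is easiest to see on a few bare numbers. This is a small sketch with made-up input:

```shell
numbers=$(printf '9\n12\n107\n20\n')

# Default sort compares character by character, so "107" comes before "12".
as_text=$(echo "$numbers" | sort)
# -n compares by numeric value instead.
as_numbers=$(echo "$numbers" | sort -n)

echo "$as_text"
echo "$as_numbers"
```

The first sort prints 107, 12, 20, 9, because "1" comes before "2" and "2" comes before "9" as characters. The second prints 9, 12, 20, 107.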
4.5 ">" and ">>" for redirecting output
The results (so far) have been printed on the screen. We may need a record, so we write them in a file.
> will ERASE a pre-existing file's content, or create a new file.
>> will add output to the end of a pre-existing file, or create a new file.
Try that with the sorted line list:
wc -l *.pdb | sort -n > records.txt
cat records.txt
   9 methane.pdb
  12 ethane.pdb
  15 propane.pdb
  20 cubane.pdb
  21 pentane.pdb
  30 octane.pdb
 107 total
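To see the difference between the two redirection operators without risking a real file, here is a sketch that works in a scratch directory. The file name "records-demo.txt" is made up.

```shell
scratch=$(mktemp -d)
log="$scratch/records-demo.txt"

echo "first run" > "$log"      # the file does not exist yet, so it is created
echo "second run" >> "$log"    # appended as a second line
after_append=$(cat "$log")

echo "third run" > "$log"      # overwrites: the two earlier lines are gone
after_overwrite=$(cat "$log")

echo "$after_append"
echo "$after_overwrite"

rm -rf "$scratch"
```

After the append the file holds two lines. After the final ">" it holds only "third run".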
4.6 Pipe to more or less, for example
Any time output goes by too fast, put "| more" on the end.
I do that so often that I almost never run more or less as the primary command.
I often find myself tacking one of them onto the end of the command line, as in either
- "cat file1 file2 | more"
- "cat file1 file2 | less"
5 grep is for Filtering
grep = "globally search for a regular expression and print" (the name comes from the ed editor command g/re/p)
It can be used in 2 ways
- A command you run in the command line
List all lines that match a target string
I use that to find out "in which file is there a certain word?"
Check the AUTHOR line in each pdb file:
grep AUTHOR *.pdb
cubane.pdb:AUTHOR DAVE WOODCOCK 95 12 06
ethane.pdb:AUTHOR DAVE WOODCOCK 95 12 18
methane.pdb:AUTHOR DAVE WOODCOCK 95 12 18
octane.pdb:AUTHOR DAVE WOODCOCK 96 01 05
pentane.pdb:AUTHOR DAVE WOODCOCK 95 12 18
propane.pdb:AUTHOR DAVE WOODCOCK 95 12 18
- Suppose we run some command that causes 100s and 100s of lines to spill out into the terminal. I want to keep only the ones that have the part I want.
A contrived example, running "cat *.pdb" will spill all of the files.
Pipe that to grep to just keep the ones that have "COMPND"
cat *.pdb | grep COMPND
COMPND CUBANE
COMPND ETHANE
COMPND METHANE
COMPND OCTANE
COMPND PENTANE
COMPND PROPANE
5.0.1 What is "regular expression"
It is a fancy language for text sub-string matching. Regular expression syntax is used a great deal in more advanced shell and programming exercises. Regular expressions give a comprehensive scheme to isolate parts of a string, to pick and choose among sub-pieces.
I don't want to teach that now, but can give the big picture.
regular expression cheat sheet
- ^ beginning of a string
- $ end of a string
- . any character
- * quantifier meaning "any number of times", so ".*" matches whole string
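Each symbol in the cheat sheet can be tried on a few lines of text. In this sketch the input lines are made up to resemble records in a molecule file, and the patterns are in single quotes so the shell passes "$" and "*" to grep untouched.

```shell
lines=$(printf 'ATOM 1 C\nHETATM 2 H\nEND\nTER 17 1\n')

starts_with_atom=$(echo "$lines" | grep '^ATOM')   # ^ anchors the match at the start of the line
ends_with_one=$(echo "$lines" | grep '1$')         # $ anchors the match at the end of the line
t_any_r=$(echo "$lines" | grep 'T.R')              # . stands for any one character
a_then_h=$(echo "$lines" | grep 'A.*H')            # .* stands for any run of characters

echo "$starts_with_atom"
echo "$ends_with_one"
echo "$t_any_r"
echo "$a_then_h"
```

The first pattern keeps only "ATOM 1 C", the second and third keep only "TER 17 1", and the last keeps only "HETATM 2 H".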
Suppose we want to keep only the lines out of the molecule files that begin with "ATOM". Here is the regular expression recipe I would use with grep:
grep "^ATOM" *.pdb
If I run that, it will fill up my terminal with output, so I'll pipe the result to tail so we see just the last 10 lines:
grep "^ATOM.*" *.pdb | tail
propane.pdb:ATOM 2 C 1 -0.011 -0.441 0.333 1.00 0.00 propane.pdb:ATOM 3 C 1 -1.176 0.296 -0.332 1.00 0.00 propane.pdb:ATOM 4 H 1 1.516 0.699 -0.675 1.00 0.00 propane.pdb:ATOM 5 H 1 2.058 -0.099 0.827 1.00 0.00 propane.pdb:ATOM 6 H 1 1.035 1.354 0.913 1.00 0.00 propane.pdb:ATOM 7 H 1 -0.283 -0.691 1.359 1.00 0.00 propane.pdb:ATOM 8 H 1 0.204 -1.354 -0.225 1.00 0.00 propane.pdb:ATOM 9 H 1 -0.914 0.551 -1.359 1.00 0.00 propane.pdb:ATOM 10 H 1 -1.396 1.211 0.219 1.00 0.00 propane.pdb:ATOM 11 H 1 -2.058 -0.345 -0.332 1.00 0.00
grep has many options. For example, if we do not want the file name in front of each line, add "-h":
grep -h "^ATOM.*" *.pdb | tail
ATOM 2 C 1 -0.011 -0.441 0.333 1.00 0.00 ATOM 3 C 1 -1.176 0.296 -0.332 1.00 0.00 ATOM 4 H 1 1.516 0.699 -0.675 1.00 0.00 ATOM 5 H 1 2.058 -0.099 0.827 1.00 0.00 ATOM 6 H 1 1.035 1.354 0.913 1.00 0.00 ATOM 7 H 1 -0.283 -0.691 1.359 1.00 0.00 ATOM 8 H 1 0.204 -1.354 -0.225 1.00 0.00 ATOM 9 H 1 -0.914 0.551 -1.359 1.00 0.00 ATOM 10 H 1 -1.396 1.211 0.219 1.00 0.00 ATOM 11 H 1 -2.058 -0.345 -0.332 1.00 0.00
In the Unix system, there are many programs designed for the further manipulation of these outputs. In case you ever wander into a help page that suggests you use programs like "tr", "sed", "perl" or such, you will know (vaguely) what they are talking about.
5.0.2 Get out of jail free card for grep users
In some chores, the power to designate "at the beginning of a string" is not needed.
The flag "-F" allows us to use grep as a text scanner, without worrying about regular expressions.
The word ATOM is matched no matter where it is in the line.
grep -F "ATOM" *.pdb
cubane.pdb:ATOM 1 C 1 0.789 -0.852 0.504 1.00 0.00 cubane.pdb:ATOM 2 C 1 -0.161 -1.104 -0.624 1.00 0.00 cubane.pdb:ATOM 3 C 1 -1.262 -0.440 0.160 1.00 0.00 cubane.pdb:ATOM 4 C 1 -0.289 -0.202 1.284 1.00 0.00 cubane.pdb:ATOM 5 C 1 1.203 0.513 -0.094 1.00 0.00 cubane.pdb:ATOM 6 C 1 0.099 1.184 0.694 1.00 0.00 cubane.pdb:ATOM 7 C 1 -0.885 0.959 -0.460 1.00 0.00 cubane.pdb:ATOM 8 C 1 0.236 0.283 -1.269 1.00 0.00 cubane.pdb:ATOM 9 H 1 1.410 -1.631 0.942 1.00 0.00 cubane.pdb:ATOM 10 H 1 -0.262 -2.112 -1.024 1.00 0.00 cubane.pdb:ATOM 11 H 1 -2.224 -0.925 0.328 1.00 0.00 cubane.pdb:ATOM 12 H 1 -0.468 -0.501 2.315 1.00 0.00 cubane.pdb:ATOM 13 H 1 2.224 0.892 -0.134 1.00 0.00 cubane.pdb:ATOM 14 H 1 0.240 2.112 1.251 1.00 0.00 cubane.pdb:ATOM 15 H 1 -1.565 1.730 -0.831 1.00 0.00 cubane.pdb:ATOM 16 H 1 0.472 0.494 -2.315 1.00 0.00 ethane.pdb:ATOM 1 C 1 -0.752 0.001 -0.141 1.00 0.00 ethane.pdb:ATOM 2 C 1 0.752 -0.001 0.141 1.00 0.00 ethane.pdb:ATOM 3 H 1 -1.158 0.991 0.070 1.00 0.00 ethane.pdb:ATOM 4 H 1 -1.240 -0.737 0.496 1.00 0.00 ethane.pdb:ATOM 5 H 1 -0.924 -0.249 -1.188 1.00 0.00 ethane.pdb:ATOM 6 H 1 1.158 -0.991 -0.070 1.00 0.00 ethane.pdb:ATOM 7 H 1 0.924 0.249 1.188 1.00 0.00 ethane.pdb:ATOM 8 H 1 1.240 0.737 -0.496 1.00 0.00 methane.pdb:ATOM 1 C 1 0.257 -0.363 0.000 1.00 0.00 methane.pdb:ATOM 2 H 1 0.257 0.727 0.000 1.00 0.00 methane.pdb:ATOM 3 H 1 0.771 -0.727 0.890 1.00 0.00 methane.pdb:ATOM 4 H 1 0.771 -0.727 -0.890 1.00 0.00 methane.pdb:ATOM 5 H 1 -0.771 -0.727 0.000 1.00 0.00 octane.pdb:ATOM 1 C 1 -4.397 0.370 -0.255 1.00 0.00 octane.pdb:ATOM 2 C 1 -3.113 -0.447 -0.421 1.00 0.00 octane.pdb:ATOM 3 C 1 -1.896 0.386 -0.007 1.00 0.00 octane.pdb:ATOM 4 C 1 -0.611 -0.426 -0.198 1.00 0.00 octane.pdb:ATOM 5 C 1 0.608 0.405 0.216 1.00 0.00 octane.pdb:ATOM 6 C 1 1.892 -0.400 0.001 1.00 0.00 octane.pdb:ATOM 7 C 1 3.113 0.429 0.414 1.00 0.00 octane.pdb:ATOM 8 C 1 4.397 -0.374 0.199 1.00 0.00 octane.pdb:ATOM 9 H 1 -4.502 0.681 0.785 1.00 0.00 octane.pdb:ATOM 10 H 1 -5.254 
-0.243 -0.537 1.00 0.00 octane.pdb:ATOM 11 H 1 -4.357 1.252 -0.895 1.00 0.00 octane.pdb:ATOM 12 H 1 -3.009 -0.741 -1.467 1.00 0.00 octane.pdb:ATOM 13 H 1 -3.172 -1.337 0.206 1.00 0.00 octane.pdb:ATOM 14 H 1 -1.992 0.668 1.044 1.00 0.00 octane.pdb:ATOM 15 H 1 -1.849 1.286 -0.621 1.00 0.00 octane.pdb:ATOM 16 H 1 -0.515 -0.707 -1.248 1.00 0.00 octane.pdb:ATOM 17 H 1 -0.659 -1.326 0.417 1.00 0.00 octane.pdb:ATOM 18 H 1 0.520 0.671 1.270 1.00 0.00 octane.pdb:ATOM 19 H 1 0.645 1.314 -0.386 1.00 0.00 octane.pdb:ATOM 20 H 1 1.979 -0.666 -1.054 1.00 0.00 octane.pdb:ATOM 21 H 1 1.855 -1.309 0.604 1.00 0.00 octane.pdb:ATOM 22 H 1 3.030 0.696 1.467 1.00 0.00 octane.pdb:ATOM 23 H 1 3.155 1.337 -0.188 1.00 0.00 octane.pdb:ATOM 24 H 1 4.493 -0.641 -0.854 1.00 0.00 octane.pdb:ATOM 25 H 1 4.368 -1.282 0.801 1.00 0.00 octane.pdb:ATOM 26 H 1 5.254 0.230 0.498 1.00 0.00 pentane.pdb:ATOM 1 C 1 2.484 -0.389 0.322 1.00 0.00 pentane.pdb:ATOM 2 C 1 1.261 0.350 -0.243 1.00 0.00 pentane.pdb:ATOM 3 C 1 -0.027 -0.348 0.199 1.00 0.00 pentane.pdb:ATOM 4 C 1 -1.249 0.421 -0.326 1.00 0.00 pentane.pdb:ATOM 5 C 1 -2.536 -0.311 0.047 1.00 0.00 pentane.pdb:ATOM 6 H 1 2.471 -1.420 -0.033 1.00 0.00 pentane.pdb:ATOM 7 H 1 2.443 -0.371 1.412 1.00 0.00 pentane.pdb:ATOM 8 H 1 3.393 0.112 -0.016 1.00 0.00 pentane.pdb:ATOM 9 H 1 1.324 0.350 -1.332 1.00 0.00 pentane.pdb:ATOM 10 H 1 1.271 1.378 0.122 1.00 0.00 pentane.pdb:ATOM 11 H 1 -0.074 -0.384 1.288 1.00 0.00 pentane.pdb:ATOM 12 H 1 -0.048 -1.362 -0.205 1.00 0.00 pentane.pdb:ATOM 13 H 1 -1.183 0.500 -1.412 1.00 0.00 pentane.pdb:ATOM 14 H 1 -1.259 1.420 0.112 1.00 0.00 pentane.pdb:ATOM 15 H 1 -2.608 -0.407 1.130 1.00 0.00 pentane.pdb:ATOM 16 H 1 -2.540 -1.303 -0.404 1.00 0.00 pentane.pdb:ATOM 17 H 1 -3.393 0.254 -0.321 1.00 0.00 propane.pdb:ATOM 1 C 1 1.241 0.444 0.349 1.00 0.00 propane.pdb:ATOM 2 C 1 -0.011 -0.441 0.333 1.00 0.00 propane.pdb:ATOM 3 C 1 -1.176 0.296 -0.332 1.00 0.00 propane.pdb:ATOM 4 H 1 1.516 0.699 -0.675 1.00 0.00 propane.pdb:ATOM 5 H 1 
2.058 -0.099 0.827 1.00 0.00 propane.pdb:ATOM 6 H 1 1.035 1.354 0.913 1.00 0.00 propane.pdb:ATOM 7 H 1 -0.283 -0.691 1.359 1.00 0.00 propane.pdb:ATOM 8 H 1 0.204 -1.354 -0.225 1.00 0.00 propane.pdb:ATOM 9 H 1 -0.914 0.551 -1.359 1.00 0.00 propane.pdb:ATOM 10 H 1 -1.396 1.211 0.219 1.00 0.00 propane.pdb:ATOM 11 H 1 -2.058 -0.345 -0.332 1.00 0.00
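The difference that "-F" makes shows up as soon as the search text contains a character with a special meaning in regular expressions, such as the period. A small sketch with made-up input:

```shell
values=$(printf '1.5\n125\n1x5\n')

# As a regular expression, "." matches any character, so all three lines match.
as_regex=$(echo "$values" | grep '1.5')
# With -F the pattern is plain text, so only the literal "1.5" matches.
as_fixed=$(echo "$values" | grep -F '1.5')

echo "$as_regex"
echo "$as_fixed"
```

When the search text is an ordinary word like ATOM, the two forms find the same lines. "-F" matters when the text includes symbols such as ".", "*", "^" or "$".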
- Pipe to grep
A command that causes profuse output, say a huge list of files, can be filtered by piping the output to grep.
Suppose we start back at the top level of the ou_swc_files directory
ls */*/*
gapminder/data/gapminder_all.csv gapminder/data/gapminder_gdp_africa.csv gapminder/data/gapminder_gdp_americas.csv gapminder/data/gapminder_gdp_asia.csv gapminder/data/gapminder_gdp_europe.csv gapminder/data/gapminder_gdp_oceania.csv inflammation/data/inflammation-01.csv inflammation/data/inflammation-02.csv inflammation/data/inflammation-03.csv inflammation/data/inflammation-04.csv inflammation/data/inflammation-05.csv inflammation/data/inflammation-06.csv inflammation/data/inflammation-07.csv inflammation/data/inflammation-08.csv inflammation/data/inflammation-09.csv inflammation/data/inflammation-10.csv inflammation/data/inflammation-11.csv inflammation/data/inflammation-12.csv inflammation/data/small-01.csv inflammation/data/small-02.csv inflammation/data/small-03.csv inflammation/python/argv-list.py inflammation/python/arith.py inflammation/python/check.py inflammation/python/count-stdin.py inflammation/python/errors_01.py inflammation/python/errors_02.py inflammation/python/gen-inflammation.py inflammation/python/line-count.py inflammation/python/my_ls.py inflammation/python/readings-01.py inflammation/python/readings-02.py inflammation/python/readings-03.py inflammation/python/readings-04.py inflammation/python/readings-05.py inflammation/python/readings-06.py inflammation/python/readings-07.py inflammation/python/readings-08.py inflammation/python/readings-09.py inflammation/python/rectangle.py inflammation/python/sys-version.py Users/nelle/notes.txt Users/nelle/pizza.cfg Users/nelle/solar.pdf Users/nelle/creatures: basilisk.dat unicorn.dat Users/nelle/data: amino-acids.txt animals.txt elements morse.txt pdb planets.txt salmon.txt sunspot.txt Users/nelle/Desktop: Users/nelle/molecules: cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb records.txt Users/nelle/north-pacific-gyre: 2012-07-03 output Users/nelle/writing: data haiku.txt old thesis tools
We don't want to see all of those files
Perhaps I only want to see the ones that have "txt" in their names:
ls */*/* | grep txt
Users/nelle/notes.txt amino-acids.txt animals.txt morse.txt planets.txt salmon.txt sunspot.txt records.txt haiku.txt
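Filtering on "txt" keeps any name that contains those three letters anywhere. If that turns out to be too loose, a regular expression can insist on ".txt" at the very end of the name. This is a sketch with made-up file names:

```shell
names=$(printf 'notes.txt\ntxt-archive.zip\nhaiku.txt\nsolar.pdf\n')

loose=$(echo "$names" | grep txt)         # "txt" anywhere in the name
strict=$(echo "$names" | grep '\.txt$')   # a literal ".txt" at the end of the name

echo "$loose"
echo "$strict"
```

The backslash in front of the period tells grep to treat it as a literal period rather than "any character". The loose filter keeps "txt-archive.zip"; the strict one does not.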
5.1 Use these skills to check the North Pacific Gyre data
If you are still in the molecules directory, this should work to change the working directory to north-pacific-gyre/2012-07-03
cd ../north-pacific-gyre/2012-07-03
## File check
ls
goodiff goostats NENE01729A.txt NENE01729B.txt NENE01736A.txt NENE01751A.txt NENE01751B.txt NENE01812A.txt NENE01843A.txt NENE01843B.txt NENE01971Z.txt NENE01978A.txt NENE01978B.txt NENE02018B.txt NENE02040A.txt NENE02040B.txt NENE02040Z.txt NENE02043A.txt NENE02043B.txt
If you are at the top level of the Git clone, do this instead:
cd Users/nelle/north-pacific-gyre/2012-07-03
- Use wc to check number of lines within each file:
wc -l NENE*.txt
  300 NENE01729A.txt
    3 NENE01729B.txt
  300 NENE01736A.txt
  300 NENE01751A.txt
  300 NENE01751B.txt
  300 NENE01812A.txt
  300 NENE01843A.txt
  300 NENE01843B.txt
  300 NENE01971Z.txt
  300 NENE01978A.txt
  300 NENE01978B.txt
  240 NENE02018B.txt
  300 NENE02040A.txt
  300 NENE02040B.txt
  300 NENE02040Z.txt
  300 NENE02043A.txt
  300 NENE02043B.txt
 4743 total
Notes about problem files
- NENE01729B.txt has only 3 lines. We had better double-check the data source.
- Somebody in the project told me the ones that end in "Z" are probably wrong: NENE01971Z.txt and NENE02040Z.txt
It is easy to select all the ones that end with A or B. Shell wildcard globbing allows square brackets: [AB] means either "A" or "B"
ls *[AB].txt
NENE01729A.txt NENE01729B.txt NENE01736A.txt NENE01751A.txt NENE01751B.txt NENE01812A.txt NENE01843A.txt NENE01843B.txt NENE01978A.txt NENE01978B.txt NENE02018B.txt NENE02040A.txt NENE02040B.txt NENE02043A.txt NENE02043B.txt
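The bracket wildcard can be tried safely on empty stand-in files. This sketch creates four files in a scratch directory, borrowing a few of the names above, and then selects them two different ways.

```shell
# Make empty stand-in files in a scratch directory.
scratch=$(mktemp -d)
cd "$scratch"
touch NENE01729A.txt NENE01729B.txt NENE01971Z.txt NENE02040Z.txt

keep=$(ls *[AB].txt)     # names ending in A.txt or B.txt
suspect=$(ls *Z.txt)     # the "Z" files set aside for checking
echo "$keep"
echo "$suspect"

# Go back to where we started and clean up.
cd - > /dev/null
rm -rf "$scratch"
```

The same pattern works with any command that takes file names, so "wc -l *[AB].txt" would count lines in only the A and B files.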