Debian Wheezy multilib transition: caution

Here's a cautionary heads up on the transition from ordinary to multilib Debian.

I run Debian Wheezy amd64. Some 32-bit applications are installed, and somehow I came to a point where ia32-libs wanted to update and transition me to a multilib setup. I'm still trying to figure out whether that happened because I accidentally had sid enabled, but I decided to go ahead and see what happens. The big block of 32-bit support libraries from ia32-libs is replaced by a lot of i386 packages.

It has been a little interesting to rebuild some local packages, but I think most of that will work itself out.

But today I found out about a serious problem I want to warn everybody about. The multilib packages replace files in /usr/lib with files that go into /usr/lib/x86_64-linux-gnu or /usr/lib/i386-linux-gnu.

The packaging is supposed to include commands so that installing the multilib package removes the old files from /usr/lib when the new ones are installed. However, some packages malfunction, and so you are left with the old shared library files. I was having a devil of a time building some packages because the build system was finding libraries in /usr/lib that I thought had been removed.

I used a handy Debian tool "cruft" to survey the situation, and here are some of the "abandoned" library files that were left in /usr/lib:

libanl-2.11.2.so
libanl.so.1
libBrokenLocale-2.11.2.so
libBrokenLocale.so.1
libcidn-2.11.2.so
libcidn.so.1
libcrypt-2.11.2.so
libcrypt.so.1
libgcc_s.so.1
libmemusage.so
libnss_compat-2.11.2.so
libnss_compat.so.2
libnss_dns-2.11.2.so
libnss_dns.so.2
libnss_files-2.11.2.so
libnss_files.so.2
libnss_hesiod-2.11.2.so
libnss_hesiod.so.2
libnss_nis-2.11.2.so
libnss_nisplus-2.11.2.so
libnss_nisplus.so.2
libnss_nis.so.2
libpcprofile.so
libresolv-2.11.2.so
libresolv.so.2
libSegFault.so
libthread_db-1.0.so
libthread_db.so.1

Here's what goes wrong. The linker picks up shared libraries in /lib or /usr/lib, but the package build then can't find shlibs dependency information for them in the /var hierarchy (because that hierarchy now shows they are in /usr/lib/x86_64-linux-gnu).

If you try to compile a package, you will recognize the problem if the error says

error: no dependency information found for /lib/libnsl.so.1

and it is pointing to a file that really is there, but

$ dpkg -S /lib/libnsl.so.1

reports that the file is not owned by any package, and then you notice that there is a newer version of libnsl.so in /usr/lib/x86_64-linux-gnu.
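
If you want to run that check over a whole directory instead of one file at a time, a small shell loop can do it. This is only a sketch: the function name list_unowned_libs is made up for illustration, and it only looks at files matching *.so* directly inside the directory you give it. It relies on dpkg -S exiting with a non-zero status when no package claims the path.

```shell
# list_unowned_libs DIR
# Print every shared-library file directly under DIR that no installed
# package claims. Assumes dpkg -S fails for paths that are not owned.
list_unowned_libs() {
    dir=$1
    for f in "$dir"/*.so*; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$f" ] || continue
        if ! dpkg -S "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}

# Example (commented out; it can take a while on a big directory):
# list_unowned_libs /usr/lib
```

The output is one unowned path per line. Don't delete anything on the strength of this list alone; it is just a quick way to see whether the problem described here applies to you.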

I'm pretty sure this is right, but I can't rule out the possibility that some other thing that happened in apt-get or synaptic caused this. I also found a big chunk of python numpy stuff was "orphaned" by package updates.

As I mentioned, the cruft package from Debian was a big help in finding the problem. All I did was install the package, then

$ cd /tmp
$ sudo /usr/sbin/cruft -d /lib -r report-lib

$ sudo /usr/sbin/cruft -d /usr/lib -r report-usrlib

that cranks out the reports that show whether files are missing or not valid members of deb packages.


R package profiling with Google

I want to speed up R packages written with the Rcpp and RcppArmadillo packages.

gprof is not an option because R is not compiled with -pg, and just re-compiling the shared libraries of the packages is insufficient.

valgrind can generate some output files, but it does not generate any results that are remotely informative about the shared libraries accessed by R.

sprof was suggested by some, but I can't make it work on Debian Linux. I can get to the point where it produces an output file, but the dynamic linker will not analyze the data that was created. It has been that way since 2009; apparently I fought with it even back then, when I posted about it on one of the R lists.

On the other hand, I do get relatively helpful information from Google's C++ profiler, which was suggested by Dirk Eddelbuettel.

Here are web pages about it

http://code.google.com/p/gperftools/wiki/GooglePerformanceTools

http://code.google.com/p/gperftools/?redir=1

I found these instructions the most helpful:

http://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html

I think these are older, still OK

http://goog-perftools.sourceforge.net/doc/cpu_profiler.html

The Debian Linux OS has packages for it now (2013-01-06) that are very up to date.

Observe:

$ dpkg -l | grep perftools
ii  google-perftools         2.0-2  all    command line utilities to analyze the performance of C++ programs
ii  libgoogle-perftools-dev  2.0-2  amd64  libraries for CPU and heap analysis
ii  libgoogle-perftools4     2.0-2  amd64  libraries for CPU and heap analysis, plus an efficient thread-caching malloc

I eventually found that I could make this profiler cooperate with the package shared library in all 3 of the ways that are suggested in the documentation. I'll include the HOWTO notes at the end. I think, before you bother to try to do this, you need to know what you get from it.

When the program is run, the command line is like so:

$ CPUPROFILE="myprof.log" R -f my-program.R

When that's finished, the current working directory has a file "myprof.log". We can't read that without the help of special tools.

There are 2 kinds of output that are the most helpful. First, there is a text output that has 6 columns (five numeric columns and then the function name), as described on the Google webpage about the program. This output shows that the program IS calling the Armadillo matrix routines; it indicates that quite a bit of the program's runtime is spent in arma::subview_elem2. The weird part about this output is that it does not show any of the functions called in the shared library that I'm trying to profile.

In the output, the right column is the function name. Most are unintelligible.

$ google-pprof --text /usr/lib/R/bin/exec/R myprof.log | more
Using local file /usr/lib/R/bin/exec/R.
Using local file myprof.log.
Removing _L_unlock_15 from all stack traces.
Total: 863 samples
      62   7.2%   7.2%       65   7.5% _int_malloc
      58   6.7%  13.9%       59   6.8% ATL_dJIK0x0x0NN0x0x0_aX_bX
      35   4.1%  18.0%       78   9.0% arma::subview_elem2::extract
      30   3.5%  21.4%      142  16.5% arma::subview_elem2::inplace_op
      26   3.0%  24.4%       26   3.0% ATL_dJIK0x0x48TN48x48x0_a1_bX
      24   2.8%  27.2%       24   2.8% ATL_dJIK0x0x0TN0x0x0_aX_bX
      23   2.7%  29.9%       23   2.7% ATL_dreftrmvUNN
      22   2.5%  32.4%       31   3.6% *__GI___libc_free
      20   2.3%  34.8%       79   9.2% *__GI___libc_malloc
      17   2.0%  36.7%       31   3.6% arma_check (inline)
      16   1.9%  38.6%       17   2.0% ATL_ddot_xp0yp0aXbX
      16   1.9%  40.4%       55   6.4% ATL_dgemv
      16   1.9%  42.3%       24   2.8% arma::Mat::at (inline)
      14   1.6%  43.9%       14   1.6% ATL_dgezero
      14   1.6%  45.5%       14   1.6% _int_free
      12   1.4%  46.9%       12   1.4% dgemv_
      11   1.3%  48.2%       11   1.3% ATL_dgemvN_a1_x1_b1_y1
      11   1.3%  49.5%       11   1.3% __memcpy_ssse3
      10   1.2%  50.6%       10   1.2% ATL_1dsplit
      10   1.2%  51.8%       15   1.7% helper (inline)
       9   1.0%  52.8%        9   1.0% ATL_dNCmmJIK
       8   0.9%  53.8%        8   0.9% ATL_dtrmvUN
       8   0.9%  54.7%       10   1.2% arma::eglue_core::apply
       7   0.8%  55.5%        7   0.8% 00007f8335cb77b7
       6   0.7%  56.2%        6   0.7% *__GI___strcoll_l
       6   0.7%  56.9%        6   0.7% ATL_dJIK0x0x24NN0x0x0_aX_bX
       6   0.7%  57.6%        9   1.0% ATL_ddot
       6   0.7%  58.3%        6   0.7% ATL_dptgemm_nt
       6   0.7%  59.0%        6   0.7% ATL_dscal_xp0yp0aXbX
       6   0.7%  59.7%        6   0.7% arma::arrayops::copy_big
       6   0.7%  60.4%        6   0.7% dtrmv_
       5   0.6%  61.0%        5   0.6% ATL_daxpy
       5   0.6%  61.5%      128  14.8% apply (inline)
       5   0.6%  62.1%        5   0.6% atl_f77wrap_dgemv_
       5   0.6%  62.7%       18   2.1% copy (inline)
       5   0.6%  63.3%        5   0.6% crc32_combine64
       5   0.6%  63.8%        5   0.6% lsame_
       4   0.5%  64.3%        4   0.5% 00007f8335cb892a
       4   0.5%  64.8%        4   0.5% ATL_dJIK0x0x24TN0x0x0_aX_bX
       4   0.5%  65.2%        4   0.5% ATL_dNCmmIJK
       4   0.5%  65.7%        4   0.5% ATL_daxpy_xp0yp0aXbX
       4   0.5%  66.2%        4   0.5% ATL_dgecopy
       4   0.5%  66.6%        4   0.5% ATL_dgemvN_a1_x1_bX_y1
       4   0.5%  67.1%        4   0.5% ATL_dptgemm
       4   0.5%  67.6%        4   0.5% ATL_dtrmv
       4   0.5%  68.0%        4   0.5% __parse_one_specmb
       4   0.5%  68.5%        9   1.0% arma::op_strans::apply_noalias
       4   0.5%  68.9%        4   0.5% malloc_consolidate
       4   0.5%  69.4%       52   6.0% operator new
       4   0.5%  69.9%        4   0.5% pthread_join
       3   0.3%  70.2%        3   0.3% 00007f8335cb64fd
       3   0.3%  70.6%        3   0.3% 00007f8335cb77bb
       3   0.3%  70.9%        3   0.3% 00007f8335cbab87
       3   0.3%  71.3%        3   0.3% ATL_dcopy_xp0yp0aXbX
       3   0.3%  71.6%        7   0.8% ATL_join_tree
       3   0.3%  72.0%        8   0.9% _IO_vfprintf_internal
       3   0.3%  72.3%       53   6.1% acquire (inline)
       3   0.3%  72.7%       42   4.9% arma::Mat::init_warm
       3   0.3%  73.0%        5   0.6% arma::gemm::apply_blas_type
       3   0.3%  73.3%        5   0.6% arma::op_symmat::apply
       3   0.3%  73.7%       15   1.7% arma::subview::extract
       3   0.3%  74.0%        5   0.6% arma::subview::operator=
       3   0.3%  74.4%        3   0.3% arma::unwrap_check_mixed::~unwrap_check_mixed
       2   0.2%  74.6%        2   0.2% 00007f833095a1b4
       2   0.2%  74.9%        2   0.2% 00007f83309865d9
       2   0.2%  75.1%        2   0.2% 00007f8330987005
       2   0.2%  75.3%        2   0.2% 00007f8335c49fcb
       2   0.2%  75.6%        2   0.2% 00007f8335c4adf1
       2   0.2%  75.8%        2   0.2% 00007f8335c59e20
       2   0.2%  76.0%        2   0.2% 00007f8335c5f390
       2   0.2%  76.2%        2   0.2% 00007f8335cb7fd6
       2   0.2%  76.5%        2   0.2% 00007f8335cb8015
       2   0.2%  76.7%        2   0.2% 00007f8335cb8397
       2   0.2%  76.9%        2   0.2% 00007f8335cb891d
       2   0.2%  77.2%        2   0.2% 00007f8335cb8a49
       2   0.2%  77.4%        2   0.2% 00007f8335cb930d
       2   0.2%  77.6%        2   0.2% 00007f8335cb9a3c
       2   0.2%  77.9%        2   0.2% ATL_dJIK0x0x0TN0x0x0_a1_bX
       2   0.2%  78.1%        2   0.2% ATL_dcpsc
       2   0.2%  78.3%        2   0.2% ATL_ddot_xp1yp1aXbX
       2   0.2%  78.6%        4   0.5% ATL_thread_exit
       2   0.2%  78.8%      151  17.5% Mat (inline)
       2   0.2%  79.0%        3   0.3% __clone
       2   0.2%  79.3%        2   0.2% __getdents64
       2   0.2%  79.5%        2   0.2% __memcmp_sse4_1
       2   0.2%  79.7%        3   0.3% __pthread_attr_init_2_1
       2   0.2%  80.0%        2   0.2% __pthread_attr_setscope
       2   0.2%  80.2%        2   0.2% __strcpy_ssse3
       2   0.2%  80.4%        3   0.3% _gfortran_compare_string
       2   0.2%  80.6%        2   0.2% _gfortran_concat_string
       2   0.2%  80.9%        2   0.2% _init@efd8
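
The listing is sorted on the first column, the samples counted in the function itself. To see which functions are expensive once their callees are included, it can help to re-sort on the fourth column. A minimal sketch, using four lines copied from the output above rather than a live profile (the variable names are made up for illustration):

```shell
# Four lines copied from the pprof --text listing above.
pprof_text='      62   7.2%   7.2%       65   7.5% _int_malloc
      30   3.5%  21.4%      142  16.5% arma::subview_elem2::inplace_op
       5   0.6%  61.5%      128  14.8% apply (inline)
       2   0.2%  78.8%      151  17.5% Mat (inline)'

# Column 4 is the sample count for the function plus everything it calls.
# Sort on it numerically, largest first, and keep the top three.
top_cumulative=$(printf '%s\n' "$pprof_text" | sort -k4,4nr | head -n 3)
printf '%s\n' "$top_cumulative"
```

With a real profile, the same sort and head could be attached to the end of the google-pprof --text pipeline. I am assuming there that any header lines in the output can be ignored or filtered out first.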

We can force it to show functions going into and out of a target function. In this case, here's the "emcore" function, which I know is important and is called a lot in this example.

$ google-pprof --text --focus=emcore /usr/lib/R/bin/exec/R myprof.log | more
Using local file /usr/lib/R/bin/exec/R.
Using local file myprof.log.
Removing _L_unlock_15 from all stack traces.
Total: 863 samples
      34  11.7%  11.7%       37  12.8% _int_malloc
      34  11.7%  23.4%       77  26.6% arma::subview_elem2::extract
      30  10.3%  33.8%      140  48.3% arma::subview_elem2::inplace_op
      17   5.9%  39.7%       31  10.7% arma_check (inline)
      16   5.5%  45.2%       24   8.3% arma::Mat::at (inline)
      11   3.8%  49.0%       48  16.6% *__GI___libc_malloc
      10   3.4%  52.4%       15   5.2% helper (inline)
       9   3.1%  55.5%       13   4.5% *__GI___libc_free
       8   2.8%  58.3%        8   2.8% __memcpy_ssse3
       8   2.8%  61.0%       10   3.4% arma::eglue_core::apply
       6   2.1%  63.1%        6   2.1% arma::arrayops::copy_big
       5   1.7%  64.8%        5   1.7% _int_free
       5   1.7%  66.6%      126  43.4% apply (inline)
       5   1.7%  68.3%       18   6.2% copy (inline)
       3   1.0%  69.3%       42  14.5% arma::Mat::init_warm
       3   1.0%  70.3%        5   1.7% arma::gemm::apply_blas_type
       3   1.0%  71.4%        8   2.8% arma::op_strans::apply_noalias
       3   1.0%  72.4%        5   1.7% arma::op_symmat::apply
       3   1.0%  73.4%        3   1.0% arma::unwrap_check_mixed::~unwrap_check_mixed
       3   1.0%  74.5%        3   1.0% malloc_consolidate
       2   0.7%  75.2%        2   0.7% 00007f833095a1b4
       2   0.7%  75.9%        2   0.7% 00007f83309865d9
       2   0.7%  76.6%        2   0.7% 00007f8330987005
       2   0.7%  77.2%      148  51.0% Mat (inline)
       2   0.7%  77.9%        3   1.0% _gfortran_compare_string
       2   0.7%  78.6%        2   0.7% _gfortran_concat_string
       2   0.7%  79.3%        2   0.7% _init@efd8
       2   0.7%  80.0%       52  17.9% acquire (inline)
       2   0.7%  80.7%        3   1.0% arma::Mat::colptr (inline)
       2   0.7%  81.4%        4   1.4% arma::Proxy::at (inline)
       2   0.7%  82.1%        2   0.7% arma::eGlue::at (inline)
       2   0.7%  82.8%       12   4.1% arma::glue_times::apply
       2   0.7%  83.4%       14   4.8% arma::subview::extract
       2   0.7%  84.1%        4   1.4% arma::subview::operator=
       2   0.7%  84.8%        2   0.7% arma::subview::operator[] (inline)
       2   0.7%  85.5%        2   0.7% inplace_plus (inline)
       2   0.7%  86.2%        2   0.7% inplace_set (inline)
       2   0.7%  86.9%       50  17.2% operator new
       1   0.3%  87.2%        1   0.3% 00007f83308aba8d
       1   0.3%  87.6%        1   0.3% 00007f833095a182
       1   0.3%  87.9%        1   0.3% 00007f833095a3a4
       1   0.3%  88.3%        1   0.3% 00007f833095a7f5
       1   0.3%  88.6%        1   0.3% 00007f833098655e
       1   0.3%  89.0%        1   0.3% 00007f83309865eb
       1   0.3%  89.3%        1   0.3% 00007f8330986fe0
       1   0.3%  89.7%        1   0.3% 00007f8330a56362
       1   0.3%  90.0%        1   0.3% 00007f8330a563e6
       1   0.3%  90.3%        1   0.3% 00007f8330a56c08
       1   0.3%  90.7%        1   0.3% 00007f8330a56c83
       1   0.3%  91.0%        1   0.3% 00007f8330a6065b
       1   0.3%  91.4%        1   0.3% 00007f8330a6067d
       1   0.3%  91.7%        1   0.3% 00007f8330a6096a
       1   0.3%  92.1%        1   0.3% 00007f8330a60b30
       1   0.3%  92.4%        1   0.3% Op (inline)
       1   0.3%  92.8%        1   0.3% _IO_vfprintf_internal
       1   0.3%  93.1%        1   0.3% __memcmp_sse4_1
       1   0.3%  93.4%        1   0.3% accumulate (inline)
       1   0.3%  93.8%       50  17.2% apply
       1   0.3%  94.1%        6   2.1% arma::Mat::Mat
       1   0.3%  94.5%       13   4.5% arma::Mat::init_cold
       1   0.3%  94.8%        1   0.3% arma::Mat::is_vec (inline)
       1   0.3%  95.2%        1   0.3% arma::Proxy::get_n_elem (inline)
       1   0.3%  95.5%       32  11.0% arma::auxlib::inv_sympd
       1   0.3%  95.9%       25   8.6% arma::op_find::apply
       1   0.3%  96.2%        2   0.7% arma::op_find::helper
       1   0.3%  96.6%        8   2.8% arma::op_strans::apply
       1   0.3%  96.9%       19   6.6% arma::op_strans::apply_proxy
       1   0.3%  97.2%        4   1.4% arma::op_sum::apply
       1   0.3%  97.6%        1   0.3% arma::subview::colptr (inline)
       1   0.3%  97.9%        1   0.3% arma::unwrap_check_mixed::unwrap_check_mixed
       1   0.3%  98.3%        1   0.3% dgemm_
       1   0.3%  98.6%      290 100.0% emcore
       1   0.3%  99.0%        1   0.3% gemm (inline)
       1   0.3%  99.3%        1   0.3% primitive_range_wrap__impl__nocast (inline)
       1   0.3%  99.7%        1   0.3% std::num_put::_M_insert_int
       1   0.3% 100.0%       12   4.1% ~Mat (inline)
       0   0.0% 100.0%        3   1.0% 000000000000001b
       0   0.0% 100.0%        1   0.3% 000000000000006b
       0   0.0% 100.0%      290 100.0% 000000000040086a
       0   0.0% 100.0%      290 100.0% 000000000040089c
       0   0.0% 100.0%        1   0.3% 00007f83308aba8c
       0   0.0% 100.0%        1   0.3% 00007f833095a181
       0   0.0% 100.0%        2   0.7% 00007f833095a1b3
       0   0.0% 100.0%        1   0.3% 00007f833095a3a3
       0   0.0% 100.0%        2   0.7% 00007f833095a4ab
       0   0.0% 100.0%        1   0.3% 00007f833095a7f4
       0   0.0% 100.0%        3   1.0% 00007f833095a80d
       0   0.0% 100.0%        1   0.3% 00007f833098655d
       0   0.0% 100.0%        2   0.7% 00007f83309865d8
       0   0.0% 100.0%        1   0.3% 00007f83309865ea
       0   0.0% 100.0%        1   0.3% 00007f8330986611

Frankly, that does not help me too much to understand what is going on. I really wish I could use gprof, which has the best profiler output I've ever seen, but I can't get it, so I soldier on.

Second, the same information can be converted into a graphviz image, showing the functions that call each other. This is not perfect, but pretty good. This result is presented in postscript, but I converted it to PDF with ps2pdf. I've uploaded a copy here:

https://pj.freefaculty.org/blog-attachments/pprof5994.0-LD_PRELOAD.pdf

That's a really wide image, you have to scroll it about. The boxes at the top are functions, but we don't know their names. Well, we can see their machine names like "00007f8335c78029", but that's useless. I've installed the debugging symbols that go with the R packages, but this program does not pick them up automatically. That surprised me.

But as you track down into the graph, you eventually come to functions that are in the shared library, such as "emcore". There's quite a bit of time being spent in the function called "sweep". The percentages in the boxes are the percent of total program time and percent of time within the "local" sequence of calls. If we add the command line "--focus=emcore" then the percentages are re-calculated in an interesting way. emcore is then assigned 100%, and functions leading into it and out of it are shown, with their percentages of time. That is interesting, maybe.

Now, HOW CAN YOU DO THIS FOR YOURSELF?

1. The easy way. May work, probably will.

Do not recompile anything, do not change any R packages. Just follow two steps. First, export to the environment the variable LD_PRELOAD that points at the shared library provided by the Google perftools package. My shell is always BASH, or something like it. I run

$ export LD_PRELOAD=/usr/lib/libprofiler.so

Then run the target program, but PREPEND to it an environment variable for the output file

$ CPUPROFILE="myprof.log" R -f amelia-faster-1.7.5.2.R

While that is running, the R output whirs by on the screen. But you know it was profiling because when it is done, you see this:

PROFILE: interrupts/evictions/bytes = 870/96/208320

Look in the working directory, and the output file "myprof.log" is sitting there. Then analyze that with the text or graph tools.
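
One drawback of export is that LD_PRELOAD stays set for everything else started from that shell. A sketch of an alternative is to set both variables for the one command only. The function name run_profiled is made up for illustration, and the library path in the example is the one used above, so adjust it for your system.

```shell
# run_profiled LIB OUTFILE COMMAND [ARGS...]
# Run COMMAND with the profiler library preloaded and CPUPROFILE set,
# for that one process only. The calling shell's environment is untouched.
run_profiled() {
    lib=$1
    out=$2
    shift 2
    if [ ! -r "$lib" ]; then
        echo "profiler library not found: $lib" >&2
        return 1
    fi
    env LD_PRELOAD="$lib" CPUPROFILE="$out" "$@"
}

# Example (commented out because it needs R and google-perftools installed):
# run_profiled /usr/lib/libprofiler.so myprof.log R -f amelia-faster-1.7.5.2.R
```

Because nothing is exported, the same terminal can be reused afterwards for runs that must not have LD_PRELOAD set.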

Recall the problem that, in the output, many functions do not have human readable names. They have names like 02324252543. The only symbols we can read are in the target package, which is not great. I thought it was missing the R function names because I did not have the R dbg package installed. But I installed it, same result.

Then I tried fiddling with the command line. I thought I would get better symbols by telling it to look in the actual R executable. /usr/bin/R is not actually a binary executable, it is a script that sets some environment variables. I think /usr/lib/R/bin/exec/R might be more helpful. Nope. I tried several variants of this, but did not find a way that produced better output than the others. So I define R_HOME and try the direct approach:

$ export R_HOME=/usr/lib/R
$ CPUPROFILE="myprof.log" /usr/lib/R/bin/exec/R -f amelia-faster-1.7.5.2.R

No difference. No legible names on functions in the R libraries themselves.

2. Recompile the R packages for which profiling is desired, after adding a Makefile variable that changes the linker.

In the example I'm profiling, like all R packages with C or C++ code, there is a folder src and in there, Makefile that can be edited. All we need to do is insert -lprofiler in the linker statement. Change something like this:

PKG_LIBS = `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"` $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)

to

PKG_LIBS = `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"` $(LAPACK_LIBS) -lprofiler $(BLAS_LIBS) $(FLIBS)

Then make a new tarball of the package, and then rebuild the package.

$ tar czvf myRpackage.tar.gz my-package-dir
$ R CMD INSTALL myRpackage.tar.gz

Of course, on Linux that's a standard process, if you are stuck in Windows, why the heck are you reading these notes anyway? Honestly, I really doubt this is relevant to Windows at all.

After that, start a NEW terminal session with a "clean" environment, with no LD_PRELOAD. I did not see this warning in the documentation, but the -lprofiler linker option is doing the same thing as the LD_PRELOAD statement, BUT, and this is a big BUT, neither works right if you are using the other. So if you do the -lprofiler option, don't set the LD_PRELOAD environment variable.

The Google docs say that the linker option -lprofiler has no effect unless you add the CPUPROFILE output element in the command line. I believe them, and I'm considering re-compiling R with that flag set to find out if the profiling output is more readable.

3. Recompile the C or C++ code in the package for which you want the profile. In some ways, this was the most fun. Here's the idea.

In the cpp file, at the top add the include line for the Google profiler, which is something like

#include <gperftools/profiler.h>

I did this with C++, but was assuming that C would be OK too. But you would have to test and tell me.

Here's the part I thought was neat. We can turn on the profiler before a certain function is called, and be careful to include the name of the output file in

ProfilerStart("/full/path/to/profile.output.file.log");
// your functions here
ProfilerStop();

Presumably, the command you are running that is causing the big slowdown is sandwiched between these.

Make sure the link statements in the Makefile still have -lprofiler and then retar and rebuild the package.

After taking those steps, DO NOT include the output log file in the command line. Just run like usual.

$ R -f my-prog.R

Incidentally, I tried to insert only "ProfilerStart()" in the source code, but that won't compile, and then I simply made a blind guess that it was asking for a file name as an argument. And that worked.


fighting with sprof

$ export LD_PROFILE_OUTPUT=/tmp/oout
$ export LD_PROFILE=`pwd`/Amelia.so

$ mkdir -p /tmp/oout//home/pauljohn/R/x86_64-pc-linux-gnu-library/2.15/Amelia/libs/

$ env | grep LD
OLDPWD=/home/pauljohn/R/x86_64-pc-linux-gnu-library/2.15/Amelia
LD_LIBRARY_PATH=/home/pauljohn/R/x86_64-pc-linux-gnu-library/2.15/Amelia/libs
LD_PROFILE=/home/pauljohn/R/x86_64-pc-linux-gnu-library/2.15/Amelia/libs/Amelia.so
LD_PROFILE_OUTPUT=/tmp/oout

After that

$ R -f testfile.R

If there is trouble, it looks like this:

> library(Amelia)
Loading required package: foreign
Loading required package: Rcpp
Loading required package: RcppArmadillo
/tmp/oout//home/pauljohn/R/x86_64-pc-linux-gnu-library/2.15/Amelia/libs/Amelia.so.profile: cannot open file: Error 21
##
## Amelia II: Multiple Imputation

On Linux, error 21 is EISDIR ("Is a directory"). Make sure "Amelia.so.profile" does not already exist in that directory.

After that, run again, Amelia.so.profile is created, but on Debian it all ends in blotto

$ sprof ~/R/x86_64-pc-linux-gnu-library/2.15/Amelia/libs/Amelia.so /tmp/oout/home/pauljohn/R/x86_64-pc-linux-gnu-library/2.15/Amelia/libs/Amelia.so.profile
Inconsistency detected by ld.so: dl-open.c: 690: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

That's supposed to be a flaw in glibc that has existed for a long time. Crum.

People say 'switch to oprofile', but the newest version is not available for Debian yet. I have compiled 0.9.8 and am testing.

============

Humphf. Apparently, I was here at this same point in 2009. The r-sig-debian email list on June 30, 2009, has several questions by me about the same problem, and an effort to use the Google profile tool instead.

https://stat.ethz.ch/pipermail/r-sig-debian/2009-June/000795.html

There's a good chance I will learn a different mistake, write it down here, and completely forget about it. Gloomy, that.


Faster R: Things to not forget

We ask all the time how to make R go faster. There are a few basic tips and clear working examples, but beyond that it is confusing. We just have to test. This is not unique to R; the same is true in C, C++, and Java programming.

A post in R-devel email list made me think I should try harder to remember good tips I have found.

1. Basics. Vectorize. This is a lead pipe cinch. Not Debatable.

No use of [] to access elements.

Problem is not for loops per se, but rather the frequent use of []. for loops got a bad rap. There's a comment about that in Chambers, Software for Data Analysis.

2. SEEK ALGORITHMIC IMPROVEMENTS! This is a big lesson I learned in agent-based modeling.

Poor memory usage and repeated growing of objects is a disaster. Repeated use of cbind or rbind inside loops is a killer. We all have done it as novices, but must stop! There's a full worked example I did called stackListItems-01.R. I beat the hell out of that one, I'd say.

3. See Simon Wood, Generalized Additive Models: An introduction with R. This is now my favorite book ever! p. 331 Appendix A "A1 Basic computational efficiency." There is a ridiculous speedup from grouping matrix multiplications.

A and B are n x n matrices. y is a vector.

It is HORRIBLE AWFUL SLOW to do this:

A %*% B %*% y

Much faster to do

A %*% (B %*% y)

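In the second way, B %*% y creates an n x 1 vector first, so A %*% (B %*% y) is just two matrix-vector products and is HUGELY faster. Note that multiplying an n x n matrix A times an n x n matrix B requires on the order of n * n-squared multiplications. With n = 1000 that is about 10^9 multiplications for A %*% B alone, versus about 2 x 10^6 for the two matrix-vector products. Avoid it!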

I need a bigger list of "definitely stupid things to do" with matrices.

4. There are some R functions that are optimized for matrix calculations: rowSums, crossprod.

More importantly,

1. Want X'X:  don't do t(X) %*% X, do crossprod(X)
2. don't do t(X) %*% y, do crossprod(X,y)
3. don't do X %*% t(y), do tcrossprod(X,y)
4. don't do crossprod(v, t(m)), do tcrossprod(v, m)

One explanation I just found in a vignette for RcppEigen. This is the first clear explanation I've found for the speedup we might obtain:

"As shown in the last example, the R function crossprod calculates the product of the transpose
of its first argument with its second argument. The single argument form, crossprod(X),
evaluates X'X. One could, of course, calculate this product as

t(X) %*% X

but crossprod(X) is roughly twice as fast because the result is known to be symmetric and
only one triangle needs to be calculated. The function tcrossprod evaluates crossprod(t(X))
without actually forming the transpose." (Douglas Bates and Dirk Eddelbuettel, "Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package," p. 6 (supplied with RcppEigen version 0.3.1.2)).

There's a GREAT overview of several competing approaches summarized here:

Bates, Douglas, (June 2004) "Least Squares Calculations in R: Timing Different Approaches", Rnews, 4(1): 17
http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

Note, after that, Bates went on to adopt the Eigen library for fast matrix algebra, so don't take the conclusion in the 2004 article as a final answer. What you are supposed to marvel at is that it is possible to make calculations go WAY FASTER by being smarter about algorithms and implementation.

5. Try cmpfun() from the compiler package. Try to get an intuition for when it really helps. Perhaps it helps most on functions that are badly written (using vector access [] a lot), but it may help others.

6. Don't forget that accessing and copying memory structures is a slowdown.

A. R has copy on change semantics, knowledge of it may help.

B. Pre-allocating vectors and matrices. May help.

C. I suspect, but have no proof, that creating and accessing data structures "anonymously" may help.

Idea: if one does

X <- someFunction(...)
Y <- otherFunction(X, ...)
Z <- anotherFunction(Y)

we may see a slowdown, possibly because we are naming and storing X and Y. I wish somebody would do a comprehensive analysis of this construct for me. Is it faster to not name X and Y?

Z <- anotherFunction(otherFunction(someFunction(...)))

When I worked in Java, those were called anonymous functions

7. Especially nice reviews to refer people to.

Chapter length treatment in Norm Matloff, Art of R Programming. Focuses on vectorization and similar. Good examples. Also demonstrates cmpfun().

Slideshow: Ross Ihaka, Duncan Temple Lang, Brendan McArdle "Writing Efficient Programs in R (and Beyond)"
http://www.stat.auckland.ac.nz/~ihaka/downloads/Taupo.pdf

Very Good Slides Survey:
Duncan Temple Lang, "Efficiency in R: Simple rules, Garbage Collection, Profiling & algorithms" http://www.ms.uky.edu/~mai/sta705/Profiling.pdf. Obviously, not the original source, somebody reposted DTL's notes. But still good:)

This is a complete worked example:
John Nash, http://rwiki.sciviews.org/doku.php?id=tips:rqcasestudy, but the eventual link to the document is: http://macnash.telfer.uottawa.ca/~nashjc/RQtimes.pdf

Nice notes, rather like this list I'm compiling now (wish I had found it first)
Amy F. Szczepanski, "R with High Performance Computing:Parallel processing and large memory"
http://files.meetup.com/1781511/HighPerformanceComputingR-Szczepanski.pdf

Things I need to do.

1. Get better at dissecting the Rprof output. It is now EASY to get a profile,

Rprof("someprofile.txt")
##run R code here
Rprof(NULL)

summaryRprof("someprofile.txt")

HOWEVER, to me that output is not nearly as informative as a C profile, which shows branching.
In R now, you see how much time is spent in each function, but not HOW MANY TIMES each is called. Thus it is obscure whether there is a speedup to be found. See if there is a full detailed branching profiler for R.

2. Use the foreign function interface. Decide if I'd rather write in C, C++, or Fortran.

Test Rcpp and examples from Dirk Eddelbuettel. See his post in R-devel December 8, 2012. It discusses a clear working example.

Doug Bates's announcement of RcppEigen to the Eigen email list, Jun 28, 2011.
http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2011/06/msg00075.html

I worry students and I will "lose contact" with what we can understand by relying on simplifying libraries and sleek packaging. I'm in favor of sleek, but when something goes wrong, we feel helpless, unable to debug. Here's some grounding!

Søren Højsgaard (2012-09-08) "Compiling Rcpp, RcppArmadillo and RcppEigen code using SHLIB
(Old habbits die hard)"
http://people.math.aau.dk/~sorenh/misc/Rdocs/Rcpp/RcppSHLIB.pdf

Barr, Stephen J. "Blog: Using RInside With RcppEigen"
http://steve.planetbarr.com/o/blog/2012/08/08/using-rinside-with-rcppeigen/


tar and ssh together to backup onto remote machine

On my laptop, I have working installs of some programs and I want to back them up as tar.gz files on a server that I can access via ssh.

The disk on the server is formatted as Windows NTFS. It is that way because somebody wanted to be compatible with MS Windows, and so it is not a good idea to simply rsync the files onto that disk: I would lose user permissions and ownership.  So it is better to make a big tarball and copy that over. The problem is that Program.tar.gz is too huge to fit on the laptop.

For a small program, what I'd do is make a tarball on my machine and copy, like so

First, I have to become root because some files in the program are root-read-only.

$ sudo su

# cd /usr/local

# tar czf ~/Program-20120928.tar.gz Program

# cd

# rsync -e ssh Program-20120928.tar.gz  myusername@mybigserver.quant.ku.edu:/home/LinuxProgAddons

That failed because my computer does not have enough disk space to hold a tarball so huge as that.

I need to take a smarter route. The idea is to run the tar program and pipe the results to the remote server, thereby avoiding the need to save the giant tarball on my laptop.

First, I tested this without compression, just using "-" (the standard output) as the target of the tar command and then a pipe with ssh to write the result to the server

# tar cf - Program | ssh myuseraccount@mybigserver.quant.ku.edu  "cat > Program.tar"

That *should* drop an uncompressed tarball onto my user home account on mybigserver.

After that, I have to log into mybigserver and shrink it down.

$ gzip Program.tar

I wondered if I could get compression finished in one shot.

# tar czf  -  Program | ssh pauljohn32@crmda-100.quant.ku.edu  "cat > Program-20120928.tar.gz"

That seems to work, but the tarballs that were created are not exactly identical. Here are the file sizes:

-rw-rw-r-- 1 pauljohn32 pauljohn32 2233615001 Sep 28 13:14 Program-20120928.tar.gz
-rw-rw-r-- 1 pauljohn32 pauljohn32 2233615013 Sep 28 13:08 Program.tar.gz

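But the difference between them is minuscule (12 bytes), and so I'd probably gamble that the contents of the archives are the same. A 12-byte gap is what I would expect if gzip recorded the original file name, "Program.tar" plus a terminating null byte, in the header of the file it compressed on the server; it has no file name to record when it reads from a pipe.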
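
Rather than gamble, the two archives can be compared by what they decompress to, which ignores anything gzip wrote in its header. This is only a sketch: the function name same_payload is made up for illustration, and cksum (a CRC, fine for catching accidents) could be swapped for a stronger checksum tool if one is installed. It also assumes nothing under Program changed between the two tar runs.

```shell
# same_payload A.gz B.gz
# Succeeds (exit status 0) when both gzip files decompress to the same
# byte stream, judged by cksum's CRC and byte count.
# Hypothetical helper name, not part of tar or gzip.
same_payload() {
    a_sum=$(gzip -dc "$1" | cksum)
    b_sum=$(gzip -dc "$2" | cksum)
    [ "$a_sum" = "$b_sum" ]
}
```

On the server, that would be used as: same_payload Program.tar.gz Program-20120928.tar.gz && echo "same contents"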

I've tested this with 3 programs and restored them. It appears to be a workable method.


RedHat 6.3 updates

We have an external disk with RedHat 6.3 provided by tech support. We have to work from that base because it has the user ID system configured for the KU HOME domain.  Otherwise, I would dump that and just use Centos, which I also have to maintain separately for myself and other users who don't have the license rights to run RedHat on personal hardware.

 

Here's my journal of installs and changes since I received this system. Hopefully, it will help other people know what is needed to turn the base install into a fully functioning system.

 

Paul Johnson
2012-09-13

Starting from workstation RedHat EL 6 provided by Jason Olenberger

1. yum install
emacs

2. configure EPEL repository

A. rhn optional channels config hassle began: solved below in step 8

B. Run rpm to install epel from fedoraproject.org/wiki/EPEL

3. create pauljohn32 local account, put in wheel group.

4. Configure /etc/sudoers to enable the wheel group

- tried install r-devel, but failed because package texinfo-tex not available.
texinfo-tex is in the rhn optional repo, which is not available yet (see 2A).

5. install yumex (from EPEL)

6. Install lyx (in yumex)

-- hm.. R install still fails because of texinfo-tex

7. Install emacs-auctex, texinfo, intltool, automake, libtool, Terminal,
TexMacs

###################day 2################################

8. Victory! found where to get optional rhn repo that has other
required elements:

[root@CRMDA-Linux1 Desktop]# rhn-channel --list
rhel-x86_64-workstation-6
[root@CRMDA-Linux1 Desktop]# rhn-channel -a -c
rhel-x86_64-workstation-optional-6
Username: pauljohn
Password:

[root@CRMDA-Linux1 Desktop]# rhn-channel --list
rhel-x86_64-workstation-6
rhel-x86_64-workstation-optional-6

9. yumex: install R R-devel and all R packages available

10. emacs-ess from https://pj.freefaculty.org/Centos/6/i386/RPMS

11. install my init.el in the system startup config
so that Emacs and ESS work as described in my notes on "Emacs
has no learning curve"

12. yumex install
subversion,
subversion-gnome,
cvs,
tcl
tcl-devel,
tk
tk-devel,
tk-table,
thunderbird
thunderbird-lightning
tidy
xfig
transfig
fig2ps
inkscape
inkscape-docs
inkscape-view

13. yumex install

rpm-build
gcc-objc
gcc-objc++
libobjc
gperf

14. yumex install
lyx
TeXmacs
gimp
gimp-data-extras
gimp-help

15. yumex install
automake
automake16
autoconf
autoconf213
qt-devel
blt
blt-devel
gtk2-devel
glib-devel

16. yumex install groups:

XFCE
Eclipse

17. yumex install

pdfjam
firmware-addon-dell
java-1.6.0-openjdk-plugin

18. Flash
Download "yum" option, from:
http://get.adobe.com/flashplayer
Install that:
su -c 'rpm -ivh adobe-release-x86_64-1.0-1.noarch.rpm'

su -c 'rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-adobe-linux'

su -c 'yum install nspluginwrapper alsa-plugins-pulseaudio flash-plugin'

19. Acrobat Acroread:

Hmm. Stumped here on x86_64 support. Appears no 64 bit version exists.
Have to make do with Evince and xpdf, for now

20. Run this script with R to update and download a TON of R packages:

https://pj.freefaculty.org/R/SystemAdmin/R-labSelectInstall-02.R

Install that program to /usr/local/bin, then make sure it runs weekly
by putting this in the /etc/cron.weekly folder

$ cat /etc/cron.weekly/Rupdate.sh
R CMD BATCH /usr/local/bin/R-labSelectInstall-02.R >> /tmp/R-update.txt

This is not foolproof. I was trying to use Rscript in the shebang
of that file so as to simply run "R-labSelectInstall-01.R", but
it does not seem to work. I just don't understand how Rscript is
supposed to work, or maybe it is blocked by SELinux. Will check.
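
One thing worth knowing here: R CMD BATCH sends the script's output to a .Rout file (named after the script, in the working directory) unless an output file is given, so a >> redirect on that command captures very little. A minimal sketch of a cron wrapper that calls Rscript explicitly, so nothing depends on the shebang line, might look like this. The function name, the RSCRIPT override, and the log path are assumptions for illustration, not something tested on this system.

```shell
#!/bin/sh
# Sketch of /etc/cron.weekly/Rupdate.sh.
# run_r_update SCRIPT LOG: run SCRIPT with Rscript and append everything it
# prints (stdout and stderr) to LOG. RSCRIPT may be set to a full path such
# as /usr/bin/Rscript, since cron runs with a short PATH.
run_r_update() {
    script=$1
    log=$2
    "${RSCRIPT:-Rscript}" "$script" >> "$log" 2>&1
}

# In the real cron file, the last line would be something like:
# run_r_update /usr/local/bin/R-labSelectInstall-02.R /var/log/R-update.log
```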

21.  Created an RPM repository with this file in
/etc/yum.repos.d/pjku.repo:

$ cat /etc/yum.repos.d/pjku.repo
[pjku]
name=Extra Packages for Enterprise Linux 6 - $basearch
baseurl=https://pj.freefaculty.org/EL/6/$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/PaulJohnson-BinaryPackageSigningKey

[pjku-source]
name=Extra Packages for Enterprise Linux 6 - $basearch - Source
baseurl=https://pj.freefaculty.org/EL/6/SRPMS
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/PaulJohnson-BinaryPackageSigningKey

Put my public Binary Package Signing key (which is in
https://pj.freefaculty.org/EL) in
/etc/pki/rpm-gpg/PaulJohnson-BinaryPackageSigningKey

On the build machine, it is necessary to have the private key
imported, and this in

~/.rpmmacros

%_topdir /home/pauljohn32/LinuxDownloads/redhat
%_unpackaged_files_terminate_build      0

%_missing_doc_files_terminate_build     0

%_signature gpg
%_gpg_name Paul Johnson (Binary Package Signing Key)

After installing that, then rpm --addsign *.rpm signs the packages.

So the work flow to update the repository will go like this.

1. on the work machine, copy the whole EL repo

rsync -e ssh -rav pauljohn32@freefaculty.org:freefaculty.org/EL .

2. Build RPMS in ~/LinuxDownloads/redhat like I always do, then copy
the RPMS into the EL/6 whatever hierarchy.

3. Sign the RPMS.

4. Run createrepo EL/6/x86_64  (do same for other folders)

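5. upload repo back to server (the n in -ravn makes this a dry run that only lists what would be sent; drop it to do the real transfer)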

rsync -e ssh -ravn EL pauljohn32@freefaculty.org:freefaculty.org

 

22. CONFIGURE GDM to NOT show usernames.

$ sudo gconftool-2 --direct --config-source xml:readwrite:/etc/gconf/gconf.xml.mandatory --type Boolean --set /apps/gdm/simple-greeter/disable_user_list True

23. Filezilla

$ sudo yum install filezilla

24. When installing this portable device as external drive on systems

1. Nvidia video drivers.

use nvidia driver from el-repo as sketched here:

https://sites.google.com/site/guenterbartsch/blog/installnvidiadriversoncentos6andrhel6

After that installs, go to /etc/yum.repos.d and set enabled=0 in the elrepo file. That prevents automatic updates when yum runs.

2. Re-name the system to CRMDA010L (just add L to ordinary name of machine).

3. Join to domain

 


Centos 6.3 64 bit

I just noticed my external disk for Centos was in 32 bit, and maybe we are safe to use 64.

Installed a fresh Centos 6.3 and then tried to make a journal to summarize the updates/hassles. In the process, I generated 5 files of text output from the terminal, which I catted together. I see now that's too huge, but for the record, it is uploaded to https://pj.freefaculty.org/scraps/install-eshell-all.txt

Still I wonder. No attachments allowed with blog posts?  I can't figure that out!  Aha! treat it as media!  install-eshell-all

I installed everything needed to develop rpm packages, and as I went, I was uploading results to

https://pj.freefaculty.org/EL/6/x86_64.

I dumped the SRPMS and the RPMS into that same folder. Eventually, I will cycle back and sort it out, maybe build an RPM repository.

1) configure .rpmmacros in pauljohn32, create LinuxDownloads/redhat

%_topdir /home/pauljohn32/LinuxDownloads/redhat

%_unpackaged_files_terminate_build      0

%_missing_doc_files_terminate_build     0

2) Tried to build emacs 24, found missing packages. See file "install-shell.txt" for install of many packages and the EPEL repository.

installed emacs 24.2, and uploaded to https://pj.freefaculty.org/EL/6/x86_64

3) The system updater asked to update a lot of packages. OK!

4) Edit sshd to make X11forwarding ON by default

5) yum installs:

yum install xfig transfig xfig2ps blt tcl-devel tk-devel inkscape lyx texmacs

yum -y install R-devel
yum -y install R

yum -y install R-car R-lmtest R-multcomp R-mvtnorm R-systemfit R-zoo

yum -y install filezilla

yum -y install yumex

6) Put "pauljohn32" in the wheel group, edit /etc/sudoers to enable wheel
group

7) Ach. LyX is version 1.6.  Build 2.0.3 and install.

build/install tex-simplecv first.

build requires;

fontpackages-devel is needed by lyx-2.0.3-1.el6.x86_64
enchant-devel is needed by lyx-2.0.3-1.el6.x86_64
hunspell-devel is needed by lyx-2.0.3-1.el6.x86_64
qt4-devel is needed by lyx-2.0.3-1.el6.x86_64

Finally, built LyX-2.0.3 RPM, uploaded that.

8) Can't find ESS package for EL6. WTF??

Build emacs-ess, install, test, upload

9) install my init.el on system.

10) swarm

install hdf5-devel, gcc, gcc-objc, gperf

11) openbugs: can't compile because of this error:

make[1]: Entering directory
`/home/pauljohn32/LinuxDownloads/redhat/BUILD/OpenBUGS-3.2.2/src'
gcc -o OpenBUGSCli OpenBUGSCli.c -L ../lib -Wl,-rpath,\$ORIGIN/../lib -m32
-lOpenBUGS
/usr/bin/ld: skipping incompatible
/usr/lib/gcc/x86_64-redhat-linux/4.4.6/libgcc_s.so when searching for -lgcc_s
/usr/bin/ld: skipping incompatible
/usr/lib/gcc/x86_64-redhat-linux/4.4.6/libgcc_s.so when searching for -lgcc_s
/usr/bin/ld: cannot find -lgcc_s
collect2: ld returned 1 exit status
make[1]: *** [OpenBUGSCli] Error 1
make[1]: Leaving directory
`/home/pauljohn32/LinuxDownloads/redhat/BUILD/OpenBUGS-3.2.2/src'
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.mELKHJ (%prep)
Build & upload

TRY:
$ sudo yum install libgcc-4.4.6-4.el6.i686

$ rpmbuild -ba openbugs-3.2.2.spec

$ sudo rpm -ivh openbugs-3.2.2-2.x86_64.rpm

##################################

12: To image disk to other devices, some changes were necessary for some laptops.

First, because I am frustrated with SELinux, turn that off. Edit /etc/selinux/config and disable it. Restart.

Then use e2label to set labels on the disk partitions: "usbhome", "usbroot", "usbroot".  This cannot be done with the swap partition; instead, turn off the swap (swapoff -a), run mkswap -L usbswap /dev/sda3, then turn swap back on.  After that, the UUID notations in /boot/grub/grub.conf are replaced by LABEL statements.

Similar change is required in /etc/fstab
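
The SELinux edit in step 12 can be scripted. A minimal sketch, assuming the stock file layout with a single SELINUX= line; the function name is made up, and the file path is a parameter so the edit can be tried on a copy first.

```shell
# disable_selinux FILE
# Rewrite the SELINUX= line in FILE (normally /etc/selinux/config) to
# SELINUX=disabled, keeping a backup as FILE.bak. The SELINUXTYPE= line is
# left alone. The change only takes effect after a reboot.
disable_selinux() {
    conf=$1
    cp -p "$conf" "$conf.bak" || return 1
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$conf.bak" > "$conf"
}
```

As root, the real call would be: disable_selinux /etc/selinux/config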

13. Realized JAGS (Just another Gibbs Sampler) could be compiled and run.  Grabbed the SRPM from http://download.opensuse.org/repositories/home:/cornell_vrdc/CentOS_CentOS-6/src/

and rebuilt and uploaded it. Could have just used the RPM they provided, but decided to rebuild so that the build process would help me notice more development packages that need to be installed.

14. Check /usr/lib/R/etc/Renviron. Note the improvements in setup: now there is a place to install packages without getting in the way of RPM installs. Let's use the first one here.

R_LIBS_SITE=${R_LIBS_SITE-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib64/R/library:/usr/share/R/library'}

$ sudo mkdir -p /usr/local/lib/R/site-library

$ sudo R

> install.packages("rjags", lib="/usr/local/lib/R/site-library", dep=TRUE)

While I am at it, I might as well install more R packages.  Get the script

https://pj.freefaculty.org/R/SystemAdmin/R-labSelectInstall-01.R

Make some updates. Will change name to version -02 and leave in /usr/local/bin. Will be executable Rscript.

Along the way to do that, I noticed /usr/local/bin was empty and did not show in the PATH.  Will have to log out and log back in (or reboot) to see if it notices the file there. Probably need to edit /etc/profile to fix it.
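
If /usr/local/bin really is missing from PATH after logging back in, a small snippet under /etc/profile.d can add it. This is a sketch under that assumption; the function name path_with and the file name localbin.sh are hypothetical.

```shell
# path_with DIR PATHSTRING
# Print PATHSTRING with DIR appended, unless DIR is already one of its
# colon-separated entries.
path_with() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "${2:+$2:}$1" ;;
    esac
}

# In a file such as /etc/profile.d/localbin.sh one would then write:
# PATH=$(path_with /usr/local/bin "$PATH"); export PATH
```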

While testing, I ran R-labSelectInstall-02.R inside R, line by line, for testing.

Running just the first bit:

update.packages(ask=F, checkBuilt=T, dep=T)

launches a massive package rebuild that will probably expose more missing RPM devel packages. Continuing through the script R-labSelectInstall-02.R, these packages are installed in /usr/local/lib/R/site-library:

abind                  gWidgets              RArcInfo
actuar                 gWidgetsRGtk2         raster
ada                    gWidgetstcltk         rattle
ade4                   hdf5                  rbenchmark
ade4TkGUI              hexbin                Rcmdr
adehabitat             HSAUR                 Rcpp
AGD                    HyperbolicDist        RcppArmadillo
akima                  igraph                Rcsdp
amap                   igraph0               R.devices
Amelia                 igraphdata            relations
anchors                ineq                  relevent
aod                    influence.ME          relimp
ape                    inline                reshape
aplpack                iplots                reshape2
arules                 ipred                 rgenoud
arulesViz              isa2                  Rglpk
bayesm                 Iso                   RGtk2
bayesmix               ISwR                  rjags
bestglm                iterators             rJava
betareg                itertools             RJSONIO
biclust                JavaGD                rlecuyer
bitops                 JGR                   rmeta
BMA                    Kendall               R.methodsS3
brew                   kernlab               rms
BRugs                  labeling              ROCR
cairoDevice            languageR             R.oo
CarbonEL               lars                  rpanel
caTools                lasso2                rpart.plot
cba                    latentnet             rrcov
CircStats              latticist             R.rsp
clue                   lava                  RSQLite
clv                    lavaan                RSvgDevice
cmprsk                 lmec                  RSVGTipsDevice
cocorresp              lmeSplines            Rsymphony
coda                   lmm                   RUnit
colorspace             locfit                R.utils
combinat               logspline             SASmixed
corpcor                lokern                scales
cslogistic             longitudinal          scapeMCMC
cubature               longitudinalData      scatterplot3d
DBI                    longmemo              segmented
deldir                 longRPart             SemiPar
descr                  lpSolve               seriation
Devore5                mapproj               setRNG
Devore6                maps                  sets
diagram                maptools              sfsmisc
dichromat              mcgibbsit             shape
digest                 mclust                shapefiles
diptest                mcmc                  shapes
DistributionUtils      MCMCglmm              SkewHyperbolic
dlm                    MCMCpack              slam
doBy                   MCPAN                 sm
dse                    mda                   smoothSurv
e1071                  memisc                sn
earth                  memoise               sna
Ecdat                  MEMSS                 snow
ecodist                mice                  snowFT
effects                micEcon               sp
eha                    misc3d                spam
eiPack                 miscTools             spatialCovariance
ElemStatLearn          mitools               spatialkernel
ellipse                mix                   spatstat
ergm                   mlbench               spdep
ergm.userterms         mnormt                splancs
evaluate               MNP                   stabledist
faraway                msm                   StatDataML
fBasics                mstate                statmod
fdrtool                munsell               statnet
ffmanova               network               stepwise
fields                 networkDynamic        stringr
fImport                norm                  subselect
flexclust              np                    survBayes
flexmix                numDeriv              tcltk2
foreach                nws                   tclust
forecast               odfWeave              TeachingDemos
forward                orthopolynom          tensorA
fpc                    pan                   testthat
fracdiff               party                 tfplot
gam                    PBSmapping            tframe
gamlss                 pcaPP                 tkrplot
gamlss.data            permute               tree
gamlss.dist            pixmap                trimcluster
gamm4                  playwith              tripack
gclus                  plotmo                trust
gdata                  plotrix               TSP
gee                    plyr                  tweedie
geepack                pmml                  urca
GeneralizedHyperbolic  png                   UsingR
geoR                   polspline             VarianceGamma
geoRglm                polycor               vcd
ggplot2                polynom               vegan
glmmAK                 prabclus              verification
glmmBUGS               prim                  VGAM
glmmML                 prodlim               VIM
gmodels                proto                 waveslim
gof                    proxy                 WDI
googleVis              pscl                  weightedKmeans
GPArotation            pspline               wnominate
gpclib                 psych                 XLConnect
gplots                 qgraph                XLConnectJars
gregmisc               R2HTML                XML
gridBase               R2WinBUGS             Zelig
gsl                    RandomFields          zipfR
gtable                 randomSurvivalForest
gtools                 RANN

And these ones failed to build, I'll check why later. The first set is obvious, they just need to be removed from my wish list, while the latter ones fail because required devel software is not installed. Since I'm not using those now, I'll probably ignore them.

1: packages ‘Design’, ‘amer’, ‘DCluster’, ‘ZIGP’, ‘impute’ are not available (for R version 2.15.1)
2: In install.packages(desiredPackages[!alreadyHave], dependencies = c("Imports",  :
installation of package ‘Rmpi’ had non-zero exit status
3: In install.packages(desiredPackages[!alreadyHave], dependencies = c("Imports",  :
installation of package ‘rsprng’ had non-zero exit status
4: In install.packages(desiredPackages[!alreadyHave], dependencies = c("Imports",  :
installation of package ‘ncdf’ had non-zero exit status
5: In install.packages(desiredPackages[!alreadyHave], dependencies = c("Imports",  :
installation of package ‘ncdf4’ had non-zero exit status
6: In install.packages(desiredPackages[!alreadyHave], dependencies = c("Imports",  :
installation of package ‘fftw’ had non-zero exit status
7: In install.packages(desiredPackages[!alreadyHave], dependencies = c("Imports",  :
installation of package ‘RODBC’ had non-zero exit status
8: In install.packages(desiredPackages[!alreadyHave], dependencies = c("Imports",  :
installation of package ‘rgdal’ had non-zero exit status
9: In install.packages(desiredPackages[!alreadyHave], dependencies = c("Imports",  :
installation of package ‘rggobi’ had non-zero exit status
10: In install.packages(desiredPackages[!alreadyHave], dependencies = c("Imports",  :
installation of package ‘rgeos’ had non-zero exit status

I'm still having some trouble getting R-labSelectInstall-02.R to run from the command line using Rscript, I want to add it as a weekly cron job when it is fixed.

Installed ggobi and ggobi-devel so that rggobi does install

14. Tested yum groups, noticed I'd be getting some potentially useful packages

$ sudo yum groupinstall  "Desktop Platform Development"

 

Installing : libXp-1.0.0-15.1.el6.x86_64                                 1/11
Installing : libXp-devel-1.0.0-15.1.el6.x86_64                           2/11
Installing : openmotif-2.3.3-4.el6.x86_64                                3/11
Installing : mesa-libGLw-6.5.1-10.el6.x86_64                             4/11
Installing : libmng-devel-1.0.10-4.1.el6.x86_64                          5/11
Installing : libXmu-devel-1.0.5-1.el6.x86_64                             6/11
Installing : openmotif-devel-2.3.3-4.el6.x86_64                          7/11
Installing : mesa-libGLw-devel-6.5.1-10.el6.x86_64                       8/11
Installing : qt3-devel-3.3.8b-30.el6.x86_64                              9/11
Installing : libXtst-devel-1.0.99.2-3.el6.x86_64                        10/11
Installing : xorg-x11-docs-1.3-6.1.el6.noarch                           11/11
Verifying  : libXmu-devel-1.0.5-1.el6.x86_64                             1/11
Verifying  : libXp-devel-1.0.0-15.1.el6.x86_64                           2/11
Verifying  : xorg-x11-docs-1.3-6.1.el6.noarch                            3/11
Verifying  : openmotif-devel-2.3.3-4.el6.x86_64                          4/11
Verifying  : openmotif-2.3.3-4.el6.x86_64                                5/11
Verifying  : libmng-devel-1.0.10-4.1.el6.x86_64                          6/11
Verifying  : mesa-libGLw-6.5.1-10.el6.x86_64                             7/11
Verifying  : mesa-libGLw-devel-6.5.1-10.el6.x86_64                       8/11
Verifying  : libXp-1.0.0-15.1.el6.x86_64                                 9/11
Verifying  : libXtst-devel-1.0.99.2-3.el6.x86_64                        10/11
Verifying  : qt3-devel-3.3.8b-30.el6.x86_64                             11/11

Installed:
libXtst-devel.x86_64 0:1.0.99.2-3.el6 mesa-libGLw-devel.x86_64 0:6.5.1-10.el6
qt3-devel.x86_64 0:3.3.8b-30.el6      xorg-x11-docs.noarch 0:1.3-6.1.el6

Dependency Installed:
libXmu-devel.x86_64 0:1.0.5-1.el6      libXp.x86_64 0:1.0.0-15.1.el6
libXp-devel.x86_64 0:1.0.0-15.1.el6    libmng-devel.x86_64 0:1.0.10-4.1.el6
mesa-libGLw.x86_64 0:6.5.1-10.el6      openmotif.x86_64 0:2.3.3-4.el6
openmotif-devel.x86_64 0:2.3.3-4.el6

 

15. yum install sshfs ntfs-3g ntfsprogs

Put user in fuse group. Wish that were for all users to make it automatic!

Install flashplayer RPM from adobe.com

 


Cisco AnyConnect VPN on RedHat/Centos Enterprise Linux

I've just wasted several hours on configuring VPN for my university's system.

The University adopted Cisco AnyConnect, which is provided in a shell script "vpnsetup.sh". There are several flaws in the script, however, which make this a relatively tough, confusing install. But it seems to work in the end.

Step 1. Install the RPM package "chkconfig". Without that, the vpnsetup script fails. vpnsetup.sh assumes chkconfig is installed and used to set daemons, but it is no longer default on Fedora or RedHat.

After installing chkconfig, run vpnsetup.sh again. Then there is an error message when you try to use the Cisco AnyConnect client:

AnyConnect cannot confirm it is connected to your secure gateway. The
local network may not be trustworthy. Please try another network.

There is a separate fix for that. This makes no sense to me at all, but it does work.

Step 2. Carry out the weird, ad hoc fix that is described in the email I just received from KU IT and also on the websites below.

I found it at these sites, just a few hours before I received the message from itcsc describing the same fix:

http://people.fas.harvard.edu/~pdurbin/blog/2011/09/15/getting-the-cisco-anyconnect-vpn-client-to-work-on-centos-6-x86_64.html

Blog posts at http://cuz.cx/lampshade/2010/01/running-cisco-anyconnect-on-64bit-fedora-12/
and http://puschitz.com/pblog/?p=39

This would have us create symbolic links to several shared libraries in the directory /usr/local/firefox

# ln -s /usr/lib/libnss3.so .
# ln -s /usr/lib/libplc4.so .
# ln -s /usr/lib/libnspr4.so .
# ln -s /usr/lib/libsmime3.so .
# ln -s /usr/lib/nss/libsoftokn3.so .
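
The five ln commands assume you are already sitting in /usr/local/firefox. Here is a sketch that creates the directory and the links in one go, and complains about any library it cannot find rather than leaving a dangling link. link_libs is a made-up name, and the library directory is a parameter because it differs between distributions.

```shell
# link_libs LIB_DIR DEST_DIR LIB...
# For each LIB (a path relative to LIB_DIR), create or refresh a symlink
# with the same file name in DEST_DIR. Returns non-zero if any LIB is
# missing; missing libraries are reported on stderr and skipped.
link_libs() {
    lib_dir=$1
    dest_dir=$2
    shift 2
    mkdir -p "$dest_dir" || return 1
    rc=0
    for lib in "$@"; do
        if [ -e "$lib_dir/$lib" ]; then
            ln -sf "$lib_dir/$lib" "$dest_dir/${lib##*/}"
        else
            echo "missing: $lib_dir/$lib" >&2
            rc=1
        fi
    done
    return $rc
}
```

As root, the commands above would become: link_libs /usr/lib /usr/local/firefox libnss3.so libplc4.so libnspr4.so libsmime3.so nss/libsoftokn3.so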

Update: 2013-01-27

I just went through this same install with the VPN setup that the university is now providing. It is a
tarball, not a single self extracting script. anyconnect-linux-2.0.0343-k9.tar.gz. Inside, there is
a script vpn_install.sh. That still assumes that the chkconfig software is installed. That won't work when run with sudo, but logging in as root does work (sudo -s).

The script has some flaws I've not diagnosed fully yet, resulting in this error.

Removing previous installation...
insserv: warning: script 'K01vpnagentd_init' missing LSB tags and overrides
insserv: warning: script 'vpnagentd_init' missing LSB tags and overrides
insserv: warning: script 'vpnagentd_init' missing LSB tags and overrides
vpnagentd_init            0:off  1:off  2:on   3:on   4:on   5:on   6:off
Starting the VPN agent...
./vpn_install.sh: 166: [: unexpected operator
Done!

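That unexpected operator error is annoying, but if you read vpn_install.sh, you see it is complaining
about the cleanup after the install, so I think it is harmless. (It is the kind of message that dash, Debian's /bin/sh, prints when a script uses a bash-only test such as == inside [ ].)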

After that, the app menu does have Cisco AnyConnect under Internet, but it won't run. The error is "failed because of Certificate difficulties." Again, nobody can make it clear to me why we get that vague error or why this stupid fix works, but in /usr/local/firefox, create the symbolic links it asks for in Step 2.

Caution: I'm on Debian Multiarch right now, and first I tried to create the symbolic links from the
/usr/lib/x86_64-linux-gnu folder into /usr/local/firefox, but the certificate error still arose. However,
symlinking to the 32 bit versions fixed that. At the end, here's the output.

# ls -la /usr/local/firefox/
lrwxrwxrwx  1 root staff   35 Jan 27 20:41 libnspr4.so -> /usr/lib/i386-linux-gnu/libnspr4.so
lrwxrwxrwx  1 root staff   34 Jan 27 20:41 libnss3.so -> /usr/lib/i386-linux-gnu/libnss3.so
lrwxrwxrwx  1 root staff   34 Jan 27 20:42 libplc4.so -> /usr/lib/i386-linux-gnu/libplc4.so
lrwxrwxrwx  1 root staff   36 Jan 27 20:42 libsmime3.so -> /usr/lib/i386-linux-gnu/libsmime3.so
lrwxrwxrwx  1 root staff   38 Jan 27 20:42 libsoftokn3.so -> /usr/lib/i386-linux-gnu/libsoftokn3.so

Fighting through the LaTeX startup hassles

You kids know I'm not a Windows user, so I experience a different set of problems than you usually do. But the WinStat systems can be made to work, and the CRMDA systems can too. You just need to be a little patient and read the error messages.

Today I succeeded in compiling the ku-thesis document on a WinStat system,
but there were some hiccups. We've asked ITTC admins to install a few more packages for the LaTeX distribution, and I've uncovered one little flaw in the thesis document I created. Well, not so much as a flaw, but an omission that prevents compilation on these particular Windows systems (but not on other Windows systems or on Linux...)

I've been testing ResStat 2 (3389).

1. Start simple. Compile a very simple LaTeX document, this one:

https://pj.freefaculty.org/guides/Computing-HOWTO/LatexAndLyx/LaTeX-General-1/example.tex

I just open that in Emacs, then Click "Command" -> TeXing Options -> Generate PDF, and then

Click "Command" again and click "LaTeX". That compiles the thing into pdf.

So I know the elementary setup is good.

2. Try a more difficult project.

But a lot of bells and whistles make this more complicated. Yesterday we learned of the need to install some packages on WinStat (packages, setspace). As far as I can tell, we are good to go, at least to compile a corrected version of the dissertation exemplar.

You may not know that I made a LaTeX document template and style file for the KU Grad School last year. There is one version in LyX and also a LaTeX version exported from LyX. I just checked and I can compile that on the cluster compute nodes with pdflatex, but ran into some trouble on the WinStat systems.

Look in here:

https://pj.freefaculty.org/guides/Computing-HOWTO/KU-thesis/

The pdf file that should be produced as a result is in there. Also there's a tarball that includes everything needed to reproduce that document.

If I open "ku-thesis.tex" and try to compile the same way, it fails, and the minibuffer says "hit C-c ` to see errors."

That says

ERROR: I can't write on the file 'thesis-ku.pdf'

The existing pdf file is sitting in the way of my work. I think it is open in Adobe Reader, and on Windows, unlike Linux, that will block me from writing on the pdf again. So I close everything, and delete the pdf file for sanity.

Then I try to compile again,

########################

ERROR: Font \csname\endcsname=psyr at 12.0pt not loadable: Metric (TFM) file not found.

You requested a family/series/shape/size combination that is totally unknown. There are two cases in which this error can occur.
1) You used the \size macro to select a size that is not available.
2) If you did not do that, go to your local 'wizard' and complain fiercely that the font selection tables are corrupted!

#########################

That doesn't seem right, but I am willing to yell at wizards, if necessary. But at some point, you just end up yelling at yourself.

So either that system is missing some package I'm implicitly assuming, or there is some other package I can invoke that will fix it. I'm going to pursue fix #2.

We get better looking fonts in pdf from the Latin Modern font set, and the LaTeX package for that is installed in WinStat, so in the preamble of ku-thesis.tex, I insert this:

\usepackage{lmodern}

After that, I can compile the document.

There is an error, though, if we just compile once. The Emacs minibuffer warns:

LaTeX: There were unresolved citation, (21) pages.

That reminds me there's some work left, because there's a bibtex bibliography file and a table of contents to fix.

In Linux, I know of two fixes for this problem. First, this is the standard LaTeX instruction. Run the commands over and over, like this:

$ pdflatex ku-thesis
$ bibtex ku-thesis
$ pdflatex ku-thesis
$ pdflatex ku-thesis

That's boring, but it works.

Second, in Linux, we also have various programs and scripts that can do all of that for us. One that is recommended by the R project is "texi2pdf", and in my experience it works well.

On Windows, I'm finding this more trouble. When I click "Command" and "Bibtex", the minibuffer says

BibTeX finished with 2 error messages. Type 'C-c C-l' to display output.

The error there:

"I couldn't open style file apalike2.bst"

Apparently, the WinStat system is lacking in a bibliography file. I can ask the administrators to fix that for me, but I'm pretty sure I know of a solution for which I do not need their help.

Copy apalike2.bst into the working directory.

After that, Emacs is able to make bibtex work! Minibuffer says

BibTeX finished successfully. Run LaTeX again to get citations right.

Then it tells me to run LaTeX again to get the references right. But, at the end,

LaTeX: successfully formatted (22) pages.

###########################

Maybe you want to edit the tex file with some other program? My favorite LaTeX-specific editor is "Texmaker", but it appears on WinStat we have TeXworks, which is provided with MiKTeX.

In TeXworks, I open ku-thesis.tex, and I see that TeXworks is clever. It notices my document has a bibliography and references, and it sees that special compiling is required.

You see that if you click "Typeset" at the top: it offers "pdfLaTeX + MakeIndex + BibTeX".

That program has a big green arrow on the top left, and if you click that, then the LaTeX compiler is called and all the work gets done. The pdf document pops up automatically.

So, as frustrating as this hour has been, I'm convinced (again) that this can be made to work.


New Linux System Setup: Don’t forget

For all users, put in place some protections and conveniences.

In /etc/bash.bashrc or /etc/profile.d/safe.sh, add these alias lines:

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

After that, removals and destructive copies or moves will ask the user for confirmation. This was the default on the training-wheels Linux I used years ago; I don't know why they deviated from it. In the same spot, you can also add aliases to customize the colorization of ls output.
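As an illustration of the ls colorization idea, something like this could go in the same file. The particular aliases are my own picks, not a standard, and --color=auto assumes GNU ls, which is the usual ls on Linux.

```shell
# Hypothetical additions to /etc/bash.bashrc or /etc/profile.d/safe.sh.
# --color=auto assumes GNU ls.
alias ls='ls --color=auto'       # color only when output goes to a terminal
alias ll='ls -l --color=auto'    # long listing, same coloring
```

If one of the confirmation prompts gets in the way, a leading backslash bypasses the alias for that one command, as in "\rm somefile".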

Stop vi from giving weird symbols when the arrow keys are used

In /etc/vim/vimrc, put

set nocompatible

After installing emacs and ess, put my favorite .emacs file settings in. On Debian, drop the file in /etc/emacs/site-start.d. On Redhat, find site-start.d down under /usr/share/emacs...

In the ssh settings, change the default config to allow X11 forwarding and assume X11 forwarding for outgoing ssh connections.

In /etc/ssh/ssh_config, which sets the defaults for outgoing ssh connections, add this at the bottom:

ForwardX11 yes
ForwardX11Trusted yes

In /etc/ssh/sshd_config, for incoming connections, do this:

X11Forwarding yes

For security, forbid remote root logins in the same file:

PermitRootLogin no

Or allow them only if the user has put SSH keys in the proper setup:

PermitRootLogin without-password

That is horrible terminology, I did not create it. It means NO ROOT LOGIN with a password; root gets in only if SSH public keys are set up to allow connections between specific machines. "without-password" should be "key-only" or something similar, in my opinion (newer OpenSSH releases accept "prohibit-password" as a clearer name for the same setting). The point here is that an attacker knows there is a "root" account and might try to log in over and over to guess a password. Stop that!
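After editing, it is easy to lose track of what a config file really says, so here is a small helper to pull out one keyword. Treat it as a sketch: first_setting is a made-up name, and it ignores Match blocks, Include lines and the keyword=value spelling. It reports the first uncommented occurrence, because OpenSSH uses the first value it obtains for a keyword.

```shell
#!/bin/sh
# Sketch: print the first uncommented value of a keyword in an
# OpenSSH-style config file. first_setting is a hypothetical helper.
# Keywords are case-insensitive; commented lines start with "#", so
# their first field never equals the keyword.
first_setting() {
    awk -v key="$1" 'tolower($1) == tolower(key) { print $2; found = 1; exit }
                     END { exit !found }' "$2"
}

# Example against a throwaway file, so nothing under /etc is touched
tmp=$(mktemp)
printf '%s\n' '#PermitRootLogin yes' \
              'PermitRootLogin without-password' \
              'X11Forwarding yes' > "$tmp"
first_setting PermitRootLogin "$tmp"   # prints: without-password
first_setting x11forwarding "$tmp"     # prints: yes
rm -f "$tmp"
```

On a real system that would be "first_setting PermitRootLogin /etc/ssh/sshd_config" or "first_setting ForwardX11 /etc/ssh/ssh_config". sshd also has a test mode, "sshd -t", which checks the config file for validity, and the daemon has to be reloaded before changes take effect.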
