## Archive for **August 2011**

## BibTeX URL entries containing underscores

Problem: BibTeX ‘url’ field entries that contain an underscore ‘_’ break compilation. When you run LaTeX (after running BibTeX), it complains about ‘Missing $’, because it treats the _ as a math-mode subscript.

##### Solution:

Change each _ in the .bib file to \_ . LaTeX then compiles cleanly, because the \_ in the .bbl file is typeset as \textunderscore.
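If a .bib file has many such entries, the replacement can be scripted. Below is a minimal Python sketch, not part of the original fix: the function name `escape_url_underscores` and the regular expression are my own assumptions, and it only handles `url` fields written on a single line. Underscores that are already escaped are left alone, so running it twice is harmless.

```python
import re

# Assumed layout: one url field per line, e.g.
#   url = {http://example.org/my_file.pdf},
URL_FIELD = re.compile(
    r'^([ \t]*url[ \t]*=[ \t]*[{"])(.*?)([}"][ \t]*,?[ \t]*)$',
    re.IGNORECASE | re.MULTILINE,
)


def escape_url_underscores(bib_text: str) -> str:
    """Escape bare underscores inside url fields of BibTeX source text."""

    def fix(match):
        prefix, value, suffix = match.groups()
        # (?<!\\)_ matches an underscore NOT already preceded by a backslash
        return prefix + re.sub(r'(?<!\\)_', r'\\_', value) + suffix

    return URL_FIELD.sub(fix, bib_text)


if __name__ == "__main__":
    line = '  url = {http://example.org/my_file.pdf},'
    print(escape_url_underscores(line))
```

Other fields (title, author, and so on) are not touched, since an underscore there may be intentional.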

## Virtual CloneDrive

A free program from SlySoft.

It installs a “virtual DVD-ROM” drive that shows up under “My Computer” along with your other drives. Just right click -> mount… on the BD-ROM drive and it reads the ISO in such a way that it looks like a real CD/DVD in Windows. Then you treat it like you would any CD or DVD. As an added bonus, it handles lots of other ISO-style CD formats, like MDS, BIN, CCD, etc.

## Natbib author-year problem in LyX

###### Problem:

I’d like to list my references with natbib and authordate, like:

Author, date, …

So I chose:

Settings > Bibliography Settings > “Use NatBib” > “Cite Style: Author-year”

and:

Insert > Citation Reference > Citation style: “Name-year”.

However I am still getting brackets and numbers, like

[1] Author, date, …

I am using the Elsevier article document class.

###### Solution:

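Add “\bibpunct{(}{)}{,}{a}{,}{,}” in Document -> Settings -> LaTeX Preamble. The first two arguments set the opening and closing brackets, and the fourth argument, a, selects author-year citations.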

## Open Source Mathematics Package

Computer Algebra System (CAS)

vs.

Numerical System

There are problems when installing Sage on Ubuntu 11.04. Check it later.

###### Basic commands

## Software Tutorial Information

Software Carpentry: helping scientists make better software since 1997

GAMS

http://agecon2.tamu.edu/people/faculty/mccarl-bruce/

R

Quick-R: http://www.statmethods.net/

“At this time, R and Python used together gives the most power and possibilities.”

Comparison of data analysis packages: R, Matlab, SciPy, Excel, SAS, SPSS, Stata

gah789 says:

January 2, 2011

I have used all of these programmes – and quite a few more – over the last 30-odd years. What one uses tends to reflect personal history, intellectual communities, cost, etc., but there are various points not highlighted in the discussion.

1. SPSS & SAS were designed as data management packages in the days when memory & CPU were expensive. They have evolved into tools for analysing corporate databases, but they are ferociously expensive for ordinary users and dealing with academic licences is a pain. Increasingly the corporate focus means that they lag behind the state of the art in statistical methods, but there is no other choice when dealing with massive datasets – oh the days when such data sets had to be read from magnetic tapes!

2. Stata’s initial USP was graphics combined with reasonable data management and statistics, but it developed an active user community which has greatly expanded its statistical capabilities if your interests match those of the community. In my view, its scripting language is not as bad as suggested by other comments and there is lots of support for, say, writing your own maximum likelihood routine or Monte Carlo analysis.

3. R (& S-Plus), Matlab (& Octave), Gauss, … are essentially developments of matrix programming languages, but they are useless for any kind of complex data management. R has a horrible learning curve but a very active research community, so it is useful for implementations of new statistical techniques not available in pre-packaged form. For many casual users what matters is the existence of a front-end – Rcmdr, GaussX, etc – that takes away the complexity of the underlying program.

4. Excel should never be used for any kind of serious statistical analysis. It is very useful for organising data or writing/testing simple models, but the key problem is that you cannot document what has been done and it is so easy to make small but vital errors – mis-copying rows for example. Actually, Statistica, JMP, and similar menu-driven programs fall into the same category: they are very good for data exploration but very poor for setting up analyses that can be checked and replicated in a reliable manner.

5. Many of us have used a variety of programming languages for data management and analysis in the past, but that is daft today – unless you are dealing with massive datasets of the SAS type and can’t afford SAS. In such cases their primary use will be the extraction and manipulation of data that is voluminous and frequently updated, but not for data analysis.

For anyone thinking what to use the key questions to consider are:

A. Are you primarily concerned with data management or data analysis? If data management, then steer clear of matrix-oriented languages which assume that your datasets are small(ish) and reasonably well organised. On the other hand, R or Matlab are essential if you want to analyse financial options using data extracted from Bloomberg.

B. Are your statistical needs routine – or, at least, standard within a research community? If so, go for a standard package with a convenient interface and easy learning curve or the one most commonly used in your community. The vast majority of users can rely upon whatever is the standard package within their discipline or work environment – from econometrics to epidemiology – and they will get much better support if they stick with the standard choice.

C. How large an initial commitment of time and money do you expect to make? A researcher developing new statistical tools or someone analysing massive databases must expect to make a substantially larger initial investment in learning and/or developing software than someone who simply wants to deploy data analysis in the course of other work.

D. Are you a student or a professional researcher? Partly this is a matter of cost and partly a matter of the reproducibility of research results. Open source and other low cost programs are great for students, but if you are producing research for publication or repeated replication it is essential to have a chain of evidence. R programs can be checked and reproduced for standard datasets, but even here there is a problem with documenting the ways in which more complex datasets have been manipulated.

I am primarily an applied econometrician, but even within this field there is a substantial range of packages with large groups of users – from R, Matlab & Gauss through Stata to RATS & E-Views – according to the interests of the users and types of data examined. Personally, I use Stata much of the time but ultimately the choice of package is less important than good practice in managing and analysing data. That is the one thing about the older packages – they force you to document how your data was constructed and analysed which is as or more important than the statistical techniques that are used unless you are purely interested in statistical methods.

Stefan says: Feb 25,2009

Hi. I think this is a very incomplete comparison. If you want to make a real comparison, it should be more complete than this wiki article. And to give a bit of personal feedback:

I know 2 people using STATA (social science), 2 people using Excel (philosophy and economics), several using LabView (engineers), some using R (statistical science, astronomy), several using S-Lang (astronomy), several using Python (astronomy) and by using Python, I mean that they are using the packages they need, which might be numpy, scipy, matplotlib, mayavi2, pymc, kapteyn, pyfits, pytables and many more. And this is the main advantage of using a real language for data analysis: you can choose among the many solutions the one that fits you best. I also know several people who use IDL and ROOT (astronomy and physics).

I have used IDL, ROOT, PDL, (Excel if you really want to count that in) and Python and I like Python best 🙂

## TeX, LaTeX, and LyX

TeX is a programming language optimized to print books.

http://www.troubleshooters.com/linux/lyx/lyx_latex_tex.htm

## Computer Programming & Programming Language

A programming paradigm is a fundamental style of computer programming. Paradigms differ in the concepts and abstractions used to represent the elements of a program (such as objects, functions, variables, constraints, etc.) and the steps that compose a computation (assignment, evaluation, continuations, data flows, etc.).

Different programming languages advocate different programming paradigms. Some languages are designed to support one particular paradigm (Smalltalk supports object-oriented programming, Haskell supports functional programming), while other programming languages support multiple paradigms (such as Object Pascal, C++, Java, C#, Visual Basic, Common Lisp, Scheme, Perl, Python, Ruby, Oz and F#).
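To make the idea of a multi-paradigm language concrete, here is a small illustrative sketch in Python (one of the languages listed above). It computes the same sum of squares in three styles. The function and class names are hypothetical, chosen only for this example.

```python
from functools import reduce


# Procedural style: an explicit loop that updates a mutable accumulator.
def sum_squares_procedural(numbers):
    total = 0
    for n in numbers:
        total += n * n
    return total


# Object-oriented style: state and behaviour live together in an object.
class SquareAccumulator:
    def __init__(self):
        self.total = 0

    def add(self, n):
        self.total += n * n
        return self  # returning self allows chained calls


def sum_squares_oo(numbers):
    acc = SquareAccumulator()
    for n in numbers:
        acc.add(n)
    return acc.total


# Functional style: no mutation, the result is built by folding a function.
def sum_squares_functional(numbers):
    return reduce(lambda acc, n: acc + n * n, numbers, 0)


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(sum_squares_procedural(data))
    print(sum_squares_oo(data))
    print(sum_squares_functional(data))
```

All three give the same answer; what differs is how the steps of the computation are expressed.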

1st generation: binary code

2nd generation: assembly languages

3rd generation (the first described as high-level languages): procedural languages, object-oriented languages

4th generation: declarative programming (e.g. constraint programming, functional programming, logic programming)

**Algebraic Modeling Languages (AML)** are high-level computer programming languages for describing and solving high-complexity problems for large-scale mathematical computation (i.e. large-scale optimization type problems). One particular advantage of some algebraic modeling languages like AIMMS, AMPL or GAMS is the similarity of their syntax to the mathematical notation of optimization problems. This allows for a very concise and readable definition of problems in the domain of optimization, which is supported by certain language elements like sets, indices, algebraic expressions, powerful sparse index and data handling, and variables and constraints with arbitrary names. The algebraic formulation of a model does not contain any hints on how to process it.

An AML does not solve those problems directly; instead, it calls appropriate external algorithms to obtain a solution. These algorithms are called solvers and can handle certain kinds of mathematical problems, such as:

linear problems

integer problems

(mixed integer) quadratic problems

mixed complementarity problems

mathematical programs with equilibrium constraints

constrained nonlinear systems

general nonlinear problems

nonlinear programs with discontinuous derivatives

nonlinear integer problems

global optimization problems

stochastic optimization problems
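The separation between a model and the solver that processes it can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not how AMPL or GAMS work internally: the `Model` class and `enumerate_solver` function are hypothetical names, and the "solver" simply tries every combination of a small integer problem.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Assignment = Dict[str, int]


@dataclass
class Model:
    """Declarative problem statement: what to optimize, not how."""
    variables: Dict[str, Sequence[int]]        # name -> finite integer domain
    objective: Callable[[Assignment], float]   # value to minimize
    constraints: List[Callable[[Assignment], bool]]


def enumerate_solver(model: Model) -> Optional[Tuple[Assignment, float]]:
    """Toy solver: exhaustive search. Real solvers are far more efficient."""
    names = list(model.variables)
    best = None
    for values in product(*(model.variables[n] for n in names)):
        candidate = dict(zip(names, values))
        if all(check(candidate) for check in model.constraints):
            cost = model.objective(candidate)
            if best is None or cost < best[1]:
                best = (candidate, cost)
    return best  # None when no assignment satisfies the constraints


# Small integer problem: minimize 3x + 2y subject to x + y >= 4 and x <= 3.
model = Model(
    variables={"x": range(0, 6), "y": range(0, 6)},
    objective=lambda v: 3 * v["x"] + 2 * v["y"],
    constraints=[lambda v: v["x"] + v["y"] >= 4, lambda v: v["x"] <= 3],
)

if __name__ == "__main__":
    print(enumerate_solver(model))
```

The model object says nothing about search strategy, so a different solver function could be swapped in without changing the problem statement.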

## AMPL vs. GAMS

1. From http://stackoverflow.com/

In terms of functionality they are pretty much the same, allowing you to express most types of optimization problems. Personally, I prefer AMPL because it has intuitive and expressive syntax and it is very well documented in the AMPL book. Another important advantage of AMPL is that, although it is commercial, you can avoid vendor lock-in because there is an open-source alternative – GNU MathProg (http://www.gnu.org/software/glpk). GAMS, on the other hand, has a more advanced IDE than those that exist for AMPL.

You can find an example of the same transportation problem from George Dantzig formulated in AMPL and GAMS in their Wikipedia articles (http://en.wikipedia.org/wiki/AMPL) and (http://en.wikipedia.org/wiki/General_Algebraic_Modeling_System).

Decision Tree for Optimization Software

One software for all problems – NO!

Steps:

Identify your problem –> Select the solver –> Code your problem in software that has an interface to the solver you chose

## Is Mathematica worth it?

Background: I’ve used Mathematica for 12 years now. I mostly use it to do pure math symbolic calculations (i.e. group theory).

1. Numerical optimization in Mathematica.

I strongly DO NOT recommend Mathematica for that task if your examples are medium to large scale and nonlinear-non-convex.

Mathematica optimization solvers (and add-ons) are rather limited, have poor performance for medium to large scale nonlinear problems (i.e. long calculation times) and don’t offer a complete toolbox to cover mixed integer nonlinear or linear problems.

If you want to buy commercial software for that task, I highly recommend purchasing AMPL or GAMS. There are a lot of open-source solutions (e.g. http://www.coin-or.org/) and the NEOS server can help in carrying out tests (http://www.neos-server.org/neos/).

There are also a lot of open-source codes for derivative-free optimization if your objective or constraint functions involve noise, are non-differentiable or their derivative is not available for some reason (e.g. the result of a computer simulation in another program).

2. Symbolic calculations in Mathematica.

If your main focus is symbolic calculations, then I highly recommend Mathematica.

3. Performance

If you want your code to have low calculation times, then I suggest coding things in a compiled language (e.g. C, Fortran) and using open source libraries.

I make this remark because I initially used Mathematica for numerical optimization as well as other numerical code and I concluded that calculation times were too long for medium to large scale problems I was considering at the time. I have since moved to AMPL and also code in C using open-source and home-made numerical libraries.

There are a ton of open-source (and very reliable) linear algebra routines, algebraic (or differential) equation solvers and optimization solvers (e.g. see http://www.netlib.org/).

##### How to run Mathematica on a server?

Northwestern University School of Management Mathematica