Wednesday, March 28, 2012

The Julia Language

The purpose of this post is to mention the Julia language, a new language for technical computing. Its main selling point is speed: it is designed to run faster than R, MATLAB, and similar environments. The code is compiled just-in-time. In the back end, amongst other things, it has LAPACK and ARPACK.

So check out http://julialang.org/

Saturday, March 24, 2012

R Programming Syntax Quickstart

If you have ANY programming experience in other languages, this guide will get you started in R very quickly.

Logic Operators

a == b                            a equals b
a != b                            a is not equal to b
a > b                             a is greater than b
a < b                             a is less than b
a >= b                            a is greater than OR equal to b
a <= b                            a is less than OR equal to b
(condition 1) & (condition 2)     (condition 1) AND (condition 2)
(condition 1) | (condition 2)     (condition 1) OR (condition 2)


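Also, try the following to understand "&&" and "||". The single forms, & and |, compare element by element and return a vector. The double forms, && and ||, return a single TRUE or FALSE. In the R version used for this session they look only at the first element of each vector; since R 4.3.0, passing them a vector longer than one is an error.

> a<-c(1:10)
> b<-a
> c<-b
> c[1:4]<-.5
> (a == b) && (a > c)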
[1] TRUE
> (a == b) & (a > c)
[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> (a == b) | (a > c)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> (a == b) || (a > c)
[1] TRUE
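
If you want a single TRUE/FALSE out of whole vectors in a way that does not depend on the R version, one option is to reduce each vector first with all() or any(), or to pick the first element yourself. This is only a sketch using the same a, b and c as above; the variable names on the left-hand side are my own.

```r
a <- 1:10
b <- a
c <- replace(b, 1:4, 0.5)   # same vector c as in the session above

# Element-wise AND: one TRUE/FALSE per position
elementwise <- (a == b) & (a > c)

# Reduce each side to a single value before using && or ||
all_true   <- all(a == b) && all(a > c)    # is the condition true everywhere?
any_true   <- any(a == b) || any(a > c)    # is it true anywhere?
first_only <- (a == b)[1] && (a > c)[1]    # explicit "first element only"

print(elementwise)
print(c(all_true, any_true, first_only))
```

Inside an if() condition you normally want the scalar forms, so reducing with all() or any() first keeps the intent explicit.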


IF statements

The general example:

if( condition ) {

} else if( other condition ){

} else {

}


The specific example:

a<-55
if( a <= 54.9 ) {
   print("a is less than or equal to 54.9")
} else if( a == 55 ){
   print("a equals 55")
} else {
   print("a is greater than 54.9 and not 55")
}



For Loops

The general example:

for(variable in vector) {

}



Specific examples:

#example 1
for(i in 1:10) {
   print(i)
}

#example 2
index.vector<-c(4,3,7,5)
numberz<-runif(10)
print(numberz)

for(i in index.vector) {
   print(numberz[i])
}

#example 3
for(i in 1:10) {
   if(i == 3) {
      next
   } else if(i == 7) {
      break
   }
   print(i)
}

#example 4
mat<-matrix(0,3,4)
print(mat)

for(i in 1:3) {
   for(j in 1:4) {
      mat[i,j]<-rnorm(1)
   }
}



While Loops

General Example:

while(condition) {

}

Note that you must write something within the while loop that updates at least one of the variables in the condition. Otherwise, you could end up with an infinite loop.

Specific Example:

i<- -1
while( i < 10) {

    print(i)
    i<-i+1 
}

Repeat Loop

In a repeat loop, you must not only update the variables yourself, you must also test the exit condition explicitly and call break when it is met.
Specific Example:

i<- -1
repeat{
   print(i)
   i<-i+1
   if( i == 10) {
      break
   }
}


Functions

A function packages up code that you want to reuse. For example, you could have a function that evaluates a formula. A function can also call other functions.

General Example:

function_name<-function(parameters) {


   return(return_variable)
}


Specific Example:

calcQuadratic<-function(a, b, c, x) {
   y<-a*x*x+b*x+c
   return(y)
}

calcQuadratic(2,3,5,.07)

my.var<-calcQuadratic(3.32,7.6,5.999,3.2)
print(my.var)
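
To illustrate the point that a function can call other functions, here is a small sketch that builds on calcQuadratic. The helper tabulateQuadratic is a made-up name for this example, not something from R itself; it simply calls calcQuadratic once per x value.

```r
calcQuadratic <- function(a, b, c, x) {
   y <- a*x*x + b*x + c
   return(y)
}

# Hypothetical helper: evaluates the same quadratic at several x values
# by calling calcQuadratic() inside a for loop.
tabulateQuadratic <- function(a, b, c, xs) {
   ys <- numeric(length(xs))
   for (i in seq_along(xs)) {
      ys[i] <- calcQuadratic(a, b, c, xs[i])
   }
   return(data.frame(x = xs, y = ys))
}

# x^2 - 3x + 2 evaluated at x = 0, 1, 2, 3
result <- tabulateQuadratic(1, -3, 2, c(0, 1, 2, 3))
print(result)
```

Because R arithmetic is vectorized, calcQuadratic(1, -3, 2, c(0, 1, 2, 3)) would give the same y values directly; the loop is here only to show one function calling another.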



BANG!!

Testing for seasonal unit roots in R

I will explain seasonal unit root testing in R. Briefly, R is a language for statistical computing. It fills a similar role to MATLAB, SAS, etc. The website is http://www.r-project.org

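Suppose that our dataset is seasonal and that we intend to use a seasonal ARIMA model. We need to test our time series to see whether it is seasonally integrated, that is, whether it has a seasonal unit root and therefore needs seasonal differencing.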

Version 3.x of the "forecast" R package has a new function for testing for seasonal unit roots. The function is nsdiffs().

R also comes with a US Accidental Deaths dataset.

So to follow along, open up R and type the following:

>USAccDeaths

You will then see the US Accidental Deaths dataset. You can see that it is monthly.

Now install the "forecast" R package from CRAN with install.packages("forecast"). Then load it with library(forecast).

To view the help file for nsdiffs(), type:

>?nsdiffs

It will bring up a page that is for both nsdiffs and ndiffs.

Two tests have been implemented in nsdiffs: the OCSB test (default) and the Canova-Hansen test. You can also specify the seasonal period of your dataset. USAccDeaths is a ts object, and its seasonal period, or "frequency", is stored as part of that object.
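
If you want to check the seasonal period yourself before running any test, base R can report it. This is a minimal sketch that uses only functions shipped with R, so the forecast package is not needed for it; the variable name m is my own choice.

```r
# USAccDeaths ships with R (datasets package), so nothing extra is needed.
m <- frequency(USAccDeaths)   # seasonal period stored on the ts object

print(class(USAccDeaths))     # the object's class
print(m)                      # observations per year
print(start(USAccDeaths))     # first year and period in the series
```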

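To perform the OCSB test (naming the test explicitly also works, and protects you if the default differs in your version of forecast):

>nsdiffs(USAccDeaths, test="ocsb")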

To perform the Canova-Hansen test:

>nsdiffs(USAccDeaths, test="ch")

The output is the number of seasonal differences needed: "1" means that there is a seasonal unit root and "0" means that there is no seasonal unit root.

You will notice that the two tests give different answers. They have opposite null hypotheses: the OCSB test has a null hypothesis of a seasonal unit root, while the Canova-Hansen test has a null hypothesis of no seasonal unit root (stable, deterministic seasonality). As a result, the Canova-Hansen test is less likely to decide in favour of a seasonal unit root than the OCSB test. The USAccDeaths dataset is "on the edge". Osborn (1990) writes that when in doubt, it's better to seasonally difference.
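
If you follow that advice, the seasonal difference itself needs nothing beyond base R. The sketch below takes one seasonal difference of USAccDeaths with diff(); the name deaths.sdiff is just my label for the result, and whether a single seasonal difference is enough for your own data is still a judgment call.

```r
m <- frequency(USAccDeaths)                  # seasonal period of the series

# Seasonal difference: each value minus the value one full season earlier
deaths.sdiff <- diff(USAccDeaths, lag = m)

print(length(USAccDeaths))    # original number of observations
print(length(deaths.sdiff))   # one full season shorter
print(head(deaths.sdiff))
```

The differenced series is still a ts object with the same frequency, so it can be plotted or passed on to other time series functions as usual.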



Bibliography:
Osborn DR (1990) "A survey of seasonality in UK macroeconomic variables", International Journal of Forecasting 6(3):327-336.

Osborn DR, Chui APL, Smith J, and Birchenhall CR (1988) "Seasonality and the order of integration for consumption", Oxford Bulletin of Economics and Statistics 50(4):361-377.

Canova F and Hansen BE (1995) "Are Seasonal Patterns Constant over Time? A Test for Seasonal Stability", Journal of Business and Economic Statistics 13(3):237-252.

Analytics Blog

I started with the "Insurance Blog", which began as keyword-laden drivel. However, there is a limited amount of drivel that I can produce. Eventually, good statistical computing info started to flow, interspersed with keyword-laden drivel. According to some research by some start-up analytics company, "insurance" is one of the highest paying keywords. ;-)

When I saw in Google Analytics that my drivel blog was coming up in searches and actually helping people, I became proud of my content. So here is a blog that I am completely proud of: all the info, without the drivel.