Sunday, December 12, 2010

Rails 2.3.8: Upgrading to Bundler

On the way to upgrading an application to Rails 3, I want to try replacing Rails' current gem management system with Bundler. I'll do this by following the Rails 2.3 instructions on the Bundler site.

As always, begin by creating a new branch and switching into it:

$ git branch bundler
$ git checkout bundler

Now we can begin installing Bundler.
First, insert the following code into /config/boot.rb, right above the last line (Rails.boot!):

class Rails::Boot
  def run
    load_initializer

    Rails::Initializer.class_eval do
      def load_gems
        @bundler_loaded ||= Bundler.require :default, Rails.env
      end
    end

    Rails::Initializer.run(:set_load_path)
  end
end
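For anyone wondering what the `Bundler.require :default, Rails.env` line is doing: it requires every gem in the Gemfile's default group (gems declared outside any `group` block) plus the gems in the group named after the current environment. The sketch below only imitates that selection in plain Ruby. The `GEM_GROUPS` table and the `gems_for` helper are made up for illustration and are not part of Bundler.

```ruby
# Hypothetical table mapping gem names to the Gemfile groups they belong to.
# Gems declared outside any `group` block land in :default.
GEM_GROUPS = {
  "rails"           => [:default],
  "factory_girl"    => [:default],
  "rails-footnotes" => [:development],
  "rspec"           => [:test],
  "faker"           => [:test]
}

# Illustrative helper (not a Bundler method): return the names of the gems
# whose groups overlap with the requested groups, sorted alphabetically.
def gems_for(*groups)
  wanted = groups.map { |g| g.to_sym }
  GEM_GROUPS.select { |_name, gem_groups| (gem_groups & wanted).any? }.keys.sort
end

puts gems_for(:default, "development").inspect
puts gems_for(:default, "test").inspect
```

The environment name is passed as a string here because `Rails.env` is a string; the helper converts it to a symbol before matching.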

Next, create a new file, config/preinitializer.rb, and insert the following:

begin
  require "rubygems"
  require "bundler"
rescue LoadError
  raise "Could not load the bundler gem. Install it with `gem install bundler`."
end

if Gem::Version.new(Bundler::VERSION) <= Gem::Version.new("0.9.24")
  raise RuntimeError, "Your bundler version is too old for Rails 2.3. " +
    "Run `gem install bundler` to upgrade."
end

begin
  # Set up load paths for all bundled gems
  ENV["BUNDLE_GEMFILE"] = File.expand_path("../../Gemfile", __FILE__)
  Bundler.setup
rescue Bundler::GemNotFound
  raise RuntimeError, "Bundler couldn't find some gems. " +
    "Did you run `bundle install`?"
end
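A side note on why the version check wraps both strings in `Gem::Version` rather than comparing them directly. The short sketch below uses two version strings chosen purely for illustration (they are not meant to be real Bundler releases) to show where plain string comparison goes wrong.

```ruby
require "rubygems"

# Illustrative version strings only.
old_version = "0.9.24"
new_version = "0.10.0"

# Plain string comparison goes character by character, so "0.10.0" sorts
# before "0.9.24" because "1" comes before "9".
string_says_newer = new_version > old_version

# Gem::Version compares the numeric segments, so 10 beats 9.
version_says_newer = Gem::Version.new(new_version) > Gem::Version.new(old_version)

puts "String comparison says newer:       #{string_says_newer}"
puts "Gem::Version comparison says newer: #{version_says_newer}"
```

Only the second comparison gives the answer a human would expect, which is presumably why the preinitializer uses it.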

Then, create the file /Gemfile and copy the following code into it:

source :gemcutter
gem "rails", "~> 2.3.5"
gem "sqlite3-ruby", :require => "sqlite3"

# bundler requires these gems in all environments
# gem "nokogiri", "1.4.2"
# gem "geokit"

group :development do
  # bundler requires these gems in development
  # gem "rails-footnotes"
end

group :test do
  # bundler requires these gems while running tests
  # gem "rspec"
  # gem "faker"
end
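The `"~> 2.3.5"` in the rails line is a version constraint: it accepts 2.3.5 and any later 2.3.x release, but not 2.4 or above. As a rough check, RubyGems' own `Gem::Requirement` class can evaluate the same constraint string. The list of candidate versions below is arbitrary and only there to illustrate the cut-offs.

```ruby
require "rubygems"

# "~> 2.3.5" means ">= 2.3.5 and < 2.4".
requirement = Gem::Requirement.new("~> 2.3.5")

# Arbitrary candidate versions, chosen to show both sides of the range.
%w[2.3.4 2.3.5 2.3.8 2.4.0 3.0.0].each do |candidate|
  ok = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{ok ? 'allowed' : 'rejected'}"
end
```

Pinning an exact version, as in `gem "rails", "2.3.8"`, skips the range entirely and accepts only that release.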

I needed to add the gems which are specific to my application, including factory_girl. Furthermore, I use MySQL instead of SQLite, so I'll have to change those lines too. Ultimately, my new Gemfile looks like this:

source 'http://rubygems.org'
gem "rails", '2.3.8'
gem 'factory_girl'
gem 'searchlogic'

# bundler requires these gems in all environments
# gem "nokogiri", "1.4.2"
# gem "geokit"

group :development do
  # bundler requires these gems in development
  # gem "rails-footnotes"
end

group :test do
  # bundler requires these gems while running tests
  # gem "rspec"
  # gem "faker"
end

At this point the Bundler instructions end with "From this point on, you can follow the instructions in the Rails 3 guide" and then the command

$ rake db:migrate

I'm not sure why this is here, as it comes with no explanation. I don't think any migrations have been generated, so I'm not sure why we are supposed to run a migration. Nevertheless, I do it anyway:

$ rake db:migrate

As predicted, nothing happens. Clicking on the link "Learn more: Rails 3" takes us to the Rails 3 installation tutorial, where we are meant to continue following the steps.

At this point, to see if it's working, run bundler from the command line:

$ bundle install

To my astonishment, it works. To test it out, let's add Devise to the Gemfile and see if Bundler can install it:

gem "devise", "1.0.9"

Then from the command line:

$ sudo bundle install

Again, it works. Thanks Rails.

The next step would be to clean up the old code that's no longer needed for gem management. All I can think of is to remove the gem lines from /config/environment.rb by commenting them out:

#RAILS_GEM_VERSION = '2.3.8' unless defined? RAILS_GEM_VERSION
#  config.gem "factory_girl"
#  config.gem "searchlogic"

After restarting the server, everything still works, so I guess it's a successful install of Bundler. Having that out of the way will make the transition to Rails 3 just a little bit easier.

To finish up, we need to commit our changes, merge back into the master branch, and delete the bundler branch:

$ git add .
$ git commit -am "upgraded to Bundler"
$ git checkout master
$ git merge bundler
$ git branch -d bundler

Done.

Saturday, December 11, 2010

Setup vs. Set up

The word "setup" comes up a lot in software blogs. Many writers misuse it. Two common cases arise:

Noun

Setup (one word)
"When you've finished the setup script, continue by...."
"The setup process is relatively simple...."

Verb

Set up (two words)
"This is a walkthrough with all steps you need to set up a resource..."
"If you're having trouble setting up the system...."
"This should allow you to set up the...."
"Then the user can set up his own...."
"One can set up a new product by...."

If you're a blogger, distinguish yourself by using the proper case.

Monday, December 6, 2010

The R Project for Statistical Computing

R (r-project.org) is a free (and open source) language and software environment for statistical computing and graphics. If you need to perform statistical analysis or create visualizations of data sets, check out R. It works on most operating systems; ask Google for any one of the many fine installation tutorials.

The biggest barrier to entry to using R comes from its lack of a graphical user interface (GUI). Instead of seeing and manipulating data and selecting commands from a menu as in Microsoft Excel, users of R type in functions at the command line. It looks just like the DOS prompt that so few people use now. Does this mean you have to be a programmer to use it? No, but when you start using it, you'll be creating mini computer programs without realizing it. The rest of this article contains information about a few things I found useful when learning R.

R Commander

require(Rcmdr)
R Commander is the closest thing that users have to a GUI. It is probably the best way to start learning how to use R, because as a user selects commands to run from the menu, she can see the result entered into the command line. However, one should quickly move toward manipulating R using the command line directly; it will become immediately evident how much more powerful the command line really is.

R Commander. Image source: John Fox

To start using R Commander, use it to play around with some of the sample data that comes with a standard installation of R. See below.

Included data sets

data()
R comes with many sample data sets pre-loaded. IPSUR (see below) uses many of these small data sets to illustrate concepts. To see a list of all the example data sets included by default with R, simply type in data() at the command prompt. To see a sample data set, simply type in its name. For example, type in cars to see the data about speed and stopping distance of cars. Playing with these data sets using R Commander is a great way to start learning R syntax. For example, try the command hist(cars$speed) to see R generate a histogram of data from the "speed" column of the data set called "cars."

Introduction to Probability and Statistics Using R

To get started with R, I'd recommend the freely available textbook Introduction to Probability and Statistics Using R (IPSUR [PDF]). Like so many other great internet resources, it's free and open source. The author, G. Jay Kerns of Youngstown State University, has compiled the whole book in LaTeX (also open source), producing a publication-quality textbook based specifically on using the R language and environment. This book makes use of the many example data sets included in R. It includes installation instructions and walks the reader through the mathematics of statistics while using R to perform the calculations.

I did not use IPSUR to learn statistics; I only used it to apply what I already knew about statistics to the R environment. As such, I can't comment on it as a statistics book, but only as an R book. It may be that the typical user should start with an introductory statistics book to learn the vocabulary and motivation for statistics, and then move to IPSUR and R with that foundation in place.

The lattice package

library(lattice)
I found the lattice package to be particularly useful in creating quality graphics. Make it available to R by typing library(lattice) at the command line. With the lattice package loaded, you have more graphical functions available. For example, in addition to hist(), you also have histogram(). Try histogram(cars$speed) and compare it to the default hist(cars$speed).

Data sets with two variables can be displayed easily using a simple XY chart, or the Cartesian plane familiar from high school algebra. The relationship is visualized by plotting each variable along one of the two axes. Lattice gets most of its power from its ability to meaningfully display data sets with more than two variables.

For example, look at the "environmental" data set (loaded along with the lattice package) by typing in the command head(environmental). Notice that it has four variables: ozone, radiation, temperature, and wind. (You could see this more directly with the command names(environmental).) To visualize the relationship of all possible pairs of these variables together, use the command splom(~environmental, data=environmental).

Some useful commands:

?lattice Open the help file for the command "lattice"; replace "lattice" with any R command to see its help file.
head(cars) View the column headings and the first six rows of the data set "cars"; replace "cars" with any data set to see a truncated listing, which will avoid printing out possibly thousands of lines of data in the command window. Useful for seeing what a particular data set looks like.
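neuse$month_integer <- with(neuse, as.POSIXlt(neuse$DATE)$mon) Add a column to a data set called "neuse" holding the month number (0 for January through 11 for December) taken from its DATE column.
boxplot(Log_SURFCHLA~SOURCE + month_integer, data=neuse) Draw one box plot of Log_SURFCHLA for each combination of SOURCE and month_integer.
splom(~ cbind(Murder, Assault, Rape), data = USArrests) Draw a scatter plot matrix of three columns from the included data set "USArrests".
histogram(~Log_SURFCHLA|month_integer, data = neuse) Draw a lattice histogram of Log_SURFCHLA with a separate panel for each month.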

Other statistics packages

R isn't the only option. Below are some other software packages that have robust statistical computational capacity. Unfortunately, all are closed-source and proprietary.

Minitab ($1,395; Minitab 16)
Stata ($1,245; Stata IC)
SAS ($8,100; SAS Analytics Pro)
Excel ($139.99; Excel 2010)

Sunday, December 5, 2010

Reactions to Rob Bell's Velvet Elvis

Velvet Elvis, by Rob Bell

Velvet Elvis made me realize for the first time that as a congregant, I can not only affect and change my church, but that I am actually connected to the first-generation Christians from the book of Acts.

And so these first Christians passed on the faith to the next generation who passed it on to the next generation who passed it on to the next generation until it got to ... us. Here. Today. Those who follow Jesus belong to his church. And now it is our turn. It is our turn to step up and take responsibility for who the church is going to be for a new generation. It is our turn to redefine and reshape and dream it all up again. (p. 164)

I copied this down when I read it because it beautifully put into words what I had been feeling and realizing: that I am an adult now, I am a member of First Pres Berkeley, I serve on a committee, people listen to me, I introduce myself to new people, I have lots of friends there, I know the staff, it is my church, I am part of it, and therefore I can change it and affect it.

Basically, I realized that I must both connect the church I attend with the crazy stuff that the apostles were doing in Acts (planting churches, getting arrested, writing letters, founding Christianity), and realize that I am part of it and I can change it. It was founded in 30 AD or whatever, and what we're part of now is the same thing, and we have the same power to change it that the founders had. If I want it to be a certain way, then I can push it in that direction.

The church is nothing but a group of people with common beliefs who come together to worship, serve, love, question, struggle, give, take, communicate, sing, play, eat, reach out, listen, and pray. I am one of those people, and I do all of those things, so how I do them is part of what defines my church. My church is, literally, whatever I make it.

Image source: amazon.com

The second important point that the book drove home for me was the concept of unconditional service as evangelism. From page 167:

And this is because the most powerful things happen when the church surrenders its desire to convert people and convince them to join. It is when the church gives itself away in radical acts of service and compassion, expecting nothing in return, that the way of Jesus is most vividly put on display. To do this, the church must stop thinking of everybody primarily in categories of in or out, saved or not, believer or nonbeliever. Besides the fact that these terms are offensive to the "un" and "non," they work against Jesus's teachings about how we are to treat each other. Jesus commanded us to love our neighbor, and our neighbor can be anybody. We are all created in the image of God, and we are all sacred, valuable creations of God. Everybody matters. To treat people differently based on who believes what is to fail to respect the image of God in everyone. As the book of James says, "God shows no favoritism." So we don't either.

Oftentimes the Christian community has sent the message that we love people and build relationships in order to convert them to the Christian faith. So there is an agenda. And when there is an agenda, it isn't really love, is it? It's something else. We have to rediscover love, period. Love that loves because it is what Jesus teaches us to do. We have to surrender our agendas. Because some people aren't going to become Christians like us no matter how hard we push. They just aren't. And at some point we have to commit them to God, trusting that God loves them more than we ever could. (p. 167)

To me, Rob Bell's point--as well as the point of a large chunk of the New Testament--is that we should just love everyone, all the time, unconditionally. Yes it's impossible. Yes it's inspiring.

15 Reasons Why I Love Open Source

Open source software is free in two senses of the word: It is free as in beer, and free as in speech. The "source" part of open source refers to the actual lines of computer code that compile to form the final software package. "Open" means that anyone can download that code, look at it, change it, use it, re-compile it, improve it, learn from it, adapt it, and integrate it into his or her own projects.

The community. Open source communities comprise friendly, helpful, enthusiastic, passionate, generous people. They appear on listservs, discussion groups, IRC channels, blogs, forums, and other arenas to discuss nearly every open source project or topic you can think of.
The teamwork. On large projects, community members typically contribute only small bits of code at a time, sometimes only one line. Some users don't contribute code at all, but focus on graphics or documentation.
The licenses. MIT license, GNU license, Mozilla license, and others. People have put a lot of thought into making these licenses fair, solid, and protective of open source ideals. Most open source software projects apply one of these licenses, which assure the users that the software will always be free and open source.
The cost. With an open source license attached, the software is free, always, with no exceptions, ever.
The quality. It's constantly improving, and it has spawned some of the greatest software available. In the realm of web development, nearly all of the best packages are community-driven and open source, such as the Apache web server (which serves 59% of all web pages), the Linux OS, the WordPress platform, and Wikipedia (the underlying software, not the content).
The evolution. The software industry changes rapidly, and nothing beats the open source community at riding, pushing, adapting to, benefiting from, and gaining from that change. For example, in the Ruby on Rails community, a popular authentication module called restful_authentication discontinued development as the Rails core itself moved on to more sophisticated technology; as a result, along came Devise, a better, faster, cleaner, and more elegant solution. The community evolved.
The add-ons. Ruby and Ruby on Rails have gems. R has packages. Firefox has add-ons. The modularity built into these popular programs allows the community to easily contribute functionality, and it allows users to easily adopt that new functionality.
The software. The software itself is great. I use Ubuntu Linux, Open Office, Ruby on Rails, R, Firefox, and others. Open source software virtually can't be bad--otherwise users would flee and development would cease and the project would die.
The updates. I love getting updates. I love downloading the newest versions and reading about all the improvements that volunteers have made. I love it when I get a slick new version of Ubuntu every six months. In the private software world, programs sometimes stagnate. Microsoft released Internet Explorer 6 in 2001, then essentially ceased developing it for years until Firefox (open source) took over a huge piece of its market.
The security. Linux, the open source operating system that can replace Windows, has no viruses that I know of. Ruby on Rails provides security fixes so fast they once tripped over their own feet and broke their own code base.
The distribution. Open source projects sometimes have hundreds or thousands of contributors. The wide distribution of volunteers means that popular projects will virtually never stall.
The transparency. An open source project will typically provide plenty of information about what developments are currently underway, where the project is headed, and when to expect updates and new releases.
The competition. Open source projects compete with each other for users, and they compete with private industry groups for users. This healthy competition forces both sides to continue producing good software, and on the private side, it helps drive costs down for proprietary software. In many cases, it has led to private companies releasing their software for free (while keeping their code "closed-source"). Free software: Skype, Internet Explorer, Adobe Reader. Without open-source equivalents, these companies might have no reason not to charge for this software.
The equivalents. For many proprietary software packages, the open source community has created free equivalents. Microsoft Office has Open Office; Adobe Photoshop has GIMP; Microsoft Windows and Apple OS X have Linux. See Open Source as Alternative (osalt.com) for a complete list.
The names. The open source community gives creative, apt, and frequently humorous names to its creations. For example, Cinelerra is a movie studio package, and Pidgin is an instant messenger platform that "speaks" to many different services.

Look around your computer at the programs you use the most, and consider trying out an open source alternative. Beginners: Start by replacing Internet Explorer with Firefox. More advanced users: Download Open Office instead of paying hundreds of dollars for the next release of Microsoft Office (NB: MS Office has legitimately better features, but only power users can tell the difference). Advanced: Download Ubuntu and try a whole new operating system.

Thursday, December 2, 2010

What is LaTeX?

LaTeX (pronounced "lay-tech") is for typesetting documents. Microsoft Word is also for typesetting documents, but there are a few major differences between Word and LaTeX. The biggest difference is that Word provides a what-you-see-is-what-you-get (WYSIWYG) editor as a graphical user interface (GUI). LaTeX by itself does not have a WYSIWYG editor or any sort of GUI. Instead, LaTeX requires the user to edit raw, unformatted, plain text, much as if he were writing HTML. In that way, creating a LaTeX document is like creating an HTML web page.

To create an HTML page (hypertext markup language), you type in the content you want in your document, then you "mark up" the content using HTML tags such as headings, paragraphs, lists, tables, and links. The HTML file itself is a plain-text file with a ".html" file extension, such as "example.html". The HTML page is associated with a CSS file (cascading style sheets) which applies visual styles to the marked up text. Then a web browser like Internet Explorer or Firefox renders the document in the browser window. To summarize, the HTML file holds the marked-up content, the CSS file controls how the marked-up content looks, and the browser renders the final product.

LaTeX works much the same way. But instead of using HTML tags to mark up the content, one uses LaTeX tags to mark up the content. And instead of using a ".html" extension, a LaTeX file has a ".tex" extension, such as "example.tex". And instead of a CSS file to define the visual style, the LaTeX program itself defines the visual style. And instead of rendering the final product in a browser, LaTeX typically renders the final product as a PDF file.

If that seems restrictive, that's because it is. With no CSS file to define the visual layout of the document, the new user has little control over how the rendered document looks. But therein lies the power of LaTeX: Behind the scenes, LaTeX has thousands of rules and algorithms for determining the best way to visually present the content. A Stanford University computer science professor, Donald Knuth, created TeX, the foundation for LaTeX, in the 1970s, and with more than 30 years of development behind it, LaTeX does a very good job of deciding how to render your content. Users of LaTeX argue convincingly that authors should worry about content and let document designers worry about layout.

LaTeX is best used for creating text documents--essentially anything you would create in Microsoft Word is a fair candidate for LaTeX. For the beginning user, it is not the best choice for creating graphics or flow charts or slide presentations.

To get started using LaTeX (it's free), use the excellent installation instructions in the "Get LaTeX" page at www.latex-project.org.