Wednesday, March 10, 2010

Hadoop and Hive Configuration on Ubuntu Karmic

1. To Install Hadoop
=================

Setting up your Apt Repository

1. Add repository. Create a new file /etc/apt/sources.list.d/cloudera.list with the following contents. The lines below use karmic; if you run a different release, replace karmic with the name of your distribution (find out by running lsb_release -c).

For the stable repository, use…

deb http://archive.cloudera.com/debian karmic-stable contrib
deb-src http://archive.cloudera.com/debian karmic-stable contrib

For the testing repository use…

deb http://archive.cloudera.com/debian karmic-testing contrib
deb-src http://archive.cloudera.com/debian karmic-testing contrib

2. Add repository key (optional). Add the Cloudera public GPG key to your repository by executing the following command:

curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -

This allows you to verify that you are downloading genuine packages.

3. Update APT package index. Simply run:

sudo apt-get update

4. Find and install packages. You can now find and install packages from the Cloudera repository using your favorite APT package manager (e.g., apt-get, aptitude, or dselect). For example:

apt-cache search hadoop
sudo apt-get install hadoop
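
The repository lines in step 1 differ only in the distribution codename and the channel (stable or testing), so they can be generated instead of typed. Here is a minimal Ruby sketch of that idea. It is not part of the Cloudera instructions: the helper name cloudera_sources is made up for illustration, and you would still have to write its output to /etc/apt/sources.list.d/cloudera.list as root.

```ruby
# Hypothetical helper (not from Cloudera): builds the deb and deb-src
# lines for a given distribution codename and channel.
def cloudera_sources(codename, channel = "stable")
  suite = "#{codename}-#{channel}"
  lines = ["deb", "deb-src"].map do |type|
    "#{type} http://archive.cloudera.com/debian #{suite} contrib"
  end
  lines.join("\n") + "\n"
end

# Use the codename reported by lsb_release -c, for example "karmic".
puts cloudera_sources("karmic")
puts cloudera_sources("karmic", "testing")
```

Passing the channel as a second argument keeps the stable and testing variants in one place, so a typo cannot creep into only one of the two lines.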

2. To Install Hive
===============

Installing Hive is simple and only requires having Java 1.6 and Ant installed on your machine.

Hive is available via SVN at http://svn.apache.org/repos/asf/hadoop/hive/trunk. You can download it by running the following command.

$ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hive

To build Hive, run the following command from the base directory:

$ ant package

It will create the subdirectory build/dist with the following contents:

* README.txt: readme file.
* bin/: directory containing all the shell scripts
* lib/: directory containing all required jar files
* conf/: directory with configuration files
* examples/: directory with sample input and query files

The build/dist subdirectory should contain all the files necessary to run Hive. You can run it from there or copy it to a different location if you prefer.

In order to run Hive, you must either have the hadoop command in your path or set the environment variable HADOOP_HOME to the Hadoop installation directory.

Moreover, we strongly advise users to create the HDFS directories /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and make them group-writable (chmod g+w) before tables are created in Hive.

To use the Hive command line interface (CLI), go to the Hive home directory (the one with the contents of build/dist) and execute the following command:

$ bin/hive

Metadata is stored in an embedded Derby database whose disk storage location is determined by the Hive configuration variable named javax.jdo.option.ConnectionURL. By default (see conf/hive-default.xml), this location is ./metastore_db.

Using Derby in embedded mode allows at most one user at a time. To configure Derby to run in server mode, see HiveDerbyServerMode.

3. Setting up Hadoop/Hive to use MySQL as metastore
================================================

Many believe MySQL is a better choice for this purpose, so here I'm going to show how to configure the cluster we created previously to use a MySQL server as the metastore for Hive.

First we need to install MySQL. In this scenario, I'm going to install MySQL on our master node, which is named centos1.



When logged in as root user:

yum install mysql-server

Now make sure MySQL server is started:

/etc/init.d/mysqld start

Next, I'm going to create a new MySQL user for hadoop/hive:

mysql
mysql> CREATE USER 'hadoop'@'centos1' IDENTIFIED BY 'hadoop';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'centos1' WITH GRANT OPTION;
mysql> exit

To make sure this new user can connect to the MySQL server, switch to user hadoop and try to connect, for example with su - hadoop followed by mysql -h centos1 -u hadoop -p.

We need to change the Hive configuration so it can use MySQL:

nano /hadoop/hive/conf/hive-site.xml

The new configuration values are:


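<property>
  <name>hive.metastore.local</name>
  <value>true</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://centos1:3306/hive?createDatabaseIfNotExist=true</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hadoop</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hadoop</value>
</property>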
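
Hive reads these settings as Hadoop-style property elements (a name and a value each) inside a configuration root element. As a rough sketch of what the finished file looks like, the Ruby snippet below renders the five name/value pairs from this post into that shape. The names SETTINGS, xml_escape and hive_site_xml are hypothetical helpers of mine, not part of Hive, and the values are simply the ones used above.

```ruby
# The name/value pairs from this post (adjust host, user and password).
SETTINGS = {
  "hive.metastore.local" => "true",
  "javax.jdo.option.ConnectionURL" =>
    "jdbc:mysql://centos1:3306/hive?createDatabaseIfNotExist=true",
  "javax.jdo.option.ConnectionDriverName" => "com.mysql.jdbc.Driver",
  "javax.jdo.option.ConnectionUserName" => "hadoop",
  "javax.jdo.option.ConnectionPassword" => "hadoop"
}

# Escape the characters that are not allowed verbatim in XML text.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Hypothetical helper: renders a hash as a Hadoop-style configuration file.
def hive_site_xml(settings)
  properties = settings.map do |name, value|
    "  <property>\n" \
    "    <name>#{xml_escape(name)}</name>\n" \
    "    <value>#{xml_escape(value)}</value>\n" \
    "  </property>"
  end
  "<?xml version=\"1.0\"?>\n<configuration>\n#{properties.join("\n")}\n</configuration>\n"
end

puts hive_site_xml(SETTINGS)
```

The escaping matters as soon as a JDBC URL carries more than one parameter, because the separating & must be written as &amp; inside an XML file.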



Some of the above parameters do not match what we did to set up the Derby server in the previous post, so I decided to delete the jpox.properties file:

rm /hadoop/hive/conf/jpox.properties

Hive needs the MySQL JDBC driver, so we need to download it and copy it to the hive/lib folder:

cd /hadoop
wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.11.tar.gz/from/http://mysql.he.net/
tar -xvzf mysql-connector-java-5.1.11.tar.gz
cp mysql-connector-java-5.1.11/*.jar /hadoop/hive/lib

To make sure all the settings are correct, start the Hive CLI and list the tables:

cd /hadoop/hive
bin/hive
hive> show tables;

Thursday, January 7, 2010

Facebook integration in Rails

People have found integrating Facebook Connect tricky: great libraries like facebooker handle the API part, but getting the profile linking and integration flow right is harder. So I’ve written this tutorial to integrate restful_authentication, the most commonly used starter plugin for authentication and registration in Ruby on Rails, with Facebook Connect, so that your users can log in and register through Connect.

First of all, let’s state what this integration is going to achieve.

* As a user I can register to the site through entering my details so I can access all that great functionality
* As a user I can login to the site through my entered username and password
* As a user I can register to the site through Facebook Connect so I don’t have to fill in that form
* As a user I can login to the site through Facebook Connect so I don’t have to remember two passwords
* As a user I can connect my existing site user with my Facebook Connect user so I can later login through Facebook Connect

We also have a constraint we need to consider:

* As a user, if I register by entering my details and later log in through Facebook Connect, I want to make sure I retain my old user account

So read on and I’ll have you Connected in 15 minutes.

We will first create a standard restful_authentication Rails application. I’m going to use MySQL for this example.


rails -d mysql connect_tutorial

Install restful authentication


cd connect_tutorial/vendor/plugins
git clone git://github.com/technoweenie/restful-authentication.git restful_authentication
cd ../..
./script/generate authenticated user sessions

Create our database


rake db:create
rake db:migrate

Move include AuthenticatedSystem from the Sessions Controller to the Application Controller.

Start the server and browse to http://localhost:3000/signup. Bingo, Restful Authentication in 3 minutes. Don’t create any users yet; we first need to add some fields to connect up our accounts.


script/generate migration add_users_fb


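class AddUsersFb < ActiveRecord::Migration
  def self.up
    add_column :users, :fb_user_id, :integer
    add_column :users, :email_hash, :string
    # MySQL only: widen fb_user_id to bigint, because Facebook user ids
    # can exceed the 32-bit integer range
    execute("alter table users modify fb_user_id bigint")
  end

  def self.down
    remove_column :users, :fb_user_id
    remove_column :users, :email_hash
  end
end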


rake db:migrate

We need two extra columns for our users: one to store the Facebook user id in fb_user_id, and another to store a special hash of the user’s email address, which we can later use to match new Facebook users to existing accounts and so take care of our constraint.
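
To make the "special hash" less mysterious, here is a small standalone Ruby sketch. As far as I understand the Facebook Connect scheme, the hash is the CRC32 of the trimmed, lowercased address, an underscore, and then the MD5 hex digest of the same string; treat that exact format as my assumption rather than a specification. In the application itself we let facebooker compute it through Facebooker::User.hash_email, and the method name connect_email_hash below is purely illustrative.

```ruby
require "zlib"
require "digest/md5"

# Illustrative only: mirrors what I assume the Connect email hash to be.
# In the Rails app, Facebooker::User.hash_email does this for us.
def connect_email_hash(email)
  normalized = email.strip.downcase   # hash must not depend on case or padding
  "#{Zlib.crc32(normalized)}_#{Digest::MD5.hexdigest(normalized)}"
end

puts connect_email_hash("  Mary@Example.com ")
puts connect_email_hash("mary@example.com")   # same value as the line above
```

Because only the hash is stored and compared, an existing account can be matched to a Facebook user without Facebook ever handing us the address itself.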

For the facebook connect we are going to use the facebooker plugin. This will handle the API level communication for us.


script/plugin install git://github.com/mmangino/facebooker.git
script/generate xd_receiver

The last line creates a cross-domain receiver file for Facebook Connect to call back on.

You now have to create a Facebook Application on Facebook to get your API key and secret. Head over to http://www.facebook.com/developers/createapp.php



(Enter your own application name)



Take a note of the api_key, secret and callback url and add these to config/facebooker.yml


development:
  api_key: {YOUR_KEY}
  secret_key: {YOUR_SECRET}
  canvas_page_name:
  callback_url: http://localhost:3000/
  pretty_errors: true
  set_asset_host_to_callback_url: true
  tunnel:
    public_host_username:
    public_host:
    public_port: 4007
    local_port: 3000

We need to initialise Facebook Connect on every page. This consists of 3 things: adding a namespace declaration for FBML, adding the Facebook Connect JavaScript, and initialising the JavaScript. Luckily facebooker can do this for us, so we create a generic layout, index.html.erb:

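<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<%= javascript_include_tag :defaults %>
</head>
<body>
<%= fb_connect_javascript_tag %>
<%= init_fb_connect "XFBML" %>
<%= yield %>
</body>
</html>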


And add the following to ApplicationController


layout 'index'
before_filter :set_facebook_session
helper_method :facebook_session

You are now ready to roll with some Facebook Connect tags. Add the following to the bottom of sessions/new.html.erb


or login with Facebook connect


<%= fb_login_button('window.location = "/users/link_user_accounts";')%>

And the following to users/new.html.erb


or register with Facebook connect


<%= fb_login_button('window.location = "/users/link_user_accounts";')%>

We also add another registration field for name


<%= label_tag 'name' %>

<%= f.text_field :name %>



These are going to create FBML tags which the Facebook Connect JavaScript will render as our Connect buttons.

Start the server and go to http://localhost:3000/login. You should see the login form with a Facebook Connect button below it. If you don’t, retrace your steps, because something has gone wrong.



Now it’s time to integrate. We need to do 3 main things:

1. When you are logged in through a Facebook session, log in through restful authentication as well
2. Link accounts between Facebook and Restful Authentication
3. Create accounts when someone logs in or registers with Facebook

We need to add to lib/authenticated_system.rb. Change the current_user method to


def current_user
  @current_user ||= (login_from_session || login_from_basic_auth || login_from_cookie || login_from_fb) unless @current_user == false
end

And add


def login_from_fb
  if facebook_session
    self.current_user = User.find_by_fb_user(facebook_session.user)
  end
end

This will handle the seamless login for us. Now we need to add to our User model.


# Find the user in the database, first by the Facebook user id and, if that fails, through the email hash
def self.find_by_fb_user(fb_user)
  User.find_by_fb_user_id(fb_user.uid) || User.find_by_email_hash(fb_user.email_hashes)
end

# Take the data returned from Facebook and create a new user from it.
# We don't get the email from Facebook, and because a facebooker can only log in through Connect we just generate a unique login name for them.
# If you were using the username to display to people you might want to get them to select one after registering through Facebook Connect
def self.create_from_fb_connect(fb_user)
  new_facebooker = User.new(:name => fb_user.name, :login => "facebooker_#{fb_user.uid}", :password => "", :email => "")
  new_facebooker.fb_user_id = fb_user.uid.to_i
  # We need to save without validations
  new_facebooker.save(false)
  new_facebooker.register_user_to_fb
end

# We are going to connect this user object with a Facebook id. But only ever one account.
def link_fb_connect(fb_user_id)
  unless fb_user_id.nil?
    # Check for an existing account
    existing_fb_user = User.find_by_fb_user_id(fb_user_id)
    # Unlink the existing account
    unless existing_fb_user.nil?
      existing_fb_user.fb_user_id = nil
      existing_fb_user.save(false)
    end
    # Link the new one
    self.fb_user_id = fb_user_id
    save(false)
  end
end

# The register_user_to_fb method sends the user's email hash and our account id to Facebook.
# We need this so Facebook can find friends on our local application even if they have not connected through Connect.
# We then use the email hash in the database to later identify a user from Facebook with a local user
def register_user_to_fb
  users = {:email => email, :account_id => id}
  Facebooker::User.register([users])
  self.email_hash = Facebooker::User.hash_email(email)
  save(false)
end

def facebook_user?
  return !fb_user_id.nil? && fb_user_id > 0
end

This allows authentication to look up users either by their stored Facebook ID or by a hash of their email address. It also adds methods for creating and linking accounts. After any user is created we need to register them with Facebook Connect, so add this to the User model:


after_create :register_user_to_fb
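
If the lookup and linking rules are hard to picture inside ActiveRecord, the following plain Ruby sketch plays them out on an in-memory list. It is only an illustration of the logic described above: Account and AccountStore are stand-ins I made up for the users table and the User model, and nothing here talks to Facebook or a database.

```ruby
# Stand-in for a row in the users table.
Account = Struct.new(:login, :fb_user_id, :email_hash)

# Stand-in for the User model's class-level behaviour.
class AccountStore
  def initialize(accounts)
    @accounts = accounts
  end

  # First try the stored Facebook id, then fall back to any of the
  # email hashes Facebook reports for the user.
  def find_by_fb_user(uid, email_hashes)
    by_id = uid && @accounts.find { |a| a.fb_user_id == uid }
    by_id || @accounts.find { |a| a.email_hash && email_hashes.include?(a.email_hash) }
  end

  # Attach the Facebook id to one account only: any other account
  # holding the same id is unlinked first.
  def link(account, uid)
    return if uid.nil?
    @accounts.each do |other|
      other.fb_user_id = nil if other.fb_user_id == uid && !other.equal?(account)
    end
    account.fb_user_id = uid
  end
end

alice = Account.new("alice", nil, "123_abc")   # registered with a form
bob   = Account.new("bob", 42, nil)            # registered through Connect
store = AccountStore.new([alice, bob])

puts store.find_by_fb_user(42, []).login            # found by Facebook id
puts store.find_by_fb_user(99, ["123_abc"]).login   # found by email hash
store.link(alice, 42)                               # id 42 moves from bob to alice
```

The fallback to the email hash is what satisfies our constraint: someone who registered with a form and later arrives through Connect ends up on their old account instead of a fresh one.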

In the Facebook Connect login buttons in the previous views, we added an after-login JavaScript callback. This links our accounts after a user has gone through the Connect login process. We need to add this action to the users controller:


def link_user_accounts
  if self.current_user.nil?
    # Register with Facebook
    User.create_from_fb_connect(facebook_session.user)
  else
    # Connect accounts
    self.current_user.link_fb_connect(facebook_session.user.id) unless self.current_user.fb_user_id == facebook_session.user.id
  end
  redirect_to '/'
end

Don’t forget to add a route for this.


map.resources :users, :collection => {:link_user_accounts => :get}

Finally, we need somewhere for users to go after login. Let’s create a home page under the Users controller, users/home.html.erb:


<% if logged_in? %>
  You are logged in as <%= current_user.name %>
  <% if current_user.facebook_user? %>
    Logout
  <% else %>
    why don't you connect with your facebook account
    <%= fb_login_button('window.location = "/users/link_user_accounts";')%>
    <%= link_to 'Logout', logout_path%>
  <% end %>
<% else %>
  You are not logged in!
  <%= link_to 'Signup', signup_path%> or <%= link_to 'Login', login_path%>
<% end %>

And map it to root and delete public/index.html


map.root :controller => "users", :action => "home"

And it’s done. Stop the clock. Start the server, go to http://localhost:3000/login and press the big connect button.



Login with your facebook account. Restful Authentication with Facebook Connect. Done!