
What is CASSANDRA?


Cassandra is a hybrid non-relational database in the same class as Google’s BigTable. With Cassandra, like a document store, you don’t have to decide what fields you need in your records ahead of time. You can add and remove arbitrary fields on the fly.

Features

There are a number of reasons to choose Cassandra for your website. Compared to other databases, three big features stand out:

  • Flexible schema: with Cassandra, like a document store, you don’t have to decide what fields you need in your records ahead of time. You can add and remove arbitrary fields on the fly. This is an incredible productivity boost, especially in large deployments.
  • True scalability: Cassandra scales horizontally in the purest sense. To add more capacity to a cluster, turn on another machine. You don’t have to restart any processes, change your application queries, or manually relocate any data.
  • Multi-datacenter awareness: you can adjust your node layout to ensure that if one datacenter burns in a fire, an alternative datacenter will have at least one full copy of every record.

Some other features that help put Cassandra above the competition:

  • Range queries: unlike most key/value stores, you can query for ordered ranges of keys.
  • List data structures: super columns add a 5th dimension to the hybrid model, turning columns into lists. This is very handy for things like per-user indexes.
  • Distributed writes: you can read and write any data anywhere in the cluster at any time. There is never any single point of failure.

The data model

The usual way to refer to a piece of data is as follows: a keyspace, a column family, a key, an optional super column, and a column.

  • Keyspace (also confusingly called “table”): the outer-most level of organization. This is usually the name of the application. For example, ‘Twitter’ and ‘WordPress’ are both good keyspaces. Keyspaces must be defined at startup in the storage-conf.xml file in your $cassandra_path/conf directory.
  • Column family: a slice of data corresponding to a particular key. Each column family is stored in a separate file on disk, so it can be useful to put frequently accessed data in one column family, and rarely accessed data in another. Some good column family names might be :Posts, :Users and :UserAudits. Column families must be defined at startup.
  • Key: the permanent name of the record. You can query over ranges of keys in a column family, like :start => '10050', :finish => '10070'. This is the only index Cassandra provides for free. Keys are defined on the fly.
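
As a rough illustration of the key range query mentioned above, here is a minimal Ruby sketch. It uses a plain in-memory hash rather than a real Cassandra client; USERS, key_range, and the sample rows are hypothetical names invented for this example, and keys are simply compared as strings.

```ruby
# Hypothetical in-memory stand-in for one column family.
# This is NOT the Cassandra client API, only a model of the idea.
USERS = {
  '10040' => { 'screen_name' => 'amy' },
  '10050' => { 'screen_name' => 'bob' },
  '10065' => { 'screen_name' => 'lisa4718' },
  '10070' => { 'screen_name' => 'dan' },
  '10090' => { 'screen_name' => 'eve' }
}

# Return the rows whose keys fall between start and finish (inclusive),
# in key order. Keys are compared as strings (an assumption of this sketch).
def key_range(column_family, start:, finish:)
  column_family
    .select { |key, _| key >= start && key <= finish }
    .sort_by { |key, _| key }
    .to_h
end

rows = key_range(USERS, start: '10050', finish: '10070')
puts rows.keys.inspect
```

Because the comparison is on strings, this sketch only behaves like a numeric range when all keys have the same length.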

After the column family level, the organization can diverge; this is a feature unique to Cassandra. Below the key, a column family holds one of two things:

  • A column: this is a tuple with a name and a value. Good columns might be 'screen_name' => 'lisa4718' or 'Google' => 'http://google.com'.

    It is common to not specify a particular column name when requesting a key; the response will then be an ordered hash of all columns. For example, querying for (:Users, '174927') might return:

    {'name' => 'Lisa Jones', 
     'gender' => 'f', 
     'screen_name' => 'lisa4718'}

    In this case, name, gender, and screen_name are all column names. Columns are defined on the fly, and different records can have different sets of column names, even in the same keyspace and column family. This lets you use the column name itself as either structure or data. Columns can be stored in recency order, or alphabetical by name, and all columns keep a timestamp.

  • A super column: this is a named list. It contains standard columns, stored in recency order.

    Say Lisa Jones has bookmarks in several categories. Querying (:UserBookmarks, '174927') might return:

    {'work' => {
        'Google' => 'http://google.com', 
        'IBM' => 'http://ibm.com'}, 
     'todo' => {...}, 
     'cooking' => {...}}

    Here, work, todo, and cooking are all super column names. They are defined on the fly, and there can be any number of them per row. :UserBookmarks is the name of the super column family. Super columns are stored in alphabetical order, with their sub columns physically adjacent on the disk.
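
To make the addressing path concrete (keyspace, column family, key, optional super column, column), the following Ruby sketch models it with nested hashes. It is only an in-memory illustration, not the Cassandra client API; STORE and fetch_path are hypothetical names, and the 'Twitter' keyspace is assumed purely for the example.

```ruby
# Hypothetical in-memory model of the addressing path:
# keyspace -> column family -> key -> [super column ->] column
STORE = {
  'Twitter' => {
    :Users => {                      # standard column family
      '174927' => {
        'name' => 'Lisa Jones',
        'gender' => 'f',
        'screen_name' => 'lisa4718'
      }
    },
    :UserBookmarks => {              # super column family
      '174927' => {
        'work' => {
          'Google' => 'http://google.com',
          'IBM' => 'http://ibm.com'
        }
      }
    }
  }
}

# Walk the path as far as the caller supplies names.
# With no trailing names, the whole row (all columns) is returned.
def fetch_path(store, keyspace, column_family, key, *names)
  row = store.fetch(keyspace).fetch(column_family).fetch(key)
  names.empty? ? row : row.dig(*names)
end

puts fetch_path(STORE, 'Twitter', :Users, '174927', 'screen_name')
puts fetch_path(STORE, 'Twitter', :UserBookmarks, '174927', 'work', 'Google')
puts fetch_path(STORE, 'Twitter', :Users, '174927').keys.inspect
```

In the standard column family the path ends after one name (the column), while in the super column family it takes two (the super column, then the sub column).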

Super columns and standard columns cannot be mixed at the same (4th) level of dimensionality. You must define at startup which column families contain standard columns, and which contain super columns with standard columns inside them.

Super columns are a great way to store one-to-many indexes to other records: make the sub column names TimeUUIDs (or whatever you’d like to use to sort the index), and have the values be the foreign keys.
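
The one-to-many index strategy can be sketched in Ruby as follows. This is an in-memory illustration under stated assumptions, not the Cassandra client API: USER_POSTS, index_post, recent_posts, the 'posts' super column name, and the post ids are all hypothetical, and plain integer timestamps stand in for TimeUUIDs so the ordering is easy to follow.

```ruby
# Hypothetical in-memory super column family: key -> super column -> sub columns.
# Missing rows and super columns are created on first access.
USER_POSTS = Hash.new { |rows, key| rows[key] = Hash.new { |row, sc| row[sc] = {} } }

# Record that post_id belongs to user_id. The sub column name is the
# sort key (a timestamp here) and the value is the foreign key.
def index_post(index, user_id, timestamp, post_id)
  index[user_id]['posts'][timestamp] = post_id
end

# Return up to `limit` foreign keys for one user, newest first.
def recent_posts(index, user_id, limit)
  index[user_id]['posts']
    .sort_by { |timestamp, _| -timestamp }
    .first(limit)
    .map { |_, post_id| post_id }
end

index_post(USER_POSTS, '174927', 1_000, 'post-a')
index_post(USER_POSTS, '174927', 3_000, 'post-c')
index_post(USER_POSTS, '174927', 2_000, 'post-b')

puts recent_posts(USER_POSTS, '174927', 2).inspect
```

The sketch sorts at read time for simplicity; the point of the strategy described above is that the store keeps sub columns in sorted order, so no such sort would be needed there.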

Installation guide:

Download Cassandra and extract it into a local directory. After extracting, edit storage-conf.xml to define your own keyspaces and column families.

To start Cassandra: bin/cassandra.sh

To check the status of the Cassandra nodes: bin/nodeprobe

Cassandra client, for viewing inserted data, deleting data, and so on: bin/cassandra_cli.sh

In the Cassandra client terminal, connect to the cluster (the host and port are set in storage-conf.xml), for example: connect localhost/9160

For more options, type help.

The post What is CASSANDRA? appeared first on ITTreats.

