Hive is a data warehouse solution that lets users run SQL-like queries on Hadoop. It was started by Facebook in 2010.

What problem does Hive try to solve?

Tech giants like Facebook were experiencing rapid data growth at the time. Facebook's data grew from 15 TB in 2007 to 700 TB in 2010, a scale that traditional databases could not handle well.

Before Hive, there was Hadoop, a distributed computing and storage system. However, its MapReduce interface was not user-friendly: it could take several hours just to write a word count program. …
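To give a rough feel for what a word count involves under the MapReduce model, here is a single-process Python sketch of the map, shuffle, and reduce steps. It is not Hadoop code, and the function names are made up for illustration; a real job also needs job configuration, input and output formats, and cluster submission, which is where much of the effort goes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three steps in order on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hive runs on hadoop", "hadoop stores data"]))
```

With a SQL-like layer, the same result is expressed as a grouping and a count over a table of words, which is the kind of convenience Hive is after.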


This article summarizes two data storage terms, CAP and ACID, that come up often in system design (see the full list of concepts in System Design Introduction For Interview).

ACID

ACID stands for Atomicity, Consistency, Isolation, and Durability. These properties characterize a transaction, a logical unit that groups several reads and writes.

  • Atomicity (sometimes called abortability): Either all operations in a transaction are committed or all of them are aborted. Partial commits are not allowed.
  • Consistency: Certain invariants about the data always hold. This is more a property of the application than of the database.
  • Isolation: Each transaction behaves as if it were the only transaction running on the…
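As a small illustration of atomicity, the sketch below uses Python's built-in sqlite3 module. The accounts table, the names, and the "balance must not go negative" invariant are assumptions made up for this example. The transfer either applies both updates or neither: if the invariant check fails midway, the whole transaction is rolled back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` inside one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            # Application-level invariant: a balance must never be negative.
            if row is None or row[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

Calling transfer(conn, "alice", "bob", 500) returns False and leaves both balances untouched, even though the first UPDATE had already run inside the transaction.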

ZooKeeper is a service for coordinating the processes of distributed applications. Common coordination use cases are configuration, group membership, and leader election. One way to address the coordination problem is to develop a separate service for each coordination need. ZooKeeper instead provides a coordination kernel, on which new primitives can be built to support high-level use cases such as configuration.

As a service, ZooKeeper comprises an ensemble of servers, with one leader and multiple followers. ZooKeeper data is replicated across these servers to achieve high availability and performance. Read requests are processed locally for low latency…


TinyURL is a URL shortening service. With it, a user can 1) get a short alias for a long URL and 2) retrieve the original URL from the shortened alias.

One example of this service is https://tinyurl.com/.

As a system design question, how do we approach it?

Requirements

Who are the customers? Anyone with URLs.

What does the service do?

  1. Convert a URL to a short one.
  2. Retrieve the original URL given the shortened alias.
  3. The shortened URL expires after 3 years.
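The three requirements can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the Shortener class, the base-62 encoding of a counter, and the 3 × 365 day approximation of "3 years" are hypothetical choices for the sketch, not the design of the actual service.

```python
import itertools
import string
from datetime import datetime, timedelta

ALPHABET = string.digits + string.ascii_letters  # 62 characters: 0-9, a-z, A-Z
TTL = timedelta(days=3 * 365)  # "3 years", approximated (assumption)


def encode(n):
    """Encode a non-negative integer as a base-62 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


class Shortener:
    """Hypothetical in-memory store mapping alias -> (long URL, expiry time)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._store = {}

    def shorten(self, long_url, now=None):
        """Requirement 1: return a short alias for a long URL."""
        now = now or datetime.now()
        alias = encode(next(self._ids))
        self._store[alias] = (long_url, now + TTL)  # requirement 3: set expiry
        return alias

    def resolve(self, alias, now=None):
        """Requirement 2: return the original URL, or None if unknown or expired."""
        now = now or datetime.now()
        entry = self._store.get(alias)
        if entry is None or now >= entry[1]:
            return None
        return entry[0]
```

Passing `now` explicitly keeps the expiry rule easy to exercise in tests; a real service would also need persistent storage and a way to generate aliases across many servers.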

Scalability: How many daily visits? TinyURL gets about 40M visits per day, so for the interview question, assume 40M daily visits.
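The daily figure converts into a per-second rate with simple arithmetic: 40M visits spread over 86,400 seconds is roughly 463 requests per second on average. The short calculation below spells this out; the peak factor of 2 is an assumption for illustration, not a number from the service.

```python
DAILY_VISITS = 40_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average load if visits were spread evenly over the day.
average_qps = DAILY_VISITS / SECONDS_PER_DAY

# Assumed multiplier to leave headroom for traffic peaks (hypothetical).
PEAK_FACTOR = 2
peak_qps = average_qps * PEAK_FACTOR

print(f"average ~{average_qps:.0f} requests/s, peak ~{peak_qps:.0f} requests/s")
```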

API Design


Git concepts

  • repository: the “container” that tracks the changes (all the commits) to your project files.
  • commit: a snapshot of your project files (working tree) at a point in time.
  • working tree (working directory): the files that you are currently working on.
  • index (staging area): holds the snapshot of files that will go into the next commit. Git compares it with the working tree and with the current commit to report what has changed.
Untracked | Unmodified | Modified | Staged
|---------------- git add ---------------->|…

Huayu Zhang

Software Engineer@Facebook
