The purpose of programming tech evolution is modularity. To get modularity, people need abstractions. What's the differences between abstraction and type-saving sugars? A good package of common patterns (abstraction) encapsulates a higher level concept which has meaning *independent* of its implementation.
A package is most useful if its behavior is independent of the context(environment/global resource/etc.) of its use. Such a package is called referentially transparent. In other words, the output(behavior) of a referentially transparent package(function) will always be the same with the same input, because the explictly provided input data is the only source it depends. (What does 'same' mean here is an interesting problem, we'll see later)
To make a modular system, it is often necessary to think of a computational process as having state. In such cases, if the state can be naturally divided into independent parts, an important decomposition may be the division of the program into pieces which separately deal with the part of the state.
Referential transparency will permit programs to be divided into parts so that each part can be separately specified without a description of its implementation. The desirable result is that pieces can be separately written and debugged. At first people made no free variable recusive interpreter with seperated variable bindings (Environment) and procedure bindings. It's expressive power is very litmited.
Seperate Env and Procedure symbol tables will make procedure 2nd-class concept (thus you can't define a 'map' function in the interpreter), but merge these two table unintentionally bring in two property: free variables (in fact, before merge, procedure symbols are free variables, but they're not real 'variables' at that time) and dynamic scope variables.
To avoid function name conflicts, it would be nice to have a notation for functions as objects, or rather a way to write an sexp in code that would evaluate to a procedure. Lisp adapted such a notation from lambda calculus of Alonzo Church.
But lambda plus dynamic scoping will lead to the famous 'FUNARG' problem, so we need lexical scoping. What we want is when computing a lambda, use the environment in which it's evaluated instead of the environment in which it's executed. The solution is simply save the environment in the procedure object when evaluate a lambda (or function, they're the same thing, function is just a lambda with a name in symbol table). Within this change, we say that the procedure is closed in the current environment, and the procedure object is therefore called a *closure* of the procedure, or a *closed procedure*.
The problem of lexical scope is in the REPL: the new definition can only refer to previously defined names! We lose the ablity to define recursive procedure. This conflict, between REPL and lexical scope, is unavoidable, because such a incremental interactive top level loop for reading definitions inherently constitutes a violation of referential transparency, which we successfully got in our interpreter. A piece of code can be read in which refers to an as yet undefined identifier (the name of a procedure), and then later a definition for that identifier read in (thereby altering the meaning of the reference). If we insist on maintaining absolute referential transparency, we are forced to eliminate the incremental top level interaction, to give up interactive debugging (we can't redefine erroneous procedures easily), to give up incremental compilation of separate modules.
If we throw lexical scoping away and turn back to dynamic scoping we would lose a great deal of referential transparency and abstractive power. The solution can be a mixture: procedures must not be allowed to refer to variables internal to other procedures, but only to top-level variables existing at the time they are called. Therefore only the future top-level environment is to be included in the procedure object when it is eventually constructed. In this way free variable references will be dynamic only with respect to the top-level environment.
At this stage, we made our functions really referencial transparency, with no side effect. No side effect means no state, no state means you have to pass states up and down (as functions input and output) through the whole system. So no side effects conflicts with modular discipline. We are forced to introduce side effects as a technique for constructing modular systems. But side effects violate referential transparency, now we have two techniques for achieving modularity have come into direct conflict.
The concept of side effect is induced by particular choices of boundaries between parts of a larger system. If a system boundary encloses all processes of interest (the system is closed), we need no concept of side effect to describe that system as a whole in vacuo. If we wish to make an abstraction by dividing the system into modules more than one of which has independent state, then we have by this action created the concept of side effect.
The concept of side effect is inseparable from the notion of equality/identity/sameness. The only way one can observationally determine that a side effect has occurred is when the same object behaves in two different ways at different times. Conversely, the only way one can determine that two objects are the same is to perform a side effect on one and look for an appropriate change in the behavior of the other.
if CONS return new object on every call, then it has side effect! Because with the same input, it will generate different output (a totally new object).
If side effect are to be usable at all, the references to things denoted by variables must not make copies of those things. If the user is to be able to write procedures which produce lasting side effects on their arguments, then there must be a variable binding mechanism which does not make copies.
The ideal equality predicate should follow these two rules:
1). Two objects which are observed to behave differently must not be equal.
2). Conversely, we would like two objects which are adjudged unequal to exhibit differing behaviors under suitable circumstances.
Any useful equality predicate must satisfy 1), but it's hard to satisfy 2). (Another interesting view is, equality predicate should never be false-positive, but may be false-negative)
Based on above two rules: in the absence of RPLACA ("pure lisp"), EQUAL is preffered to EQ (like (==) in haskell); in the presence of side effects such as RPLACA, EQ is preferred to EQUAL.
Finally we found set-use-reset pattern is very helpful for modularity, and dynamic scope captures this pattern well. So we want to have both dynamic and lexical scope variables in our interpreter. We need to maintain separate environments for lexical and dynamic variables in interpreter to avoid certain problems. This will require a special syntax for distinguishing references to and bindings of the two kinds of variables.
Dynamic scoping provides an important abstraction for dealing with side effects in a controlled way. A low-level procedure may have state variables which are not of interest to intermedia routines, but which must be controlled at a high level. Dynamic scoping allows any procedure to get access to parts of the state when necessary, but permits most procedures to ignore the existence of the state variables. The existence of many dynamic variables permits the decomposition of the state in such a way that only the part of interest need be deal with.
 LISP was not originally derived from Church's lambda calculus. In early LISP ppl use "McCarthy conditional" to define recursion:
factorial[x] = [x=0 -> 1; T -> x*factorial[x-1]]
while in resursive function theory one would define it like this:
factorial(0) = 1
factorial(successor(x)) = successor(x) * factorial(x)
haskell adopt the later notion, while keep the first one too (case expression)