This package implements a general-purpose JavaScript
parser/compressor/beautifier toolkit. It is developed on NodeJS, but it
should work on any JavaScript platform supporting the CommonJS module system
(and if your platform of choice doesn't support CommonJS, you can easily
implement it, or discard the exports.* lines from UglifyJS sources).
The tokenizer/parser generates an abstract syntax tree from JS code. You can then traverse the AST to learn more about the code, or do various manipulations on it. This part is implemented in parse-js.js and it's a port to JavaScript of the excellent parse-js Common Lisp library from Marijn Haverbeke.
(See cl-uglify-js if you're looking for the Common Lisp version of UglifyJS.)
The second part of this package, implemented in process.js, inspects and manipulates the AST generated by the parser to provide the following:
eval() calls or with{} statements. In short, if eval() or
with{} are used in some scope, then all variables in that scope and any
variables in the parent scopes will remain unmangled, and any references
to such variables remain unmangled as well.
various small optimizations that may lead to faster code but certainly lead to smaller code. Where possible, we do the following:
{}
various optimizations for IF statements:
return, throw, break or continue statement, except
function/variable declarations).
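The eval() rule above can be illustrated with a small, self-contained sketch. It is not UglifyJS output: the function names `original` and `naivelyMangled` are hypothetical, and the second function imitates by hand what a renamer that ignored eval() would produce.

```javascript
// The string passed to eval() refers to the local variable by name,
// so that name has to survive minification.
function original() {
    var counter = 41;
    return eval("counter + 1");
}

// Hand-written imitation of a naive renamer (not UglifyJS output):
// the declaration became `a`, but the string still says `counter`.
function naivelyMangled() {
    var a = 41;
    try {
        return eval("counter + 1");
    } catch (ex) {
        return ex.name; // the lookup of `counter` fails at run time
    }
}
```

Because a minifier cannot in general know what string eval() will receive, leaving every reachable variable name alone is the conservative choice.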
The following transformations can in theory break code, although they're
probably safe in most practical cases. To enable them you need to pass the
--unsafe flag.
The following transformations occur:
new Array(1, 2, 3, 4) => [1,2,3,4]
Array(a, b, c) => [a,b,c]
new Array(5) => Array(5)
new Array(a) => Array(a)
These are all safe if the Array name isn't redefined. JavaScript does allow one to globally redefine Array (and pretty much everything, in fact), but I personally don't see why anyone would do that.
UglifyJS does handle the case where Array is redefined locally, or even
globally but with a function or var declaration. Therefore, in the
following cases UglifyJS doesn't touch calls or instantiations of Array:
// case 1. globally declared variable
var Array;
new Array(1, 2, 3);
Array(a, b);
// or (can be declared later)
new Array(1, 2, 3);
var Array;
// or (can be a function)
new Array(1, 2, 3);
function Array() { ... }
// case 2. declared in a function
(function(){
    a = new Array(1, 2, 3);
    b = Array(5, 6);
    var Array;
})();
// or
(function(Array){
    return Array(5, 6, 7);
})();
// or
(function(){
    return new Array(1, 2, 3, 4);
    function Array() { ... }
})();
// etc.
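The following sketch, in plain JavaScript and independent of UglifyJS, may help show why the Array rewrites depend on Array being the built-in constructor, and why a single argument is never turned into a literal. The function name `shadowed` is a hypothetical example.

```javascript
// With the built-in Array, both spellings build the same list.
var viaConstructor = new Array(1, 2, 3, 4);
var viaLiteral = [1, 2, 3, 4];

// A single numeric argument is a length, not an element, which is why
// new Array(5) is only shortened to Array(5) and never to [5].
var sized = Array(5);

// When Array is shadowed by a local declaration, the call no longer
// builds a list, so replacing it with a literal would change behaviour.
function shadowed() {
    var result = Array(1, 2, 3);
    function Array() { return "not a list"; }
    return result;
}
```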
obj.toString() => obj+""
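As a hedged illustration of why this rewrite is classed as unsafe, the sketch below shows two contrived cases where the two forms disagree. The object `price` and the helper `stringifyBoth` are hypothetical names, not part of UglifyJS.

```javascript
// An object whose valueOf() and toString() disagree.
var price = {
    valueOf: function () { return 42; },
    toString: function () { return "forty-two"; }
};

var explicit = price.toString(); // calls toString() directly
var concatenated = price + "";   // concatenation consults valueOf() first

// null has no methods, so only the concatenated form succeeds.
function stringifyBoth(value) {
    var viaConcat = value + "";
    var viaMethod;
    try {
        viaMethod = value.toString();
    } catch (ex) {
        viaMethod = ex.name;
    }
    return [viaMethod, viaConcat];
}
```

For ordinary strings, numbers and plain objects the two forms give the same result, which is why the transformation is still useful in practice.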
UglifyJS is now available through NPM — npm install uglify-js should do
the job.
## clone the repository
mkdir -p /where/you/wanna/put/it
cd /where/you/wanna/put/it
git clone git://github.com/mishoo/UglifyJS.git

## make the module available to Node
mkdir -p ~/.node_libraries/
cd ~/.node_libraries/
ln -s /where/you/wanna/put/it/UglifyJS/uglify-js.js

## and if you want the CLI script too:
mkdir -p ~/bin
cd ~/bin
ln -s /where/you/wanna/put/it/UglifyJS/bin/uglifyjs
  # (then add ~/bin to your $PATH if it's not there already)
There is a command-line tool that exposes the functionality of this library for your shell-scripting needs:
uglifyjs [ options... ] [ filename ]
filename should be the last argument and should name the file from which
to read the JavaScript code. If you don't specify it, it will read code
from STDIN.
Supported options:
* -b or --beautify: output indented code; when passed, additional
  options control the beautifier:
    * -i N or --indent N: indentation level (number of spaces)
    * -q or --quote-keys: quote keys in literal objects (by default,
      only keys that cannot be identifier names will be quoted).
* -c or --consolidate-primitive-values: consolidates null, Boolean,
  and String values. Known as aliasing in the Closure Compiler. Worsens the
  data compression ratio of gzip.
* --ascii: pass this argument to encode non-ASCII characters as
  \uXXXX sequences. By default UglifyJS won't bother to do it and will
  output Unicode characters instead. (The output is always encoded in UTF8,
  but if you pass this option you'll only get ASCII.)
* -nm or --no-mangle: don't mangle names.
* -nmf or --no-mangle-functions: in case you want to mangle variable
  names, but not touch function names.
* -ns or --no-squeeze: don't call ast_squeeze() (which does various
  optimizations that result in smaller, less readable code).
* -mt or --mangle-toplevel: mangle names in the toplevel scope too
  (by default we don't do this).
* --no-seqs: when ast_squeeze() is called (thus, unless you pass
  --no-squeeze) it will reduce consecutive statements in blocks into a
  sequence. For example, "a = 10; b = 20; foo();" will be written as
  "a=10,b=20,foo();". On various occasions, this allows us to discard the
  block brackets (since the block becomes a single statement). This is ON
  by default because it seems safe and saves a few hundred bytes on some
  libs that I tested it on, but pass --no-seqs to disable it.
* --no-dead-code: by default, UglifyJS will remove code that is
  obviously unreachable (code that follows a return, throw, break or
  continue statement and is not a function/variable declaration). Pass
  this option to disable this optimization.
* -nc or --no-copyright: by default, uglifyjs will keep the initial
  comment tokens in the generated code (assumed to be copyright information
  etc.). If you pass this it will discard them.
* -o filename or --output filename: put the result in filename. If
  this isn't given, the result goes to standard output (or see next one).
* --overwrite: if the code is read from a file (not from STDIN) and you
  pass --overwrite then the output will be written in the same file.
* --ast: pass this if you want to get the Abstract Syntax Tree instead
  of JavaScript as output. Useful for debugging or learning more about the
  internals.
* -v or --verbose: output some notes on STDERR (for now just how long
  each operation takes).
* -d SYMBOL[=VALUE] or --define SYMBOL[=VALUE]: will replace
  all instances of the specified symbol where used as an identifier
  (except where the symbol has been properly declared by a var declaration,
  used as a function parameter, or similar) with the specified value. This
  argument may be specified multiple times to define multiple
  symbols. If no value is specified, the symbol will be replaced with
  the value true; alternatively you can specify a numeric value (such as
  1024), a quoted string value (such as "object" or
  'https://github.com'), or the name of another symbol or keyword
  (such as null or document).
  This allows you, for example, to assign meaningful names to key
  constant values but discard the symbolic names in the uglified
  version for brevity/efficiency. When used with care, it also allows
  UglifyJS to operate as a form of conditional compilation
  whereby defining appropriate values may, by dint of the constant
  folding and dead code removal features above, remove entire
  superfluous code blocks (e.g. completely remove instrumentation or
  trace code for production use).
  Where string values are being defined, the handling of quotes is
  likely to be subject to the specifics of your command shell
  environment, so you may need to experiment with quoting styles
  depending on your platform, or you may find the option
  --define-from-module more suitable for use.
* --define-from-module SOMEMODULE: will load the named module (as
  per the NodeJS require() function) and iterate over all the exported
  properties of the module, defining them as symbol names
  (as if by the --define option) per the name of each property
  (i.e. without the module name prefix) and given the value of the
  property. This is a much easier way to handle and document groups of
  symbols to be defined than a large number of --define
  options.
* --unsafe: enable other additional optimizations that are known to be
  unsafe in some contrived situations, but could still be generally useful.
  For now only these:
* --max-line-len (default 32K characters): add a newline after around
  32K characters. I've seen both FF and Chrome croak when all the code was
  on a single line of around 670K. Pass --max-line-len 0 to disable this
  safety feature.
* --reserved-names: some libraries rely on certain names to be used, as
  pointed out in issue #92 and #81, so this option allows you to exclude such
  names from the mangler. For example, to keep names require and $super
  intact you'd specify --reserved-names "require,$super".
* --inline-script: when you want to include the output literally in an
  HTML <script> tag you can use this option to prevent </script from
  showing up in the output.
* --lift-vars: when you pass this, UglifyJS will apply the following
  transformations (see the notes in API, ast_lift_variables):
    * var declarations at the start of the scope
    * var declaration, if possible.
To use the library from JavaScript, you'd do the following (example for NodeJS):
var jsp = require("uglify-js").parser;
var pro = require("uglify-js").uglify;
var orig_code = "... JS code here";
var ast = jsp.parse(orig_code); // parse code and get the initial AST
ast = pro.ast_mangle(ast); // get a new AST with mangled names
ast = pro.ast_squeeze(ast); // get an AST with compression optimizations
var final_code = pro.gen_code(ast); // compressed code here
The above performs the full compression that is possible right now. As you
can see, there is a sequence of steps which you can apply. For example, if
you want compressed output but for some reason you don't want to mangle
variable names, you would simply skip the line that calls
pro.ast_mangle(ast).
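A minimal sketch of that idea follows. It uses only the calls shown above; the wrapper name `compress` and its `skip_mangle` parameter are hypothetical, and `jsp` and `pro` are passed in as arguments so the sketch makes no assumption about how the library is loaded.

```javascript
// `jsp` and `pro` are the parser and processor objects obtained as in the
// example above. `compress` and `skip_mangle` are illustrative names only.
function compress(jsp, pro, orig_code, skip_mangle) {
    var ast = jsp.parse(orig_code);   // initial AST
    if (!skip_mangle) {
        ast = pro.ast_mangle(ast);    // optional step: shorten names
    }
    ast = pro.ast_squeeze(ast);       // compression optimizations
    return pro.gen_code(ast);         // final code as a string
}
```

Calling `compress(jsp, pro, orig_code, true)` would then run every step except name mangling.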
Some of these functions take optional arguments. Here's a description:
* jsp.parse(code, strict_semicolons) – parses JS code and returns an AST.
  strict_semicolons is optional and defaults to false. If you pass
  true then the parser will throw an error when it expects a semicolon and
  it doesn't find it. For most JS code you don't want that, but it's useful
  if you want to strictly sanitize your code.
* pro.ast_lift_variables(ast) – merge and move var declarations to the
  top of the scope; discard unused function arguments or variables; discard
  unused (named) inner functions. It also tries to merge assignments
  following the var declaration into it.
If your code is very hand-optimized concerning var declarations,
lifting variable declarations might actually increase size. For me it
helps out. On jQuery it adds 865 bytes (243 after gzip). YMMV. Also
note that (since it's not enabled by default) this operation isn't yet
heavily tested (please report if you find issues!).
Note that although it might increase the output size, it's technically
more correct: in certain situations, dead code removal might drop
variable declarations, which would not happen if the variables are
lifted in advance.
Here's an example of what it does:
function f(a, b, c, d, e) {
    var q;
    var w;
    w = 10;
    q = 20;
    for (var i = 1; i < 10; ++i) {
        var boo = foo(a);
    }
    for (var i = 0; i < 1; ++i) {
        var boo = bar(c);
    }
    function foo(){ ... }
    function bar(){ ... }
    function baz(){ ... }
}
// transforms into ==>
function f(a, b, c) {
    var i, boo, w = 10, q = 20;
    for (i = 1; i < 10; ++i) {
        boo = foo(a);
    }
    for (i = 0; i < 1; ++i) {
        boo = bar(c);
    }
    function foo() { ... }
    function bar() { ... }
}
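To check that the two shapes behave alike, here is a runnable variant of the example. The bodies of foo and bar, the shortened loop bound and the return statements are assumptions added so that the functions can be executed and compared; they are not part of the original example.

```javascript
// Before lifting: scattered var statements and unused arguments d and e.
function before(a, b, c, d, e) {
    var q;
    var w;
    w = 10;
    q = 20;
    for (var i = 1; i < 3; ++i) {
        var boo = foo(a);
    }
    for (var i = 0; i < 1; ++i) {
        var boo = bar(c);
    }
    return [w, q, i, boo];
    function foo(x) { return x + 1; }  // assumed body
    function bar(x) { return x * 2; }  // assumed body
}

// After lifting: one var statement at the top. Behaviour is unchanged
// because var declarations are hoisted to function scope anyway.
function after(a, b, c) {
    var i, boo, w = 10, q = 20;
    for (i = 1; i < 3; ++i) {
        boo = foo(a);
    }
    for (i = 0; i < 1; ++i) {
        boo = bar(c);
    }
    return [w, q, i, boo];
    function foo(x) { return x + 1; }
    function bar(x) { return x * 2; }
}
```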
* pro.ast_mangle(ast, options) – generates a new AST containing mangled
  (compressed) variable and function names. It supports the following
  options:
    * toplevel – mangle toplevel names (by default we don't touch them).
    * except – an array of names to exclude from compression.
    * defines – an object with properties named after symbols to
      replace (see the --define option for the script) and the values
      representing the AST replacement value.
* pro.ast_squeeze(ast, options) – employs further optimizations designed
  to reduce the size of the code that gen_code would generate from the
  AST. Returns a new AST. options can be a hash; the supported options
  are:
    * make_seqs (default true), which will cause consecutive statements in a
      block to be merged using the "sequence" (comma) operator
    * dead_code (default true), which will remove unreachable code.
* pro.gen_code(ast, options) – generates JS code from the AST. By
  default it's minified, but using the options argument you can get nicely
  formatted output. options is, well, optional :-) and if you pass it, it
  must be an object and supports the following properties (below you can see
  the default values):
    * beautify: false – pass true if you want indented output
    * indent_start: 0 (only applies when beautify is true) – initial
      indentation in spaces
    * indent_level: 4 (only applies when beautify is true) –
      indentation level, in spaces (pass an even number)
    * quote_keys: false – if you pass true it will quote all keys in
      literal objects
    * space_colon: false (only applies when beautify is true) – whether
      to put a space before the colon in object literals
    * ascii_only: false – pass true if you want to encode non-ASCII
      characters as \uXXXX.
    * inline_script: false – pass true to escape occurrences of
      </script in strings
The beautifier can be used as a general purpose indentation tool. It's useful when you want to make a minified file readable. One limitation, though, is that it discards all comments, so you don't really want to use it to reformat your code, unless you don't have, or don't care about, comments.
In fact it's not the beautifier that discards comments; they are dumped at the parsing stage, when we build the initial AST. Comments don't really make sense in the AST, and while we could add nodes for them, it would be inconvenient because we'd have to add special rules to ignore them at all the processing stages.
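As a rough sketch of how the options described above might be passed, the helpers below use only option names listed in this section. The function names `reindent` and `squeezeGently` are hypothetical, and `jsp` and `pro` are again taken as parameters rather than loaded here.

```javascript
// Use the beautifier as an indentation tool: parse, then regenerate
// with beautify switched on. Comments are already gone after parse().
function reindent(jsp, pro, code) {
    var ast = jsp.parse(code);
    return pro.gen_code(ast, {
        beautify: true,     // indented output instead of minified
        indent_start: 0,    // initial indentation, in spaces
        indent_level: 4,    // spaces per nesting level
        quote_keys: false,  // do not force quotes on every key
        space_colon: false, // no space before ":" in object literals
        ascii_only: false   // leave non-ASCII characters untouched
    });
}

// Mangle while keeping two names intact, then squeeze with sequences
// and dead-code removal switched off.
function squeezeGently(pro, ast) {
    ast = pro.ast_mangle(ast, { toplevel: false, except: ["require", "$super"] });
    return pro.ast_squeeze(ast, { make_seqs: false, dead_code: false });
}
```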
The --define option can be used, particularly when combined with the
constant folding logic, as a form of pre-processor to enable or remove
particular constructions, such as might be used for instrumenting
development code, or to produce variations aimed at a specific
platform.
The code below illustrates the way this can be done, and how the symbol replacement is performed.
CLAUSE1: if (typeof DEVMODE === 'undefined') {
           DEVMODE = true;
         }
CLAUSE2: function init() {
           if (DEVMODE) {
             console.log("init() called");
           }
           ....
           DEVMODE && console.log("init() complete");
         }
CLAUSE3: function reportDeviceStatus(device) {
           var DEVMODE = device.mode, DEVNAME = device.name;
           if (DEVMODE === 'open') {
             ....
           }
         }
When the above code is normally executed, the undeclared global
variable DEVMODE will be assigned the value true (see CLAUSE1)
and so the init() function (CLAUSE2) will write messages to the
console log when executed, but in CLAUSE3 a locally declared
variable will mask access to the DEVMODE global symbol.
If the above code is processed by UglifyJS with an argument of
--define DEVMODE=false then UglifyJS will replace DEVMODE with the
boolean constant value false within CLAUSE1 and CLAUSE2, but it
will leave CLAUSE3 as it stands because there DEVMODE resolves to
a validly declared variable.
Moreover, the constant-folding features of UglifyJS will recognise
that the if condition of CLAUSE1 is thus always false, and so will
remove the test and body of CLAUSE1 altogether (including the
otherwise slightly problematic statement false = true; which it
will have formed by replacing DEVMODE in the body). Similarly,
within CLAUSE2 both calls to console.log() will be removed
altogether.
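The effect on CLAUSE2 can be imitated by hand. The sketch below is not actual UglifyJS output; `makeLog`, `initSubstituted` and `initFolded` are hypothetical names, and a collecting logger stands in for console.log so that the result can be observed.

```javascript
// Collects messages instead of printing them.
function makeLog() {
    var lines = [];
    function log(message) { lines.push(message); }
    log.lines = lines;
    return log;
}

// CLAUSE2 after DEVMODE has been replaced by the constant false.
function initSubstituted(log) {
    if (false) {
        log("init() called");
    }
    false && log("init() complete");
}

// Hand-folded equivalent: neither call can ever run, so both are dropped.
function initFolded(log) {
}
```

Both versions leave the log empty, which is what makes dropping the calls a safe simplification.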
In this way you can mimic, to a limited degree, the functionality of the C/C++ pre-processor to enable or completely remove blocks depending on how certain symbols are defined, perhaps using UglifyJS to generate different versions of source aimed at different environments.
It is recommended (but not made mandatory) that symbols designed for
this purpose are given names consisting of UPPER_CASE_LETTERS to
distinguish them from other (normal) symbols and avoid the sort of
clash that CLAUSE3 above illustrates.
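When several such symbols are needed, a module suitable for the --define-from-module option could look roughly like the sketch below. The file name, the symbol names other than DEVMODE, and all the values are made up for illustration.

```javascript
// build-defines.js (hypothetical file name)
// Each exported property stands for one symbol definition; the names
// follow the UPPER_CASE convention recommended above.
var defines = {
    DEVMODE: false,
    MAX_BUFFER_SIZE: 1024,
    TARGET_PLATFORM: "mobile"
};

// Export under CommonJS; the guard keeps the snippet loadable elsewhere.
if (typeof module !== "undefined" && module.exports) {
    module.exports = defines;
}
```

Keeping the definitions in one module also sidesteps the shell quoting issues that string values can cause on the command line.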
Here are updated statistics. (I also updated my Google Closure and YUI installations).
We're still a lot better than YUI in terms of compression, though slightly slower. We're still a lot faster than Closure, and compression after gzip is comparable.
| File | UglifyJS | UglifyJS+gzip | Closure | Closure+gzip | YUI | YUI+gzip |
|---|---|---|---|---|---|---|
| jquery-1.6.2.js | 91001 (0:01.59) | 31896 | 90678 (0:07.40) | 31979 | 101527 (0:01.82) | 34646 |
| paper.js | 142023 (0:01.65) | 43334 | 134301 (0:07.42) | 42495 | 173383 (0:01.58) | 48785 |
| prototype.js | 88544 (0:01.09) | 26680 | 86955 (0:06.97) | 26326 | 92130 (0:00.79) | 28624 |
| thelib-full.js (DynarchLIB) | 251939 (0:02.55) | 72535 | 249911 (0:09.05) | 72696 | 258869 (0:01.94) | 76584 |
Unfortunately, for the time being there is no automated test suite. But I ran the compressor manually on non-trivial code, and then I tested that the generated code works as expected. A few hundred times.
DynarchLIB was started in times when there was no good JS minifier. Therefore I was quite religious about trying to write short code manually, and as such DL contains a lot of syntactic hacks[1] such as “foo == bar ? a = 10 : b = 20”, though the more readable version would clearly be to use “if/else”.
Since the parser/compressor runs fine on DL and jQuery, I'm quite confident that it's solid enough for production use. If you can identify any bugs, I'd love to hear about them (use the Google Group or email me directly).
[1] I even reported a few bugs and suggested some fixes in the original parse-js library, and Marijn pushed fixes literally in minutes.
UglifyJS is released under the BSD license:
Copyright 2010 (c) Mihai Bazon <mihai.bazon@gmail.com>
Based on parse-js (http://marijn.haverbeke.nl/parse-js/).
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above
copyright notice, this list of conditions and the following
disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials
provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER “AS IS” AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.