- Random Variables
- Statistical Distributions
- Descriptive Statistical Functions
- Statistical Tests
- Statistical Plots
- Data Tables in Statistics
- Shuffle, Sort and Find
- Confidence Intervals

EMT contains many statistical distributions, tests, plots, and functions for reading and writing data. For examples and more functions read the following introduction notebook.

Statistical functions.

Euler has a reliable random number generator. It can be used to create random variables for many distributions. If you need a reproducible sequence, you can set a seed value with seed(x). Otherwise, the time value (in seconds) at the start of the current Euler session is used as the seed.

The newer functions for creating random variables start with rand...(), replacing older functions with less uniform naming.

function comment seed(x)

Set the seed for the random numbers

After setting a seed, all random numbers will be determined from the seed.

function comment random(n,m)

Uniformly distributed random variables in [0,1]

random() : One random variable
random(n,m) : Matrix of random variables
random(n) : Row vector of random variables
random([n,m]) : Matrix of random variables

The function fastrandom is a quicker, but less reliable, alternative.

function randuniform(n : index, m : index, a : number, b : number)

Uniformly distributed random samples from the interval (a,b)

See:

random (Statistics with Euler Math Toolbox),

random (Maxima Documentation)

function comment intrandom(n,m,k)

Integer random variables in {1,...,k}

intrandom(k) : One random variable
intrandom(n,m,k) : Matrix of random variables
intrandom(n,k) : Row vector of random variables
intrandom([n,m],k) : Matrix of random variables

function randint(n,m,k=none)

Integer random variables in {1,...,k}

randint(n,m,k) : Matrix of random variables
randint(n,k) : Vector of random variables

function comment normal(n,m)

0-1-Gaussian distributed random variables

normal() : One random variable
normal(n,m) : Matrix of random variables
normal(n) : Row vector of random variables
normal([n,m]) : Matrix of random variables

The function fastnormal() is a quicker, but less reliable, alternative.

function randnormal(n : index, m : index, mean : number = 0, stdev : nonnegative number = 1)

Random samples from a normal (Gaussian) distribution

The following distributions are based on Julia code by John D. Cook.

function randmatrix(n:index, m:index=none, f$:string)

Apply the random generator f$ to generate a matrix.

function randexponential(n : index, m : index=none, mean : positive number = 1)

Random matrix from an exponential distribution

randexponential(n,m) : mean=1
randexponential(n,m,mean) : nxm matrix
randexponential(n,mean=v) : vector with mean=v

See:

randnormal (Statistics with Euler Math Toolbox),

randuniform (Statistics with Euler Math Toolbox)

function randgamma(n : index, m : index = none, shape : nonnegative number=1, scale : nonnegative number=1)

Random samples from a gamma distribution

Implementation based on "A Simple Method for Generating Gamma Variables" by George Marsaglia and Wai Wan Tsang, ACM Transactions on Mathematical Software, Vol. 26, No. 3, September 2000, pages 363-372.

Example:

>k=10; theta=2;
>x=randgamma(10000,shape=k,scale=theta);
>plot2d(x,>distribution); ...
>plot2d("x^(k-1)*exp(-x/theta)/theta^k/gamma(k)", ...
> >add,color=blue,thickness=2):

function randchi(n : index, m : index, dof : index)

Random samples from a chi square distribution

function randinversegamma(n : index, m : index, shape : positive number, scale : positive number)

Random samples from an inverse gamma distribution

function randweibull(n : index, m : index, shape : positive number, scale : positive number)

Random samples from a Weibull distribution

function randcauchy(n : index, m : index, mean : number=0, scale : positive number=1)

Random samples from a Cauchy distribution

function randt(n : index, m : index, dof : positive integer)

Random samples from a Student-t distribution

See Seminumerical Algorithms by Knuth.

function randlaplace(n : index, m : index, mean : number, scale : positive number)

Random samples from a Laplace distribution

The Laplace distribution is also known as the double exponential distribution.

function randlognormal(n : index, m : index, mu : number, sigma : positive number)

Random samples from a log-normal distribution

function randbeta(n : index, m : index, a: positive number, b : positive number)

Random samples from a Beta distribution

There are more efficient methods for generating beta samples. However, such methods are only a little more efficient and much more complicated.

Euler has a lot of routines to generate random numbers (named "rand..."), like the built-in functions random and normal. Moreover, Euler has functions for distributions ("...dis") and their densities ("q..."). For examples, see the following introduction notebook.

This file provides more distributions, random numbers, and tests.

function comment bindis(k:natural, n:natural, p:number)

Cumulative binomial distribution

Binomial distribution for i<=k out of n with probability p. From AlgLib.

function map binsum(k:natural, n:natural, p:number)

Binomial sum for getting i<=k out of n runs with probability p.

Uses an actual summation to compute the binomial sum. bindis() is faster.

See:

bindis (Statistics with Euler Math Toolbox),

normalsum (Statistics with Euler Math Toolbox)

function map invbindis(px:number, n:natural, p:number)

Inverse cumulative binomial distribution

Finds k such that the probability of i<=k out of n is just more than px. The result may not be an integer; in that case, k=floor(result). A bisection method is used.

>bindis(4,10,0.6), invbindis(%,10,0.6)
0.1662386176
4
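The cumulative binomial value in the example above can be cross-checked by direct summation. The following is a minimal Python sketch, independent of Euler; the name binomial_cdf is hypothetical and the code is not the AlgLib routine that bindis uses.

```python
from math import comb

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k hits in n independent runs with hit probability p."""
    # Sum the binomial probabilities for i = 0, ..., k.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Same arguments as bindis(4,10,0.6) in the example.
print(binomial_cdf(4, 10, 0.6))
```

Direct summation is fine for small n; for large n a library routine is the better choice.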

function comment bincdis(k,n,p)

Complementary cumulative binomial distribution

Complement of the binomial distribution, i.e., the probability of i>k out of n with probability p. From AlgLib.

function comment invpbindis(k,n,px)

Inverse (for p) cumulative binomial distribution

Solves px=bindis(k,n,p) for p. Assumes integer k and n. From AlgLib.

function overwrite normaldis(x : real, mean : real = 0, dev : real = 1)

Cumulative normal distribution

This function calls the built-in _normaldis(x) with adjusted mean and standard deviation.

function overwrite invnormaldis(p : real, mean : real = 0, dev : real = 1)

Inverse of cumulative normal distribution

This function calls the built-in _invnormaldis(x) and adjusts the mean and the standard deviation.

function comment erf(x)

Gauss error function

This is the integral of exp(-t^2)/sqrt(pi) from -x to x (from AlgLib). It is connected to normaldis() via 2*normaldis(sqrt(2)*x)-1=erf(x).
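The stated relation between the error function and the cumulative normal distribution can be checked numerically. This is a small Python sketch using only the standard library; statistics.NormalDist stands in for normaldis() here, and erf_via_normal is a hypothetical helper name.

```python
from math import erf, sqrt
from statistics import NormalDist

def erf_via_normal(x: float) -> float:
    """Evaluate erf(x) through the standard normal CDF: 2*Phi(sqrt(2)*x) - 1."""
    return 2 * NormalDist().cdf(sqrt(2) * x) - 1

# Compare with the library error function at a few points.
for x in (-1.5, 0.0, 0.3, 2.0):
    print(x, erf_via_normal(x), erf(x))
```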

function comment erfc(x)

Complementary Gauss error function

1-erf(x)

function normalsum(i:natural, n:natural, p:number)

Probability of getting i or fewer hits in n runs.

Works like binsum, but is much faster for large n and medium p.

See:

binsum (Statistics with Euler Math Toolbox)

function map hypergeomsum(i:natural, n:natural, itot:natural, ntot:natural)

Hypergeometric sum.

This is the probability of getting i or fewer hits if n objects are picked randomly from an urn containing ntot objects, itot of which are good.

i : we want i or fewer hits in n picked objects
n : number of randomly picked objects
itot : total number of positive objects
ntot : total number of objects

Examples:

>1-hypergeomsum(7,13,13,52) // 8 or more spades in Bridge
0.00126372228099
>columnsplot(hypergeomsum(0:20,20,20,40),lab=0:20):
>hypergeomsum(4,20,20,40), 1-hypergeomsum(15,20,20,40)
0.000179983683393
0.000179983683393
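The urn model described above can be written out with binomial coefficients. The following Python sketch is an illustration only (the name hypergeom_cdf is hypothetical, and it is not claimed to be how Euler computes the sum); it reuses the parameter names i, n, itot, ntot from the description.

```python
from math import comb

def hypergeom_cdf(i: int, n: int, itot: int, ntot: int) -> float:
    """Probability of at most i good objects among n drawn without replacement."""
    # Count the draws with exactly j good objects, for j = 0, ..., i.
    favourable = sum(
        comb(itot, j) * comb(ntot - itot, n - j) for j in range(min(i, n) + 1)
    )
    return favourable / comb(ntot, n)

# A hand of 13 cards from 52, with 13 cards of one suit in the deck.
print(1 - hypergeom_cdf(7, 13, 13, 52))
# The two tails of the symmetric case agree.
print(hypergeom_cdf(4, 20, 20, 40), 1 - hypergeom_cdf(15, 20, 20, 40))
```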

function qnormal(x, m=0, d=1)

Density (PDF) of the m-d-normal distribution

This is the density function of the Gauss normal distribution with mean m and standard deviation d.

See:

normaldis (Statistics with Euler Math Toolbox),

erf (Statistics with Euler Math Toolbox),

erf (Maxima Documentation)

function map gammarestr(x)

Special Gamma function, works only if 2x is a natural number

See:

gamma (Mathematical Functions),

gamma (Maxima Documentation)

function qchidis(x, n)

Density (PDF) of the chi-squared distribution

function comment chidis(x,n)

Chi-squared distribution with n degrees of freedom

Algorithm from AlgLib.

function comment chicdis(x,n)

Complementary chi-squared distribution with n degrees of freedom

Algorithm from AlgLib.

See:

chidis (Statistics with Euler Math Toolbox),

invchidis (Statistics with Euler Math Toolbox),

invchicdis (Statistics with Euler Math Toolbox)

function invchidis(x, n)

Inverse of the chi-squared distribution

See:

invchicdis (Statistics with Euler Math Toolbox)

function comment invchicdis(x, n)

Inverse of the complementary chi-squared distribution

Algorithm from AlgLib.

function qtdis(t:real, n:nonnegative integer)

Density (PDF) of the Student t distribution

function comment tdis(x:real, n:natural)

Student t distribution with n degrees of freedom

Algorithm from AlgLib.

See:

invtdis (Statistics with Euler Math Toolbox)

function comment invtdis(x:nonnegative, n:natural)

Inverse Student t distribution with n degrees of freedom

Algorithm from AlgLib.

function qfdis(x, n, m)

Density (PDF) of the F-distribution

function overwrite map fdis(x, a, b)

F distribution

Vectorizes the built-in function _fdis(x,a,b).

function overwrite map fcdis(x, a, b)

Complementary F distribution

Vectorizes the built-in function _fcdis(x,a,b).

function overwrite map invfcdis(x, a, b)

Inverse of the complementary F distribution

Vectorizes the built-in function _invfcdis(x,a,b).

function map invfdis(x, a, b)

Inverse of the F distribution

function meandev(x:numerical, v=none)

Mean value and statistical standard deviation of [x1,...,xn]

An optional additional parameter v contains the multiplicities of x. m=meandev(x) will assign the mean value only! If x is a matrix, the function works on each row.

x : data (1xm or nxm)
v : multiplicities (1xm or nxm)

See:

mean (Maxima Documentation)

function mean(x:numerical, v:real vector=none)

Mean value of x.

An optional additional parameter contains multiplicities.

See:

meandev (Statistics with Euler Math Toolbox),

median (Statistics with Euler Math Toolbox),

median (Maxima Documentation)

function dev(x:numerical, v:real vector=none)

Experimental standard deviation of x

An additional parameter may contain multiplicities.

See:

meandev (Statistics with Euler Math Toolbox)

function median(x, v=none, p:real vector=0.5)

Quantile such that the fraction p of the x[i] are less than or equal to it.

v are optional multiplicities for the values. If x is a matrix, the function works on all rows of x.

x : data (1xm or nxm)
v : multiplicities (1xm or nxm)
p : desired percentage (real or row vector)

See:

mean (Statistics with Euler Math Toolbox),

mean (Maxima Documentation),

quartiles (Statistics with Euler Math Toolbox),

quantile (Statistics with Euler Math Toolbox),

quantile (Maxima Documentation)

function pfold(v: real vector, w: real vector)

Distribution of the sum of two distributions

v[i], w[i] contain the probabilities that each random variable is equal to i-1. result[i] contains the probability that the sum of the two random variables is i-1.

See:

fold (Numerical Algorithms),

fftfold (Numerical Algorithms)
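The distribution of a sum of two independent integer-valued random variables, as described for pfold, is the discrete convolution of the two probability vectors. Below is a minimal Python sketch of that convolution; the function name is hypothetical, the direct double loop is an assumption for illustration (Euler may well use its folding routines instead), and Python lists are indexed from 0, so v[i] here is the probability of the value i.

```python
def fold_distributions(v, w):
    """Convolve two probability vectors: result[s] = sum of v[i]*w[j] over i+j == s."""
    result = [0.0] * (len(v) + len(w) - 1)
    for i, pv in enumerate(v):
        for j, pw in enumerate(w):
            result[i + j] += pv * pw
    return result

# A fair die: probability 0 for the value 0, and 1/6 for each of 1..6.
die = [0.0] + [1 / 6] * 6
two_dice = fold_distributions(die, die)
print(two_dice[7])  # probability that two dice sum to 7
```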

function comment quantile(v:vector,p:real)

Compute the p-quantile of the elements in v

Function from AlgLib. This function takes care of multiplicities of the two values closest to the quantile. For the lower, upper or middle quantile, use the median function.

>quantile([1,2],20%)
1.2
>quantile([1,2,2],20%)
1.4
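Both example values above are consistent with linear interpolation between neighbouring order statistics at position p*(n-1). The Python sketch below implements that rule; it is an assumption that reproduces the two examples, not a statement about the AlgLib code, and the function name is hypothetical.

```python
def interpolated_quantile(values, p):
    """p-quantile by linear interpolation between the two nearest sorted values."""
    s = sorted(values)
    pos = p * (len(s) - 1)          # fractional position in the sorted data
    lo = int(pos)                   # index of the lower neighbour
    hi = min(lo + 1, len(s) - 1)    # index of the upper neighbour
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

print(interpolated_quantile([1, 2], 0.2))
print(interpolated_quantile([1, 2, 2], 0.2))
```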

function covar(x:real vector, y:real vector)

Empirical covariance of x and y

The covariance is the scalar product of x and y after centralization (x-mean(x),y-mean(y)) divided by n-1, where n is the length of x and y.

See:

covarmatrix (Statistics with Euler Math Toolbox)
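The definition of the empirical covariance translates directly into code. The following Python sketch is illustrative only, with hypothetical function names; it also shows the correlation as the covariance normalized by both standard deviations.

```python
from math import sqrt

def sample_covariance(x, y):
    """Scalar product of the centred vectors, divided by n-1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(x, y) / sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

print(sample_covariance([1, 2, 3], [2, 4, 6]))
print(sample_correlation([1, 2, 3], [2, 4, 6]))
```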

function covarmatrix(x:real)

Empirical covariance matrix of the rows of x

The covariance matrix contains the empirical covariances of the rows of x, i.e., the scalar products of the centralized rows divided by the number of columns of x minus 1.

function sphering(X)

Sphering of the matrix X.

The matrix X contains samples of random variables in its rows. The sphering of X is a linear transformation Y=T.(X-m), such that the rows of Y have mean 0 and an identity correlation matrix.

Returns {Y,T,m}

See:

covarmatrix (Statistics with Euler Math Toolbox)

function correl(x:real vector, y:real vector)

Correlation of x and y

The correlation is the scalar product of the centralized and normalized vectors x and y.

function correlmatrix(x:real)

Correlation matrix of the rows of x See:

covar (Statistics with Euler Math Toolbox)

function ranks(x)

Ranks of the elements of x in x.

The rank of x[i] is its position in the sorted vector x. With multiplicities, the rank is the mean rank of the equal elements. Works for reals, real vectors, or string vectors x.

See:

rankcorrel (Statistics with Euler Math Toolbox)
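The mean-rank rule for equal elements can be sketched as follows. This Python illustration uses a hypothetical function name and 1-based ranks, and is not taken from the Euler source.

```python
def mean_ranks(x):
    """1-based ranks of the elements of x; tied elements share their mean rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    start = 0
    while start < len(order):
        # Extend the block [start, end] over all elements equal to x[order[start]].
        end = start
        while end + 1 < len(order) and x[order[end + 1]] == x[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the block
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

print(mean_ranks([10, 20, 20, 30]))
```

Applying an ordinary correlation to two such rank vectors gives a rank correlation.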

function rankcorrel(x:real vector, y:real vector)

Rank correlation of x and y

See:

ranks (Statistics with Euler Math Toolbox)

function empdist(x:real vector, vsorted:real vector)

Empirical distribution

The vector vsorted contains empirical data. The function computes the empirical cumulative distribution (CDF) of the data at the points x[i].

x : vector of values, usually sorted
vsorted : sorted(!) vector of empirical values.

>short empdist(1:6,sort(intrandom(1,6000,6)))
[ 0.16283 0.33083 0.49317 0.662 0.832 1 ]

function randpint(n:index, m:index, p:vector)

nxm random numbers with probabilities in p

Generates nxm random numbers from 1 to k based on the vector of probabilities p[1],...,p[k].

function randmultinomial(n:index, m:index, p: vector)

n multinomial random numbers based on a density

This generates n outcomes of m throws with probabilities p[1],...,p[k]. The result is an nxk matrix.

See:

randpint (Statistics with Euler Math Toolbox),

chitest (Statistics with Euler Math Toolbox)

function chitest(x:real vector, y:positive vector, montecarlo=false, nmontecarlo=1000, p=false)

Perform a chi^2 test of whether x has the expected frequency y

This function tests an observed frequency x against an expected frequency y. E.g., if 40 men are found when sampling 100 persons, then [40,60] has to be tested against [50,50]. The result of the test is below 5%, which means that we can reject the hypothesis that the sample obeys the expected frequency with an error of less than 5%.

For a meaningful test, sum(x) should be equal to sum(y), unless p=true. In this case, y is interpreted as a vector of probabilities, not a vector of events.

To get frequencies from the data, use "getfrequencies", "count", or "histo".

montecarlo : If montecarlo is not zero, the method uses a Monte Carlo simulation. It generates nmontecarlo random events of sum(x) data with the distribution in y, and counts how often the statistic sum((x-y)^2/y) is larger than the observed statistic.
x,y : two real row vectors (1xn)

Returns the error level for rejecting the hypothesis that the observed frequency x has the expected frequency y.

>load statistics;
>x=[100,90]; y=[0.5,0.5]*sum(x); chitest(x,y)
0.468159909854
>chitest(x,y,>montecarlo)
0.43
>chitest(x,[0.5,0.5],>p)
0.468159909854

See:

getfrequencies (Statistics with Euler Math Toolbox),

count (Statistics with Euler Math Toolbox),

histo (Statistics with Euler Math Toolbox)
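For two categories the chi^2 statistic has one degree of freedom, and its tail probability can be written with the complementary error function. The Python sketch below covers only that special case (more categories need the general chi-squared distribution); the function names are hypothetical and the code is not the Euler implementation.

```python
from math import erfc, sqrt

def chi2_statistic(observed, expected):
    """The statistic sum((x-y)^2/y) for observed x and expected y."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_pvalue_two_categories(observed, expected):
    """Tail probability of the statistic; valid only for two categories (1 degree of freedom)."""
    return erfc(sqrt(chi2_statistic(observed, expected) / 2))

print(chi2_pvalue_two_categories([100, 90], [95, 95]))
print(chi2_pvalue_two_categories([40, 60], [50, 50]))
```

The first call uses the data of the chitest example, the second the [40,60] against [50,50] case from the description.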

function testnormal(r:real vector, n:integer, v:real vector, m:number, d:number)

Test an observed frequency for normal distribution.

Test the number of data v[i] in the ranges r[i],r[i+1] against the normal distribution with mean m and deviation d, using the chi^2 method.

r : ranges (sorted 1xm vector)
n : total number of data
v : number of data in the ranges (1x(m-1) vector)
m : expected mean value
d : expected deviation

Returns the error we make if we reject the normal distribution.

function ttest(m:number, d:real scalar, n:natural, mu:number)

Student's t-test

Tests whether the measured mean m with measured deviation d of n data comes from a distribution with mean value mu.

m : mean value of data
d : standard deviation of data
n : number of data
mu : mean value to test for

Returns the error alpha, if we reject that the data come from a distribution with mean mu.

function tcompare(m1:number, d1:number, n1:natural, m2:number, d2:number, n2:natural)

Test whether two measured data sets agree in mean.

The data must be normally distributed. Returns the error you make if you reject that both data sets are from the same normal distribution.

m1,m2 : means of the data
d1,d2 : standard deviations of the data
n1,n2 : number of data

Returns the error alpha, if we reject that the data come from a distribution with the same expected mean.

function tcomparedata(x:real vector, y:real vector)

Compare x and y for same mean

Calls "tcompare" to compare the two observations for the same mean. Returns the error we make if we reject that both data sets come from a distribution with the same expected mean.

function tabletest(A:real)

Chi^2 test of the results a[i,j] for independence of the rows from the columns.

The table test tests for independence of the rows of the table from the columns. E.g., if some items are observed [40,50] times for men and [50,30] times for women, we can ask whether the observations depend on the gender. In this case, we can reject independence with a 1.8% error level.

This test should only be used for large table entries.

Returns the error you make if you reject independence.
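Under independence, the expected entry of a table is the row sum times the column sum divided by the total. The Python sketch below computes these expected counts and, for a 2x2 table only (one degree of freedom), the chi^2 tail probability. The function names are hypothetical, no continuity correction is applied, and the result is not guaranteed to match Euler's tabletest digit for digit.

```python
from math import erfc, sqrt

def expected_counts(table):
    """Expected table under independence: row sum * column sum / total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]

def independence_pvalue_2x2(table):
    """Chi^2 tail probability for a 2x2 table (1 degree of freedom)."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for row_o, row_e in zip(table, expected)
        for o, e in zip(row_o, row_e)
    )
    return erfc(sqrt(stat / 2))

# The men/women example from the description.
print(independence_pvalue_2x2([[40, 50], [50, 30]]))
```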

function expectedtable(A:real)

function contingency(A:real, correct=1)

Contingency coefficient of a matrix A.

If the coefficient is close to 0, we tend to say that the rows and the columns are independent.

correct : Correct the coefficient, so that it is between 0 and 1

function varanalysis

varanalysis(x1,x2,x3,...) tests for the same mean.

Tests the data sets for the same mean, assuming normally distributed data sets. This is also known as one of the ANOVA tests. Returns the error we make if we reject the same mean.

Example:

>seed(0.5); v=normal(1,10)+1; w=normal(1,12)+2; u=normal(1,5);
>varanalysis(v,w,u)
0.000556414242764 // reject same mean!

function mediantest(a:real vector, b:real vector)

Median test for equal mean.

Tests the two distributions a and b for equal mean value. For this, both distributions are checked on exceeding the median of the cumulative distribution. Returns the error we make if we reject that a and b can have the same mean.

function ranktest(a:real vector, b:real vector, eps=epsilon())

Mann-Whitney test

Tests a and b for the same distribution. Returns the error we make if we reject the same distribution.

function signtest(a:real vector, b:real vector)

Test whether the expected mean of a is not better than b

Assume a(i) and b(i) are results of a treatment. Then we can ask whether a is better than b.

a,b : row vectors of same size

Returns the error we make if we decide that a is better than b.

function wilcoxon(a:real vector, b:real vector, eps=sqrt(epsilon()))

Test whether the expected mean of a is not better than b

This is a sharper test for the same problem as in "signtest". Returns the error you make if you decide that a is better than b.

See:

signtest (Statistics with Euler Math Toolbox)

function quartiles(x, outliers=1.5)

Quartiles for each row of x.

This computes [Min,Q1,M,Q2,Max], where M is the median, Q1 the median of the lower half and Q2 the median of the upper half.

outliers : If none, Min and Max are the minimal and maximal values of the data. Otherwise, Min is the least data value which is not smaller than Q1-outliers*range, where range=Q2-Q1. Similar for Max.

See:

boxplot (Statistics with Euler Math Toolbox),

boxplot (Maxima Documentation)

function boxplot(data:real, lab=none, style="0#", textcolor=none, outliers=1.5, pointstyle="o", range=none)

Summary of the quartiles in graphical form.

data : vector or matrix. In case of a matrix, the rows are used.
style : If present, it is used as fill style, the default is "O#"
lab : Labels for each row of the data (vector of strings)
textcolor : Color of the labels (vector of colors)
outliers : Factor for the maximal whisker length or none
pointstyle : Point style for outliers
range : 1x2 vector for the plot range (or none)

>x=normal(1000)*10+1000; boxplot(x):
>x=randnormal(5,1000,100,10); boxplot(x,outliers=none):

See:

quartiles (Statistics with Euler Math Toolbox),

barstyle (Euler Core)

function columnsplot(x:vector, lab=none, style="O#", color=green, textcolor=none, width=0.4, frame=true, grid=true)

Plot the elements of x as columns.

x : vector of values
lab : a string vector with one label for each element of x.
style,color : fill style and color for the bars
textcolor : color for the labels

See:

style (Euler Core),

style (Maxima Documentation),

color (Euler Core),

color (Maxima Documentation),

plot2d (Plot Functions),

plot2d (Maxima Documentation)

function dataplot(x:real, y:real, style="[]w", color=1)

Plot the data (x,y) with point and line plots.

x : real row vector
y : real row vector or matrix (one row for each data).
style : a style or a vector of styles
color : a color or a vector of colors

You can use a vector of styles and a vector of colors. These vectors must contain as many elements as there are rows of y.

See:

statplot (Statistics with Euler Math Toolbox)

function piechart(x:real vector, style="0#", color=green, lab:string vector=none, r=1.5, textcolor=red)

Plot the data x in a pie chart.

x : the vector of data
color : a color or a vector of colors (same length as x)
style : a style or a vector of styles
lab : a vector of labels (same length as x)
r : The pie chart has radius 1. To leave space use r=1.5.

function starplot(v, style="/", color=green, lab:string=none, rays:integer=0, pstyle="[]w", textcolor=red, r=1.5)

A star-like plot with a filled star or with rays and dots only

function logimpulseplot(x, y=none, style="O#", color=green, d=0.1)

Logarithmic impulse plot of y.

function columnsplot3d(z:real, srows=none, scols=none, angle=30°, height=40°, zoom=2.5, distance=5, crows:vector=none, ccols:vector=none, positive:integer=false)

Plot 3D columns from the matrix z.

This function shows a 3D plot of columns with heights z[i,j] in a rectangular array. z can be any real nxm matrix.

z : the values to be displayed
srows : labels for the rows
scols : labels for the columns
crows : colors of the rows
ccols : colors of the columns (alternatively)
positive : plot only positive columns

Example:

>x=normal(1,1000); y=normal(1,1000);
>v=-6:6; z=find2(x,y,v,v);
>columnsplot3d(z,v,v,>positive):

See:

find2 (Statistics with Euler Math Toolbox)

function mosaicplot(z: real, srows=none, scols=none, textcolor=red, color=green, style="O#")

Mosaic plot of the data in z.

z : matrix with values
srows, scols : label strings for the rows and columns (string vectors)
color : a color or a vector of colors for the columns of the plot.
style : a style or a vector of styles.

For an example see the introduction to statistics.

function scatterplots(M:real, lab=none, ticks=1, grid=4, style="..")

Plot all rows of M against all rows of M.

The labels are shown in the diagonal of the plot.

lab : labels for the rows.

function statplot(x, y=none, plottype="b", pstyle="[]w", lstyle="-", fstyle="O#", xl="", yl="", color=none, vertical=0)

Plots x against y.

This is a simple form of using plot2d with point, line or bar options. The available plot types are

'p' : point plot
'l' : line plot
'b' : both
'h' : histogram plot
's' : surface plot

pstyle, lstyle, fstyle : Styles for the points, lines and bars
color : color or color array
vertical : vertical labels

See:

style (Euler Core),

style (Maxima Documentation)

function getspectral(x)

Get a spectral color for 0<=x<=1.

The scheme runs from blue (0) to red (1).

function colormap(A, spectral=0, color=white)

Plot a color map of the matrix A.

The plot has a color scale on the right. The color is either a fixed color (white by default) or spectral colors.

Example:

>colormap(randexponential(50,50),color=yellow); ...
>title("Exponential distribution"); ...
>xlabel("n"); ylabel("m"):

function writetable(x, fixed:integer=0, wc:index=10, dc:nonnegative integer=2, labc=none, labr=none, wlabr=none, lablength=1, NA=".", NAval=NAN, ctok:index=none, tok:string vector=none, file=none, separator=none, comma=false, date=none, time=none)

Write a table x of statistical values

wc : default width for all columns or vector of widths. This is used only if the separator is not set.
dc : default decimal precision for all columns or vector of precision values.
fixed : use fixed number of decimal digits (boolean or vector of boolean).
labc : labels for the columns (string or real vector)
lablength : increase the width of the columns, if labels are wider.
labr : labels for the rows (string or real vector)
NA, NAval : Token string and value to represent "Not Available". By default "." and NAN are used.
comma : write with decimal comma instead of dot.
separator : use this separator string instead of the default blanks. Note that the number of blanks is determined by wc, if no separator is given.
date : vector of columns, which should be written as dates.
time : vector of columns, which should be written as times.

Write a table with labels for the columns and rows and formats for each row. A typical table looks like this

      A  B  C
G  1.02  2  f
H  3.05  5  m

Each number in the table can be translated into a token string. This translation can be set with a global variable tok (string vector) which applies to all columns with indices in ctok (index vector). Or it can be set in each column with an assigned variable tok? (string vector), where ? is the number of the column. Note that these assigned variables need to be declared with :=, since they are not in the parameter list of writetable().

See the introduction to statistics for an example.

See:

readtable (Statistics with Euler Math Toolbox)

function readtable(filename:string, clabs=1, rlabs=0, NA=".", NAval=NAN, ctok:index=none, tokens=[none], separator=none, comma=false, date=none, list=false)

Read a table from a file.

clabs : The table has a line with headings
rlabs : Each line has a heading label.
NA, NAval : Sets the string and the returned value for NA (not available).
ctok : Indices of the columns, where tokens are to be collected.
tok1=..., tok2=... : Individual string arrays for columns.
separator : Optional separating characters.
comma : Use decimal commas instead of dots.
date : vector of columns which contain a date.

The table can have a header line (clabs=1) and row labels (rlabs=1).

The entries of the table can be numbers (by default with decimal dots) or strings. In case of strings, these tokens are translated to unique numbers. The translation can either be set for each column separately in string vectors with names tok1, tok2 etc., or for the complete table in the tokens parameter. The tokens are collected from the columns with indices in the ctok vector. If a column has a tok? parameter (tok1, tok2, etc.), tokens are not collected automatically from that column but the translation in tok? is used. Note that you have to write tok1:=... since the token parameters are not pre-defined parameters in the parameter list.

The table can also contain expressions with units or global variables.

"Not Available" can be represented by a special string. The default is ".". In the numerical table, it is represented by default as NAN. If you do not like this, simply let NAN be represented by any other string and translate it into a numerical token.

Dates are converted to a unique day number. See the introduction for statistics for an example.

The default separator is a comma, semicolon, blank or tabulator. If you have a file with semicolons and decimal commas, just enable >comma. This will replace all commas with dots before the evaluation.

Returns {table, heading string, token strings, rowlabel strings}

See:

writetable (Statistics with Euler Math Toolbox),

date (Basic Utilities),

day (Astronomical Functions),

day (Basic Utilities)

function tablecol(M:real, j:nonnegative vector, NAval=NAN)

The non-NAN values in the columns j of the table M.

To access a table column, you could simply use M[,j], where j is a row vector of indices or a single index. But this function skips any NAN values in any of the columns j. It returns the columns as rows (transposed) and the indices of the rows.

NAval : The value that should be treated as "Not Available"

Returns {columns as rows, indices of non-NAN rows}

function selectrows(M:real, j:index, v:real vector, NAval=NAN)

Select the row indices i with M[i,j] in v and not NAN.

function sortedrows(M:real, j:nonnegative integer vector)

Index of rows for sorted table with respect to columns in j

The table gets sorted in lexicographic order.

Returns : {sorted table, index of sorted values}

For statistical purposes and many other applications, Euler has efficient functions to find values in a vector.

function comment shuffle(v)

Shuffle the vector v See:

sort (Statistics with Euler Math Toolbox),

sort (Maxima Documentation)

function comment sort(v)

Sort the vector v

The function returns {x,i}, where x is the sorted vector, and i is the vector of indices that sorts the vector.

>v=shuffle(1:10)
[6, 3, 1, 5, 10, 4, 9, 8, 2, 7]
>{vx,i}=sort(v); vx,
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>v[i]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

See:

shuffle (Statistics with Euler Math Toolbox)

function comment lexsort(A)

Lexicographic sort of the rows of A

Returns {Asorted,i}, where i is the vector of indices that sorts the rows of A.

>A=intrandom(5,5,3)
  2 1 2 1 2
  1 3 3 1 2
  3 3 2 1 2
  3 1 3 2 2
  3 2 1 1 1
>lexsort(A)
  1 3 3 1 2
  2 1 2 1 2
  3 1 3 2 2
  3 2 1 1 1
  3 3 2 1 2

See:

sort (Maxima Documentation)
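Lexicographic row sorting compares the first entries, then the second entries on ties, and so on. A minimal Python sketch of this ordering, using the matrix from the lexsort example, is given below; the function name is hypothetical, and the returned indices are 1-based to mirror Euler's indexing.

```python
def lexsort_rows(rows):
    """Sort rows lexicographically; return the sorted rows and the 1-based row indices."""
    # Python compares lists element by element, which is the lexicographic order.
    order = sorted(range(len(rows)), key=lambda i: rows[i])
    return [rows[i] for i in order], [i + 1 for i in order]

A = [
    [2, 1, 2, 1, 2],
    [1, 3, 3, 1, 2],
    [3, 3, 2, 1, 2],
    [3, 1, 3, 2, 2],
    [3, 2, 1, 1, 1],
]
sorted_rows, index = lexsort_rows(A)
print(sorted_rows)
print(index)
```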

function overwrite unique(v)

Unique elements in v

>v=intrandom(10,12)
[6, 2, 3, 9, 6, 5, 7, 7, 10, 2]
>unique(v)
[2, 3, 5, 6, 7, 9, 10]

function comment find(v,x)

Find x in the intervals of the sorted vector v

Returns the index i such that v[i] <= x < v[i+1]. It returns 0 for elements smaller than v[1], and length(v) for elements larger than or equal to the last element of v. The function maps to x.

The function works for sorted vectors of strings v, and strings or string vectors x using alphabetic (ASCII) string comparison.

>s=random(10)
[0.270906, 0.704419, 0.217693, 0.445363, 0.308411, 0.914541, 0.193585, 0.463387, 0.095153, 0.595017]
>v=0.2:0.2:0.8
[0.2, 0.4, 0.6, 0.8]
>find(v,s)
[1, 3, 1, 2, 1, 4, 0, 2, 0, 2]

See:

indexof (Statistics with Euler Math Toolbox),

indexofsorted (Statistics with Euler Math Toolbox)
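The interval index described for find is the number of elements of the sorted vector that are less than or equal to x, which is what a right-sided binary search returns. The Python sketch below reproduces the example with the standard bisect module; find_interval is a hypothetical name, and the code only mirrors the described behaviour.

```python
from bisect import bisect_right

def find_interval(v, x):
    """Number of elements of the sorted list v that are <= x (0 if x is below all of them)."""
    return bisect_right(v, x)

s = [0.270906, 0.704419, 0.217693, 0.445363, 0.308411,
     0.914541, 0.193585, 0.463387, 0.095153, 0.595017]
v = [0.2, 0.4, 0.6, 0.8]
print([find_interval(v, x) for x in s])
```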

function comment count(v,n)

Counts v[i] in integer intervals [i-1,i[ up to n

Returns a vector n, where n[i] is the number of elements of v in the interval [i-1,i[ for 1<=i<=n.

>count([0,0.1,0.2,1,1.5,2],2)
[3, 2]
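Counting in the half-open unit intervals [i-1,i[ amounts to taking the floor of each value. A short Python sketch with a hypothetical function name, checked against the example above:

```python
from math import floor

def count_unit_intervals(v, n):
    """counts[i-1] is the number of values x with i-1 <= x < i, for i = 1, ..., n."""
    counts = [0] * n
    for x in v:
        i = floor(x)
        if 0 <= i < n:  # values outside [0, n[ are ignored
            counts[i] += 1
    return counts

print(count_unit_intervals([0, 0.1, 0.2, 1, 1.5, 2], 2))
```

The value 2 is not counted, because the last interval [1,2[ excludes its right end point.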

function comment indexof(v,x)

Find x in the vector v

Finds the first occurrence of x in the vector v. Maps to x.

>v=intrandom(10,4)
[6, 5, 2, 2, 3, 8, 5, 4, 4, 2]
>indexof(v,1:10)
[0, 3, 5, 8, 2, 1, 0, 6, 0, 0]
>indexof(["This","is","a","test"],"a")
3

See:

indexofsorted (Statistics with Euler Math Toolbox),

find (Statistics with Euler Math Toolbox)

function comment indexofsorted(v,x)

Find x in the sorted vector v

Finds the last occurrence of x in the vector v. Note that indexof returns the first occurrence. Maps to x.

>v=sort(intrandom(10,4))
[3, 4, 5, 5, 5, 6, 8, 8, 9, 10]
>indexofsorted(v,1:10)
[0, 0, 1, 2, 5, 6, 0, 8, 9, 10]

See:

find (Statistics with Euler Math Toolbox)

function comment multofsorted(v, x)

Counts x in the sorted vector v

The function maps to x.

>v=intrandom(1000,10); multofsorted(sort(v),1:10), sum(%)
[88, 84, 126, 86, 110, 104, 86, 103, 113, 100]
1000

See:

getmultiplicities (Statistics with Euler Math Toolbox),

getfrequencies (Statistics with Euler Math Toolbox)

function getfrequencies(x:real vector, r: real vector)

Count the number of x in the intervals of the sorted vector r.

The function returns the number of x[j] in the intervals r[i-1] to r[i].

x : real row vector (1xn)
r : real sorted row vector (1xm)

Returns the frequencies f as a row vector (1x(m-1))

See:

count (Statistics with Euler Math Toolbox),

histo (Statistics with Euler Math Toolbox),

multofsorted (Statistics with Euler Math Toolbox),

getmultiplicities (Statistics with Euler Math Toolbox)

function getmultiplicities(x, y, sorted=0)

Counts how often the elements of x appear in y.

This works for string vectors and for real vectors.

sorted : if true, then y is assumed to be sorted.

See:

count (Statistics with Euler Math Toolbox),

getfrequencies (Statistics with Euler Math Toolbox),

multofsorted (Statistics with Euler Math Toolbox)

function getstatistics(x:real vector, y:real vector=none)

Return a statistic of the values in the vector x.

If y is none, the function returns {xu,mu}, where xu are the unique elements of x, and mu are the multiplicities of these values. Else the function returns {xu,yu,M}, where xu are the unique elements of x, yu the unique elements of y, and M is a table of multiplicities of pairs (xu[i],yu[j]) in (x[k],y[k]), k=1...n.

function args histo(d:real vector, n:index=10, integer:integer=0, even:integer=0, v:real vector=none, bar=1)

Computes {x,y} for histogram plots.

d : 1xm vector of data

Returns {x,y} with
x - End points of the intervals (equispaced n+1 points)
y - The number of data in the subintervals (frequencies)

integer : flag for distributions on integers
even : flag for evenly spaced discrete distributions. This should be used by plot2d for bar styles.
v : optional interval boundaries (ordered).
bar : If true, the function returns two vectors for >bar in plot2d. If false, it returns a sawtooth function for plot2d.

The plot function plot2d has parameters distribution=1, histogram=1 to achieve the same effect.

See:

plot2d (Plot Functions),

plot2d (Maxima Documentation)

function find2(x:vector, y:vector, vx:vector=none, vy:vector=none, n:integer=none)

Matrix count for pairs x[i],y[i] in the bounds.

x,y : Vectors of same size.
vx,vy : Sorted vector of bounds, if present (must enclose x resp. y)
n : If vx or vy is not present, number of intervals between the bounds of x.

Returns a matrix with counts.

See:

columnsplot3d (Statistics with Euler Math Toolbox)

function cinormal(mean:numerical, sigma:numerical, alpha=0.05)

Confidence interval for known mean and standard deviation. See:

cimean (Statistics with Euler Math Toolbox)

function cimean(data: real vector, alpha=0.05)

Confidence interval for the mean of normally distributed data

This is a symmetric interval around the mean value of the data containing the true mean of the random experiment in 95% (default alpha=0.05) of the cases. The data are assumed to be from identically normally distributed independent random variables.

Clopper-Pearson confidence interval for k hits in n.

The upper bound of the interval is such that P(X<=k,p)=alpha/2, the lower bound such that P(X>=k,p)=alpha/2. In other words, if p is outside the interval, then k is an event which is less likely than alpha. This interval estimator yields an interval which contains the true p in 95% (default alpha=0.05) of the cases.

>clopperpearson(20,400)
[0.0308831, 0.076167]
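The two defining equations of the Clopper-Pearson bounds can be solved numerically for p. The Python sketch below does this by bisection on the cumulative binomial distribution, which decreases as p grows. It is a sketch under stated assumptions: all names are hypothetical, the direct summation is only suitable for moderate n, and it is not the Euler implementation.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X binomially distributed with parameters n and p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_decreasing(f, target, lo=0.0, hi=1.0, steps=100):
    """Bisection for f(p) = target, where f is decreasing on [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid   # f is still too large, so the solution lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Interval [lower, upper] for p, given k hits in n runs."""
    # Lower bound: P(X >= k; p) = alpha/2, i.e. P(X <= k-1; p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binomial_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k; p) = alpha/2.
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binomial_cdf(k, n, p), alpha / 2)
    return lower, upper

print(clopper_pearson(20, 400))
```

The call uses the same arguments as the clopperpearson example above.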