Making data analysis convenient and customizable with the open-source R programming language
The R programming language is one of the most popular tools for statistical work and is widely used in data science. It is an open-source language maintained and extended by a worldwide community of developers. Much of its power comes from a large collection of packages and graphical libraries that developers continually add and update. These are available free of charge from CRAN, the Comprehensive R Archive Network (cran.r-project.org), which hosts over 10,000 packages. A widely used interface for R is RStudio, a comprehensive environment for handling data, writing code, performing statistical modelling, and producing results in graphical or textual form. The R console reads the commands entered at its prompt, then evaluates and executes them. R does not automatically correct typographic characters such as the curly quotes and long dashes that word processors insert, so code copied from external sources should be checked carefully before it is run in R.
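To illustrate the quote and dash problem, the sketch below checks whether a pasted snippet is valid R syntax and replaces common typographic characters with their plain ASCII equivalents. It is a minimal example that assumes a UTF-8 R session; the helper names `parses_ok()` and `clean_pasted_code()` are hypothetical, written only for this illustration, and the list of replaced characters is not exhaustive.

```r
# Hypothetical helper: returns TRUE if the text is valid R syntax, FALSE otherwise.
parses_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# Hypothetical helper: replaces curly quotes and en/em dashes with ASCII characters.
# Only these six characters are handled; other typographic symbols are left alone.
clean_pasted_code <- function(code) {
  from <- c("\u201c", "\u201d", "\u2018", "\u2019", "\u2013", "\u2014")
  to   <- c("\"",     "\"",     "'",      "'",      "-",      "-")
  for (i in seq_along(from)) {
    code <- gsub(from[i], to[i], code, fixed = TRUE)
  }
  code
}

# A snippet as it might look after being copied from a word processor.
pasted <- "y <- 5 \u2013 2\ngreeting <- \u201chello\u201d"

parses_ok(pasted)                 # FALSE: the en dash and curly quotes are not valid R
cleaned <- clean_pasted_code(pasted)
parses_ok(cleaned)                # TRUE after the replacements
```

Checking with `parse()` before running anything means a problem is reported as a simple `FALSE` rather than as a half-executed script.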
Important Features of R programming for dissertation work
Dissertations that involve large amounts of data, whether publicly available or extracted from various other sources, make extensive use of R, from data mining to building statistical models and visualizations. R's applications in data science are wide-ranging, from simple data mining and statistical analysis to machine learning techniques. Users can create their own objects, functions and packages in R. R runs on most operating systems, including Windows, macOS and Linux, and because it is distributed under an open-source license, anyone can install and use it. Its free availability has made it very common in the academic world, and it has lately also found a place in industries working in the data science field.
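To make the idea of creating objects and functions concrete, here is a minimal sketch in base R. The object `scores` and the function `describe_scores()` are hypothetical examples, not part of any package; building a full package requires additional tooling and is not shown.

```r
# Create an object: a numeric vector of (made-up) test scores.
scores <- c(72, 85, 90, 64, 78)

# Create a function that returns a few summary statistics as a named vector.
describe_scores <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# Apply the function to the object and show the result rounded to 2 decimals.
result <- describe_scores(scores)
print(round(result, 2))
```

Once defined, `describe_scores()` can be reused on any numeric vector, which is how small custom functions gradually grow into a reusable toolkit for a dissertation's analyses.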
R combines procedural programming with object-oriented programming based on generic functions, which makes it a comprehensive programming language. Because thousands of ready-made functions are available in base R and its packages, programmers can rely on existing functions instead of writing common routines from scratch. R is an interpreted language, so the same code generally runs unchanged on any machine where R is installed, and because commands can be executed one at a time, errors in the code are easy to locate and debug. R can handle complex operations on arrays, vectors, data frames and other objects of variable size, and it offers robust options for handling and storing data. In addition, a large open community provides technical support for R programming.
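The combination of data frames and object-oriented generic functions described above can be sketched with R's S3 object system. The `study` class, its constructor `new_study()` and the data values are hypothetical; the point is that the built-in generic `summary()` dispatches to `summary.study()` because of the object's class attribute.

```r
# A data frame: equal-length columns that may hold different types (made-up data).
df <- data.frame(
  id    = 1:4,
  group = c("control", "treatment", "control", "treatment"),
  score = c(12.5, 15.0, 11.0, 16.5)
)

# Hypothetical constructor: an S3 object is a list with a "class" attribute.
new_study <- function(data, title) {
  structure(list(data = data, title = title), class = "study")
}

# Method for the built-in generic summary(); R selects it because class(s) is "study".
# It returns the mean score for each group.
summary.study <- function(object, ...) {
  aggregate(score ~ group, data = object$data, FUN = mean)
}

s <- new_study(df, "Pilot study")
group_means <- summary(s)
print(group_means)
```

Calling `summary()` on an ordinary data frame would still use R's default method, so adding a class and a method changes behavior only for objects that opt in.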
R Graphical User Interface
The R GUI is the standard interface that comes with R. The R console, shown in Figure 1, is its most essential part: this is the window where R scripts, instructions and operations are entered. Several tools are built into the console to make the interface easier to use, and the console appears whenever the R GUI is opened.
Fig. 1: R Console in the R Graphical User Interface
To start a new script, click the “File” menu at the top of the main R GUI window and select the “New Script” option. To exit an active session, type “q()” after the R command prompt “>”, as shown in the figure above.
RStudio
RStudio is a comprehensive integrated development environment (IDE) for R. It brings code editing, error notifications, a data viewer and the output of executed code together in a single window. It runs on several platforms, and its server edition can be accessed through a web browser. It also lets users check for and install updates to R packages from within the IDE, which reduces manual work. Because the data can be viewed in the same window, working with and coding against the data is convenient. A screenshot of the RStudio window is shown in Figure 2.
Fig. 2: RStudio window on macOS
Key Components of RStudio that a user should know about
RStudio has four key components that are used while programming in R.
Source: This pane, in the top-left corner of the window, is the text editor for writing scripts. Multiple lines of code can be written here without being run, and the script can be saved as a file on the computer.
Console: This pane is used to run R code interactively; each command is executed as soon as it is entered, before the next one is typed.
Workspace and History: The top-right pane of RStudio contains the Workspace and History tabs (recent versions of RStudio call the Workspace tab “Environment”). History shows all previously executed commands, and Workspace lists all the objects, such as variables and data sets, used and created during the session.
Files, Plots, Packages and Help: This bottom-right pane has four tabs. The Files tab lets the user browse the files and folders on the computer. The Plots tab displays any graphs and plots produced by the program. The Packages tab lists all installed packages, and, as the name suggests, the Help tab gives access to R's built-in documentation.
Benefits and limitations of the R programming language in dissertation work
Compared with other technologies, R has certain unique strengths that make it a programming language of choice. Graphical libraries available in R, such as ggplot2 and plotly, can help build attractive and customizable plots. Because R is a full programming language, there are very few restrictions on developing the plot, model or graph of one's choice. R can also read many data formats and import data from databases, data files and even online web sources, which makes it very convenient. As an open-source platform, all of its features come at no cost to the user. Given its wide applicability and free availability, extensive community support is available.
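As a minimal illustration of reading different file formats with base R alone, the sketch below writes a small comma-separated file and a small tab-separated file to temporary paths, reads each one back, and combines them. The file contents are made up for the example. The same `read.csv()` call also accepts a URL in place of a file path; reading from databases usually requires add-on packages (for example DBI) and is not shown here.

```r
# Write two small (made-up) data files to temporary locations.
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("id,score", "1,12.5", "2,15.0"), csv_path)
writeLines(c("id\tscore", "3\t11.0", "4\t16.5"), tsv_path)

# read.csv() reads comma-separated files; read.delim() reads tab-separated ones.
from_csv <- read.csv(csv_path)
from_tsv <- read.delim(tsv_path)

# Both calls return data frames with the same columns, so they can be stacked.
combined <- rbind(from_csv, from_tsv)
str(combined)
```

Because every reader returns an ordinary data frame, the rest of an analysis does not need to know which format the data originally came from.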
However, R also has some limitations and can be difficult to use. Unlike statistical tools such as SAS, SPSS and Stata, which offer menu-driven features for common analyses, R is used mainly by writing code, so the user is expected to learn and understand programming in R. And while being open source means R costs nothing, it is continuously upgraded and new functions are created on an ongoing basis, which makes it challenging for users to keep up with the latest versions and capabilities. A researcher working on a dissertation therefore needs a good understanding of the R programming language to use it to its full potential.