It’s hard to miss “Data Science” and “Big Data” as two of the hot topics at present. Wikipedia defines data science in the simplest terms as the science of extracting knowledge from data. The vast potential of data science applications is driving the job market and drawing proportional investment from big companies. Consequently, startups working in the area of data science have been mushrooming around the world.

Nepal hasn’t remained untouched by the growing interest in data science. I have been following a string of startups from Nepal working in data science. These are exciting times for the startup scene in Nepal, and it is encouraging to see people experimenting with data science in their startup ventures. Here I am listing some startups that I have been following:

- Oval Analytics – Your Data Science Partner
Oval Analytics is the brainchild of Hemanta Shrestha and Saurav Dhungana. Oval is perhaps the first technology company in Nepal that aims to provide data analytics services to local clients in addition to external clients. This is a challenging task given the limited market within the country. Oval Analytics wants to become an important part of the data science community in the country.

- Data Nepal – Nepal Unleashed
DataNepal was a startup with the aim of becoming the go-to repository for *“socio-economic, demographic, environmental, developmental and geospatial data”* related to Nepal. The data was mainly collected from the public domain and made available in friendlier formats (JSON, CSV, XML).

- Graph Nepal
Graph Nepal is perhaps the first startup with a focus on data visualization and infographics about local issues. Visualization is a powerful way of conveying the story behind big data analytics.

- Kathmandu Living Labs
- Cloud Factory

Let me know ( @sauravrt ) about other startups from Nepal that are working in the area of data science, analytics and visualization. I’d be happy to learn about them and add them to my list here.


]]>

I recently visited Washington DC, the country’s capital, with my wife and some friends. It was a three-day visit over the Memorial Day weekend. This was my second time in the capital. A combination of perfect weather and good company made this a memorable trip for us.

Preparation

Travel: Our plan was to drive all the way to DC. In normal traffic it should take us around 8 hours to reach DC from Boston. We rented a car big enough to fit six people, and three of us could share the driving. We also planned to use the Waze app and a Garmin GPS with live traffic to keep an eye on traffic conditions ahead.

Lodging: We decided to try out Airbnb for our stay in DC. After a couple of days of collaborative searching we were able to find a host who would take in six guests. The host had good reviews from previous guests and the place was located behind the US Naval Observatory. So we felt pretty confident about the host and the neighborhood.

Day 0 ( May 23, 2014 )

The reservation for the rental car had a pick-up time of 12 pm, but a call to customer service early in the morning confirmed that we could pick up the car earlier. So the three of us who would be driving set off towards Logan airport, where the rental car was located. After a quick negotiation we were able to get a slightly bigger car (a Chevy Suburban instead of a Tahoe). In hindsight we are glad we made that choice. The Suburban was plenty spacious for six people and had enough luggage space too.

We drove back to our apartment, where we loaded our luggage into the car, and by 12 pm we were on the road. Our plan had been to head out by 12 pm so that we would reach DC by 9 pm, so we were pretty pleased with our organization. We took I-90 W all the way to Sturbridge and split off onto I-84 to head south. We were worried that we would hit New York City’s evening rush hour traffic.

]]>

Unfortunately, due to my internship commitments I wasn’t able to attend the actual conference. My adviser presented the poster on my behalf.

]]>

traceroute tracks the route that packets take across an IP network on their way to a given host. It utilizes the IP protocol’s time to live (TTL) field and attempts to elicit an ICMP TIME_EXCEEDED response from each gateway along the path to the host.
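To make the TTL trick concrete, here is a toy simulation in Python. It sends no real packets: the path is just a list of made-up node names, and `send_probe` and `simulate_traceroute` are hypothetical helpers written for this illustration. It is only a sketch of the idea that each probe with a larger TTL reveals one more gateway.

```python
# Toy simulation only: no real packets are sent and the node names are made up.

def send_probe(path, ttl):
    """Send one simulated probe along `path` with the given initial TTL.

    `path` lists the gateways in order and ends with the destination host.
    Returns (node, reply): the node that answered and the kind of reply.
    """
    if not path:
        raise ValueError("path must contain at least the destination host")
    for node in path[:-1]:
        ttl -= 1                  # every gateway decrements the TTL
        if ttl == 0:              # TTL expired in transit: the gateway reports back
            return node, "TIME_EXCEEDED"
    return path[-1], "DESTINATION_REACHED"


def simulate_traceroute(path, max_hops=30):
    """Probe with TTL = 1, 2, 3, ... and record who answers each probe."""
    hops = []
    for ttl in range(1, max_hops + 1):
        node, reply = send_probe(path, ttl)
        hops.append(node)
        if reply == "DESTINATION_REACHED":
            break
    return hops


if __name__ == "__main__":
    demo_path = ["home-router", "isp-gateway", "backbone", "webserver"]
    for hop_number, node in enumerate(simulate_traceroute(demo_path), start=1):
        print(hop_number, node)
```

The real tool also times each probe, which is where the per-hop round trip delays come from.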

From time to time, I like to run traceroute to explore how I connect to different websites via my ISP’s network. It is especially interesting for me to run traceroute tests from the US to servers hosted in Nepal, as I have some idea about how internet traffic flows in and out of Nepal. Today I’ll present the results of a traceroute test to Nepal Telecom’s (NT) website. Since NT has optical fiber links through to India and beyond, it will be interesting to see which links are utilized for a packet to get from the US to NT.

**Analysis:**

- Hops 1 – 9: the packets are still in the US
- The trace starts from my router, goes into the Comcast network, reaches BOS ( Boston ) at hop 4 and comes down to NYC ( New York City ) at hop 7 via routers at Woburn and Needham, MA at hops 5 and 6 respectively
- At NYC, the packet drops off from the Comcast network onto L3’s 10 Gigabit Ethernet links. Since NYC is the main landing site for trans-Atlantic optical fiber cables coming ashore on the US east coast, it is expected that a packet going out to Nepal would also follow the same path.
- From NYC, the next hop ( 9 ) is to Airtel in India via L3’s 10 Gigabit link
- Among the several telecom operators in India, NT has bought the largest bandwidth from Airtel, so it makes sense that the route via Airtel’s network is the most viable one.
- At hop 11, the packet reaches India. This is evident from the jump in round trip delay to ~400 ms, which translates to ~11,000 km. The fiber landing site is most likely Mumbai, India.
- From there on, the packet enters Nepal at hop 12. The router IP 202.70.x.x belongs to NT. The router at hop 12 is a border gateway router, most probably at Bhairahawa, where most of NT’s connections to India go through.
- Then the packet goes through Butwal to Pokhara ( pkr.btw ) at hop 13
- From Pokhara, the packet reaches NT’s Intn’l Exchange Bldg at Patan at hop 14. From there the packet finally reaches the webserver at NT’s central office at Bhadrakali.
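The delay-to-distance reasoning in the list above can be sketched in a few lines of Python. This assumes light travels through fiber at roughly 200,000 km/s ( about two-thirds of its speed in vacuum ) and that the delay is a round trip figure; `one_way_distance_km` is a hypothetical helper, and the 110 ms input is an illustrative number, not a measurement from the trace.

```python
# Assumption: signals propagate through optical fiber at roughly 2/3 of c.
SPEED_IN_FIBER_KM_PER_S = 200_000


def one_way_distance_km(rtt_ms):
    """Convert a round trip delay (in ms) into a one-way fiber distance (in km)."""
    one_way_s = (rtt_ms / 1000.0) / 2.0   # half of the round trip, in seconds
    return one_way_s * SPEED_IN_FIBER_KM_PER_S


if __name__ == "__main__":
    # Illustrative value only: an extra 110 ms of round trip delay on one hop.
    print(round(one_way_distance_km(110)))
```

Under these assumptions an extra 110 ms of round trip delay between two hops corresponds to about 11,000 km of fiber. Queuing and processing delays in the routers also add to the measured figure, so a conversion like this is best read as an upper bound, and the increase in delay across a hop is more informative than the total.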

This was a traceroute analysis through the Comcast network. Next I’ll do the same kind of analysis for a trace through the Verizon DSL network.

]]>

**V.34** is the standard protocol recommended by the ITU for modems operating on legacy copper pairs. V.34 allows up to 33.6 kbit/s bidirectional data transfer. ( Refer to Wikipedia for more )

Today I came across a recording of the V.34 dialup modem startup signalling audio sequence (here) and I decided to take a look at its spectral content. The figure below shows the time-domain and spectrogram plots of ~18 s of the signalling sequence. ( The total startup time for a V.34 modem is about 10 – 13 s )

Then I looked up the startup signalling sequence for the V.34 protocol and found this paper. Briefly, the startup signalling involves four phases, which can be summarized in the following steps ( with a focus on the frequency content of the signalling signals ):

Phase I ( Network interaction )

- A 2100 Hz answer tone modulated with a 15 Hz sine wave is exchanged. ( The 15 Hz modulation is not distinct in the spectrogram, but I will take it on faith from the V.34 specification that it is present )
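As a rough illustration of why the 15 Hz modulation is hard to spot, the sketch below synthesizes an amplitude-modulated 2100 Hz tone and measures the power at the carrier and at a sideband 15 Hz away using the Goertzel algorithm. The 8 kHz sampling rate and the 20% modulation depth are my assumptions for the example, not values taken from the recording, and the function names are made up for this post.

```python
import math

SAMPLE_RATE_HZ = 8000   # assumption: telephone-band sampling rate


def answer_tone(duration_s, carrier_hz=2100.0, mod_hz=15.0, depth=0.2):
    """Synthesize a carrier that is amplitude modulated by a slow sine wave.

    `depth` (modulation depth) is an assumed value for illustration.
    """
    n = round(duration_s * SAMPLE_RATE_HZ)
    return [
        (1.0 + depth * math.sin(2 * math.pi * mod_hz * i / SAMPLE_RATE_HZ))
        * math.sin(2 * math.pi * carrier_hz * i / SAMPLE_RATE_HZ)
        for i in range(n)
    ]


def goertzel_power(samples, freq_hz):
    """Squared magnitude of the spectrum of `samples` at `freq_hz` (Goertzel)."""
    coeff = 2.0 * math.cos(2 * math.pi * freq_hz / SAMPLE_RATE_HZ)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


if __name__ == "__main__":
    tone = answer_tone(1.0)   # a 1 s window resolves components 15 Hz apart
    for freq in (2085, 2100, 2115, 2050):
        print(freq, goertzel_power(tone, freq))
```

With a one second window the sidebands at 2085 Hz and 2115 Hz show up well below the carrier. A spectrogram uses much shorter windows, so the sidebands merge into the carrier’s main lobe, which may be why the modulation is not distinct in the plot.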

Phase II ( Ranging and probing )

- This phase involves three steps: an initial information exchange [INFO0], probing & ranging, and a second information exchange [INFO2]
- The information exchange is done at 600 bps using DPSK-modulated FDM tones at 1200 Hz and 2400 Hz
- Probing is used to estimate the channel characteristics. The probing signal consists of a set of tones 150 Hz apart, from 150 Hz to 3750 Hz. However, the tones at 900, 1200, 1800 and 2400 Hz are omitted.
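The set of probing tones described above is easy to enumerate. The sketch below lists them and builds a simple multi-tone test signal; the 8 kHz sampling rate, the equal amplitudes and the zero phases are assumptions made for illustration only ( the actual specification defines the tone phases, which I have not reproduced here ).

```python
import math

SAMPLE_RATE_HZ = 8000                   # assumption: telephone-band sampling rate
OMITTED_HZ = {900, 1200, 1800, 2400}    # tones left out of the probing signal


def probing_tones():
    """Tones 150 Hz apart from 150 Hz to 3750 Hz, minus the omitted ones."""
    return [f for f in range(150, 3750 + 1, 150) if f not in OMITTED_HZ]


def probing_signal(duration_s=0.02):
    """Sum of equal-amplitude, zero-phase cosines (illustrative assumption)."""
    tones = probing_tones()
    n = round(duration_s * SAMPLE_RATE_HZ)
    return [
        sum(math.cos(2 * math.pi * f * i / SAMPLE_RATE_HZ) for f in tones)
        for i in range(n)
    ]


if __name__ == "__main__":
    print(len(probing_tones()), "tones:", probing_tones())
```

A comb of tones like this lets the receiving modem measure the channel response at evenly spaced points across the voice band in one shot.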

Phase III ( Equalize and training )

- This phase consists of a series of signals transmitted between the calling and the answering modem. The exchange consists of a sequence of scrambled binary 1s for fine tuning of the equalizer and echo canceller, and a repeating 16-bit scrambled sequence indicating the constellation size that will be used during Phase IV. These scrambled sequences are transmitted using a four-point constellation. The scrambled sequence occupies the entire channel bandwidth.

Phase IV (Final duplex training )

- This phase consists of a sequence of scrambled binary 1s using either a 4- or 16-point QAM constellation.

I have tried to identify this sequence of events in the spectrogram above ( Larger version ). All of the signalling sequences listed above can be identified in the spectrogram. There were at least two sets of signalling tones that I could not associate with the V.34 specification.

( Spent a couple of hours this afternoon doing this exercise. Coming more than 10 years after the days of dialup, this was a nice trip down memory lane, and moreover I can now see what was going on behind the scenes every time my modem dialed up to the ISP )

]]>

In public key cryptography, the key has a *public part* and a *private part*. The public part is made known to everybody whereas the private part is kept secret by the receiver ( My PGP public key ). Anyone who intends to send a message to the receiver encrypts the plaintext using the receiver’s public key. Once encrypted using the public key, the ciphertext can only be decrypted using the private key, which is safe with the receiver.

RSA is a public key cryptography algorithm jointly developed by R. Rivest, A. Shamir and L. Adleman, and it was described in a paper in 1978. The name of the algorithm comprises the first letters of the three authors’ surnames. The algorithm was originally patented by M.I.T. but was released to the public domain in September 2000. The algorithm has three steps: (1) Key generation (2) Encryption (3) Decryption.

**Key Generation**

The RSA key pair is generated as follows

* Generate a pair of prime numbers $p$ and $q$

* Compute $n = pq$

* Compute the Euler’s function $\phi(n) = (p-1)(q-1)$

* Find an integer $e$ such that $1 < e < \phi(n)$ and $e$ is coprime with $\phi(n)$, i.e. $\gcd(e,\phi(n)) = 1$.

* Find another integer $d$ such that $ed \equiv 1 \pmod{\phi(n)}$. This is determined using the extended Euclidean algorithm, which gives $ed = 1 + k\phi(n)$ where $k$ is some integer.

The public key consists of the pair $(e, n)$ and the private key consists of the pair $(d, n)$.
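The key generation steps above can be sketched in Python as follows. This is a minimal illustration and not the program mentioned later in this post: the primes 61 and 53 and the exponent $e = 17$ are small textbook values picked for the example, `generate_keypair` is a hypothetical helper, and the primality of $p$ and $q$ is assumed rather than checked.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_r, old_x, old_y


def generate_keypair(p, q, e=17):
    """Build a toy RSA key pair from two primes (primality is NOT checked).

    The default e = 17 is an illustrative choice, not a recommendation.
    """
    n = p * q
    phi = (p - 1) * (q - 1)             # Euler's function of n
    g, x, _ = extended_gcd(e, phi)
    if not 1 < e < phi or g != 1:
        raise ValueError("e must satisfy 1 < e < phi(n) and gcd(e, phi(n)) = 1")
    d = x % phi                         # inverse of e modulo phi(n)
    return (e, n), (d, n)


if __name__ == "__main__":
    public_key, private_key = generate_keypair(61, 53)
    print("public:", public_key, "private:", private_key)
```

Real keys use primes that are hundreds of digits long; numbers this small can be factored instantly and are only good for following the arithmetic by hand.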

**Encryption and Decryption**

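The RSA algorithm uses modular exponentiation for both encryption and decryption. The plaintext is first converted to numeric codes before it is encrypted. For instance, the letters in the plaintext are represented as integers, for example ‘a’ = 00, ‘b’ = 01, …, ‘z’ = 25. Once the plaintext is represented by a numeric code $m$, the ciphertext $c$ is generated as $c = m^e \bmod n$.

The receiver decrypts the ciphertext using modular exponentiation with the private key pair as $m = c^d \bmod n$.

The decryption works as follows. Since $ed = 1 + k\phi(n)$ for some integer $k$, $c^d \equiv (m^e)^d \equiv m^{ed} \equiv m^{1 + k\phi(n)} \pmod{n}$.

Now according to Fermat’s little theorem, for any integer $x$ and prime number $p$ (which is not a factor of $x$), $x^{p-1} \equiv 1 \pmod{p}$. Also by definition of the Euler’s function, $\phi(n) = (p-1)(q-1)$. Thus $m^{1 + k\phi(n)} \equiv m \cdot (m^{p-1})^{k(q-1)} \equiv m \pmod{p}$.

This is true even when $m \equiv 0 \pmod{p}$. Following a similar argument for the prime number $q$, $m^{1 + k\phi(n)} \equiv m \pmod{q}$.

Combining the above two equations according to the Chinese Remainder Theorem, we get $m^{1 + k\phi(n)} \equiv m \pmod{pq}$.

Hence $c^d \equiv m \pmod{n}$.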

( A complete explanation is available in the original paper )
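Here is a small Python sketch of the encryption and decryption round trip, using the letter coding described above ( ‘a’ = 0 … ‘z’ = 25 ). The key values ( $n = 3233$, $e = 17$, $d = 2753$, built from the primes 61 and 53 ) are a textbook-sized example chosen for illustration, and encrypting one letter at a time is a simplification that is not secure. The helper names are made up for this post.

```python
# Toy key for illustration only (p = 61, q = 53); far too small to be secure.
N, E, D = 3233, 17, 2753


def encode(text):
    """Map lowercase letters to integers: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25."""
    if not all("a" <= ch <= "z" for ch in text):
        raise ValueError("only lowercase letters a-z are supported")
    return [ord(ch) - ord("a") for ch in text]


def decode(codes):
    """Inverse of encode()."""
    return "".join(chr(code + ord("a")) for code in codes)


def encrypt(codes, e=E, n=N):
    """c = m^e mod n, applied to each numeric code separately."""
    return [pow(m, e, n) for m in codes]


def decrypt(cipher, d=D, n=N):
    """m = c^d mod n, applied to each ciphertext value separately."""
    return [pow(c, d, n) for c in cipher]


if __name__ == "__main__":
    cipher = encrypt(encode("nepal"))
    print(cipher)
    print(decode(decrypt(cipher)))
```

Python’s built-in three-argument `pow` does the modular exponentiation efficiently, so there is no need to compute the huge intermediate power first.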

As a second part of the project, I implemented a simple version of the RSA algorithm in Python. The program can generate an RSA public and private key pair, encrypt a plaintext string and recover the original message from the ciphertext. The keys generated are eight digits long. The plaintext can be a string ( Roman letters only for now, no special characters ). The program can be downloaded here.

]]>

A year later, I have decided I want to get the USRP up and running so I can do some cool stuff besides some abstract mathematics. I set out to install the device on a new system ( Ubuntu 10.04 LTS ). It turns out that Ettus has done a good job of providing much more detailed documentation on setting up the N210. Here I have made a list of topics and related links that a newbie may encounter when starting with the N210 USRP, or in general with the N-series USRPs from Ettus.

**The safest / easiest way to set up GNU Radio with the UHD environment on Ubuntu is to use the build-gnuradio script:**

**The N210 has issues with the pre-installed firmware and FPGA code, and hence they need to be updated before the PC can talk to it. The firmware and FPGA images can be downloaded from here.**

The LEDs on the front panel can be useful in debugging hardware and software issues. The LEDs reveal the following about the state of the device:

- **LED A:** transmitting
- **LED B:** MIMO cable link
- **LED C:** receiving
- **LED D:** firmware loaded
- **LED E:** reference lock
- **LED F:** CPLD loaded

]]>

This method relies heavily on symbolic computation. In parallel with the development of the mathematical framework for the polynomial method, the authors have developed a Matlab toolbox called RMTool, which leverages the symbolic computation capability of Matlab. However, RMTool uses the Maple Symbolic Toolbox for Matlab, which Mathworks no longer supports ( in versions after 2009a ). The original RMTool might even have been written in Maple itself and later ported to a Matlab toolbox. This has made RMTool an extremely platform-dependent tool, preventing it from being used by other researchers. The method was proposed sometime in 2005 but I can hardly find an alternative reference on this topic or its application. It seems to me that the rigidity of RMTool may be limiting its potential as a powerful tool in random matrix analysis and its application to engineering problems.

I have been looking at possibilities of porting RMTool to the new Matlab Symbolic Toolbox, which is based on MuPAD. I am also planning on developing a new package ( from scratch :o) on a more robust and compatible platform ( Python with Sage ). But these ideas are still in an early phase and need a thorough understanding of the polynomial method.

]]>

    #!/usr/bin/python
    # thousandPrime.py : Finds the 1000th prime number, fork of Problem Set 1 MIT 6.00
    # Programmed: Saurav R Tuladhar
    # Date: Oct 7, 2011

    # Declare state variables
    counter = 1              # Counts number of primes found so far (2 is already in the list)
    idx = 1
    testNum = 2*idx + 1      # Odd numbers > 2 as candidates
    primeList = [2]
    isPrime = True

    while counter < 1000:
        for x in primeList:
            # Only check divisibility by prime numbers < testNum.
            # Based on the Fundamental Theorem of Arithmetic.
            if testNum % x == 0:
                isPrime = False
                break
        if isPrime:
            primeList.append(testNum)
            counter = counter + 1
        # Reset variables
        isPrime = True
        idx = idx + 1
        testNum = 2*idx + 1

    print(primeList[-1])

]]>

The chairman of the conference was my advisor Dr. John R. Buck. My lab partner David Hague gave a talk on his Compressed Sensing based active SONAR model inspired by bats’ biosonar capability.

Here are my reflections on the conference:

- The conference began with a reception banquet dinner where G. Clifford Carter was awarded the UASP Award. It turns out that G. C. Carter invented the Generalized Cross Correlation (GCC) method for time delay estimation. The acronym for this method matches the initials of Carter’s name.
- Although the conference was on underwater signal processing, there were three plenary sessions on underwater autonomy, which mainly dealt with robotics and control systems oriented design problems for underwater deployment of autonomous vehicles. I was a bit disappointed to see very little signal processing. However, there was one session each on Array Processing, Noise Modeling and Acoustic Communications, which were in the ballpark of my interests.
- The navy seems to have a huge interest in developing unmanned underwater autonomous vehicles and there are lots of companies and academic laboratories working in this area. I am not particularly interested in the navy’s perspective on this, but as far as I understand, systems development has largely shifted towards software based design.
- A large fraction of the presentations were focused on military (navy) applications, or the signal processing problems they were trying to solve were framed from a military applications point of view. The focus on military applications was a bit too much for my liking.
- Certainly there are some civilian applications of the results from this research.
- There were a few presentations on Synthetic Aperture SONAR (SAS) and I came to an understanding that synthetic aperture is analogous to taking multiple photographs and stitching them together to form a panorama.
- There was a presentation by Arthur Baggeroer on why MFP failed. My perception was that nobody was sure why this particular method failed, but they already knew it had died.
- Interesting discussion on Coherence, brought up by Henry Cox.
- It was satisfying to see a large fraction of presentations using real field data for validation of their results. In computer simulations everything works :D.

]]>