Integer factorization in Flint: The Multiple Polynimial Quadratic Sieve
whoami
 Name : Kushagra Singh
 University : IIIT Delhi, India
 Email id : singh.kushagra.1996 [at thingy] gmail.com
 Github handle : kush789
I will be blogging about my progress in this blog.
lscpu
I am an 18 year old freshman student majoring in Computer Science. I have been using Ubuntu 14.04 LTS for work since the beginning of this year, and love it. I use Windows also when I feel like playing games and OSX sometimes, as it came preinstalled on my computer. Recently I stopped using IDE's for development, as they tend to make things overcomplicated. I respect the KISS principle, Keep It Simple, Silly! And hence, Sublime Text and the terminal is the combo for me.
I have been programming in C for over three years now, and am quite at home with it. I am well versed as to how git works. Other languages I am comfortable with are C++ and python. I also have a fair knowledge of Django, and deployment using nginx and uWSGI. I love to take part in hackathons.
I have taken core computer science and math courses relevant to this project, such as Data Structures and Algorithms and Linear Algebra. My end semester examinations get over in late April, leaving me with enough time to get ready for my GSoC project if I am selected. I have no commitments during the summers, and will be able to devote at least 40 to 50 hours a week toward my GSoC project.
git log
I have been an open source developer since last year. I started to contribute towards FLINT in January 2014. My contributions towards FLINT are:
(Merged) Enhanced ulong_extras.h/n_is_prime_pocklington  #110
I enhanced the pocklington primality test implemented in the ulong_extras module. My first patch took multiple rounds of refactoring, as I wasn’t used to FLINT code conventions.
(Unmerged) Rootrem_functions in ulong_extras module  #119
I implemented seven functions using different algorithms to find the cube root of an integer, and to calculate the nth root of an integer. The fastest methods were combined to create the function n_cbrt(), which returns a cube root very quickly. A list of functions implemented are:
 n_cbrt()
 n_cbrt_binary_search()
 n_cbrt_neweton_iteration()
 n_cbrt_chebyshev_approx()
 n_rootrem()
 n_cbrtrem()
 n_root()
Resolved bug in n_factor_partial()  #111
I found a bug in nfactorpartial() while enhancing n_is_prime_pocklington(), which I later solved in this patch.
Project Abstract
Integer factorization has always been an important and widely discussed problem in mathematics. A number of fast algorithms have been invented and implemented in various biginteger libraries over the last 30 years. The quadratic sieve algorithm, invented by Carl Pomerance in 1981 is one of the faster algorithms. Up to a 100 digits, it is certainly the fastest.
Currently FLINT lacks a robust module for big integer factorization. Currently there exists a very fast implementation of the William’s p+1 factoring algorithm, which can be used. However this algorithm is not always successful as it requires p + 1 to have small prime factors.
The current qsieve module in FLINT has a self initiating quadratic sieve. However it has issues with it, some of which are :
 In case factors are not found, it does not go back and try searching for more relations.
 The qsieve has been implemented for numbers only upto two limbs.
 Further the number is multiplied by a constant. So the upper bound on the number that can be factorized is smaller than two limbs
In fact for smaller numbers, it is just not the right kind of a quadratic sieve to be used. Rather a simple multiple polynomial quadratic sieve with no self initialization should be used. The target of my project would be to implement a double large prime variant of the multiple polynomial quadratic sieve that is capable of factoring a 60 digit number in 8 seconds on a single core, and a 100 digit number in under a day.
man ECM
Factorization using the Elliptical Curve Method
For our advanced variants of the QS, as explained below, we would be needing an auxiliary factoring algorithm, that can factor upto 40 digit numbers having 3 10 to 15 digit factors. For this task, a fairly optimized implementation of ECM algorithm would be necessary.
ECM method of factorization, suggested by Lenstra, makes use of elliptical curves for factorization. An elliptic curve is defined by the cubic
where a and b are real numbers. If y^2 does not contain any repeated factors, or in other words 4a^3 + 27b^2 is not 0, then we can use this curve to form a group. This group contains all real points lying on the curve, and a special point 0 at infinity.
The ECM algorithm has two phases. A brief description of the algorithm is given below.
Phase I
 In the phase I, we select a random elliptical curve y^2 = x^3 + ax + b (mod n) and a non trivial point P_0(x_0, y_0) on it. This is done by selecting random a, x_0 and y_0, and then calculating b by putting these values into the equation of the curve.
 Next we calculate, ∆E = 4a^3 + 27b^2. Let g = GCD(∆E, n). If g equals n, we start over with different values (a different curve altogether). If g is an integer between 1 and n, we have successfully found a non trivial factor of n. Else if g equals 1, we carry on to the next step.
 Next we select a smoothness bound B_1 for phase 1. For each prime p < B_1 and B_1  p, we multiply the values of p^(log _1 / log p). Let's call this product k.
 After calculating the value k, we calculate k times P_0. Let's call this point P_k. Let g = GCD(z_k, n). If g is an integer not equal to 1 or n, then we have found a factor. Otherwise if it is n, then we go back and start with a new curve. In case it is 1, we proceed to the phase II.
Phase II
 In this phase, we select another upper bound B_2, greater than B_1. For each prime B_1 < p < B_2, we carry out P_k (from phase 1) times the prime p. We also use a variable d, which holds the product of the z coordinates of the P_k's.
P_k = P_k * p (z_k is the z coordinate of P_k)
d = d * z_k Then we caclculate the GCD(d, n). Let it be g. If this value is greater than 1 and less than n, we have successfully found a factor. Other wise this method fails, and we start over with a new curve.
man QS
The Quadratic Sieve Algorithm
Factorization by the trial division method was first improved upon by Fermat. He proposed that if n can be expressed as a difference of two squares x**2 and y**2, then n can be factorized, as ( x + y ) * ( x  y ) = n. Albeit it is made things faster, it is not really faster when it comes to factoring large numbers.
A Quadratic Sieve works on the same principle. It tries to find two numbers x and y such that x is not congruent to ±y (mod n) and x**2 is congruent y**2 (mod n). This would mean that ( x  y ) * ( x + y ) is divisible by n. Hence we can factor n by finding out the GCD of (x+y, n) and (x  y, n).
Since finding two different numbers which satisfy this condition may be time consuming, we try to find a set of i integers integers x_i, such that Q(x_1)*Q(x_2)*...*Q(x_i) is a perfect square mod n, where Q(x) is defined as:
Another selection criteria for x is that Q(x) should be Bsmooth. B  smoothness means that a number can be expressed as a product of prime numbers less than a number B.
The algorithm can be partitioned into four sub parts.

The setup stage

Selecting smoothness bound B.
The right value of B will have a huge impact on the speed of the quadratic sieve.Initially, we can take an estimate, mentioned in Carl Pomerance’s paper on Smooth numbers and the quadratic sieve [1].
B = exp( (½ + O(1))(logn log log n) ** 1/2 )
The advantage of choosing a small B is that we don’t need many relations in latter stages. However if we choose a very small B, we may not find sufficient relations. A large B will also lead to more time being taken when we calculate the null space. This parameter’s exact value would be tuned heuristically once the rest of the algorithm has been implemented. As mentioned in Prime Numbers, A computational Perspective, the optimal Bvalue is more of an art than a science, and is perhaps best left to experimentation. [2] 
Generating a factor base
We need to select a set of integers x such that x**2  n is B  smooth. If a prime p in the factor base divides Q( x_i), then ( x ) ** 2 is congruent to n mod p. Hence n is a quadratic residue of p. From this we can conclude that the factor base will include only prime values less than or equal to B, such that (np) = 1 (the Jacobi symbol). Hence we don’t need to include all prime numbers less than B in our factor base.

Selecting smoothness bound B.

Searching for relations
This is the most time consuming part of the algorithm. We compute y = Q(x) for different values of x in the sieving interval that we define, and check whether y is completely factorized by the numbers in our factor base. If this is true, we can say that we have found a relation. If our factor base contains K primes, our aim is to find at least K + 1 B  Smooth numbers. We store the prime factorization of y (mod 2) in a matrix. In practice however, we aim to find at least K + 64 relations. This gives us a greater chance of finding a non trivial factor. Of course there may be cases in which these number of relations might also not be sufficient. In such cases we have to go back and sieve for more relations.

Selecting a sieving interval
The interval over which we sieve should be tuned so that we are sure that we get sufficient relations. In case at the end of the sieving stage we do not get sufficient relations, we go back and check for relations in a greater interval. As for B, this value will also be tuned heuristically once the algorithm has been implemented completely.
It will be beneficial to select an interval such that x lies in the range M to +M. This is done so that Q(x) is small, and hence has a greater chance of being B  smooth. An initial value of M can be estimated as B ** 3. [3]

Selecting a sieving interval

Removing duplicate relations
We will have to filter out duplicate relations before building the relation matrix. Multiplying duplicate relations will just give a square of itself, and will not aid in finding a non trivial factorization of n.
The large prime variant of the self initialization QS makes this task trivial. This is explained in more detail below.

The relation matrix
The i'th row of the relation matrix is the exponent vector mod 2 of y_i. Our aim is to find a set of y’s such that their product is a perfect square. This means that each prime in the factorization of their product has an even power. This is equivalent to finding a nontrivial linear dependence between the rows in the matrix. This gives us a set of y’s such that their product will be a square.

Factorization
We multiply the y’s (mod n) which form a perfect square (we found this set in the previous stage) to get y’’ ** 2. We need to calculate the square root of y’’ ** 2. However this is easy as we know the prime factorization of the y’s.
Next we multiply the x’s (mod n) corresponding to the respective y’s to get x’’. We can calculate the factorization of n by calculating GCD(x’’ + y’’, n) and GCD(x’’  y’’, n). If this gives us a trivial factor, we go back to Stage B and sieve in a greater range.
Since we are finding more than the number of relations just required (to be safe), it makes sense to use them to find multiple x’’ s and y’’ s and again apply GCD to find multiple factors of n. This will make sure that we don’t waste the computing effort we have put into finding these extra relations.
Increasing the Efficiency of the basic QS  (algorithmic optimizations)
The QS algorithm discussed so far in the proposal doesn’t seem too complicated to implement. However this particular version would be painfully slow. The more vital and involving part of the project would be to increase its efficiency. I have the following ideas in mind at the moment which I would be trying out. Undoubtedly, the most time consuming stage is the sieving stage. There are multiple ways that I plan to use to increase the speed of the sieving stage.

Finding relations smartly (Sieving)
While checking for B  smoothness of a y, we need not necessarily divide Q(x) for each x by primes in our factor base. If suppose we find that p  Q(x), then this implies that p  Q ( x + p ), pQ(x+2p), … , p  Q ( x + np ). So if we find the beginning x such that Q(x) is divisible by p, we need not check for every single number. Two such beginning points will exist for each prime in our factor base. Once we find them, we can skip p numbers starting from those numbers while checking.
These initial locations for each prime can be found out by taking a square root of n modulo p. A fast implementation of the Tonelli Shanks algorithm, which is the algorithm used for this problem already exists in FLINT’s ulong_extras module. However it would be necessary to port it as it might not be compatible with the data structures we might be using.
We don’t actually keep dividing a Q(x) repeatedly by the prime p for which we are sieving so as to get the actual factorization. A lot of this effort might turn out to be futile in case Q(x) is not B  smooth. Actually, we initialize a sieving array SI which is the length of our sieving interval, initialize it to zero and then add log(p) to SI[x] in case p Q(x) . These values can be precomputed and stored. At the end of sieving, we select those elements from our interval such that SI[j] is log ((½)M) + 0.5 * log ((½)N) [6]. These will give us relations, which we can later factorize.
After the sieving stage, we can, we can use Hansel Lifting to calculate which power of a prime divides Q(x), instead of dividing it repeatedly. This algorithm isn’t implemented in FLINT as of now, it would have to be done.

Fast Matrix Operations
Calculating the null space by the Gaussian method is a slow approach. Block Lanczos algorithm is a much better alternative that helps in calculating the null space of the matrix in GF2 very quickly. A very fast implementation of this algorithm is already present in the qsieve module. Although it will need modification if we attempt to implement a double prime variant of the QS. We can even think of using an existing implementaion of a parellel version of Block Wiederman algorithm, in case we consider to parallelise the linear algebra involved.

Using multiple polynomials
Instead of using just Q(x) = x**2  n, we can use multiple polynomials to find relations. This increases our chances of finding the desired number of relations in the given sieving interval. Further more, we can even decrease our sieving interval in case we use multiple polynomials [4].
In out MPQS,we will use polynomials of the type Q(x) = (ax) ** 2 + 2bx + c. We keep using multiple polynomials till we get the minimum number of relations we want to find.
Selection of the coefficients of the polynomial
 We select a such that it is a square. Then we choose a b such that b ** 2 ≡ n (mod a). This means b ** 2 ≡ n (mod p) for each prime p  a. This means that (n  p) = 1 (the jacobi symbol) for each prime p  a. Since all the factors in our factor base already satisfy this condition, It makes sense to use those primes to select a. The ideal value of a is a ≈ (√(2n))/M.
 b is selected such that 0 ≤ b < a and b ** 2 ≡ n (mod a).
 Finally c is chosen such that b**2  ac = n.
Multiplying Q(x) with a gives us,
aQ(x) = (ax)**2 + 2abx + ac
aQ(x) = (ax + b )**2  ( b**2  ac)
aQ(x) = (ax + b)**2  n
aQ(x) ≡ (ax + b)**2 (mod n)
Since we have chosen a as a square, clearly Q(x) should also be one. The main motivation behind using this approach is that by using several polynomials, we can make our sieving interval small. And in turn, Q(x) becomes small and hence there is a greater chance of it being B  smooth.

Self Initialization
The main motivation behind using different polynomials was to decrease the sieving interval. However selecting different polynomials may be a very expensive affair, as for each polynomial, we need to select the coefficients. Further we also need to solve the equation relation Q(x) = 0 (mod p) for each prime p in our factor base, to find the initial position to begin sieving. Not doing this would simply defeat the purpose of sieving.
Self initialization is a trick that lets us use multiple polynomials, and at the same time doesn’t take a lot of time in the pre  sieving stages. The trick is to fix the constant a, obviously still selected as per the constraints stated above. We can generate multiple polynomials by changing the values of b. There will be 2**(k1) values of b which we can use (not 2 ** k as one value in the factor base is 1, so half of them would be the same, just the opposite sign and hence will give us duplicate relations.). Here k is the number of elements in the factor base.
Hence the problem of finding Q(x) ≡ 0 (mod p) for each polynomial and for each prime in our factor base can be done in at once. This can help us as we now need about one tenth of the factors in our factor base when compared to the original QS [5].
A different approach to self initialization [6], gives much better performance. In this paper, the author suggested us to select the value of the coefficient a as a product of q0*q1*q2*...*qr. Out of these r + 1 primes, we fix r (from our factor base) and allow q0 to take prime values immediately following the bound B. This solves two purposes.
 It makes the process of removing duplicate relations trivial. If suppose for two values of a, q0*q1…*qr and q0’*q1’*...*qr’, the relation found is duplicate, then the sets {q0,...qr} and {q0’,...qr’} has to intersect. Fixing q1,...qr makes the task of removing duplicate relations trivial as any duplicate will contain the q0’ as a large prime.
 In this approach we can precompute all values which we need in the sieving beforehand. More details can be found on page 5 of this paper.

Using a pre multiplier (Knuth  Schroeppel algorithm)
It is often beneficial to factor k*n, instead of n (for a small value of k). The Knuth  Schroeppel algorithm picks out the optimal multiplier k such that (k*n  p) = 1 (the Jacobi symbol) for as many small primes as possible. This increases the number of small primes in our factor base.The Knuth  Schroeppel algorithm is used to find this optimal multiplier k.
This is advantageous, as the difference (or the step) between the roots of Q(x) for two small primes is obviously lesser than larger primes. Hence smaller primes contribute more to the sieve than larger numbers. Hence using a suitable multiplier increases the probability of finding relations.

Small prime variant
When we sieve our interval with a prime p, the number of sieving steps for a prime is 2M/p. This value is large in case p is small, and it’s log(p) value does not contribute much to total SI[x] value. Pomerance suggested that it is advantageous to not sieve with the smaller primes due to these reasons.
To compensate, we must reduce the threshold for selecting a particular x from SI[] so as to not miss any B  smooth number from our interval. The only drawback is that it might give us some wrong numbers, which are not B  smooth. However we can always check this later on and is a small price to pay in return of the time we saved in sieving.

Using partial relations (Single large prime variant)
In some cases, Q(x) might not be B  smooth. However in many cases, the factor of a potential B  smooth number is prime exceeding B. These are called partial relations. Essentially, Q(x) = p * q, where p is B  smooth and q is a prime number greater than B. If there exists two such partial relations with the same prime factor q, then their product will be a B  smooth number multiplied by a square. Since we need a square root of the Q(x_i)'s, we can use this relation as the x_i calculated in the end (product of the Q(x)'s) would be a number multiplied by the the large primes (provided we find two relations which have the same large prime. In other cases we don't take the relation.). We can take care of this fact in the GCD stage, while calculating the actual answer.
After the sieving stage, we can easily check whether the number remaining after the factorisation of Q(x) is a prime or not using the existing functions in FLINT.

The double large prime variant
The single large prime variant can be extended to cases where Q(x) is of the form p * q * r, where p is B  smooth and q and r are prime numbers greater than B. These are known as partial partial relations. In these cases, we need to find sufficient relations such that their product will yield a product of B  smooth numbers, multiplied by a square. Usually we select an upper bound for the value of the primes p and q, approximately B**2.
If we have three pp relations like (b1, q, p), (b2, q, r) and (b3, q, r) (b_i's are Bsmooth numbers), then these three can be multiplied to get a full relation (since each large primes is two in number, hence the square root can be calculated and we just need to take care of this fact in the GCD stage.). Similarly, ppp relations can too be combined to form full cycles.
In case of the double prime variant, obviously we cannot expect all partial partial relations to have the same large prime factors. There could be cases that this might happen, but generally this won’t be true. For example if we have large prime factors (p, q), (q, s), (s, r), (p, r), then multiplying all of them together would give us a full relation.
The two main problems that arise are cycle counting and cycle finding. We need fast algorithms that can detect cycles so as to combine possible pp relations and partial relations to form full relations. We also need a cycle counting algorithm, so that we know when to stop sieving for relations. Quite a bit on the implementation of these two algorithms has been talked upon in [8].
For cycle counting, we construct a graph with the a vertex for each large prime that appears atleast once. Now for example Q(x) = B * p * q, then an arc is constructed between the vertices p and q. There is also a special node labelled by 1. Note that since a partial relation is stored as (p, 1) (in case of PPMPQS), an edge would exist between p and 1. If C are the number of disconnected components of the graph, "A", the number of arcs/edges and "R" the number of relation in the graph, then we can calculate the number of cycles present in the graph by the formula :
cycles = C + R  P For cycle finding, I plan to implement an algorithm mentioned in [8]. It makes use of two hash tables :

Relations by Prime (RBP)
Each prime present in the partial partial and partial relation has a key, which will map to a linked list containing the relations which contain that prime.

Primes by Relation (PBR)
Each relation gets mapped to two linked lists, one containg the large primes that the relation has, and the second being a "chain". This would essentially hold the cycle. Initially it is empty.
The algorithm is as following. For each relation, say r_0 in the prb hash table, if the relation is partial, we retrieve the list of relations from the RBP table, corresponding to the large prime of the relaion in hand. We take each relation r_i in the relation list (apart from r_0). If r_i is a partial relation, then r_i and r_0 form a cycle, which can be found out using the "chain" (the other linked list in the PBR table) of r_0 and r_i. If r_i is not a partial relation, then the "chain" of r_0 and r_0 itself is appended to the chain of r_i, and the large prime of r_0 is deleted from the prime list of r_i. When all the list of relations corresponding to the large prime of r_0 gets exhausted, the large prime is deleted from the PBR table, and r_0 is deleted from the list corresponding to the large prime in the RPB table.
To implement the double large prime variant, there are four main subparts. These are:

Splitting the relations into the large prime factors.
For this we would have to implement a fast factoring algorithm. I plan to implement both the Polard rho, and the ECM. Details on this is mentioned below. We can use existing pseude prime checks to check whether the remainder after factorization of Q(x) is prime or not. If it fails to be proven as a composite number, we can simply leave it would be too large to be used as a single prime, even if it were proven to be so.

Remove singletons and duplicates
This would involve removing the partial partial relations which cannot possibly form a full relation. This would happen when there is a prime in one of the pp relations which occours only once in all the pp relations.
This can be done by checking the large primes of all the relations, and storing the number of times a prime appeared in all the relations. If this value is less than two, then the relation is removed. Note that it is possible that removing one singleton can lead to removal of further singletons. Hence, this process is repeated till no further relation is removed.
 Implement cycle counting (discussed above)
 Implement cycle finding (discussed above)

Relations by Prime (RBP)
Increasing the Efficiency of the basic QS  (implementational aspect)

Optimizing for cache misses while sieving
As mentioned multiple times, the most time consuming part is the sieving. Although each element would be only a byte , the size of the sieving array SI will still be huge. In order to make use of the L1 cache, the complete sieving interval must be broken down into multiple intervals and then sieved over. Once the sieving is done in one interval, if any relations are found, they are saved and the process is repeated again for other intervals.
The size of the intervals depend upon the size of the L1 cache. This is also known as cache blocking. The size will further reduce in case we are parallelizing the sieving. However parallelization is not the main aim of this project.

Saving space while storing data
It will be better if we pack data as tightly as possible. Generic FLINT data types can not really be used while we are implementing the QS. For example, the Sieving Array will be an array of 8 bit elements and not a C int. Another example would be the factor base. Any factor in our factor base is not expected to cross 15,484,863 (the 100, 000th prime. Even while factoring the RSA  129 moduli, a 600, 000 factor base was used). So all the primes can be saved in just 4 byte integers. Smaller primes can be stored in smaller spaces rather than allocating the same space for each prime.
In the sieving stage, it would be necessary to save the probable relations and data in a file. While factoring a large number, there won’t be enough space to save it in the memory.

Using parallel processing
The sieving stage of the algorithm checks for B  smoothness of Q(x), over a range of x. Since checking for one value of x is independent of any other value of x, we can distribute the load on to multiple cores of the computer if available. This would be incorporated at the end of the project if there is sufficient time, since our target is to implement a fast variant that works on just a single core.

Using SIMD
In most places in our QS, we would be packing data as tightly as possible. This would allow us to use SIMD and parallely carry out simple arithmetic operations (this should not be mixed up with sieving parallelly on different cores). An example would be while iterating in the sieve to find relations. Since we have a uint8_t array, we can utilize SSE and speed up the process. For example, the following use of SIMD gave me a significant speed up (almost 4x).
__m128i temp_1, temp_2, temp_3; int i; uint8_t a[16000]; uint8_t b[16000]; uint8_t c[16000];
Instead of iterating through each element and adding like this,
for (i = 0; i < 16000; i++) { c[i] = a[i] + b[i]; }
using SSE instructions (adding 16 elements at a time)
for (i = 0; i < 16000; i += 16) { temp_1 = _mm_loadu_si128((__m128i *) a[i]); temp_2 = _mm_loadu_si128((__m128i *) b[i]); temp_3 = _mm_add_epi8(temp_1, temp_2); _mm_storeu_si128((__m128i *) c[i], temp_3); }
gave almost a 4x improvement. The conventional method took 0.00009600 seconds, while SSE instructions brought it down to 0.00002500 seconds
Although while using SIMD, we will have to be cautious so that any compatibility issue isn’t raised (in case the instruction set isn’t supported by the architecture).Availability of SIMD instruction sets on the machine would have to be checked for beforehand, and accordingly be used.
from __future__ import plan
I plan to divide my target into the following smaller goals :
 Implement a naive quadratic sieve.
 Use multiple polynomials to sieve relations.
 Implement Polard Rho factoring Algorithm
 Implement the ECM factoring algorithm for relatively smaller numbers
 Change sieve to give partial, partial partial and partial partial relations
 Remove duplicate relations and singletons
 Implement the cycle counting algorithm
 Implement the cycle finding algorithm
 Combine relations
 Port the linear algebra to be used (Block Lancoz)
 Use GCD to find factors
 Implement the small prime variant
 Use SIMD while sieving
 Implement sieving parallely in multiple cores. Our primary aim is making it efficient on a single core.
If time permits, I would also like to do the following, in order of preference:
At the end of the summer, FLINT will not only have a fast triple large prime variant of MPQS, but also an optimized version of the ECM and Pollard Rho for numbers upto two words (128 bit numbers).
My end semester exams get over on 30th April, giving me sufficient time to prep up in the community bonding period for the coding period. The community bonding period would be a crucial stage of my GSoC project. In these four weeks I plan to accomplish the following tasks:
 Study the algorithm in more detail. There are subparts and ways to make sieving more efficient which I would like to know more about. This period would be the ideal time to do this work. I would also utilize this period to communicate with my mentors and work upon the execution path in more detail.
 I would go through the existing qsieve module and understand the existing implementation. Since it works like a charm for numbers up to two limbs, it is a nice starting point for us.
 Chalk out the data structures I will be using to implement the quadratic sieve. These will have to be thought out very carefully so as to save space.
 I would implement as many helper functions as possible in this period. For example, the Hansel lifting algorithm.
I would be attending a family member's marriage from 6th May to 12th May and hence would not be availible during these seven days. Apart from these dates, I don't have any other commitments during the summer.
My goal before the mid term evaluation would be to implement a naive multi quadratic sieve, and the factoring algorithms which will be used later on. These include the Pollard rho and the ECM algorithm (for numbers upto 2 words on a 64 bit machine). After the mid  term evaluations I would be implementing the triple large prime variant, which would use the factoring algorithms coded in the previous part of the project. I have mentioned the strech goals above, which I will implement if time permits. At the end of the summer, I aim to implement a fast variant of the quadratic sieve that can factor a 60 digit number in 8 seconds on a single core, and a 100 digit number in under a day.
It goes without saying that no function would be pushed on to the git repository without associated tests and proper documentation.
./run
Execution of the plan

Week 1 (May 25)
 In the first week I will be deciding the initial values of the smoothness bound B and sieving range M. Although these values will be heuristically tuned later on, an initial estimate can be taken from mentioned in Carl Pomerance’s paper on Smooth numbers and the quadratic sieve [1].
 The second task I would be carrying out would be to generate a factor base. What I have in mind is that the factor base would be a structure, comprising of an 4 byte integer array of factors and some other constants related to the factor base, such as the number of factors in the factor base. I would use a combination of precomputed array of primes and the sieve of Eratosthenes for this.
 I would be coding a sieve() function which would sieve for relations for x in the range [M, M] such that Q(x) is B  smooth. This would make use of the simple polynomial Q(x) = x**2  n. The sieve function will have to be designed so as to minimize cache misses. I will be coding it so that it uses multiple polynomials. Generation of the coefficients of the polynomial has been explained above.

Week 2 (June 1)
 Writing the relation data to a file would be required. There might not be sufficient space in memory to store all of it in case we attempt to factor a huge number. This will be taken care of in this week.
 In this week I will also code the Pollard Rho algorithm for factoring "smaller" numbers. It will not be the original one that Pollard suggested, but an optimized one using a faster cycle finding algorithm (The Monte Carlo Factorizing algorithm). This shouldn't take a lot of time. One existing implementation of the algorithm I have seen is in the MIRACL library. It should take a couple of hours, or a day at the max.
 In this week I will also begin working on the implementation of the ECM algorithm. First I would code the structures required for a an eleptical curve. The exact design of these structures will be finalized in the community bonding period, after further discussions with the mentors.

Week 3 (June 8)

Once the structures for ECM are done, I will work upon the curve operations. These include
 Point addition
 Point Doubling
 Scalar Multiplication of a point
 After the curve operations, I will code the funtion which generate the elliptical curves. Once this has been done, I will begin working on the Phase I of the ECM. This should be completed by the end of the week.

Once the structures for ECM are done, I will work upon the curve operations. These include

Week 4 (June 13)
 This week will be used to extend ECM to phase II. As discussed with community mentors, this need not be a very advanced version of the phase II, as the numbers which we are targetting to factor, are relatively smaller (It is an auxiliary factoring algorithm in this project, after all).
 Once both phase I and phase II are done, the rest of the time will be used to optimize the ECM algorithm.

Week 5 (June 21)
 I plan to keep this week as a buffer week, to catch up in case I am running behind schedule.
 In this week I will also add the tests for the auxiliary factoring algorithms (ECM and Pollar Rho).
 At the end of this week, mid term evaluations would be conducted. I would have completed my mid term goal, i.e. to implement a naive sieve (only till the sieving part) that uses multiple polynomials, and the auxiliary factoring algorithms (ECM and Pollard Rho).

Week 6 (June 29)
 In this week, I would get back to working on the Quadratic Sieve. I would begin by altering the sieveing so that it produces partial, partial partial and partial partial partial relations as well and writes them to a file in a required format. This format would be decided later on.
 An upper bound up till which the large prime factor can be would have to be decided. In most papers and implementations this value has been taken as B**2.
 This large composite or prime factor of Q(x) (after dividing it with the factors in the Factor Base), can be factored using ECM and Pollard Rho which will be coded in the intial half, along with the existing qsieve in FLINT. This relation will be saved to a file only if the large composite prime can be exporessed as a product of two or three primes. I will use the ECM and Pollard Rho coded in the initial half of the summer for the factoring, and existing FLINT functions to check whether the factorized numbers are prime.

Week 7 (July 6)
 In this week, I filter the non full relations so that they can be used further. This would include removing duplicates and removing singletons. A partial relation a * q (where a is B  smooth) would be stored as (q, 1, 1), pp's as (q, r, 1) and ppp's as (q, r, s).
 This week would also be used to implement the cycle counting algorithm. As explained before, a quick cycle counting algorithm is required to know when to stop sieving.
 It would also be required to implement an algorithm to detect the disconnected components of the graph, as it would be reqired for the above point.

Week 8(July 13)
 This week would be used to detect the cycles. First, the two hash maps will be designed. Since a hash table data structure does'nt exisit, it would have to be coded. For this purpose, I plan to use Cuckoo hasing.
 Once the hash maps have been designed, the algorithm to detect the cycles will be implemented, as explained above.

Week 9 (July 20)
 Once the cycles have been found out, the relations in the cycles would be combined to generate full relations.
 Once the full relations have been found out, the linear algebra part will be carried out (Calculation of null space of the relation matrix). The linear algebra code is alreadt present in FLINT and can be used.

Week 10 (July 27)
 This week would be used as a buffer week to complete any work in case I would be running behind schedule.
 Stronger and larger tests would be used, as at this point, I plan to have implemented a triple large prime variant

Week 11 (August 3)
 This week would be used to clean up the code and optimize it. The initial parameters (B and M) which we assumed will also be fine tuned in this period.

Week 12 (August 10)
 This week would be used to implement some level of parallelization (using SIMD, different cores for different polynomials).
 Also I will try to implement the small prime variant in this week.

Week 13 (August 17)
 Final tests and a revision of the whole code would be done to check for any possible optimizations.
 A final round of intensive testing and updation of the documentation.
My third semester begins on 3rd August, so I will have to give some time to my studies as well. I plan to keep the next three weeks a bit lighter, however this does not mean deferring from our primary goal. The goal will be completed at any costs. If in case I am running behind schedule, I will have sufficient time in the remaining three weeks to cover up.
I have a good grasp over the related math, and have a fair idea as to how I would proceed. However there would be a crucial role played by the mentors and the experienced members of the community during the summer in guiding me, as they have much more experience in implementing algorithms. Quadratic Sieves can always be optimized more and made better. While searching for resources, I came across a person who spent over ten years writing a QSieve! With guidance of my mentors, I can surely say that I can complete this project over the summer. I have been a part of the FLINT community a little over two months, and in all algorithms I have coded, there was a significant speedup between the initial and final version. In this process, I have learnt quite a few tricks. However there's a lot in the bagto learn, and I am eager to do that over the summer. This shall be a lot of fun.
References
 Smooth numbers and the quadratic sieve  Carl Pomerance
 Page 276, Prime Numbers, A Computational Perspective  Richard Crandall and Carl Pomerance
 Section 7, The Quadratic Sieve Factoring Algorithm  Eric Landquist
 Page 274, Prime Numbers, A Computational Perspective  Richard Crandall and Carl Pomerance
 Pg 329  340, The Multiple Polynomial Quadratic Sieve Method of Computation  Silverman, Robert
 Factoring with the Quadratic Sieve on large vector computers  Page 271, Journal of Computational and Applied Mathematics.
 Page 4, Integer Factorization and Computing Discrete Logarithms in Maple  Aaron Bradford, Michael Monagan, Colin Percival
 Page 8, MPQS with three large primes  Paul Leyland, Arjen Lenstra, Bruce Dodson, Alec Muffet, Sam Wagstaff.