MimasV2 – using Block RAM

In the last tutorial, we looked at the different blocks inside an FPGA. One of the blocks I mentioned was Block RAM (BRAM). BRAMs are dedicated, high-performance memory blocks hardwired into the FPGA. In this tutorial, we will use a BRAM to store the sample values of a tone and then output them through the audio jack.

It is also useful to know about distributed RAM, so I would like to spend some time on it here.

Distributed RAM is so named because it is made by connecting a number of LUTs that are scattered around the FPGA fabric. These LUTs come from different slices belonging to different CLBs in the FPGA. In contrast, a BRAM block is hardwired inside the FPGA at a specific position on the die. BRAMs come in fixed sizes, for example 18Kb in our FPGA. Our FPGA contains 32 such blocks, providing a maximum of 32 × 18Kb = 576Kb of BRAM.

Here is a diagram depicting how a distributed RAM is formed:

[Figure: Distributed RAM configurations]

LUTs are the basic building blocks of distributed RAMs. Let's say we have resources in the form of ‘logic elements’, each of which contains two 4-input LUTs.

As explained earlier, a 4-input LUT is basically a 16-entry RAM with each entry 1 bit wide. As shown in the top circuit, a single 4-input LUT can function as a 16×1 RAM with the addition of some extra logic to accommodate WE, WCLK and other signals. The figure also illustrates how the two 4-input LUTs can be configured as (from top to bottom) a 32×1 single-port RAM, a 16×2 single-port RAM, or a 16×1 dual-port RAM.
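
As a rough illustration of the 16×1 case, here is a minimal Verilog sketch of a RAM with a synchronous write and an asynchronous read. The module and port names (lut_ram_16x1, wclk, we, addr, d, o) are placeholders chosen for this example and are not a Xilinx primitive. A synthesis tool will usually map a description of this shape onto LUT-based distributed RAM, but that depends on the tool and its settings.

```verilog
// Sketch of a 16x1 RAM: 16 entries, each 1 bit wide.
// All names are illustrative; this is not a vendor primitive.
module lut_ram_16x1 (
    input  wire       wclk,  // write clock
    input  wire       we,    // write enable
    input  wire [3:0] addr,  // 4 address bits, like the 4 inputs of the LUT
    input  wire       d,     // data in
    output wire       o      // data out
);
    reg mem [0:15];          // the 16 one-bit storage cells

    // Write is synchronous: it happens on the clock edge while WE is high.
    always @(posedge wclk)
        if (we)
            mem[addr] <= d;

    // Read is asynchronous: the output follows the address directly,
    // just like an ordinary LUT lookup.
    assign o = mem[addr];
endmodule
```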

A more detailed explanation can be found here, here, and here.

One more important difference between BRAM and distributed RAM is that BRAMs are fully synchronous (both read and write), whereas in distributed RAM the read can be made asynchronous.
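
The synchronous read shows up directly in how the memory is described in HDL. The sketch below (module and port names are made up for this example) registers the read data, so the value for a given address appears one clock edge after the address is applied. This is the style of read that a BRAM imposes.

```verilog
// Sketch of a 256x8 single-port RAM with synchronous write AND synchronous read.
// All names are illustrative.
module sync_read_ram (
    input  wire       clk,
    input  wire       we,    // write enable
    input  wire [7:0] addr,  // 256 locations
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    reg [7:0] mem [0:255];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        // Registered read: dout changes only on a clock edge.
        // When writing and reading the same address, the old contents are returned.
        dout <= mem[addr];
    end
endmodule
```

Replacing the registered read with a continuous `assign` would give the asynchronous read that only distributed RAM can offer.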

When should you go for BRAM or distributed RAM in your design? Rules of thumb:

  1. If you need a large amount of memory (>1KB), go for BRAM. A large distributed memory would waste logic resources.
  2. For small and fast memories, for example FIFOs, go for distributed RAM. This is because distributed RAM can be placed conveniently close to the logic that uses it, so the routing delay is lower. BRAM is available only at specific locations.

XST itself usually does a good job of choosing this for you (but don’t bet your life on it!).
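
If you would rather state your preference than leave it to the tool, XST accepts a ram_style attribute on the memory array, with values such as "block" and "distributed". The following is only a sketch: the module name, port names and sizes are assumptions for illustration, and the tool may still decline a request it cannot implement (for example, an asynchronous read cannot be placed in a BRAM).

```verilog
// Sketch: asking XST to build a small memory out of LUTs (distributed RAM).
// Changing "distributed" to "block" would request a BRAM instead, which
// would also require the read below to be made synchronous.
module small_buffer_mem (
    input  wire       clk,
    input  wire       we,     // write enable
    input  wire [4:0] waddr,  // write address, 32 locations
    input  wire [4:0] raddr,  // read address
    input  wire [7:0] din,
    output wire [7:0] dout
);
    (* ram_style = "distributed" *)
    reg [7:0] mem [0:31];

    always @(posedge clk)
        if (we)
            mem[waddr] <= din;

    assign dout = mem[raddr];  // asynchronous read port
endmodule
```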

 

Using the CoreGen tool to create BRAM

Let's now start looking at Block RAMs. If you want to instantiate a BRAM block in your code, the Xilinx CoreGen tool provides an easy GUI-based way of doing so.

We will create a BRAM and initialize it with the tone values, so that when the FPGA starts up these values are already present inside the BRAM block. We just have to read them and put them out through the audio jack. Since there is no need to write to the BRAM in our application, we may as well use a ROM (it is just the same RAM block without the WE signal).

Before generating the ROM, let us first generate the values we want to initialize it with. The CoreGen tool accepts a .coe file with a specific format. The format is:
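
To make the "RAM without WE" point concrete, here is a sketch of a 256×8 ROM described directly in Verilog. It only illustrates the idea and is not what CoreGen generates. The module name and the file name sineVals.hex are hypothetical, and the file would have to be a plain list of hexadecimal values as expected by $readmemh (a .coe file uses a different format).

```verilog
// Sketch of a 256x8 ROM: the same structure as a synchronous-read RAM,
// but with no write port at all. Names and the data file are hypothetical.
module rom_256x8 (
    input  wire       clk,
    input  wire [7:0] addr,
    output reg  [7:0] dout
);
    reg [7:0] mem [0:255];

    // Contents are fixed at start-up; nothing ever writes to mem afterwards.
    initial
        $readmemh("sineVals.hex", mem);

    // Registered read, as in a BRAM.
    always @(posedge clk)
        dout <= mem[addr];
endmodule
```

The ports deliberately mirror the clock, address and data-out signals of a single port ROM; in this tutorial the CoreGen flow is used instead.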

memory_initialization_radix=10;
memory_initialization_vector=
0,
9,
18,
28,
37,
46,
56,
…….. . . . . .

The first line specifies the radix (base) of the entered values. We are going to enter them in decimal, so base 10. The second line must appear before the values start. After that, the values are entered comma-separated. The first value goes into address 0, and so forth. You need not specify initialization values for the entire memory block. We are going to use a 256-word memory but need only 196 values, so we specify only the first 196 values.

I am going to use MATLAB to generate the tone values and write them into the file. Much better than entering them manually, if you ask me! Here is the MATLAB code.


increments = [3 6 8 10 13 20]; % step size of each tone, from first (3) to last (20)

filename = 'sineVals.coe'; % this will be created in your current MATLAB working directory
fid = fopen(filename,'wt');
fprintf(fid,'memory_initialization_radix=10;\n');
fprintf(fid,'memory_initialization_vector=\n');

for k=1:length(increments)
    b=0:increments(k):226;
    b=b';
    b=b/max(b);
    b=b*pi; % this will generate only half of a sine wave. That is OK. It will be useful later.
    b=sin(b);
    b(end)=b(end-1)/2; % sin(0) = sin(pi), so make the last value something else
    b=b/max(b);
    b=b*226; % scale to the range 0..226
    b=floor(b);

    for i=1:length(b)
        fprintf(fid,'%d,\n', b(i));
    end
end

fclose(fid);
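
It helps to count how many values each tone contributes, because the design later needs to know where each tone starts and ends in memory. A range 0:n:226 contains floor(226/n) + 1 values, which gives 76, 38, 29, 23, 18 and 12 values for the six increments, 196 in total. The simulation-only Verilog sketch below derives the start and end addresses from those lengths; the module and parameter names are placeholders for this example.

```verilog
// Simulation-only sketch: derive the start/end address of each tone table.
// Integer division truncates, so 226/n + 1 equals floor(226/n) + 1.
module tone_address_map;
    localparam LEN1 = 226/3  + 1;   // 76 values
    localparam LEN2 = 226/6  + 1;   // 38 values
    localparam LEN3 = 226/8  + 1;   // 29 values
    localparam LEN4 = 226/10 + 1;   // 23 values
    localparam LEN5 = 226/13 + 1;   // 18 values
    localparam LEN6 = 226/20 + 1;   // 12 values

    // Each tone starts right after the previous one ends.
    localparam S1 = 0;
    localparam E1 = S1 + LEN1 - 1;
    localparam S2 = E1 + 1;
    localparam E2 = S2 + LEN2 - 1;
    localparam S3 = E2 + 1;
    localparam E3 = S3 + LEN3 - 1;
    localparam S4 = E3 + 1;
    localparam E4 = S4 + LEN4 - 1;
    localparam S5 = E4 + 1;
    localparam E5 = S5 + LEN5 - 1;
    localparam S6 = E5 + 1;
    localparam E6 = S6 + LEN6 - 1;

    initial begin
        $display("tone 1: %0d..%0d", S1, E1);   // 0..75
        $display("tone 2: %0d..%0d", S2, E2);   // 76..113
        $display("tone 3: %0d..%0d", S3, E3);   // 114..142
        $display("tone 4: %0d..%0d", S4, E4);   // 143..165
        $display("tone 5: %0d..%0d", S5, E5);   // 166..183
        $display("tone 6: %0d..%0d", S6, E6);   // 184..195
        $display("total values: %0d", E6 + 1);  // 196
    end
endmodule
```

These are the same numbers that appear as the start and end addresses a1s to a6e in top_module.v.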

Now create a new Xilinx project. Copy the generated file into the Xilinx project directory.

Next we have to start a core generator project. Go to Tools->Core generator. Once the new window opens, create a new Core generator project and name it whatever you like. Importantly, go to the Generation tab on the left and select Verilog as the design entry.

On the left-hand side you will find a number of ready-made cores, provided by Xilinx free of cost, which you can integrate into your project. Go to Memory & storage elements->Block memory generator. A new GUI-based window will open. Enter the component name. This is the ‘module name’ that you have to use in your HDL code. I named it BRAM_sineVals. Select the interface type as Native. You can also see the RAM block with the signals used on the left-hand side.

Move to the next page and select single port ROM. For the algorithm, select minimum area. On the next page, select a width of 8 and a depth of 256. Check the load initialization file option and select the .coe file you generated. Optionally, you can also fill the remaining memory locations with 0 (or any value you want).

You can skip the next page. On the last page you can see the resources utilized by the BRAM module. In Spartan 6, BRAMs are 18Kb blocks, each of which can be used as two independent 9Kb blocks. You should see that a single 9K block is used in our design (as expected, since our ROM needs only 256 × 8 = 2048 bits).

Click Generate. Once it is done, you should see a folder called ipcore_dir in your project folder. Inside it you should see the BRAM_sineVals.ngc file. This is the direct netlist of the design. If you remember from the previous tutorial, the XST tool can accept ‘cores’ as input. This is the netlist (core) we need.

Inside ipcore_dir you should also see BRAM_sineVals.veo, which shows how the BRAM module should be instantiated inside our design. You can open it with a text editor and check.

You should also look into ipcore_dir->BRAM_sineVals->doc. This contains the datasheet for the Block memory generator tool. Make sure you give it a read.

You can now close the CoreGen tool. In your Xilinx project, create a file named top_module.v. Here are its contents.

top_module.v


`timescale 1ns / 1ps

module top_module(L,R,a,b,c,d,e,f,g,h,e1,e2,e3,tx,rx,clk,reset);

output L,R;
output a,b,c,d,e,f,g,h,e1,e2,e3,tx;
input rx,clk,reset;

localparam [1:0] idle=2'b00, data1=2'b01, data2=2'b10, data3=2'b11;
localparam [7:0] a1s=0,a1e=75,a2s=76,a2e=113,a3s=114,a3e=142,a4s=143,a4e=165,a5s=166,a5e=183,a6s=184,a6e=195;  //these are the start and end addresses of each tone in memory

// Signals for UART submodule
reg rd_uart = 1'b0;
reg wr_uart = 1'b0;
reg [7:0] data_out = 0 ;
wire [7:0] data_in ;
wire full ;
wire empty ;
reg [7:0] data_rx = 0 ;
reg [1:0] rx_state = 0 ;

reg [9:0] disp_num = 0 ;
reg [9:0] disp_num1 =10'd1;
wire [3:0] dig0,dig1,dig2 ;
reg [3:0] currentDig ;
reg [1:0] digCnt = 2'd0 ;
reg dig0en,dig1en,dig2en;
assign e1=dig2en; assign e2=dig1en; assign e3=dig0en;

reg [15:0] timer_cnt = 0 ;
reg [7:0] pwmCnt = 0 ;
reg [11:0] sampCnt = 0 ;
reg sampClk = 0 ;
reg [7:0] sampleVal = 0 ;

assign L = sampleVal >= pwmCnt;
assign R = sampleVal >= pwmCnt;

reg [7:0] as = 8'd0; //current start and end addresses
reg [7:0] ae = 8'd75;
reg [7:0] RAMaddr = 0; //RAM addr and data
wire [7:0] RAMout;

bin2BCD a1({dig2,dig1,dig0},disp_num1);
BCDto7Seg b1(a,b,c,d,e,f,g,h,currentDig);

//Block RAM
My_BRAM RAM1 ( //the packing module
.clka(clk), // input clka
.addra(RAMaddr), // input [7 : 0] addra
.douta(RAMout) // output [7 : 0] douta
);

// DIVISOR = 326 for 19200 baudrate, 100MHz sys clock
uart #( .DIVISOR (9'd326),
.DVSR_BIT (4'd9) ,
.Data_Bits (4'd8) ,
.FIFO_Add_Bit (3'd4)
) uart ( .clk (clk),
.rd_uart (rd_uart) ,
.reset (reset) ,
.rx (rx) ,
.w_data (data_out) ,
.wr_uart (wr_uart) ,
.r_data (data_in) ,
.rx_empty (empty) ,
.tx (tx) ,
.tx_full (full)
);

// Timer logic
always @(posedge clk)
begin
if(!reset)
timer_cnt <= 0;
else
timer_cnt <= timer_cnt + 1'b1;
end

always @(posedge clk or negedge reset)
begin
if(!reset)
sampCnt<=0;
else if(sampCnt==12'd2273)
begin sampClk<=1'b1; sampCnt<=0; end
else
begin sampCnt<=sampCnt+1'b1; sampClk<=1'b0; end
end

always @(posedge clk or negedge reset)
begin
if(!reset)
pwmCnt<=0;
else if(pwmCnt==8'd226)
pwmCnt<=0;
else
pwmCnt<=pwmCnt+1'b1;
end

always @(posedge sampClk)
begin
sampleVal<=RAMout;
if(RAMaddr==ae) //if end address of tone is reached, go back to start
RAMaddr<=as;
else
RAMaddr<=RAMaddr+1'b1;
end

always @(disp_num1) //combinational block, so blocking assignments are used
begin
case(disp_num1) //select start and end addresses according to UART input from 1-6
10'd1: begin as=a1s;ae=a1e; end
10'd2: begin as=a2s;ae=a2e; end
10'd3: begin as=a3s;ae=a3e; end
10'd4: begin as=a4s;ae=a4e; end
10'd5: begin as=a5s;ae=a5e; end
10'd6: begin as=a6s;ae=a6e; end
default: begin as=a1s;ae=a1e; end
endcase
end

always @(posedge timer_cnt[10] or negedge reset)
begin
if(!reset)
begin
rx_state<=idle;
disp_num<=0;
disp_num1<=10'd1;
end
else
begin
if(!empty)
begin
rd_uart <= 1'b1;
data_rx <= data_in;
case(rx_state)
idle: begin
if(data_rx>="0" && data_rx<="9") begin rx_state<=data1; disp_num<=(data_rx-"0"); end
end
data1: begin
if(data_rx>="0" && data_rx<="9") begin rx_state<=data2; disp_num<=4'd10*disp_num + (data_rx-"0"); end
else if(data_rx=="\r") begin
rx_state<=idle;
if(disp_num>=10'd1 && disp_num<=10'd6) begin disp_num1<=disp_num; end
disp_num<=0; end
else begin rx_state<=idle; disp_num<=0; end
end
data2: begin
if(data_rx>="0" && data_rx<="9") begin rx_state<=data3; disp_num<=4'd10*disp_num + (data_rx-"0"); end
else if(data_rx=="\r") begin
rx_state<=idle;
if(disp_num>=10'd1 && disp_num<=10'd6) begin disp_num1<=disp_num; end
disp_num<=0; end
else begin rx_state<=idle; disp_num<=0; end
end
data3: begin
if(data_rx!="\r") begin rx_state<=idle; disp_num<=0; end
else begin
rx_state<=idle;
if(disp_num>=10'd1 && disp_num<=10'd6) begin disp_num1<=disp_num; end
disp_num<=0; end
end
endcase
end
else
rd_uart<=1'b0;
end
end

always @(posedge timer_cnt[15]) begin
case (digCnt)
2'd0: begin
digCnt <= 2'd1;
dig2en <= 1'b1;dig1en <= 1'b1;dig0en <= 1'b0;
currentDig <= dig0;
end
2'd1: begin
digCnt <= 2'd2;
if(dig1==4'd0 & dig2==4'd0) begin dig2en <= 1'b1;dig1en <= 1'b1;dig0en <= 1'b1; end
else begin dig2en <= 1'b1;dig1en <= 1'b0;dig0en <= 1'b1; currentDig <= dig1; end
end
2'd2: begin
digCnt <= 2'd0;
if(dig2==4'd0) begin dig2en <= 1'b1;dig1en <= 1'b1;dig0en <= 1'b1; end
else begin dig2en <= 1'b0;dig1en <= 1'b1;dig0en <= 1'b1; currentDig <= dig2; end
end
endcase
end

endmodule
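
Two counters in this module set the audio timing. sampCnt wraps after 2274 clock cycles, so with the 100MHz system clock mentioned in the UART comment a new sample is fetched at roughly 44kHz. pwmCnt counts from 0 to 226, and L and R are high while sampleVal >= pwmCnt, so the duty cycle of the output follows the sample value at a PWM frequency of roughly 440kHz. The simulation-only sketch below just prints these rates; the module and parameter names are placeholders for this example.

```verilog
// Simulation-only sketch: print the rates implied by the counters in top_module.
module audio_rates;
    localparam real CLK_HZ    = 100.0e6; // system clock, as stated in the UART comment
    localparam real SAMP_DIV  = 2274.0;  // sampCnt counts 0..2273
    localparam real PWM_DIV   = 227.0;   // pwmCnt counts 0..226
    localparam real TONE1_LEN = 76.0;    // number of samples in the first tone table

    initial begin
        // How often a new value is read from the BRAM
        $display("sample rate        : %f Hz", CLK_HZ / SAMP_DIV);
        // How often the PWM counter completes one period
        $display("PWM frequency      : %f Hz", CLK_HZ / PWM_DIV);
        // How often the first (half sine) table repeats
        $display("tone 1 repeat rate : %f Hz", CLK_HZ / SAMP_DIV / TONE1_LEN);
    end
endmodule
```

Since the PWM period is much shorter than the sample period, each sample is held for about ten PWM periods before the next one is loaded.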

 

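You might have noticed that I have instantiated a module named My_BRAM. This is the module that 'packs' the BRAM_sineVals.ngc netlist so that it can be used in your design. The .ngc file is 'packaged' in a dummy wrapper module, and the wrapper is the one instantiated in the final design. The same file also contains an empty BRAM_sineVals module, which only declares the ports of the core so that XST can treat it as a black box; the actual contents come from the .ngc netlist.

Here are the contents of the file which packages the netlist file. Name it My_BRAM.v.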


module My_BRAM(clka,addra,douta); //dummy packing module

input clka; /*you can change these variable names of the outside module as you like. I have just kept it the same */
input [7:0] addra;
output [7:0] douta;

BRAM_sineVals u1( //instantiating actual module inside dummy module
.clka(clka), // input clka
.addra(addra), // input [7 : 0] addra
.douta(douta) // output [7 : 0] douta
);

endmodule

module BRAM_sineVals(clka,addra,douta); /*This format and variable names must be the same as that found in BRAM_sineVals.veo */

input clka;
input [7:0] addra;
output [7:0] douta;

endmodule

Also, don’t forget to add the .ngc file to the project. Make sure you also add the other files (UART, bin2BCD, UCF, etc.) from the previous tutorials.

You can implement the design and test if it is working by entering 1-6 from your serial terminal.

You can find the entire project folder in zip format in my GitHub repo:

https://github.com/Anirudh-R/Mimas-V2/blob/master/audioSineRAM.zip

That concludes this tutorial. If you have any questions, comments are welcome.

The next tutorial will be one of the most interesting ones! We will be interfacing with the external LPDDR chip on the board. It will take a considerable amount of understanding and learning to get through. A very enjoyable experience, if you ask me!

Cheers,

Anirudh


Author: anirudhr

Electronics hobbyist with special interest in the field of Software Defined Radios, FPGAs and DSP. Beginner but always ready to learn new stuff. Nothing like Hands on experience to teach you something.
