NC State University

ECE 741: Sequential Machines

Homework #6 - Spring 2003

Assignment

Synthesize, route, extract, and verify a clock tree for the OpenRisc1200 design. This tutorial takes roughly 55 MB of disk space to run, so be sure to free enough space before you begin to avoid exceeding your quota. If necessary, create a compressed archive of your existing workspace before starting.

Tutorial

  1. Synthesizing the Design
    • Copy the file or1200.tar.gz locally. Unzip and untar it.
    • Examine the file or1200/rtl/verilog/or1200_defines.v. Note that the strings OR1200_ARTISAN_SSP, _SDP, and _STP have been defined. This means that this design will include memories that don't actually exist yet. This will generate a lot of warnings and create some confusion over the total number of clock sinks, but we may ignore this for now.
    • Change to the directory or1200/syn/synopsys/bin and examine the file top.scr. There are several things to note in this file:
      • There are actually three clocks in this design, but we will only be working with one of them (clk_i).
      • The variable CLK_UNCERTAINTY has been set to 100 ps. This value is used with the set_clock_skew command.
      • The set_fix_hold command is used with the primary clock. Without this command, Design Compiler will make no attempt to fix hold-time constraint violations.
      • The commands set_input_delay and set_driving_cell have been used to model all inputs (except the primary clock and reset lines) as being driven with a D flip-flop connected to the same clock tree (hence a single CLK-Q delay is annotated). This may not be the best assumption, but it's what the makers of the OpenRisc core assumed, so we'll stick with it for now. This information is important when verifying timing with PrimeTime, since we want to use the same assumptions there that we did for synthesis.
      • The commands set_output_delay and set_load have been used to model all outputs as driving 4 D flip-flops connected to the same clock tree (hence a single setup-time is annotated). Again, this assumption was made by the makers of the core.
    • Change to the ../run directory. Type add synopsys and then execute the command ../bin/run_syn &. This will take 10-20 minutes to synthesize the core and place the netlist in the file ../out/final_or1200_top.v.
    • Examine the file ../log/final_or1200_timing.log. Note that the critical path delay for clk_i of 25.3ns is well below the 40ns target. Note also that the shortest path delay is from one flip-flop connected directly to another. Since the CLK-Q delay of 196ps is above the clock uncertainty of 100ps, however, there was no need to insert delay buffers to fix hold-time violations.
    • Examine the file ../log/final_or1200_area.log. The total area is 1.57 mm². You'll come back to this later when resynthesizing the design to see how much it grows.
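The hold-time reasoning in the timing-log step can be checked numerically. This is a minimal sketch, assuming the simplified comparison the tutorial uses (shortest path delay against the clock uncertainty, ignoring the flip-flop's own hold time). The function name hold_margin_ps is hypothetical and is not part of any Synopsys tool.

```python
def hold_margin_ps(min_path_delay_ps, clock_uncertainty_ps):
    """Return how far (in ps) the shortest path exceeds the clock uncertainty.

    A negative result would mean delay buffers are needed on that path
    (simplified model: the flip-flop hold time itself is ignored).
    """
    return min_path_delay_ps - clock_uncertainty_ps


# Values quoted in the tutorial: CLK-Q delay of 196 ps, uncertainty of 100 ps.
margin = hold_margin_ps(196, 100)
needs_delay_buffers = margin < 0
print(margin, needs_delay_buffers)
```

With a larger CLK_UNCERTAINTY the same check would turn negative, which is when Design Compiler has to add delay to the shortest paths.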
  2. Synthesizing the Clock-Tree
    • Change back to the root for this assignment and execute the command dirSetup.py run/fe. Change to the run/fe directory, type add cadence and run encounter.
    • As you did for Homework #3, choose Design -> Design Import and load the vlogin.conf file. Set the verilog file to ../../or1200/syn/synopsys/out/final_or1200_top.v and the top cell to or1200_top. You may want to save this file as the new vlogin.conf file to save time later. Import the design.
    • Place the design with Place -> Place and the default options. This will take 2-4 minutes.
    • Copy the file or1200_top.ctstch into the run/fe directory and examine the file. The only thing you would really need to change for a different design would be the RootPin. The other settings may or may not be realistic constraints for this design's clock-tree, but we have no way of knowing one way or the other without more information. So we'll just build a clock tree using this info and see how well it works.
    • Choose Clock -> Synthesize Clock Tree. The Specification File should already be set to or1200_top.ctstch. Make sure also that the "Save Netlist" button is selected. We will need this netlist later to run PrimeTime.
    • Click OK in the "Synthesize Clock Tree" dialog box. Synthesis will take 2.5 to 5 minutes to run.
    • Change to the directory or1200_top_cts and open the file or1200_top_cts.ctsrpt.html in your favorite browser. This file reports the high-level predicted performance of the synthesized clock tree. Note that there are 1151 sinks and a total of 88 buffers. Most importantly, the rising-edge skew is 115.8ps, which is close to (though slightly over) the skew target of 100ps in the ctstch file. The rising transition time at the sinks is 340ps, which is under the 400ps target. Follow the "detail" link to get a full report, which includes the instance names of all buffers and sinks along with the rising and falling insertion delay to each one. There's also a skew summary for each level of the tree. This can be helpful if you need to manually tune a clock tree to reduce skew. If we add up all of the "buffers" listed for the last level (Level 5), we get 1151, the number of sinks (that means that these are actually the sinks, not buffers). If we add up the number of buffers for Level 4, we get 72. Remember that the total number of buffers was 88, so most of the buffers are in this stage.
    • Now let's view the graphical information in FE. Choose Clock -> Display Clock Tree. Click the "Display Clock Tree" button in the dialog box and click OK. You should see a display like the one below. This display shows the last level of the clock tree. Each of the 72 buffers in this stage is highlighted in white, and the sinks that it drives are highlighted in the same color and connected with yellow lines.
      Display Clock Tree
    • You can also examine higher levels of the clock-tree in FE. Choose Clock -> Clear Clock Tree Display, and then Clock -> Display Clock Tree again. This time, select "Display Min/Max Paths" in the dialog box and click Ok. As shown below, you should see two paths from the boundary pin to two clock sinks. The longest insertion delay is in red, and the shortest is in green.
      Minimum and Maximum Paths
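The buffer counts quoted from the clock-tree report can be cross-checked with a little arithmetic. This is a minimal sketch that uses only the numbers stated in this section (88 buffers in total, 72 of them in the last buffer stage); the variable names are illustrative.

```python
total_buffers = 88        # total buffers reported by clock-tree synthesis
last_stage_buffers = 72   # buffers listed for Level 4

# Buffers left over for the upper levels of the tree.
upper_level_buffers = total_buffers - last_stage_buffers

# Share of all buffers that sit in the last buffer stage, in percent.
last_stage_share = round(100 * last_stage_buffers / total_buffers, 1)

print(upper_level_buffers, last_stage_share)
```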
  3. Placing and Routing the Design
    • We will skip the insertion of filler-cells to save time and disk-space. Normally, you would insert them now.
    • In FE, choose Route -> Trial Route and perform a trial route as in Homework #3. Then choose Route -> NanoRoute and click OK with the Default options. This will fully route the design in 2-4 minutes.
    • Choose Timing -> Extract RC. Set the output to SPEF and deselect all other outputs (to save disk-space). Click Ok. This will create the or1200_top.spef file.
    • Choose Route -> Save Route -> DEF to create the or1200_top.def file.
    • Choose Design -> Save Design to save the design. Exit FE.
  4. Generating the sinks.tcl file
    In order to analyze the clock tree properly in PrimeTime, we need to create lists of the sink nodes and the last-stage buffer output nodes. We can do this with an OpenAccess Python script.
    • Start a new shell... you won't be able to run First Encounter and OpenAccess in the same shell due to link-library incompatibilities.
    • Change to the root directory for this assignment and run the command dirSetup.py run/oa.
    • Change to the run/oa directory and execute the following command:
      def2oa -def ../fe/or1200_top.def -lib mylib -cell or1200_top -view autoLayout -tech TSMC025_deep
      This command will take 1-2 minutes to run. You will get a number of errors because the memories are missing, but you can ignore these for now.
    • Copy the script gensinklist.py into this directory and examine it. Note the following:
      • The main procedure traceNet is called on the primary clock net clk_i.
      • This procedure recursively traces input-instance-terminals of the net, descending along the outputs of any cells in the "buflist" (which matches the list of buffers in the or1200_top.ctstch file).
      • When a cell is encountered that is not in the "buflist", a string is added to the "sinks" list that matches the instance and pin name as PrimeTime expects.
      • The instance and pin name of the driver is also added to the "driver" list (if it has not already been added).
      • Once the lists of sinks and drivers are complete, a TCL script called sinks.tcl is created to define these lists in PrimeTime.
    • Execute this script with the command python gensinklist.py. Make sure that the sinks.tcl file was created successfully.
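The traversal that gensinklist.py performs can be illustrated without OpenAccess. The following is a minimal sketch, not the course script: it models the netlist as plain Python dictionaries, and the names trace_net, to_tcl_list, the cell names CLKBUFX4 and DFFX1, the pin names, and the exact format of the generated TCL lines are all assumptions made for illustration.

```python
def trace_net(net, driver, nets, cells, buflist, sinks, drivers):
    """Recursively walk a clock net, collecting sink pins and last-stage drivers."""
    for inst, pin in nets.get(net, []):
        cell = cells[inst]
        if cell["type"] in buflist:
            # Clock buffer: descend along the net attached to its output pin.
            out_pin = cell["out_pin"]
            trace_net(cell["out_net"], f"{inst}/{out_pin}",
                      nets, cells, buflist, sinks, drivers)
        else:
            # Anything that is not a clock buffer is treated as a sink.
            sinks.append(f"{inst}/{pin}")
            if driver is not None and driver not in drivers:
                drivers.append(driver)


def to_tcl_list(name, items):
    """Format a Python list as a TCL 'set name [list ...]' line."""
    return f"set {name} [list {' '.join(items)}]"


# Toy netlist (hypothetical): net name -> list of (instance, input pin).
buflist = {"CLKBUFX4"}
nets = {
    "clk_i": [("buf0", "A")],
    "n1": [("buf1", "A"), ("buf2", "A")],
    "n2": [("ff0", "CK"), ("ff1", "CK")],
    "n3": [("ff2", "CK")],
}
cells = {
    "buf0": {"type": "CLKBUFX4", "out_pin": "Y", "out_net": "n1"},
    "buf1": {"type": "CLKBUFX4", "out_pin": "Y", "out_net": "n2"},
    "buf2": {"type": "CLKBUFX4", "out_pin": "Y", "out_net": "n3"},
    "ff0": {"type": "DFFX1"},
    "ff1": {"type": "DFFX1"},
    "ff2": {"type": "DFFX1"},
}

sinks, drivers = [], []
trace_net("clk_i", None, nets, cells, buflist, sinks, drivers)
print(to_tcl_list("sinks", sinks))
print(to_tcl_list("drivers", drivers))
```

Because the sinks are appended in traversal order, sinks driven by the same last-stage buffer end up next to each other in the list.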
  5. Verifying the Clock Tree with PrimeTime
    • Change to the root directory for this assignment and run the command dirSetup.py run/pt.
    • Change to the run/pt directory and copy the script clockskew.tcl into it. Note the following about the script:
      • This script defines a TCL procedure called clockskew with 4 arguments: the source pin, the list of sinks, the list of last-stage buffers, and the name of a file to which the individual insertion delay values will be written.
      • The procedure cycles through the sink nodes, measuring the delay and searching for the minimum and maximum. It writes the delay values for each sink to the output file as it goes. It uses two main commands to get the delay values: get_timing_paths and get_attribute.
      • Once the min and max insertion delay are found, the procedure cycles through the last-stage drivers to find the min and max transition times using the get_attribute command again. PrimeTime will not report a transition time for an input pin (sink), so we have to use the output pins of the last-stage buffers instead.
      • Once complete, the procedure prints a summary of the info it found to STDOUT.
      • The main body of the script starts by sourcing the sinks.tcl file that was generated earlier.
      • Next, the Verilog netlist is read. Note that this is the Verilog netlist generated by FE's clock-tree synthesizer, not the one output by Design Compiler!
      • Next, the input delay constraints are set to match the constraints in the top.scr script from step 1. We don't bother with the output constraints, because they don't make any difference when checking the hold-time constraints.
      • Next, the shortest-path is written to clockskew.rpt, along with the output of the clockskew procedure defined above.
      • This initial analysis was done without any parasitic back-annotation. The next line loads the SPEF file generated earlier.
      • The last lines of the script repeat the analysis using the back-annotated parasitic information.
    • Type add synopsys if you have not done so already and execute the command pt_shell -f clockskew.tcl. This command will take 3-6 minutes to run.
    • Examine the clockskew.rpt file. This file is very large due to the thousands of "Failed to compute C-effective" warnings, which we will have to ignore for now. Note that the back-annotated skew measurement (183ps) is more than 1.5 times the amount predicted by FE and about 1.8 times what we asked for in the .ctstch file (this shows us how realistic our original constraints were). The insertion delay and transition times are also significantly larger. Note also that the number of sinks is 1136, instead of the 1151 that the clock-tree synthesis tool reported earlier. Normally, a discrepancy like this should alert us that something is wrong with our verification flow and prompt us to fix the problem immediately. However, in this case the discrepancy is due to the fact that the memories were missing when we ran the gensinklist.py script, so we may ignore it for now. Here is a copy of my clockskew.rpt file (with most of the "C-effective" warnings removed) for comparison.
    • Type add matlab and start matlab. Copy the script insdelay.m to this directory and run it. You should see output similar to the figure below. Viewing your clock skew in this way is also helpful for hand-tuning the clock tree. Note that the skew is very similar for groups of about 14 sinks. This is because the last-stage buffers are driving about 14 sinks each, and our gensinklist.py script wrote them into the sinks.tcl file in order as it traversed the tree.
      Clock Skew
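The min/max bookkeeping done by the clockskew procedure can be sketched in a few lines. The real script is TCL and obtains its numbers from PrimeTime with get_timing_paths and get_attribute; the Python sketch below only mirrors the summary logic. The function name clock_skew_summary and the individual delay and transition values are made up for illustration, chosen so that the skew and maximum transition match the 183ps and 597ps quoted in the tutorial.

```python
def clock_skew_summary(insertion_delays_ps, transitions_ps):
    """Summarize insertion delays at the sinks and transitions at the drivers."""
    min_delay = min(insertion_delays_ps.values())
    max_delay = max(insertion_delays_ps.values())
    return {
        "min_delay": min_delay,
        "max_delay": max_delay,
        "skew": max_delay - min_delay,
        "min_transition": min(transitions_ps.values()),
        "max_transition": max(transitions_ps.values()),
    }


# Hypothetical per-sink insertion delays and per-driver transition times (ps).
delays = {"ff0/CK": 1000, "ff1/CK": 1100, "ff2/CK": 1183}
transitions = {"buf1/Y": 450, "buf2/Y": 597}

summary = clock_skew_summary(delays, transitions)
print(summary)
```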
  6. Analyzing and Resynthesizing the Clock Tree
    Let's now consider the quality of our clock-tree. As stated in class, we will apply the following constraint to our clock-tree to ensure that we have enough of a safety margin:

    $t_{skew} < t_{path,min} - t_{rise,max}$

    Based on the PrimeTime analysis, we have a skew of 183ps, minimum-path delay of 197ps, and max transition time of 597ps, which means that we are failing to meet this constraint by 583ps! What are we going to do? We can fix this problem with two changes:
    • Increase the minimum path delay by increasing the value of CLK_UNCERTAINTY in top.scr and resynthesizing
    • Reduce the transition time at the clock sinks by decreasing the values of SinkMaxTran and BufMaxTran in the or1200_top.ctstch file.
    Assuming that the clock skew will remain around 200ps, we can set CLK_UNCERTAINTY to 500ps and the max transition times to 200ps. According to PrimeTime, the actual max transition time (597ps) was about 50% greater than the 400ps limit given to the clock-tree synthesis tool, so with a 200ps limit we can expect a transition time of around 300ps. This means that we should meet our safety margin nearly exactly. Make these changes and redo steps 1-5 of the tutorial. Before you start, make sure that you back up the following files, since they will be overwritten, and you will need them to answer questions later:
    • or1200/syn/synopsys/log/final_or1200_top_timing.log
    • or1200/syn/synopsys/log/final_or1200_top_area.log
    • run/fe/or1200_top_cts/or1200_top_cts.ctsrpt.html
    • run/pt/clockskew.rpt
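The safety-margin constraint from this section can be worked through numerically. This is a minimal sketch using the values quoted in the text; the function name skew_margin_ps is hypothetical, and the projected case assumes that the minimum path delay after resynthesis ends up close to the new CLK_UNCERTAINTY of 500ps.

```python
def skew_margin_ps(t_skew, t_path_min, t_rise_max):
    """Return the slack (ps) in the constraint t_skew < t_path_min - t_rise_max.

    A positive result means the constraint is met; a negative result is the
    amount by which it is violated.
    """
    return (t_path_min - t_rise_max) - t_skew


# First run, from the PrimeTime analysis: skew 183, min path 197, max rise 597.
first_run = skew_margin_ps(183, 197, 597)

# Projected second run (assumed values): skew 200, min path 500, max rise 300.
projected = skew_margin_ps(200, 500, 300)

print(first_run, projected)
```

The first run misses the constraint by 583ps, while the projected values land right at the boundary, which is why the text expects the margin to be met "nearly exactly".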
  7. When finished, answer the following questions and record your answers in your own copy of the hw6.txt file:
    • Does the new clock-tree meet our hold-time safety-margin constraint?
    • Because the new clock-tree has a shorter transition time, it is likely to have more buffers. How many more buffers were needed with a clock-uncertainty of 500ps rather than 100ps?
    • For this design, what is the area penalty of using a clock-uncertainty of 500ps instead of 100ps?
    • For this design, what is the cycle-time penalty (according to the pre-layout Design-Compiler estimates) of using a clock-uncertainty of 500ps instead of 100ps?

Submission

You should turn in a .tar.gz archive containing the following files:
  1. Your own hw6.txt file with answers to the questions in part 7 above
  2. The or1200_top_timing.log file from your final synthesis run
  3. The or1200_top_area.log file from your final synthesis run
  4. The final_or1200_top.ctsrpt.html file from your final clock-tree synthesis run
  5. The clockskew.rpt file from your final PrimeTime analysis. Before including this file, strip out the warnings with the stripwarnings.py script (usage: python stripwarnings.py < oldfile > newfile). We would exceed the course locker's quota very quickly, otherwise!
  6. If your timing values for the first synthesis run differ significantly (by more than 50ps) from any of the values in the tutorial, then include your or1200_top_timing.log, or1200_top_area.log, final_or1200_top.ctsrpt.html, and clockskew.rpt files from your first run. These files will help me to figure out what's going on!
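The contents of stripwarnings.py are not shown in this assignment. As a rough idea of what such a filter does, here is a hypothetical stand-in, assuming that each warning occupies a single line beginning with "Warning:". The function name strip_warnings and the sample lines are illustrative only and are not taken from a real report.

```python
def strip_warnings(lines, marker="Warning:"):
    """Return the lines that do not start with the warning marker."""
    return [line for line in lines if not line.lstrip().startswith(marker)]


# Made-up sample lines standing in for a report.
sample = [
    "Warning: Failed to compute C-effective (example line)\n",
    "skew summary (example line)\n",
    "Warning: Failed to compute C-effective (example line)\n",
    "transition summary (example line)\n",
]

kept = strip_warnings(sample)
print("".join(kept), end="")
```

For the actual submission, use the provided stripwarnings.py script as described in item 5.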